I'm not a math genius, but I did sleep at a Holiday Inn.

Looking at the Wikipedia article on it, there are assumptions that must hold for the method to work. Are you sure your data (collection) meets those assumptions?

From the Wikipedia article (though apparently there is a lot of disagreement on the talk page about whether the article is sound):

**Assumption on Linearity**

We assumed the observed data set to be linear combinations of a certain basis. Non-linear methods such as kernel PCA have been developed without assuming linearity.

**Assumption on the statistical importance of mean and covariance**

PCA uses the eigenvectors of the covariance matrix and it only finds the independent axes of the data under the Gaussian assumption. For non-Gaussian or multi-modal Gaussian data, PCA simply de-correlates the axes. When PCA is used for clustering, its main limitation is that it does not account for class separability since it makes no use of the class label of the feature vector. There is no guarantee that the directions of maximum variance will contain good features for discrimination.

**Assumption that large variances have important dynamics**

PCA simply performs a coordinate rotation that aligns the transformed axes with the directions of maximum variance. It is only when we believe that the observed data has a high signal-to-noise ratio that the principal components with larger variance correspond to interesting dynamics and lower ones correspond to noise.

Essentially, PCA involves only rotation and scaling. The above assumptions are made in order to simplify the algebraic computation on the data set. Some other methods have been developed without one or more of these assumptions; these are briefly described below.
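Since the whole procedure boils down to "rotate onto the directions of maximum variance," it can be sketched in a few lines of numpy. This is just an illustrative toy (the 2-variable data set below is made up), not the analysis from the tutorial:

```python
# Minimal PCA sketch: center the data, eigendecompose the covariance
# matrix, and rotate (project) onto the eigenvectors. Toy data only.
import numpy as np

rng = np.random.default_rng(0)
# 200 samples of 2 correlated variables (made-up data)
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]])

Xc = X - X.mean(axis=0)                 # center each variable
cov = np.cov(Xc, rowvar=False)          # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
order = np.argsort(eigvals)[::-1]       # sort descending by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs                   # the rotated coordinates
# After the rotation the components are de-correlated: the covariance
# of `scores` is (numerically) diagonal, with the eigenvalues on it.
print(np.round(np.cov(scores, rowvar=False), 3))
```

Note that this is exactly the "rotation" the quoted text describes: no class labels, no notion of which direction is chemically interesting, just variance.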

edit: this website seems like a proper tutorial on it.

http://neon.otago.ac.nz/chemlect/chem306/pca/Theory_PCA/index.html

EDIT: And oh, it seems to be the one you used for your OP.

EDIT: Page 9 of the url seems to be where they give the breakdown on region 2.

http://neon.otago.ac.nz/chemlect/chem306/pca/Spectroscopy_PCA/page9.html

And I see now from the conclusion why you're pissed off. They say to choose the best parts of the region, but don't say how.

It seems to me that the "PCs" or eigenvectors are basically analogous to slices of an orange, with the orange being sliced in multiple, different ways, and you're trying to see how many slices are relevant to the problem. Except what they're doing in the example is more like unraveling a knot of strings into separate strands. I suspect that in region 1 the knot is pretty simple: there are only a few strings in it. In region 2, I think the knot is pretty messy, with lots of strands. (The reason is that they're looking at nearly similar chemicals, and in region 2 there are more chemicals, so more similar strands to isolate.)

Based on those analogies, my hunch is that the best regions to focus on are where you see actual differences between species in your mixture. E.g. if one chemical differs from another by the addition of 2 hydrogens, and that causes a slightly different spectral peak at wavelength XYZ, then an area around XYZ is a good place to focus on.

The hands-on method for estimating whether a region is useful for PCA seems to be the scree plot, though I don't fully understand this method yet.

http://neon.otago.ac.nz/chemlect/chem306/pca/Theory_PCA/page6.html and page 7.

On the scree plot, it seems that when the y-axis drops to about 1 to 0.7, the corresponding x-axis value is the maximum number of PCs or 'slices' you should use in your analysis. And that roughly corresponds to the number of different species muddling together to make the 'knot', I'd assume. Once you've got a set of PCs, each with a number giving its relative contribution to the data, I think you then transform those back into the raw spectra to infer where each contributing part actually is. Basically, you deconvolve the knot by estimating what the component strings are, based on the PCs telling you there were so many strings of so much relative strength to each other.