data mining for dummies

Ayatollah So

the spoof'll set you free
Feb 20, 2002
SE Michigan
What's a good website, or failing that, a good book, for learning about statistical data mining tools such as neural networks (NN), partial least squares (PLS), and principal components analysis (PCA)? Guidance on choosing which technique(s) to apply to my problems would be especially welcome.

Right now my problem is to use spectroscopic data (~1000 data points, i.e. wavelengths, per measurement) to predict chemical types (only a few types, easily grouped into binary splits, if that helps). I have ~10,000 spectra on known samples of each type. Popular methods in the literature are NN and PLS. Currently I'm using PCA + multiple linear regression, which works OK but probably isn't the best choice. I've also experimented a little with Differential Evolution, a kind of evolutionary algorithm, but it's too slow unless I do some data reduction first, i.e. replace the ~1000 pixels with a smaller number of features (such as principal components, or summed intensities of all pixels in each presumed spectral line).
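
For anyone following along, here is a rough sketch of what "PCA + multiple linear regression" on a binary split might look like. It is only an illustration under stated assumptions, not the actual code in use: the spectra are synthetic (two made-up Gaussian peaks plus noise), the sizes are shrunk to 100 wavelengths and 200 spectra, and the function names (make_synthetic_spectra, fit_pca_regression, predict) are hypothetical. It needs only NumPy.

```python
import numpy as np


def make_synthetic_spectra(n_per_class, n_wavelengths, rng):
    """Hypothetical data: class 0 has a peak near pixel 30, class 1 near pixel 60."""
    x = np.arange(n_wavelengths)
    peak_a = np.exp(-((x - 30) ** 2) / (2 * 5.0 ** 2))
    peak_b = np.exp(-((x - 60) ** 2) / (2 * 5.0 ** 2))
    clean = np.vstack([
        np.tile(peak_a, (n_per_class, 1)),
        np.tile(peak_b, (n_per_class, 1)),
    ])
    X = clean + rng.normal(scale=0.1, size=clean.shape)  # additive noise
    y = np.repeat([0.0, 1.0], n_per_class)               # binary labels
    return X, y


def fit_pca_regression(X, y, n_components):
    """PCA via SVD of the centered spectra, then least squares on the scores."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T
    # Multiple linear regression: intercept column plus one column per component.
    A = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return mean, components, coef


def predict(model, X):
    """Project new spectra onto the stored components and apply the regression."""
    mean, components, coef = model
    scores = (X - mean) @ components.T
    return coef[0] + scores @ coef[1:]


rng = np.random.default_rng(0)
X, y = make_synthetic_spectra(n_per_class=100, n_wavelengths=100, rng=rng)

# Shuffle, then hold out a quarter of the spectra for testing.
order = rng.permutation(len(y))
train, test = order[:150], order[150:]

model = fit_pca_regression(X[train], y[train], n_components=3)

# Threshold the regression output at 0.5 to get a class label.
predicted_class = predict(model, X[test]) > 0.5
accuracy = float(np.mean(predicted_class == (y[test] == 1.0)))
print("held-out accuracy:", accuracy)
```

The number of components (3 here) is an arbitrary choice for the toy data; with real spectra it would normally be picked by cross-validation. The same scores matrix could also serve as the reduced feature set for a slower optimizer such as Differential Evolution.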

Back when I started getting serious about using statistics on these issues, I started this other thread. Thanks again to all who helped there.


Red, White, & Blue, baby!
Dec 17, 2004
I don't know, but you might want to take a look at the references here:
regarding protein prediction methods based on circular dichroism. It's possibly analogous to your problem, since it also involves extrapolating from measurements of chemicals. At the least, how the authors have developed their methodology over the decades might be of passing use.