Data mining




This page is a placeholder, or under current development; it is here principally to establish the logical framework of the site. The material on this page is correct, but incomplete.


Data mining (or knowledge discovery) is a collection of methods for discovering patterns of interest in large datasets. Like Exploratory Data Analysis, it aims to approach the data without preconceived notions about what the patterns could be. Data mining, however, relies less on visualization that lets the investigator discover patterns by inspection, and more on computable descriptions of patterns. Such strategies are especially well suited to situations in which it is hard to devise good visual presentations, such as text mining or the mining of phenotype descriptions, or when the data is very high-dimensional. The two concepts nevertheless overlap significantly. This page focuses predominantly on text mining, but David Reshef's work on the maximal information coefficient (Reshef et al. 2011) deserves special mention.
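As a minimal sketch of what a "computable description" of a pattern can look like, the following Python example compares the Pearson correlation with a mutual-information estimate computed on a fixed grid. It is not the maximal information coefficient itself: MIC additionally searches over grid resolutions and normalizes the result, which this sketch does not attempt. The function names, the choice of 8 equal-width bins and the parabola test data are assumptions made purely for illustration.

```python
import math
from collections import Counter


def binned_mutual_information(xs, ys, bins=8):
    """Estimate mutual information (in bits) of paired samples.

    Illustrative only: uses a fixed equal-width grid, whereas MIC
    searches over many grids and normalizes the score.
    """
    def bin_index(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant data
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = bin_index(xs), bin_index(ys)
    n = len(xs)
    joint = Counter(zip(bx, by))      # counts per grid cell
    px, py = Counter(bx), Counter(by)  # marginal counts per row / column
    # Sum of p(x,y) * log2( p(x,y) / (p(x) * p(y)) ) over occupied cells.
    return sum(
        (c / n) * math.log2(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical test data: a noiseless parabola, symmetric around zero.
xs = [i / 100 for i in range(-100, 101)]
ys = [x * x for x in xs]

r = pearson(xs, ys)
mi = binned_mutual_information(xs, ys)
print(f"Pearson r = {r:.3f}, binned mutual information = {mi:.3f} bits")
```

On data like this, the linear correlation is essentially zero even though y is fully determined by x, while the grid-based score is clearly positive. This is the kind of non-linear association that a dependence measure can flag without anyone having to inspect a plot.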


Introductory reading

Speed (2011) Mathematics. A correlation for the 21st century. Science 334:1502-3. (pmid: 22174235)



Contents

...


Further reading and resources

Reshef et al. (2011) Detecting novel associations in large data sets. Science 334:1518-24. (pmid: 22174245)


Clegg & Shepherd (2008) Text mining. Methods Mol Biol 453:471-91. (pmid: 18712320)


Krallinger et al. (2010) Analysis of biological processes and diseases using text mining approaches. Methods Mol Biol 593:341-82. (pmid: 19957157)


Groth et al. (2010) Phenoclustering: online mining of cross-species phenotypes. Bioinformatics 26:1924-5. (pmid: 20562418)


Groth et al. (2011) Phenotype mining for functional genomics and gene discovery. Methods Mol Biol 760:159-73. (pmid: 21779996)
