Enrichment

From "A B C"
Revision as of 23:34, 26 January 2012 by Boris (talk | contribs)
Enrichment Analysis


This page is a placeholder or is still under development; it is here principally to establish the logical framework of the site. The material on this page is correct, but incomplete.


Enrichment analysis addresses the question: do genes in a set have a remarkable property in common? The methodologies discussed here have applications in many fields of computational biology.
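One common way to make this question concrete is an over-representation test based on the hypergeometric distribution: given a reference set of genes, some of which carry an annotation, how surprising is the number of annotated genes in a selected list? The sketch below is only an illustration of that idea. The function name, parameter names, and the example numbers are hypothetical and do not come from this page.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, annotated_in_universe,
                           list_size, annotated_in_list):
    """One-sided hypergeometric p-value for over-representation.

    Returns P(X >= annotated_in_list), where X is the number of annotated
    genes obtained when list_size genes are drawn without replacement from
    a reference set of universe_size genes, of which annotated_in_universe
    carry the annotation.
    """
    upper = min(annotated_in_universe, list_size)
    total = comb(universe_size, list_size)
    # Sum the upper tail of the hypergeometric distribution.
    # math.comb returns 0 for impossible draws, so no extra bounds are needed.
    tail = sum(
        comb(annotated_in_universe, i)
        * comb(universe_size - annotated_in_universe, list_size - i)
        for i in range(annotated_in_list, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 6000 genes in the reference set, 40 annotated,
    # 100 genes in the list of interest, 5 of them annotated.
    print(hypergeom_enrichment_p(6000, 40, 100, 5))
```

The choice of reference set (the first argument) strongly affects the result, which is why it has to be defined before the test is run.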


Introductory reading

Tilford & Siemers (2009) Gene set enrichment analysis. Methods Mol Biol 563:99-121. (pmid: 19597782)

Set enrichment analytical methods have become commonplace tools applied to the analysis and interpretation of biological data. The statistical techniques are used to identify categorical biases within lists of genes, proteins, or metabolites. The goal is to discover the shared functions or properties of the biological items represented within the lists. Application of these methods can provide great biological insight, including the discovery of participation in the same biological activity or pathway, shared interacting genes or regulators, common cellular compartmentalization, or association with disease. The methods require ordered or unordered lists of biological items as input, understanding of the reference set from which the lists were selected, categorical classifiers describing the items, and a statistical algorithm to assess bias of each classifier. Due to the complexity of most algorithms and the number of calculations performed, computer software is almost always used for execution of the algorithm, as well as for presentation of the results. This chapter will provide an overview of the statistical methods used to perform an enrichment analysis. Guidelines for assembly of the requisite information will be presented, with a focus on careful definition of the sets used by the statistical algorithms. The need for multiple test correction when working with large libraries of classifiers is emphasized, and we outline several options for performing the corrections. Finally, interpreting the results of such analysis will be discussed along with examples of recent research utilizing the techniques.
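The abstract above stresses multiple test correction when many classifiers are tested at once. As a minimal sketch of one widely used option, the Benjamini-Hochberg procedure for controlling the false discovery rate can be written as follows. This is an assumption chosen for illustration: the abstract does not say which corrections the chapter recommends, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values, one per tested classifier.
    raw = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(raw))
```

A classifier would then be reported as enriched when its adjusted value falls below the chosen false discovery rate, for example 0.05.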


Further reading and resources

Subramanian et al. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U.S.A. 102:15545-50. (pmid: 16199517)

Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
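To give a feel for how a gene set can be scored against a ranked gene list, here is a deliberately simplified sketch of a running-sum enrichment score. It is an unweighted, Kolmogorov-Smirnov-like statistic, not the published GSEA implementation, which additionally weights each hit by its ranking metric and assesses significance by permutation. The function name and the toy gene identifiers are hypothetical.

```python
def running_sum_enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum score of gene_set along ranked_genes.

    Walking down the ranked list, the sum rises at each gene that belongs
    to the set and falls at each gene that does not. The score is the
    value of the running sum with the largest absolute deviation from zero.
    """
    members = set(gene_set)
    hits = sum(1 for gene in ranked_genes if gene in members)
    misses = len(ranked_genes) - hits
    if hits == 0 or misses == 0:
        raise ValueError("the ranked list must contain both hits and misses")
    step_up = 1.0 / hits      # increments for hits sum to +1 overall
    step_down = 1.0 / misses  # decrements for misses sum to -1 overall
    running = 0.0
    score = 0.0
    for gene in ranked_genes:
        running += step_up if gene in members else -step_down
        if abs(running) > abs(score):
            score = running
    return score


if __name__ == "__main__":
    # Toy ranking: set members are concentrated at the top of the list.
    ranking = ["g1", "g2", "g3", "g4", "g5", "g6"]
    print(running_sum_enrichment_score(ranking, {"g1", "g2", "g4"}))
```

A positive score indicates that the set's genes cluster toward the top of the ranking, a negative score that they cluster toward the bottom.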

Merico et al. (2011) Visualizing gene-set enrichment results using the Cytoscape plug-in enrichment map. Methods Mol Biol 781:257-77. (pmid: 21877285)

Gene-set enrichment analysis finds functionally coherent gene-sets, such as pathways, that are statistically overrepresented in a given gene list. Ideally, the number of resulting sets is smaller than the number of genes in the list, thus simplifying interpretation. However, the increasing number and redundancy of gene-sets used by many current enrichment analysis resources work against this ideal. "Enrichment Map" is a Cytoscape plug-in that helps overcome gene-set redundancy and aids in the interpretation of enrichment results. Gene-sets are organized in a network, where each set is a node and links represent gene overlap between sets. Automated network layout groups related gene-sets into network clusters, enabling the user to quickly identify the major enriched functional themes and more easily interpret enrichment results.
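The network described above, with gene-sets as nodes and links representing gene overlap, can be sketched in a few lines. The example below uses the Jaccard index as the overlap measure and an arbitrary threshold of 0.25. Both are assumptions made for illustration rather than the plug-in's documented defaults, and the function name, set names, and gene identifiers are hypothetical.

```python
from itertools import combinations


def overlap_edges(gene_sets, min_jaccard=0.25):
    """Return (set_a, set_b, jaccard) for every pair of sufficiently similar sets.

    gene_sets maps a gene-set name to a collection of gene identifiers.
    The Jaccard index is the size of the intersection divided by the size
    of the union of the two sets.
    """
    edges = []
    for name_a, name_b in combinations(sorted(gene_sets), 2):
        genes_a = set(gene_sets[name_a])
        genes_b = set(gene_sets[name_b])
        union = genes_a | genes_b
        if not union:
            continue  # two empty sets: similarity is undefined, skip the pair
        jaccard = len(genes_a & genes_b) / len(union)
        if jaccard >= min_jaccard:
            edges.append((name_a, name_b, jaccard))
    return edges


if __name__ == "__main__":
    # Hypothetical enriched gene-sets with partially shared members.
    enriched = {
        "glycolysis": {"G1", "G2", "G3"},
        "gluconeogenesis": {"G2", "G3", "G4"},
        "apoptosis": {"G9"},
    }
    for edge in overlap_edges(enriched):
        print(edge)
```

Redundant sets end up connected by edges and form clusters under a network layout, while unrelated sets remain isolated nodes.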