CSB Gene lists
Gene Lists
Although there are many different types of -omics data, many high-throughput or cross-sectional studies in molecular and systems biology produce the same kind of result: a list of genes or proteins. Whether these are significantly differentially expressed genes in a microarray study, chromosomal loci in a ChIP-chip experiment, functionally related genes in a synthetic lethality screen, or co-purified proteins in a tandem-affinity MS experiment, the "list of genes" is a common denominator of all these approaches. Accordingly, similar or identical principles can be applied to their interpretation.
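As a minimal sketch of this "common denominator" idea, the following Python snippet treats two gene lists from different kinds of experiments as plain sets and compares them. The function name `jaccard`, the variable names, and the gene identifiers are illustrative assumptions, not results from any study, and the sketch assumes both lists already use the same identifier namespace.

```python
def jaccard(a, b):
    """Jaccard index of two collections of gene identifiers, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty lists have no basis for comparison; report 0.0 by convention.
    return len(a & b) / len(union) if union else 0.0


# Hypothetical example lists: placeholder identifiers, not real results.
expression_hits = ["CDC28", "CLN1", "CLN2", "SWI4", "MBP1"]  # e.g. from a microarray study
copurified = ["CDC28", "CLN2", "SWI6", "MBP1"]               # e.g. from an affinity-purification MS run

# Genes that appear in both lists, sorted for stable output.
shared = sorted(set(expression_hits) & set(copurified))

print("shared genes:", shared)
print("Jaccard index:", jaccard(expression_hits, copurified))
```

Because the comparison operates only on identifiers, the same code applies regardless of which experiment produced each list. In practice the identifiers usually have to be mapped to a common namespace before such a comparison is meaningful.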
Further reading and resources
Durinck et al. (2009) Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat Protoc 4:1184-91. (pmid: 19617889)
Genomic experiments produce multiple views of biological systems, among them are DNA sequence and copy number variation, and mRNA and protein abundance. Understanding these systems needs integrated bioinformatic analysis. Public databases such as Ensembl provide relationships and mappings between the relevant sets of probe and target molecules. However, the relationships can be biologically complex and the content of the databases is dynamic. We demonstrate how to use the computational environment R to integrate and jointly analyze experimental datasets, employing BioMart web services to provide the molecule mappings. We also discuss typical problems that are encountered in making gene-to-transcript-to-protein mappings. The approach provides a flexible, programmable and reproducible basis for state-of-the-art bioinformatic data integration.
Boulesteix & Slawski (2009) Stability and aggregation of ranked gene lists. Brief Bioinform 10:556-68. (pmid: 19679825)
Ranked gene lists are highly instable in the sense that similar measures of differential gene expression may yield very different rankings, and that a small change of the data set usually affects the obtained gene list considerably. Stability issues have long been under-considered in the literature, but they have grown to a hot topic in the last few years, perhaps as a consequence of the increasing skepticism on the reproducibility and clinical applicability of molecular research findings. In this article, we review existing approaches for the assessment of stability of ranked gene lists and the related problem of aggregation, give some practical recommendations, and warn against potential misuse of these methods. This overview is illustrated through an application to a recent leukemia data set using the freely available Bioconductor package GeneSelector.
Feng et al. (2012) Using the bioconductor GeneAnswers package to interpret gene lists. Methods Mol Biol 802:101-12. (pmid: 22130876)
Use of microarray data to generate expression profiles of genes associated with disease can aid in identification of markers of disease and potential therapeutic targets. Pathway analysis methods further extend expression profiling by creating inferred networks that provide an interpretable structure of the gene list and visualize gene interactions. This chapter describes GeneAnswers, a novel gene-concept network analysis tool available as an open source Bioconductor package. GeneAnswers creates a gene-concept network and also can be used to build protein-protein interaction networks. The package includes an example multiple myeloma cell line dataset and tutorial. Several network analysis methods are included in GeneAnswers, and the tutorial highlights the conditions under which each type of analysis is most beneficial and provides sample code.
Warde-Farley et al. (2010) The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res 38:W214-20. (pmid: 20576703)
GeneMANIA (http://www.genemania.org) is a flexible, user-friendly web interface for generating hypotheses about gene function, analyzing gene lists and prioritizing genes for functional assays. Given a query list, GeneMANIA extends the list with functionally similar genes that it identifies using available genomics and proteomics data. GeneMANIA also reports weights that indicate the predictive value of each selected data set for the query. Six organisms are currently supported (Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus, Homo sapiens and Saccharomyces cerevisiae) and hundreds of data sets have been collected from GEO, BioGRID, Pathway Commons and I2D, as well as organism-specific functional genomics data sets. Users can select arbitrary subsets of the data sets associated with an organism to perform their analyses and can upload their own data sets to analyze. The GeneMANIA algorithm performs as well or better than other gene function prediction methods on yeast and mouse benchmarks. The high accuracy of the GeneMANIA prediction algorithm, an intuitive user interface and large database make GeneMANIA a useful tool for any biologist.