CSB Gene lists
Gene Lists
Although -omics data come in many different types, many high-throughput or cross-sectional studies in molecular and systems biology produce a list of genes or proteins as their result. Whether these are significantly differentially expressed genes in a microarray study, chromosomal loci in a ChIP-chip experiment, functionally related genes in a synthetic lethality screen, or co-purified proteins in a tandem-affinity MS experiment, the "list of genes" is a common denominator of all these approaches. Accordingly, similar or identical principles can be applied to their interpretation.
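As a minimal sketch of what such a shared principle might look like in practice, the following Python snippet treats two gene lists from different kinds of experiments as plain sets and compares them. The gene symbols, the variable names and the helper functions `normalize` and `jaccard` are all hypothetical and chosen only for illustration. Upper-casing the identifiers is an assumption that does not hold for every nomenclature.

```python
# Hypothetical gene lists from two different kinds of experiment.
# The gene symbols are placeholders chosen for illustration only.
expression_hits = ["MBP1", "SWI4", "SWI6", "CLN1", "CLN2"]
copurified_proteins = ["SWI6", "MBP1", "SKN7", "mbp1"]


def normalize(genes):
    """Return a set of upper-cased, whitespace-stripped identifiers.

    Assumption: identifiers are case-insensitive. Duplicates collapse
    because the result is a set.
    """
    return {g.strip().upper() for g in genes if g.strip()}


def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


set_a = normalize(expression_hits)
set_b = normalize(copurified_proteins)

# Genes that appear in both lists, regardless of the experiment type.
shared = sorted(set_a & set_b)
print("shared:", shared)
print("Jaccard index:", round(jaccard(set_a, set_b), 3))
```

Once the lists are reduced to sets of comparable identifiers, the same operations apply no matter which experiment produced them. In real data the identifiers usually need to be mapped to a common namespace first.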
Further reading and resources
Durinck et al. (2009) Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat Protoc 4:1184-91. (pmid: 19617889)
Genomic experiments produce multiple views of biological systems, among them are DNA sequence and copy number variation, and mRNA and protein abundance. Understanding these systems needs integrated bioinformatic analysis. Public databases such as Ensembl provide relationships and mappings between the relevant sets of probe and target molecules. However, the relationships can be biologically complex and the content of the databases is dynamic. We demonstrate how to use the computational environment R to integrate and jointly analyze experimental datasets, employing BioMart web services to provide the molecule mappings. We also discuss typical problems that are encountered in making gene-to-transcript-to-protein mappings. The approach provides a flexible, programmable and reproducible basis for state-of-the-art bioinformatic data integration.
Boulesteix & Slawski (2009) Stability and aggregation of ranked gene lists. Brief Bioinformatics 10:556-68. (pmid: 19679825)
Ranked gene lists are highly instable in the sense that similar measures of differential gene expression may yield very different rankings, and that a small change of the data set usually affects the obtained gene list considerably. Stability issues have long been under-considered in the literature, but they have grown to a hot topic in the last few years, perhaps as a consequence of the increasing skepticism on the reproducibility and clinical applicability of molecular research findings. In this article, we review existing approaches for the assessment of stability of ranked gene lists and the related problem of aggregation, give some practical recommendations, and warn against potential misuse of these methods. This overview is illustrated through an application to a recent leukemia data set using the freely available Bioconductor package GeneSelector.
Feng et al. (2012) Using the bioconductor GeneAnswers package to interpret gene lists. Methods Mol Biol 802:101-12. (pmid: 22130876)
Use of microarray data to generate expression profiles of genes associated with disease can aid in identification of markers of disease and potential therapeutic targets. Pathway analysis methods further extend expression profiling by creating inferred networks that provide an interpretable structure of the gene list and visualize gene interactions. This chapter describes GeneAnswers, a novel gene-concept network analysis tool available as an open source Bioconductor package. GeneAnswers creates a gene-concept network and also can be used to build protein-protein interaction networks. The package includes an example multiple myeloma cell line dataset and tutorial. Several network analysis methods are included in GeneAnswers, and the tutorial highlights the conditions under which each type of analysis is most beneficial and provides sample code.