Proteome

From "A B C"
Revision as of 20:34, 3 February 2013 by Boris



This page is a placeholder, or under current development; it is here principally to establish the logical framework of the site. The material on this page is correct, but incomplete.


The proteome may be thought of as the realization of the information encoded in the genome. Quantifying the proteome is the domain of 2D-gel electrophoresis, tandem affinity purification, and other methods capable of accurately separating proteins or protein complexes, followed by identification of the proteins by mass spectrometry. The proteome is not merely a reflection of the organism's genes: it has its own levels of regulation, among which alternatively spliced isoforms and extensive post-translational modifications underlie a variability that cannot be inferred from genome sequences alone.
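The last point can be made concrete with peptide masses, which are what a mass spectrometer actually measures. The following is a minimal sketch, assuming standard monoisotopic residue masses and using phosphorylation (+79.966 Da on S, T or Y) as the example modification; the function names are hypothetical and chosen only for illustration.

```python
# Monoisotopic residue masses in daltons (amino-acid residues, i.e. without water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565    # added once per linear peptide (free N- and C-terminus)
PROTON = 1.007276    # proton mass, used to convert a neutral mass to m/z
PHOSPHO = 79.966331  # HPO3 mass shift of one phosphorylation on S, T or Y


def peptide_mass(sequence: str, n_phospho: int = 0) -> float:
    """Neutral monoisotopic mass of a peptide carrying n_phospho phosphate groups."""
    n_sites = sum(sequence.count(aa) for aa in "STY")
    if n_phospho > n_sites:
        raise ValueError("more phosphate groups than S/T/Y residues")
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + n_phospho * PHOSPHO


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion recorded by the mass spectrometer."""
    return (neutral_mass + charge * PROTON) / charge


# The sequence predicts one mass; the phosphorylated form of the same peptide is heavier.
unmodified = peptide_mass("PEPTIDE")
phosphorylated = peptide_mass("PEPTIDE", n_phospho=1)
print(f"{unmodified:.4f} {phosphorylated:.4f} {mz(unmodified, 2):.4f}")
# prints: 799.3599 879.3263 400.6872
```

A genome sequence alone predicts only the first value; the phosphorylated peptide appears about 80 Da heavier, and nothing in the DNA says whether, or on which residue, it is modified.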



 

Introductory reading

State-of-the-art proteome analysis. In your reading, make sure you understand the experimental principles applied, but focus on the computational techniques.

Beck et al. (2011) The quantitative proteome of a human cell line. Mol Syst Biol 7:549. (pmid: 22068332)

The generation of mathematical models of biological processes, the simulation of these processes under different conditions, and the comparison and integration of multiple data sets are explicit goals of systems biology that require the knowledge of the absolute quantity of the system's components. To date, systematic estimates of cellular protein concentrations have been exceptionally scarce. Here, we provide a quantitative description of the proteome of a commonly used human cell line in two functional states, interphase and mitosis. We show that these human cultured cells express at least ~10 000 proteins and that the quantified proteins span a concentration range of seven orders of magnitude up to 20 000 000 copies per cell. We discuss how protein abundance is linked to function and evolution.
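Taking the abstract's round numbers at face value, the low end of the quantified range follows from simple arithmetic. This is an illustration of the dynamic range involved, assuming the seven orders of magnitude are counted down from the stated maximum; it is not a figure quoted from the paper.

```latex
\frac{N_{\max}}{N_{\min}} \approx 10^{7}, \qquad
N_{\max} \approx 2\times 10^{7}\ \text{copies/cell}
\quad\Longrightarrow\quad
N_{\min} \approx \frac{2\times 10^{7}}{10^{7}} = 2\ \text{copies/cell}
```

In other words, the same measurement has to cover proteins present at only a few copies per cell and proteins present at tens of millions.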


 

Contents

  • Sample identification by fingerprinting
  • Sample identification from MS/MS data
  • Relative vs. absolute quantitative proteomics
  • Proteome databases (PeptideAtlas, PRIDE)
  • Transcriptome / Proteome correlation: codon usage or more?
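Identification by (peptide mass) fingerprinting rests on one idea: digest the sample with a protease of known specificity, measure the peptide masses, and ask which database protein's in-silico digest explains most of them. The following is a minimal sketch, assuming the usual trypsin rule (cleave after K or R, not before P), unmodified peptides, a peak list already converted to neutral masses, and a plain count of matches as the score. Real search engines use probability-based scores such as MOWSE. All function names and sequences are hypothetical.

```python
import re

# Monoisotopic residue masses in daltons (residues, i.e. without water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def tryptic_digest(protein: str, missed_cleavages: int = 0) -> list[str]:
    """In-silico trypsin digest: cut after K or R unless the next residue is P."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for i in range(len(pieces)):
        # Join up to `missed_cleavages` neighbouring pieces to model incomplete digestion.
        for j in range(i, min(i + missed_cleavages + 1, len(pieces))):
            peptides.append("".join(pieces[i : j + 1]))
    return peptides


def pmf_score(protein: str, observed: list[float], tol_ppm: float = 20.0,
              missed_cleavages: int = 1) -> int:
    """Number of observed masses explained by the protein's theoretical digest."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein, missed_cleavages)]
    return sum(
        1 for m in observed
        if any(abs(m - t) / t * 1e6 <= tol_ppm for t in theoretical)
    )


# Hypothetical two-entry database and a peak list of neutral peptide masses.
database = {"protein_A": "PEPTIDEKSAMPLERGLYCINEK", "protein_B": "WHATEVERKPANDAR"}
observed = [peptide_mass("PEPTIDEK"), peptide_mass("SAMPLER"), 1234.5]
scores = {name: pmf_score(seq, observed) for name, seq in database.items()}
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
# prints: [('protein_A', 2), ('protein_B', 0)]
```

The unexplained mass (1234.5) stands in for noise or a modified peptide; the ranking, not the absolute count, identifies the sample.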

   

Further reading and resources

Wang et al. (2012) PRIDE Inspector: a tool to visualize and validate MS proteomics data. Nat Biotechnol 30:135-7. (pmid: 22318026)



Nagaraj et al. (2011) Deep proteome and transcriptome mapping of a human cancer cell line. Mol Syst Biol 7:548. (pmid: 22068331)

While the number and identity of proteins expressed in a single human cell type is currently unknown, this fundamental question can be addressed by advanced mass spectrometry (MS)-based proteomics. Online liquid chromatography coupled to high-resolution MS and MS/MS yielded 166 420 peptides with unique amino-acid sequence from HeLa cells. These peptides identified 10 255 different human proteins encoded by 9207 human genes, providing a lower limit on the proteome in this cancer cell line. Deep transcriptome sequencing revealed transcripts for nearly all detected proteins. We calculate copy numbers for the expressed proteins and show that the abundances of > 90% of them are within a factor 60 of the median protein expression level. Comparisons of the proteome and the transcriptome, and analysis of protein complex databases and GO categories, suggest that we achieved deep coverage of the functional transcriptome and the proteome of a single cell type.
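Both the summary statistic in this abstract ("> 90% within a factor 60 of the median") and a proteome/transcriptome comparison are easy to compute on one's own data. The following is a minimal sketch on made-up numbers; the function names are hypothetical, and Spearman rank correlation is one common choice for comparing mRNA and protein levels (ranks are insensitive to abundances spanning orders of magnitude), not necessarily the statistic the authors used.

```python
from statistics import mean, median


def fraction_within_factor(copies: list[float], factor: float = 60.0) -> float:
    """Fraction of proteins whose copy number lies within `factor` of the median."""
    m = median(copies)
    inside = sum(1 for c in copies if m / factor <= c <= m * factor)
    return inside / len(copies)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical protein copy numbers and transcript levels for six genes.
protein_copies = [10, 500, 1_000, 2_000, 5_000, 1_000_000]
mrna_levels = [1, 3, 2, 5, 4, 6]
print(fraction_within_factor(protein_copies))  # 4 of 6 proteins, about 0.667
print(spearman(mrna_levels, protein_copies))   # rho = 31/35, about 0.886
```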

Jones & Hubbard (2010) An introduction to proteome bioinformatics. Methods Mol Biol 604:1-5. (pmid: 20013360)

This book is part of the Methods in Molecular Biology series, and provides a general overview of computational approaches used in proteome research. In this chapter, we give an overview of the scope of the book in terms of current proteomics experimental techniques and the reasons why computational approaches are needed. We then give a summary of each chapter, which together provide a picture of the state of the art in proteome bioinformatics research.

Jung (2010) Statistical methods for proteomics. Methods Mol Biol 620:497-507. (pmid: 20652518)

During the last decade, analytical methods for the detection and quantification of proteins and peptides in biological samples have been considerably improved. It is therefore now possible to compare simultaneously the expression levels of hundreds or thousands of proteins in different types of tissue, for example, normal and cancerous, or in different cell lines. In this chapter, we illustrate statistical designs for such proteomics experiments as well as methods for the analysis of resulting data. In particular, we focus on the preprocessing and analysis of protein expression levels recorded by the use of either two-dimensional gel electrophoresis or mass spectrometry.
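Two steps that commonly appear in such analyses are sketched below: log-transforming intensities and centring each sample on its median (which removes sample-wide loading differences), and Benjamini-Hochberg adjustment of per-protein p-values when thousands of proteins are tested at once. This is a minimal sketch assuming complete, positive intensities and p-values already obtained from some per-protein test; the chapter may use other methods, and the function names are hypothetical.

```python
import math
from statistics import median


def log2_median_normalize(samples: dict[str, list[float]]) -> dict[str, list[float]]:
    """log2-transform each sample's intensities, then subtract that sample's median.

    Assumes every list holds the same proteins in the same order and that all
    intensities are positive (missing values must be handled beforehand).
    """
    normalized = {}
    for name, intensities in samples.items():
        logs = [math.log2(v) for v in intensities]
        centre = median(logs)
        normalized[name] = [x - centre for x in logs]
    return normalized


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical data: one normal and one tumour sample, three proteins each.
# The tumour sample is uniformly 3x brighter, e.g. because more material was loaded.
raw = {"normal_1": [100.0, 200.0, 400.0], "tumour_1": [300.0, 600.0, 1200.0]}
norm = log2_median_normalize(raw)  # both samples become approximately [-1, 0, 1]

# Hypothetical per-protein p-values, e.g. from a t-test on the normalized values.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])  # approx. [0.04, 0.0533, 0.0533, 0.5]
```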

Matthiesen & Amorim (2010) Proteomics facing the combinatorial problem. Methods Mol Biol 593:175-86. (pmid: 19957150)

A large number of scoring functions for ranking peptide matches to observed MS/MS spectra have been discussed in the literature. In contrast to scoring functions, search strategies have received less attention, and an accurate description of search algorithms is limited. Proteomics is becoming more and more commonly used in potential clinical applications; for such approaches to be successful, the combinatorial problems from amino acid modifications and somatic and heredity SAPs (single amino acid substitutions) need to be seriously considered. The modifications and SAPs are problematic since MS and MS/MS search algorithms are optimization processes, which means that if the correct match is not iterated through during the search, then the data will be matched incorrectly, resulting in serious downstream flaws. This chapter discusses several search algorithm strategies in more detail.
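The two ingredients named here, a scoring function and a search strategy, can be separated in a few lines. The following is a minimal sketch: the score is the simplest possible one (a shared-peak count of singly charged b and y ions within a fixed m/z tolerance), and the search is exhaustive enumeration of variable phosphorylation on S, T and Y. Real search engines use far more elaborate scores and pruning; all function names and the simulated spectrum are hypothetical.

```python
from itertools import combinations

# Monoisotopic residue masses in daltons (residues, i.e. without water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
PHOSPHO = 79.966331  # the only variable modification considered in this sketch


def fragment_ions(residue_masses: list[float]) -> list[float]:
    """Singly charged b- and y-ion m/z values for a peptide given per-residue masses."""
    total = sum(residue_masses)
    ions, prefix = [], 0.0
    for m in residue_masses[:-1]:
        prefix += m
        ions.append(prefix + PROTON)                  # b ion: N-terminal fragment
        ions.append(total - prefix + WATER + PROTON)  # complementary y ion
    return ions


def phospho_variants(peptide: str, max_phospho: int):
    """Yield (modified positions, residue masses) for every placement of up to max_phospho phosphates."""
    sites = [i for i, aa in enumerate(peptide) if aa in "STY"]
    for k in range(max_phospho + 1):
        for chosen in combinations(sites, k):
            masses = [RESIDUE_MASS[aa] for aa in peptide]
            for i in chosen:
                masses[i] += PHOSPHO
            yield chosen, masses


def shared_peaks(theoretical: list[float], observed: list[float], tol: float = 0.02) -> int:
    """Scoring function: theoretical ions matched by an observed peak within tol."""
    return sum(1 for t in theoretical if any(abs(t - o) <= tol for o in observed))


def best_variant(peptide: str, spectrum: list[float], max_phospho: int = 2, tol: float = 0.02):
    """Search strategy: score every enumerated variant and keep the best one."""
    best, n_candidates = None, 0
    for sites, masses in phospho_variants(peptide, max_phospho):
        n_candidates += 1
        score = shared_peaks(fragment_ions(masses), spectrum, tol)
        if best is None or score > best[0]:
            best = (score, sites)
    return best, n_candidates


# Simulated spectrum of GSAYK phosphorylated on the serine (0-based position 1).
true_masses = [RESIDUE_MASS[aa] for aa in "GSAYK"]
true_masses[1] += PHOSPHO
spectrum = fragment_ions(true_masses)
print(best_variant("GSAYK", spectrum))  # ((8, (1,)), 4)
```

The candidate count grows as a sum of binomial coefficients: a peptide with 10 S/T/Y residues and up to three phosphates already has 1 + 10 + 45 + 120 = 176 variants, before other modifications or SAPs are considered. Any variant the search never enumerates can never be the reported match, which is exactly the failure mode described above.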

Beltrao et al. (2007) Structures in systems biology. Curr Opin Struct Biol 17:378-84. (pmid: 17574836)

Oil and water do not normally mix, and apparently structural biology and systems biology look like two different universes. It can be argued that structural biology could play a very important role in systems biology. Although at the final stage of understanding a signal transduction pathway, a cell, an organ or a living system, structures could be obviated, we need them to be able to reach that stage. Structures of macromolecules, especially molecular machines, could provide quantitative parameters, help to elucidate functional networks or enable rational designed perturbation experiments for reverse engineering. The role of structural biology in systems biology should be to provide enough understanding so that macromolecules can be translated into dots or even into equations devoid of atoms.