Proteome

From "A B C"


This page is a placeholder, or under current development; it is here principally to establish the logical framework of the site. The material on this page is correct, but incomplete.


The proteome may be thought of as the realization of the information encoded in the genome. Quantifying the proteome is the domain of 2D-gel electrophoresis, tandem affinity purification, or other methods that can accurately separate proteins or protein complexes, followed by identification of the proteins through mass spectrometry. The proteome is not merely a reflection of the organism's genes: it has its own levels of regulation, and differentially spliced isoforms and extensive post-translational modifications underlie a variability that cannot be inferred directly from genome sequences alone.
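A minimal Python sketch of why post-translational modifications cannot be read off the genome: the neutral monoisotopic mass of a peptide computed from its sequence alone misses the roughly 80 Da added by a single phosphorylation. The residue masses are standard monoisotopic values; the peptide `SAMPLER` and the function name `peptide_mass` are hypothetical and chosen only for illustration.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # terminal H and OH of a linear peptide
PHOSPHO = 79.96633  # mass added by one phosphate group (HPO3) on S, T or Y


def peptide_mass(seq, n_phospho=0):
    """Neutral monoisotopic mass of a linear peptide carrying n_phospho phosphate groups."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER + n_phospho * PHOSPHO


# Hypothetical peptide: mass predicted from sequence vs. the same peptide phosphorylated once.
predicted = peptide_mass("SAMPLER")
observed = peptide_mass("SAMPLER", n_phospho=1)
print(f"predicted {predicted:.4f} Da, phosphorylated {observed:.4f} Da, "
      f"shift {observed - predicted:.4f} Da")
```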



 

Introductory reading

State-of-the-art proteome analysis. In your reading, make sure you understand the experimental principles applied, but focus on the computational techniques.

Beck et al. (2011) The quantitative proteome of a human cell line. Mol Syst Biol 7:549. (pmid: 22068332)

The generation of mathematical models of biological processes, the simulation of these processes under different conditions, and the comparison and integration of multiple data sets are explicit goals of systems biology that require the knowledge of the absolute quantity of the system's components. To date, systematic estimates of cellular protein concentrations have been exceptionally scarce. Here, we provide a quantitative description of the proteome of a commonly used human cell line in two functional states, interphase and mitosis. We show that these human cultured cells express at least ~10 000 proteins and that the quantified proteins span a concentration range of seven orders of magnitude up to 20 000 000 copies per cell. We discuss how protein abundance is linked to function and evolution.


 

Contents

  • Sample identification by fingerprinting
  • Sample identification from MS/MS data
  • Relative vs. absolute quantitative proteomics
  • Proteome databases (PeptideAtlas, PRIDE)
  • Transcriptome / Proteome correlation: codon usage or more?
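The first outline item, sample identification by fingerprinting, can be sketched as follows, assuming a simplified trypsin rule (cleave after K or R, but not before P), no missed cleavages, no modifications, and a score that simply counts matched peaks. Real search engines use probability-based scores and many more options; the protein names, sequences and function names below are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # terminal H and OH of a linear peptide
PROTON = 1.00728   # charge carrier of a singly protonated [M+H]+ ion


def tryptic_digest(protein):
    """Split after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def mh_plus(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def pmf_score(observed, protein, tol=0.02):
    """Number of observed peak masses matched by a theoretical tryptic peptide within tol Da."""
    theoretical = [mh_plus(p) for p in tryptic_digest(protein)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


def rank_candidates(observed, database, tol=0.02):
    """Database entry names sorted by decreasing fingerprint score."""
    return sorted(database, key=lambda name: pmf_score(observed, database[name], tol),
                  reverse=True)


# Hypothetical two-entry database and a peak list simulated from protA.
database = {"protA": "AKGRPLKSAMPLER", "protB": "MWHEELFKYDR"}
observed = [mh_plus(p) for p in tryptic_digest(database["protA"])]
print(rank_candidates(observed, database))
```

Identification from MS/MS data differs mainly in that fragment-ion spectra of each peptide are matched as well, not just the intact peptide masses.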

   

Further reading and resources

Clancy & Hovig (2014) From proteomes to complexomes in the era of systems biology. Proteomics 14:24-41. (pmid: 24243660)

Protein complexes carry out almost the entire signaling and functional processes in the cell. The protein complex complement of a cell, and its network of complex-complex interactions, is referred to here as the complexome. Computational methods to predict protein complexes from proteomics data, resulting in network representations of complexomes, have recently been developed. In addition, key advances have been made toward understanding the network and structural organization of complexomes. We review these bioinformatics advances, and their discovery-potential, as well as the merits of integrating proteomics data with emerging methods in systems biology to study protein complex signaling. It is envisioned that improved integration of proteomics and systems biology, incorporating the dynamics of protein complexes in space and time, may lead to more predictive models of cell signaling networks for effective modulation.

Elber & Kirmizialtin (2013) Molecular machines. Curr Opin Struct Biol 23:206-11. (pmid: 23305848)

Molecular machines (MM) are essential components of living cells. They conduct mechanical work, transport materials into and out of cells, assist in processing enzymatic reactions, and more. Their operations are frequently combined with significant conformational transitions. Computational studies of these conformational transitions and their coupling to molecular functions are discussed. It is argued that coarse descriptions of these molecules which are based on mass density and shape provide useful information on directions of action. It is further argued that MM are likely to have well focused and narrow reaction pathways. The proposal for such pathways is supported by evolutionary analyses of homologous machines. Finally, these observations are used to build atomically detailed models of these systems that are making the link from structure to functions (kinetics and thermodynamics). For that purpose enhanced sampling techniques are required.

Wang et al. (2012) PRIDE Inspector: a tool to visualize and validate MS proteomics data. Nat Biotechnol 30:135-7. (pmid: 22318026)

Nagaraj et al. (2011) Deep proteome and transcriptome mapping of a human cancer cell line. Mol Syst Biol 7:548. (pmid: 22068331)

While the number and identity of proteins expressed in a single human cell type is currently unknown, this fundamental question can be addressed by advanced mass spectrometry (MS)-based proteomics. Online liquid chromatography coupled to high-resolution MS and MS/MS yielded 166 420 peptides with unique amino-acid sequence from HeLa cells. These peptides identified 10 255 different human proteins encoded by 9207 human genes, providing a lower limit on the proteome in this cancer cell line. Deep transcriptome sequencing revealed transcripts for nearly all detected proteins. We calculate copy numbers for the expressed proteins and show that the abundances of >90% of them are within a factor of 60 of the median protein expression level. Comparisons of the proteome and the transcriptome, and analysis of protein complex databases and GO categories, suggest that we achieved deep coverage of the functional transcriptome and the proteome of a single cell type.
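The two kinds of quantity in this abstract can be illustrated on made-up numbers: the fraction of proteins whose copy number lies within a given factor of the median, and a proteome/transcriptome correlation. Computing Pearson's r on log10-transformed values is one common choice and is an assumption here, not necessarily the authors' method; the data and function names are hypothetical.

```python
import math
import statistics


def fraction_within_factor(copies, factor=60.0):
    """Fraction of proteins whose copy number lies within `factor` of the median (either direction)."""
    med = statistics.median(copies)
    return sum(med / factor <= c <= med * factor for c in copies) / len(copies)


def log_pearson(x, y):
    """Pearson correlation of log10-transformed values (e.g. protein vs. mRNA abundance)."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    mx, my = statistics.fmean(lx), statistics.fmean(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sx = math.sqrt(sum((a - mx) ** 2 for a in lx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ly))
    return cov / (sx * sy)


# Hypothetical copy numbers per cell and matching transcript levels for six proteins.
protein_copies = [50, 2_000, 10_000, 30_000, 120_000, 5_000_000]
mrna_levels = [1, 8, 20, 60, 150, 900]

print(fraction_within_factor(protein_copies))
print(round(log_pearson(protein_copies, mrna_levels), 3))
```

Working on a log scale matters here because abundances span several orders of magnitude; on raw values the single most abundant protein would dominate the correlation.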

Malik et al. (2010) From proteome lists to biological impact--tools and strategies for the analysis of large MS data sets. Proteomics 10:1270-83. (pmid: 20077408)

MS has become a method of choice for proteome analysis, generating large data sets that reflect proteome-scale protein-protein interaction and PTM networks. However, while a rapid growth in large-scale proteomics data can be observed, the sound biological interpretation of these results clearly lags behind. Therefore, combined efforts of bioinformaticians and biologists have been made to develop strategies and applications to help experimentalists perform this crucial task. This review presents an overview of currently available analytical strategies and tools to extract biologically relevant information from large protein lists. Moreover, we also present current research publications making use of these tools as examples of how the presented strategies may be incorporated into proteomic workflows. Emphasis is placed on the analysis of Gene Ontology terms, interaction networks, biological pathways and PTMs. In addition, topics including domain analysis and text mining are reviewed in the context of computational analysis of proteomic results. We expect that these types of analyses will significantly contribute to a deeper understanding of the role of individual proteins, protein networks and pathways in complex systems.
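Gene Ontology analysis of a protein list is often done as an over-representation test. A minimal sketch, assuming a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) against a background of all quantifiable proteins; the counts and the name `enrichment_p` are hypothetical, and with many GO terms tested at once the p-values would still need a multiple-testing correction.

```python
from math import comb


def enrichment_p(k, N, K, n):
    """One-sided hypergeometric p-value P(X >= k).

    N: proteins in the background, K: background proteins annotated with the term,
    n: proteins in the list of interest, k: annotated proteins in that list.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical numbers: 20 of 500 listed proteins carry a GO term held by 200 of 20 000.
N, K, n, k = 20_000, 200, 500, 20
expected = n * K / N
print(f"expected {expected:.1f}, observed {k}, p = {enrichment_p(k, N, K, n):.2e}")
```

Exact integer arithmetic with `math.comb` avoids overflow for proteome-sized backgrounds; the final division returns an ordinary float.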

Jones & Hubbard (2010) An introduction to proteome bioinformatics. Methods Mol Biol 604:1-5. (pmid: 20013360)

This book is part of the Methods in Molecular Biology series, and provides a general overview of computational approaches used in proteome research. In this chapter, we give an overview of the scope of the book in terms of current proteomics experimental techniques and the reasons why computational approaches are needed. We then give a summary of each chapter, which together provide a picture of the state of the art in proteome bioinformatics research.

Jung (2010) Statistical methods for proteomics. Methods Mol Biol 620:497-507. (pmid: 20652518)

During the last decade, analytical methods for the detection and quantification of proteins and peptides in biological samples have been considerably improved. It is therefore now possible to compare simultaneously the expression levels of hundreds or thousands of proteins in different types of tissue, for example, normal and cancerous, or in different cell lines. In this chapter, we illustrate statistical designs for such proteomics experiments as well as methods for the analysis of resulting data. In particular, we focus on the preprocessing and analysis of protein expression levels recorded by the use of either two-dimensional gel electrophoresis or mass spectrometry.
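A minimal sketch of the kind of preprocessing and two-group comparison described here, assuming a small matrix of protein intensities (one list per sample, proteins in the same order): log2-transform each sample, subtract the sample median, then compute Welch's t statistic per protein. This is one common recipe, not necessarily the chapter's; the data and function names are hypothetical, and missing values, p-values and multiple-testing correction are omitted.

```python
import math
import statistics


def normalize(sample):
    """log2-transform one sample's intensities and subtract the sample median."""
    logs = [math.log2(v) for v in sample]
    med = statistics.median(logs)
    return [v - med for v in logs]


def welch_t(a, b):
    """Welch's t statistic for mean(a) - mean(b), allowing unequal variances."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def compare_groups(group_a, group_b):
    """Per-protein Welch t statistics (group_a minus group_b) after per-sample normalization."""
    norm_a = [normalize(s) for s in group_a]
    norm_b = [normalize(s) for s in group_b]
    n_proteins = len(norm_a[0])
    return [welch_t([s[i] for s in norm_a], [s[i] for s in norm_b])
            for i in range(n_proteins)]


# Hypothetical intensities: three samples per group, six proteins in the same order;
# only the last protein differs between groups (about 4-fold higher in cancer).
normal = [
    [100, 200, 300, 400, 600, 800],
    [105, 190, 310, 395, 610, 820],
    [98, 205, 290, 410, 590, 790],
]
cancer = [
    [102, 198, 305, 402, 605, 3200],
    [99, 201, 295, 398, 600, 3300],
    [101, 195, 300, 405, 595, 3100],
]

t_values = compare_groups(cancer, normal)
print([round(t, 1) for t in t_values])
```

Median centering assumes most proteins do not change between conditions, so that the median can serve as a reference for differences in total loaded protein.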

Matthiesen & Amorim (2010) Proteomics facing the combinatorial problem. Methods Mol Biol 593:175-86. (pmid: 19957150)

A large number of scoring functions for ranking peptide matches to observed MS/MS spectra have been discussed in the literature. In contrast to scoring functions, search strategies have received less attention, and an accurate description of search algorithms is limited. Proteomics is becoming more and more commonly used in potential clinical applications; for such approaches to be successful, the combinatorial problems from amino acid modifications and somatic and hereditary SAPs (single amino acid substitutions) need to be seriously considered. The modifications and SAPs are problematic since MS and MS/MS search algorithms are optimization processes, which means that if the correct match is not iterated through during the search, then the data will be matched incorrectly, resulting in serious downstream flaws. This chapter discusses several search algorithm strategies in more detail.
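The combinatorial problem can be made concrete by counting the candidate forms a search would have to consider for a single peptide. This sketch assumes phosphorylation on S, T or Y as the only variable modification; the peptide `SAMPLESTY` and the function names are hypothetical. Each of n candidate sites is either modified or not, giving 2^n forms, which is why search tools commonly let the user cap the number of variable modifications per peptide; allowing one single amino acid substitution adds 19 alternatives per position on top of that.

```python
from itertools import combinations


def modified_forms(peptide, sites="STY", max_mods=None):
    """All placements of a variable modification on the residues listed in `sites`.

    Each form is a tuple of modified positions; the empty tuple is the unmodified peptide.
    max_mods, if given, caps the number of modifications per form.
    """
    positions = [i for i, aa in enumerate(peptide) if aa in sites]
    limit = len(positions) if max_mods is None else min(max_mods, len(positions))
    forms = []
    for k in range(limit + 1):
        forms.extend(combinations(positions, k))
    return forms


def single_sap_variants(peptide, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """All sequences differing from `peptide` by exactly one amino acid substitution."""
    return [peptide[:i] + aa + peptide[i + 1:]
            for i, res in enumerate(peptide) for aa in alphabet if aa != res]


peptide = "SAMPLESTY"  # hypothetical peptide with four S/T/Y sites
print(len(modified_forms(peptide)), len(modified_forms(peptide, max_mods=2)),
      len(single_sap_variants(peptide)))
```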

Beltrao et al. (2007) Structures in systems biology. Curr Opin Struct Biol 17:378-84. (pmid: 17574836)

Oil and water do not normally mix, and apparently structural biology and systems biology look like two different universes. It can be argued that structural biology could play a very important role in systems biology. Although at the final stage of understanding a signal transduction pathway, a cell, an organ or a living system, structures could be obviated, we need them to be able to reach that stage. Structures of macromolecules, especially molecular machines, could provide quantitative parameters, help to elucidate functional networks or enable rationally designed perturbation experiments for reverse engineering. The role of structural biology in systems biology should be to provide enough understanding so that macromolecules can be translated into dots or even into equations devoid of atoms.