De novo structure prediction

ab initio Structure Prediction and Design

This page is a placeholder or is still under development; it is here principally to establish the logical framework of the site. The material on this page is correct but incomplete.

Protein structure prediction

Summary ...

The problem

...

Prediction

...

Forcefield-based approaches

Shaw et al. (2010) Atomic-level characterization of the structural dynamics of proteins. Science 330:341-6. (pmid: 20947758)

[ PubMed ] [ DOI ] Molecular dynamics (MD) simulations are widely used to study protein motions at an atomic level of detail, but they have been limited to time scales shorter than those of many biologically critical conformational changes. We examined two fundamental processes in protein dynamics--protein folding and conformational change within the folded state--by means of extremely long all-atom MD simulations conducted on a special-purpose machine. Equilibrium simulations of a WW protein domain captured multiple folding and unfolding events that consistently follow a well-defined folding pathway; separate simulations of the protein's constituent substructures shed light on possible determinants of this pathway. A 1-millisecond simulation of the folded protein BPTI reveals a small number of structurally distinct conformational states whose reversible interconversion is slower than local relaxations within those states by a factor of more than 1000.

Lane et al. (2013) To milliseconds and beyond: challenges in the simulation of protein folding. Curr Opin Struct Biol 23:58-65. (pmid: 23237705)

[ PubMed ] [ DOI ] Quantitatively accurate all-atom molecular dynamics (MD) simulations of protein folding have long been considered a holy grail of computational biology. Due to the large system sizes and long timescales involved, such a pursuit was for many years computationally intractable. Further, sufficiently accurate forcefields needed to be developed in order to realistically model folding. This decade, however, saw the first reports of folding simulations describing kinetics on the order of milliseconds, placing many proteins firmly within reach of these methods. Progress in sampling and forcefield accuracy, however, presents a new challenge: how to turn huge MD datasets into scientific understanding. Here, we review recent progress in MD simulation techniques and show how the vast datasets generated by such techniques present new challenges for analysis. We critically discuss the state of the art, including reaction coordinate and Markov state model (MSM) methods, and provide a perspective for the future.
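The Markov state model (MSM) idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the MD trajectory has already been discretized into a small number of conformational states; the two-state toy trajectory, the function names, and the lag of one frame are all hypothetical and chosen only for illustration. Real MSM construction involves clustering, lag-time selection, and validation steps that are not shown here.

```python
def estimate_transition_matrix(states, n_states, lag=1):
    """Row-normalised transition counts between discrete states at a given lag."""
    counts = [[0] * n_states for _ in range(n_states)]
    # Count every observed transition from frame t to frame t + lag.
    for a, b in zip(states, states[lag:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state never seen as a starting point keeps an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def stationary_distribution(matrix, n_iter=1000):
    """Approximate the equilibrium populations by power iteration."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical toy trajectory (assumption): 0 = unfolded, 1 = folded.
toy_trajectory = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1]

T = estimate_transition_matrix(toy_trajectory, n_states=2)
pi = stationary_distribution(T)
print("transition matrix:", T)
print("stationary populations:", pi)
```

Each row of the matrix gives the estimated probability of moving from one state to each other state within one lag interval, and the stationary distribution estimates the long-time population of each state.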

Template-based approaches

...

Rosetta: ...

TASSER: ...

Covariation-based approaches

Marks et al. (2012) Protein structure prediction from sequence variation. Nat Biotechnol 30:1072-80. (pmid: 23138306)

[ PubMed ] [ DOI ] Genomic sequences contain rich evolutionary information about functional constraints on macromolecules such as proteins. This information can be efficiently mined to detect evolutionary couplings between residues in proteins and address the long-standing challenge to compute protein three-dimensional structures from amino acid sequences. Substantial progress has recently been made on this problem owing to the explosive growth in available sequences and the application of global statistical methods. In addition to three-dimensional structure, the improved understanding of covariation may help identify functional residues involved in ligand binding, protein-complex formation and conformational changes. We expect computation of covariation patterns to complement experimental structural biology in elucidating the full spectrum of protein structures, their functional interactions and evolutionary dynamics.
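To make the notion of covariation between alignment columns concrete, here is a minimal sketch that scores column pairs of a toy multiple sequence alignment by mutual information. This is a deliberately simple, local pairwise statistic and is an assumption made for illustration: it is not the global statistical method the abstract refers to, and it does not separate direct from indirect correlations. The alignment and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def column(msa, i):
    """Return the residues found at alignment position i."""
    return [seq[i] for seq in msa]


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in freq_ab.items():
        p_ab = n_ab / n
        mi += p_ab * math.log(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


def rank_column_pairs(msa):
    """Score every pair of columns and sort from highest to lowest score."""
    length = len(msa[0])
    scores = {
        (i, j): mutual_information(column(msa, i), column(msa, j))
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy alignment (assumption): columns 0 and 2 vary together,
# as do columns 1 and 3.
toy_msa = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
    "GKVE",
    "GRVD",
]

ranked = rank_column_pairs(toy_msa)
for (i, j), score in ranked:
    print(f"columns {i} and {j}: MI = {score:.3f}")
```

In this toy case the two covarying column pairs receive the highest scores, while pairs that vary independently score close to zero. On real alignments, raw mutual information is confounded by phylogeny and by chains of indirect correlation, which is the motivation for the global models cited on this page.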

Design

Tinberg et al. (2013) Computational design of ligand-binding proteins with high affinity and selectivity. Nature 501:212-216. (pmid: 24005320)

[ PubMed ] [ DOI ] The ability to design proteins with high affinity and selectivity for any given small molecule is a rigorous test of our understanding of the physiochemical principles that govern molecular recognition. Attempts to rationally design ligand-binding proteins have met with little success, however, and the computational design of protein-small-molecule interfaces remains an unsolved problem. Current approaches for designing ligand-binding proteins for medical and biotechnological uses rely on raising antibodies against a target antigen in immunized animals and/or performing laboratory-directed evolution of proteins with an existing low affinity for the desired ligand, neither of which allows complete control over the interactions involved in binding. Here we describe a general computational method for designing pre-organized and shape complementary small-molecule-binding sites, and use it to generate protein binders to the steroid digoxigenin (DIG). Of seventeen experimentally characterized designs, two bind DIG; the model of the higher affinity binder has the most energetically favourable and pre-organized interface in the design set. A comprehensive binding-fitness landscape of this design, generated by library selections and deep sequencing, was used to optimize its binding affinity to a picomolar level, and X-ray co-crystal structures of two variants show atomic-level agreement with the corresponding computational models. The optimized binder is selective for DIG over the related steroids digitoxigenin, progesterone and β-oestradiol, and this steroid binding preference can be reprogrammed by manipulation of explicitly designed hydrogen-bonding interactions. The computational design method presented here should enable the development of a new generation of biosensors, therapeutics and diagnostics.

Ghirlanda (2013) Computational biology: A recipe for ligand-binding proteins. Nature 501:177-8. (pmid: 24005323)

[ PubMed ] [ DOI ]

Kiss et al. (2013) Molecular dynamics simulations for the ranking, evaluation, and refinement of computationally designed proteins. Meth Enzymol 523:145-70. (pmid: 23422429)

[ PubMed ] [ DOI ] Computational methods have been developed to redesign proteins so that they can perform novel functions such as the catalysis of nonnatural reactions. Active sites are constructed from the inside out by stochastically exploring mutations that favor the binding of transition states, small molecule binders, and protein surfaces, depending on the task at hand. The approach allows the use of many proteins for engineering scaffolds upon which to erect the necessary functionality. Beyond being of practical value for producing proteins with new applications, the approach tests our understanding of protein chemistry. The current success rate, however, is rather modest, and so the designers have become good only at making catalysts with low catalytic efficiencies. Directed evolution can be used to enhance function and stability, while more advanced computational techniques and physics-based simulations are useful at elucidating structural flaws and at guiding the design process. Here, we summarize work that focuses on the dynamic properties of computationally designed enzymes and their directed evolution variants. We utilized in silico methods to address three questions: (1) What are the shortcomings of these designs? (2) Can they be improved? (3) Can we screen out designs that are likely to be inactive?

Further reading and resources

FoldIt

Zhang Lab (UMich)

Baker Lab (UWash)

Springer: Protein Structure Prediction (2008)

Roy, A. & Zhang, Y. (2012) Protein Structure Prediction. Encyclopedia of Life Sciences

[ Source URL ] The goal of protein structure prediction is to estimate the spatial position of every atom of protein molecules from the amino acid sequence by computational methods. Depending on the availability of homologous templates in the PDB library, structure prediction approaches are categorised into template-based modelling (TBM) and free modelling (FM). While TBM is by far the only reliable method for high-resolution structure prediction, challenges in the field include constructing the correct folds without using template structures and refining the template models closer to the native state when templates are available. Nevertheless, the usefulness of various levels of protein structure predictions has been convincingly demonstrated in biological and medical applications.

Morcos et al. (2011) Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc Natl Acad Sci USA 108:E1293-301. (pmid: 22106262)

[ PubMed ] [ DOI ] The similarity in the three-dimensional structures of homologous proteins imposes strong constraints on their sequence variability. It has long been suggested that the resulting correlations among amino acid compositions at different sequence positions can be exploited to infer spatial contacts within the tertiary protein structure. Crucial to this inference is the ability to disentangle direct and indirect correlations, as accomplished by the recently introduced direct-coupling analysis (DCA). Here we develop a computationally efficient implementation of DCA, which allows us to evaluate the accuracy of contact prediction by DCA for a large number of protein domains, based purely on sequence information. DCA is shown to yield a large number of correctly predicted contacts, recapitulating the global structure of the contact map for the majority of the protein domains examined. Furthermore, our analysis captures clear signals beyond intradomain residue contacts, arising, e.g., from alternative protein conformations, ligand-mediated residue couplings, and interdomain interactions in protein oligomers. Our findings suggest that contacts predicted by DCA can be used as a reliable guide to facilitate computational predictions of alternative protein conformations, protein complex formation, and even the de novo prediction of protein domain structures, contingent on the existence of a large number of homologous sequences which are being rapidly made available due to advances in genome sequencing.

Marks et al. (2011) Protein 3D structure computed from evolutionary sequence variation. PLoS ONE 6:e28766. (pmid: 22163331)

[ PubMed ] [ DOI ] The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7-4.8 Å C(α)-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes.

Hopf et al. (2012) Three-dimensional structures of membrane proteins from genomic sequencing. Cell 149:1607-21. (pmid: 22579045)

[ PubMed ] [ DOI ] We show that amino acid covariation in proteins, extracted from the evolutionary sequence record, can be used to fold transmembrane proteins. We use this technique to predict previously unknown 3D structures for 11 transmembrane proteins (with up to 14 helices) from their sequences alone. The prediction method (EVfold_membrane) applies a maximum entropy approach to infer evolutionary covariation in pairs of sequence positions within a protein family and then generates all-atom models with the derived pairwise distance constraints. We benchmark the approach with blinded de novo computation of known transmembrane protein structures from 23 families, demonstrating unprecedented accuracy of the method for large transmembrane proteins. We show how the method can predict oligomerization, functional sites, and conformational changes in transmembrane proteins. With the rapid rise in large-scale sequencing, more accurate and more comprehensive information on evolutionary constraints can be decoded from genetic variation, greatly expanding the repertoire of transmembrane proteins amenable to modeling by this method.

