Difference between revisions of "Template choice principles"
Revision as of 12:40, 29 October 2012
Template choice for homology models
Homology Modeling
The most important step of comparative modelling is a carefully constructed multiple sequence alignment of the target sequence with a protein of known structure. However, even a good alignment will not yield a useful model if the template is unsuitable, and for many templates more than one coordinate file is available.
- All homologues can contribute template information to your project!
- How to find a template
- Keyword searches are possible, but unreliable: there is no guarantee that the researchers who deposited the structure used the keyword you are thinking of, or that it is correctly spelled.
- Sequence searches: BLAST and PSI-BLAST are the tools of first choice to find homologous structures. Try a BLAST search in the PDB subsection of the protein database first. If this is unsuccessful, do a PSI-BLAST search in "nr" and look for homologues that are flagged with the "known structure" icon.
- Use CATH or SCOP. These hierarchical classifications of the entire PDB will contain domains that may serve as templates, if you know your protein's folding architecture. A structural superposition of templates may pinpoint key conserved residues that must be represented in the sequence alignment you use for your modelling procedure.
- Hard and easy results
- Since structural similarity correlates with sequence similarity, use the structure with the highest % sequence identity (not alignment score) as a template. Easy modelling tasks are those where no indels have to be considered. Structural modelling of indels is always unreliable. In selected cases you may consider using a closely related template for the global fold, but importing coordinates for a loop of the same length from a more distantly related template. Of course, if you consider the phylogenetic tree for such a situation, the loop is more likely to have converged to the same length than to share this part of the sequence by homology. Nevertheless, it provides an example of a low-energy loop configuration within the global context of the target protein.
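The ranking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the alignments are assumed to be gapped pairwise alignments with "-" as the gap character, and percent identity is computed over aligned (non-gap) columns, which is one of several conventions in use.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identity over the aligned (non-gap) columns of a pairwise alignment."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    aligned_columns = 0
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == "-" or b == "-":
            continue  # indel column: not counted towards identity
        aligned_columns += 1
        if a == b:
            matches += 1
    if aligned_columns == 0:
        return 0.0
    return 100.0 * matches / aligned_columns


def count_indels(aligned_target, aligned_template):
    """Count contiguous gap runs (indel events) in either sequence."""
    events = 0
    previous = None
    for a, b in zip(aligned_target, aligned_template):
        if a == "-":
            state = "target_gap"
        elif b == "-":
            state = "template_gap"
        else:
            state = None
        if state is not None and state != previous:
            events += 1  # a new gap run starts here
        previous = state
    return events


def rank_templates(alignments):
    """Sort template names by identity (high first), then by indel count (low first).

    `alignments` maps a template name to (aligned_target, aligned_template).
    """
    def key(name):
        target, template = alignments[name]
        return (-percent_identity(target, template), count_indels(target, template))
    return sorted(alignments, key=key)
```

A template with no indel events and the highest identity comes first; templates that need indels are pushed down only when identities tie, so the indel count should still be inspected by hand.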
- Assessing suitability
The model must be relevant to your protein's function! If you have a choice:
- Choose orthologues fulfilling the Reciprocal Best Match criterion over paralogues that may be functionally diverged;
- Choose protein-ligand complexes over unliganded structures;
- Choose structures in a functional state (bound inhibitor? heterooligomer? phosphorylated? proteolytic processing?) over free, unmodified structures;
- Choose native sequences over mutated sequences (incl. His-tag, SeMet, non-physiological post-translational modifications);
- Choose coordinate sets in which the regions of interest are well ordered over those in which they are locally disordered and have high B-factors, or are highly divergent in NMR model sets;
- Choose structures where crystal packing contacts are distant from regions of interest over those where crystal packing may introduce conformational artefacts.
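The Reciprocal Best Match criterion mentioned in the list above can be checked mechanically once search results are available. The following is a minimal sketch, assuming each search result is a list of (sequence ID, E-value) pairs and that the lowest E-value defines the best hit; the function names and data layout are hypothetical and not taken from any particular tool.

```python
def best_hit(hits):
    """Return the ID of the hit with the lowest E-value, or None if there are no hits."""
    if not hits:
        return None
    return min(hits, key=lambda hit: hit[1])[0]


def is_reciprocal_best_match(target_id, forward_hits, reverse_hits_by_subject):
    """Check whether the target and its best forward hit are each other's best match.

    forward_hits: hits of the target searched against the template's proteome.
    reverse_hits_by_subject: for each template candidate, its hits when searched
    back against the target's proteome.
    """
    candidate = best_hit(forward_hits)
    if candidate is None:
        return False
    reverse_hits = reverse_hits_by_subject.get(candidate, [])
    return best_hit(reverse_hits) == target_id
```

If the check fails, the candidate may be a paralogue, and its functional relevance deserves a closer look before it is used as a template.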
- Assessing quality
Use the highest-quality structure available:
- Use the structure with the best resolution (low values: 2.0 Å is better than 2.5 Å).
- Treat NMR structures like crystal structures with a resolution (at best) worse than 2.5 Å.
- Well-refined structures have R-values below about one tenth of their nominal resolution in Å (e.g. at 2 Å: R < 0.2).
- R-free and R-merge are additional quality metrics, but they are difficult to assess for the non-expert. Here too: lower is better.
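These rules of thumb can be turned into a simple ordering of candidate coordinate files. The sketch below is illustrative only: the `Structure` record and its field names are hypothetical, the values would have to be read from the coordinate file headers by hand or by a parser of your choice, and placing NMR structures just behind a 2.5 Å crystal structure is one possible reading of the rule above.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: NMR structures are ranked as if they had 2.5 A resolution,
# but behind any crystal structure with the same value.
NMR_EQUIVALENT_RESOLUTION = 2.5


@dataclass
class Structure:
    pdb_id: str                         # identifier of the coordinate file
    method: str                         # "X-RAY" or "NMR"
    resolution: Optional[float] = None  # in Angstrom; None for NMR
    r_value: Optional[float] = None     # crystallographic R-value; None for NMR


def effective_resolution(structure):
    """Resolution used for ranking; NMR entries get the assumed equivalent value."""
    if structure.method == "NMR" or structure.resolution is None:
        return NMR_EQUIVALENT_RESOLUTION
    return structure.resolution


def is_well_refined(structure):
    """Rule of thumb from the text: R-value below one tenth of the resolution in A."""
    if structure.resolution is None or structure.r_value is None:
        return False
    return structure.r_value < structure.resolution / 10.0


def rank_by_quality(structures):
    """Order structures by effective resolution (low first), crystal before NMR on ties."""
    return sorted(
        structures,
        key=lambda s: (effective_resolution(s), s.method == "NMR"),
    )
```

Such a ranking only covers quality; the suitability criteria listed earlier still have to be weighed separately.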