Difference between revisions of "Template choice principles"

From "A B C"

Revision as of 13:51, 1 November 2007

The most important step of comparative modelling is a carefully done multiple sequence alignment of the target sequence with a protein of known structure. However, you cannot expect a useful model if you use an unsuitable template, and for many templates more than one coordinate file is available.

All homologues can contribute template information to your project!
How to find a template
  • Keyword searches are possible, but unreliable: there is no guarantee that the keyword you are thinking of is used, rather than a synonym, or that it is correctly spelled.
  • Sequence searches: BLAST and PSI-BLAST are the tools of first choice to find homologous structures. Try a BLAST search in the PDB subsection of the protein database first. If this is unsuccessful, do a PSI-BLAST search in "nr" and look for homologous sequences that are flagged with the "known structure" icon.
  • Use of CATH: this hierarchical classification of the entire PDB will contain domains you can use if you know your protein's folding architecture; however, the actual alignment is likely to be very challenging.
Hard and easy results
  • Since structural similarity correlates with sequence similarity, use the structure with the highest % sequence identity (not alignment score) as a template. Easy results are those where no indels have to be considered. Modeling indels is unreliable. In selected cases you may consider using a closely related template overall, but importing a same-length loop from a more distantly related template.
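As a minimal sketch of the criterion above, the following Python computes % sequence identity and counts indels from gapped pairwise alignments, then ranks the candidates by identity. The function names, the choice of denominator (columns in which both sequences have a residue), counting terminal gaps as indels, and the example sequences are all assumptions made for illustration; other definitions of % identity are in use.

```python
def percent_identity(target_aln, template_aln):
    """Percent identity over columns where both sequences have a residue."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(target_aln, template_aln):
        if a == "-" or b == "-":
            continue  # gap column: not counted as aligned
        aligned += 1
        if a.upper() == b.upper():
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def count_indels(target_aln, template_aln):
    """Number of gap runs in either row (terminal gaps are counted too)."""
    indels = 0
    for row in (target_aln, template_aln):
        in_gap = False
        for ch in row:
            if ch == "-" and not in_gap:
                indels += 1  # a new run of gap characters starts here
            in_gap = ch == "-"
    return indels


def rank_templates(alignments):
    """Sort candidates by % identity (highest first), then by fewest indels."""
    scored = [
        (name, percent_identity(t, s), count_indels(t, s))
        for name, (t, s) in alignments.items()
    ]
    return sorted(scored, key=lambda row: (-row[1], row[2]))


# Hypothetical alignments: (aligned target, aligned template)
candidates = {
    "template_A": ("MKTAYIAKQR", "MKTAYLAKQR"),
    "template_B": ("MKTAYIAKQR", "MKSA--AKNR"),
}

for name, pid, indels in rank_templates(candidates):
    print(f"{name}: {pid:.1f}% identity, {indels} indel(s)")
```

Here template_A would be the "easy" case (no indels), whereas template_B requires modelling a two-residue insertion.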
Assessing suitability

The model must be relevant to your protein's function! If you have a choice:

  • Choose orthologues over paralogues;
  • Choose protein-ligand complexes over unliganded structures;
  • Choose structures in a functional state (bound inhibitor? Heterooligomer?) over free structures;
  • Choose native sequences over mutated sequences (incl. His-tag, SeMet, post-translational modifications);
  • Choose coordinate sets in which the regions of interest are well ordered over those in which these regions are locally disordered and have high B-factors, or are highly divergent within NMR model sets;
  • Choose structures where crystal packing contacts are distant from regions of interest over those where crystal packing may introduce conformational artefacts.
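
One way to apply these preferences when several coordinate files are available is to treat them as ordered tie-breakers. The sketch below assumes a hypothetical `Candidate` record whose fields you fill in by hand after inspecting each entry. The field names, the example identifiers, and the strict priority order (simply the order of the list above) are assumptions, not a prescribed scoring scheme.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str               # hypothetical label, not a real PDB ID
    orthologue: bool        # orthologue (True) rather than paralogue
    has_ligand: bool        # protein-ligand complex
    functional_state: bool  # e.g. bound inhibitor, heterooligomer
    native_sequence: bool   # no mutations, tags, SeMet, ...
    region_ordered: bool    # region of interest is well ordered
    packing_clear: bool     # crystal contacts far from region of interest


def suitability_key(c):
    """Tuple of criteria in priority order; True sorts above False."""
    return (
        c.orthologue,
        c.has_ligand,
        c.functional_state,
        c.native_sequence,
        c.region_ordered,
        c.packing_clear,
    )


def rank_by_suitability(candidates):
    """Most suitable candidate first."""
    return sorted(candidates, key=suitability_key, reverse=True)


entries = [
    Candidate("apo_paralogue", False, False, False, True, True, True),
    Candidate("liganded_orthologue", True, True, True, True, False, True),
    Candidate("apo_orthologue", True, False, False, True, True, True),
]

for c in rank_by_suitability(entries):
    print(c.name)
```

In practice the criteria rarely form a strict hierarchy, so a ranking like this is better used to organize the comparison than to make the final decision.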
Assessing quality

Use the highest-quality structure available:

  • Use the structure with the best resolution (low values: 2.0 Å is better than 2.5 Å).
  • Treat NMR structures like crystal structures with a resolution (at best) worse than 2.5 Å.
  • Well refined structures have R-values better than one tenth of their nominal resolution (e.g. an R-value below 0.20, or 20%, for a 2.0 Å structure).
  • R-free and R-merge are additional quality metrics, but they are difficult to assess for the non-expert.
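
The resolution and R-value rules of thumb can be sketched as a small filter. This is an illustration under stated assumptions: the R-value rule is interpreted as "R, expressed as a fraction, is below resolution in Å divided by 10", NMR structures are assigned the optimistic bound of 2.5 Å and ranked behind a crystal structure of the same value, and the record layout and names are hypothetical.

```python
NMR_EQUIVALENT_RESOLUTION = 2.5  # Å; assumed optimistic bound for NMR models


def effective_resolution(method, resolution=None):
    """Resolution used for ranking; NMR entries get the fixed bound."""
    if method.upper() == "NMR":
        return NMR_EQUIVALENT_RESOLUTION
    if resolution is None:
        raise ValueError("crystal structures need a resolution value")
    return resolution


def r_value_ok(resolution, r_value):
    """Rule of thumb: R (as a fraction) below one tenth of the resolution."""
    return r_value < resolution / 10.0


def rank_by_quality(structures):
    """Best (lowest) effective resolution first; NMR loses ties."""
    def key(s):
        is_nmr = s["method"].upper() == "NMR"
        return (effective_resolution(s["method"], s.get("resolution")), is_nmr)
    return sorted(structures, key=key)


# Hypothetical entries
structures = [
    {"name": "xtal_low", "method": "X-ray", "resolution": 2.8, "r_value": 0.24},
    {"name": "nmr_set", "method": "NMR"},
    {"name": "xtal_high", "method": "X-ray", "resolution": 2.0, "r_value": 0.19},
]

for s in rank_by_quality(structures):
    line = s["name"]
    if "r_value" in s:
        line += f" (R-value acceptable: {r_value_ok(s['resolution'], s['r_value'])})"
    print(line)
```

Such a filter only orders candidates by global statistics; it says nothing about local quality in the region you care about, which still has to be checked separately.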