Difference between revisions of "Template choice principles"
Revision as of 13:51, 1 November 2007
The most important step of comparative modelling is a carefully done multiple sequence alignment of the target sequence with a protein of known structure. However, even a careful alignment cannot produce a useful model from an unsuitable template, and for many templates more than one coordinate file is available.
- All homologues can contribute template information to your project!
- How to find a template
- Keyword searches are possible, but unreliable: there is no guarantee that the keyword you are thinking of is used, rather than a synonym, or that it is correctly spelled.
- Sequence searches: BLAST and PSI-BLAST are the tools of first choice to find homologous structures. Try a BLAST search in the PDB subsection of the protein database first. If this is unsuccessful, do a PSI-BLAST search in "nr" and look for homologous sequences that are flagged with the "known structure" icon.
- Use of CATH: this hierarchical classification of the entire PDB will contain domains you can use if you know your protein's folding architecture; however, the actual alignment is likely to be very challenging.
- Hard and easy results
- Since structural similarity correlates with sequence similarity, use the structure with the highest percent sequence identity (not alignment score) as a template. Easy results are those where no indels have to be considered. Modelling indels is unreliable. In selected cases you may consider using a closely related template overall, but importing a same-length loop from a more distantly related template.
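As an illustration of the point above, here is a minimal Python sketch that ranks candidate templates by percent identity and reports how many indels each alignment contains. It assumes you already have pairwise alignments as gapped strings. The function names, the example sequences, and the choice to compute identity over aligned (non-gap) columns are assumptions made for this sketch; other definitions of percent identity use a different denominator.

```python
import re
from typing import Dict, List, Tuple

GAP = "-"


def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_target, aligned_template)
        if a != GAP and b != GAP
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def count_indels(aligned_target: str, aligned_template: str) -> int:
    """Number of gap runs in either sequence (terminal gaps are counted too)."""
    return sum(len(re.findall(r"-+", s)) for s in (aligned_target, aligned_template))


def rank_templates(
    alignments: Dict[str, Tuple[str, str]]
) -> List[Tuple[str, float, int]]:
    """Sort by identity (highest first); fewer indels breaks ties."""
    rows = [
        (name, percent_identity(tgt, tpl), count_indels(tgt, tpl))
        for name, (tgt, tpl) in alignments.items()
    ]
    return sorted(rows, key=lambda row: (-row[1], row[2]))


# Hypothetical alignments: (aligned target, aligned template).
candidates = {
    "templateA": ("MKTAYIAKQR", "MKTAYLAKQR"),  # no gaps: an "easy" case
    "templateB": ("MKTAYIAKQR", "MRT--IGKQR"),  # one deletion in the template
}

for name, identity, indels in rank_templates(candidates):
    print(f"{name}: {identity:.1f}% identity, {indels} indel(s)")
```

An alignment with zero indels corresponds to an "easy" result; any non-zero count marks regions where the model will be less reliable.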
- Assessing suitability
The model must be relevant to your protein's function! If you have a choice:
- Choose orthologues over paralogues;
- Choose protein-ligand complexes over unliganded structures;
- Choose structures in a functional state (bound inhibitor? Heterooligomer?) over free structures;
- Choose native sequences over mutated sequences (incl. His-tag, SeMet, post-translational modifications);
- Choose coordinate sets in which the regions of interest are well ordered over those in which they are locally disordered and have high B-factors, or are highly divergent in NMR model sets;
- Choose structures where crystal packing contacts are distant from regions of interest over those where crystal packing may introduce conformational artefacts.
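One of the criteria above, local order in the region of interest, can be checked directly from a coordinate file. The following is a rough Python sketch that compares the mean B-factor of a residue range with the mean for the whole chain. It reads the fixed columns of PDB-format ATOM records (chain ID in column 22, residue number in columns 23-26, B-factor in columns 61-66). The function names and the demo data are hypothetical, and no cut-off value is implied: what counts as "high" depends on the structure.

```python
from statistics import mean
from typing import Iterable, Optional, Tuple


def mean_b_factor(
    pdb_lines: Iterable[str],
    chain: str,
    residue_range: Optional[Tuple[int, int]] = None,
) -> float:
    """Mean B-factor of ATOM records in a chain, optionally within a residue range."""
    values = []
    for line in pdb_lines:
        # Skip everything that is not a complete ATOM record.
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        if line[21] != chain:  # column 22: chain identifier
            continue
        res_seq = int(line[22:26])  # columns 23-26: residue sequence number
        if residue_range is not None:
            first, last = residue_range
            if not first <= res_seq <= last:
                continue
        values.append(float(line[60:66]))  # columns 61-66: B-factor
    if not values:
        raise ValueError("no matching ATOM records found")
    return mean(values)


def demo_atom_line(serial: int, res_seq: int, b_factor: float, chain: str = "A") -> str:
    """Build a fake CA-only ATOM record for the demo (not real coordinates)."""
    return "ATOM  {:5d}  CA  ALA {}{:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}".format(
        serial, chain, res_seq, 0.0, 0.0, 0.0, 1.0, b_factor
    )


# Hypothetical chain of four residues; residues 3-4 play the "region of interest".
demo_lines = [
    demo_atom_line(1, 1, 10.0),
    demo_atom_line(2, 2, 12.0),
    demo_atom_line(3, 3, 40.0),
    demo_atom_line(4, 4, 50.0),
]

region = mean_b_factor(demo_lines, "A", (3, 4))
whole = mean_b_factor(demo_lines, "A")
print(f"region mean B = {region:.1f}, chain mean B = {whole:.1f}")
```

A region whose mean B-factor is much higher than that of the rest of the chain is a candidate for the "locally disordered" case; with real files you would pass the lines of the coordinate file instead of the demo records.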
- Assessing quality
Use the highest-quality structure available:
- Use the structure with the best resolution (low values: 2.0 Å is better than 2.5 Å).
- Treat NMR structures like crystal structures with a resolution (at best) worse than 2.5 Å.
- Well refined structures have R-values no higher than about one tenth of their nominal resolution (e.g. R ≤ 0.20 for a 2.0 Å structure).
- R-free and R-merge are additional quality metrics, but they are difficult to assess for the non-expert.
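The quality rules above can be written down as a small Python sketch. It assumes the resolution and R-value have already been looked up for each coordinate file. The `Structure` class, the field names, and the example entries are hypothetical; assigning NMR structures a nominal 2.5 Å is an optimistic reading of the rule above, which says they are at best worse than that.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption for this sketch: NMR entries are ranked as if they were 2.5 Å structures.
NMR_NOMINAL_RESOLUTION = 2.5


@dataclass
class Structure:
    label: str
    method: str                        # "X-RAY" or "NMR"
    resolution: Optional[float] = None  # in Å; lower is better
    r_value: Optional[float] = None     # as a fraction, e.g. 0.19


def effective_resolution(s: Structure) -> float:
    """Resolution used for ranking; NMR entries get the nominal value."""
    if s.method == "NMR":
        return NMR_NOMINAL_RESOLUTION
    if s.resolution is None:
        raise ValueError(f"{s.label}: crystal structure without a resolution")
    return s.resolution


def is_well_refined(s: Structure) -> Optional[bool]:
    """Rule of thumb: R-value at most one tenth of the resolution in Å.

    Returns None when the rule cannot be applied (NMR or missing values).
    """
    if s.method == "NMR" or s.resolution is None or s.r_value is None:
        return None
    return s.r_value <= s.resolution / 10.0


def best_by_resolution(structures: List[Structure]) -> Structure:
    """Pick the entry with the lowest effective resolution."""
    return min(structures, key=effective_resolution)


# Hypothetical candidates for the same template protein.
candidates = [
    Structure("xtalA", "X-RAY", resolution=2.0, r_value=0.19),
    Structure("xtalB", "X-RAY", resolution=2.8, r_value=0.31),
    Structure("nmrC", "NMR"),
]

best = best_by_resolution(candidates)
print(best.label, is_well_refined(best))
```

The sketch ranks by resolution first and reports the R-value check separately, since the text gives no rule for trading one against the other.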