Difference between revisions of "BIO Assignment 5 2011"

From "A B C"

Revision as of 21:06, 5 December 2006

   

Assignment 5 - Homology modeling

Please note: This assignment is currently inactive. Unannounced changes may be made at any time.  


 

Introduction  

How could the search for ultimate truth have revealed so hideous and visceral-looking an object?
Max Perutz (on his first glimpse of the hemoglobin structure)

Where is the hidden beauty in structure, and where, the "ultimate truth"? In the previous assignments we have studied sequence conservation in APSES family domains and looked at how these domains have evolved over time. We have seen that this is an ancient family that already had several members in the cenancestor of all fungi, an organism that lived in the Vendian period of the Proterozoic era of Precambrian times, more than 600,000,000 years ago.

In order to understand how particular residues in the sequence contribute to the putative function of the protein, and why and how they are conserved throughout evolution, we would need to consider an explicit molecular model of an APSES domain protein bound to its cognate DNA sequence. In particular, it would be interesting to correlate the conservation patterns we have observed in the MSAs with specific DNA-binding interactions. Unfortunately, the 1MB1 structure has no DNA bound, and the evidence we considered in Assignment 2 (Taylor et al., 2000) is not sufficient to define the details of how a DNA double helix might be bound. These details would require the structure of a complex that contains protein as well as DNA, and no such complex of an APSES domain has yet been crystallized.

In this assignment you will construct a molecular model of the Mbp1 orthologue in your assigned organism, identify similar structures of distantly related domains for which protein-DNA complexes are known, determine whether the available evidence allows you to distinguish between different modes of ligand binding, and assemble a hypothetical complex structure.

For the rest of this assignment, please keep the following terminology in mind:

Target
The protein that you are planning to model.
Template
The protein whose structure you are using as a guide to build the model.
Model
The structure that results from the modeling process. It has the Target sequence and is similar to the Template structure.

A brief overview article on the construction and use of homology models is linked in the resource section at the bottom of this page. That section also contains links to all other sites and resources you might require.


Preparation, submission and due date

Read carefully. Be sure you have understood all parts of the assignment and cover all questions in your answers! Sadly, we see too many assignments which, arduously effected, nevertheless intimate nescience of elementary tenets of molecular biology. If the sentence above did not trigger an urge to open a dictionary, you have a tendency to guess rather than confirm possibly important information.

Prepare a Microsoft Word document with a title page that contains:

  • your full name
  • your Student ID
  • your e-mail address
  • the organism name you have been assigned

Follow the steps outlined below. You are encouraged to write your answers in short-answer or point form, as you would document an analysis in a laboratory notebook. However, you must

  • document what you have done,
  • note what Web sites and tools you have used,
  • paste important data such as sequences, alignments and other information.

If you do not document the process of your work, we will deduct marks. Try to be concise, not wordy! Use your judgement: are you giving us enough information that we could reproduce exactly what you have done? If not, we will deduct marks. Avoid RTF and unnecessary formatting. Do not paste screendumps. Keep the size of your submission below 1.5 MB.

Write your answers in separate paragraphs and give each its own title. Save your document with the filename A5_familyname.givenname.doc (for example, my fifth assignment would be named A5_steipe.boris.doc; please don't switch the order of your given name and family name!)
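The naming and size rules above are easy to get wrong at the last minute, so it can help to check them mechanically before sending. The following is a minimal sketch, not a course-provided tool. The helper names `submission_filename` and `check_submission` are hypothetical. Two readings are assumptions: that the name is lower-cased (following the example A5_steipe.boris.doc) and that "1.5 MB" means 1.5 × 1024 × 1024 bytes.

```python
import os

# Assumption: "1.5 MB" is read as 1.5 * 1024 * 1024 bytes.
MAX_BYTES = int(1.5 * 1024 * 1024)


def submission_filename(family_name: str, given_name: str, assignment: int = 5) -> str:
    """Build A<n>_<family name>.<given name>.doc, family name first.

    Lower-casing follows the example A5_steipe.boris.doc (an assumption).
    """
    family = family_name.strip().lower()
    given = given_name.strip().lower()
    return f"A{assignment}_{family}.{given}.doc"


def check_submission(path: str, family_name: str, given_name: str) -> list:
    """Return a list of problems with the file at `path` (empty if none found)."""
    problems = []
    expected = submission_filename(family_name, given_name)
    if os.path.basename(path) != expected:
        problems.append(f"file name should be {expected}")
    if os.path.getsize(path) >= MAX_BYTES:
        problems.append("file must be smaller than 1.5 MB")
    return problems
```

For example, `submission_filename("Steipe", "Boris")` gives `A5_steipe.boris.doc`. `check_submission` expects the file to exist and returns an empty list if both rules are met. It does not check for macros or formatting.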

Finally, e-mail the document to boris.steipe@utoronto.ca before the due date.

Your document must not contain macros. Please turn off and/or remove all macros from your Word document; we will disable macros, since they pose a security risk.

We do not have the resources to correct formatting errors or to convert assignments into different formats. Keep your image-file sizes manageable!

Image sizes are measured in pixels: 600 px across is sufficient for the assignment. Resolutions are measured in dpi (dots per imperial inch): 72 dpi is the standard resolution for images that are viewed on a monitor, and the displayed size may be scaled (in %) by an application program. Stereo images should be presented so that equivalent points are approximately 6 cm apart. Images can be stored uncompressed as .tiff or .bmp, or compressed as .gif or .jpg:

  • .gif is preferred for images with large, monochrome areas and sharp, high-contrast edges;
  • .jpg is preferred for images with shades and halftones, such as the structure views required here;
  • .tiff is preferred for archiving master copies of images losslessly; use LZW compression for .tiff files if your system or application supports it;
  • .bmp is not preferred for anything; it is used only because it is easier to code.
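The pixel, dpi and centimetre figures above are related by simple arithmetic (1 inch = 2.54 cm). The sketch below converts between them. It assumes the image is shown unscaled on a display that really renders 72 pixels per inch; many modern screens do not, and applications may rescale. The function names are hypothetical. For two equal frames placed side by side, a point at x in the left frame reappears at x + frame width + gap in the right frame. The distance between equivalent points is therefore the frame width plus the gap.

```python
CM_PER_INCH = 2.54


def cm_to_pixels(cm: float, dpi: float = 72.0) -> float:
    """Physical length -> pixels, assuming the image is shown unscaled at `dpi`."""
    return cm / CM_PER_INCH * dpi


def pixels_to_cm(px: float, dpi: float = 72.0) -> float:
    """Pixels -> physical length under the same assumption."""
    return px / dpi * CM_PER_INCH


def stereo_separation_cm(frame_width_px: float, gap_px: float = 0.0,
                         dpi: float = 72.0) -> float:
    """Distance between equivalent points of two equal, side-by-side frames."""
    return pixels_to_cm(frame_width_px + gap_px, dpi)


print(f"6 cm at 72 dpi   = {cm_to_pixels(6):.0f} px")
print(f"600 px at 72 dpi = {pixels_to_cm(600):.1f} cm")
```

At 72 dpi, 6 cm corresponds to about 170 px, and a 600 px wide image spans about 21 cm. Under these assumptions, a stereo pair drawn at 100% needs roughly 170 px from a point in the left frame to the same point in the right frame.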

Information that you present (such as added colouring, formatting etc.) should be meaningful. If you have technical difficulties, post your questions to the list and/or contact me.

All required stereo views are to be presented as divergent stereo frames (left eye's view in the left frame). Remember to list the RasMol command input you have used to generate the images.

Given the number of students in the course, we have to economize on processing the assignments. Thus we will not accept assignments that are not prepared as described above. If you have technical difficulties, contact me.

The due date for the assignment is Wednesday, December 20, at 10:00 in the morning.

Grading

Don't wait until the last day to find out there are problems! This assignment has been structured so that it should be doable in three or four hours. The assignment is excellent preparation for the exam, so even if it's due later, it's a good idea to do it earlier. Assignments received past the due date will have one mark deducted at the first minute of every twelve-hour period past the due date. Assignments received more than 5 days past the due date will not be assessed. If you need an extension, you must arrange this beforehand.
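The late-penalty rule above can be read as "one mark per started twelve-hour period". The sketch below encodes that reading, which is an interpretation and not an official formula. The year of the due date is not stated in the text, so the caller passes a full `datetime`. `late_deduction` is a hypothetical name, and returning `None` for "not assessed" is a design choice of this sketch.

```python
import math
from datetime import datetime, timedelta
from typing import Optional

PERIOD = timedelta(hours=12)   # one mark per started twelve-hour period
CUTOFF = timedelta(days=5)     # later than this: not assessed


def late_deduction(due: datetime, received: datetime) -> Optional[int]:
    """Marks deducted for lateness; None means the assignment is not assessed.

    Interpretation (assumption): a mark is lost as soon as a twelve-hour period
    starts, so lateness up to 12 h costs 1 mark, up to 24 h costs 2, and so on.
    """
    lateness = received - due
    if lateness <= timedelta(0):
        return 0
    if lateness > CUTOFF:
        return None
    # timedelta / timedelta gives a float number of periods.
    return math.ceil(lateness / PERIOD)
```

Under this reading, one minute late costs one mark, 12 h 01 min late costs two, and a submission exactly five days late has lost ten marks.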

Marks are noted below in the section headings for each of the tasks. A total of 10 marks will be awarded if your assignment answers all of the questions. Up to 2 bonus marks (to a maximum of 10 overall) can be awarded for particularly interesting findings or insightful comments. Up to 2 marks can be subtracted for lack of form or for glaring errors. The marks you receive will

  • count directly towards your final marks at the end of term, for BCH441 (undergraduates), or
  • be divided by two for BCH1441 (graduates).
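The grading arithmetic above involves three things: a bonus capped at 2 marks, an overall cap of 10, and a form deduction of at most 2 marks. Graduate marks are then halved. The sketch below shows one way these combine. Several choices are assumptions not stated in the text: the order of operations (cap first, then deduct), the floor at zero, and the function name `final_mark`. The late penalty is not included.

```python
def final_mark(base: float, bonus: float = 0.0, form_deduction: float = 0.0,
               graduate: bool = False) -> float:
    """Combine the marks described above (order of operations is an assumption)."""
    bonus = min(max(bonus, 0.0), 2.0)                    # at most 2 bonus marks
    form_deduction = min(max(form_deduction, 0.0), 2.0)  # at most 2 marks off
    mark = min(base + bonus, 10.0)                       # no more than 10 overall
    mark = max(mark - form_deduction, 0.0)               # assumption: never below 0
    return mark / 2 if graduate else mark                # BCH1441: divided by two
```

For example, `final_mark(9, bonus=2)` is capped at 10.0, and `final_mark(8, graduate=True)` gives 4.0.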

   

(1) Preparation

   

The input alignment (X marks)

 

The sequence alignment between target and template is the single most important factor that determines the quality of your model.

No homology modeling process will repair an incorrect alignment; it is useful to think of a homology model as a three-dimensional map of a sequence alignment rather than as a structure in its own right. In a homology modeling project, typically the largest amount of time should be spent on preparing the best possible alignment. Even though automated servers like the SwissModel server will align sequences and select template structures for you, it would be unwise to rely on them merely because they are convenient, instead of applying the more sophisticated methods and better-informed procedures we have discussed. Detailed analysis of fallacious models rarely leads to good results.

The best possible alignment is usually constructed from a multiple sequence alignment that includes at least the target and template sequences as well as other related sequences. The additional sequences are an important aid in identifying the correct placement of insertions and deletions. Typically, such an alignment also undergoes additional optimization steps that move insertions or deletions between target and template out of the secondary-structure elements of the template structure.
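Before moving indels out of secondary-structure elements, you first need to find the ones that sit inside them. The sketch below only flags such positions; it does not relocate anything. It assumes you have a secondary-structure string for the template with one character per template residue, using DSSP-like codes where H means helix and E means strand. The function name and the flanking-residue rule for insertions are hypothetical choices for illustration.

```python
SSE_STATES = {"H", "E"}  # assumption: DSSP-like codes, H = helix, E = strand


def indels_in_sse(template_aln: str, target_aln: str, template_ss: str,
                  gap: str = "-") -> list:
    """Return 1-based alignment columns whose indel lies inside a template SSE.

    template_ss holds one secondary-structure character per template residue
    (gaps excluded). A deletion in the target is flagged if the template
    residue in that column is in an SSE. An insertion in the target is flagged
    if the nearest template residues on both sides share the same SSE state.
    """
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned rows must have the same length")
    if len(template_ss) != sum(c != gap for c in template_aln):
        raise ValueError("need one secondary-structure character per template residue")

    # Secondary-structure state of the template in each column (None at template gaps).
    col_ss = []
    residue = 0
    for c in template_aln:
        if c == gap:
            col_ss.append(None)
        else:
            col_ss.append(template_ss[residue])
            residue += 1

    flagged = []
    for col, (t, q) in enumerate(zip(template_aln, target_aln)):
        if t != gap and q == gap:        # deletion in the target
            if col_ss[col] in SSE_STATES:
                flagged.append(col + 1)
        elif t == gap and q != gap:      # insertion in the target
            left = next((s for s in reversed(col_ss[:col]) if s is not None), None)
            right = next((s for s in col_ss[col + 1:] if s is not None), None)
            if left is not None and left == right and left in SSE_STATES:
                flagged.append(col + 1)
    return flagged
```

With the template 1MB1 and a secondary-structure assignment for it (not given on this page), this would show whether a template gap region, such as the one opposite the MKE residues of MBP1_CHAGL, falls inside a helix or strand.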

Here is an excerpt from the T-Coffee alignment of the Mbp1 sequences. It contains all residues of the yeast sequence that are found in the 1MB1 crystal structure, and it has been edited to remove the N-terminal gaps in the sequence. Thus the N-terminus is 21 amino acids longer than the definition of the APSES domain in CDD (which starts with SIMKR...), and the C-terminus is slightly shorter. Since the sequences are very similar to each other, there is no ambiguity in the alignment, and the construction of a homology model should be straightforward. Normally one would spend considerable effort at this stage to consider which parts of the target and template sequences appear to be correctly aligned, and to edit the alignment manually. In our case, evolutionary pressure was so strong that essentially all of these sequences have evolved without a single indel.

As a reference and fallback sequence, I have added to the alignment the APSES domain of XP_001224558, the Chaetomium globosum Mbp1 orthologue (MBP1_CHAGL).

1MB1            NQIYSARYSGVDVYEFIHSTG---SIMKRKKDDWVNATHILKAANFAKAKRTRILEKEV
MBP1_CANGL      NQIYSAKYSGVDVYEFIHPTG---SIMKRKNDGWVNATHILKAANFAKAKRTRILEKEV
MBP1_EREGO      TQIYSAKYSGVEVYEFLHPTG---SIMKRKADDWVNATHILKAAKFAKAKRTRILEKEV
MBP1_KLULA      NQIYSAKYSGVDVYEFIHPTG---SIMKRKADNWVNATHILKAAKFPKAKRTRILEKEV
MBP1_CANAL      SQIYSATYSNVPAFEFVTSEG---PIMRRKKDSWINATHILKIAKFPKAKRTRILEKDV
MBP1_DEBHA      TQIYSATYSNVPVFEFVTLEG---PIMRRKLDSWINATHILKIAKFPKAKRTRILEKDV
MBP1_YARLI      MSIYKATYSGVPVYEFQCKNV---AVMRRKSDGWVNATHILKVAGFDKPQRTRILEKEV
MBP1_SCHPO      SAVHVAVYSGVEVYECFIKGV---SVMRRRRDSWLNATQILKVADFDKPQRTRVLERQV
MBP1_USTMA      KTIFKATYSGVPVYECIINNV---AVMRRRSDDWLNATQILKVVGLDKPQRTRVLEREI
MBP1_ASPNI      SNVYSATYSSVPVYEFKIGTD---SVMRRRSDDWINATHILKVAGFDKPARTRILEREV
MBP1_ASPTE      SKIYSATYSSVPVYEFKIEGD---SVMRRRADDWINATHILKVAGFDKPARTRILEREV
MBP1_CRYNE      PKVYASVYSGVPVFEAMIRGI---SVMRRASDSWVNATQILKVAGVHKSARTKILEKEV
MBP1_GIBZE      G-IYSASYSGVDVYEMEVNNI---AVMRRRNDSWLNATQILKVAGVDKGKRTKILEKEI
MBP1_NEUCR      IYSLQATYSGVGVYEMEVNNV---AVMRRQKDGWVNATQILKVANIDKGRRTKILEKEI
MBP1_MAGGR      P-IYTAVYSNVEVYEFEVNGV---AVMKRIGDSKLNATQILKVAGVEKGKRTKILEKEI
MBP1_ASPFU      PQIYKAVYSNVSVYEMEVNGV---AVMKRRSDSWLNATQILKVAGVVKARRTKTLEKEI
MBP1_CHAGL      AGIYSATYSGIPVYEYQFGPDMKEHVMRRREDNWINATHILKAAGFDKPARTRILERDV

1MB1            LKETHEKVQGGFGKYQGTWVPLNIAKQLAEKFSVYDQLKPLF
MBP1_CANGL      LKEMHEKVQGGFGKYQGTWVPLNIAINLAEKFDVYQDLKPLF
MBP1_EREGO      IKDTHEKVQGGFGKYQGTWVPLDIARRLAQKFEVLEELRPLF
MBP1_KLULA      ITDTHEKVQGGFGKYQGTWIPLELASKLAEKFEVLDELKPLF
MBP1_CANAL      QTGIHEKVQGGYGKYQGTYVPLDLGAAIARNFGVYDVLKPIF
MBP1_DEBHA      QTGVHEKVQGGYGKYQGTYVPLDLGADIAKNFGVFDSLRPIF
MBP1_YARLI      QKGVHEKVQGGYGKYQGTWVPLERAREIATLYDVDSHLAPIF
MBP1_SCHPO      QIGAHEKVQGGYGKYQGTWVPFQRGVDLATKYKVDGIMSPIL
MBP1_USTMA      QKGIHEKVQGGYGKYQGTWIPLDVAIELAERYNIQGLLQPIT
MBP1_ASPNI      QKGVHEKVQGGYGKYQGTWIPLQEGRQLAERNNILDKLLPIF
MBP1_ASPTE      QKGVHEKVQGGYGKYQGTWIPLPEGRLLAERNNIIDKLRPIF
MBP1_CRYNE      LNGIHEKIQGGYGKYQGTWVPLDRGRDLAEQYGVGSYLSSVF
MBP1_GIBZE      QTGEHEKVQGGYGKYQGTWIKFERGLQVCRQYGVEELLRPLL
MBP1_NEUCR      QIGEHEKVQGGYGKYQGTWIPFERGLEVCRQYGVEELLSKLL
MBP1_MAGGR      QTGEHEKVQGGYGKYQGTWIKYERALEVCRQYGVEELLRPLL
MBP1_ASPFU      AAGEHEKVQGGYGKYQGTWVNYQRGVELCREYHVEELLRPLL
MBP1_CHAGL      QKDVHEKIQGGYGKYQGTWIPLEQGRALAQRNNIYDRLRPIF
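To quantify how similar these sequences are, you can join the two alignment blocks row by row and compare each row with the 1MB1 template column by column. The sketch below does this for three of the rows above, copied verbatim; the others can be added the same way. Here, identity is defined as identical columns divided by columns without a gap in either row. That is one of several conventions, and other tools may report different percentages. The function names are hypothetical.

```python
# Three of the rows above, copied verbatim; further rows can be added the same way.
BLOCKS = [
    {
        "1MB1":       "NQIYSARYSGVDVYEFIHSTG---SIMKRKKDDWVNATHILKAANFAKAKRTRILEKEV",
        "MBP1_CANGL": "NQIYSAKYSGVDVYEFIHPTG---SIMKRKNDGWVNATHILKAANFAKAKRTRILEKEV",
        "MBP1_CHAGL": "AGIYSATYSGIPVYEYQFGPDMKEHVMRRREDNWINATHILKAAGFDKPARTRILERDV",
    },
    {
        "1MB1":       "LKETHEKVQGGFGKYQGTWVPLNIAKQLAEKFSVYDQLKPLF",
        "MBP1_CANGL": "LKEMHEKVQGGFGKYQGTWVPLNIAINLAEKFDVYQDLKPLF",
        "MBP1_CHAGL": "QKDVHEKIQGGYGKYQGTWIPLEQGRALAQRNNIYDRLRPIF",
    },
]


def join_blocks(blocks: list) -> dict:
    """Concatenate each sequence's pieces across blocks and check equal lengths."""
    rows = {}
    for block in blocks:
        for name, piece in block.items():
            rows[name] = rows.get(name, "") + piece
    if len({len(seq) for seq in rows.values()}) != 1:
        raise ValueError("rows differ in length; the alignment is malformed")
    return rows


def identity(a: str, b: str, gap: str = "-") -> tuple:
    """(identical columns, columns without a gap in either row)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    return sum(x == y for x, y in pairs), len(pairs)


def indel_columns(a: str, b: str, gap: str = "-") -> list:
    """1-based columns where exactly one of the two rows has a gap."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if (x == gap) != (y == gap)]


rows = join_blocks(BLOCKS)
template = rows["1MB1"]
for name, seq in rows.items():
    same, compared = identity(template, seq)
    print(f"{name:<12} {100 * same / compared:5.1f}% identical over {compared} columns; "
          f"indel columns vs 1MB1: {indel_columns(template, seq)}")
```

Among these three rows, only MBP1_CHAGL has an indel relative to 1MB1: the three-residue insertion at columns 22 to 24.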

 

  • For your organism's Mbp1 sequence, define the start and end sequence numbers of the target sequence aligned above relative to the full-length protein. (You can easily access the full-length protein sequence at the NCBI through the links in the RefSeq column of the table in Assignment 3.) Prepare a FASTA-formatted file for the target sequence of your organism, giving it an appropriate header that includes the sequence numbers. Refer to the Fallback data file if you are not sure about the format.
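Finding the start and end numbers comes down to locating the ungapped aligned segment inside the full-length protein sequence. The sketch below does that and writes a FASTA record. It assumes the aligned row reproduces the residues exactly, so that removing gap characters gives a contiguous stretch of the full-length sequence. The header layout `>name first-last` and the name `target_fasta` are hypothetical. Follow the Fallback data file for the actual header format.

```python
def target_fasta(name: str, aligned_row: str, full_length: str,
                 width: int = 60, gap: str = "-") -> str:
    """Locate an aligned segment in its full-length protein and write FASTA.

    The header layout ">name first-last" is a hypothetical example; use the
    format of the Fallback data file instead if it differs.
    """
    segment = aligned_row.replace(gap, "").upper()
    # Tolerate sequences pasted with line breaks or spaces.
    full = full_length.replace("\n", "").replace(" ", "").upper()
    start = full.find(segment)
    if start < 0:
        raise ValueError("segment not found in the full-length sequence")
    if full.find(segment, start + 1) >= 0:
        raise ValueError("segment occurs more than once; numbering is ambiguous")
    first, last = start + 1, start + len(segment)  # 1-based, inclusive
    lines = [f">{name} {first}-{last}"]
    lines += [segment[i:i + width] for i in range(0, len(segment), width)]
    return "\n".join(lines) + "\n"
```

The numbering is 1-based and inclusive, as sequence positions are usually reported. A segment that is not found, or that occurs more than once, raises an error instead of silently producing wrong numbers.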

 

Instruction

 

  • Task.

   

(2) Homology model

   

SUB section Heading (X marks)

 

Instruction

 

  • Task

 

Instruction

 

  • Task.

   


(3) Model analysis

   

SUB section Heading (X marks)

 

Instruction

 

  • Task

 

Instruction

 

  • Task.

   

(4) Summary of Resources

 

Links
Alignments
Mbp1 proteins:
APSES domains:
Trees


   

[End of assignment]

If you have any questions at all, don't hesitate to mail me at boris.steipe@utoronto.ca or post your question to the Course Mailing List.