Difference between revisions of "BIO Assignment Week 8"

From "A B C"
Line 2: Line 2:
 
<div class="b1">
 
<div class="b1">
 
Assignment for Week 7<br />
 
Assignment for Week 7<br />
<span style="font-size: 70%">Multiple Sequence Alignment</span>
+
<span style="font-size: 70%">Predictions</span>
 
</div>
 
</div>
 
<table style="width:100%;"><tr>
 
<table style="width:100%;"><tr>
Line 15: Line 15:
  
 
__TOC__
 
__TOC__
 
  
 
==Introduction==
 
==Introduction==
  
In the last assignment we discovered homologs to ''S. cerevisiae'' Mbp1 in YFO. Some of these will be orthologs to Mbp1, some will be paralogs. Some will have similar function, some will not. We discussed previously that genes that evolve under continuously similar evolutionary pressure should be most similar in sequence, and should have the most similar "function".
 
 
In this assignment we will define the YFO gene that is the most similar ortholog to ''S. cerevisiae'' Mbp1, and perform a multiple sequence alignment with it.
 
 
Let us briefly review the basic concepts.
 
  
==Orthologs and Paralogs revisited==
+
<div style="padding: 15px; background: #F0F1F7;  border:solid 1px #AAAAAA; font-size:125%;color:#444444">
 
+
;How could the search for ultimate truth have revealed so hideous and visceral-looking an object?
<div style="padding: 2px; background: #F0F1F7;  border:solid 1px #AAAAAA; font-size:125%;color:#444444">
+
::''<small>Max Perutz (on his first glimpse of the Hemoglobin structure)</small>''
 
 
&nbsp;<br>
 
;All related genes are homologs.
 
 
</div>
 
</div>
 
+
&nbsp;
 
 
Two central definitions about the mutual relationships between related genes go back to Walter Fitch who stated them in the 1970s:
 
<div style="padding: 2px; background: #F0F1F7;  border:solid 1px #AAAAAA; font-size:125%;color:#444444">
 
 
 
&nbsp;<br>
 
;Orthologs have diverged after speciation.
 
 
 
;Paralogs have diverged after duplication.
 
</div>
 
 
 
 
 
 
&nbsp;
 
&nbsp;
  
 +
Where is the hidden beauty in structure, and where, the "ultimate truth"? In the previous assignments we have studied sequence conservation in APSES family domains and we have discovered homologues in all fungal species. This is an ancient protein family that had already duplicated to several paralogues at the time the cenancestor of all fungi lived, more than 600,000,000 years ago, in the [http://www.ucmp.berkeley.edu/fungi/fungifr.html Vendian period] of the Proterozoic era of Precambrian times.
  
[[Image:OrthologParalog.jpg|frame|none|'''Hypothetical evolutionary tree.''' A single gene evolves through two speciation events and one duplication event. A duplication occurs during the evolution from reptilian to synapsid. It is easy to see how this pair of genes (paralogs) in the ancestral synapsid gives rise to two pairs of genes in pig and elephant, respectively. All ''circle'' genes are mutually orthologs, they form a "cluster of orthologs". All genes within one species are mutual paralogs&ndash;they are so called ''in-paralogs''. The ''circle'' gene in pig and the ''triangle'' gene in the elephant are so-called ''out-paralogs''. Somewhat counterintuitively, the ''triangle'' gene in the pig and the ''circle'' gene in the raven are also orthologs - but this has to be, since the last common ancestor diverged by '''speciation'''.
+
In order to understand how specific residues in the sequence contribute to the putative function of the protein, and why and how they are conserved throughout evolution, we would need to study an explicit molecular model of an APSES domain protein, bound to its cognate DNA sequence. Explanations of a protein's observed properties and functions can't rely on the general fact that it binds DNA; we need to consider details in terms of specific residues and their spatial arrangement. In particular, it would be interesting to correlate the conservation patterns of key residues with their potential to make specific DNA binding interactions. Unfortunately, no APSES domain structure in complex with bound DNA has been solved up to now, and the experimental evidence we have considered in Assignment 2 ([http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=10747782 Taylor ''et al.'', 2000]) is not sufficient to unambiguously define the details of how a DNA double helix might be bound. Moreover, at least two distinct modes of DNA binding are known for proteins of the winged-helix superfamily, of which the APSES domain is a member.
 
 
The "phylogram" on the right symbolizes the amount of evolutionary change as proportional to height difference to the "root". It is easy to see how a bidirectional BLAST search will only find pairs of most similar orthologs. If applied to a group of species, bidirectional BLAST searches will find clusters of orthologs only (except if genes were lost, or there are  anomalies in the evolutionary rate.)]]
 
 
 
  
==Defining orthologs==
+
In this and the following assignment you will (1) construct a molecular model of the APSES domain from the Mbp1 orthologue in your assigned species, (2) identify similar structures of distantly related domains for which protein-DNA complexes are known, (3) assemble a hypothetical complex structure and (4) consider whether the available evidence allows you to distinguish between different modes of ligand binding.
  
To be reasonably certain about orthology relationships, we would need to construct and analyze detailed evolutionary trees. This is computationally expensive and the results are not always unambiguous either, as we will see in a later assignment. But a number of different strategies are available that use precomputed results to define orthologs. These are especially useful for large, cross genome surveys. They are less useful for detailed analysis of individual genes. Pay the sites a visit and try a search.
+
For the following, please remember the following terminology:
  
 
+
;Target
;Orthologs by eggNOG
+
:The protein that you are planning to model.
:The [http://eggnog.embl.de/ '''eggNOG'''] (evolutionary genealogy of genes: Non-supervised Orthologous Groups) database contains orthologous groups of genes at the EMBL. It seems to be continuously updated, the search functionality is reasonable, and the results for yeast Mbp1 show many genes from several fungi. Importantly, there is only one gene annotated for each species. Alignments and trees are also available, as are database downloads for algorithmic analysis.
+
;Template
<div class="mw-collapsible mw-collapsed" data-expandtext="more..." data-collapsetext="less" style="width:800px">
+
:The protein whose structure you are using as a guide to build the model.
 +
;Model
 +
:The structure that results from the modeling process. It has the '''Target sequence''' and is similar to the '''Template structure'''.
 
&nbsp;
 
&nbsp;
<div class="mw-collapsible-content">
 
 
{{#pmid: 24297252}}
 
  
</div>
+
A brief overview article on the construction and use of homology models is linked to the resource section at the bottom of this page. That section also contains links to other sites and resources you might find useful or interesting.
</div>
 
  
  
;Orthologs at OrthoDB
 
:[http://www.orthodb.org/ '''OrthoDB'''] includes a large number of species, among them all of our protein-sequenced fungi. However, the search function (by keyword) retrieves many paralogs together with the orthologs; for example, the yeast Soc2 and Phd1 proteins are found in the same orthologous group, although these two are clearly paralogs.
 
<div class="mw-collapsible mw-collapsed" data-expandtext="more..." data-collapsetext="less" style="width:800px">
 
 
&nbsp;
 
&nbsp;
<div class="mw-collapsible-content">
+
==Warm-up: a minimal change==
+
Minimal changes to structure models can be done directly in Chimera. This illustrates the principle of full-scale modeling quite nicely. For an example, let us consider the residue <code>A&nbsp;42</code> of the 1BM8 structure. It is oriented towards the core of the protein, but most other Mbp1 orthologs have a larger amino acid in this position, <code>V</code>, or even <code>I</code>.
{{#pmid: 23180791}}
 
  
</div>
+
{{task|1=
</div>
+
# Open <code>1BM8</code> in Chimera, hide the ribbons and show all atoms as a stick model.
 +
# Color the protein white.
 +
# Open the sequence window and select <code>A&nbsp;42</code>. Color it red. Choose '''Actions&nbsp;&rarr;&nbsp;Set pivot'''. Then study how nicely the alanine sidechain fits into the cavity formed by its surrounding residues.
 +
# To emphasize this better, hide the solvent molecules and select only the protein atoms. Display them as a '''sphere''' model to better appreciate the packing, i.e. the Van der Waals contacts we discussed in class. Use the '''Favorites&nbsp;&rarr;&nbsp;Side view''' panel to move the clipping plane and see a section through the protein. Study the packing, in particular, note that the additional methyl groups of a valine or isoleucine would not have enough space in the structure. Then restore the clipping planes so you can see the whole molecule.
 +
# Let's simplify the view: choose '''Actions &rarr; Atoms/Bonds &rarr; backbone&nbsp;only &rarr; chain&nbsp;trace'''. Then select <code>A&nbsp;42</code> again in the sequence window and choose '''Actions &rarr; Atoms/Bonds &rarr; show'''.
 +
# Add the surrounding residues: choose '''Select &rarr; Zone...'''. In the window, see that the box is checked that selects all atoms at a distance of less than 5&Aring; to the current selection, and check the lower box to select the whole residue of any atom that matches the distance cutoff criterion. Click '''OK''' and choose '''Actions &rarr; Atoms/Bonds &rarr; show'''.
 +
#Select <code>A&nbsp;42</code> again: '''left-click''' (control click) on any atom of the alanine to select the atom, then '''up-arrow''' to select the entire residue. Now let's mutate this residue to isoleucine.
 +
#Choose '''Tools &rarr; Structure&nbsp;Editing &rarr; Rotamers''' and select <code>ILE</code> as the rotamer type. Click '''OK''', a window will pop up that shows you the possible rotamers for isoleucine together with their database-derived probabilities; you can select them in the window and cycle through them with your arrow keys. But note that the probabilities are '''very''' different - and thus show you high-energy and low-energy rotamers to choose from. Therefore, unless you have compelling reasons to do otherwise, try to find the highest-probability rotamer that may fit. This is where your stereo viewing practice becomes important, if not essential. It is really, really hard to do this reasonably in a 2D image! It becomes quite obvious in 3D. Btw: I find such "quantitative" work - where the real distances are important - easier in '''orthographic''' than in '''perspective''' view (cf. the '''Camera''' panel).
 +
#I find that the first rotamer is actually not such a bad fit. The <code>CD</code> atom comes close to the sidechains of <code>I&nbsp;25</code> and <code>L&nbsp;96</code>. But we can assume that these are somewhat mobile and can accommodate a denser packing, because - as you can easily verify in your Jalview alignment - it is '''NOT''' the case that sequences that have <code>I&nbsp;42</code>, have a smaller residue in position <code>25</code> and/or <code>96</code>. So let's accept the most frequent <code>ILE</code> rotamer by selecting it in the rotamer window and clicking '''OK''' (while '''existing side chain(s): replace''' is selected).
 +
#Done.
 +
}}
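
The zone selection used in the task above (whole residues that have any atom closer than 5&Aring; to the current selection) can be sketched outside of Chimera as well. This is only an illustration of the distance criterion, not Chimera's implementation: the function name <code>residues_in_zone</code>, the residue labels and the coordinates are hypothetical, and real coordinates would have to be read from the 1BM8 PDB file.

```python
import math

def residues_in_zone(atoms, selected, cutoff=5.0):
    """Return the residues that have at least one atom closer than
    `cutoff` (in Angstrom) to any atom of the `selected` residue.

    `atoms` is a list of (residue_label, (x, y, z)) tuples.
    """
    selected_coords = [xyz for res, xyz in atoms if res == selected]
    zone = set()
    for res, xyz in atoms:
        if res == selected:
            continue  # the selection itself is not part of its zone
        if any(math.dist(xyz, s) < cutoff for s in selected_coords):
            zone.add(res)  # one close atom selects the whole residue
    return zone

# Hypothetical coordinates, for illustration only.
example_atoms = [
    ("A42", (0.0, 0.0, 0.0)),
    ("I25", (3.0, 0.0, 0.0)),     # within 5 A of A42
    ("I25", (9.0, 0.0, 0.0)),     # too far, but I25 is already selected
    ("L96", (0.0, 4.0, 0.0)),     # within 5 A of A42
    ("K99", (10.0, 10.0, 10.0)),  # far away
]
print(sorted(residues_in_zone(example_atoms, "A42")))
```

Selecting whole residues rather than single atoms mirrors the lower checkbox in the '''Zone''' dialog: a side chain is shown completely even if only its tip touches the selection.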
  
 +
If you want to go over this in more detail, check the video tutorial on YouTube published by the NIAID bioinformatics group [http://www.youtube.com/watch?v=bcXMexN6hjY '''here''']. I would also encourage you to go over [http://www.youtube.com/watch?v=eJkrvr-xeXY '''Part 2 of the video tutorial'''] that discusses how to check for and resolve (by energy minimization) steric clashes. But do remember that it is not clear whether energy minimization will make your structure more correct in the sense of a smaller overall RMSD with the real, mutated protein.
  
;Orthologs at OMA
+
What we have done here with one residue is exactly the way homology modeling works with entire sequences. Let's now build a homology model for YFO Mbp1.
[http://omabrowser.org/ '''OMA'''] (the Orthologous Matrix) maintained at the Swiss Federal Institute of Technology contains a large number of orthologs from sequenced genomes. Searching with <code>MBP1_YEAST</code> (this is the Swissprot ID) as a "Group" search finds the correct gene in EREGO, KLULA, CANGL and SACCE. But searching with the sequence of the ''Ustilago maydis'' ortholog does not find the yeast protein, but the orthologs in YARLI, SCHPO, LACCBI, CRYNE and USTMA. Apparently the orthologous group has been split into several subgroups across the fungi. However as a whole the database is carefully constructed and available for download and API access; a large and useful resource.
 
<div class="mw-collapsible mw-collapsed" data-expandtext="more..." data-collapsetext="less" style="width:800px">
 
&nbsp;
 
<div class="mw-collapsible-content">
 
 
{{#pmid: 21113020}}
 
  
... see also the related articles, much innovative and carefully done work on automated orthologue definition by the Dessimoz group.
+
==Preparation==
</div>
 
</div>
 
  
 +
===Target sequence===
 +
The first step of homology modelling is to determine which sequence to model. We have determined the putative orthologue with conserved function in YFO by reciprocal best match with ''Saccharomyces cerevisiae'' Mbp1. Your sequence was initially found with an APSES domain search in YFO and the alignments with the yeast sequence are straightforward for the most part.
  
;Orthologs by syntenic gene order conservation
+
There are two exceptions however: the '''ASPFU''' gene XP_754232 and the '''CAPCO''' gene XP_007722875 are both missing part of the domain's N-terminus in the alignment. This is odd, because it may imply that the APSES domain of these genes is not properly folded. When such surprising alignment results occur, you '''must''' consider whether there could be an error in the published sequence, perhaps stemming from an erroneous gene model. This is not absolutely germane to this assignment, so I have placed the process into the collapsible section below - optional reading. However it may be useful for you to understand what the issue is here and how to address it.
:We will revisit this when we explore the UCSC genome browser.
 
  
 +
<div class="mw-collapsible mw-collapsed" data-expandtext="Expand to read about gene model correction" data-collapsetext="Collapse">
 +
;Correcting the ASPFU Mbp1 gene model.
  
;Orthologs by RBM
 
:Defining it yourself. RBM (or: Reciprocal Best Match) is easy to compute and half of the work you have already done in [[BIO_Assignment_Week_3|Assignment 3]]. Get the ID for the gene which you have identified and annotated as the best BLAST match for Mbp1 in YFO and confirm that this gene has Mbp1 as the most significant hit in the yeast proteome. <small>The results are unambiguous, but there may be residual doubt whether these two best-matching sequences are actually the most similar orthologs.</small>
 
  
{{task|1=
+
<div class="mw-collapsible-content">
# Navigate to the BLAST homepage.
+
An alignment of APSES domain sequences shows the shortened N-terminus of the ASPFU and the CAPCO protein, relative to SACCE and e.g. the closely related ''Aspergillus nidulans'', ASPNI:
# Paste the YFO RefSeq sequence identifier into the search field. (You don't have to search with sequences&ndash;you can search directly with an NCBI identifier '''IF''' you want to search with the full-length sequence.)
+
APSES domains:
# Set the database to refseq, and restrict the species to ''Saccharomyces cerevisiae''.
+
Mbp1_SACCE  QIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAA...
# Run BLAST.
+
Mbp1_ASPNI  NVYSATYSSVPVYEFKIGTDSVMRRRSDDWINATHILKVA...
# Keep the window open for the next task.
+
Mbp1_ASPFU  ----------------------MRRRGDDWINATHILKVA...
 +
Mbp1_CAPCO  ----------------------MRRRSDDWVNATHILKVA...
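
A truncation like this can also be flagged automatically by counting the leading gap characters of each aligned sequence. The following is a minimal sketch under the assumption that the alignment is available as a dictionary of equal-length strings; the function names and the threshold of 10 residues are hypothetical choices, and the example rows are the (truncated) alignment rows shown above.

```python
def leading_gaps(aligned_seq, gap="-"):
    """Number of gap characters before the first residue."""
    return len(aligned_seq) - len(aligned_seq.lstrip(gap))

def flag_n_terminal_truncations(alignment, min_gaps=10):
    """Return the names of sequences whose aligned N-terminus starts
    with at least `min_gaps` gap characters."""
    return [name for name, seq in alignment.items()
            if leading_gaps(seq) >= min_gaps]

# Alignment rows as shown above (C-terminal part omitted).
apses = {
    "Mbp1_SACCE": "QIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAA",
    "Mbp1_ASPNI": "NVYSATYSSVPVYEFKIGTDSVMRRRSDDWINATHILKVA",
    "Mbp1_ASPFU": "-" * 22 + "MRRRGDDWINATHILKVA",
    "Mbp1_CAPCO": "-" * 22 + "MRRRSDDWVNATHILKVA",
}
print(flag_n_terminal_truncations(apses))
```

A flagged sequence is not necessarily wrong; the check only points out candidates for which a look at the gene model may be worthwhile.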
  
The top hit should be yeast Mbp1 (NP_010227). Email me your sequence identifiers if it is not.
+
We analyse this for the ASPFU gene.
If it is, you have confirmed the '''RBM''' or '''BBH''' criterion (Reciprocal Best Match or Bidirectional Best Hit, respectively).
 
  
<small>Technically, this is not perfectly true since you have searched with the APSES domain in one direction, with the full-length sequence in the other. For this task I wanted you to try the ''search-with-accession-number''. Therefore the procedural laxness, I hope it is permissible. In fact, performing the reverse search with the YFO APSES domain should actually be more stringent, i.e. if you find the right gene with the longer sequence, you are even more likely to find the right gene with the shorter one.</small>
+
Working from the possibility that this may be a gene model error - e.g. a false translational start, a frameshift due to a sequencing error, or an erroneously modelled intron, we check whether the translation of the genomic sequence supports the presence of the expected amino acids. This is easily done running TBLASTN - BLASTing the protein query against the six reading frames of the ASPFU genome. We find the following:
}}
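
The RBM criterion itself is simple enough to write down as a few lines of code. The sketch below assumes that the hits of each BLAST search have already been collected as (subject identifier, E-value) pairs; the function names, the YFO identifiers and all E-values are hypothetical, only <code>NP_010227</code> (yeast Mbp1) is taken from the text above.

```python
def best_hit(hits):
    """Return the subject ID with the lowest E-value, or None if
    there are no hits. `hits` is a list of (subject_id, e_value)."""
    if not hits:
        return None
    return min(hits, key=lambda hit: hit[1])[0]

def is_reciprocal_best_match(gene_a, gene_b, hits_a_in_b, hits_b_in_a):
    """True if gene_b is the best hit of gene_a in proteome B
    and gene_a is the best hit of gene_b in proteome A."""
    return (best_hit(hits_a_in_b) == gene_b
            and best_hit(hits_b_in_a) == gene_a)

# Hypothetical search results, for illustration only.
yeast_mbp1_in_yfo = [("YFO_GENE_1", 1e-40), ("YFO_GENE_2", 3e-22)]
yfo_gene_1_in_yeast = [("NP_010227", 2e-41), ("YEAST_OTHER", 5e-25)]

print(is_reciprocal_best_match("NP_010227", "YFO_GENE_1",
                               yeast_mbp1_in_yfo, yfo_gene_1_in_yeast))
```

Note that this only compares the top-ranked hits. As remarked above, it does not remove the residual doubt whether the two best-matching sequences are actually the most similar orthologs.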
 
  
  
;Orthology by annotation
+
Aspergillus fumigatus Af293 chromosome 3, whole genome shotgun sequence
:The NCBI precomputes BLAST results and makes them available at the RefSeq database entry for your protein.
+
Sequence ID: ref|NC_007196.1| Length: 4079167 Number of Matches: 2
 +
[...]
 +
Query  10      VDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILE ...
 +
                V VYEF    S+M+R+ DDW+NATHILK A F K  RTRILE ...
 +
Sbjct  3691193  VPVYEFKVDGESVMRRRGDDWINATHILKVAGFDKPARTRILE ...
  
{{task|1=
+
Indeed, there is sequence upstream of the gene's published translation start that matches well with our query! But where is the correct translation start? For that we need to look at the actual nucleotide sequence and translate it. Remember: BLAST is a '''local''' sequence alignment algorithm and it won't retrieve everything that matches our query, just the best matching segment. ASPFU chromosome 3 is over 4 megabases long, so let us try to obtain only the region we are actually interested in: downstream of bases 3691193, let's say 3691100 (make sure this offset is divisible by three, to stay in the same reading frame) and upstream to, say, 3691372.
# In your BLAST result page, click on the RefSeq link for your query to navigate to the RefSeq database entry for your protein.
 
# Follow the '''Blink''' link in the right-hand column under '''Related information'''.
 
# Restrict the view to RefSeq under "Display options", and to Fungi.
 
  
You should see a number of genes with low E-values and high coverage in other fungi - however this search is problematic since the full length gene across the database finds mostly Ankyrin domains.
+
#At the [http://www.ncbi.nlm.nih.gov/genome/browse/ '''NCBI genome project site'''] we search for ''Aspergillus fumigatus''.
}}
+
#At the [http://www.ncbi.nlm.nih.gov/genome/18 '''''Aspergillus fumigatus''''' '''genome project site'''] we click on chromosome 3 to access the map viewer.
 +
#Hovering over the ''Download/View sequence'' link shows us how a URL to access sequence data is structured:
 +
<nowiki>http://www.ncbi.nlm.nih.gov/projects/mapview/seq_reg.cgi?taxid=746128&chr=3&from=1&to=4079167</nowiki>
 +
:We can easily adapt this to the sequence range we need ...
 +
<ol start="4">
 +
<li>... and follow: http://www.ncbi.nlm.nih.gov/nuccore/NC_007196.1?from=3691003&to=3691243&report=fasta to yield:
 +
</ol>
 +
>gi|71025130:3691003-3691243 Aspergillus fumigatus Af293 chromosome 3, whole genome shotgun sequence
 +
ACGGTTTGCGGAGACGGGCATTATGGCGGCGGTGGATTTCTCAAAAATCTATTCTGCTACATACAGCAGC
 +
GTAAGTCTCTTCTAATTGCGTATCTCTGTTTTCCCTACAGCCTCAAATTTTCCCCAATGCCTCTTTCCAT
 +
CCATTTTGCCCCTTCCTTCGCCGCGAAGCCAATCTAACGCAGTTCAATAGGTTCCAGTTTACGAGTTCAA
 +
AGTCGATGGCGAAAGTGTTATGCGCCGACGA
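
The sequence-range URL used above can also be put together programmatically, which is convenient if several regions need to be retrieved. This is a small sketch that only reproduces the URL pattern of the link followed above; whether the NCBI server accepts this pattern unchanged in the future is an assumption. The function names <code>region_url</code> and <code>same_frame</code> are hypothetical. The second function expresses the reading-frame condition mentioned earlier: the distance between the start of the hit and the start of the region should be divisible by three.

```python
def region_url(accession, start, end):
    """Build a nuccore URL for a FASTA-formatted subsequence,
    following the pattern of the link used in the text."""
    return (f"http://www.ncbi.nlm.nih.gov/nuccore/{accession}"
            f"?from={start}&to={end}&report=fasta")

def same_frame(hit_start, region_start):
    """True if region_start lies in the same reading frame
    as hit_start (offset divisible by three)."""
    return (hit_start - region_start) % 3 == 0

# Region on ASPFU chromosome 3 as used above.
print(region_url("NC_007196.1", 3691003, 3691243))

# The TBLASTN hit starts at 3691193; an offset to 3691100 keeps the frame.
print(same_frame(3691193, 3691100))
```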
  
  
You will find that '''all''' of these approaches yield '''some''' of the orthologs. But none finds them all. The take-home message is: precomputed results are good for large-scale survey-type investigations, where you can't humanly process the information by hand. But for more detailed questions, careful manual searches are still indispensable.
+
<ol start="5">
 +
<li>To translate this, we navigate to any of the [http://bips.u-strasbg.fr/EMBOSS/ '''EMBOSS''' tools servers] and use "remap" - we want to see the translation matched to the nucleotide sequence. We turn restriction sites off, translate all three forward frames and paste and manually align the SACCE Mbp1 sequence into the output to see what we expect and what we got. I have selected only the frame(s) that actually give a match, and I have pasted the homologous CAPCO and SACCE sequences (lower case) to demonstrate their similarity:
 +
</ol>
 +
ASPFU    ACGGTTTGCGGAGACGGGCATTATGGCGGCGGTGGATTTCTCAAAAATCTATTCTGCTACATACAGCAGC
 +
                                                                       
 +
ASPFU      R  F  A  E  T  G  I  M  A  A  V  D  F  S  K  I  Y  S  A  T  Y  S  S 
 +
CAPCO                          m  -  a  f  d  -  k  e  i  y  s  a  t  y  s  n 
 +
SACCE                          m  s  -  -  -  -  n  q  i  y  s  a  r  y  s  g
 +
 +
         
 +
ASPFU    GTAAGTCTCTTCTAATTGCGTATCTCTGTTTTCCCTACAGCCTCAAATTTTCCCCAATGCCTCTTTCCAT
 +
 
 +
ASPFU    V  S  L  F  *  ...
 +
CAPCO    v  a  -  -    ...
 +
SACCE    v  d  - -     ...
 +
         
 +
ASPFU    CCATTTTGCCCCTTCCTTCGCCGCGAAGCCAATCTAACGCAGTTCAATAGGTTCCAGTTTACGAGTTCAA
 +
                                                              ...  V  Y  E  F  K
 +
CAPCO                                                        ...  v  y  e  l  k
 +
SACCE                                                        ...  v  y  e  f  i
 +
         
 +
ASPFU      AGTCGATGGCGAAAGTGTTATGCGCCGACGAGGCGATGATTGGATCAATGCTACACATATTCTTAAA
 +
 +
ASPFU      V  D  G  E  S  V  M  R  R  R  G  D  D  W  I  N  A  T  H  I  L  K ...
 +
CAPCO      v  a  g  d  h  i  m  r  r  r  s  d  d  w  v  n  a  t  h  i  l  k ...
 +
SACCE      h  s  t  g  s  i  m  k  r  k  k  d  d  w  v  n  a  t  h  i  l  k ...
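
The translation shown above can be reproduced without a web server. The following is a minimal sketch of a forward-frame translation with the standard genetic code; it is not what EMBOSS "remap" does internally, and the names <code>CODON_TABLE</code>, <code>translate</code> and <code>three_forward_frames</code> are hypothetical. Codons that contain characters other than A, C, G or T are translated as <code>X</code>.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna, frame=0):
    """Translate `dna` starting at offset `frame` (0, 1 or 2).
    Stop codons are shown as '*', unknown codons as 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def three_forward_frames(dna):
    """Translations of the three forward reading frames."""
    return {frame + 1: translate(dna, frame) for frame in range(3)}

# First line of the retrieved ASPFU sequence.
first_line = ("ACGGTTTGCGGAGACGGGCATTATGGCGGCGGTGGATTTC"
              "TCAAAAATCTATTCTGCTACATACAGCAGC")
print(translate(first_line, frame=1))  # RFAETGIMAAVDFSKIYSATYSS
```

Only the frame that starts at the second nucleotide of the first line reproduces the <code>MAAVDFSKIYSATYSS</code> stretch; the stop codon that follows after <code>VSLF</code> in the second line is consistent with the presence of an intron.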
  
<div class="mw-collapsible mw-collapsed" data-expandtext="Expand for crowdsourcing" data-collapsetext="Collapse">
 
;Orthology by crowdsourcing
 
:Luckily a crowd of willing hands has prepared the necessary sequences for you: in the section below you will find a link to the annotated and verified Mbp1 orthologs from last year's course  :-)
 
  
<div class="mw-collapsible-content">
+
:This clearly shows us that there is N-terminal sequence that ought to be added to the gene model, upstream of the reported translational start of <tt>MRRR...</tt>. The sequences thus most likely begin as follows:
We could call this annotation by many hands {{WP|Crowdsourcing|"crowdsourcing"}} - handing out small parcels of work to many workers, who would typically allocate only a small share of their time. Here the strength is in numbers, and projects that organize via the Internet in particular can tally up very impressive manpower, for free, or as {{WP|Microwork}}. These developments have some interest for bioinformatics: many of our more difficult tasks cannot easily be built into an algorithm; language-related tasks such as text-mining, or pattern-matching tasks come to mind. Allocating these to a large number of human contributors may be a viable alternative to computation. A marketplace where this kind of work is already a reality is {{WP|Amazon Mechanical Turk|Amazon's "Mechanical Turk" Marketplace}}: programmers&ndash;"requesters"&ndash; use an open interface to post tasks for payment, and "providers" from all over the world can engage in these. Tasks may include matching of pictures, or evaluating the aesthetics of competing designs. A quirky example I came across recently was when information designer David McCandless had 200 "Mechanical Turks" draw a small picture of their soul for his collection.
 
  
The name {{WP|The Turk|"Mechanical Turk"}} by the way relates to a famous ruse, when a Hungarian inventor and adventurer toured the imperial courts of 18<sup>th</sup> century Europe with an automaton, dressed in Turkish robes and turban, that played chess at the grandmaster level against opponents that included Napoleon Bonaparte and Benjamin Franklin. No small mechanical feat in any case; it was only in the 19<sup>th</sup> century that it was revealed that the computational power was actually provided by a concealed human.
+
ASPFU  MAAVDFSKIYSATYSSVSLFVYEFKVDGE-----SVMRRRGDDWINATHILK...
 +
CAPCO  ma-fd-keiysatysnva--vyelkvagd-----himrrrsddwvnathilk...
 +
SACCE  ms----nqiysarysgvd--ysgvdvyefihstgsimkrkkddwvnathilk...
  
Are you up for some "Turking"? Before the next quiz, edit [http://biochemistry.utoronto.ca/steipe/abc/students/index.php/BCH441_2014_Assignment_7_RBM '''the Mbp1 RBM page on the Student Wiki'''] and include the RBM for Mbp1, for a 10% bonus on the next quiz.
+
The fact that the truncated N-terminus appears in both closely '''related''' genes and species suggests that what we see here is a mis-annotated intron. The take-home lesson is: if your retrieved protein sequence does not conform to your expectations, it may be worthwhile to follow up with the actual nucleotide sequence.
  
 
</div>
 
</div>
Line 146: Line 156:
 
&nbsp;
 
&nbsp;
  
==Align and Annotate==
+
===Template choice and template sequence===
 
 
  
&nbsp;<br>
 
  
 +
The [http://swissmodel.expasy.org/ SWISS-MODEL] server provides several different options for constructing homology models. The easiest option requires only a target sequence as input. In this mode the program will automatically choose suitable templates and create an input alignment. I would argue however that that is not the best way to use such a service: template choice and alignment both may be significantly influenced by biochemical reasoning, and an automated algorithm cannot make the necessary decisions. Should you use a structure of reduced resolution that however has a ligand bound? Should you move an indel from an active site to a loop region even though the sequence similarity score might be less? Questions like that may yield answers that are counter to the best choices an automated algorithm could make. But Swiss Model is flexible and allows us to upload an explicit alignment between target and template. Please note: the model you will produce is "easy" - the sequence similarity is high and there are no indels to consider, the automated mode would have done just as well. But the strategy we pursue here is suitable also for much more difficult problems. The automated strategy probably is not.
  
===Review of domain annotations===
+
Template choice is the first step. Often more than one related structure can be found in the PDB. We have touched on principles of selecting template structures in the lectures; please refer to the [[Template_choice_principles|template choice principles]] page on this Wiki where I have reviewed the principles and discussed more details and alternatives. One can search the PDB itself through its '''Advanced Search''' interface: for example, one can search for sequence similarity with a BLAST search, or search for structural similarity by accessing structures according to their CATH or SCOP classification. But the BLAST search is probably the method of choice: after all, the most important measure of the probability of success for homology modeling is sequence similarity.
  
 +
In [[BIO_Assignment_Week_3#Search_input|Assignment 3]], you have defined the extent of the APSES domain in yeast Mbp1. In [[BIO_Assignment_Week_6|Assignment 6]], you have used PSI-BLAST to search for APSES domains in YFO. In [[BIO_Assignment_Week_7|Assignment 7]] you have confirmed by ''Reciprocal Best Match'' which of these APSES domain sequences is the closest related orthologue to yeast Mbp1. This sequence is the best candidate for having a conserved function similar to yeast Mbp1. Therefore, this sequence is the one you will model: it is called the '''target''' for the homology modeling procedure. In the same assignment you have also computed a multiple sequence alignment that includes the sequence of  Mbp1 with YFO.
  
APSES domains are relatively easy to identify and annotate but we have had problems with the ankyrin domains in Mbp1 homologues. Both CDD as well as SMART have identified such domains, but while the domain model was based on the same Pfam profile for both, and both annotated approximately the same regions, the details of the alignments and the extent of the predicted region were different.
+
Defining a '''template''' means finding a PDB coordinate set that has sufficient sequence similarity to your '''target''' that you can build a model based on that '''template'''. In  [[BIO_Assignment_Week_2#Structure_search|Assignment 2]] you have used a keyword search at the PDB to find "Mbp1" structures - but some of these structures were not homologs: keyword searches are notoriously unreliable. To find suitable PDB structures, we will perform a BLAST search at the PDB instead.
  
[http://www.yeastgenome.org/cgi-bin/locus.fpl?locus=mbp1 Mbp1] forms heterodimeric complexes with a homologue, [http://www.yeastgenome.org/cgi-bin/locus.fpl?locus=swi6 Swi6]. Swi6 does not have an APSES domain, thus it does not bind DNA. But it is similar to Mbp1 in the region spanning the ankyrin domains and in 1999 [http://www.ncbi.nlm.nih.gov/pubmed/10048928 Foord ''et al.''] published its crystal structure ([http://www.rcsb.org/pdb/cgi/explore.cgi?pdbId=1SW6 1SW6]). This structure is a good model for Ankyrin repeats in Mbp1. For details, please refer to the consolidated [[Reference annotation yeast Mbp1|Mbp1 annotation page]] I have prepared.
 
  
In what follows, we will use the program JALVIEW - a Java based multiple sequence alignment editor to load and align sequences and to consider structural similarity between yeast Mbp1 and its closest homologue in your organism.
+
<!-- NOTE TO SELF: use the following sequence to test the procedure
 +
>Mbp1_SCHPO/2-100 NP_593032
 +
AVHVAVYSGVEVYECFIKGVSVMRRRRDSWLNATQILKVADFDKPQRTRVLERQVQIGAHEKVQGGYGKYQG
 +
TWVPFQRGVDLATKYKVDGIMSPILSL
 +
>1BM8_A
 +
QIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQG
 +
TWVPLNIAKQLAEKFSVYDQLKPLFDF
 +
-->
  
In this part of the assignment,
 
  
#You will load sequences that are most similar to Mbp1 into an MSA editor;
 
#You will add sequences of ankyrin domain models;
 
#You will perform a multiple sequence alignment;
 
#You will try to improve the alignment manually;
 
<!-- Finally you will consider if the Mbp1 APSES domains could extend beyond the section of homology with Swi6 -->
 
  
 
===Jalview, loading sequences===
 
 
 
Geoff Barton's lab in Dundee has developed an integrated MSA editor and sequence annotation workbench with a number of very useful functions. It is written in Java and should run on Mac, Linux and Windows platforms without modifications.
 
 
 
{{#pmid: 19151095}}
 
 
 
We will use this tool for this assignment and explore its features as we go along.
 
  
 
{{task|1=
 
{{task|1=
#Navigate to the [http://www.jalview.org/ Jalview homepage], click on '''Download''', install Jalview on your computer and start it. A number of windows that showcase the program's abilities will load; you can close these.
+
# Retrieve your YFO Mbp1-like APSES domain sequence. You can find the domain boundaries for the yeast protein in the [[Reference annotation yeast Mbp1|Mbp1 annotation reference page]], and you can get the aligned sequence from your Jalview alignment, or simply recompute it with the <code>needle</code> program of the EMBOSS suite. This YFO sequence is your '''target''' sequence.
#Prepare homologous Mbp1 sequences for alignment:
+
# Navigate to the [http://www.pdb.org/pdb/home/home.do PDB].
##Open the '''[[Reference Mbp1 orthologues (all fungi)]]''' page. (This is the list of Mbp1 orthologs I mentioned above.)
+
# Click on '''Advanced''' to enter the advanced search interface.
##Copy the FASTA sequences of the reference proteins, paste them into a text file (TextEdit on the Mac, Notepad on Windows) and save the file; you could give it an extension of <code>.fa</code>&ndash;but you don't have to.
+
# Open the menu to '''Choose a Query Type:'''
##Check whether the sequence for YFO is included in the list. If it is, fine. If it is not, retrieve it from NCBI, paste it into the file and edit the header like the other sequences. If the wrong sequence from YFO is included, replace it and let me know.
+
# Find the '''Sequence features''' section and choose '''Sequence (BLAST...)'''
#Return to Jalview and select File &rarr; Input Alignment &rarr; from File and open your file. A window with sequences should appear.
+
# Paste your '''target''' sequence into the '''Sequence''' field, select '''not''' to mask low-complexity regions and '''Submit Query'''. Since the E-value cutoff is set rather high by default, you will get a number of low-confidence hits in addition to the actual homologs, which have very low E-values.
#Copy the sequences for ankyrin domain models (below), click on the Jalview window, select File &rarr; Add sequences &rarr; from Textbox and paste them into the Jalview textbox. Paste two separate copies of the CD00204 consensus sequence and one copy of 1SW6.
 
##When all the sequences are present, click on '''Add'''.  
 
 
 
Jalview now displays all the sequences, but of course this is not yet an alignment.
 
 
 
}}
 
  
;Ankyrin domain models
+
All hits that are homologs are potentially suitable '''templates''', but some are more suitable than others. Consider how the coordinate sets differ and which features would make each more or less suitable for creating a homology model: you should consider ...
>CD00204 ankyrin repeat consensus sequence from CDD
 
NARDEDGRTPLHLAASNGHLEVVKLLLENGADVNAKDNDGRTPLHLAAKNGHLEIVKLLL
 
EKGADVNARDKDGNTPLHLAARNGNLDVVKLLLKHGADVNARDKDGRTPLHLAAKNGHL
 
  
>1SW6 from PDB - unstructured loops replaced with xxxx
+
:*sequence similarity to your target
GPIITFTHDLTSDFLSSPLKIMKALPSPVVNDNEQKMKLEAFLQRLLFxxxxSFDSLLQE
+
:*size of expected model (= length of alignment)
VNDAFPNTQLNLNIPVDEHGNTPLHWLTSIANLELVKHLVKHGSNRLYGDNMGESCLVKA
+
:*presence or absence of ligands
VKSVNNYDSGTFEALLDYLYPCLILEDSMNRTILHHIIITSGMTGCSAAAKYYLDILMGW
+
:*experimental method and quality of the data set
IVKKQNRPIQSGxxxxDSILENLDLKWIIANMLNAQDSNGDTCLNIAARLGNISIVDALL
 
DYGADPFIANKSGLRPVDFGAG
 
  
===Computing alignments===
+
Sequence similarity is the most important, but we can have the PDB tabulate the other features concisely for this task.
  
The EBI has a very convenient [http://www.ebi.ac.uk/Tools/msa/ page to access a number of MSA algorithms]. This is especially convenient when you want to compare, e.g., T-Coffee, Muscle and MAFFT results to see which regions of your alignment are robust. You could use any of these tools: just paste your sequences into a Web form, download the results and load them into Jalview. Easy.
+
# There is a menu to create '''Reports:''' - select '''customizable table'''.
 +
# Select (at least) the following information items:
 +
;Structure Summary
 +
* Experimental Method
 +
;Sequence
 +
* Chain Length
 +
;Ligands
 +
* Ligand Name
 +
;Biological details
 +
* Macromolecule Name
 +
;Refinement Details
 +
* Resolution
 +
* R Work
 +
* R free
 +
# Click '''Create report'''.
  
But even easier is to calculate the alignments directly from Jalview, when the Web service is available. (Not today. <small>Bummer.</small>)
+
Unfortunately the report does not include the E-values, and those should strongly influence your final decision. In our case, however, the sequences, and therefore the E-values, of the top three hits are all the same. None of the structures has a bound DNA ligand, but the experimental methods and structure quality differ. Two of the sequences have a longer chain length ... but the extra residues are all disordered (otherwise these would be better-suited templates; regrettably, in the ''real world'' you would need to check this yourself, since there is no automatic tool to evaluate disorder and its effects on template choice). In my opinion that leaves only one unambiguous choice: 1BM8. If you don't agree, please let me know.
  
;Calculate a MAFFT alignment using the Jalview Web service option:
+
;Finally: Click on the 1BM8 ID to navigate to the structure page for the '''template''' and save the FASTA sequence to your computer. This is '''the template sequence'''.
  
{{task|1=
 
#In Jalview, select '''Web Service &rarr; Alignment &rarr; MAFFT with defaults...'''. The alignment is calculated in a few minutes and displayed in a new window.
 
 
}}
 
}}
  
;Calculate a MAFFT alignment when the Jalview Web service is NOT available:
 
  
{{task|1=
+
&nbsp;
#In Jalview, select '''File &rarr; Output to Textbox &rarr; FASTA'''
 
#Copy the sequences.
 
#Navigate to the [http://www.ebi.ac.uk/Tools/msa/mafft/ '''MAFFT Input form'''] at the EBI.
 
#Paste your sequences into the form.
 
#Click on '''Submit'''.
 
#Close the Jalview sequence window and either save your MAFFT alignment to file and load it in Jalview, or simply select '''File &rarr; Input Alignment &rarr; from Textbox''', paste and click '''New Window'''.
 
}}
 
  
  
In any case, you should now have an alignment.
 
 
{{task|1=
 
#Choose '''Colour &rarr; Hydrophobicity''' and '''&rarr; by Conservation'''. Then adjust the slider left or right to see which columns are highly conserved. You will notice that the Swi6 sequence that was supposed to align only to the ankyrin domains was in fact aligned to other parts of the sequence as well. This is one part of the MSA that we will have to correct manually and a common problem when aligning sequences of different lengths.
 
}}
 
  
 +
===Sequence numbering===
  
  
 
&nbsp;
 
&nbsp;
  
===Editing ankyrin domain alignments===
+
It is not at all straightforward how to number sequences in such a project. A "natural" numbering starts with the start codon of the full-length protein and goes sequentially from there. However, this does not map exactly to other numbering schemes we have encountered. As you know, the first residue of the APSES domain (as defined by CDD) is not residue 1 of the Mbp1 protein. The first residue of the 1BM8 FASTA file <small>(one of the related PDB structures)</small> '''is''' the fourth residue of the Mbp1 protein. The first residue in the structure is GLN 3, therefore Q is the first residue in a FASTA sequence derived from the coordinate section of the PDB file (the <code>ATOM  </code> records). In the 1MB1 structure, the original N-terminal amino acids are present in the molecule, therefore they are present in the FASTA file, which starts with <code>MSNQIY...</code>, but they are disordered in the structure and no coordinates are present for M and S. A sequence derived explicitly from the coordinates is therefore different from the reported FASTA sequence, which is really bad because that is what the modeling program has to work with ... and so on. It can get complicated. You need to remember: a sequence number is not absolute but assigned in a particular context, and you need to be careful about how you do this.
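The bookkeeping behind such context-dependent numbering can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper <code>residue_numbers</code> and its argument <code>first_residue_number</code> are hypothetical names, and the sketch assumes a contiguous sequence without chain breaks or insertion codes.

```python
def residue_numbers(sequence, first_residue_number):
    """Pair each residue of a contiguous sequence with its number in one context.

    first_residue_number is the number given to the first character, e.g. 1 for
    a full-length protein, or 4 for a fragment that begins at the fourth residue
    of that protein. (Hypothetical helper, for illustration only.)
    """
    return [(first_residue_number + i, aa) for i, aa in enumerate(sequence)]


# The FASTA sequence quoted in the text starts with MSNQIY...;
# a fragment that begins at the Q starts at position 4 in full-length numbering.
full_length = residue_numbers("MSNQIY", 1)
fragment = residue_numbers("QIY", 4)

# With the correct offset, the same residues carry the same numbers.
print(full_length[3:] == fragment)
```

If the offset is wrong by even one position, every residue number in the fragment is shifted, which is why the correspondence should be established once, for ranges, and then checked at both ends.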
  
 +
Fortunately, the numbering for the residues in the coordinate section of our '''target''' structure corresponds not to its FASTA sequence, but to the numbering of the gene. Otherwise we would need to renumber the sequence <small>(e.g. by using the bio3D R package)</small>. If we did not do this, the sequence numbers in the model might not correspond to the sequence numbers of our target.
  
A '''good''' MSA comprises only columns of residues that play similar roles in the proteins' mechanism and/or that evolve in a comparable structural context. Since the alignment reflects the result of biological selection and conservation, it has relatively few indels and the indels it has are usually not placed into elements of secondary structure or into functional motifs. The contiguous features annotated for Mbp1 are expected to be left intact by a good alignment.
+
<!--
 +
BELOW IS NOT NECESSARY FOR THE 1BM8 TEMPLATE. ALSO extraction can be done with bio3D
  
A '''poor''' MSA has many errors in its columns; these contain residues that actually have different functions or structural roles, even though they may look similar according to a (pairwise!) scoring matrix. A poor MSA also may have introduced indels in biologically irrelevant positions, to maximize spurious sequence similarities. Some of the features annotated for Mbp1 will be disrupted in a poor alignment and residues that are conserved may be placed into different columns.
 
  
Often errors or inconsistencies are easy to spot, and manually editing an MSA is not generally frowned upon, even though this is not a strictly objective procedure. The main goal of manual editing is to make an alignment biologically more plausible. Most commonly this means minimizing the number of rare evolutionary events that the alignment suggests and/or emphasizing conservation of known functional motifs. Here are some examples of what one might aim for in manually editing an alignment:
+
The homology '''model''' will be based on an alignment of '''target''' and '''template'''. Thus we have to define the target sequence. As discussed in class, PDB files have an explicit  and an implied sequence and these do not necessarily have to be the same. To compare the implied and the explicit sequence for the template, you need to extract sequence information from coordinates. One way to do this is via the Web interface for [http://swift.cmbi.ru.nl/servers/html/index.html '''WhatIf'''], a crystallography and molecular modeling package that offers many useful tools for coordinate manipulation tasks.
  
;Reduce number of indels
 
From a Probcons alignment:
 
0447_DEBHA    ILKTE-K<span style="color: rgb(255, 0, 0);">-</span>T<span style="color: rgb(255, 0, 0);">---</span>K--SVVK      ILKTE----KTK---SVVK
 
9978_GIBZE    MLGLN<span style="color: rgb(255, 0, 0);">-</span>PGLKEIT--HSIT      MLGLNPGLKEIT---HSIT
 
1513_CANAL    ILKTE-K<span style="color: rgb(255, 0, 0);">-</span>I<span style="color: rgb(255, 0, 0);">---</span>K--NVVK      ILKTE----KIK---NVVK
 
6132_SCHPO    ELDDI-I<span style="color: rgb(255, 0, 0);">-</span>ESGDY--ENVD      ELDDI-IESGDY---ENVD
 
1244_ASPFU    ----N<span style="color: rgb(255, 0, 0);">-</span>PGLREIC--HSIT  -&gt;  ----NPGLREIC---HSIT
 
0925_USTMA    LVKTC<span style="color: rgb(255, 0, 0);">-</span>PALDPHI--TKLK      LVKTCPALDPHI---TKLK
 
2599_ASPTE    VLDAN<span style="color: rgb(255, 0, 0);">-</span>PGLREIS--HSIT      VLDANPGLREIS---HSIT
 
9773_DEBHA    LLESTPKQYHQHI--KRIR      LLESTPKQYHQHI--KRIR
 
0918_CANAL    LLESTPKEYQQYI--KRIR      LLESTPKEYQQYI--KRIR
 
  
<small>Gaps marked in red were moved. The sequence similarity in the alignment does not change considerably, however the total number of indels in this excerpt is reduced to 13 from the original 22</small>
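The indel count quoted above can be reproduced with a short Python sketch. It assumes that one indel corresponds to one maximal run of gap characters within a row; this definition is an assumption on my part, although it does give the numbers stated in the caption. The function name <code>count_indels</code> is hypothetical.

```python
import re


def count_indels(rows):
    """Count maximal runs of '-' over all rows (assumed definition of an indel)."""
    return sum(len(re.findall(r"-+", row)) for row in rows)


# The Probcons excerpt as aligned originally ...
before = [
    "ILKTE-K-T---K--SVVK",
    "MLGLN-PGLKEIT--HSIT",
    "ILKTE-K-I---K--NVVK",
    "ELDDI-I-ESGDY--ENVD",
    "----N-PGLREIC--HSIT",
    "LVKTC-PALDPHI--TKLK",
    "VLDAN-PGLREIS--HSIT",
    "LLESTPKQYHQHI--KRIR",
    "LLESTPKEYQQYI--KRIR",
]

# ... and after manual editing.
after = [
    "ILKTE----KTK---SVVK",
    "MLGLNPGLKEIT---HSIT",
    "ILKTE----KIK---NVVK",
    "ELDDI-IESGDY---ENVD",
    "----NPGLREIC---HSIT",
    "LVKTCPALDPHI---TKLK",
    "VLDANPGLREIS---HSIT",
    "LLESTPKQYHQHI--KRIR",
    "LLESTPKEYQQYI--KRIR",
]

print(count_indels(before), count_indels(after))
```

Counting gap runs rather than gap characters is the point here: the edited alignment contains no fewer hyphens per row, but it implies fewer separate insertion or deletion events.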
+
*Navigate to the '''Administration''' sub-menu of the [http://swift.cmbi.ru.nl/servers/html/index.html WhatIf Web server]. Follow the link to '''Make sequence file from PDB file'''. Enter the PDB-ID of your template into the form field and '''Send''' the request to the server. The server accesses the PDB file and extracts sequence information directly from the <code>ATOM&nbsp;&nbsp;</code> records of the file. The results will be returned in PIR format. Copy the results, edit them to FASTA format and save them in a text-only file. Make sure you create a valid FASTA formatted file! Use this '''implied''' sequence to check if and how it differs from the sequence ...
  
 +
:*... listed in the <code>SEQRES</code> records of the coordinate file;
 +
:*... given in the FASTA sequence for the template, which is provided by the PDB;
 +
:*... stored in the protein database of the NCBI.
 +
: and record your results.
  
;Move indels to more plausible position
+
* Establish how the sequence numbers in the coordinate section of your template(*) correspond to your target sequence numbering.
From a CLUSTAL alignment:
 
4966_CANGL    MKHEKVQ------GGYGRFQ---GTW      MKHEKV<span style="color: rgb(0, 170, 0);">Q</span>------GGYGRFQ---GTW
 
1513_CANAL    KIKNVVK------VGSMNLK---GVW      KIKNVV<span style="color: rgb(0, 170, 0);">K</span>------VGSMNLK---GVW
 
6132_SCHPO    VDSKHP<span style="color: rgb(255, 0, 0);">-</span>----------<span style="color: rgb(255, 0, 0);">Q</span>ID---GVW  -&gt;  VDSKHP<span style="color: rgb(0, 170, 0);">Q</span>-----------ID---GVW
 
1244_ASPFU    EICHSIT------GGALAAQ---GYW      EICHSI<span style="color: rgb(0, 170, 0);">T</span>------GGALAAQ---GYW
 
  
<small>The two characters marked in red were swapped. This does not change the number of indels but places the "Q" into a column in which it is more highly conserved (green). Progressive alignments are especially prone to this type of error.</small>
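The effect of such a swap on column conservation can be checked with a small Python sketch. The function name <code>column_counts</code> is hypothetical, and simply counting residues per column is a deliberately crude stand-in for the conservation scores that an MSA editor computes.

```python
from collections import Counter


def column_counts(rows, column):
    """Count the residues in one alignment column (0-based index), ignoring gaps."""
    return Counter(row[column] for row in rows if row[column] != "-")


# The CLUSTAL excerpt; gap runs are written with multiplication to keep them countable.
before = [
    "MKHEKVQ" + "-" * 6 + "GGYGRFQ---GTW",
    "KIKNVVK" + "-" * 6 + "VGSMNLK---GVW",
    "VDSKHP" + "-" * 11 + "QID---GVW",
    "EICHSIT" + "-" * 6 + "GGALAAQ---GYW",
]

# After editing: the Q of the third sequence is moved to the left of the gap.
after = list(before)
after[2] = "VDSKHPQ" + "-" * 11 + "ID---GVW"

print(column_counts(before, 6))
print(column_counts(after, 6))
```

In the edited alignment the seventh column contains Q twice instead of once, while the row length and the number of gap runs stay the same.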
+
:(*) <small>These residue numbers are important, since they are referenced e.g. by VMD when you visualize the structure. The easiest way to list them is via the ''Sequence Viewer'' extension of VMD.</small>.
 +
:<small>Don't do this for every residue individually but define ranges. Look at the correspondence of the first and last residue of target and template sequence and take indels into account. Establishing sequence correspondence precisely is crucially important! For example, when a publication refers to a residue by its sequence number, you have to be able to relate that number to the residue numbers of the model as well as your target sequence.</small>.
 +
&nbsp;
 +
&nbsp;
  
;Conserve motifs
+
-->
From a CLUSTAL alignment:
 
6166_SCHPO      --DKR<span style="color: rgb(255, 0, 0);">V</span>A---<span style="color: rgb(255, 0, 0);">G</span>LWVPP      --DKR<span style="color: rgb(0, 255, 0);">V</span>A--<span style="color: rgb(0, 255, 0);">G</span>-LWVPP
 
XBP1_SACCE      GGYIK<span style="color: rgb(255, 0, 0);">I</span>Q---<span style="color: rgb(255, 0, 0);">G</span>TWLPM      GGYIK<span style="color: rgb(0, 255, 0);">I</span>Q--<span style="color: rgb(0, 255, 0);">G</span>-TWLPM
 
6355_ASPTE      --DE<span style="color: rgb(255, 0, 0);">I</span>A<span style="color: rgb(255, 0, 0);">G</span>---NVWISP  -&gt;  ---DE<span style="color: rgb(0, 255, 0);">I</span>A--<span style="color: rgb(0, 255, 0);">G</span>NVWISP
 
5262_KLULA      GGYIK<span style="color: rgb(255, 0, 0);">I</span>Q---<span style="color: rgb(255, 0, 0);">G</span>TWLPY      GGYIK<span style="color: rgb(0, 255, 0);">I</span>Q--<span style="color: rgb(0, 255, 0);">G</span>-TWLPY
 
  
<small>The first of the two residues marked in red is a conserved, solvent exposed hydrophobic residue that may mediate domain interactions. The second residue is the conserved glycine in a beta turn that cannot be mutated without structural disruption. Changing the position of a gap and insertion in one sequence improves the conservation of both motifs.</small>
 
  
 +
&nbsp;
  
The ankyrin domains are quite highly diverged, their boundaries are not well defined, and not even CDD, SMART and SAS agree on the precise annotations. We expect there to be alignment errors in this region. Nevertheless we would hope that a good alignment would recognize homology in that region and that ideally the required <i>indels</i> would be placed between the secondary structure elements, not in their middle. But from the sequence alignment alone, we cannot judge where the secondary structure elements ought to be. You should therefore add the following "sequence" to the alignment; it contains exactly as many characters as the Swi6 sequence above and annotates the secondary structure elements. I have derived it from the 1SW6 structure.
 
  
>SecStruc 1SW6 E: strand  t: turn  H: helix  _: irregular
+
===The input alignment===
_EEE__tt___ttt______EE_____t___HHHHHHHHHHHHHHHH_xxxx_HHHHHHH
 
HHHH_t_____t_____t____HHHHHHH__tHHHHHHHHH____t___tt____HHHHH
 
HH__HHHH___HHHHHHHHHHHHHEE_t____HHHHHHHHH__t__HHHHHHHHHHHHHH
 
HHHHHH__EEE_xxxx_HHHHHt_HHHHHHH______t____HHHHHHHH__HHHHHHHH
 
H____t____t____HHHH___
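Because the annotation string is only useful if it stays in register with the sequence, it can help to check its length and list the annotated elements programmatically. The following Python sketch uses short toy strings, not the real 1SW6 data, and the function name <code>segments</code> is hypothetical.

```python
import re


def segments(annotation, symbol):
    """Return (start, end) pairs, 1-based and inclusive, for each run of symbol."""
    pattern = re.escape(symbol) + "+"
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, annotation)]


# Toy example: one strand (E) and one helix (H) in a ten-residue "protein".
sequence = "ACDEFGHIKL"
annotation = "_EEE_HHHH_"

# The annotation must have exactly one character per residue.
if len(sequence) != len(annotation):
    raise ValueError("annotation and sequence differ in length")

print(segments(annotation, "E"))
print(segments(annotation, "H"))
```

An indel in the aligned sequence that falls strictly inside one of the reported ranges is a candidate for being moved to the nearest boundary.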
 
  
<div class="reference-box">[http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=1sw6&template=protein.html&r=wiring&l=1&chain=A '''1SW6_A''' at the PDBSum database of structure annotations] You can compare the diagram there with this text string.</div>
 
  
 +
&nbsp;
 +
The sequence alignment between target and template is the single most important factor that determines the quality of your model. No comparative modeling process will repair an incorrect alignment; it is useful to consider a homology model as a three-dimensional map of a sequence alignment rather than as a structure in its own right. In a homology modeling project, typically the largest amount of time should be spent on preparing the best possible alignment. Even though automated servers like the SwissModel server will align sequences and select template structures for you, it would be unwise to use these just because they are convenient. You should take advantage of the much more sophisticated alignment methods available. Analysis of wrong models can't be expected to produce right results.
  
To proceed:
+
The best possible alignment is usually constructed from a multiple sequence alignment that includes at least '''the target and template sequence''' and other related sequences as well. The additional sequences are an important aid in identifying the correct placement of insertions and deletions. Your alignment should have been carefully reviewed by you and wherever required, manually adjusted to move insertions or deletions between target and template out of the secondary structure elements of the template structure.
#Manually align the Swi6 sequence with yeast Mbp1
 
#Bring the Secondary structure annotation into its correct alignment with Swi6
 
#Bring both CDD ankyrin profiles into the correct alignment with yeast Mbp1
 
  
Proceed along the following steps:
+
In most of the Mbp1 orthologues, we do not observe indels in the APSES domain regions. Evolutionary pressure on the APSES domains has selected against indels in the more than 600 million years these sequences have evolved independently in their respective species. To obtain an alignment between the '''template sequence''' and the '''target sequence''' from your species, proceed as follows.
  
{{task|1=
 
#Add the secondary structure annotation to the sequence alignment in Jalview. Copy the annotation, select File &rarr; Add sequences &rarr; from Textbox and paste the sequence.
 
#Select Help &rarr; Documentation and read about '''Editing Alignments''', '''Cursor Mode''' and '''Key strokes'''.
 
#Click on the yeast Mbp1 sequence '''row''' to select the entire row. Then use the cursor key to move that sequence down, so it is directly above the 1SW6 sequence. Select the row of 1SW6 and use shift/mouse to move the sequence elements and edit the alignment to match yeast Mbp1. Refer to the alignment given in the [[Reference annotation yeast Mbp1|Mbp1 annotation page]] for the correct alignment.
 
#Align the secondary structure elements with the 1SW6 sequence: Every character of 1SW6 should be matched with either E, t, H, or _. The result should be similar to the [[Reference annotation yeast Mbp1|Mbp1 annotation page]]. If you need to insert gaps into all sequences in the alignment, simply drag your mouse over all row headers - movement of sequences is constrained to selected regions, the rest is locked into place to prevent inadvertent misalignments. Remember to save your project from time to time: '''File &rarr; save''' so you can reload a previous state if anything goes wrong and can't be fixed with '''Edit &rarr; Undo'''.
 
#Finally align the two CD00204 consensus sequences to their correct positions (again, refer to the [[Reference annotation yeast Mbp1|Mbp1 annotation page]]).
 
#You can now consider the principles stated above and see if you can improve the alignment, for example by moving indels out of regions of secondary structure if that is possible without changing the character of the aligned columns significantly. Select blocks within which to work to leave the remaining alignment unchanged. So that this does not become tedious, you can restrict your editing to one Ankyrin repeat that is structurally defined in Swi6. You may want to open the 1SW6 structure in VMD to define the boundaries of one such repeat. You can copy and paste sections from Jalview into your assignment for documentation or export sections of the alignment to HTML (see the example below).
 
}}
 
  
=== Editing ankyrin domain alignments - Sample===
+
&nbsp;
  
This sample was created by
+
{{task|1=
 +
Choose one of the following options to align your '''target''' and '''template''' sequence.
  
# Editing the alignments as described above;
 
# Copying a block of aligned sequence;
 
# Pasting it To New Alignment;
 
# Colouring the residues by Hydrophobicity and setting the colour saturation according to Conservation;
 
# Choosing File &rarr; Export Image &rarr; HTML and pasting the resulting HTML source into this Wikipage.
 
  
 +
;In Jalview...
 +
* Load your Jalview project with aligned APSES domain sequences or recreate it from the Mbp1 orthologue sequences from the [[Reference Mbp1 orthologues (all fungi)|'''Mbp1 protein orthologs page''']] that I prepared for Assignment 7. Include the sequence of your '''template protein''' and re-align.
 +
* Delete all sequences you no longer need, i.e. keep only the APSES domains of the '''target''' (from your species) and the '''template''' (from the PDB) and choose '''Edit &rarr; Remove empty columns'''. This is your '''input alignment'''.
 +
* Choose '''File&rarr;Output to textbox&rarr;FASTA''' to obtain the aligned sequences. They should both have exactly the same length, i.e. N- or C-termini have to be padded with hyphens if the original sequences had different lengths. Save the sequences in a text file.
  
<table border="1"><tr><td>
 
<table border="0" cellpadding="0" cellspacing="0">
 
  
<tr><td colspan="6"></td>
+
;Using a different MSA program
<td colspan="9">10<br>|</td><td></td>
+
* Copy the FASTA formatted sequences of the Mbp1 proteins in the reference  species from the [[Reference APSES domains (reference species)|'''Reference APSES domain page''']].
<td colspan="9">20<br>|</td><td></td>
+
* Access e.g. the MSA tools page at the EBI.
<td colspan="9">30<br>|</td><td></td>
+
* Paste the Mbp1 sequence set, your '''target''' sequence and the '''template''' sequence into the input form.
<td colspan="3"></td><td colspan="3">40<br>|</td>
+
*Run the alignment and save the output.
  
</tr>
 
<tr><td nowrap="nowrap">MBP1_USTMA/341-368&nbsp;&nbsp;</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f3eef9">Y</td>
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#eeeefe">D</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#fdeeef">L</td>
 
  
<td>-</td>
+
;Using the EMBOSS explorer
<td>-</td>
+
* Use the <code>needle</code> tool for the alignment ... but remember that pairwise alignments will only be suitable in case the alignment is absolutely unambiguous (such as here). If there are any indels, an MSA will give much more reliable information.
<td>-</td>
 
<td bgcolor="#f9eef3">A</td>
 
<td bgcolor="#eeeefe">D</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
  
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#ffd8d8">I</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td>-</td>
 
  
<td>-</td>
+
;By hand
<td>-</td>
+
APSES domains are strongly conserved and have few if any indels. You could also simply align by hand.
<td>-</td>
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#fbeef1">F</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#eeeefe">E</td>
 
  
<td bgcolor="#cfaddc">G</td>
+
* Copy the CLUSTAL formatted reference alignment of the Mbp1 proteins in the reference species from the [[Reference APSES domains (reference species)|'''Reference APSES domain page''']].
<td bgcolor="#dad8fd">E</td>
+
* Open a new file in a text editor.
<td bgcolor="#d9c2e7">T</td>
+
* Paste the Mbp1 sequence set, your '''target''' sequence and the '''template''' sequence into the file.
<td bgcolor="#d3c2ee">P</td>
+
*Align by hand, replace all spaces with hyphens and save the output.
<td bgcolor="#f7adb3">L</td>
+
}}
<td bgcolor="#ccaddf">T</td>
 
<td bgcolor="#ecc2d5">M</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
  
<td bgcolor="#adadff">R</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#eeeeff">R</td>
 
<td bgcolor="#f4eef8">S</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1B_SCHCO/470-498&nbsp;&nbsp;</td>
 
<td>-</td>
 
<td bgcolor="#eeeeff">R</td>
 
<td bgcolor="#eeeefe">E</td>
 
  
<td bgcolor="#eeeefe">D</td>
+
Whatever method you use: the result should be a two sequence alignment in '''multi-FASTA''' format, that was constructed from a number of supporting sequences and that contains your aligned '''target''' and '''template''' sequence. This is your '''input alignment''' for the homology modeling server. For a ''Schizosaccharomyces pombe'' model, which I am using as an example here, it looks like this:
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#eeeefe">D</td>
 
<td bgcolor="#f3eef9">Y</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#eeeeff">K</td>
 
<td bgcolor="#f4eef8">S</td>
 
  
<td>-</td>
+
>1BM8_A
<td>-</td>
+
QIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRI
<td>-</td>
+
LEKEVLKETHEKVQGGFGKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDF
<td>-</td>
+
>Mbp1_SCHPO 2-100 NP_593032
<td>-</td>
+
AVHVAVYSGVEVYECFIKGVSVMRRRRDSWLNATQILKVADFDKPQRTRV
<td>-</td>
+
LERQVQIGAHEKVQGGYGKYQGTWVPFQRGVDLATKYKVDGIMSPILSL
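Before submitting such an input alignment, it can be worth checking that both records really have the same length and how similar they are. The following Python sketch is an illustration only: the function names <code>parse_fasta</code> and <code>percent_identity</code> are hypothetical, the example uses just the first ten residues of each sequence shown above, and identity is computed over ungapped columns, which is one of several possible conventions.

```python
def parse_fasta(text):
    """Parse multi-FASTA text into a dict that maps header to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def percent_identity(a, b):
    """Percent identical residues over all columns without a gap in either row."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


# Shortened example: the first ten residues of template and target.
example = """>template
QIYSARYSGV
>target
AVHVAVYSGV
"""

records = parse_fasta(example)
print(percent_identity(records["template"], records["target"]))
```

The length check is the important part: an alignment whose rows differ in length is not a valid input, whatever the identity turns out to be.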
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
  
<td>-</td>
 
<td bgcolor="#f7d8e0">F</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#dad8fd">D</td>
 
<td bgcolor="#fdeeef">L</td>
 
  
<td bgcolor="#eeeefe">Q</td>
+
&nbsp;
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#c5c2fb">E</td>
 
<td bgcolor="#efeefd">H</td>
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">D</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#f7adb3">L</td>
 
  
<td bgcolor="#b0adfa">N</td>
+
==Homology model==
<td bgcolor="#ffc2c2">I</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#adadff">R</td>
 
<td bgcolor="#fcc2c4">V</td>
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#eeeefe">N</td>
 
</tr>
 
  
<tr><td nowrap="nowrap">MBP1_ASHGO/465-494&nbsp;&nbsp;</td>
 
<td>F</td>
 
<td bgcolor="#f4eef8">S</td>
 
<td bgcolor="#f2eefa">P</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#f3eef9">Y</td>
 
<td bgcolor="#eeeeff">R</td>
 
<td bgcolor="#ffeeee">I</td>
 
<td>-</td>
 
  
<td>-</td>
+
&nbsp;
<td>-</td>
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#f4eef8">T</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
  
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td bgcolor="#ffd8d8">I</td>
 
<td>-</td>
 
<td>-</td>
 
  
<td>-</td>
+
===SwissModel===
<td>-</td>
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#f9eef3">A</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#efc2d0">C</td>
 
<td bgcolor="#eeeeff">K</td>
 
<td bgcolor="#cfaddc">G</td>
 
  
<td bgcolor="#e6d8f0">S</td>
+
&nbsp;<br>
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#d3c2ee">P</td>
 
<td bgcolor="#f7adb3">L</td>
 
<td bgcolor="#b3adf7">H</td>
 
<td bgcolor="#ffc2c2">I</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e5adc6">M</td>
 
  
<td bgcolor="#c5c2fb">N</td>
+
Access the Swissmodel server at '''http://swissmodel.expasy.org''' and click on '''Start Modelling'''. Then, under the '''Supported Inputs''', click on '''Target-Template Alignment'''.
<td bgcolor="#eeeeff">R</td>
 
<td bgcolor="#eeeefe">D</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_CLALU/550-586&nbsp;&nbsp;</td>
 
<td>G</td>
 
<td bgcolor="#eeeefe">N</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#eeeefe">N</td>
 
  
<td bgcolor="#f4eef7">G</td>
+
{{task|1=
<td bgcolor="#eeeefe">N</td>
+
*Paste your alignment for target and template into the form field. Click on the question mark next to "Supported Inputs" if you are not sure about the format. SwissModel will analyse the sequences and ask you to identify target and template. The YFO sequence is your target. The 1BM8 sequence is the template.
<td bgcolor="#f4eef8">S</td>
 
<td>N</td>
 
<td>D</td>
 
<td>K</td>
 
<td bgcolor="#eeeeff">K</td>
 
<td bgcolor="#eeeefe">E</td>
 
<td>-</td>
 
  
<td>-</td>
+
* Click '''Validate Target Template Alignment''' and check that the returned alignment is correct.
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
  
<td bgcolor="#fbd8db">L</td>
+
*Click '''Build Model''' to start the modeling process.
<td bgcolor="#ffd8d8">I</td>
 
<td>S</td>
 
<td>K</td>
 
<td>F</td>
 
<td>L</td>
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#efeefd">H</td>
 
<td bgcolor="#eeeefe">Q</td>
 
  
<td bgcolor="#c5c2fb">D</td>
+
* The resulting page returns information about the resulting model. Mouse over the '''Model 01''', open the '''PDB file''' and save the coordinates to your computer. Read the information on what is being returned by the server (click on the question mark icon). Study the quality measures.
<td bgcolor="#c5c2fb">N</td>
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#edadbd">F</td>
 
<td bgcolor="#b3adf7">H</td>
 
  
<td bgcolor="#ffc2c2">I</td>
+
* Also save:
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#c6ade5">Y</td>
 
<td bgcolor="#c5c2fb">N</td>
 
<td bgcolor="#f9eef3">M</td>
 
<td bgcolor="#f4eef8">S</td>
 
</tr>
 
  
<tr><td nowrap="nowrap">MBPA_COPCI/514-542&nbsp;&nbsp;</td>
+
** The output page as pdf (for reference)
 +
** The modeling report (as pdf)
 +
}}
  
<td>-</td>
+
==Model analysis==
<td bgcolor="#efeefd">H</td>
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#eeeefe">D</td>
 
<td bgcolor="#fbeef1">F</td>
 
<td>-</td>
 
<td>-</td>
 
  
<td>-</td>
+
&nbsp;
<td bgcolor="#eeeeff">R</td>
+
&nbsp;
<td bgcolor="#f4eef8">S</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td bgcolor="#fdd8da">V</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td bgcolor="#dad8fd">D</td>
 
<td bgcolor="#fdeeef">L</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#c5c2fb">E</td>
 
<td bgcolor="#efeefd">H</td>
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">D</td>
 
 
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#ffadad">I</td>
 
<td bgcolor="#b0adfa">N</td>
 
<td bgcolor="#ffc2c2">I</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#adadff">R</td>
 
<td bgcolor="#fcc2c4">V</td>
 
 
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#eeeefe">N</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_DEBHA/507-550&nbsp;&nbsp;</td>
 
<td>I</td>
 
<td bgcolor="#eeeeff">R</td>
 
<td bgcolor="#eeeefe">D</td>
 
<td bgcolor="#f4eef8">S</td>
 
<td bgcolor="#eeeefe">Q</td>
 
 
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#ffeeee">I</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#eeeefe">N</td>
 
<td>K</td>
 
<td>K</td>
 
 
 
<td>L</td>
 
<td>S</td>
 
<td>L</td>
 
<td>S</td>
 
<td>D</td>
 
<td>K</td>
 
<td>K</td>
 
<td>E</td>
 
<td bgcolor="#fbd8db">L</td>
 
 
 
<td bgcolor="#ffd8d8">I</td>
 
<td>A</td>
 
<td>K</td>
 
<td>F</td>
 
<td>I</td>
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#efeefd">H</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#c5c2fb">D</td>
 
 
 
<td bgcolor="#ffc2c2">I</td>
 
<td bgcolor="#eeeefe">D</td>
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#edadbd">F</td>
 
<td bgcolor="#b3adf7">H</td>
 
<td bgcolor="#ffc2c2">I</td>
 
 
 
<td bgcolor="#fbadaf">V</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#c6ade5">Y</td>
 
<td bgcolor="#c5c2fb">N</td>
 
<td bgcolor="#fdeeef">L</td>
 
<td bgcolor="#eeeefe">N</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1A_SCHCO/388-415&nbsp;&nbsp;</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td bgcolor="#f3eef9">Y</td>
 
<td bgcolor="#f2eefa">P</td>
 
<td bgcolor="#eeeeff">K</td>
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#fdeeef">L</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td bgcolor="#f9eef3">A</td>
 
<td bgcolor="#eeeefe">D</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#fdd8da">V</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#fbeef1">F</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#c5c2fb">E</td>
 
<td bgcolor="#eeeefe">D</td>
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">E</td>
 
<td bgcolor="#d9c2e7">T</td>
 
 
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#f7adb3">L</td>
 
<td bgcolor="#ccaddf">T</td>
 
<td bgcolor="#ecc2d5">M</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#adadff">R</td>
 
<td bgcolor="#efc2d0">C</td>
 
<td bgcolor="#eeeeff">R</td>
 
 
 
<td bgcolor="#f4eef8">S</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_AJECA/374-403&nbsp;&nbsp;</td>
 
<td>T</td>
 
<td bgcolor="#fdeeef">L</td>
 
<td bgcolor="#f2eefa">P</td>
 
<td bgcolor="#f2eefa">P</td>
 
<td bgcolor="#efeefd">H</td>
 
<td bgcolor="#eeeefe">Q</td>
 
 
 
<td bgcolor="#ffeeee">I</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f4eef8">S</td>
 
<td bgcolor="#f9eef3">M</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td bgcolor="#fbd8db">L</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#e6d8f0">S</td>
 
<td bgcolor="#f4eef8">S</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#d8c2e8">S</td>
 
 
 
<td bgcolor="#eeeefe">N</td>
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">D</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#f7adb3">L</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
 
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#adadff">K</td>
 
<td bgcolor="#c5c2fb">N</td>
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#faeef2">C</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_PARBR/380-409&nbsp;&nbsp;</td>
 
<td>I</td>
 
<td bgcolor="#fdeeef">L</td>
 
 
 
<td bgcolor="#f2eefa">P</td>
 
<td bgcolor="#f2eefa">P</td>
 
<td bgcolor="#efeefd">H</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#ffeeee">I</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f4eef8">S</td>
 
 
 
<td bgcolor="#fdeeef">L</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#e6d8f0">S</td>
 
 
 
<td bgcolor="#f4eef8">S</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#d8c2e8">S</td>
 
<td bgcolor="#eeeefe">N</td>
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">D</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#ebc2d5">A</td>
 
 
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#f7adb3">L</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#adadff">K</td>
 
<td bgcolor="#c5c2fb">N</td>
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#faeef2">C</td>
 
 
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_NEOFI/363-392&nbsp;&nbsp;</td>
 
<td>T</td>
 
<td bgcolor="#faeef2">C</td>
 
<td bgcolor="#f4eef8">S</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#eeeefe">D</td>
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#ffeeee">I</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#eeeefe">D</td>
 
<td bgcolor="#fdeeef">L</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#e6d8f0">S</td>
 
<td bgcolor="#faeef2">C</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#d8c2e8">S</td>
 
<td bgcolor="#eeeefe">N</td>
 
 
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">D</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#f7adb3">L</td>
 
<td bgcolor="#fcc2c4">V</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
 
 
<td bgcolor="#adadff">R</td>
 
<td bgcolor="#c5c2fb">N</td>
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#f9eef3">A</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_ASPNI/365-394&nbsp;&nbsp;</td>
 
<td>T</td>
 
<td bgcolor="#fbeef1">F</td>
 
<td bgcolor="#f4eef8">S</td>
 
 
 
<td bgcolor="#f2eefa">P</td>
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#fdeeee">V</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#eeeefe">D</td>
 
<td bgcolor="#fdeeef">L</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#e6d8f0">S</td>
 
<td bgcolor="#faeef2">C</td>
 
 
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#d8c2e8">S</td>
 
<td bgcolor="#fdeeee">V</td>
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">D</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#fbadaf">V</td>
 
 
 
<td bgcolor="#f7adb3">L</td>
 
<td bgcolor="#fcc2c4">V</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#adadff">R</td>
 
<td bgcolor="#c5c2fb">N</td>
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#fdeeee">V</td>
 
</tr>
 
 
 
<tr><td nowrap="nowrap">MBP1_UNCRE/377-406&nbsp;&nbsp;</td>
 
<td>M</td>
 
<td bgcolor="#f3eef9">Y</td>
 
<td bgcolor="#f2eefa">P</td>
 
<td bgcolor="#efeefd">H</td>
 
<td bgcolor="#efeefd">H</td>
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#fdeeee">V</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#fdeeef">L</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f2d8e5">A</td>
 
<td bgcolor="#f4eef8">S</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#d8c2e8">S</td>
 
<td bgcolor="#eeeefe">N</td>
 
<td bgcolor="#cfaddc">G</td>
 
 
 
<td bgcolor="#dad8fd">D</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#f7adb3">L</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#adadff">K</td>
 
 
 
<td bgcolor="#c5c2fb">N</td>
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#faeef2">C</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_PENCH/439-468&nbsp;&nbsp;</td>
 
<td>T</td>
 
<td bgcolor="#faeef2">C</td>
 
<td bgcolor="#f4eef8">S</td>
 
<td bgcolor="#eeeefe">Q</td>
 
 
 
<td bgcolor="#eeeefe">D</td>
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#ffeeee">I</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#f9eef3">M</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td bgcolor="#fbd8db">L</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#e6d8f0">S</td>
 
<td bgcolor="#faeef2">C</td>
 
<td bgcolor="#eeeefe">Q</td>
 
 
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#c5c2fb">Q</td>
 
<td bgcolor="#eeeefe">N</td>
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">D</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#fbadaf">V</td>
 
<td bgcolor="#f7adb3">L</td>
 
 
 
<td bgcolor="#fcc2c4">V</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#adadff">R</td>
 
<td bgcolor="#c5c2fb">N</td>
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#f9eef3">A</td>
 
</tr>
 
 
 
<tr><td nowrap="nowrap">MBPA_TRIVE/407-436&nbsp;&nbsp;</td>
 
 
 
<td>V</td>
 
<td bgcolor="#fbeef1">F</td>
 
<td bgcolor="#f2eefa">P</td>
 
<td bgcolor="#eeeeff">R</td>
 
<td bgcolor="#efeefd">H</td>
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#ffeeee">I</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td bgcolor="#f4eef8">S</td>
 
<td bgcolor="#fdeeef">L</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td bgcolor="#e6d8f0">S</td>
 
<td bgcolor="#f4eef8">S</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#eeeefe">N</td>
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">D</td>
 
 
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#f7adb3">L</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#adadff">K</td>
 
<td bgcolor="#c5c2fb">N</td>
 
 
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#faeef2">C</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_PHANO/400-429&nbsp;&nbsp;</td>
 
<td>T</td>
 
<td bgcolor="#f4eef9">W</td>
 
<td bgcolor="#ffeeee">I</td>
 
<td bgcolor="#f2eefa">P</td>
 
<td bgcolor="#eeeefe">E</td>
 
 
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#fdeeee">V</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f4eef8">T</td>
 
<td bgcolor="#eeeeff">R</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#fbd8db">L</td>
 
 
 
<td bgcolor="#fbd8db">L</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#f9eef3">A</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#c5c2fb">D</td>
 
 
 
<td bgcolor="#c5c2fb">Q</td>
 
<td bgcolor="#eeeefe">N</td>
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">D</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#ffadad">I</td>
 
<td bgcolor="#e5adc6">M</td>
 
<td bgcolor="#ffc2c2">I</td>
 
 
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#adadff">R</td>
 
<td bgcolor="#c5c2fb">N</td>
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#f9eef3">A</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBPA_SCLSC/294-313&nbsp;&nbsp;</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td bgcolor="#dad8fd">D</td>
 
<td bgcolor="#f9eef3">A</td>
 
<td bgcolor="#eeeeff">R</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#ffc2c2">I</td>
 
<td bgcolor="#eeeefe">N</td>
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#d9c2e7">T</td>
 
 
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#ffadad">I</td>
 
<td bgcolor="#b3adf7">H</td>
 
<td bgcolor="#ffc2c2">I</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#adadff">K</td>
 
<td bgcolor="#c5c2fb">N</td>
 
<td bgcolor="#eeeeff">K</td>
 
 
 
<td bgcolor="#f9eef3">A</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBPA_PYRIS/363-392&nbsp;&nbsp;</td>
 
<td>T</td>
 
<td bgcolor="#f4eef9">W</td>
 
<td bgcolor="#ffeeee">I</td>
 
<td bgcolor="#f2eefa">P</td>
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#eeeefe">E</td>
 
 
 
<td bgcolor="#fdeeee">V</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f4eef8">T</td>
 
<td bgcolor="#eeeeff">R</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td bgcolor="#fbd8db">L</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#f9eef3">A</td>
 
<td bgcolor="#f9eef3">A</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#c5c2fb">Q</td>
 
 
 
<td bgcolor="#eeeefe">N</td>
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">D</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#ffadad">I</td>
 
<td bgcolor="#e5adc6">M</td>
 
<td bgcolor="#ffc2c2">I</td>
 
<td bgcolor="#e4adc7">A</td>
 
 
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#adadff">R</td>
 
<td bgcolor="#c5c2fb">N</td>
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#f9eef3">A</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_/361-390&nbsp;&nbsp;</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td bgcolor="#eeeefe">N</td>
 
<td bgcolor="#efeefd">H</td>
 
<td bgcolor="#f4eef8">S</td>
 
<td bgcolor="#fdeeef">L</td>
 
<td>G</td>
 
<td>V</td>
 
<td>L</td>
 
<td bgcolor="#f4eef8">S</td>
 
 
 
<td bgcolor="#eeeefe">Q</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f7d8e0">F</td>
 
<td bgcolor="#f3d8e4">M</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#dad8fd">D</td>
 
 
 
<td bgcolor="#f4eef8">T</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#c5c2fb">N</td>
 
<td bgcolor="#c5c2fb">N</td>
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">D</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#ebc2d5">A</td>
 
 
 
<td bgcolor="#f7adb3">L</td>
 
<td bgcolor="#b3adf7">H</td>
 
<td bgcolor="#ffc2c2">I</td>
 
<td bgcolor="#f7adb3">L</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#adadff">R</td>
 
<td bgcolor="#d8c2e8">S</td>
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#f9eef3">A</td>
 
 
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_ASPFL/328-364&nbsp;&nbsp;</td>
 
<td>T</td>
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#f2eefa">P</td>
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#fdeeee">V</td>
 
 
 
<td>I</td>
 
<td>T</td>
 
<td>L</td>
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#eeeeff">R</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f7d8e0">F</td>
 
<td bgcolor="#ffd8d8">I</td>
 
<td>S</td>
 
 
 
<td>E</td>
 
<td>I</td>
 
<td>V</td>
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#fdeeef">L</td>
 
<td bgcolor="#eeeeff">R</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#eeeefe">Q</td>
 
 
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">D</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#f7adb3">L</td>
 
<td bgcolor="#b0adfa">N</td>
 
<td bgcolor="#f9c2c7">L</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#cfaddc">G</td>
 
 
 
<td bgcolor="#adadff">R</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#eeeeff">R</td>
 
<td bgcolor="#f4eef8">S</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBPA_MAGOR/375-404&nbsp;&nbsp;</td>
 
<td>Q</td>
 
<td bgcolor="#efeefd">H</td>
 
<td bgcolor="#eeeefe">D</td>
 
 
 
<td bgcolor="#f2eefa">P</td>
 
<td bgcolor="#eeeefe">N</td>
 
<td bgcolor="#fbeef1">F</td>
 
<td bgcolor="#fdeeee">V</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#eeeefe">Q</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#dad8fd">D</td>
 
<td bgcolor="#f9eef3">A</td>
 
 
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#c5c2fb">N</td>
 
<td bgcolor="#eeeefe">D</td>
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#fbadaf">V</td>
 
 
 
<td bgcolor="#b3adf7">H</td>
 
<td bgcolor="#f9c2c7">L</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#b0adfa">Q</td>
 
<td bgcolor="#c2c2ff">R</td>
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#f4eef8">S</td>
 
</tr>
 
 
 
<tr><td nowrap="nowrap">MBP1_CHAGL/361-390&nbsp;&nbsp;</td>
 
<td>S</td>
 
<td bgcolor="#eeeeff">R</td>
 
<td bgcolor="#f4eef8">S</td>
 
<td bgcolor="#f9eef3">A</td>
 
<td bgcolor="#eeeefe">D</td>
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#fdeeef">L</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#dad8fd">D</td>
 
<td bgcolor="#f4eef8">S</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#c5c2fb">N</td>
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#cfaddc">G</td>
 
 
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#fbadaf">V</td>
 
<td bgcolor="#b3adf7">H</td>
 
<td bgcolor="#f9c2c7">L</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e5adc6">M</td>
 
 
 
<td bgcolor="#c2c2ff">R</td>
 
<td bgcolor="#eeeefe">D</td>
 
<td bgcolor="#f9eef3">A</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_PODAN/372-401&nbsp;&nbsp;</td>
 
<td>V</td>
 
<td bgcolor="#eeeeff">R</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#f2eefa">P</td>
 
 
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#fdeeee">V</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#f9eef3">A</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td bgcolor="#fbd8db">L</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#dad8fd">D</td>
 
<td bgcolor="#f9eef3">A</td>
 
<td bgcolor="#eeeefe">Q</td>
 
 
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#c5c2fb">E</td>
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#f7adb3">L</td>
 
<td bgcolor="#b3adf7">H</td>
 
 
 
<td bgcolor="#f9c2c7">L</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#adadff">R</td>
 
<td bgcolor="#fcc2c4">V</td>
 
<td bgcolor="#eeeefe">N</td>
 
<td bgcolor="#f9eef3">A</td>
 
</tr>
 
 
 
<tr><td nowrap="nowrap">MBP1_LACTH/458-487&nbsp;&nbsp;</td>
 
 
 
<td>F</td>
 
<td bgcolor="#f4eef8">S</td>
 
<td bgcolor="#f2eefa">P</td>
 
<td bgcolor="#eeeeff">R</td>
 
<td bgcolor="#f3eef9">Y</td>
 
<td bgcolor="#eeeeff">R</td>
 
<td bgcolor="#ffeeee">I</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#eeeefe">N</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td bgcolor="#ffd8d8">I</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#f9eef3">A</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#c5c2fb">Q</td>
 
<td bgcolor="#eeeefe">N</td>
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">D</td>
 
 
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#fbadaf">V</td>
 
<td bgcolor="#b3adf7">H</td>
 
<td bgcolor="#f9c2c7">L</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#b0adfa">Q</td>
 
<td bgcolor="#c5c2fb">N</td>
 
 
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#eeeefe">D</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_FILNE/433-460&nbsp;&nbsp;</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f3eef9">Y</td>
 
<td bgcolor="#f2eefa">P</td>
 
<td bgcolor="#eeeefe">Q</td>
 
 
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#fdeeef">L</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f9eef3">A</td>
 
<td bgcolor="#eeeefe">D</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#fdd8da">V</td>
 
 
 
<td bgcolor="#ffd8d8">I</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#fbeef1">F</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#c5c2fb">D</td>
 
 
 
<td bgcolor="#c5c2fb">E</td>
 
<td bgcolor="#eeeefe">E</td>
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">E</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#f7adb3">L</td>
 
<td bgcolor="#ccaddf">T</td>
 
<td bgcolor="#ffc2c2">I</td>
 
 
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#adadff">R</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#eeeeff">R</td>
 
<td bgcolor="#f4eef8">S</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_KLULA/477-506&nbsp;&nbsp;</td>
 
<td>F</td>
 
 
 
<td bgcolor="#f4eef8">T</td>
 
<td bgcolor="#f2eefa">P</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#f3eef9">Y</td>
 
<td bgcolor="#eeeeff">R</td>
 
<td bgcolor="#ffeeee">I</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td bgcolor="#eeeefe">D</td>
 
<td bgcolor="#fdeeee">V</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td bgcolor="#ffd8d8">I</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#c5c2fb">N</td>
 
<td bgcolor="#eeeefe">D</td>
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#d8c2e8">S</td>
 
 
 
<td bgcolor="#d3c2ee">P</td>
 
<td bgcolor="#f7adb3">L</td>
 
<td bgcolor="#b3adf7">H</td>
 
<td bgcolor="#d5c2ec">Y</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#ccaddf">T</td>
 
<td bgcolor="#c5c2fb">N</td>
 
<td bgcolor="#eeeeff">K</td>
 
 
 
<td bgcolor="#eeeefe">D</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_SCHST/468-501&nbsp;&nbsp;</td>
 
<td>A</td>
 
<td bgcolor="#eeeeff">K</td>
 
<td bgcolor="#eeeefe">D</td>
 
<td bgcolor="#f2eefa">P</td>
 
<td bgcolor="#eeeefe">D</td>
 
<td bgcolor="#eeeefe">N</td>
 
 
 
<td bgcolor="#eeeeff">K</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#eeeeff">K</td>
 
<td bgcolor="#eeeefe">D</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td bgcolor="#ffd8d8">I</td>
 
 
 
<td>A</td>
 
<td>K</td>
 
<td>F</td>
 
<td>I</td>
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#efeefd">H</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#d8c2e8">S</td>
 
 
 
<td bgcolor="#eeeefe">D</td>
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#ebc2d5">A</td>
 
<td bgcolor="#edadbd">F</td>
 
<td bgcolor="#b3adf7">H</td>
 
<td bgcolor="#ffc2c2">I</td>
 
<td bgcolor="#eaadc0">C</td>
 
 
 
<td bgcolor="#caade0">S</td>
 
<td bgcolor="#b3adf7">H</td>
 
<td bgcolor="#c5c2fb">N</td>
 
<td bgcolor="#fdeeef">L</td>
 
<td bgcolor="#eeeefe">N</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_SACCE/496-525&nbsp;&nbsp;</td>
 
<td>F</td>
 
<td bgcolor="#f4eef8">S</td>
 
 
 
<td bgcolor="#f2eefa">P</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#f3eef9">Y</td>
 
<td bgcolor="#eeeeff">R</td>
 
<td bgcolor="#ffeeee">I</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#eeeefe">E</td>
 
 
 
<td bgcolor="#fdeeef">L</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#dad8fd">N</td>
 
 
 
<td bgcolor="#f4eef8">T</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#c2c2ff">K</td>
 
<td bgcolor="#eeeefe">N</td>
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#dad8fd">D</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#ebc2d5">A</td>
 
 
 
<td bgcolor="#f7adb3">L</td>
 
<td bgcolor="#b3adf7">H</td>
 
<td bgcolor="#ffc2c2">I</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#caade0">S</td>
 
<td bgcolor="#adadff">K</td>
 
<td bgcolor="#c5c2fb">N</td>
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#eeeefe">D</td>
 
 
 
</tr>
 
<tr><td nowrap="nowrap">CD00204/1-19&nbsp;&nbsp;</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#f9eef3">A</td>
 
<td bgcolor="#eeeeff">R</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#c5c2fb">E</td>
 
<td bgcolor="#eeeefe">D</td>
 
 
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#d8d8ff">R</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#d3c2ee">P</td>
 
<td bgcolor="#f7adb3">L</td>
 
<td bgcolor="#b3adf7">H</td>
 
<td bgcolor="#f9c2c7">L</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
 
 
<td bgcolor="#caade0">S</td>
 
<td bgcolor="#c5c2fb">N</td>
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#efeefd">H</td>
 
</tr>
 
<tr><td nowrap="nowrap">CD00204/99-118&nbsp;&nbsp;</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#fdd8da">V</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#f9eef3">A</td>
 
 
 
<td bgcolor="#eeeeff">R</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#c2c2ff">K</td>
 
<td bgcolor="#eeeefe">D</td>
 
<td bgcolor="#cfaddc">G</td>
 
<td bgcolor="#d8d8ff">R</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#d3c2ee">P</td>
 
<td bgcolor="#f7adb3">L</td>
 
 
 
<td bgcolor="#b3adf7">H</td>
 
<td bgcolor="#f9c2c7">L</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#adadff">K</td>
 
<td bgcolor="#c5c2fb">N</td>
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#efeefd">H</td>
 
</tr>
 
 
 
<tr><td nowrap="nowrap">1SW6/203-232&nbsp;&nbsp;</td>
 
<td>L</td>
 
<td bgcolor="#eeeefe">D</td>
 
<td bgcolor="#fdeeef">L</td>
 
<td bgcolor="#eeeeff">K</td>
 
<td bgcolor="#f4eef9">W</td>
 
<td bgcolor="#ffeeee">I</td>
 
<td bgcolor="#ffeeee">I</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f9eef3">A</td>
 
<td bgcolor="#eeeefe">N</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f3d8e4">M</td>
 
<td bgcolor="#fbd8db">L</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#dad8fd">N</td>
 
<td bgcolor="#f9eef3">A</td>
 
<td bgcolor="#eeeefe">Q</td>
 
<td bgcolor="#c5c2fb">D</td>
 
<td bgcolor="#d8c2e8">S</td>
 
<td bgcolor="#eeeefe">N</td>
 
<td bgcolor="#cfaddc">G</td>
 
 
 
<td bgcolor="#dad8fd">D</td>
 
<td bgcolor="#d9c2e7">T</td>
 
<td bgcolor="#efc2d0">C</td>
 
<td bgcolor="#f7adb3">L</td>
 
<td bgcolor="#b0adfa">N</td>
 
<td bgcolor="#ffc2c2">I</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#e4adc7">A</td>
 
<td bgcolor="#adadff">R</td>
 
 
 
<td bgcolor="#f9c2c7">L</td>
 
<td bgcolor="#f4eef7">G</td>
 
<td bgcolor="#eeeefe">N</td>
 
</tr>
 
<tr><td nowrap="nowrap">SecStruc/203-232&nbsp;&nbsp;</td>
 
<td>t</td>
 
<td bgcolor="#f5eef6">_</td>
 
<td bgcolor="#efeefd">H</td>
 
<td bgcolor="#efeefd">H</td>
 
 
 
<td bgcolor="#efeefd">H</td>
 
<td bgcolor="#efeefd">H</td>
 
<td bgcolor="#efeefd">H</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#efeefd">H</td>
 
<td bgcolor="#efeefd">H</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td bgcolor="#ead8ed">_</td>
 
<td bgcolor="#ead8ed">_</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#ead8ed">_</td>
 
<td bgcolor="#f5eef6">_</td>
 
<td bgcolor="#f5eef6">_</td>
 
 
 
<td bgcolor="#dec2e3">_</td>
 
<td bgcolor="#d9c2e7">t</td>
 
<td bgcolor="#f5eef6">_</td>
 
<td bgcolor="#d2add8">_</td>
 
<td bgcolor="#ead8ed">_</td>
 
<td bgcolor="#dec2e3">_</td>
 
<td bgcolor="#c7c2f9">H</td>
 
<td bgcolor="#b3adf7">H</td>
 
<td bgcolor="#b3adf7">H</td>
 
 
 
<td bgcolor="#c7c2f9">H</td>
 
<td bgcolor="#b3adf7">H</td>
 
<td bgcolor="#b3adf7">H</td>
 
<td bgcolor="#b3adf7">H</td>
 
<td bgcolor="#c7c2f9">H</td>
 
<td bgcolor="#f5eef6">_</td>
 
<td bgcolor="#f5eef6">_</td>
 
</tr>
 
</table>
 
</td></tr>
 
 
 
</table>
 
;Aligned sequences before editing. The algorithm has placed gaps into the Swi6 helix <code>LKWIIAN</code>, and the four-residue gaps before the block of well-aligned sequence on the right are poorly supported.
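The observation about the split helix can also be checked programmatically. The following is a minimal sketch, not part of the original assignment: the function name <code>gaps_inside_motif</code> is hypothetical, and the row string is transcribed by hand from the 1SW6 row of the table above. It reports which alignment columns inside a given ungapped motif are occupied by gap characters.

```python
def gaps_inside_motif(aligned_row, motif, gap="-"):
    """Return the 0-based alignment columns that hold a gap character
    between the first and last residue of `motif`, or None if the motif
    does not occur in the ungapped sequence."""
    # Map each residue of the ungapped sequence back to its alignment column.
    columns = [i for i, ch in enumerate(aligned_row) if ch != gap]
    ungapped = "".join(aligned_row[i] for i in columns)

    start = ungapped.find(motif)
    if start < 0:
        return None

    first_col = columns[start]
    last_col = columns[start + len(motif) - 1]
    return [i for i in range(first_col, last_col + 1) if aligned_row[i] == gap]


# 1SW6/203-232 row, transcribed from the table above (assumed to be exact).
row_1sw6 = "LDLKWII---AN----------ML----NAQDSNGDTCLNIAARLGN"

print(gaps_inside_motif(row_1sw6, "LKWIIAN"))
```

An empty list would mean that the motif is aligned without internal gaps; a non-empty list marks the columns that are candidates for manual editing. Column indices are 0-based here, whereas the ruler above the alignment counts from 1.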
 
 
 
 
 
<table border="1"><tr><td>
 
<table border="0" cellpadding="0" cellspacing="0">
 
 
 
<tr><td colspan="6"></td>
 
<td colspan="9">10<br>|</td><td></td>
 
<td colspan="9">20<br>|</td><td></td>
 
 
 
<td colspan="9">30<br>|</td><td></td>
 
<td colspan="3"></td><td colspan="3">40<br>|</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_USTMA/341-368&nbsp;&nbsp;</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#dfd2f0">Y</td>
 
<td bgcolor="#e4d2ec">G</td>
 
 
 
<td bgcolor="#d4d2fc">D</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#fbd2d5">L</td>
 
<td bgcolor="#f0d2e0">A</td>
 
<td bgcolor="#d4d2fc">D</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td bgcolor="#ffbfbf">I</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#c2bffc">N</td>
 
<td bgcolor="#f5d2db">F</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#d4d2fc">E</td>
 
 
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">E</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#c2abe8">P</td>
 
<td bgcolor="#f699a1">L</td>
 
<td bgcolor="#bf99d7">T</td>
 
<td bgcolor="#e5abc5">M</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
 
 
<td bgcolor="#9999ff">R</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#d2d2ff">R</td>
 
<td bgcolor="#e2d2ee">S</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1B_SCHCO/470-498&nbsp;&nbsp;</td>
 
<td>-</td>
 
<td bgcolor="#d2d2ff">R</td>
 
<td bgcolor="#d4d2fc">E</td>
 
 
 
<td bgcolor="#d4d2fc">D</td>
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#d4d2fc">D</td>
 
<td bgcolor="#dfd2f0">Y</td>
 
<td bgcolor="#d2d2ff">K</td>
 
<td bgcolor="#e2d2ee">S</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f2bfcc">F</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#c2bffc">D</td>
 
<td bgcolor="#fbd2d5">L</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#afabfa">E</td>
 
 
 
<td bgcolor="#d5d2fb">H</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">D</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#f699a1">L</td>
 
<td bgcolor="#9d99f9">N</td>
 
<td bgcolor="#ffabab">I</td>
 
<td bgcolor="#dd99b9">A</td>
 
 
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#9999ff">R</td>
 
<td bgcolor="#fcabae">V</td>
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#d4d2fc">N</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_ASHGO/465-494&nbsp;&nbsp;</td>
 
<td>F</td>
 
<td bgcolor="#e2d2ee">S</td>
 
 
 
<td bgcolor="#ded2f2">P</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#dfd2f0">Y</td>
 
<td bgcolor="#d2d2ff">R</td>
 
<td bgcolor="#ffd2d2">I</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#e2d2ed">T</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#ffbfbf">I</td>
 
<td bgcolor="#c2bffc">N</td>
 
<td bgcolor="#f0d2e0">A</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#afabfa">D</td>
 
 
 
<td bgcolor="#eaabbf">C</td>
 
<td bgcolor="#d2d2ff">K</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#d6bfe7">S</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#c2abe8">P</td>
 
<td bgcolor="#f699a1">L</td>
 
<td bgcolor="#a199f6">H</td>
 
<td bgcolor="#ffabab">I</td>
 
 
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#df99b8">M</td>
 
<td bgcolor="#afabfa">N</td>
 
<td bgcolor="#d2d2ff">R</td>
 
<td bgcolor="#d4d2fc">D</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_CLALU/550-586&nbsp;&nbsp;</td>
 
<td>G</td>
 
 
 
<td bgcolor="#d4d2fc">N</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#d4d2fc">N</td>
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#d4d2fc">N</td>
 
<td bgcolor="#e2d2ee">S</td>
 
<td bgcolor="#d4d2fc">N</td>
 
<td bgcolor="#d4d2fc">D</td>
 
<td>K</td>
 
 
 
<td>K</td>
 
<td>E</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>L</td>
 
<td>I</td>
 
<td>S</td>
 
<td>K</td>
 
<td bgcolor="#f2bfcc">F</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#c2bffc">N</td>
 
<td bgcolor="#d5d2fb">H</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
 
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#afabfa">N</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">N</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#e999ad">F</td>
 
<td bgcolor="#a199f6">H</td>
 
 
 
<td bgcolor="#ffabab">I</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#b899df">Y</td>
 
<td bgcolor="#afabfa">N</td>
 
<td bgcolor="#f0d2df">M</td>
 
<td bgcolor="#e2d2ee">S</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBPA_COPCI/514-542&nbsp;&nbsp;</td>
 
 
 
<td>-</td>
 
<td bgcolor="#d5d2fb">H</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#d4d2fc">D</td>
 
<td bgcolor="#f5d2db">F</td>
 
<td bgcolor="#d2d2ff">R</td>
 
<td bgcolor="#e2d2ee">S</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#fcbfc1">V</td>
 
<td bgcolor="#c2bffc">D</td>
 
<td bgcolor="#fbd2d5">L</td>
 
 
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#afabfa">E</td>
 
<td bgcolor="#d5d2fb">H</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">D</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#ff9999">I</td>
 
 
 
<td bgcolor="#9d99f9">N</td>
 
<td bgcolor="#ffabab">I</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#9999ff">R</td>
 
<td bgcolor="#fcabae">V</td>
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#d4d2fc">N</td>
 
</tr>
 
 
 
<tr><td nowrap="nowrap">MBP1_DEBHA/507-550&nbsp;&nbsp;</td>
 
<td>I</td>
 
<td bgcolor="#d2d2ff">R</td>
 
<td bgcolor="#d4d2fc">D</td>
 
<td bgcolor="#e2d2ee">S</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#ffd2d2">I</td>
 
<td bgcolor="#d4d2fc">E</td>
 
 
 
<td bgcolor="#d4d2fc">N</td>
 
<td>K</td>
 
<td>K</td>
 
<td>L</td>
 
<td>S</td>
 
<td>L</td>
 
<td>S</td>
 
<td>D</td>
 
<td>K</td>
 
 
 
<td>K</td>
 
<td>E</td>
 
<td>L</td>
 
<td>I</td>
 
<td>A</td>
 
<td>K</td>
 
<td bgcolor="#f2bfcc">F</td>
 
<td bgcolor="#ffbfbf">I</td>
 
<td bgcolor="#c2bffc">N</td>
 
 
 
<td bgcolor="#d5d2fb">H</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#ffabab">I</td>
 
<td bgcolor="#d4d2fc">D</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">N</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#e3abc6">A</td>
 
 
 
<td bgcolor="#e999ad">F</td>
 
<td bgcolor="#a199f6">H</td>
 
<td bgcolor="#ffabab">I</td>
 
<td bgcolor="#fb999c">V</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#b899df">Y</td>
 
<td bgcolor="#afabfa">N</td>
 
<td bgcolor="#fbd2d5">L</td>
 
<td bgcolor="#d4d2fc">N</td>
 
 
 
</tr>
 
<tr><td nowrap="nowrap">MBP1A_SCHCO/388-415&nbsp;&nbsp;</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#dfd2f0">Y</td>
 
<td bgcolor="#ded2f2">P</td>
 
<td bgcolor="#d2d2ff">K</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#fbd2d5">L</td>
 
 
 
<td bgcolor="#f0d2e0">A</td>
 
<td bgcolor="#d4d2fc">D</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#fcbfc1">V</td>
 
<td bgcolor="#f9bfc4">L</td>
 
 
 
<td bgcolor="#c2bffc">N</td>
 
<td bgcolor="#f5d2db">F</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#afabfa">E</td>
 
<td bgcolor="#d4d2fc">D</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">E</td>
 
<td bgcolor="#cbabdf">T</td>
 
 
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#f699a1">L</td>
 
<td bgcolor="#bf99d7">T</td>
 
<td bgcolor="#e5abc5">M</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#9999ff">R</td>
 
<td bgcolor="#eaabbf">C</td>
 
<td bgcolor="#d2d2ff">R</td>
 
 
 
<td bgcolor="#e2d2ee">S</td>
 
</tr>
 
 
 
<tr><td nowrap="nowrap">MBP1_AJECA/374-403&nbsp;&nbsp;</td>
 
<td>T</td>
 
<td bgcolor="#fbd2d5">L</td>
 
<td bgcolor="#ded2f2">P</td>
 
<td bgcolor="#ded2f2">P</td>
 
<td bgcolor="#d5d2fb">H</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
 
 
<td bgcolor="#ffd2d2">I</td>
 
<td bgcolor="#e2d2ee">S</td>
 
<td bgcolor="#f0d2df">M</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f9bfc4">L</td>
 
 
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#d6bfe7">S</td>
 
<td bgcolor="#e2d2ee">S</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#caabe0">S</td>
 
<td bgcolor="#d4d2fc">N</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">D</td>
 
 
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#f699a1">L</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#9999ff">K</td>
 
<td bgcolor="#afabfa">N</td>
 
 
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#f4d2dc">C</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_PARBR/380-409&nbsp;&nbsp;</td>
 
<td>I</td>
 
<td bgcolor="#fbd2d5">L</td>
 
<td bgcolor="#ded2f2">P</td>
 
<td bgcolor="#ded2f2">P</td>
 
<td bgcolor="#d5d2fb">H</td>
 
 
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#ffd2d2">I</td>
 
<td bgcolor="#e2d2ee">S</td>
 
<td bgcolor="#fbd2d5">L</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#d6bfe7">S</td>
 
<td bgcolor="#e2d2ee">S</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#caabe0">S</td>
 
<td bgcolor="#d4d2fc">N</td>
 
<td bgcolor="#c399d4">G</td>
 
 
 
<td bgcolor="#c2bffc">D</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#f699a1">L</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#9999ff">K</td>
 
 
 
<td bgcolor="#afabfa">N</td>
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#f4d2dc">C</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_NEOFI/363-392&nbsp;&nbsp;</td>
 
<td>T</td>
 
<td bgcolor="#f4d2dc">C</td>
 
<td bgcolor="#e2d2ee">S</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
 
 
<td bgcolor="#d4d2fc">D</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#ffd2d2">I</td>
 
<td bgcolor="#d4d2fc">D</td>
 
<td bgcolor="#fbd2d5">L</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#d6bfe7">S</td>
 
<td bgcolor="#f4d2dc">C</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#caabe0">S</td>
 
<td bgcolor="#d4d2fc">N</td>
 
 
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">D</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#f699a1">L</td>
 
<td bgcolor="#fcabae">V</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
 
 
<td bgcolor="#9999ff">R</td>
 
<td bgcolor="#afabfa">N</td>
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#f0d2e0">A</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_ASPNI/365-394&nbsp;&nbsp;</td>
 
<td>T</td>
 
<td bgcolor="#f5d2db">F</td>
 
<td bgcolor="#e2d2ee">S</td>
 
 
 
<td bgcolor="#ded2f2">P</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#fcd2d3">V</td>
 
<td bgcolor="#d4d2fc">D</td>
 
<td bgcolor="#fbd2d5">L</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#d6bfe7">S</td>
 
<td bgcolor="#f4d2dc">C</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#caabe0">S</td>
 
 
 
<td bgcolor="#fcd2d3">V</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">D</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#fb999c">V</td>
 
<td bgcolor="#f699a1">L</td>
 
<td bgcolor="#fcabae">V</td>
 
<td bgcolor="#dd99b9">A</td>
 
 
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#9999ff">R</td>
 
<td bgcolor="#afabfa">N</td>
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#fcd2d3">V</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_UNCRE/377-406&nbsp;&nbsp;</td>
 
<td>M</td>
 
<td bgcolor="#dfd2f0">Y</td>
 
 
 
<td bgcolor="#ded2f2">P</td>
 
<td bgcolor="#d5d2fb">H</td>
 
<td bgcolor="#d5d2fb">H</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#fcd2d3">V</td>
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#fbd2d5">L</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#eabfd3">A</td>
 
<td bgcolor="#e2d2ee">S</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#afabfa">D</td>
 
 
 
<td bgcolor="#caabe0">S</td>
 
<td bgcolor="#d4d2fc">N</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">D</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#f699a1">L</td>
 
<td bgcolor="#cbabdf">T</td>
 
 
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#9999ff">K</td>
 
<td bgcolor="#afabfa">N</td>
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#f4d2dc">C</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_PENCH/439-468&nbsp;&nbsp;</td>
 
<td>T</td>
 
 
 
<td bgcolor="#f4d2dc">C</td>
 
<td bgcolor="#e2d2ee">S</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#d4d2fc">D</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#ffd2d2">I</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#f0d2df">M</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#d6bfe7">S</td>
 
<td bgcolor="#f4d2dc">C</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
 
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#afabfa">Q</td>
 
<td bgcolor="#d4d2fc">N</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">D</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#fb999c">V</td>
 
<td bgcolor="#f699a1">L</td>
 
 
 
<td bgcolor="#fcabae">V</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#9999ff">R</td>
 
<td bgcolor="#afabfa">N</td>
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#f0d2e0">A</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBPA_TRIVE/407-436&nbsp;&nbsp;</td>
 
 
 
<td>V</td>
 
<td bgcolor="#f5d2db">F</td>
 
<td bgcolor="#ded2f2">P</td>
 
<td bgcolor="#d2d2ff">R</td>
 
<td bgcolor="#d5d2fb">H</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#ffd2d2">I</td>
 
<td bgcolor="#e2d2ee">S</td>
 
<td bgcolor="#fbd2d5">L</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#d6bfe7">S</td>
 
<td bgcolor="#e2d2ee">S</td>
 
 
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#d4d2fc">N</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">D</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
 
 
<td bgcolor="#f699a1">L</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#9999ff">K</td>
 
<td bgcolor="#afabfa">N</td>
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#f4d2dc">C</td>
 
</tr>
 
 
 
<tr><td nowrap="nowrap">MBP1_PHANO/400-429&nbsp;&nbsp;</td>
 
<td>T</td>
 
<td bgcolor="#e2d2ef">W</td>
 
<td bgcolor="#ffd2d2">I</td>
 
<td bgcolor="#ded2f2">P</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#fcd2d3">V</td>
 
<td bgcolor="#e2d2ed">T</td>
 
 
 
<td bgcolor="#d2d2ff">R</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#c2bffc">N</td>
 
 
 
<td bgcolor="#f0d2e0">A</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#afabfa">Q</td>
 
<td bgcolor="#d4d2fc">N</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">D</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#e3abc6">A</td>
 
 
 
<td bgcolor="#ff9999">I</td>
 
<td bgcolor="#df99b8">M</td>
 
<td bgcolor="#ffabab">I</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#9999ff">R</td>
 
<td bgcolor="#afabfa">N</td>
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#f0d2e0">A</td>
 
 
 
</tr>
 
<tr><td nowrap="nowrap">MBPA_SCLSC/294-313&nbsp;&nbsp;</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f9bfc4">L</td>
 
 
 
<td bgcolor="#c2bffc">D</td>
 
<td bgcolor="#f0d2e0">A</td>
 
<td bgcolor="#d2d2ff">R</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#ffabab">I</td>
 
<td bgcolor="#d4d2fc">N</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">N</td>
 
<td bgcolor="#cbabdf">T</td>
 
 
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#ff9999">I</td>
 
<td bgcolor="#a199f6">H</td>
 
<td bgcolor="#ffabab">I</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#9999ff">K</td>
 
<td bgcolor="#afabfa">N</td>
 
<td bgcolor="#d2d2ff">K</td>
 
 
 
<td bgcolor="#f0d2e0">A</td>
 
</tr>
 
 
 
<tr><td nowrap="nowrap">MBPA_PYRIS/363-392&nbsp;&nbsp;</td>
 
<td>T</td>
 
<td bgcolor="#e2d2ef">W</td>
 
<td bgcolor="#ffd2d2">I</td>
 
<td bgcolor="#ded2f2">P</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#d4d2fc">E</td>
 
 
 
<td bgcolor="#fcd2d3">V</td>
 
<td bgcolor="#e2d2ed">T</td>
 
<td bgcolor="#d2d2ff">R</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f9bfc4">L</td>
 
 
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#c2bffc">N</td>
 
<td bgcolor="#f0d2e0">A</td>
 
<td bgcolor="#f0d2e0">A</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#afabfa">Q</td>
 
<td bgcolor="#d4d2fc">N</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">D</td>
 
 
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#ff9999">I</td>
 
<td bgcolor="#df99b8">M</td>
 
<td bgcolor="#ffabab">I</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#9999ff">R</td>
 
<td bgcolor="#afabfa">N</td>
 
 
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#f0d2e0">A</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_/361-390&nbsp;&nbsp;</td>
 
<td>N</td>
 
<td bgcolor="#d5d2fb">H</td>
 
<td bgcolor="#e2d2ee">S</td>
 
<td bgcolor="#fbd2d5">L</td>
 
<td bgcolor="#e4d2ec">G</td>
 
 
 
<td bgcolor="#fcd2d3">V</td>
 
<td bgcolor="#fbd2d5">L</td>
 
<td bgcolor="#e2d2ee">S</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td bgcolor="#f2bfcc">F</td>
 
<td bgcolor="#ebbfd3">M</td>
 
<td bgcolor="#c2bffc">D</td>
 
<td bgcolor="#e2d2ed">T</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#afabfa">N</td>
 
<td bgcolor="#afabfa">N</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#c399d4">G</td>
 
 
 
<td bgcolor="#c2bffc">D</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#f699a1">L</td>
 
<td bgcolor="#a199f6">H</td>
 
<td bgcolor="#ffabab">I</td>
 
<td bgcolor="#f699a1">L</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#9999ff">R</td>
 
 
 
<td bgcolor="#caabe0">S</td>
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#f0d2e0">A</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_ASPFL/328-364&nbsp;&nbsp;</td>
 
<td>T</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#ded2f2">P</td>
 
 
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#fcd2d3">V</td>
 
<td bgcolor="#ffd2d2">I</td>
 
<td bgcolor="#e2d2ed">T</td>
 
<td>L</td>
 
<td>G</td>
 
<td>R</td>
 
<td>F</td>
 
 
 
<td>I</td>
 
<td>S</td>
 
<td>E</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td bgcolor="#ffbfbf">I</td>
 
<td bgcolor="#fcbfc1">V</td>
 
<td bgcolor="#c2bffc">N</td>
 
<td bgcolor="#fbd2d5">L</td>
 
<td bgcolor="#d2d2ff">R</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
 
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">D</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#f699a1">L</td>
 
<td bgcolor="#9d99f9">N</td>
 
<td bgcolor="#f7abb2">L</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#c399d4">G</td>
 
 
 
<td bgcolor="#9999ff">R</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#d2d2ff">R</td>
 
<td bgcolor="#e2d2ee">S</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBPA_MAGOR/375-404&nbsp;&nbsp;</td>
 
<td>Q</td>
 
<td bgcolor="#d5d2fb">H</td>
 
<td bgcolor="#d4d2fc">D</td>
 
 
 
<td bgcolor="#ded2f2">P</td>
 
<td bgcolor="#d4d2fc">N</td>
 
<td bgcolor="#f5d2db">F</td>
 
<td bgcolor="#fcd2d3">V</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#c2bffc">D</td>
 
<td bgcolor="#f0d2e0">A</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#afabfa">N</td>
 
 
 
<td bgcolor="#d4d2fc">D</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">N</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#fb999c">V</td>
 
<td bgcolor="#a199f6">H</td>
 
<td bgcolor="#f7abb2">L</td>
 
<td bgcolor="#dd99b9">A</td>
 
 
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#9d99f9">Q</td>
 
<td bgcolor="#ababff">R</td>
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#e2d2ee">S</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_CHAGL/361-390&nbsp;&nbsp;</td>
 
<td>S</td>
 
<td bgcolor="#d2d2ff">R</td>
 
 
 
<td bgcolor="#e2d2ee">S</td>
 
<td bgcolor="#f0d2e0">A</td>
 
<td bgcolor="#d4d2fc">D</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#fbd2d5">L</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#c2bffc">D</td>
 
<td bgcolor="#e2d2ee">S</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#afabfa">D</td>
 
 
 
<td bgcolor="#afabfa">N</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">N</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#fb999c">V</td>
 
<td bgcolor="#a199f6">H</td>
 
<td bgcolor="#f7abb2">L</td>
 
 
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#df99b8">M</td>
 
<td bgcolor="#ababff">R</td>
 
<td bgcolor="#d4d2fc">D</td>
 
<td bgcolor="#f0d2e0">A</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_PODAN/372-401&nbsp;&nbsp;</td>
 
<td>V</td>
 
 
 
<td bgcolor="#d2d2ff">R</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#ded2f2">P</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#fcd2d3">V</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#f0d2e0">A</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#c2bffc">D</td>
 
<td bgcolor="#f0d2e0">A</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
 
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#afabfa">E</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">N</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#f699a1">L</td>
 
<td bgcolor="#a199f6">H</td>
 
 
 
<td bgcolor="#f7abb2">L</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#9999ff">R</td>
 
<td bgcolor="#fcabae">V</td>
 
<td bgcolor="#d4d2fc">N</td>
 
<td bgcolor="#f0d2e0">A</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_LACTH/458-487&nbsp;&nbsp;</td>
 
 
 
<td>F</td>
 
<td bgcolor="#e2d2ee">S</td>
 
<td bgcolor="#ded2f2">P</td>
 
<td bgcolor="#d2d2ff">R</td>
 
<td bgcolor="#dfd2f0">Y</td>
 
<td bgcolor="#d2d2ff">R</td>
 
<td bgcolor="#ffd2d2">I</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#d4d2fc">N</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#ffbfbf">I</td>
 
<td bgcolor="#c2bffc">N</td>
 
<td bgcolor="#f0d2e0">A</td>
 
 
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#afabfa">Q</td>
 
<td bgcolor="#d4d2fc">N</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">D</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#fb999c">V</td>
 
 
 
<td bgcolor="#a199f6">H</td>
 
<td bgcolor="#f7abb2">L</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#9d99f9">Q</td>
 
<td bgcolor="#afabfa">N</td>
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#d4d2fc">D</td>
 
</tr>
 
 
 
<tr><td nowrap="nowrap">MBP1_FILNE/433-460&nbsp;&nbsp;</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#dfd2f0">Y</td>
 
<td bgcolor="#ded2f2">P</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#fbd2d5">L</td>
 
<td bgcolor="#f0d2e0">A</td>
 
 
 
<td bgcolor="#d4d2fc">D</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#fcbfc1">V</td>
 
<td bgcolor="#ffbfbf">I</td>
 
<td bgcolor="#c2bffc">N</td>
 
 
 
<td bgcolor="#f5d2db">F</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#afabfa">E</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">E</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#e3abc6">A</td>
 
 
 
<td bgcolor="#f699a1">L</td>
 
<td bgcolor="#bf99d7">T</td>
 
<td bgcolor="#ffabab">I</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#9999ff">R</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#d2d2ff">R</td>
 
<td bgcolor="#e2d2ee">S</td>
 
 
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_KLULA/477-506&nbsp;&nbsp;</td>
 
<td>F</td>
 
<td bgcolor="#e2d2ed">T</td>
 
<td bgcolor="#ded2f2">P</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#dfd2f0">Y</td>
 
<td bgcolor="#d2d2ff">R</td>
 
<td bgcolor="#ffd2d2">I</td>
 
 
 
<td bgcolor="#d4d2fc">D</td>
 
<td bgcolor="#fcd2d3">V</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#ffbfbf">I</td>
 
 
 
<td bgcolor="#c2bffc">N</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#afabfa">N</td>
 
<td bgcolor="#d4d2fc">D</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">N</td>
 
<td bgcolor="#caabe0">S</td>
 
 
 
<td bgcolor="#c2abe8">P</td>
 
<td bgcolor="#f699a1">L</td>
 
<td bgcolor="#a199f6">H</td>
 
<td bgcolor="#c5abe5">Y</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#bf99d7">T</td>
 
<td bgcolor="#afabfa">N</td>
 
<td bgcolor="#d2d2ff">K</td>
 
 
 
<td bgcolor="#d4d2fc">D</td>
 
</tr>
 
 
 
<tr><td nowrap="nowrap">MBP1_SCHST/468-501&nbsp;&nbsp;</td>
 
<td>A</td>
 
<td bgcolor="#d2d2ff">K</td>
 
<td bgcolor="#d4d2fc">D</td>
 
<td bgcolor="#ded2f2">P</td>
 
<td bgcolor="#d4d2fc">D</td>
 
<td bgcolor="#d4d2fc">N</td>
 
 
 
<td bgcolor="#d2d2ff">K</td>
 
<td bgcolor="#d2d2ff">K</td>
 
<td bgcolor="#d4d2fc">D</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>L</td>
 
<td>I</td>
 
<td>A</td>
 
<td>K</td>
 
<td bgcolor="#f2bfcc">F</td>
 
 
 
<td bgcolor="#ffbfbf">I</td>
 
<td bgcolor="#c2bffc">N</td>
 
<td bgcolor="#d5d2fb">H</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#caabe0">S</td>
 
<td bgcolor="#d4d2fc">D</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">N</td>
 
 
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#e999ad">F</td>
 
<td bgcolor="#a199f6">H</td>
 
<td bgcolor="#ffabab">I</td>
 
<td bgcolor="#e699b1">C</td>
 
<td bgcolor="#be99d9">S</td>
 
<td bgcolor="#a199f6">H</td>
 
<td bgcolor="#afabfa">N</td>
 
 
 
<td bgcolor="#fbd2d5">L</td>
 
<td bgcolor="#d4d2fc">N</td>
 
</tr>
 
<tr><td nowrap="nowrap">MBP1_SACCE/496-525&nbsp;&nbsp;</td>
 
<td>F</td>
 
<td bgcolor="#e2d2ee">S</td>
 
<td bgcolor="#ded2f2">P</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#dfd2f0">Y</td>
 
 
 
<td bgcolor="#d2d2ff">R</td>
 
<td bgcolor="#ffd2d2">I</td>
 
<td bgcolor="#d4d2fc">E</td>
 
<td bgcolor="#fbd2d5">L</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#c2bffc">N</td>
 
<td bgcolor="#e2d2ed">T</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#ababff">K</td>
 
<td bgcolor="#d4d2fc">N</td>
 
<td bgcolor="#c399d4">G</td>
 
 
 
<td bgcolor="#c2bffc">D</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#e3abc6">A</td>
 
<td bgcolor="#f699a1">L</td>
 
<td bgcolor="#a199f6">H</td>
 
<td bgcolor="#ffabab">I</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#be99d9">S</td>
 
<td bgcolor="#9999ff">K</td>
 
 
 
<td bgcolor="#afabfa">N</td>
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#d4d2fc">D</td>
 
</tr>
 
<tr><td nowrap="nowrap">CD00204/1-19&nbsp;&nbsp;</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#c2bffc">N</td>
 
<td bgcolor="#f0d2e0">A</td>
 
<td bgcolor="#d2d2ff">R</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#afabfa">E</td>
 
<td bgcolor="#d4d2fc">D</td>
 
 
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#bfbfff">R</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#c2abe8">P</td>
 
<td bgcolor="#f699a1">L</td>
 
<td bgcolor="#a199f6">H</td>
 
<td bgcolor="#f7abb2">L</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
 
 
<td bgcolor="#be99d9">S</td>
 
<td bgcolor="#afabfa">N</td>
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#d5d2fb">H</td>
 
</tr>
 
<tr><td nowrap="nowrap">CD00204/99-118&nbsp;&nbsp;</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#fcbfc1">V</td>
 
<td bgcolor="#c2bffc">N</td>
 
<td bgcolor="#f0d2e0">A</td>
 
<td bgcolor="#d2d2ff">R</td>
 
<td bgcolor="#afabfa">D</td>
 
<td bgcolor="#ababff">K</td>
 
 
 
<td bgcolor="#d4d2fc">D</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#bfbfff">R</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#c2abe8">P</td>
 
<td bgcolor="#f699a1">L</td>
 
<td bgcolor="#a199f6">H</td>
 
<td bgcolor="#f7abb2">L</td>
 
<td bgcolor="#dd99b9">A</td>
 
 
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#9999ff">K</td>
 
<td bgcolor="#afabfa">N</td>
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#d5d2fb">H</td>
 
</tr>
 
<tr><td nowrap="nowrap">1SW6/203-232&nbsp;&nbsp;</td>
 
<td>L</td>
 
<td bgcolor="#d4d2fc">D</td>
 
 
 
<td bgcolor="#fbd2d5">L</td>
 
<td bgcolor="#d2d2ff">K</td>
 
<td bgcolor="#e2d2ef">W</td>
 
<td bgcolor="#ffd2d2">I</td>
 
<td bgcolor="#ffd2d2">I</td>
 
<td bgcolor="#f0d2e0">A</td>
 
<td bgcolor="#d4d2fc">N</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#ebbfd3">M</td>
 
<td bgcolor="#f9bfc4">L</td>
 
<td bgcolor="#c2bffc">N</td>
 
<td bgcolor="#f0d2e0">A</td>
 
<td bgcolor="#d4d2fc">Q</td>
 
<td bgcolor="#afabfa">D</td>
 
 
 
<td bgcolor="#caabe0">S</td>
 
<td bgcolor="#d4d2fc">N</td>
 
<td bgcolor="#c399d4">G</td>
 
<td bgcolor="#c2bffc">D</td>
 
<td bgcolor="#cbabdf">T</td>
 
<td bgcolor="#eaabbf">C</td>
 
<td bgcolor="#f699a1">L</td>
 
<td bgcolor="#9d99f9">N</td>
 
<td bgcolor="#ffabab">I</td>
 
 
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#dd99b9">A</td>
 
<td bgcolor="#9999ff">R</td>
 
<td bgcolor="#f7abb2">L</td>
 
<td bgcolor="#e4d2ec">G</td>
 
<td bgcolor="#d4d2fc">N</td>
 
</tr>
 
<tr><td nowrap="nowrap">SecStruc/203-232&nbsp;&nbsp;</td>
 
<td>t</td>
 
 
 
<td bgcolor="#e6d2e9">_</td>
 
<td bgcolor="#d5d2fb">H</td>
 
<td bgcolor="#d5d2fb">H</td>
 
<td bgcolor="#d5d2fb">H</td>
 
<td bgcolor="#d5d2fb">H</td>
 
<td bgcolor="#d5d2fb">H</td>
 
<td bgcolor="#d5d2fb">H</td>
 
<td bgcolor="#d5d2fb">H</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
 
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td>-</td>
 
<td bgcolor="#dcbfe1">_</td>
 
<td bgcolor="#dcbfe1">_</td>
 
<td bgcolor="#dcbfe1">_</td>
 
<td bgcolor="#e6d2e9">_</td>
 
<td bgcolor="#e6d2e9">_</td>
 
 
 
<td bgcolor="#d2abd8">_</td>
 
<td bgcolor="#cbabdf">t</td>
 
<td bgcolor="#e6d2e9">_</td>
 
<td bgcolor="#c799cf">_</td>
 
<td bgcolor="#dcbfe1">_</td>
 
<td bgcolor="#d2abd8">_</td>
 
<td bgcolor="#b2abf7">H</td>
 
<td bgcolor="#a199f6">H</td>
 
<td bgcolor="#a199f6">H</td>
 
 
 
<td bgcolor="#b2abf7">H</td>
 
<td bgcolor="#a199f6">H</td>
 
<td bgcolor="#a199f6">H</td>
 
<td bgcolor="#a199f6">H</td>
 
<td bgcolor="#b2abf7">H</td>
 
<td bgcolor="#e6d2e9">_</td>
 
<td bgcolor="#e6d2e9">_</td>
 
</tr>
 
</table>
 
</td></tr>
 
 
 
</table>
 
;Aligned sequence after editing. A significant cleanup of the frayed region is possible. Now there is only one insertion event, and it is placed into the loop that connects two helices of the 1SW6 structure.
 
 
 
 
 
===Final analysis===
 
  
 +
=== The PDB file ===
 +
&nbsp;<br>
  
 
{{task|1=
 
{{task|1=
* Compare the distribution of indels in the ankyrin repeat regions of your alignments.
+
Open your '''model''' coordinates in a text-editor (make sure you view the PDB file in a fixed-width font (like "courier") so all the columns line up correctly) and consider the following questions:
**'''Review''' whether the indels in this region are concentrated in segments that connect the helices, or if they are more or less evenly distributed along the entire region of similarity.
 
**Think about whether the assertion that ''indels should not be placed in elements of secondary structure'' has merit in your alignment.
 
**Recognize that an indel in an element of secondary structure could be interpreted in a number of different ways:
 
*** The alignment is correct, the annotation is correct too: the indel is tolerated in that particular case, for example by extending the length of an &alpha;-helix or &beta;-strand;
 
*** The alignment algorithm has made an error, the structural annotation is correct: the indel should be moved a few residues;
 
*** The alignment is correct, the structural annotation is wrong, this is not a secondary structure element after all;
 
*** Both the algorithm and the annotation are probably wrong, but we have no data to improve the situation.
 
 
 
(<small>NB: remember that the structural annotations have been made for the yeast protein and might have turned out differently for the other proteins...</small>)
 
 
 
You should be able to analyse discrepancies between annotation and expectation in a structured and systematic way. In particular if you notice indels that have been placed into an '''annotated''' region of secondary structure, you should be able to comment on whether the location of the indel has strong support from aligned sequence motifs, or whether the indel could possibly be moved into a different location without much loss in alignment quality.
 
 
 
*Considering the whole alignment and your experience with editing, you should be able to state whether the position of indels relative to structural features of the ankyrin domains in your organism's Mbp1 protein is reliable. That would be the result of this task, in which you combine multiple sequence and structural information.
 
  
*You can also critically evaluate database information that you have encountered:
+
*What is the residue number of the first residue in the '''model'''? What should it be, based on the alignment? If the putative DNA binding region was reported to be residues 50-74 in the Mbp1 protein, which residues of your '''model''' correspond to that region?
# Navigate to the [http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=precalc&SEQUENCE=6320147 '''CDD annotation'''] for yeast Mbp1.
 
# You can check the precise alignment boundaries of the ankyrin domains by clicking on the (+) icon to the left of the matching domain definition.
 
# Confirm that CDD extends the ankyrin domain annotation beyond the 1SW6 domain boundaries. Given your assessment of conservation in the region beyond the structural annotation:  do you think that extending the annotation is reasonable also in YFO's protein? Is there evidence for this in the alignment of the CD00204 consensus with well aligned blocks of sequence beyond the positions that match Swi6?  
 
 
}}
 
}}
  
==R code: load alignment and compute information scores==
+
<!-- discuss flagging of loops - setting of B-factor to 99.0 phps. ANOLEA vs. Gromos ... packing vs. energy? -->
<!-- Add sequence weighting and sampling bias correction ? -->
 
  
As discussed in the lecture, Shannon information is calculated as the difference between expected and observed entropy, where entropy is the negative sum over probabilities times the log of those probabilities:
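Written out, a sketch of this definition is given below. The symbols are assumed labels, chosen to match the variable names <code>H.ref</code> and <code>I</code> in the R code of this section:

```latex
H = -\sum_{i} p_i \log_2 p_i
\qquad\qquad
I = H_{\mathrm{ref}} - H_{\mathrm{obs}}
```

Here p_i is the frequency of amino acid i, H_ref is the entropy of the reference distribution, and H_obs is the entropy observed in one alignment column. With base-2 logarithms, I is measured in bits.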
 
  
 +
===R code: renumbering the model ===
  
 +
As you have seen, SwissModel numbers the first residue "1" and does not keep the numbering of the template. We should renumber the model so we can compare the model and the template with the same residue numbers. Fortunately there is a very useful R package that will help us with that.
  
 +
{{task|1=
 +
# Navigate to the [http://thegrantlab.org/bio3d/index.php '''bio3D'''] home page. '''bio3d''' is not available for installation via CRAN, but needs to be installed from source. Instructions for the different platforms are here: http://thegrantlab.org/bio3d/tutorials/installing-bio3d. Follow the instructions and install '''bio3d''' for '''R''' on your platform.
  
 +
# Explore and execute the following '''R''' script. I am assuming that your model is in your working directory; change paths and filenames as required.
  
Here we compute Shannon information scores for aligned positions of the APSES domain, and plot the values in '''R'''. You can try this with any part of your alignment, but I have used only the aligned residues for the APSES domain for my example. This is a good choice for a first try, since there are (almost) no gaps.
+
<source lang="rsplus">
 +
# renumberPDB.R
  
{{task|1=
+
# This is a simple renumbering script that uses the bio3D
# Export only the sequences of the aligned APSES domains to a file on your computer, in FASTA format as explained below. You could call this: <code>Mbp1_All_APSES.fa</code>.
+
# package. We simply set the first residue number to what it
##Use your mouse and click and drag to ''select'' the aligned APSES domains in the alignment window.
+
# should be and renumber all residues based on the first one.
##Copy your selection to the clipboard.
+
# The script assumes your input PDBfile is in your working
##Use the main menu (not the menu of your alignment window) and select '''File &rarr; Input alignment &rarr; from Textbox'''; paste the selection into the textbox and click '''New Window'''.
+
# directory.
##Use '''File &rarr; save as''' to save the aligned sequences in multi-FASTA format under the filename you want in your '''R''' project directory.
 
 
 
# Explore the R-code below. Be sure that you understand it correctly. Note that this code does not implement any sampling bias correction, so positions with large numbers of gaps will receive artificially high scores (the alignment looks as if the gap character were a conserved character).
 
  
 +
# To run this, you must have installed the bio3D R package; instructions
 +
# are here: http://thegrantlab.org/bio3d/tutorials/installing-bio3d
  
<source lang="rsplus">
+
setwd("~/my/working/directory")
 +
PDBin      <- "YFO_model.pdb"
 +
PDBout    <- "YFO_model_ren.pdb"
  
# CalculateInformation.R
+
first <- 4 # residue number that the first residue should have
# Calculate Shannon information for positions in a multiple sequence alignment.
 
# Requires: an MSA in multi FASTA format
 
   
 
# It is good practice to set variables you might want to change
 
# in a header block so you don't need to hunt all over the code
 
# for strings you need to update.
 
#
 
setwd("/your/R/working/directory")
 
mfa      <- "MBP1_All_APSES.fa"
 
 
   
 
   
 
# ================================================
 
# ================================================
#    Read sequence alignment fasta file
+
#    Read coordinate file
 
# ================================================
 
# ================================================
 
   
 
   
# read MFA datafile using seqinr function read.fasta()
+
# read PDB file using bio3D function read.pdb()
library(seqinr)
+
library(bio3d)
tmp <- read.alignment(mfa, format="fasta")
+
pdb <- read.pdb(PDBin) # read the PDB file into a list
MSA  <- as.matrix(tmp)  # convert the list into a characterwise matrix
 
                        # with appropriate row and column names using
 
                        # the seqinr function as.matrix.alignment()
 
                        # You could have a look under the hood of this
 
                        # function to understand better how to convert a
 
                        # list into something else ... simply type
 
                        # "as.matrix.alignment" - without the parentheses
 
                        # to retrieve the function source code (as for any
 
                        # function btw).
 
  
### Explore contents of and access to the matrix of sequences
+
pdb            # examine the information
MSA
+
pdb$atom[1,]   # get information for the first atom
MSA[1,]
 
MSA[,1]
 
length(MSA[,1])
 
  
 +
# you can explore ?read.pdb and study the examples.
  
 
# ================================================
 
# ================================================
#    define function to calculate entropy
+
#    Change residue numbers
 
# ================================================
 
# ================================================
  
entropy <- function(v) { # calculate shannon entropy for the aa vector v
+
 
                    # Note: we are not correcting for small sample sizes
+
resNum <- as.numeric(pdb$atom[,"resno"]) # get residue numbers for all atoms
                    # here. Thus if there are a large number of gaps in
+
resNum <- resNum + (first - resNum[1])        # calculate offset
                    # the alignment, this will look like small entropy
+
pdb$atom[,"resno"] <- resNum             # replace old numbers with new
                    # since only a few amino acids are present. In the
+
pdb$atom[1,]                                   # check result
                    # extreme case: if a position is only present in
+
 
                    # one sequence, that one amino acid will be treated
 
                    # as 100% conserved - zero entropy. Sampling error
 
                    # corrections are discussed eg. in Schneider et al.
 
                    # (1986) JMB 188:414
 
l <- length(v)
 
a <- rep(0, 21)      # initialize a vector with 21 elements (20 aa plus gap)
 
                    # then set the name of each row to the one letter
 
                    # code. Through this, we can access a row by its
 
                    # one letter code.
 
names(a)  <- unlist(strsplit("acdefghiklmnpqrstvwy-", ""))
 
 
for (i in 1:l) {      # for the whole vector of amino acids
 
c <- v[i]          # retrieve the character
 
a[c] <- a[c] + 1  # increment its count by one
 
} # note: we could also have used the table() function for this
 
 
tot <- sum(a) - a["-"] # calculate number of observed amino acids
 
                      # i.e. subtract gaps
 
a <- a/tot             # frequency is observations of one amino acid
 
                      # divided by all observations. We assume that
 
                      # frequency equals probability.
 
a["-"] <- 0                             
 
for (i in 1:length(a)) {
 
if (a[i] != 0) { # if a[i] is not zero, otherwise leave as is.
 
            # By definition, 0*log(0) = 0  but R calculates
 
            # this in parts and returns NaN for log(0).
 
a[i] <- a[i] * (log(a[i])/log(2)) # replace a[i] with
 
                                  # p(i) log_2(p(i))
 
}
 
}
 
return(-sum(a)) # return Shannon entropy
 
}
 
  
 
# ================================================
 
# ================================================
#    calculate entropy for reference distribution
+
#    Write output to file
#    (from UniProt, c.f. Assignment 2)
 
 
# ================================================
 
# ================================================
  
refData <- c(
+
write.pdb(pdb=pdb,file=PDBout)
    "A"=8.26,
 
    "Q"=3.93,
 
    "L"=9.66,
 
    "S"=6.56,
 
    "R"=5.53,
 
    "E"=6.75,
 
    "K"=5.84,
 
    "T"=5.34,
 
    "N"=4.06,
 
    "G"=7.08,
 
    "M"=2.42,
 
    "W"=1.08,
 
    "D"=5.45,
 
    "H"=2.27,
 
    "F"=3.86,
 
    "Y"=2.92,
 
    "C"=1.37,
 
    "I"=5.96,
 
    "P"=4.70,
 
    "V"=6.87
 
    )
 
  
### Calculate the entropy of this distribution
+
# Done. Open the PDB file you have written in a text editor and confirm
 +
# that this has worked.
  
H.ref <- 0
+
</source>
for (i in 1:length(refData)) {
+
}}
p <- refData[i]/sum(refData) # convert % to probabilities
 
    H.ref <- H.ref - (p * (log(p)/log(2)))
 
}
 
 
 
# ================================================
 
#    calculate information for each position of
 
#    multiple sequence alignment
 
# ================================================
 
  
lAli <- dim(MSA)[2] # length of row in matrix is second element of dim(<matrix>).
 
I <- rep(0, lAli)  # initialize result vector
 
for (i in 1:lAli) {
 
I[i] = H.ref - entropy(MSA[,i])  # I = H_ref - H_obs
 
}
 
  
### evaluate I
+
&nbsp;
I
 
quantile(I)
 
hist(I)
 
plot(I)
 
  
# you can see that we have quite a large number of columns with the same,
+
===First visualization===
# high value ... what are these?
 
  
which(I > 4)
+
&nbsp;<br>
MSA[,which(I > 4)]
 
  
# And what is in the columns with low values?
+
Since a homology model inherits its structural details from the '''template''', your model of the YFO sequence should look very similar to the original 1BM8 structure.
MSA[,which(I < 1.5)]
 
  
 +
{{task|1=
 +
# Start Chimera and load the '''model''' coordinates that you have just renumbered.
 +
# From the PDB, also load the '''template''' structure. (Use File &rarr; Fetch by ID ...)
 +
# In the '''Favourites''' &rarr; '''Model Panel''' window you can switch between the two molecules.
 +
# Hide the ribbon and choose '''backbone only &rarr; full'''. You will note that the backbone of the two structures is virtually identical.
 +
# Next, choose '''Actions &rarr; Atoms/Bonds &rarr; show''' to display the two molecules in a stick style and note how the sidechains have been modeled. Note especially how sidechain coordinates have been guessed where the template had shorter sidechains than the target. It may be clearer if you hide H-atoms: '''Select &rarr; Chemistry &rarr; Element &rarr; H''' and '''Actions &rarr; Atoms/Bonds &rarr; hide'''.
 +
# Display only residue 50 to 74 to focus on the putative helix-turn-helix domain. Choose '''Favourites &rarr; Sequence''', select the residues for one model, then '''Select &rarr; Invert (selected model)''' and '''Actions &rarr; Atoms/Bonds &rarr; hide'''.
 +
# Study the result. A model of the HTH domain of YFO Mbp1.
 +
}}
  
# ===================================================
+
&nbsp;<br>
#    plot the information
+
&nbsp;<br>
#    (c.f. Assignment 5, see there for explanations)
 
# ===================================================
 
  
IP <- (I-min(I))/(max(I) - min(I) + 0.0001)
+
==Coloring the model by energy ==
nCol <- 15
 
IP <- floor(IP * nCol) + 1
 
spect <- colorRampPalette(c("#DD0033", "#00BB66", "#3300DD"), bias=0.6)(nCol)
 
# lets set the information scores from single informations to grey. We 
 
# change the highest level of the spectrum to grey.
 
#spect[nCol] <- "#CCCCCC"
 
Icol <- vector()
 
for (i in 1:length(I)) {
 
Icol[i] <- spect[ IP[i] ]
 
}
 
 
plot(1,1, xlim=c(0, lAli), ylim=c(-0.5, 5) ,
 
    type="n", bty="n", xlab="position in alignment", ylab="Information (bits)")
 
  
# plot as rectangles: height is information and color is coded to information
+
SwissModel calculates energies for each residue of the model with a molecular mechanics forcefield. The SwissModel modeling summary page contains a plot of these energies as a function of sequence number. The values, between 0.0 and 1.0, are stored in the PDB file's B-factor field.
for (i in 1:lAli) {
 
  rect(i, 0, i+1, I[i], border=NA, col=Icol[i])
 
}
 
  
# As you can see, some of the columns reach very high values, but they are not
 
# contiguous in sequence. Are they contiguous in structure? We will find out in
 
# a later assignment, when we map computed values to structure.
 
  
</source>
+
{{task|1=
 +
# Back in Chimera, use the model panel to '''close''' the 1BM8 structure.
 +
# Choose '''Tools &rarr; Depiction &rarr; Render by attribute''' and select '''attributes of atoms''', '''Attribute: bfactor''', check '''color atoms''' and click '''OK'''.
 +
# Study the result: It seems that residues in the core of the protein have better energies than residues at the surface. Why could that be the case?
 
}}
 
}}
  
 +
Study the options of this window a bit; rendering by attribute is a powerful way to store and depict all manner of information with the molecule. Simply write a little R script that uses bio3D to replace the B-factor or occupancy values with any value you might be interested in: energies, conservation scores, information ... whatever. Then render this property to map it onto the 3D structure of your molecule. If you want to experiment with this a bit, you could apply the information scores from the previous assignment to your model, using a script that is easy to derive from the renumbering R-script you have studied above.
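As a minimal sketch of such a script: the helper <code>mapScoresToB()</code>, the toy data, and the file names below are illustrative assumptions, not part of bio3d. The only bio3d calls used are <code>read.pdb()</code> and <code>write.pdb()</code>, and the atom table is assumed to have the columns <code>resno</code> and <code>b</code>.

```r
# Sketch: copy per-residue scores into the per-atom B-factor column.
# 'atoms' is assumed to look like the pdb$atom table of bio3d: one row per
# atom, with at least the columns "resno" and "b".
# 'scores' is a hypothetical numeric vector named by residue number.
mapScoresToB <- function(atoms, scores, missing = 0) {
  idx <- match(as.character(atoms$resno), names(scores))
  b <- unname(scores[idx])      # one score per atom, NA if residue has none
  b[is.na(b)] <- missing        # residues without a score get a default
  atoms$b <- round(b, 2)        # the PDB B-factor field holds two decimals
  atoms
}

# Toy example that runs without bio3d
toyAtoms  <- data.frame(resno = c(4, 4, 5, 6), b = 0)
toyScores <- c("4" = 1.5, "5" = 3.25)
toyResult <- mapScoresToB(toyAtoms, toyScores)
print(toyResult)

# With bio3d installed and a renumbered model in the working directory
# (both file names are assumptions):
if (requireNamespace("bio3d", quietly = TRUE) &&
    file.exists("YFO_model_ren.pdb")) {
  pdb <- bio3d::read.pdb("YFO_model_ren.pdb")
  pdb$atom <- mapScoresToB(pdb$atom, toyScores)
  bio3d::write.pdb(pdb = pdb, file = "YFO_model_scores.pdb")
}
```

In practice the named score vector would hold your own values, for example information scores indexed by the residue numbers of the renumbered model.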
  
[[Image:InformationPlot.jpg|frame|none|Plot of information vs. sequence position produced by the '''R''' script above, for an alignment of Mbp1 ortholog APSES domains.]]
 
  
  
 
&nbsp;
 
  
 
== Links and resources ==
 

Revision as of 14:35, 2 October 2015

Assignment for Week 7
Predictions

< Assignment 6 Assignment 8 >

Note! This assignment is currently inactive. Major and minor unannounced changes may be made at any time.

 
 

Concepts and activities (and reading, if applicable) for this assignment will be topics on next week's quiz.


Introduction

How could the search for ultimate truth have revealed so hideous and visceral-looking an object?
Max Perutz (on his first glimpse of the Hemoglobin structure)

   

Where is the hidden beauty in structure, and where, the "ultimate truth"? In the previous assignments we have studied sequence conservation in APSES family domains and we have discovered homologues in all fungal species. This is an ancient protein family that had already duplicated to several paralogues at the time the cenancestor of all fungi lived, more than 600,000,000 years ago, in the Vendian period of the Proterozoic era of Precambrian times.

In order to understand how specific residues in the sequence contribute to the putative function of the protein, and why and how they are conserved throughout evolution, we would need to study an explicit molecular model of an APSES domain protein, bound to its cognate DNA sequence. Explanations of a protein's observed properties and functions can't rely on the general fact that it binds DNA; we need to consider details in terms of specific residues and their spatial arrangement. In particular, it would be interesting to correlate the conservation patterns of key residues with their potential to make specific DNA binding interactions. Unfortunately, no APSES domain structure in complex with bound DNA has been solved up to now, and the experimental evidence we have considered in Assignment 2 (Taylor et al., 2000) is not sufficient to unambiguously define the details of how a DNA double helix might be bound. Moreover, at least two distinct modes of DNA binding are known for proteins of the winged-helix superfamily, of which the APSES domain is a member.

In this and the following assignment you will (1) construct a molecular model of the APSES domain from the Mbp1 orthologue in your assigned species, (2) identify similar structures of distantly related domains for which protein-DNA complexes are known, (3) assemble a hypothetical complex structure and (4) consider whether the available evidence allows you to distinguish between different modes of ligand binding.

For what follows, please remember this terminology:

Target
The protein that you are planning to model.
Template
The protein whose structure you are using as a guide to build the model.
Model
The structure that results from the modeling process. It has the Target sequence and is similar to the Template structure.

 

A brief overview article on the construction and use of homology models is linked to the resource section at the bottom of this page. That section also contains links to other sites and resources you might find useful or interesting.


 

Warm-up: a minimal change

Minimal changes to structure models can be done directly in Chimera. This illustrates the principle of full-scale modeling quite nicely. For an example, let us consider the residue A 42 of the 1BM8 structure. It is oriented towards the core of the protein, but most other Mbp1 orthologs have a larger amino acid in this position, V, or even I.

Task:

  1. Open 1BM8 in Chimera, hide the ribbons and show all atoms as a stick model.
  2. Color the protein white.
  3. Open the sequence window and select A 42. Color it red. Choose Actions → Set pivot. Then study how nicely the alanine sidechain fits into the cavity formed by its surrounding residues.
  4. To emphasize this better, hide the solvent molecules and select only the protein atoms. Display them as a sphere model to better appreciate the packing, i.e. the Van der Waals contacts we discussed in class. Use the Favorites → Side view panel to move the clipping plane and see a section through the protein. Study the packing, in particular, note that the additional methyl groups of a valine or isoleucine would not have enough space in the structure. Then restore the clipping planes so you can see the whole molecule.
  5. Let's simplify the view: choose Actions → Atoms/Bonds → backbone only → chain trace. Then select A 42 again in the sequence window and choose Actions → Atoms/Bonds → show.
  6. Add the surrounding residues: choose Select → Zone.... In the window, see that the box is checked that selects all atoms at a distance of less than 5 Å to the current selection, and check the lower box to select the whole residue of any atom that matches the distance cutoff criterion. Click OK and choose Actions → Atoms/Bonds → show.
  7. Select A 42 again: left-click (control click) on any atom of the alanine to select the atom, then up-arrow to select the entire residue. Now let's mutate this residue to isoleucine.
  8. Choose Tools → Structure Editing → Rotamers and select ILE as the rotamer type. Click OK, a window will pop up that shows you the possible rotamers for isoleucine together with their database-derived probabilities; you can select them in the window and cycle through them with your arrow keys. But note that the probabilities are very different - and thus show you high-energy and low-energy rotamers to choose from. Therefore, unless you have compelling reasons to do otherwise, try to find the highest-probability rotamer that may fit. This is where your stereo viewing practice becomes important, if not essential. It is really, really hard to do this reasonably in a 2D image! It becomes quite obvious in 3D. Btw: I find such "quantitative" work - where the real distances are important - easier in orthographic than in perspective view (cf. the Camera panel).
  9. I find that the first rotamer is actually not such a bad fit. The CD atom comes close to the sidechains of I 25 and L 96. But we can assume that these are somewhat mobile and can accommodate a denser packing, because - as you can easily verify in your Jalview alignment - it is NOT the case that sequences that have I 42, have a smaller residue in position 25 and/or 96. So let's accept the most frequent ILE rotamer by selecting it in the rotamer window and clicking OK (while existing side chain(s): replace is selected).
  10. Done.

If you want to go over this in more detail, check the video tutorial on YouTube published by the NIAID bioinformatics group here. I would also encourage you to go over Part 2 of the video tutorial that discusses how to check for and resolve (by energy minimization) steric clashes. But do remember that it is not clear whether energy minimization will make your structure more correct in the sense of a smaller overall RMSD with the real, mutated protein.

What we have done here with one residue is exactly the way homology modeling works with entire sequences. Let's now build a homology model for YFO Mbp1.

Preparation

Target sequence

The first step of homology modelling is to determine which sequence to model. We have determined the putative orthologue with conserved function in YFO by reciprocal best match with Saccharomyces cerevisiae Mbp1. Your sequence was initially found with an APSES domain search in YFO and the alignments with the yeast sequence are straightforward for the most part.

There are two exceptions, however: the alignments of the ASPFU gene XP_754232 and the CAPCO gene XP_007722875 are both missing part of the domain's N-terminus. This is odd, because it may imply that the APSES domains of these genes are not properly folded. When such surprising alignment results occur, you must consider whether there could be an error in the published sequence, perhaps stemming from an erroneous gene model. This is not absolutely germane to this assignment, so I have placed the process into the collapsible section below as optional reading. However, it may be useful for you to understand what the issue is here and how to address it.

Correcting the ASPFU Mbp1 gene model.


An alignment of APSES domain sequences shows the shortened N-terminus of the ASPFU and the CAPCO proteins, relative to SACCE and e.g. the closely related Aspergillus nidulans, ASPNI:

APSES domains:
Mbp1_SACCE  QIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAA...
Mbp1_ASPNI  NVYSATYSSVPVYEFKIGTDSVMRRRSDDWINATHILKVA...
Mbp1_ASPFU  ----------------------MRRRGDDWINATHILKVA...
Mbp1_CAPCO  ----------------------MRRRSDDWVNATHILKVA...

We analyse this for the ASPFU gene.

Working from the possibility that this may be a gene model error - e.g. a false translational start, a frameshift due to a sequencing error, or an erroneously modelled intron, we check whether the translation of the genomic sequence supports the presence of the expected amino acids. This is easily done running TBLASTN - BLASTing the protein query against the six reading frames of the ASPFU genome. We find the following:


Aspergillus fumigatus Af293 chromosome 3, whole genome shotgun sequence
Sequence ID: ref|NC_007196.1|Length: 4079167Number of Matches: 2
[...]
Query  10       VDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILE ...
                V VYEF     S+M+R+ DDW+NATHILK A F K  RTRILE ...
Sbjct  3691193  VPVYEFKVDGESVMRRRGDDWINATHILKVAGFDKPARTRILE ...

Indeed, there is sequence upstream of the gene's published translation start that matches well with our query! But where is the correct translation start? For that we need to look at the actual nucleotide sequence and translate it. Remember: BLAST is a local sequence alignment algorithm and it won't retrieve everything that matches our query, just the best matching segment. ASPFU chromosome 3 is over 4 megabases long, so let us try to obtain only the region we are actually interested in: starting upstream of base 3691193, let's say at 3691100 (make sure this offset is divisible by three, to stay in the same reading frame), and extending downstream to, say, 3691372.
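The coordinate arithmetic can be sketched in R. The helper names regionStart() and regionURL() are hypothetical, the offset of 95 is an arbitrary stand-in for "roughly 100 bases", and the URL pattern is simply the nuccore address used in the steps of this section, which may have changed since.

```r
# Sketch: pick a start coordinate that keeps the reading frame of the hit,
# then build the URL for that region.
regionStart <- function(hitStart, approxUpstream) {
  # round the upstream offset down to a multiple of three
  hitStart - (approxUpstream %/% 3) * 3
}

regionURL <- function(accession, from, to) {
  sprintf("http://www.ncbi.nlm.nih.gov/nuccore/%s?from=%d&to=%d&report=fasta",
          accession, as.integer(from), as.integer(to))
}

from <- regionStart(3691193, 95)  # 95 is a hypothetical "roughly 100" offset
to   <- 3691372
inFrame <- (3691193 - from) %% 3 == 0  # is the hit in frame 1 of the region?
url  <- regionURL("NC_007196.1", from, to)
print(from)
print(inFrame)
print(url)
```

Rounding the offset down to a multiple of three means the first base of the BLAST hit falls on a codon boundary of frame 1 in the retrieved region.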

  1. At the NCBI genome project site we search for Aspergillus fumigatus.
  2. At the Aspergillus fumigatus genome project site we click on chromosome 3 to access the map viewer.
  3. Hovering over the Download/View sequence link shows us how a URL to access sequence data is structured:
http://www.ncbi.nlm.nih.gov/projects/mapview/seq_reg.cgi?taxid=746128&chr=3&from=1&to=4079167
We can easily adapt this to the sequence range we need ...
  4. ... and follow: http://www.ncbi.nlm.nih.gov/nuccore/NC_007196.1?from=3691003&to=3691243&report=fasta to yield:
>gi|71025130:3691003-3691243 Aspergillus fumigatus Af293 chromosome 3, whole genome shotgun sequence
ACGGTTTGCGGAGACGGGCATTATGGCGGCGGTGGATTTCTCAAAAATCTATTCTGCTACATACAGCAGC
GTAAGTCTCTTCTAATTGCGTATCTCTGTTTTCCCTACAGCCTCAAATTTTCCCCAATGCCTCTTTCCAT
CCATTTTGCCCCTTCCTTCGCCGCGAAGCCAATCTAACGCAGTTCAATAGGTTCCAGTTTACGAGTTCAA
AGTCGATGGCGAAAGTGTTATGCGCCGACGA


  5. To translate this, we navigate to any of the EMBOSS tools servers and use "remap" - we want to see the translation matched to the nucleotide sequence. We turn restriction sites off, translate all three forward frames and paste and manually align the SACCE Mbp1 sequence into the output to see what we expect and what we got. I have selected only the frame(s) that actually give a match, and I have pasted the homologous CAPCO and SACCE sequences (lower case) to demonstrate their similarity:
ASPFU     ACGGTTTGCGGAGACGGGCATTATGGCGGCGGTGGATTTCTCAAAAATCTATTCTGCTACATACAGCAGC
                                                                        
ASPFU      R  F  A  E  T  G  I  M  A  A  V  D  F  S  K  I  Y  S  A  T  Y  S  S  
CAPCO                           m  -  a  f  d  -  k  e  i  y  s  a  t  y  s  n  
SACCE                           m  s  -  -  -  -  n  q  i  y  s  a  r  y  s  g

         
ASPFU     GTAAGTCTCTTCTAATTGCGTATCTCTGTTTTCCCTACAGCCTCAAATTTTCCCCAATGCCTCTTTCCAT
 
ASPFU     V  S  L  F  *  ... 
CAPCO     v  a  -  -     ...
SACCE     v  d  -  -     ...
         
ASPFU     CCATTTTGCCCCTTCCTTCGCCGCGAAGCCAATCTAACGCAGTTCAATAGGTTCCAGTTTACGAGTTCAA
                                                             ...  V  Y  E  F  K 
CAPCO                                                        ...  v  y  e  l  k 
SACCE                                                        ...  v  y  e  f  i
         
ASPFU      AGTCGATGGCGAAAGTGTTATGCGCCGACGAGGCGATGATTGGATCAATGCTACACATATTCTTAAA

ASPFU       V  D  G  E  S  V  M  R  R  R  G  D  D  W  I  N  A  T  H  I  L  K ...
CAPCO       v  a  g  d  h  i  m  r  r  r  s  d  d  w  v  n  a  t  h  i  l  k ...
SACCE       h  s  t  g  s  i  m  k  r  k  k  d  d  w  v  n  a  t  h  i  l  k ...


This clearly shows us that there is N-terminal sequence that ought to be added to the gene model, upstream of the reported translational start of MRRR.... The sequences thus most likely begin as follows:
ASPFU   MAAVDFSKIYSATYSSVSLFVYEFKVDGE-----SVMRRRGDDWINATHILK...
CAPCO   ma-fd-keiysatysnva--vyelkvagd-----himrrrsddwvnathilk...
SACCE   ms----nqiysarysgvd--ysgvdvyefihstgsimkrkkddwvnathilk...

The fact that the truncated N-terminus appears in both closely related genes and species suggests that what we see here is a mis-annotated intron. The take-home lesson is: if your retrieved protein sequence does not conform to your expectations, it may be worthwhile to follow up with the actual nucleotide sequence.
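
The retrieve-and-translate steps above can also be sketched in base R. This is only a minimal illustration and not a replacement for EMBOSS remap: the helper names regionURL() and translateFrame() are made up for this example, the URL pattern is simply copied from the link shown above, and I am assuming the standard genetic code.

```r
# Hypothetical helper: build the URL for a sequence range, following the
# pattern of the link shown above. It only builds the string; it does not
# download anything.
regionURL <- function(accession, from, to) {
  sprintf("http://www.ncbi.nlm.nih.gov/nuccore/%s?from=%d&to=%d&report=fasta",
          accession, as.integer(from), as.integer(to))
}
regionURL("NC_007196.1", 3691003, 3691243)

# Standard genetic code as a named vector: names are codons, values are
# one-letter amino acid codes ("*" marks a stop codon).
bases <- c("T", "C", "A", "G")
grid  <- expand.grid(third = bases, second = bases, first = bases,
                     stringsAsFactors = FALSE)
geneticCode <- strsplit(paste0("FFLLSSSSYY**CC*W", "LLLLPPPPHHQQRRRR",
                               "IIIMTTTTNNKKSSRR", "VVVVAAAADDEEGGGG"),
                        "")[[1]]
names(geneticCode) <- paste0(grid$first, grid$second, grid$third)

# Hypothetical helper: translate one forward reading frame (1, 2 or 3).
translateFrame <- function(dna, frame = 1) {
  dna      <- toupper(dna)
  starts   <- seq(frame, nchar(dna) - 2, by = 3)  # first base of each codon
  triplets <- substring(dna, starts, starts + 2)
  aa       <- geneticCode[triplets]
  aa[is.na(aa)] <- "X"                            # codons with ambiguous bases
  paste(aa, collapse = "")
}

# First line of the retrieved ASPFU sequence, translated in all three frames
firstLine <- "ACGGTTTGCGGAGACGGGCATTATGGCGGCGGTGGATTTCTCAAAAATCTATTCTGCTACATACAGCAGC"
sapply(1:3, function(f) translateFrame(firstLine, f))
```

The second frame should reproduce the ASPFU translation shown in the first block of the alignment above; the other two frames are what you would discard after inspecting the remap output.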


 

Template choice and template sequence

The SWISS-MODEL server provides several different options for constructing homology models. The easiest option requires only a target sequence as input. In this mode the program will automatically choose suitable templates and create an input alignment. I would argue, however, that this is not the best way to use such a service: template choice and alignment may both be significantly influenced by biochemical reasoning, and an automated algorithm cannot make the necessary decisions. Should you use a structure of lower resolution that has a ligand bound? Should you move an indel from an active site to a loop region even though the sequence similarity score might be lower? Questions like these may yield answers that run counter to the best choices an automated algorithm could make. But SWISS-MODEL is flexible and allows us to upload an explicit alignment between target and template. Please note: the model you will produce is "easy" - the sequence similarity is high and there are no indels to consider, so the automated mode would have done just as well. But the strategy we pursue here is also suitable for much more difficult problems. The automated strategy probably is not.

Template choice is the first step. Often more than one related structure can be found in the PDB. We have touched on principles of selecting template structures in the lectures; please refer to the template choice principles page on this Wiki where I have reviewed the principles and discussed more details and alternatives. One can search the PDB itself through its Advanced Search interface: for example, one can search for sequence similarity with a BLAST search, or search for structural similarity by accessing structures according to their CATH or SCOP classification. But the BLAST search is probably the method of choice: after all, the most important measure of the probability of success for homology modeling is sequence similarity.

In Assignment 3, you have defined the extent of the APSES domain in yeast Mbp1. In Assignment 6, you have used PSI-BLAST to search for APSES domains in YFO. In Assignment 7 you have confirmed by Reciprocal Best Match which of these APSES domain sequences is the closest related orthologue to yeast Mbp1. This sequence is the best candidate for having a conserved function similar to yeast Mbp1. Therefore, this sequence is the one you will model: it is called the target for the homology modeling procedure. In the same assignment you have also computed a multiple sequence alignment that includes the sequence of Mbp1 with YFO.

Defining a template means finding a PDB coordinate set that has sufficient sequence similarity to your target that you can build a model based on that template. In Assignment 2 you have used a keyword search at the PDB to find "Mbp1" structures - but some of these structures were not homologs: keyword searches are notoriously unreliable. To find suitable PDB structures, we will perform a BLAST search at the PDB instead.




Task:

  1. Retrieve your YFO Mbp1-like APSES domain sequence. You can find the domain boundaries for the yeast protein in the Mbp1 annotation reference page, and you can get the aligned sequence from your Jalview alignment, or simply recompute it with the needle program of the EMBOSS suite. This YFO sequence is your target sequence.
  2. Navigate to the PDB.
  3. Click on Advanced to enter the advanced search interface.
  4. Open the menu to Choose a Query Type:
  5. Find the Sequence features section and choose Sequence (BLAST...)
  6. Paste your target sequence into the Sequence field, select not to mask low-complexity regions and Submit Query. Since the E-value cutoff is set rather high by default, you will get a number of low-confidence hits in addition to the actual homologs; the homologs have very low E-values.

All hits that are homologs are potentially suitable templates, but some are more suitable than others. Consider how the coordinate sets differ and which features would make each more or less suitable for creating a homology model: you should consider ...

  • sequence similarity to your target
  • size of expected model (= length of alignment)
  • presence or absence of ligands
  • experimental method and quality of the data set

Sequence similarity is the most important, but we can have the PDB tabulate the other features concisely for this task.

  1. There is a menu to create Reports: select customizable table.
  2. Select (at least) the following information items:
Structure Summary
  • Experimental Method
Sequence
  • Chain Length
Ligands
  • Ligand Name
Biological details
  • Macromolecule Name
Refinement Details
  • Resolution
  • R Work
  • R free
  3. Click: Create report.

Unfortunately you don't get the E-values into the report, and those should strongly influence your final decision. However, in our case the sequences and therefore the E-values of the top three hits are all the same. None of the structures has a bound DNA ligand, but the experimental methods and structure quality are different. Two of the sequences have a longer chain length ... but the additional residues are disordered (otherwise these would be better suited templates; regrettably, you need to check that by hand in the real world, since there is no automatic tool to evaluate disorder and its effects on template choice). In my opinion that leaves pretty much only one unambiguous choice: 1BM8. In case you don't agree, please let me know.

Finally
Click on the 1BM8 ID to navigate to the structure page for the template and save the FASTA sequence to your computer. This is the template sequence.


 


Sequence numbering

 

It is not at all straightforward how to number sequences in such a project. A "natural" numbering starts with the start codon of the full-length protein and goes sequentially from there. However, this does not map exactly to other numbering schemes we have encountered. As you know, the first residue of the APSES domain (as defined by CDD) is not residue 1 of the Mbp1 protein. The first residue of the 1BM8 FASTA file (one of the related PDB structures) is the fourth residue of the Mbp1 protein. The first residue in the structure is GLN 3, therefore Q is the first residue in a FASTA sequence derived from the coordinate section of the PDB file (the ATOM records). In the 1MB1 structure, the original N-terminal amino acids are present in the molecule, therefore they are present in the FASTA file, which starts with MSNQIY..., but they are disordered in the structure and no coordinates are present for M and S. A sequence derived explicitly from the coordinates is therefore different from the reported FASTA sequence, which is really bad because that is what the modeling program has to work with ... and so on. It can get complicated. You need to remember: a sequence number is not absolute, but assigned in a particular context, and you need to be careful about how you assign it.

Fortunately, the numbering for the residues in the coordinate section of our template structure corresponds not to its FASTA sequence, but to the numbering of the gene. Otherwise we would need to renumber the sequence (e.g. by using the bio3D R package). If we did not do this, the sequence numbers in the model might not correspond to the sequence numbers of our target.
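
To make the bookkeeping concrete, here is a small base R sketch that converts between "position in a sequence fragment" and "position in the full-length protein". The N-terminal string is assembled from the statements above (the full-length FASTA starts with MSNQIY..., and the first residue of the 1BM8 FASTA is the fourth residue of Mbp1); the function names are made up for this illustration.

```r
# N-terminus of yeast Mbp1 and the start of the 1BM8 FASTA sequence,
# assembled from the sequences quoted in this assignment.
mbp1Start <- "MSNQIYSARYSGVDVYEFIHSTG"
fragStart <- "QIYSARYSGVDVYEFIHSTG"

# Offset between a position in the fragment and the corresponding position
# in the full-length protein: where does the fragment start?
offset <- regexpr(fragStart, mbp1Start, fixed = TRUE)[1] - 1

# Hypothetical helpers for converting between the two numbering schemes
toGeneNumber     <- function(i) i + offset   # fragment index -> gene numbering
toFragmentNumber <- function(n) n - offset   # gene numbering -> fragment index

toGeneNumber(1)               # gene number of the first residue in the fragment
toFragmentNumber(c(50, 74))   # fragment positions of a region given in gene numbering
```

The same arithmetic applies to any sequence that is a contiguous piece of a longer one; it breaks down as soon as residues are missing in the middle, which is one more reason to check the numbering by eye.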


 


The input alignment

The sequence alignment between target and template is the single most important factor that determines the quality of your model. No comparative modeling process will repair an incorrect alignment; it is useful to consider a homology model more like a three-dimensional map of a sequence alignment than a structure in its own right. In a homology modeling project, typically the largest amount of time should be spent on preparing the best possible alignment. Even though automated servers like the SwissModel server will align sequences and select template structures for you, it would be unwise to use these just because they are convenient. You should take advantage of the much more sophisticated alignment methods available. Analysis of wrong models can't be expected to produce right results.

The best possible alignment is usually constructed from a multiple sequence alignment that includes at least the target and template sequence and other related sequences as well. The additional sequences are an important aid in identifying the correct placement of insertions and deletions. Your alignment should have been carefully reviewed by you and wherever required, manually adjusted to move insertions or deletions between target and template out of the secondary structure elements of the template structure.

In most of the Mbp1 orthologues, we do not observe indels in the APSES domain regions. Evolutionary pressure on the APSES domains has selected against indels in the more than 600 million years these sequences have evolved independently in their respective species. To obtain an alignment between the template sequence and the target sequence from your species, proceed as follows.


 

Task:
Choose one of the following options to align your target and template sequence.


In Jalview...
  • Load your Jalview project with aligned APSES domain sequences or recreate it from the Mbp1 orthologue sequences from the Mbp1 protein orthologs page that I prepared for Assignment 7. Include the sequence of your template protein and re-align.
  • Delete all sequence you no longer need, i.e. keep only the APSES domains of the target (from your species) and the template (from the PDB) and choose Edit → Remove empty columns. This is your input alignment.
  • Choose File→Output to textbox→FASTA to obtain the aligned sequences. They should both have exactly the same length, i.e. N- or C-termini have to be padded by hyphens if the original sequences had different lengths. Save the sequences in a text file.


Using a different MSA program
  • Copy the FASTA formatted sequences of the Mbp1 proteins in the reference species from the Reference APSES domain page.
  • Access e.g. the MSA tools page at the EBI.
  • Paste the Mbp1 sequence set, your target sequence and the template sequence into the input form.
  • Run the alignment and save the output.


Using the EMBOSS explorer
  • Use the needle tool for the alignment ... but remember that pairwise alignments will only be suitable in case the alignment is absolutely unambiguous (such as here). If there are any indels, an MSA will give much more reliable information.


By hand

APSES domains are strongly conserved and have few if any indels. You could also simply align by hand.

  • Copy the CLUSTAL formatted reference alignment of the Mbp1 proteins in the reference species from the Reference APSES domain page.
  • Open a new file in a text editor.
  • Paste the Mbp1 sequence set, your target sequence and the template sequence into the file.
  • Align by hand, replace all spaces with hyphens and save the output.


Whatever method you use: the result should be a two sequence alignment in multi-FASTA format, that was constructed from a number of supporting sequences and that contains your aligned target and template sequence. This is your input alignment for the homology modeling server. For a Schizosaccharomyces pombe model, which I am using as an example here, it looks like this:

>1BM8_A 
QIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRI
LEKEVLKETHEKVQGGFGKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDF
>Mbp1_SCHPO 2-100 NP_593032
AVHVAVYSGVEVYECFIKGVSVMRRRRDSWLNATQILKVADFDKPQRTRV
LERQVQIGAHEKVQGGYGKYQGTWVPFQRGVDLATKYKVDGIMSPILSL
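
Before you submit an input alignment, it is worth checking that it really has the properties described above. The following base R sketch reads the example alignment, confirms that both sequences have the same length and computes the percent identity over aligned (non-gap) columns. parseFasta() is a small helper written for this illustration, not a function from any package, and it assumes that every header line is followed by at least one sequence line.

```r
# The example input alignment from above, one element per line
fastaText <- c(
  ">1BM8_A",
  "QIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRI",
  "LEKEVLKETHEKVQGGFGKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDF",
  ">Mbp1_SCHPO 2-100 NP_593032",
  "AVHVAVYSGVEVYECFIKGVSVMRRRRDSWLNATQILKVADFDKPQRTRV",
  "LERQVQIGAHEKVQGGYGKYQGTWVPFQRGVDLATKYKVDGIMSPILSL")

# Hypothetical helper: collapse multi-FASTA lines into a named character
# vector, one element per sequence, named by the first word of the header.
parseFasta <- function(lines) {
  lines    <- trimws(lines)
  lines    <- lines[nzchar(lines)]          # drop empty lines
  isHeader <- startsWith(lines, ">")
  group    <- cumsum(isHeader)              # which record each line belongs to
  seqs     <- tapply(lines[!isHeader], group[!isHeader], paste, collapse = "")
  seqs     <- as.vector(seqs)
  names(seqs) <- sub("^>(\\S+).*", "\\1", lines[isHeader])
  seqs
}

aln        <- parseFasta(fastaText)
alnLengths <- nchar(aln)
sameLength <- length(unique(alnLengths)) == 1   # required for an alignment

# Compare the two sequences column by column, ignoring gap columns
a <- strsplit(aln[1], "")[[1]]
b <- strsplit(aln[2], "")[[1]]
aligned     <- a != "-" & b != "-"
nGaps       <- sum(!aligned)
pctIdentity <- 100 * sum(a[aligned] == b[aligned]) / sum(aligned)

alnLengths
sameLength
nGaps
round(pctIdentity, 1)
```

To check your own alignment, replace fastaText with readLines() on the text file you saved.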


 

Homology model

 


SwissModel

 

Access the SwissModel server at http://swissmodel.expasy.org and click on Start Modelling. Then, under the Supported Inputs, click on Target-Template Alignment.

Task:

  • Paste your alignment for target and template into the form field. Click on the question mark next to "Supported Inputs" if you are not sure about the format. SwissModel will analyse the sequences and ask you to identify target and template. The YFO sequence is your target. The 1BM8 sequence is the template.
  • Click Validate Target Template Alignment and check that the returned alignment is correct.
  • Click Build Model to start the modeling process.
  • The resulting page returns information about the resulting model. Mouse over the Model 01, open the PDB file and save the coordinates to your computer. Read the information on what is being returned by the server (click on the question mark icon). Study the quality measures.
  • Also save:
    • The output page as pdf (for reference)
    • The modeling report (as pdf)

Model analysis

   

The PDB file

 

Task:
Open your model coordinates in a text-editor (make sure you view the PDB file in a fixed-width font (like "courier") so all the columns line up correctly) and consider the following questions:

  • What is the residue number of the first residue in the model? What should it be, based on the alignment? If the putative DNA binding region was reported to be residues 50-74 in the Mbp1 protein, which residues of your model correspond to that region?
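
If you prefer to answer such questions with a script rather than by eye, note that PDB ATOM records are fixed-width: the residue name is in columns 18-20, the residue number in columns 23-26 and the B-factor in columns 61-66. The sketch below reads these fields with substr(). The three ATOM lines are invented for illustration only (they are built with sprintf() so the columns are guaranteed to line up, and the coordinates and B-factors are made up); in practice you would use readLines() on your model file instead.

```r
# Hypothetical helper that writes one ATOM record in fixed-width PDB layout.
makeAtom <- function(serial, name, resName, chain, resSeq, x, y, z, occ, b) {
  sprintf("ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f",
          serial, name, resName, chain, resSeq, x, y, z, occ, b)
}

# Invented example records (NOT real model coordinates)
pdbLines <- c(
  makeAtom(1L, " N",  "ALA", "A", 1L, 11.104, 6.134, -6.504, 1, 0.62),
  makeAtom(2L, " CA", "ALA", "A", 1L, 11.639, 6.071, -5.147, 1, 0.62),
  makeAtom(3L, " N",  "VAL", "A", 2L, 12.500, 7.000, -4.000, 1, 0.71))

# Keep only ATOM records and cut out the fields by column position
atoms   <- pdbLines[startsWith(pdbLines, "ATOM")]
resName <- trimws(substr(atoms, 18, 20))      # residue name
resSeq  <- as.integer(substr(atoms, 23, 26))  # residue number
bFactor <- as.numeric(substr(atoms, 61, 66))  # B-factor field

resName[1]      # name of the first residue
resSeq[1]       # number of the first residue
range(resSeq)   # first and last residue number in the file
```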


R code: renumbering the model

As you have seen, SwissModel numbers the first residue "1" and does not keep the numbering of the template. We should renumber the model so we can compare the model and the template with the same residue numbers. Fortunately there is a very useful R package that will help us with that.

Task:

  1. Navigate to the bio3D home page. bio3D is not available for installation via CRAN, but needs to be installed from source. Instructions for the different platforms are here: http://thegrantlab.org/bio3d/tutorials/installing-bio3d (follow the instructions and install bio3D for R on your platform).
  2. Explore and execute the following R script. I am assuming that your model is in your working directory; change paths and filenames as required.
# renumberPDB.R

# This is a simple renumbering script that uses the bio3D 
# package. We simply set the first residue number to what it
# should be and renumber all residues based on the first one.
# The script assumes your input PDBfile is in your working
# directory.

# To run this, you must have installed the bio3D R package; instructions
# are here: http://thegrantlab.org/bio3d/tutorials/installing-bio3d

setwd("~/my/working/directory")
PDBin      <- "YFO_model.pdb"
PDBout     <- "YFO_model_ren.pdb"

first <- 4  # residue number that the first residue should have
 
# ================================================
#    Read coordinate file
# ================================================
 
# read PDB file using bio3D function read.pdb()
library(bio3d)
pdb  <- read.pdb(PDBin) # read the PDB file into a list

pdb            # examine the information
pdb$atom[1,]   # get information for the first atom

# you can explore ?read.pdb and study the examples.

# ================================================
#    Change residue numbers
# ================================================


resNum <- as.numeric(pdb$atom[,"resno"])  # get residue numbers for all atoms
resNum <- resNum + (first - resNum[1])         # calculate offset
pdb$atom[,"resno"] <- resNum             # replace old numbers with new
pdb$atom[1,]                                   # check result


# ================================================
#    Write output to file
# ================================================

write.pdb(pdb=pdb,file=PDBout)

# Done. Open the PDB file you have written in a text editor and confirm
# that this has worked.
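
If you want to convince yourself of what the offset calculation in the script does before you run it on your model, you can try it on a toy vector first. This sketch uses only base R; the residue numbers are made up and shiftResidues() is a name I chose for this illustration.

```r
# Hypothetical helper: shift all residue numbers so that the first one
# becomes "first". This is the same arithmetic as in renumberPDB.R.
shiftResidues <- function(resNum, first) {
  resNum + (first - resNum[1])
}

# Toy residue numbers, one entry per atom; note the gap between 2 and 5
toy        <- c(1, 1, 1, 2, 2, 5)
renumbered <- shiftResidues(toy, first = 4)
renumbered
```

Note that every number is shifted by the same amount, so a gap in the original numbering is preserved in the output.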


 

First visualization

 

Since a homology model inherits its structural details from the template, your model of the YFO sequence should look very similar to the original 1BM8 structure.

Task:

  1. Start Chimera and load the model coordinates that you have just renumbered.
  2. From the PDB, also load the template structure. (Use File → Fetch by ID ...)
  3. In the Favourites → Model Panel window you can switch between the two molecules.
  4. Hide the ribbon and choose backbone only → full. You will note that the backbone of the two structures is virtually identical.
  5. Next, choose Actions → Atoms/Bonds → show to display the two molecules in a stick style and note how the sidechains have been modeled. Note especially how sidechain coordinates have been guessed where the template had shorter sidechains than the target. It may be clearer if you hide H-atoms: Select → Chemistry → Element → H and Actions → Atoms/Bonds → hide
  6. Display only residue 50 to 74 to focus on the putative helix-turn-helix domain. Choose Favourites → Sequence, select the residues for one model, then Select → Invert (selected model) and Actions → Atoms/Bonds → hide.
  7. Study the result. A model of the HTH domain of YFO Mbp1.

 
 

Coloring the model by energy

SwissModel calculates energies for each residue of the model with a molecular mechanics forcefield. The SwissModel modeling summary page contains a plot of these energies as a function of sequence number. The values - between 0.0 and 1.0 - are stored in the PDB file's B-factor field.


Task:

  1. Back in Chimera, use the model panel to close the 1BM8 structure.
  2. Choose Tools → Depiction → Render by attribute and select attributes of atoms, Attribute: bfactor, check color atoms and click OK.
  3. Study the result: It seems that residues in the core of the protein have better energies than residues at the surface. Why could that be the case?

Study the options of this window a bit: rendering by attribute is a powerful way to store and depict all manner of information with the molecule. Simply write a little R script that uses bio3D to replace the B-factor or occupancy values with any value you might be interested in: energies, conservation scores, information ... whatever. Then render this property to map it onto the 3D structure of your molecule. If you want to experiment with this a bit, you could apply the information scores from the previous assignment to your model, using a script that is easy to derive from the renumbering R script you have studied above.
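
As a starting point, here is a minimal sketch of the core step: replacing the B-factor of every atom with a per-residue score. To keep it runnable without a coordinate file it works on a small toy data frame. I am assuming that your atom table has one row per atom with columns named resno and b, as pdb$atom does in recent bio3D versions; if your version stores atoms differently, adapt the column access. mapScoresToB() and the scores are made up for this illustration.

```r
# Hypothetical helper: overwrite the "b" column with per-residue scores.
#   atoms:   data frame with columns "resno" and "b", one row per atom
#   scores:  named numeric vector; the names are residue numbers
#   missing: value to use for residues that have no score
mapScoresToB <- function(atoms, scores, missing = 0) {
  b <- scores[as.character(atoms$resno)]   # look up each atom's residue
  b[is.na(b)] <- missing
  atoms$b <- unname(b)
  atoms
}

# Toy atom table (invented values) and toy scores for residues 4 and 5
atoms  <- data.frame(resno = c(4, 4, 5, 5, 6),
                     b     = c(0.5, 0.5, 0.7, 0.7, 0.9))
scores <- c("4" = 1.2, "5" = 3.4)

atoms <- mapScoresToB(atoms, scores)
atoms
```

In a real script you would read the model with read.pdb(), apply the function to pdb$atom, assign the result back and save it with write.pdb(), exactly as in the renumbering script above.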



Links and resources

Altenhoff & Dessimoz (2012) Inferring orthology and paralogy. Methods Mol Biol 855:259-79. (pmid: 22407712)

The distinction between orthologs and paralogs, genes that started diverging by speciation versus duplication, is relevant in a wide range of contexts, most notably phylogenetic tree inference and protein function annotation. In this chapter, we provide an overview of the methods used to infer orthology and paralogy. We survey both graph-based approaches (and their various grouping strategies) and tree-based approaches, which solve the more general problem of gene/species tree reconciliation. We discuss conceptual differences among the various orthology inference methods and databases, and examine the difficult issue of verifying and benchmarking orthology predictions. Finally, we review typical applications of orthologous genes, groups, and reconciled trees and conclude with thoughts on future methodological developments.



 


Footnotes and references


 

Ask, if things don't work for you!

If anything about the assignment is not clear to you, please ask on the mailing list. You can be certain that others will have had similar problems. Success comes from joining the conversation.


