Difference between revisions of "BIO Assignment 3 2011"

From "A B C"
 
(36 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 +
<!-- {{Template:Inactive}} -->
 +
{{Template:Active}}
 +
 +
&nbsp;<br>
 +
 
__TOC__
&nbsp;
+
 
&nbsp;
+
&nbsp;<br>
  
 
<div style="padding: 5px; background: #A6AFD0;  border:solid 1px #AAAAAA; font-size:200%;font-weight:bold;">
 
<div style="padding: 5px; background: #A6AFD0;  border:solid 1px #AAAAAA; font-size:200%;font-weight:bold;">
Assignment 3 - Multiple Sequence Alignment
+
Assignment 3 (last: 2011) - Multiple Sequence Alignment
 
</div>
 
</div>
  
<!-- Please note: This assignment is currently inactive. Unannounced changes may be made at any time.
+
&nbsp;<br>
&nbsp;-->
+
 
 +
{{Template:Preparation|
 +
care=Be sure you have understood all parts of the assignment and cover all questions in your answers! Sadly, we always get assignments back in which people have simply overlooked crucial questions. Sadly, we always get assignments back in which people have not described procedural details. If you did not notice that the above were two different sentences, you are still not reading carefully enough.|
 +
num=3|
 +
ord=third|
 +
due = Monday, November 21. at 12:00}}
 +
 
 +
;Your documentation for the procedures you follow in this assignment will be worth 1 mark.
  
'''Please note: This assignment is currently active. All changes will be announced on the course mailing list.'''
 
  
&nbsp;
+
&nbsp;<br>
  
 
<div style="padding: 2px; background: #F0F1F7;  border:solid 1px #AAAAAA; font-size:125%;color:#444444">
 
<div style="padding: 2px; background: #F0F1F7;  border:solid 1px #AAAAAA; font-size:125%;color:#444444">
 
Introduction
 
Introduction
 +
 +
&nbsp;<br>
 +
 +
;Take care of things, and they will take care of you.
 +
:''Shunryu Suzuki''
 
</div>
 
</div>
  
A carefully done multiple sequence alignment (MSA) is a cornerstone for the annotation of a gene or protein. MSAs combine the information from several related proteins, allowing us to study their essential, shared and conserved properties. They are useful to resolve ambiguities in the precise placement of gaps and to ensure that columns in alignments actually contain amino acids that evolve in a similar context. Therefore we need MSAs as input for  
+
Much of what we know about a protein's physiological function is based on the '''conservation''' of that function as the species evolves. We assess conservation by comparison to related proteins. Conservation - or variability - is a consequence of '''selection under constraints''': the multiple effects on a species' fitness function that are induced through changes to the structural or functional features of a protein. Conservation patterns can thus provide evidence for many different questions: structural conservation among proteins with similar 3D-structures, functional conservation among homologues with comparable roles, peaks of sequence variability that indicate domain boundaries in multi-domain proteins, or amino acid propensities as predictors for protein engineering and design tasks.
*protein homology modeling,
+
 
 +
Measuring conservation requires alignment. Therefore a carefully done multiple sequence alignment (MSA) is a cornerstone for the annotation of the essential properties of a gene or protein. MSAs are also useful to resolve ambiguities in the precise placement of indels and to ensure that columns in alignments actually contain amino acids that evolve in a similar context. MSAs serve as input for  
 +
* functional annotation;
 +
* protein homology modeling;
 
* phylogenetic analyses, and
* sensitive homology searches in databases.
  
Furthermore conservation - or the lack of conservation - reflects the requirements of structural or functional features of our protein, emphasizes domain boundaries in multi-domain proteins and it can guide mutations for protein engineering and design.
 
  
Given the ubiquitous importance of this procedure, it is somewhat surprising that by far the most frequently used algorithm is CLUSTAL, which has been shown to be significantly inferior to more modern approaches for sequences with about 30% identity or less.
+
As a first step, we will explore the search and retrieval of fungal proteins that are orthologous to yeast Mbp1, and of the APSES domains they contain. Each student is being assigned one genome-sequenced fungus. Briefly, you will
 +
 
 +
# Collect sequence identifiers for all APSES domain transcription factors in [[Species list|your assigned species]];
 +
# Retrieve the sequences;
 +
# Perform a multiple sequence alignment with these, and a number of reference domains;
 +
# Edit the alignment and annotate.
 +
 
  
In this assignment we will explore MSAs of the Mbp1 proteins and the APSES domains they contain and discuss several approaches to alignment:
+
Multiple Sequence Alignment is not a solved computational problem, and a significant number of alignment tools exist, each with different strengths and objectives. It is remarkable that by far the most frequently used MSA algorithm is CLUSTAL, a procedure that was first published for the microprocessors of the late 1980s, has been surpassed in performance many times, and has been shown to be significantly inferior to more modern approaches when aligning sequences with 30% identity or less. In this assignment we will encounter various approaches to multiple alignment:
  
 
* A model-based approach (based on the [[Glossary#PSSM| PSSM]] that PSI-BLAST generates)
* A progressive alignment - the CLUSTAL algorithm
+
* Progressive alignments - CLUSTAL and MAFFT
* A consistency-based alignment - T-Coffee or Probcons
+
* Consistency based alignment - T-Coffee and MUSCLE
 +
 
 +
 
 +
<div style="padding: 5px; background: #BDC3DC;  border:solid 1px #AAAAAA;">
 +
==(1) Mbp1 homologues==
 +
</div>
 +
 
 +
 
 +
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
 +
===(1.1) Retrieving sequences===
 +
</div>
 +
 
 +
 
 +
In [[Assignment 2]] you retrieved the protein sequences of ''Saccharomyces cerevisiae'' [http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=6320147 '''Mbp1'''] and defined its APSES (KilA-N) domain. Let us now search for an orthologue of this sequence in ''[[Species list|Your Species]]''. More precisely, you should identify proteins that fulfill the '''Reciprocal Best Match''' criterion.
 +
 
 +
First, we need to '''define the sequence''' you will use to find Mbp1 homologues. Since Mbp1 contains the very widely distributed Ankyrin motifs, a BLAST search with full length sequences will pick up a large number of Ankyrin-repeat containing proteins that are otherwise unrelated to our query. We will instead search for homologues using only the APSES domain as a query. However, the Pfam definition of the APSES domain (or KilA-N family, as it is now called) does not cover the entire length of the domain that has been crystallized. Therefore, we will use the sequence of the crystallized protein instead of the Pfam alignment. One of the results of our analysis will be '''whether APSES domains in fungi all have the same length as the Mbp1 domain, or whether some are indeed much shorter, as suggested by the Pfam alignment.''' To remind you, here is the full sequence of the [http://www.pdb.org/pdb/explore/derivedData.do?structureId=1MB1 1MB1 structure] (Note that the C-terminal His<sub>6</sub> tag that has been added for purification is not part of the Mbp1 protein sequence.) ...
 +
 
 +
 
 +
>PDB:1MB1
 +
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPL
 +
NIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDHHHHHH
 +
 
 +
 
 +
... and, for comparison, this is the corresponding alignment with the Pfam KilA-N model obtained from a '''[http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi RPS-BLAST]''' search of the above sequence against the '''[http://www.ncbi.nlm.nih.gov/cdd/ CDD database]''':
 +
 
 +
 
 +
<span style="color:#700777;">                          10        20        30        40        50        60        70        80</span>
 +
<span style="color:#700777;">                  ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|</span>
 +
<b>1MB1</b>          <span style="color:#229922;"> 19 </span><span style="color:#2233cc;">IHSTGS</span><span style="color:#ff4466;">I</span><span style="color:#2233cc;">MK</span><span style="color:#ff4466;">R</span><span style="color:#2233cc;">K</span><span style="color:#ff4466;">KD</span><span style="color:#2233cc;">DWV</span><span style="color:#ff4466;">NAT</span><span style="color:#2233cc;">HIL</span><span style="color:#ff4466;">KAA</span><span style="color:#2233cc;">NFA</span><span style="color:#ff4466;">K</span><span style="color:#888888;">a</span><span style="color:#2233cc;">KRTRI</span><span style="color:#ff4466;">L</span><span style="color:#2233cc;">EK</span><span style="color:#ff4466;">E</span><span style="color:#2233cc;">VL</span><span style="color:#ff4466;">KE</span><span style="color:#2233cc;">TH</span><span style="color:#ff4466;">E</span><span style="color:#2233cc;">KVQ</span><span style="color:#888888;">----------------</span><span style="color:#ff4466;">G</span><span style="color:#2233cc;">GF</span><span style="color:#ff4466;">G</span><span style="color:#2233cc;">KY</span><span style="color:#ff4466;">QGT</span><span style="color:#2233cc;">W</span><span style="color:#ff4466;">V</span><span style="color:#2233cc;">PLNI</span> <span style="color:#229922;">82</span>
 +
 +
Cdd:pfam04383  <span style="color:#229922;">  3 </span><span style="color:#2233cc;">YNDFEI</span><span style="color:#ff4466;">I</span><span style="color:#2233cc;">IR</span><span style="color:#ff4466;">R</span><span style="color:#2233cc;">D</span><span style="color:#ff4466;">KD</span><span style="color:#2233cc;">GYI</span><span style="color:#ff4466;">NAT</span><span style="color:#2233cc;">KLC</span><span style="color:#ff4466;">KAA</span><span style="color:#2233cc;">GAT</span><span style="color:#ff4466;">K</span><span style="color:#888888;">-</span><span style="color:#2233cc;">RFRNW</span><span style="color:#ff4466;">L</span><span style="color:#2233cc;">RL</span><span style="color:#ff4466;">E</span><span style="color:#2233cc;">ST</span><span style="color:#ff4466;">KE</span><span style="color:#2233cc;">LI</span><span style="color:#ff4466;">E</span><span style="color:#2233cc;">ELS</span><span style="color:#888888;">kennidvliievenkk</span><span style="color:#ff4466;">G</span><span style="color:#2233cc;">KN</span><span style="color:#ff4466;">G</span><span style="color:#2233cc;">RL</span><span style="color:#ff4466;">QGT</span><span style="color:#2233cc;">Y</span><span style="color:#ff4466;">V</span><span style="color:#2233cc;">HPDL</span> <span style="color:#229922;">81</span>
 +
 +
 +
<span style="color:#700777;">                          90</span>
 +
<span style="color:#700777;">                  ....*....|....*</span>
 +
<b>1MB1</b>          <span style="color:#229922;"> 83 </span><span style="color:#ff4466;">A</span><span style="color:#2233cc;">KQL</span><span style="color:#ff4466;">A</span><span style="color:#888888;">----</span><span style="color:#2233cc;">EK</span><span style="color:#ff4466;">F</span><span style="color:#2233cc;">SVY</span> <span style="color:#229922;">93</span>
 +
 +
Cdd:pfam04383  <span style="color:#229922;"> 82 </span><span style="color:#ff4466;">A</span><span style="color:#2233cc;">LAI</span><span style="color:#ff4466;">A</span><span style="color:#888888;">swis</span><span style="color:#2233cc;">PE</span><span style="color:#ff4466;">F</span><span style="color:#2233cc;">ALK</span> <span style="color:#229922;">96</span>
 +
 
 +
 
 +
As you can see, the Pfam alignment is 18 amino acids shorter at the N-terminus and 31 amino acids shorter at the C-terminus.
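The two numbers above can be checked directly from the sequence and the alignment coordinates. The following is a minimal sketch, not part of the original assignment: the function names are hypothetical, and the alignment range 19 to 93 is read off the RPS-BLAST output shown above.

```python
# Sequence of the 1MB1 construct as listed above, including the C-terminal His-tag.
SEQ_1MB1 = (
    "MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPL"
    "NIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDHHHHHH"
)
HIS_TAG = "HHHHHH"  # purification tag, not part of the Mbp1 sequence


def strip_his_tag(seq, tag=HIS_TAG):
    """Return seq without a trailing His-tag (hypothetical helper)."""
    return seq[: -len(tag)] if seq.endswith(tag) else seq


def uncovered_residues(seq_len, aln_start, aln_end):
    """Residues before and after a 1-based, inclusive aligned range."""
    return aln_start - 1, seq_len - aln_end


domain = strip_his_tag(SEQ_1MB1)
# 19 and 93 are the first and last 1MB1 positions in the Pfam alignment above.
n_missing, c_missing = uncovered_residues(len(domain), 19, 93)
print(len(domain), n_missing, c_missing)
```

With the His-tag removed the construct has 124 residues, and the sketch reproduces the 18 N-terminal and 31 C-terminal residues that the Pfam model does not cover.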
 +
 
 +
 
 +
;Find APSES domain proteins in your species:
 +
 
 +
<div style="padding: 5px; background: #EEEEEE;">
 +
#Access the [[Species list|species list]] and identify the species that has been assigned to you.
 +
#Navigate to the [http://www.ncbi.nlm.nih.gov '''NCBI's main page'''].
 +
#In the left-hand menu of links, follow the link to [http://www.ncbi.nlm.nih.gov/guide/genomes-maps/ '''Genomes &amp; Maps'''].
 +
#Under the '''Databases''' tab, follow the link to [http://www.ncbi.nlm.nih.gov/genome '''Genome'''].
 +
#In the '''Genome tools''' section of that page, follow the link to [http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?species=euk '''Genomic groups BLAST'''].
 +
#Click on the link to the '''eukaryotic''' genomes tree, then on the link for the '''text table'''. This produces a BLAST interface to a list of species for which whole genomes have been sequenced, annotated and entered into the various databases.
 +
#Paste the FASTA sequence of the structurally defined Mbp1 APSES domain (e.g. from [http://www.pdb.org/pdb/explore/derivedData.do?structureId=1MB1 1MB1]) into the search field (excluding the His-tag, of course), set the parameters correctly for a '''Protein''' search against '''Protein''' sequences using '''blastp'''. Then find your [[Species list|assigned species]] in the table and check the box next to its name. Remember to record the parameters for your search. I expect you to understand which parameters would be needed in order to make this search reproducible. Run the search.
 +
#On the next screen, check the box next to '''Format for: PSI-BLAST'''. Then click on '''View report''' to show the results of the first PSI-BLAST iteration.
 +
#Run subsequent iterations of PSI-BLAST simply by clicking on '''Go''' after checking the sequences that have been included.
 +
#Iterate the PSI-BLAST search until convergence (i.e. until no more '''new''' sequences are added); make sure to include only sequences for which the E-value is small (smaller than about 10e-03 should be safe). Sequences with borderline E-values that improve significantly in an iteration are probably homologues. Sequences with borderline E-values that do not improve much, or for which the E-value increases, are probably not homologues.  If this step does not work for you or the results are not what you expect, please contact your TA right away.
 +
 
 +
*Note: Please spend a little time on each page to understand its contents. <small>Ask, if the page contains resources or features you don't understand. Think about what you are doing. If you simply click on the links I provide, you will miss the opportunity to understand how the resources fit into the workflow you are working on, and to be able to execute similar processes yourself. Questions on page contents can potentially appear on quizzes and exam.</small>
 +
</div>
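The inclusion and convergence logic of the iterated search can be sketched in a few lines. This is only an illustration of the bookkeeping, not of PSI-BLAST itself: the function names and the hit identifiers are hypothetical, and the cutoff simply repeats the value suggested in the procedure above.

```python
# Cutoff as written in the procedure above (10e-03 equals 0.01); adjust as needed.
E_VALUE_CUTOFF = 10e-03


def select_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Return the identifiers whose E-value is below the cutoff.

    hits maps a sequence identifier to its E-value in the current iteration.
    """
    return {seq_id for seq_id, e_value in hits.items() if e_value < cutoff}


def has_converged(previous_ids, current_ids):
    """The search has converged when an iteration adds no new sequences."""
    return not (set(current_ids) - set(previous_ids))


# Hypothetical hit tables for two consecutive iterations.
iteration_1 = {"hit_A": 1e-40, "hit_B": 2e-05, "hit_C": 0.5}
iteration_2 = {"hit_A": 1e-45, "hit_B": 1e-12, "hit_C": 0.8}

included_1 = select_hits(iteration_1)
included_2 = select_hits(iteration_2)
print(sorted(included_2), has_converged(included_1, included_2))
```

In this made-up example hit_B improves markedly between iterations, which the text above treats as evidence of homology, while hit_C gets worse and stays excluded.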
 +
 
 +
 
 +
Familiarize yourself with the '''output form''' you obtain; this is by far the most frequently used bioinformatics result page. You may want to refer to the [http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new_view.html NCBI explanation].
 +
 
 +
Here is a list of things to look for, all of which I expect you to know and understand. (However you do not need to comment on these points in your submission.)
 +
 
 +
;On the alignment image:
 +
*What do the different colored bars mean?
 +
*What information do you get when you "mouse-over" a colored bar on the alignment image?
 +
*What happens when you click on one of the bars?
 +
 
 +
;In the description list:
 +
*Where does the link next to an identifier take you?
 +
*Where does the link in the "score" column take you?
 +
*What does the icon at the end of each row mean? What other icons could appear there? <!-- cf. [http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new_view.html] -->
 +
 
 +
;In the alignment section:
 +
*What do the alignment metrics mean:
 +
**Score?
 +
**Expect (E-value)?
 +
**Identities?
 +
**Positives?
 +
**Gaps?
 +
*What is the alignment length?
 +
*Which sequence is labeled '''Query''' and which one is labeled '''Sbjct'''?
 +
 
 +
 
 +
;Next
 +
:retrieve the sequences that have E-values low enough to make you conclude they contain APSES domain homologues.
 +
 
 +
<div style="padding: 5px; background: #EEEEEE;">
 +
 
 +
#Review the sequences you have found: they should all be significantly similar to the query profile. In some of the assigned species you will find one hit for each distinct sequence in the genome, in others, you will find several versions of essentially the same gene (e.g. refseq and other accession numbers).
 +
#Explore the relationship between the hits by clicking on '''select all sequences''', then choosing '''Distance tree of results''' at the top or bottom of your search results to visualize a tree representation of similarity. Highly similar sequences will be collapsed into the same node in the distance tree; you can expand those nodes to list all the node's members.
 +
#Identify '''one''' representative for each distinct protein you have found. If possible, use proteins with refseq identifiers. Avoid duplicates or nearly identical variants. If there are length differences, use the longer version (shorter versions may contain only partial sequences). Click on the checkbox next to each protein you have identified.
 +
#Click on '''get selected sequences''' at the top or bottom of the page. Note and record the GIs for your sequences that are listed in the ''Search details'' box; you can use them to easily reproduce your results by pasting them into any Entrez search. Also note the URL that this has produced (in your browser's URL bar). As you see, you can retrieve a list of sequences from NCBI simply by adding a list of comma-separated GI numbers to the [http://www.ncbi.nlm.nih.gov/protein/ URL of the protein database].
 +
#Click on '''Display settings''' and choose '''FASTA (text)'''.
 +
 
 +
<small>If you want, for comparison, you can run a multiple alignment with an NCBI-developed MSA tool: '''COBALT'''. On the sequence list page, in the right-hand column, in the section '''Analyze these sequences''', click on '''Align sequences with COBALT'''. It is a convenient way to get a quick first look at an alignment of NCBI retrieved sequences.</small>
 +
</div>
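The URL pattern mentioned in the retrieval steps (protein database URL followed by comma-separated GI numbers) is easy to construct programmatically when documenting a search. A minimal sketch, assuming only the pattern described above; the function name is hypothetical, 6320147 is the yeast Mbp1 GI linked earlier on this page, and the second GI is a made-up placeholder.

```python
PROTEIN_DB_URL = "http://www.ncbi.nlm.nih.gov/protein/"


def protein_list_url(gi_numbers):
    """Build a protein database URL for a list of GI numbers."""
    return PROTEIN_DB_URL + ",".join(str(gi) for gi in gi_numbers)


# 6320147: yeast Mbp1 (from the link above); 1234567: placeholder, not a real hit.
print(protein_list_url([6320147, 1234567]))
```

Recording the GI list together with the generated URL in your notes is one way to make the retrieval step reproducible.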
 +
 
 +
You now have a collection of APSES domain-containing homologues in your organism. There are two more tasks we need to address before we can compute alignments and analyze them. (A) we need to rename our sequences, and (B) we need to define the boundaries of their APSES domains.
 +
 
 +
 
  
 +
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
  
<div style="padding: 2px; background: #F0F1F7;  border:solid 1px #AAAAAA; font-size:125%;color:#444444">
+
===(1.2) Renaming Sequences===
Preparation, submission and due date
 
 
</div>
 
</div>
  
Please read carefully. Be sure you have understood all parts of the assignment and cover all questions in your answers! Sadly, we always get assignments back in which people have simply overlooked crucial questions. Sadly, we always get assignments back in which people have not described procedural details. If you did not notice that the above were two different sentences, you are still not reading carefully enough.
+
A phylogenetic tree or multiple alignment is not really informative if it displays GI numbers or other abstract identifiers as labels of rows or nodes. The relationship between species is fundamental to the variation we observe and we need to make this relationship explicit.
 +
 
 +
Imagine that the rows in an MSA were completely unlabeled, or that the nodes in a tree were just circles: we would have a very hard time relating the computed relationships back to the biology they represent. Abstract identifiers like <tt>NP_010227</tt> are not much better.
 +
 
 +
Typically, the information that programs use to label sequences is taken from the FASTA header. This provides us with an easy way to make sure they display information that we need and can interpret. Such programs will usually use the first few (often ten) characters they find. We will therefore design short strings that identify potential gene family relationships as well as species.
 +
 
 +
 
 +
;Species codes
 +
 
 +
The scientific name of a species is formed according to Linnaean [http://en.wikipedia.org/wiki/Binomial_nomenclature binomial nomenclature] and Swissprot has for a long time condensed species names into mnemonic five-character codes, taking the first three from the [http://en.wikipedia.org/wiki/Genus genus name] and the last two from the [http://en.wikipedia.org/wiki/Specific_name specific name]. For example ''Saccharomyces cerevisiae'' is abbreviated as <tt>SACCE</tt> and ''Lachancea thermotolerans'' is <tt>LACTH</tt>. For the most part, this creates unique strings that are good mnemonic labels for the species. I have added these "codes" to the [[Species list]].
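The rule described above (first three letters of the genus name plus first two letters of the specific name) can be written as a one-line function. This is a sketch of the rule as stated here only; the function name is hypothetical, and no attempt is made to resolve the occasional collisions between species.

```python
def species_code(binomial):
    """Condense a binomial species name into a five-character mnemonic code."""
    genus, specific = binomial.split()[:2]
    return (genus[:3] + specific[:2]).upper()


for name in ("Saccharomyces cerevisiae", "Lachancea thermotolerans"):
    print(name, "->", species_code(name))
```

For the two examples in the text this yields SACCE and LACTH.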
 +
 
 +
 
 +
;Gene families
 +
Most yeast genes have traditional names, like mbp1 or sok2. These names are convenient family labels since ''Saccharomyces cerevisiae'' is one of the best studied [http://en.wikipedia.org/wiki/Model_organism model organisms]. Therefore, once we identify a protein family that includes a yeast gene, we can easily access expert knowledge in textbooks or manuscripts. Of course, such labels are arbitrary - whether we call a gene '''Mbp1''' or '''WXYZ''' makes no difference - as long as all genes that we presume to be family members carry the same label.  For higher eukaryotes, I would probably choose human gene names as a reference point, for bacteria I would choose ''E. coli''.
  
Prepare a Microsoft Word document with a title page that contains:
+
To define which gene belongs into which family, we can align all newly found genes with all yeast APSES domain homologues, to find out which ones they are most similar to. This creates common family labels.  We can use these as provisional family names for the encoded proteins, even though we may want to revise them once we have mapped out explicit phylogenetic trees.
*your full name
 
*your Student ID
 
*your e-mail address
 
*the organism name you have been [[Organism_list_2006|assigned]]
 
  
Follow the steps outlined below. You are encouraged to  write your answers in short answer form or point form, '''like you would document an analysis in a laboratory notebook'''. However, you must
 
*document what you have done,
 
*note what Web sites and tools you have used,
 
*paste important data sequences, alignments, information etc.
 
  
'''If you do not document the process of your work, we will deduct marks.'''  Try to be concise, not wordy! Use your judgement: are you giving us enough information so we could exactly reproduce what you have done? If not, we will deduct marks. Avoid RTF and unnecessary formatting. Do not paste screendumps. Keep the size of your submission '''below 1.5 MB'''.
+
;Identifying APSES domains (general procedure).
 +
In order to identify the APSES domain boundaries, you can simply run a multiple sequence alignment of the structurally defined APSES domain sequence (e.g. taken from PDB-ID 1MB1) against all sequences you have found. The boundaries of the aligned APSES domain then define the domain boundaries in the aligned proteins.
  
Write your answers into separate paragraphs and give each its title. Save your document with a filename of:
 
<code>A3_family name.given name.doc</code>
 
<small>(for example my submission would be named: A3_steipe.boris.doc - and don't switch the order of your given name and family name please!)</small>
 
  
Finally e-mail the document to [mailto:boris.steipe@utoronto.ca boris.steipe@utoronto.ca] before the due date.
+
;Identifying family relationships (in the same run)
 +
However, for efficiency, we can also determine '''family relationships''' in the same alignment that we use to define domain boundaries, if we simply include '''all''' yeast APSES domains in the MSA. Then we can judge similarity simply from examining the guide tree of the alignment and label the families accordingly. This has the added advantage that the domain boundaries are more securely defined, since we include more sequence information into the alignment.
  
Your document must not contain macros. Please turn off and/or remove all macros from your Word document; we will disable macros, since they pose a security risk.
+
;Proceed as follows.
  
With the number of students in the course, we have to economize on processing the assignments. '''Thus we will not accept assignments that are not prepared as described above.''' If you have technical difficulties, contact me.
+
<div style="padding: 5px; background: #EEEEEE;">
 +
#Open the [http://www.ebi.ac.uk/Tools/muscle/ Muscle MSA input page] at the EBI.
 +
#Access the [[APSES domains (yeast)|Yeast APSES domain collection]] I have prepared and copy the FASTA sequences. Paste them into the sequence field of the MUSCLE program input form.
 +
#Copy the FASTA sequences of the full length APSES domain protein sequence collection from your PSI-BLAST search (above) and paste them into the MUSCLE input form as well.
 +
#Set the following parameters:
  
'''The due date for the assignment is Thursday, December 7. at 24:00 (last day of class). In case you need more time since the assignment was posted late, an extension is automatically granted to Friday, December 8. at 10:00 in the morning.'''
+
OUTPUT FORMAT: CLUSTALW2
 +
OUTPUT TREE: from second iteration
 +
OUTPUT ORDER: aligned
  
<div style="padding: 2px; background: #F0F1F7;  border:solid 1px #AAAAAA; font-size:125%;color:#444444">
+
#Click on Submit.
Grading
 
 
</div>
 
</div>
  
Don't wait until the last day to find out there are problems! Assignments that are received past the due date will have one mark deducted at the first minute of every twelve hour period past the due date. Assignments received more than 5 days past the due date will not be assessed.
 
  
Marks are noted below in the section headings of the tasks. A total of 10 marks will be awarded, if your assignment answers all of the questions. A total of 2 bonus marks (up to a maximum of 10 overall) can be awarded for particularly interesting findings, or insightful comments. A total of 2 marks can be subtracted for lack of form or for glaring errors. The marks you receive will  
+
The output should show the MSA. The overlap of the yeast APSES domains with your sequences defines the domain boundaries. Moreover, a tree has been calculated and you can view the tree to identify family relationships.
* count directly towards your final marks at the end of term, for BCH441 (undergraduates), or
+
 
* be divided by two for BCH1441 (graduates).
+
;Visualize the alignment tree and decide on names
 +
 
 +
<div style="padding: 5px; background: #EEEEEE;">
 +
Click on the link to the Guide tree. The tree is given in the so-called Newick tree format and there are a large number of online tree viewers to visualize such trees. The MUSCLE form will display one tree for you.
 +
 
 +
<small>You could also navigate (for example) to the [http://www.proweb.org/treeviewer/ proWeb Tree viewer] and paste the tree data into the '''User-supplied Newick Tree''' input field. Choose any graphics format your browser can handle (JPEG is a pretty safe bet) and click on '''View tree'''.</small>
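Newick is a plain-text format, so the sequence labels in a guide tree can also be listed without a viewer. The following is a rough sketch that assumes simple, unquoted labels; the tree string is a made-up example, not MUSCLE output, and the function name is hypothetical.

```python
import re


def newick_leaf_labels(newick):
    """List leaf labels of a Newick string (assumes simple, unquoted labels).

    A leaf label follows an opening parenthesis or a comma and runs up to
    the next structural character.
    """
    return [label.strip() for label in re.findall(r"[(,]\s*([^(),:;]+)", newick)]


# Hypothetical guide tree with branch lengths.
example_tree = "((MBP1_SACCE:0.1,MBP1_ASPFU:0.2):0.3,SOK2_SACCE:0.5);"
print(newick_leaf_labels(example_tree))
```

For real analyses a dedicated tree library is the safer choice, since quoted labels and comments are legal in Newick and are not handled here.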
 +
 
 +
 
 +
#Interpret the tree to decide on the protein family names for your sequences:
 +
##If a yeast protein is grouped with exactly one of your proteins, your protein gets the same name.
 +
##If a yeast protein is grouped with more than one of your proteins, replace the number in the yeast protein with a, b, c ..., from most similar to least similar for your protein. For example: if one Aspergillus fumigatus protein is most similar to yeast Mbp1, you will give it the name MBP1_ASPFU. If two proteins are both most similar to yeast Sok2, you will name them SOKA_ASPFU and SOKB_ASPFU. Try to get it approximately right but remember that this is a process of estimation - we are not accurately measuring distances (yet).
 +
 
 +
That done, edit your FASTA headers and save your APSES domain sequence set. We will need them for the next assignment.
 +
 
 +
</div>
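The naming rules above can be summarized in code. This is a minimal sketch under stated assumptions: the function name is hypothetical, the proteins are assumed to be already ordered from most to least similar to the yeast protein, and at most 26 proteins per family are supported.

```python
import re


def family_labels(yeast_name, n_proteins, code):
    """Return labels for n_proteins grouped with one yeast protein.

    One protein keeps the yeast name; several proteins have the trailing
    number of the yeast name replaced by A, B, C ... in order of similarity.
    """
    base = yeast_name.upper()
    if n_proteins == 1:
        return [f"{base}_{code}"]
    stem = re.sub(r"\d+$", "", base)  # drop the number, e.g. SOK2 -> SOK
    return [f"{stem}{chr(ord('A') + i)}_{code}" for i in range(n_proteins)]


print(family_labels("Mbp1", 1, "ASPFU"))
print(family_labels("Sok2", 2, "ASPFU"))
```

The two calls reproduce the examples from the text: MBP1_ASPFU for a single Mbp1-like protein, and SOKA_ASPFU and SOKB_ASPFU for two Sok2-like proteins. The resulting strings are ten characters long, which fits the label length many alignment programs read from the FASTA header.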
 +
 
  
&nbsp;
 
&nbsp;
 
  
 
<div style="padding: 5px; background: #BDC3DC;  border:solid 1px #AAAAAA;">
 
<div style="padding: 5px; background: #BDC3DC;  border:solid 1px #AAAAAA;">
==(1) Retrieve==
+
 
 +
==(2) Align and Annotate==
 +
</div>
 +
 
 +
&nbsp;<br>
 +
 
 +
 
 +
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
 +
===(2.1) Review of domain annotations===
 +
</div>
 +
 
 +
APSES domains are relatively easy to identify and annotate but we have had problems with the ankyrin domains in Mbp1 homologues. Both CDD and SMART have identified such domains, but while the domain model was based on the same Pfam profile for both, and both annotated approximately the same regions, the details of the alignments and the extent of the predicted region were different.
 +
 
 +
[http://www.yeastgenome.org/cgi-bin/locus.fpl?locus=mbp1 Mbp1] forms heterodimeric complexes with a homologue, [http://www.yeastgenome.org/cgi-bin/locus.fpl?locus=swi6 Swi6]. Swi6 does not have an APSES domain, thus it does not bind DNA. But it is similar to Mbp1 in the region spanning the ankyrin domains and in 1999 [http://www.ncbi.nlm.nih.gov/pubmed/10048928 Foord ''et al.''] published its crystal structure ([http://www.rcsb.org/pdb/cgi/explore.cgi?pdbId=1SW6 1SW6]). This structure is a good model for Ankyrin repeats in Mbp1. For details, please refer to the consolidated [[Mbp1 annotation|Mbp1 annotation page]] I have prepared.
 +
 
 +
In what follows, we will use the program JALVIEW, a Java-based multiple sequence alignment editor, to load and align sequences and to consider structural similarity between yeast Mbp1 and its closest homologue in your organism.
 +
 
 +
In this part of the assignment,
 +
 
 +
#You will load sequences that are most similar to Mbp1 into an MSA editor;
 +
#You will add sequences of ankyrin domain models;
 +
#You will perform a multiple sequence alignment;
 +
#You will try to improve the alignment manually;
 +
<!-- Finally you will consider if the Mbp1 APSES domains could extend beyond the section of homology with Swi6 -->
 +
 
 +
 
 +
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
 +
 
 +
===(2.2) Jalview, loading sequences===
 +
</div>
 +
 
 +
Geoff Barton's lab in Dundee has developed an integrated MSA editor and sequence annotation workbench with a number of very useful functions. It is written in Java and should run on Mac, Linux and Windows platforms without modifications. We will use this tool for this assignment and explore its features as we go along.
 +
 
 +
<div style="padding: 5px; background: #EEEEEE;">
 +
#Navigate to the [http://www.jalview.org/ Jalview homepage], click on '''Download''', install Jalview on your computer and start it. A number of windows that showcase the program's abilities will load; you can close these.
 +
#Prepare homologous Mbp1 sequences for alignment:
 +
##Find the sequence in your assigned species that fulfills the Reciprocal Best Match criterion with yeast Mbp1.
 +
##Open the [[Mbp1 RBM reference sequences]] page.
 +
##Copy the FASTA sequences of the reference proteins, return to Jalview and select File &rarr; Input Alignment &rarr; from Textbox and paste the sequences into the textbox.
 +
##Also paste a FASTA sequence of your species' Mbp1 protein into the window.
 +
##Finally copy the sequences for ankyrin domain models (below) and paste them into the Jalview textbox as well. Paste two separate copies of the CD00204 consensus sequence and one copy of 1SW6.
 +
##When all the sequences are present, click on New Window. Jalview gives you all the sequences, but of course this is not yet an alignment.
 +
 
 +
</div>
 +
 
 +
;Ankyrin domain models
 +
>CD00204 ankyrin repeat consensus sequence from CDD
 +
NARDEDGRTPLHLAASNGHLEVVKLLLENGADVNAKDNDGRTPLHLAAKNGHLEIVKLLL
 +
EKGADVNARDKDGNTPLHLAARNGNLDVVKLLLKHGADVNARDKDGRTPLHLAAKNGHL
 +
 
 +
>1SW6 from PDB - unstructured loops replaced with xxxx
 +
GPIITFTHDLTSDFLSSPLKIMKALPSPVVNDNEQKMKLEAFLQRLLFxxxxSFDSLLQE
 +
VNDAFPNTQLNLNIPVDEHGNTPLHWLTSIANLELVKHLVKHGSNRLYGDNMGESCLVKA
 +
VKSVNNYDSGTFEALLDYLYPCLILEDSMNRTILHHIIITSGMTGCSAAAKYYLDILMGW
 +
IVKKQNRPIQSGxxxxDSILENLDLKWIIANMLNAQDSNGDTCLNIAARLGNISIVDALL
 +
DYGADPFIANKSGLRPVDFGAG
 +
 
 +
 
 +
 
 +
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
 +
 
 +
===(2.3) Computing alignments===
 
</div>
 
</div>
&nbsp;
 
&nbsp;
 
  
In [[Assignment 2]] you retrieved the ''Saccharomyces cerevisiae'' [http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=6320147 '''Mbp1'''] protein sequence. Our first task is to compile a multi-FASTA file for all Mbp1 orthologues. First we need to define which sequences we are talking about. Then we need to retrieve them from the database.
+
Sequence alignments can be calculated directly from Jalview.
 +
 
 +
<div style="padding: 5px; background: #EEEEEE;">
 +
#In Jalview, select '''Web Service &rarr; Alignment &rarr; MAFFT Multiple Protein Sequence Alignment'''. The alignment is calculated in a few minutes and displayed in a new window.
 +
#Choose '''Colour &rarr; Hydrophobicity''' and '''&rarr; by Conservation'''. Then select '''Modify Conservation Threshold...''' and adjust the slider left or right to see which columns are highly conserved. You will notice that the Swi6 sequence that was supposed to align only to the ankyrin domains was in fact aligned to other parts of the sequence as well. This is one part of the MSA that we will have to correct manually and a common problem when aligning sequences of different lengths.
 +
#Other alignment algorithms are available and you may wish to explore whether the alignments differ significantly.
 +
</div>
  
&nbsp;
 
  
 
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
 
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
===(1.1)  Mbp1 orthologues (1 mark)===
+
===(2.4) Editing ankyrin domain alignments===
 
</div>
 
</div>
&nbsp;<br>
 
  
  
In your second assignment, you used BLAST to find the best matches to the yeast Mbp1 protein in your assigned organism's genome. Since there was some variation in the sequences you reported, I have generated a list ''de novo'' using the following procedure:
+
A '''good''' MSA comprises only columns of residues that play similar roles in the proteins' mechanism and/or that evolve in a comparable structural context. Since it is a result of biological selection and conservation, it has relatively few indels and the indels it has are usually not placed into elements of secondary structure or into functional motifs. The contiguous features annotated for Mbp1 are expected to be left intact by a good alignment.
 +
 
 +
A '''poor''' MSA has many errors in its columns; these contain residues that actually have different functions or structural roles, even though they may look similar according to a (pairwise!) scoring matrix. A poor MSA also may have introduced indels in biologically irrelevant positions, to maximize spurious sequence similarities. Some of the features annotated for Mbp1 will be disrupted in a poor alignment and residues that are conserved may be placed into different columns.
 +
 
 +
Often errors or inconsistencies are easy to spot, and manually editing an MSA is not generally frowned upon, even though this is not a strictly objective procedure. The main goal of manual editing is to make an alignment biologically more plausible. Most commonly this means minimizing the number of rare evolutionary events that the alignment suggests and/or emphasizing conservation of known functional motifs. Here are some examples of what one might aim for in manually editing an alignment:
 +
 
 +
;Reduce number of indels
 +
From a Probcons alignment:
 +
0447_DEBHA    ILKTE-K<span style="color: rgb(255, 0, 0);">-</span>T<span style="color: rgb(255, 0, 0);">---</span>K--SVVK      ILKTE----KTK---SVVK
 +
9978_GIBZE    MLGLN<span style="color: rgb(255, 0, 0);">-</span>PGLKEIT--HSIT      MLGLNPGLKEIT---HSIT
 +
1513_CANAL    ILKTE-K<span style="color: rgb(255, 0, 0);">-</span>I<span style="color: rgb(255, 0, 0);">---</span>K--NVVK      ILKTE----KIK---NVVK
 +
6132_SCHPO    ELDDI-I<span style="color: rgb(255, 0, 0);">-</span>ESGDY--ENVD      ELDDI-IESGDY---ENVD
 +
1244_ASPFU    ----N<span style="color: rgb(255, 0, 0);">-</span>PGLREIC--HSIT  -&gt;  ----NPGLREIC---HSIT
 +
0925_USTMA    LVKTC<span style="color: rgb(255, 0, 0);">-</span>PALDPHI--TKLK      LVKTCPALDPHI---TKLK
 +
2599_ASPTE    VLDAN<span style="color: rgb(255, 0, 0);">-</span>PGLREIS--HSIT      VLDANPGLREIS---HSIT
 +
9773_DEBHA    LLESTPKQYHQHI--KRIR      LLESTPKQYHQHI--KRIR
 +
0918_CANAL    LLESTPKEYQQYI--KRIR      LLESTPKEYQQYI--KRIR
 +
 
 +
<small>Gaps marked in red were moved. The sequence similarity in the alignment does not change considerably; however, the total number of indels in this excerpt is reduced to 13 from the original 22.</small>
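The indel counts quoted for this excerpt can be reproduced by counting each run of consecutive gap characters in a row as one indel. The sketch below assumes that definition, which includes leading gap runs; the function name <code>count_indels</code> is hypothetical, and the rows are copied from the excerpt with the colour markup removed.

```python
import re


def count_indels(rows):
    """Count runs of consecutive '-' characters, summed over all rows."""
    return sum(len(re.findall(r"-+", row)) for row in rows)


before = [
    "ILKTE-K-T---K--SVVK",  # 0447_DEBHA
    "MLGLN-PGLKEIT--HSIT",  # 9978_GIBZE
    "ILKTE-K-I---K--NVVK",  # 1513_CANAL
    "ELDDI-I-ESGDY--ENVD",  # 6132_SCHPO
    "----N-PGLREIC--HSIT",  # 1244_ASPFU
    "LVKTC-PALDPHI--TKLK",  # 0925_USTMA
    "VLDAN-PGLREIS--HSIT",  # 2599_ASPTE
    "LLESTPKQYHQHI--KRIR",  # 9773_DEBHA
    "LLESTPKEYQQYI--KRIR",  # 0918_CANAL
]

after = [
    "ILKTE----KTK---SVVK",
    "MLGLNPGLKEIT---HSIT",
    "ILKTE----KIK---NVVK",
    "ELDDI-IESGDY---ENVD",
    "----NPGLREIC---HSIT",
    "LVKTCPALDPHI---TKLK",
    "VLDANPGLREIS---HSIT",
    "LLESTPKQYHQHI--KRIR",
    "LLESTPKEYQQYI--KRIR",
]

print(count_indels(before), count_indels(after))  # 22 13
```

Every row keeps its length of 19 columns, so the edit only redistributes gaps and does not add or remove residues.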
 +
 
 +
 
 +
;Move indels to more plausible position
 +
From a CLUSTAL alignment:
 +
4966_CANGL    MKHEKVQ------GGYGRFQ---GTW      MKHEKV<span style="color: rgb(0, 170, 0);">Q</span>------GGYGRFQ---GTW
 +
1513_CANAL    KIKNVVK------VGSMNLK---GVW      KIKNVV<span style="color: rgb(0, 170, 0);">K</span>------VGSMNLK---GVW
 +
6132_SCHPO    VDSKHP<span style="color: rgb(255, 0, 0);">-</span>----------<span style="color: rgb(255, 0, 0);">Q</span>ID---GVW  -&gt;  VDSKHP<span style="color: rgb(0, 170, 0);">Q</span>-----------ID---GVW
 +
1244_ASPFU    EICHSIT------GGALAAQ---GYW      EICHSI<span style="color: rgb(0, 170, 0);">T</span>------GGALAAQ---GYW
 +
 
 +
<small>The two characters marked in red were swapped. This does not change the number of indels but places the "Q" into a column in which it is more highly conserved (green). Progressive alignments are especially prone to this type of error.</small>
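One simple way to see the effect of such a move is to tally the characters in the affected column before and after the edit. This is only a sketch: the function name <code>column_counts</code> is hypothetical, a raw tally is a much cruder measure than the conservation score Jalview computes, and the rows are copied from the excerpt with the colour markup removed.

```python
from collections import Counter


def column_counts(rows, col):
    """Tally the characters found in one alignment column (0-based index)."""
    return Counter(row[col] for row in rows)


before = [
    "MKHEKVQ------GGYGRFQ---GTW",  # 4966_CANGL
    "KIKNVVK------VGSMNLK---GVW",  # 1513_CANAL
    "VDSKHP-----------QID---GVW",  # 6132_SCHPO
    "EICHSIT------GGALAAQ---GYW",  # 1244_ASPFU
]

after = [
    "MKHEKVQ------GGYGRFQ---GTW",
    "KIKNVVK------VGSMNLK---GVW",
    "VDSKHPQ-----------ID---GVW",
    "EICHSIT------GGALAAQ---GYW",
]

# Column 6 is the seventh column, the one marked in green in the excerpt.
print(dict(column_counts(before, 6)))
print(dict(column_counts(after, 6)))
```

In the edited rows the gap in that column is replaced by a second "Q", while the row length stays at 26 columns.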
 +
 
 +
;Conserve motifs
 +
From a CLUSTAL alignment:
 +
6166_SCHPO      --DKR<span style="color: rgb(255, 0, 0);">V</span>A---<span style="color: rgb(255, 0, 0);">G</span>LWVPP      --DKR<span style="color: rgb(0, 255, 0);">V</span>A--<span style="color: rgb(0, 255, 0);">G</span>-LWVPP
 +
XBP1_SACCE      GGYIK<span style="color: rgb(255, 0, 0);">I</span>Q---<span style="color: rgb(255, 0, 0);">G</span>TWLPM      GGYIK<span style="color: rgb(0, 255, 0);">I</span>Q--<span style="color: rgb(0, 255, 0);">G</span>-TWLPM
 +
6355_ASPTE      --DE<span style="color: rgb(255, 0, 0);">I</span>A<span style="color: rgb(255, 0, 0);">G</span>---NVWISP  -&gt;  ---DE<span style="color: rgb(0, 255, 0);">I</span>A--<span style="color: rgb(0, 255, 0);">G</span>NVWISP
 +
5262_KLULA      GGYIK<span style="color: rgb(255, 0, 0);">I</span>Q---<span style="color: rgb(255, 0, 0);">G</span>TWLPY      GGYIK<span style="color: rgb(0, 255, 0);">I</span>Q--<span style="color: rgb(0, 255, 0);">G</span>-TWLPY
 +
 
 +
<small>The first of the two residues marked in red is a conserved, solvent exposed hydrophobic residue that may mediate domain interactions. The second residue is the conserved glycine in a beta turn that cannot be mutated without structural disruption. Changing the position of a gap and insertion in one sequence improves the conservation of both motifs.</small>
 +
 
 +
 
 +
The ankyrin domains are highly diverged, their boundaries are not well defined, and not even CDD, SMART and SAS agree on the precise annotations. We expect there to be alignment errors in this region. Nevertheless we would hope that a good alignment would recognize homology in that region and that ideally the required <i>indels</i> would be placed between the secondary structure elements, not in their middle. But from the sequence alignment alone, we cannot tell where the secondary structure elements ought to be. You should therefore add the following "sequence" to the alignment; it contains exactly as many characters as the Swi6 sequence above and annotates the secondary structure elements. I have derived it from the 1SW6 structure.
 +
 
 +
>SecStruc 1SW6 E: strand  t: turn  H: helix  _: irregular
 +
_EEE__tt___ttt______EE_____t___HHHHHHHHHHHHHHHH_xxxx_HHHHHHH
 +
HHHH_t_____t_____t____HHHHHHH__tHHHHHHHHH____t___tt____HHHHH
 +
HH__HHHH___HHHHHHHHHHHHHEE_t____HHHHHHHHH__t__HHHHHHHHHHHHHH
 +
HHHHHH__EEE_xxxx_HHHHHt_HHHHHHH______t____HHHHHHHH__HHHHHHHH
 +
H____t____t____HHHH___
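Because the annotation only makes sense if it matches the 1SW6 sequence character for character, a quick consistency check before pasting can save editing time. The following is a minimal sketch under stated assumptions: the function name <code>check_annotation</code> is hypothetical, the allowed alphabet is taken from the header line above plus the <code>x</code> placeholder, and the toy strings are not the real data.

```python
def check_annotation(sequence, annotation, allowed="EtH_x"):
    """Return a list of problems found; an empty list means the pair is consistent."""
    problems = []
    if len(sequence) != len(annotation):
        problems.append(
            f"length mismatch: {len(sequence)} residues vs "
            f"{len(annotation)} annotation characters"
        )
    unexpected = sorted(set(annotation) - set(allowed))
    if unexpected:
        problems.append(f"unexpected annotation characters: {unexpected}")
    # The 'x' placeholders for unstructured loops should sit at the same positions.
    for i, (res, ss) in enumerate(zip(sequence, annotation)):
        if (res == "x") != (ss == "x"):
            problems.append(f"placeholder 'x' mismatch at position {i + 1}")
            break
    return problems


# Hypothetical toy strings for illustration only
print(check_annotation("GPxxIT", "_ExxH_"))
print(check_annotation("GPIT", "_EH"))
```

To use it on the real data, join the lines of each record into one string first, since both the sequence and the annotation are wrapped over several lines.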
 +
 
 +
 
 +
To proceed:
 +
#You should manually align the Swi6 sequence with yeast Mbp1
 +
#You should bring the secondary structure annotation into its correct alignment with Swi6
 +
#You should bring both CDD ankyrin profiles into the correct alignment with yeast Mbp1
 +
 
 +
Proceed along the following steps:
 +
 
 +
<div style="padding: 5px; background: #EEEEEE;">
 +
#Add the secondary structure annotation to the sequence alignment in Jalview. Copy, select File &rarr; Add sequences &rarr; from Textbox and paste the sequence.
 +
#Select Help &rarr; Documentation and read about Editing Alignments, Cursor Mode and Key strokes.
 +
#Click on the yeast Mbp1 sequence row to select the entire row. Then use the cursor key to move that sequence directly above the 1SW6 sequence. Select the row of 1SW6 and use shift/mouse to move the sequence elements and realign them with yeast Mbp1. Refer to the alignment given in the [[Mbp1_annotation|Mbp1 annotation page]].
 +
#Align the secondary structure elements with the 1SW6 sequence: Every character of 1SW6 should be matched with either E, t, H, or _. The result should be similar to the [[Mbp1_annotation|Mbp1 annotation page]]. If you need to insert gaps into all sequences in the alignment, simply drag your mouse over all row headers - movement of sequences is constrained to selected regions, the rest is locked into place to prevent inadvertent misalignments. Remember to save your project from time to time: File → save so you can reload a previous state if anything goes wrong and can't be fixed with Edit → Undo.
 +
#Finally align the two CD00204 consensus sequences to their correct positions (again, refer to the [[Mbp1_annotation|Mbp1 annotation page]]).
 +
#You can now consider the principles stated above and see if you can improve the alignment, for example by moving indels out of regions of secondary structure if that is possible without changing the character of the aligned columns significantly. Select blocks within which to work to leave the remaining alignment unchanged. So that this does not become tedious, you can restrict your editing to one Ankyrin repeat that is structurally defined in Swi6. You may want to open the 1SW6 structure in VMD to define the boundaries of one such repeat. You can copy and paste sections from Jalview into your assignment for documentation or export sections of the alignment to HTML (see the example below).
 +
</div>
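Aligning the secondary structure annotation with 1SW6 amounts to giving the annotation a gap in every column where the aligned 1SW6 row has one. The sketch below illustrates that idea outside Jalview; the function name <code>thread_annotation</code> and the toy strings are hypothetical, and it assumes that the annotation has exactly one character per ungapped residue.

```python
def thread_annotation(aligned_seq, annotation, gap="-"):
    """Insert gaps into the annotation at the gap columns of the aligned sequence."""
    ungapped = aligned_seq.replace(gap, "")
    if len(ungapped) != len(annotation):
        raise ValueError(
            f"{len(ungapped)} residues but {len(annotation)} annotation characters"
        )
    symbols = iter(annotation)
    # Copy gaps through; consume one annotation symbol per residue.
    return "".join(gap if c == gap else next(symbols) for c in aligned_seq)


# Hypothetical toy example: six residues, three gap columns
aligned = "GP--IIT-F"
print(aligned)
print(thread_annotation(aligned, "_EEE__"))
```

The result has the same length as the aligned row, so the two can be compared column by column, which is what the manual edit in Jalview should also achieve.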
 +
 
 +
 
 +
<div style="padding: 5px; background: #F0F4FA;  border:solid 1px #AAAAAA;">
 +
 
 +
===(2.4.1) Editing ankyrin domain alignments - Sample===
 +
</div>
 +
 
 +
This sample was created by
 +
 
 +
# Editing the alignments as described above;
 +
# Copying a block of aligned sequence;
 +
# Pasting it To New Alignment;
 +
# Colouring the residues by Hydrophobicity and setting the colour saturation according to Conservation;
 +
# Choosing File &rarr; Export Image &rarr; HTML and pasting the resulting HTML source into this Wikipage.
 +
 
 +
 
 +
<table border="1"><tr><td>
 +
<table border="0" cellpadding="0" cellspacing="0">
 +
 
 +
<tr><td colspan="6"></td>
 +
<td colspan="9">10<br>|</td><td></td>
 +
<td colspan="9">20<br>|</td><td></td>
 +
<td colspan="9">30<br>|</td><td></td>
 +
<td colspan="3"></td><td colspan="3">40<br>|</td>
 +
 
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_USTMA/341-368&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f3eef9">Y</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#ffd8d8">I</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#fbeef1">F</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
 
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">E</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#d3c2ee">P</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#ccaddf">T</td>
 +
<td bgcolor="#ecc2d5">M</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
 
 +
<td bgcolor="#adadff">R</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1B_SCHCO/470-498&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
 
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#f3eef9">Y</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeeff">K</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td bgcolor="#f7d8e0">F</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
 
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">E</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
 
 +
<td bgcolor="#b0adfa">N</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">R</td>
 +
<td bgcolor="#fcc2c4">V</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
</tr>
 +
 
 +
<tr><td nowrap="nowrap">MBP1_ASHGO/465-494&nbsp;&nbsp;</td>
 +
<td>F</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#f3eef9">Y</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#f4eef8">T</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#ffd8d8">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#efc2d0">C</td>
 +
<td bgcolor="#eeeeff">K</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
 
 +
<td bgcolor="#e6d8f0">S</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#d3c2ee">P</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e5adc6">M</td>
 +
 
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_CLALU/550-586&nbsp;&nbsp;</td>
 +
<td>G</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
 
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td>N</td>
 +
<td>D</td>
 +
<td>K</td>
 +
<td bgcolor="#eeeeff">K</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#ffd8d8">I</td>
 +
<td>S</td>
 +
<td>K</td>
 +
<td>F</td>
 +
<td>L</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
 
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#edadbd">F</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
 
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#c6ade5">Y</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#f9eef3">M</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
</tr>
 +
 
 +
<tr><td nowrap="nowrap">MBPA_COPCI/514-542&nbsp;&nbsp;</td>
 +
 
 +
<td>-</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#fbeef1">F</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fdd8da">V</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">E</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
 
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#ffadad">I</td>
 +
<td bgcolor="#b0adfa">N</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">R</td>
 +
<td bgcolor="#fcc2c4">V</td>
 +
 
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_DEBHA/507-550&nbsp;&nbsp;</td>
 +
<td>I</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
 
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td>K</td>
 +
<td>K</td>
 +
 
 +
<td>L</td>
 +
<td>S</td>
 +
<td>L</td>
 +
<td>S</td>
 +
<td>D</td>
 +
<td>K</td>
 +
<td>K</td>
 +
<td>E</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
 
 +
<td bgcolor="#ffd8d8">I</td>
 +
<td>A</td>
 +
<td>K</td>
 +
<td>F</td>
 +
<td>I</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
 
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#edadbd">F</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
 
 +
<td bgcolor="#fbadaf">V</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#c6ade5">Y</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1A_SCHCO/388-415&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td bgcolor="#f3eef9">Y</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeeff">K</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fdd8da">V</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#fbeef1">F</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">E</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">E</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
 
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#ccaddf">T</td>
 +
<td bgcolor="#ecc2d5">M</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">R</td>
 +
<td bgcolor="#efc2d0">C</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
 
 +
<td bgcolor="#f4eef8">S</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_AJECA/374-403&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
 
 +
<td bgcolor="#ffeeee">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#f9eef3">M</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#e6d8f0">S</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#d8c2e8">S</td>
 +
 
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
 
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">K</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#faeef2">C</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_PARBR/380-409&nbsp;&nbsp;</td>
 +
<td>I</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
 
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
 
 +
<td bgcolor="#fdeeef">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#e6d8f0">S</td>
 +
 
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#d8c2e8">S</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
 
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">K</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#faeef2">C</td>
 +
 
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_NEOFI/363-392&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
<td bgcolor="#faeef2">C</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#e6d8f0">S</td>
 +
<td bgcolor="#faeef2">C</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#d8c2e8">S</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
 
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#fcc2c4">V</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
 
 +
<td bgcolor="#adadff">R</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_ASPNI/365-394&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
<td bgcolor="#fbeef1">F</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
 
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#fdeeee">V</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#e6d8f0">S</td>
 +
<td bgcolor="#faeef2">C</td>
 +
 
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#d8c2e8">S</td>
 +
<td bgcolor="#fdeeee">V</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#fbadaf">V</td>
 +
 
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#fcc2c4">V</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">R</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#fdeeee">V</td>
 +
</tr>
 +
 
 +
<tr><td nowrap="nowrap">MBP1_UNCRE/377-406&nbsp;&nbsp;</td>
 +
<td>M</td>
 +
<td bgcolor="#f3eef9">Y</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#fdeeee">V</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f2d8e5">A</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#d8c2e8">S</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
 
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">K</td>
 +
 
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#faeef2">C</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_PENCH/439-468&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
<td bgcolor="#faeef2">C</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
 
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#f9eef3">M</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#e6d8f0">S</td>
 +
<td bgcolor="#faeef2">C</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
 
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">Q</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#fbadaf">V</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
 
 +
<td bgcolor="#fcc2c4">V</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">R</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
</tr>
 +
 
 +
<tr><td nowrap="nowrap">MBPA_TRIVE/407-436&nbsp;&nbsp;</td>
 +
 
 +
<td>V</td>
 +
<td bgcolor="#fbeef1">F</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
#Retrieved the Mbp1 protein sequence by searching [http://www.ncbi.nlm.nih.gov/ Entrez] for  <code>Mbp1 AND "saccharomyces cerevisiae"[organism]</code>
+
<td>-</td>
#Clicked on the ''RefSeq tab'' to find the RefSeq ID "<code>[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=Protein&list_uids=6320147&dopt=GenPept NP_010227]</code>"
+
<td bgcolor="#e6d8f0">S</td>
#Accessed the [http://www.ncbi.nlm.nih.gov/blast '''BLAST'''] form for protein/protein BLAST and pasted the RefSeq ID into the ''query field''. Chose ''refseq'' as the database to search in, from the ''drop-down menu''. Kept default parameters but turned ''Filter'' off. Chose Fungi as an ENTREZ query limit in the ''Options'' section.
+
<td bgcolor="#f4eef8">S</td>
#On the results page, checked the checkbox next to the alignment '''of the most significant hit from each of the organisms''' we are studying.
+
<td bgcolor="#eeeefe">Q</td>
#Clicked on the "Get selected sequences" button. The results page lists the gene that is most similar to Mbp1 in each organism.
+
<td bgcolor="#c5c2fb">D</td>
#Verified that each of these sequences finds Mbp1 as the best match in the ''Saccharomyces cerevisiae'' genome by clicking on each "[http://www.ncbi.nlm.nih.gov/sutils/blink.cgi?pid=68465419  BLink]" (<small>click for example</small>) in the retrieved list. Scrolled down the list to confirm that the '''top hit of a  ''Saccharomyces cerevisiae'' protein''' is indeed Mbp1 (<code>NP_010227</code>).
+
<td bgcolor="#ebc2d5">A</td>
#Obtained UniProt accessions for all sequences, with a single query using the new UniProt [http://www.pir.uniprot.org/search/idmapping.shtml ID mapping service]. This service accepts a comma delimited list of RefSeq IDs and returns a list of Uniprot proteins.
+
<td bgcolor="#eeeefe">N</td>
#Assembled this information into the following table.
+
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
  
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">K</td>
 +
<td bgcolor="#c5c2fb">N</td>
  
<table style="border-left:1px solid #AAAAAA; border-bottom:1px solid #AAAAAA;" cellpadding="10" cellspacing="0">
+
<td bgcolor="#f4eef7">G</td>
<tr style="background: #BDC3DC;">
+
<td bgcolor="#faeef2">C</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><b><i>Organism</i></b></td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><code>CODE</code></td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><b>GI</b></td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><b>Refseq</b></td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><b>Uniprot Accession</b></td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><b>Most similar yeast gene</b></td>
 
 
</tr>
 
</tr>
 +
<tr><td nowrap="nowrap">MBP1_PHANO/400-429&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
<td bgcolor="#f4eef9">W</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#fdeeee">V</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f4eef8">T</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
 +
<td bgcolor="#c5c2fb">Q</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#ffadad">I</td>
 +
<td bgcolor="#e5adc6">M</td>
 +
<td bgcolor="#ffc2c2">I</td>
  
<tr style="background: #FFFFFF;">
+
<td bgcolor="#e4adc7">A</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><i>Aspergillus fumigatus</i></td>
+
<td bgcolor="#e4adc7">A</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><code>ASPFU</code></td>
+
<td bgcolor="#adadff">R</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">70986922</td>
+
<td bgcolor="#c5c2fb">N</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">XP_748947</td>
+
<td bgcolor="#f4eef7">G</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"> Q4WGN2 </td>
+
<td bgcolor="#f9eef3">A</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">Mbp1</td>
 
 
</tr>
 
</tr>
 +
<tr><td nowrap="nowrap">MBPA_SCLSC/294-313&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#ffadad">I</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">K</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#eeeeff">K</td>
  
<tr style="background: #E9EBF3;">
+
<td bgcolor="#f9eef3">A</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><i>Aspergillus nidulans</i></td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><code>ASPNI</code></td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">67525393</td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">XP_660758</td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"> Q5B8H6 </td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">Mbp1</td>
 
 
</tr>
 
</tr>
 +
<tr><td nowrap="nowrap">MBPA_PYRIS/363-392&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
<td bgcolor="#f4eef9">W</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#eeeefe">E</td>
  
<tr style="background: #FFFFFF;">
+
<td bgcolor="#fdeeee">V</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><i>Aspergillus terreus</i></td>
+
<td>-</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><code>ASPTE</code></td>
+
<td>-</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">115391425</td>
+
<td>-</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">XP_001213217</td>
+
<td bgcolor="#f4eef8">T</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"> Q0CQJ5 </td>
+
<td bgcolor="#eeeeff">R</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">Mbp1</td>
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">Q</td>
 +
 
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#ffadad">I</td>
 +
<td bgcolor="#e5adc6">M</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
 
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">R</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#f9eef3">A</td>
 
</tr>
 
</tr>
 +
<tr><td nowrap="nowrap">MBP1_/361-390&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td>G</td>
 +
<td>V</td>
 +
<td>L</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f7d8e0">F</td>
 +
<td bgcolor="#f3d8e4">M</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
 +
<td bgcolor="#f4eef8">T</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">R</td>
 +
<td bgcolor="#d8c2e8">S</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#f9eef3">A</td>
  
<tr style="background: #E9EBF3;">
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><i>Candida albicans</i></td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><code>CANAL</code></td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">68465419</td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">XP_723071</td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"> Q5ANP5 </td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">Mbp1</td>
 
 
</tr>
 
</tr>
 +
<tr><td nowrap="nowrap">MBP1_ASPFL/328-364&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#fdeeee">V</td>
 +
 +
<td>I</td>
 +
<td>T</td>
 +
<td>L</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f7d8e0">F</td>
 +
<td bgcolor="#ffd8d8">I</td>
 +
<td>S</td>
 +
 +
<td>E</td>
 +
<td>I</td>
 +
<td>V</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#b0adfa">N</td>
 +
<td bgcolor="#f9c2c7">L</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
 +
<td bgcolor="#adadff">R</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBPA_MAGOR/375-404&nbsp;&nbsp;</td>
 +
<td>Q</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#fbeef1">F</td>
 +
<td bgcolor="#fdeeee">V</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#fbadaf">V</td>
  
<tr style="background: #FFFFFF;">
+
<td bgcolor="#b3adf7">H</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><i>Candida glabrata</i></td>
+
<td bgcolor="#f9c2c7">L</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><code>CANGL</code></td>
+
<td bgcolor="#e4adc7">A</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">50286059</td>
+
<td bgcolor="#e4adc7">A</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">XP_445458</td>
+
<td bgcolor="#b0adfa">Q</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"> Q6FWD6 </td>
+
<td bgcolor="#c2c2ff">R</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">Mbp1</td>
+
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#f4eef8">S</td>
 
</tr>
 
</tr>
  
<tr style="background: #E9EBF3;">
+
<tr><td nowrap="nowrap">MBP1_CHAGL/361-390&nbsp;&nbsp;</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><i>Cryptococcus neoformans</i></td>
+
<td>S</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><code>CRYNE</code></td>
+
<td bgcolor="#eeeeff">R</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">58266778</td>
+
<td bgcolor="#f4eef8">S</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">XP_570545</td>
+
<td bgcolor="#f9eef3">A</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"> Q5KHS0 </td>
+
<td bgcolor="#eeeefe">D</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">Mbp1</td>
+
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
 
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#fbadaf">V</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#f9c2c7">L</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e5adc6">M</td>
 +
 
 +
<td bgcolor="#c2c2ff">R</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#f9eef3">A</td>
 
</tr>
 
</tr>
 +
<tr><td nowrap="nowrap">MBP1_PODAN/372-401&nbsp;&nbsp;</td>
 +
<td>V</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#fdeeee">V</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">E</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#b3adf7">H</td>
  
<tr style="background: #FFFFFF;">
+
<td bgcolor="#f9c2c7">L</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><i>Debaryomyces hansenii</i></td>
+
<td bgcolor="#e4adc7">A</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><code>DEBHA</code></td>
+
<td bgcolor="#e4adc7">A</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">50420495</td>
+
<td bgcolor="#adadff">R</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">XP_458784</td>
+
<td bgcolor="#fcc2c4">V</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"> Q6BSN6 </td>
+
<td bgcolor="#eeeefe">N</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">Mbp1</td>
+
<td bgcolor="#f9eef3">A</td>
 
</tr>
 
</tr>
  
<tr style="background: #E9EBF3;">
+
<tr><td nowrap="nowrap">MBP1_LACTH/458-487&nbsp;&nbsp;</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><i>Eremothecium gossypii</i></td>
+
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><code>EREGO</code></td>
+
<td>F</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">45199118</td>
+
<td bgcolor="#f4eef8">S</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">NP_986147</td>
+
<td bgcolor="#f2eefa">P</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"> Q752H3 </td>
+
<td bgcolor="#eeeeff">R</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">Mbp1</td>
+
<td bgcolor="#f3eef9">Y</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#ffd8d8">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">Q</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
 
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#fbadaf">V</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#f9c2c7">L</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#b0adfa">Q</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
 
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#eeeefe">D</td>
 
</tr>
 
</tr>
 +
<tr><td nowrap="nowrap">MBP1_FILNE/433-460&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f3eef9">Y</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fdd8da">V</td>
 +
 +
<td bgcolor="#ffd8d8">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#fbeef1">F</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
 +
<td bgcolor="#c5c2fb">E</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">E</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#ccaddf">T</td>
 +
<td bgcolor="#ffc2c2">I</td>
  
<tr style="background: #FFFFFF;">
+
<td bgcolor="#e4adc7">A</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><i>Gibberella zeae</i></td>
+
<td bgcolor="#e4adc7">A</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><code>GIBZE</code></td>
+
<td bgcolor="#adadff">R</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">46116756</td>
+
<td bgcolor="#ebc2d5">A</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">XP_384396</td>
+
<td bgcolor="#eeeeff">R</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"> Q4IEY8 </td>
+
<td bgcolor="#f4eef8">S</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">Mbp1</td>
 
 
</tr>
 
</tr>
 +
<tr><td nowrap="nowrap">MBP1_KLULA/477-506&nbsp;&nbsp;</td>
 +
<td>F</td>
 +
 +
<td bgcolor="#f4eef8">T</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#f3eef9">Y</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#fdeeee">V</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#ffd8d8">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#d8c2e8">S</td>
 +
 +
<td bgcolor="#d3c2ee">P</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#d5c2ec">Y</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#ccaddf">T</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#eeeeff">K</td>
  
<tr style="background: #E9EBF3;">
+
<td bgcolor="#eeeefe">D</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><i>Kluyveromyces lactis</i></td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><code>KLULA</code></td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">50308375</td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">XP_454189</td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"> P39679 </td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">Mbp1</td>
 
 
</tr>
 
</tr>
 +
<tr><td nowrap="nowrap">MBP1_SCHST/468-501&nbsp;&nbsp;</td>
 +
<td>A</td>
 +
<td bgcolor="#eeeeff">K</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
 +
<td bgcolor="#eeeeff">K</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeeff">K</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#ffd8d8">I</td>
 +
 +
<td>A</td>
 +
<td>K</td>
 +
<td>F</td>
 +
<td>I</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#d8c2e8">S</td>
 +
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#edadbd">F</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#eaadc0">C</td>
  
<tr style="background: #FFFFFF;">
+
<td bgcolor="#caade0">S</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><i>Magnaporthe grisea</i></td>
+
<td bgcolor="#b3adf7">H</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><code>MAGGR</code></td>
+
<td bgcolor="#c5c2fb">N</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">39964664</td>
+
<td bgcolor="#fdeeef">L</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">XP_365024</td>
+
<td bgcolor="#eeeefe">N</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">ACC</td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">Mbp1*</td>
 
 
</tr>
 
</tr>
 +
<tr><td nowrap="nowrap">MBP1_SACCE/496-525&nbsp;&nbsp;</td>
 +
<td>F</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#f3eef9">Y</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
 +
<td bgcolor="#fdeeef">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
 +
<td bgcolor="#f4eef8">T</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c2c2ff">K</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#caade0">S</td>
 +
<td bgcolor="#adadff">K</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#eeeefe">D</td>
  
<tr style="background: #E9EBF3;">
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><i>Neurospora crassa</i></td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><code>NEUCR</code></td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">85109541</td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">XP_962967</td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"> Q7SBG9 </td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">Mbp1</td>
 
 
</tr>
 
</tr>
 +
<tr><td nowrap="nowrap">CD00204/1-19&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">E</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#d8d8ff">R</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#d3c2ee">P</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#f9c2c7">L</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
  
<tr style="background: #FFFFFF;">
+
<td bgcolor="#caade0">S</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><i>Saccharomyces cerevisiae</i></td>
+
<td bgcolor="#c5c2fb">N</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><code>SACCE</code></td>
+
<td bgcolor="#f4eef7">G</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">6320147 </td>
+
<td bgcolor="#efeefd">H</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">NP_010227</td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"> P39678 </td>
 
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">Mbp1</td>
 
 
</tr>
 
</tr>
 +
<tr><td nowrap="nowrap">CD00204/99-118&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fdd8da">V</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c2c2ff">K</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#d8d8ff">R</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#d3c2ee">P</td>
 +
<td bgcolor="#f7adb3">L</td>
  
<tr style="background: #E9EBF3;">
+
<td bgcolor="#b3adf7">H</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><i>Schizosaccharomyces pombe</i></td>
+
<td bgcolor="#f9c2c7">L</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><code>SCHPO</code></td>
+
<td bgcolor="#e4adc7">A</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">19113944</td>
+
<td bgcolor="#e4adc7">A</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">NP_593032</td>
+
<td bgcolor="#adadff">K</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"> P41412 </td>
+
<td bgcolor="#c5c2fb">N</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">Mbp1</td>
+
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#efeefd">H</td>
 
</tr>
 
</tr>
  
<tr style="background: #FFFFFF;">
+
<tr><td nowrap="nowrap">1SW6/203-232&nbsp;&nbsp;</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><i>Ustilago maydis</i></td>
+
<td>L</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><code>USTMA</code></td>
+
<td bgcolor="#eeeefe">D</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">71024227</td>
+
<td bgcolor="#fdeeef">L</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">XP_762343</td>
+
<td bgcolor="#eeeeff">K</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"> Q4P117 </td>
+
<td bgcolor="#f4eef9">W</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">Mbp1</td>
+
<td bgcolor="#ffeeee">I</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f3d8e4">M</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#d8c2e8">S</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
 
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#efc2d0">C</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#b0adfa">N</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">R</td>
 +
 
 +
<td bgcolor="#f9c2c7">L</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#eeeefe">N</td>
 
</tr>
 
</tr>
 +
<tr><td nowrap="nowrap">SecStruc/203-232&nbsp;&nbsp;</td>
 +
<td>t</td>
 +
<td bgcolor="#f5eef6">_</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#efeefd">H</td>
 +
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td bgcolor="#ead8ed">_</td>
 +
<td bgcolor="#ead8ed">_</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#ead8ed">_</td>
 +
<td bgcolor="#f5eef6">_</td>
 +
<td bgcolor="#f5eef6">_</td>
 +
 +
<td bgcolor="#dec2e3">_</td>
 +
<td bgcolor="#d9c2e7">t</td>
 +
<td bgcolor="#f5eef6">_</td>
 +
<td bgcolor="#d2add8">_</td>
 +
<td bgcolor="#ead8ed">_</td>
 +
<td bgcolor="#dec2e3">_</td>
 +
<td bgcolor="#c7c2f9">H</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#b3adf7">H</td>
  
<tr style="background: #E9EBF3;">
+
<td bgcolor="#c7c2f9">H</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><i>Yarrowia lipolytica</i></td>
+
<td bgcolor="#b3adf7">H</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"><code>YARLI</code></td>
+
<td bgcolor="#b3adf7">H</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">50545439</td>
+
<td bgcolor="#b3adf7">H</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">XP_500257</td>
+
<td bgcolor="#c7c2f9">H</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;"> Q6CGF5 </td>
+
<td bgcolor="#f5eef6">_</td>
  <td style="border-right:1px solid #AAAAAA; border-top:1px solid #AAAAAA;">Mbp1</td>
+
<td bgcolor="#f5eef6">_</td>
 
</tr>
 
</tr>
 +
</table>
 +
</td></tr>
  
 
</table>
 
</table>
 +
;Aligned sequences before editing. The algorithm has placed gaps into the Swi6 helix <code>LKWIIAN</code>, and the four-residue gaps before the block of well-aligned sequence on the right are poorly supported.
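Whether a known motif has been split by gaps can also be checked programmatically. The sketch below is an illustration, not part of the assignment's procedure: the function name <code>motif_gap_report</code> is hypothetical, and the aligned row is transcribed by hand from the <code>1SW6/203-232</code> row of the alignment, so treat it as assumed input.

```python
def motif_gap_report(aligned_row, motif, gap="-"):
    """Find motif in the ungapped sequence and report gaps inside its span.

    Returns (first_column, last_column, gap_columns) using 0-based
    alignment columns, or None if the motif is not in the sequence.
    """
    ungapped = aligned_row.replace(gap, "")
    start = ungapped.find(motif)
    if start < 0:
        return None
    # Alignment column of every residue, in sequence order.
    columns = [i for i, ch in enumerate(aligned_row) if ch != gap]
    first = columns[start]
    last = columns[start + len(motif) - 1]
    gap_columns = [i for i in range(first, last + 1) if aligned_row[i] == gap]
    return first, last, gap_columns


# Row transcribed from the 1SW6/203-232 line of the alignment (assumed input).
row = "LDLKWII---AN----------ML----NAQDSNGDTCLNIAARLGN"
print(motif_gap_report(row, "LKWIIAN"))
```

A non-empty list of gap columns means the aligner has broken the motif, which marks the row as a candidate for manual editing.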
 +
 +
 +
<table border="1"><tr><td>
 +
<table border="0" cellpadding="0" cellspacing="0">
 +
 +
<tr><td colspan="6"></td>
 +
<td colspan="9">10<br>|</td><td></td>
 +
<td colspan="9">20<br>|</td><td></td>
 +
 +
<td colspan="9">30<br>|</td><td></td>
 +
<td colspan="3"></td><td colspan="3">40<br>|</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_USTMA/341-368&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dfd2f0">Y</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td bgcolor="#ffbfbf">I</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#f5d2db">F</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">E</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#c2abe8">P</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#bf99d7">T</td>
 +
<td bgcolor="#e5abc5">M</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
 +
<td bgcolor="#9999ff">R</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1B_SCHCO/470-498&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#dfd2f0">Y</td>
 +
<td bgcolor="#d2d2ff">K</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f2bfcc">F</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">E</td>
 +
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#9d99f9">N</td>
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">R</td>
 +
<td bgcolor="#fcabae">V</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_ASHGO/465-494&nbsp;&nbsp;</td>
 +
<td>F</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#dfd2f0">Y</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#e2d2ed">T</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#ffbfbf">I</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
 +
<td bgcolor="#eaabbf">C</td>
 +
<td bgcolor="#d2d2ff">K</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#d6bfe7">S</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#c2abe8">P</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#ffabab">I</td>
 +
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#df99b8">M</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_CLALU/550-586&nbsp;&nbsp;</td>
 +
<td>G</td>
 +
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td>K</td>
 +
 +
<td>K</td>
 +
<td>E</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>L</td>
 +
<td>I</td>
 +
<td>S</td>
 +
<td>K</td>
 +
<td bgcolor="#f2bfcc">F</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#e999ad">F</td>
 +
<td bgcolor="#a199f6">H</td>
 +
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#b899df">Y</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#f0d2df">M</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBPA_COPCI/514-542&nbsp;&nbsp;</td>
 +
 +
<td>-</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#f5d2db">F</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#fcbfc1">V</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">E</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#ff9999">I</td>
 +
 +
<td bgcolor="#9d99f9">N</td>
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">R</td>
 +
<td bgcolor="#fcabae">V</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
</tr>
 +
 +
<tr><td nowrap="nowrap">MBP1_DEBHA/507-550&nbsp;&nbsp;</td>
 +
<td>I</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td>K</td>
 +
<td>K</td>
 +
<td>L</td>
 +
<td>S</td>
 +
<td>L</td>
 +
<td>S</td>
 +
<td>D</td>
 +
<td>K</td>
 +
 +
<td>K</td>
 +
<td>E</td>
 +
<td>L</td>
 +
<td>I</td>
 +
<td>A</td>
 +
<td>K</td>
 +
<td bgcolor="#f2bfcc">F</td>
 +
<td bgcolor="#ffbfbf">I</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
 +
<td bgcolor="#e999ad">F</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#fb999c">V</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#b899df">Y</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1A_SCHCO/388-415&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dfd2f0">Y</td>
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d2d2ff">K</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
 +
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fcbfc1">V</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#f5d2db">F</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">E</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">E</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#bf99d7">T</td>
 +
<td bgcolor="#e5abc5">M</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">R</td>
 +
<td bgcolor="#eaabbf">C</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
 +
<td bgcolor="#e2d2ee">S</td>
 +
</tr>
 +
 +
<tr><td nowrap="nowrap">MBP1_AJECA/374-403&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
 +
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#f0d2df">M</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#d6bfe7">S</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#caabe0">S</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">K</td>
 +
<td bgcolor="#afabfa">N</td>
 +
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#f4d2dc">C</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_PARBR/380-409&nbsp;&nbsp;</td>
 +
<td>I</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#d6bfe7">S</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#caabe0">S</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#c399d4">G</td>
 +
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">K</td>
 +
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#f4d2dc">C</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_NEOFI/363-392&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
<td bgcolor="#f4d2dc">C</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#d6bfe7">S</td>
 +
<td bgcolor="#f4d2dc">C</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#caabe0">S</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#fcabae">V</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
 +
<td bgcolor="#9999ff">R</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_ASPNI/365-394&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
<td bgcolor="#f5d2db">F</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#fcd2d3">V</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#d6bfe7">S</td>
 +
<td bgcolor="#f4d2dc">C</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#caabe0">S</td>
 +
 +
<td bgcolor="#fcd2d3">V</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#fb999c">V</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#fcabae">V</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">R</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#fcd2d3">V</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_UNCRE/377-406&nbsp;&nbsp;</td>
 +
<td>M</td>
 +
<td bgcolor="#dfd2f0">Y</td>
  
<small>* Note: This is a full-length homologue; however, BLink shows that the C-terminal half is more similar to Swi6 than to Mbp1. Thus I would consider the APSES domain orthologous, the remainder possibly paralogous.</small>
+
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#fcd2d3">V</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td>-</td>
 +
<td>-</td>
  
&nbsp;<br>
+
<td>-</td>
Our second task is to obtain all FASTA sequences based on a list of identifiers and to save them in a format in which we can use them as input for other programs or services.
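
As a minimal sketch of the bookkeeping involved (not the procedure actually used for this assignment), the following Python functions parse multi-FASTA text, select records by identifier and write them back out. It assumes that the identifier is the first whitespace-delimited word of each header line; the function names <code>parse_fasta</code>, <code>select_by_id</code> and <code>format_fasta</code> are hypothetical.

```python
def parse_fasta(text):
    """Parse multi-FASTA text into a dict {header: sequence}, in file order."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def select_by_id(records, ids):
    """Return (header, sequence) pairs for the requested ids, in the order of ids.

    Assumption: the identifier is the first word of the header line.
    Identifiers that are not found are silently skipped.
    """
    by_id = {(h.split() or [""])[0]: (h, s) for h, s in records.items()}
    return [by_id[i] for i in ids if i in by_id]


def format_fasta(items, width=60):
    """Format (header, sequence) pairs as multi-FASTA text with wrapped lines."""
    out = []
    for header, seq in items:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

The output of <code>format_fasta</code> can be pasted into the input form of most alignment services, which generally accept multi-FASTA.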
+
<td>-</td>
&nbsp;<br>
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#eabfd3">A</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
 
 +
<td bgcolor="#caabe0">S</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
 
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">K</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#f4d2dc">C</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_PENCH/439-468&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
 
 +
<td bgcolor="#f4d2dc">C</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#f0d2df">M</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#d6bfe7">S</td>
 +
<td bgcolor="#f4d2dc">C</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
 
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">Q</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#fb999c">V</td>
 +
<td bgcolor="#f699a1">L</td>
 +
 
 +
<td bgcolor="#fcabae">V</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">R</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBPA_TRIVE/407-436&nbsp;&nbsp;</td>
 +
 
 +
<td>V</td>
 +
<td bgcolor="#f5d2db">F</td>
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#d6bfe7">S</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
 
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
 
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">K</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#f4d2dc">C</td>
 +
</tr>
 +
 
 +
<tr><td nowrap="nowrap">MBP1_PHANO/400-429&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
<td bgcolor="#e2d2ef">W</td>
 +
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#fcd2d3">V</td>
 +
<td bgcolor="#e2d2ed">T</td>
 +
 
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
 
 +
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">Q</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
 
 +
<td bgcolor="#ff9999">I</td>
 +
<td bgcolor="#df99b8">M</td>
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">R</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#f0d2e0">A</td>
  
 +
</tr>
 +
<tr><td nowrap="nowrap">MBPA_SCLSC/294-313&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
+
<td>-</td>
*From the information given here, briefly explain whether the sequences listed above appear to be '''orthologues to yeast Mbp1''' (as evidenced through the "reciprocal best-match" criterion). Briefly explain whether these sequences are necessarily also '''orthologues to each other'''.
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
*Review the resulting multi-FASTA file of [[All_Mbp1_proteins|'''all Mbp1 proteins (linked here)''']] and make sure you understand the procedure that led to it. Summarize the key steps of the procedure in point form. (Don't submit the entire file, of course, but make sure you understand, and could reproduce, the essential parts of the procedure.) (1 mark)<br>
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
  
</div>
+
<td bgcolor="#c2bffc">D</td>
&nbsp;<br>
+
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#cbabdf">T</td>
  
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#ff9999">I</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">K</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#d2d2ff">K</td>
  
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
+
<td bgcolor="#f0d2e0">A</td>
 +
</tr>
  
===(1.2)  Other APSES domain sequences (1 mark)===
+
<tr><td nowrap="nowrap">MBPA_PYRIS/363-392&nbsp;&nbsp;</td>
</div>
+
<td>T</td>
&nbsp;<br>
+
<td bgcolor="#e2d2ef">W</td>
 +
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#d4d2fc">E</td>
  
 +
<td bgcolor="#fcd2d3">V</td>
 +
<td bgcolor="#e2d2ed">T</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
Mbp1 orthologues are not the only proteins that contain APSES domains. To find the others, a PSI-BLAST search was performed using the yeast Mbp1 APSES domain as query. The APSES domains were extracted from the list of hits and collected in a file.
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
  
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
+
<td bgcolor="#f9bfc4">L</td>
*Review the resulting file for the  [[All_APSES_domains|'''APSES domains''']] and make sure you understand the procedure that led to it. Summarize the key steps of the procedure in point form. (1 mark)
+
<td bgcolor="#c2bffc">N</td>
</div>
+
<td bgcolor="#f0d2e0">A</td>
&nbsp;<br>
+
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">Q</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">D</td>
  
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
+
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#ff9999">I</td>
 +
<td bgcolor="#df99b8">M</td>
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">R</td>
 +
<td bgcolor="#afabfa">N</td>
  
===(1.3)  Orthologues (1 mark)===
+
<td bgcolor="#e4d2ec">G</td>
</div>
+
<td bgcolor="#f0d2e0">A</td>
&nbsp;<br>
+
</tr>
 +
<tr><td nowrap="nowrap">MBP1_/361-390&nbsp;&nbsp;</td>
 +
<td>N</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td bgcolor="#e4d2ec">G</td>
  
For '''one''' of the APSES domains in your organism, determine which yeast APSES domain (if any) it is orthologous to:
+
<td bgcolor="#fcd2d3">V</td>
# Choose at random one of the [[All_APSES_domains|APSES domains]] from your organism (but not one labelled with Mbp1) and copy its [[All_APSES_domains|sequence]] into the input window of a [http://www.ncbi.nlm.nih.gov/blast/ BLAST] search.
+
<td bgcolor="#fbd2d5">L</td>
# Restrict the BLAST search to RefSeq sequences in ''Saccharomyces cerevisiae''.
+
<td bgcolor="#e2d2ee">S</td>
# Run the search and determine the gene name of the best hit. (This is the best match.)
+
<td bgcolor="#d4d2fc">Q</td>
# Find the sequence of your best hit's APSES domain in the [[All_APSES_domains|sequence file]]. (Since the file contains all of them, your hit has to be in there, unless you found a non-RefSeq sequence).
+
<td>-</td>
# Copy that sequence (i.e. use the exact sequence from the file, not only the possibly truncated sequence from the BLAST results alignment) and perform the same kind of BLAST search, this time restricted to your organism instead of yeast. (This finds the reciprocal match.)
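
The logic of the forward and reverse searches above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hit lists are hand-entered dictionaries with hypothetical keys <code>id</code>, <code>evalue</code> and <code>bits</code>, the gene names in the usage example are made up, and "best" is taken to mean the lowest E-value (ties broken by the higher bit score).

```python
def best_hit(hits):
    """Return the id of the best hit: lowest E-value, ties broken by higher bit score."""
    if not hits:
        return None
    return min(hits, key=lambda h: (h["evalue"], -h["bits"]))["id"]


def is_reciprocal_best_match(query_id, forward_hits, reverse_hits):
    """Check the reciprocal best-match criterion.

    forward_hits: hits of the query (your organism) against yeast.
    reverse_hits: dict mapping a yeast gene id to the hits obtained when that
                  gene's domain is searched against your organism.
    """
    forward_best = best_hit(forward_hits)
    if forward_best is None:
        return False
    reverse_best = best_hit(reverse_hits.get(forward_best, []))
    return reverse_best == query_id


# Hypothetical example data: gene names and scores are invented.
forward = [
    {"id": "yeastA", "evalue": 1e-30, "bits": 120.0},
    {"id": "yeastB", "evalue": 1e-12, "bits": 65.0},
]
reverse = {
    "yeastA": [
        {"id": "myGene1", "evalue": 1e-28, "bits": 115.0},
        {"id": "myGene2", "evalue": 1e-10, "bits": 60.0},
    ]
}
result = is_reciprocal_best_match("myGene1", forward, reverse)
```

Note that a failed reverse match does not by itself prove that the two genes are unrelated; it only means that the reciprocal best-match criterion is not fulfilled.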
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
+
<td>-</td>
* Document the process and report briefly what you have found on the forward and on the reverse search. Does the gene you have chosen fulfill the ''reciprocal best match'' criterion for orthology with a yeast gene? (1 mark)
+
<td>-</td>
</div>
+
<td>-</td>
&nbsp;<br>
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
+
<td bgcolor="#f2bfcc">F</td>
 +
<td bgcolor="#ebbfd3">M</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#e2d2ed">T</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#c399d4">G</td>
  
==(2) Align==
+
<td bgcolor="#c2bffc">D</td>
</div>
+
<td bgcolor="#cbabdf">T</td>
&nbsp;
+
<td bgcolor="#e3abc6">A</td>
&nbsp;
+
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">R</td>
  
Performing multiple sequence alignments used to involve downloading and installing software on your own computer. While most tools were available on the Web in principle, many groups restricted the total number of sequences or the total number of characters to be aligned. The EBI, however, offers three of the most commonly used tools with few limitations, and it was possible to run MSAs for all Mbp1 orthologues jointly.
+
<td bgcolor="#caabe0">S</td>
&nbsp;
+
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_ASPFL/328-364&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#ded2f2">P</td>
  
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
+
<td bgcolor="#e4d2ec">G</td>
===(2.1) Aligning the Mbp1 orthologues (1 mark)===
+
<td bgcolor="#d4d2fc">E</td>
</div>
+
<td bgcolor="#fcd2d3">V</td>
&nbsp;<br>
+
<td bgcolor="#ffd2d2">I</td>
+
<td bgcolor="#e2d2ed">T</td>
I used the following three servers:
+
<td>L</td>
 +
<td>G</td>
 +
<td>R</td>
 +
<td>F</td>
  
* [http://www.ebi.ac.uk/clustalw/ '''CLUSTAL-W''']  is a progressive alignment program. It is the most popular and most widely referenced MSA algorithm, and it is reasonably fast and easy to use. However, alignment errors that are made early cannot be corrected later, so it is prone to misalignments on sets of sequences that have poor (<30% ID) local similarity. It is no longer considered state-of-the-art for carefully done alignments.
+
<td>I</td>
* [http://www.ebi.ac.uk/muscle/ '''MUSCLE'''] essentially starts out from a CLUSTAL-like alignment as a draft, then identifies similar groups of sequences from which it calculates profiles, and re-aligns each group to the profile. This procedure is iterated.
+
<td>S</td>
* [http://www.ebi.ac.uk/t-coffee/ '''T-Coffee'''] is one of my favourites - the tradeoffs appear to be especially well balanced. It too starts from a set of pairwise global alignments, like CLUSTAL, then additionally calculates sets of best local alignments. Global and local alignments are then combined into a similarity matrix, and based on this matrix a guide tree is constructed. This determines the order of steps in which sequences are added to the multiple alignment. A nice feature of T-Coffee is colour-coded output that allows you to quickly judge the local reliability of the alignment.
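
To illustrate the idea of a distance matrix that drives a progressive alignment, here is a minimal Python sketch. It is not the algorithm of any of the three programs: it assumes the sequences have already been aligned to equal length, uses one minus the fraction of identical residues as the distance, and merely picks the closest pair, which is where a progressive method would typically start. All function names are hypothetical.

```python
from itertools import combinations


def identity_distance(a, b):
    """Distance = 1 - fraction of identical residues, ignoring columns with a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    same = sum(1 for x, y in pairs if x == y)
    return 1.0 - same / len(pairs)


def distance_matrix(aligned):
    """Return {(name1, name2): distance} for all pairs, names in sorted order."""
    return {
        (p, q): identity_distance(aligned[p], aligned[q])
        for p, q in combinations(sorted(aligned), 2)
    }


def closest_pair(dist):
    """Return the pair of names with the smallest distance."""
    return min(dist, key=dist.get)


# Toy example with invented sequence fragments.
toy = {"A": "LLNAQD", "B": "LLNSQD", "C": "FIN-HQ"}
toy_dist = distance_matrix(toy)
first_pair = closest_pair(toy_dist)
```

A real guide tree is built by repeatedly joining the closest sequences or groups; an error made when joining an early pair is then carried through all later steps, which is the weakness of progressive alignment noted above.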
+
<td>E</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
We shall perform multiple sequence alignments for all 16 Mbp1 orthologues and compare the results. Since the results should look the same for all of you, it was possible to precompute the alignments to save some resources. Of course you are welcome to do this on your own, but it is not required. In fact, since we want to compare the alignments, I have also edited them: I have '''re-sorted the results so that the sequences appear in the same order in each case'''. Only CLUSTAL provides the option to order the output in the same way as the input; the other two programs order the output so that adjacent sequences are most similar. This is useful, because it emphasizes sequence features, but it makes it impossibly tedious to compare alignments.
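
Re-sorting the rows of an alignment into a common order is simple to script. The following Python sketch assumes that each alignment is available as a list of (name, aligned sequence) pairs and that the same names occur in every alignment; the function name <code>reorder_alignment</code> and the toy data are hypothetical.

```python
def reorder_alignment(rows, reference_order):
    """Return the alignment rows sorted into reference_order.

    rows: list of (name, aligned_sequence) pairs, in any order.
    reference_order: list of names giving the desired order.
    Raises KeyError if a name in reference_order is missing from rows.
    """
    by_name = dict(rows)
    missing = [name for name in reference_order if name not in by_name]
    if missing:
        raise KeyError("not in alignment: " + ", ".join(missing))
    return [(name, by_name[name]) for name in reference_order]


# Toy example: program output ordered by similarity, re-sorted to input order.
program_output = [("seqC", "LL-SQD"), ("seqA", "LLNAQD"), ("seqB", "FINHQ-")]
resorted = reorder_alignment(program_output, ["seqA", "seqB", "seqC"])
```

Only the order of the rows changes; the aligned sequences themselves, including their gaps, are left untouched.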
+
<td>-</td>
 +
<td bgcolor="#ffbfbf">I</td>
 +
<td bgcolor="#fcbfc1">V</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#d4d2fc">Q</td>
  
[[Image:A03_01.jpg|frame|none|Assignment 3, Figure 01<br>
+
<td bgcolor="#c399d4">G</td>
The guide tree computed by CLUSTAL-W for the 16 Mbp1 orthologue sequences. This tree is based on a matrix of pairwise distances. Sequences in the multiple alignments have been rearranged into the same order as they appear in this diagram.
+
<td bgcolor="#c2bffc">D</td>
]]
+
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#9d99f9">N</td>
 +
<td bgcolor="#f7abb2">L</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#c399d4">G</td>
  
 +
<td bgcolor="#9999ff">R</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBPA_MAGOR/375-404&nbsp;&nbsp;</td>
 +
<td>Q</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d4d2fc">D</td>
  
The result files are linked here:
+
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#f5d2db">F</td>
 +
<td bgcolor="#fcd2d3">V</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
* [[All_Mbp1_CLUSTAL|Mbp1 proteins '''CLUSTAL''' aligned]]
+
<td>-</td>
* [[All_Mbp1_MUSCLE|Mbp1 proteins '''MUSCLE''' aligned]]
+
<td>-</td>
* [[All_Mbp1_T-COFFEE|Mbp1 proteins '''T-Coffee''' aligned (text version)]] and [[All_Mbp1_T-COFFEE_scores| (coloured according to scores)]]
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
Globally speaking, the alignments are quite similar. Let's first look at the common themes, before we discuss details of the results. The  [[All_Mbp1_T-COFFEE_scores| (score-colored T-COFFEE alignment)]] is well suited to look at general relationships between the sequences, since outliers can be easily identified.  For example, if one of the sequences had a low-scoring domain that aligns poorly to the others of the group, that domain may have been acquired in a separate evolutionary event and may not be homologous to the others. We would notice an isolated stretch of poorly alignable sequence, i.e. it should be coloured with a low score in a set of otherwise high-scoring segments. Also, a gene may have acquired significant lengths of N- or C-terminal extensions which may not be homologous (unless they are the result of an internal duplication).
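
Looking for an isolated low-scoring stretch can also be expressed as a small Python sketch. It assumes that the per-residue reliability scores of one sequence have been transcribed by hand into a list of integers from 0 (poor) to 9 (good); this input format, the threshold values and the function name <code>low_scoring_runs</code> are assumptions for illustration, not part of the T-Coffee output specification.

```python
def low_scoring_runs(scores, threshold=3, min_len=5):
    """Return (start, end) index pairs of runs where every score <= threshold.

    Only runs of at least min_len consecutive positions are reported.
    Indices are 0-based and end is exclusive, as in Python slicing.
    """
    runs = []
    start = None
    for i, score in enumerate(scores):
        if score <= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # A run may extend to the end of the sequence.
    if start is not None and len(scores) - start >= min_len:
        runs.append((start, len(scores)))
    return runs


# Toy example: one stretch of five poorly scoring positions.
example_scores = [9, 9, 1, 2, 0, 1, 3, 9, 8, 2, 9]
flagged = low_scoring_runs(example_scores)
```

A flagged stretch is only a hint of where to look; whether the segment is really non-homologous still has to be judged from the alignment itself.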
+
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">N</td>
  
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
+
<td bgcolor="#d4d2fc">D</td>
*Review the [[All_Mbp1_T-COFFEE_scores| (score-colored T-Coffee alignment)]]. Based on this alignment, how do you feel about our initial assertion that these proteins should be considered orthologous? (Answer briefly, but with reference to specific evidence in the alignment. Note that this is not about the general level of conservation, but about whether significant segments do not appear related/alignable at all.) (1 mark)
+
<td bgcolor="#c399d4">G</td>
</div>
+
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#fb999c">V</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#f7abb2">L</td>
 +
<td bgcolor="#dd99b9">A</td>
  
&nbsp;
+
<td bgcolor="#dd99b9">A</td>
&nbsp;
+
<td bgcolor="#9d99f9">Q</td>
 +
<td bgcolor="#ababff">R</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_CHAGL/361-390&nbsp;&nbsp;</td>
 +
<td>S</td>
 +
<td bgcolor="#d2d2ff">R</td>
  
<div style="padding: 5px; background: #BDC3DC;  border:solid 1px #AAAAAA;">
+
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td>-</td>
 +
<td>-</td>
  
==(3) Mbp1 orthologues: analysis of full length MSAs==
+
<td>-</td>
</div>
+
<td>-</td>
&nbsp;
+
<td>-</td>
&nbsp;
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
What do we mean by a ''good'' versus a ''poor'' multiple sequence alignment?
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
  
Let us first consider some of the features we have defined in the second assignment (and some structural features I have added). Here is an annotation of the yeast Mbp1 sequence. It was compiled with the following procedure.
+
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#fb999c">V</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#f7abb2">L</td>
  
# Performed [http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi '''CDD'''] search with yeast Mbp1 protein sequence. This retrieves alignments of Mbp1 with the APSES and the ANKYRIN domains. These are profile-based alignments and I would consider them more reliable than pairwise alignments.
+
<td bgcolor="#dd99b9">A</td>
# Performed  [http://smart.embl-heidelberg.de/ '''SMART'''] search with yeast Mbp1 protein sequence. This retrieved the APSES domain, annotated a number of low-complexity regions and a stretch of coiled coil.
+
<td bgcolor="#dd99b9">A</td>
# Performed a [http://www.ebi.ac.uk/thornton-srv/databases/sas/ '''SAS'''] search with yeast Mbp1 protein sequence. This retrieved pairwise alignments with the structures 1MB1 (APSES) and chain D of 1IKN (ankyrin domains of I<sub>kappa</sub>b), together with their respective secondary structure annotations.
+
<td bgcolor="#df99b8">M</td>
# Copied GenPept sequence into Word-processor.
+
<td bgcolor="#ababff">R</td>
# Transferred annotations of low complexity and coiled-coil regions from SMART.
+
<td bgcolor="#d4d2fc">D</td>
# Transferred annotations of APSES secondary structure from SAS (this is a ''direct'' annotation, since the structure 1MB1 has the same sequence as the corresponding parts of the Mbp1 protein). The central helix of the binding region is slightly distorted and SAS annotates a break in the helix; this was bridged with lowercase "h" in the annotation.
+
<td bgcolor="#f0d2e0">A</td>
# Ankyrin domain annotation was not as straightforward. While CDD, SMART and SAS all annotate the same general regions, they disagree in details of the domain boundaries and in the precise alignment. Used the profile-based CDD alignment of 1IKN. Transferred annotations of secondary structure from SAS output for 1IKN to sequence (this is a ''transferred'' annotation, the original annotation was for 1IKN and we assume that it applies to Mbp1 as well).
+
</tr>
 +
<tr><td nowrap="nowrap">MBP1_PODAN/372-401&nbsp;&nbsp;</td>
 +
<td>V</td>
  
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#fcd2d3">V</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
<td>-</td>
  
MBP1_SACCE
+
<td>-</td>
Annotations based on
+
<td>-</td>
- CDD domain analysis,
+
<td>-</td>
- SAS structure annotation and
+
<td>-</td>
- literature data on binding region
+
<td>-</td>
+
<td>-</td>
Keys:
+
<td>-</td>
+
<td>-</td>
C  Coiled coil regions predicted by Coils2 program
+
<td>-</td>
x  Low complexity region
 
*  Proposed binding region
 
+  positively charged residues, oriented for possible DNA binding interactions
 
-   negatively charged residues, oriented for possible DNA binding interactions
 
 
E  beta strand
 
H  alpha helix
 
t  beta turn
 
 
 
                  10        20        30        40        50        60
 
          MSNQIYSARY SGVDVYEFIH STGSIMKRKK DDWVNATHIL KAANFAKAKR TRILEKEVLK
 
1MB1      ----EEEEEt t-EEEEEEEE t-EEEEEEtt ---EEHHHHH HH----HHHH HHHHhhhHHH
 
                                                                * *+**-+****
 
 
                  70        80        90        100        110        120
 
          ETHEKVQGGF GKYQGTWVPL NIAKQLAEKF SVYDQLKPLF DFTQTDGSAS PPPAPKHHHA
 
1MB1      ---EEE---- tt--EEEE-H HHHHHHHHH- --HHHHtt-        xxx xxxxxxxxxx
 
          **+*+***** ****
 
 
                  130        140        150        160        170        180
 
          SKVDRKKAIR SASTSAIMET KRNNKKAEEN QFQSSKILGN PTAAPRKRGR PVGSTRGSRR
 
          x                                                                         
 
 
 
                  190        200        210        220        230        240
 
          KLGVNLQRSQ SDMGFPRPAI PNSSISTTQL PSIRSTMGPQ SPTLGILEEE RHDSRQQQPQ
 
                                                                      xxxxx
 
 
 
                  250        260        270        280        290        300
 
          QNNSAQFKEI DLEDGLSSDV EPSQQLQQVF NQNTGFVPQQ QSSLIQTQQT ESMATSVSSS
 
          x                                        xx xxxxxxxxxx xxxxxxxxxx
 
 
 
                  310        320        330        340        350        360
 
          PSLPTSPGDF ADSNPFEERF PGGGTSPIIS MIPRYPVTSR PQTSDINDKV NKYLSKLVDY
 
          xxxxxxx
 
 
                  370        380        390        400        410        420
 
          FISNEMKSNK SLPQVLLHPP PHSAPYIDAP IDPELHTAFH WACSMGNLPI AEALYEAGTS
 
ANKYRIN                                -- t----HHHHH HH---HHHHH t-t--t-t--
 
 
 
                  430        440        450        460        470        480
 
          IRSTNSQGQT PLMRSSLFHN SYTRRTFPRI FQLLHETVFD IDSQSQTVIH HIVKRKSTTP
 
ANKYRIN  t----t---- HHHHHHHH-- -------HHH HHHHHH-ttH HH-----HHH HHHH--tH--
 
 
 
                  490        500        510        520        530        540
 
          SAVYYLDVVL SKIKDFSPQY RIELLLNTQD KNGDTALHIA SKNGDVVFFN TLVKMGALTT
 
ANKYRIN  HHHHHHHHH- ---------- -----t---- tt---HHHHH HH---HHHHH HHH--t-tt-
 
 
 
                  550        560        570        580        590        600
 
          ISNKEGLTAN EIMNQQYEQM MIQNGTNQHV NSSNTDLNIH VNTNNIETKN DVNSMVIMSP
 
ANKYRIN  ---t----HH HHHHHH--HH HHH-t--HHH -t----HHHH HHH--tHHHH HHHHHH---t
 
 
 
                  610        620        630        640        650        660
 
          VSPSDYITYP SQIATNISRN IPNVVNSMKQ MASIYNDLHE QHDNEIKSLQ KTLKSISKTK
 
ANKYRIN  ---tt----H HHHHHH---H HHHHHHH      CCCCCCCC CCCCCCCCCC CCCCC
 
 
 
                  670        680        690        700        710        720
 
          IQVSLKTLEV LKESSKDENG EAQTNDDFEI LSRLQEQNTK KLRKRLIRYK RLIKQKLEYR
 
                                                    x xxxxxxxxxx xxxxxxx
 
 
                  730        740        750        760        770        780
 
          QTVLLNKLIE DETQATTNNT VEKDNNTLER LELAQELTML QLQRKNKLSS LVKKFEDNAK
 
 
 
                  790        800        810        820        830
 
          IHKYRRIIRE GTEMNIEEVD SSLDVILQTL IANNNKNKGA EQIITISNAN SHA
 
 
 
A '''good''' MSA comprises only columns of residues that play similar roles in the proteins' mechanism and/or that evolve in a comparable structural context. Since it is a result of biological selection and conservation, it has relatively few indels and the indels it has are usually not placed into elements of secondary structure or into functional motifs.
 
  
A '''poor''' MSA has many errors in its columns in the sense that they contain residues that actually have different functions or structural roles, even though they may look similar to a scoring matrix. It also may have introduced indels in biologically irrelevant positions, to maximize spurious sequence similarities.
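The criterion that indels should stay out of secondary structure elements can be checked mechanically. The following is only a minimal sketch, under the assumption that you have one aligned row for the annotated reference sequence, one aligned row for another sequence, and a secondary structure string for the ungapped reference that uses the keys from the annotation above (H, E). The function name and the toy rows are hypothetical, they are not taken from the actual alignments.

```python
def deletions_in_structure(aligned_ref, aligned_other, ss, structured="HE"):
    """Return 1-based reference positions at which `aligned_other` has a gap
    opposite a reference residue that is annotated as secondary structure.

    aligned_ref, aligned_other: rows of the same alignment ('-' marks a gap).
    ss: one annotation character per residue of the UNGAPPED reference.
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned rows must have the same length")
    if len(ss) != len(aligned_ref.replace("-", "")):
        raise ValueError("annotation must cover every reference residue")

    hits = []
    position = 0  # residue number in the ungapped reference
    for ref_char, other_char in zip(aligned_ref, aligned_other):
        if ref_char == "-":
            continue  # insertion relative to the reference: no residue here
        position += 1
        if other_char == "-" and ss[position - 1] in structured:
            hits.append(position)
    return hits


# Toy example (hypothetical rows, not from the Mbp1 alignments).
ref = "MKV-LHAAG"
other = "MK-ALH--G"
annotation = "--EEHHH-"  # for the ungapped reference MKVLHAAG
print(deletions_in_structure(ref, other, annotation))
```

An empty list does not prove that the alignment is good, and a non-empty list does not prove that it is poor. The sketch only points to the positions that deserve a closer look.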
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#d4d2fc">Q</td>
  
In order to evaluate the MSAs for our proteins, we will analyze alignments relative to the features we have annotated above.
+
<td bgcolor="#afabfa">D</td>
&nbsp;
+
<td bgcolor="#afabfa">E</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#a199f6">H</td>
  
 +
<td bgcolor="#f7abb2">L</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">R</td>
 +
<td bgcolor="#fcabae">V</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_LACTH/458-487&nbsp;&nbsp;</td>
  
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
+
<td>F</td>
===(3.1)  APSES domains (1 mark)===
+
<td bgcolor="#e2d2ee">S</td>
</div>
+
<td bgcolor="#ded2f2">P</td>
&nbsp;<br>
+
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#dfd2f0">Y</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#d4d2fc">N</td>
  
The APSES domains in all of our Mbp1 orthologues are highly conserved and a program that misaligned such obvious similarity would not be worth the electrons it computes with.
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
+
<td>-</td>
*Consider the CLUSTAL, Muscle and T-Coffee alignments of the Mbp1 orthologues.  Orient yourselves as to where the APSES domains are located. Briefly note whether the three alignments agree and whether the charged residues in the proposed binding region are wholly or partially conserved. (Refer to the specific residues labelled (+) or (-) in the Mbp1 annotation above). (1 mark) <!-- Sequence variation may indicate variations in binding site -->
+
<td>-</td>
</div>
+
<td>-</td>
&nbsp;<br>
+
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#ffbfbf">I</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#f0d2e0">A</td>
  
&nbsp;
+
<td bgcolor="#d4d2fc">Q</td>
&nbsp;
+
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">Q</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#fb999c">V</td>
  
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
+
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#f7abb2">L</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9d99f9">Q</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
</tr>
  
===(3.2)  Ankyrin domains (1 mark)===
+
<tr><td nowrap="nowrap">MBP1_FILNE/433-460&nbsp;&nbsp;</td>
</div>
+
<td>-</td>
&nbsp;<br>
+
<td>-</td>
 +
<td bgcolor="#dfd2f0">Y</td>
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td bgcolor="#f0d2e0">A</td>
  
The Ankyrin domains are more highly diverged, the boundaries are less well defined and not even CDD, SMART and SAS agree on the precise annotations. Nevertheless we would hope that a good alignment would recognize homology in that region and that ideally the required indels would be placed between the secondary structure elements, not in their middle.
+
<td bgcolor="#d4d2fc">D</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
+
<td>-</td>
*For one of the alignments of your choice, identify the helices in the Ankyrin repeat region of Mbp1. To facilitate this, I have colored the annotated ankyrin helices red in the yeast Mbp1 protein. Briefly state whether the indels are concentrated in regions that connect the helices or if they are more or less evenly distributed along the entire region of similarity. Conclude whether the assertion that ''indels should not be placed in elements of secondary structure'' has merit in this case, i.e. whether the indels that violate it have strong support from aligned sequence motifs. (1 mark)
+
<td>-</td>
</div>
+
<td>-</td>
&nbsp;<br>
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fcbfc1">V</td>
 +
<td bgcolor="#ffbfbf">I</td>
 +
<td bgcolor="#c2bffc">N</td>
  
&nbsp;
+
<td bgcolor="#f5d2db">F</td>
&nbsp;
+
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">E</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">E</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
  
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
+
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#bf99d7">T</td>
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">R</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#e2d2ee">S</td>
  
===(3.3)  Other features (2 marks)===
+
</tr>
</div>
+
<tr><td nowrap="nowrap">MBP1_KLULA/477-506&nbsp;&nbsp;</td>
&nbsp;<br>
+
<td>F</td>
 +
<td bgcolor="#e2d2ed">T</td>
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#dfd2f0">Y</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#ffd2d2">I</td>
  
Aligning functional features like ''coiled coil domains'' or ''intrinsically disordered regions'' is even more difficult, since this is to a large degree a property of the amino acid composition, not as much the precise sequence. Thus we would expect alignment algorithms to have difficulty detecting the correspondence between sequences in such regions.  I have marked the four low complexity regions of the yeast Mbp1 sequence with '''bold''' letters in all three alignments.
+
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#fcd2d3">V</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
+
<td>-</td>
*Copy the Mbp1 sequence from your organism from the multi-FASTA files and run a [http://smart.embl-heidelberg.de/ SMART] sequence analysis: paste your sequence (or the Uniprot accession number), check only the checkbox for detecting '''intrinsic protein disorder''' and click "Sequence SMART". Locate the segments of '''low complexity''' for your sequence (they are in the lower part of the results page since they overlap with disordered segments). Find the corresponding positions for your sequence in '''one''' of the multiple sequence alignments. Briefly describe the situation: state whether these segments are found in the same general region, in the same detailed location, or perhaps even conserved in sequence, when you compare them to the ''Saccharomyces cerevisiae'' sequence. (1 mark)
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#ffbfbf">I</td>
  
* Briefly discuss whether this observation should lead you to conclude that disorder in these proteins appears to be a conserved feature, i.e. that it is selected for in evolution. (1 mark)
+
<td bgcolor="#c2bffc">N</td>
</div>
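Finding "the corresponding positions" in an alignment means translating a residue number of the ungapped sequence into an alignment column. This is easy to get wrong by hand once a row contains many gaps, so here is a minimal sketch of the bookkeeping. It assumes that the row is given as a plain string with "-" for gaps and that residue numbering starts at 1 with the first residue in the row; the function name and the toy row are hypothetical.

```python
def residue_to_column(aligned_row, residue_number, gap="-"):
    """Return the 0-based alignment column that holds the given residue.

    residue_number counts residues of the ungapped sequence, starting at 1
    with the first residue that appears in `aligned_row`.
    """
    if residue_number < 1:
        raise ValueError("residue numbers start at 1")
    seen = 0
    for column, char in enumerate(aligned_row):
        if char != gap:
            seen += 1
            if seen == residue_number:
                return column
    raise ValueError("row contains only %d residues" % seen)


# Toy row (hypothetical): two leading gaps and one internal gap.
row = "--MK-V"
print([residue_to_column(row, n) for n in (1, 2, 3)])
```

If the row covers only part of a protein, the offset of its first residue has to be subtracted from the sequence position before calling the function.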
+
<td bgcolor="#d4d2fc">Q</td>
&nbsp;<br>
+
<td bgcolor="#d4d2fc">Q</td>
&nbsp;
+
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#caabe0">S</td>
  
<!-- add at a later time similar analysis of coils via 2ZIP server - conserved feature? [http://2zip.molgen.mpg.de/index.html 2Zip server]
+
<td bgcolor="#c2abe8">P</td>
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
+
<td bgcolor="#f699a1">L</td>
*Task
+
<td bgcolor="#a199f6">H</td>
</div>
+
<td bgcolor="#c5abe5">Y</td>
&nbsp;<br>
+
<td bgcolor="#dd99b9">A</td>
&nbsp;
+
<td bgcolor="#dd99b9">A</td>
-->
+
<td bgcolor="#bf99d7">T</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#d2d2ff">K</td>
  
 +
<td bgcolor="#d4d2fc">D</td>
 +
</tr>
  
 +
<tr><td nowrap="nowrap">MBP1_SCHST/468-501&nbsp;&nbsp;</td>
 +
<td>A</td>
 +
<td bgcolor="#d2d2ff">K</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#d4d2fc">N</td>
  
<div style="padding: 5px; background: #BDC3DC;  border:solid 1px #AAAAAA;">
+
<td bgcolor="#d2d2ff">K</td>
 +
<td bgcolor="#d2d2ff">K</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
==(4) APSES domain homologues: analysis of domain MSAs==
+
<td>-</td>
</div>
+
<td>-</td>
&nbsp;<br>
+
<td>-</td>
 +
<td>-</td>
 +
<td>L</td>
 +
<td>I</td>
 +
<td>A</td>
 +
<td>K</td>
 +
<td bgcolor="#f2bfcc">F</td>
  
The procedures for obtaining the MSAs for all APSES domains are summarized at the top of the page for each alignment. Read them and make sure you understand what has been done. Three approaches were used:
+
<td bgcolor="#ffbfbf">I</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#caabe0">S</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">N</td>
  
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#e999ad">F</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#e699b1">C</td>
 +
<td bgcolor="#be99d9">S</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#afabfa">N</td>
  
* An [[APSES_domains_PSI-BLAST| alignment based on the PSI-BLAST results]] as an example of a profile-based alignment.
+
<td bgcolor="#fbd2d5">L</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_SACCE/496-525&nbsp;&nbsp;</td>
 +
<td>F</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#dfd2f0">Y</td>
  
* A [[APSES_domains_CLUSTAL| CLUSTAL-W alignment]] as an example of our standard, plain vanilla progressive alignment procedure.
+
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
* A consistency based, iterated [[APSES_domains_probcons| alignment using '''probcons''']], as an example of the more modern methods. probcons was used rather than T-Coffee since the EBI server restricts the number of sequences it will accept to 50.
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
Comparing the three alignments, we note that they do not agree in detail over large stretches.
+
<td bgcolor="#f9bfc4">L</td>
&nbsp;
+
<td bgcolor="#f9bfc4">L</td>
&nbsp;
+
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#e2d2ed">T</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#ababff">K</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#c399d4">G</td>
  
===(4.1)  Manual improvement  (1 mark)===
+
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#be99d9">S</td>
 +
<td bgcolor="#9999ff">K</td>
  
Often errors or inconsistencies are easy to spot, and manually editing an MSA is not generally frowned upon, even though this is not a strictly objective procedure. The main goal is to make an alignment biologically more plausible. Usually this means minimizing the number of rare events that we need to postulate for the alignment: moving indels into more appropriate positions and/or emphasizing conservation of known functional motifs. Here are some examples of what one might aim for in manually editing an alignment:
+
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">CD00204/1-19&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
* Reduce number of indels
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
From Probcons:
+
<td>-</td>
0447_DEBHA    ILKTE-K<span style="color:#FF0000;">-</span>T<span style="color:#FF0000;">---</span>K--SVVK      ILKTE----KTK---SVVK
+
<td>-</td>
9978_GIBZE    MLGLN<span style="color:#FF0000;">-</span>PGLKEIT--HSIT      MLGLNPGLKEIT---HSIT
+
<td>-</td>
1513_CANAL    ILKTE-K<span style="color:#FF0000;">-</span>I<span style="color:#FF0000;">---</span>K--NVVK      ILKTE----KIK---NVVK
+
<td>-</td>
6132_SCHPO    ELDDI-I<span style="color:#FF0000;">-</span>ESGDY--ENVD      ELDDI-IESGDY---ENVD
+
<td>-</td>
1244_ASPFU    ----N<span style="color:#FF0000;">-</span>PGLREIC--HSIT  ->  ----NPGLREIC---HSIT
+
<td>-</td>
0925_USTMA    LVKTC<span style="color:#FF0000;">-</span>PALDPHI--TKLK      LVKTCPALDPHI---TKLK
+
<td>-</td>
2599_ASPTE    VLDAN<span style="color:#FF0000;">-</span>PGLREIS--HSIT      VLDANPGLREIS---HSIT
+
<td>-</td>
9773_DEBHA    LLESTPKQYHQHI--KRIR      LLESTPKQYHQHI--KRIR
+
<td>-</td>
0918_CANAL    LLESTPKEYQQYI--KRIR      LLESTPKEYQQYI--KRIR
 
  
<small>Gaps marked in red were moved. The sequence similarity in the alignment does not change considerably, however the total number of indels in this excerpt is reduced to 13 from the original 22</small>
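The indel count for this excerpt can be reproduced with a few lines of code. The sketch below assumes that one indel event corresponds to one maximal run of gap characters within a row, and it sums these runs over all rows. That counting convention is my assumption for this illustration; other conventions (for example counting gap columns) would give different numbers. The rows are copied from the Probcons excerpt above with the colour markup removed.

```python
import re

# (name, original row, manually edited row) from the Probcons excerpt.
EXCERPT = [
    ("0447_DEBHA", "ILKTE-K-T---K--SVVK", "ILKTE----KTK---SVVK"),
    ("9978_GIBZE", "MLGLN-PGLKEIT--HSIT", "MLGLNPGLKEIT---HSIT"),
    ("1513_CANAL", "ILKTE-K-I---K--NVVK", "ILKTE----KIK---NVVK"),
    ("6132_SCHPO", "ELDDI-I-ESGDY--ENVD", "ELDDI-IESGDY---ENVD"),
    ("1244_ASPFU", "----N-PGLREIC--HSIT", "----NPGLREIC---HSIT"),
    ("0925_USTMA", "LVKTC-PALDPHI--TKLK", "LVKTCPALDPHI---TKLK"),
    ("2599_ASPTE", "VLDAN-PGLREIS--HSIT", "VLDANPGLREIS---HSIT"),
    ("9773_DEBHA", "LLESTPKQYHQHI--KRIR", "LLESTPKQYHQHI--KRIR"),
    ("0918_CANAL", "LLESTPKEYQQYI--KRIR", "LLESTPKEYQQYI--KRIR"),
]


def count_indels(rows):
    """Sum the number of maximal gap runs ('-' stretches) over all rows."""
    return sum(len(re.findall(r"-+", row)) for row in rows)


def same_residues(before, after):
    """Editing may only move gaps; the residues themselves must not change."""
    return before.replace("-", "") == after.replace("-", "")


original = [before for _, before, _ in EXCERPT]
edited = [after for _, _, after in EXCERPT]

# Sanity check: manual editing must never alter the sequences.
assert all(same_residues(b, a) for b, a in zip(original, edited))

print(count_indels(original), count_indels(edited))  # prints: 22 13
```

The residue check is worth keeping whenever you edit an alignment by hand, since it is easy to delete or duplicate a character while moving gaps.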
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">E</td>
 +
<td bgcolor="#d4d2fc">D</td>
  
* Move indels to more plausible position
+
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#bfbfff">R</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#c2abe8">P</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#f7abb2">L</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
  
From CLUSTAL:
+
<td bgcolor="#be99d9">S</td>
4966_CANGL    MKHEKVQ------GGYGRFQ---GTW      MKHEKV<span style="color:#00AA00;">Q</span>------GGYGRFQ---GTW
+
<td bgcolor="#afabfa">N</td>
1513_CANAL    KIKNVVK------VGSMNLK---GVW      KIKNVV<span style="color:#00AA00;">K</span>------VGSMNLK---GVW
+
<td bgcolor="#e4d2ec">G</td>
6132_SCHPO    VDSKHP<span style="color:#FF0000;">-</span>----------<span style="color:#FF0000;">Q</span>ID---GVW  -> VDSKHP<span style="color:#00AA00;">Q</span>-----------ID---GVW
+
<td bgcolor="#d5d2fb">H</td>
1244_ASPFU    EICHSIT------GGALAAQ---GYW      EICHSI<span style="color:#00AA00;">T</span>------GGALAAQ---GYW
+
</tr>
 +
<tr><td nowrap="nowrap">CD00204/99-118&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
<small>The two characters marked in red were swapped. This does not change the number of indels but places the "Q" into a column in which it is more highly conserved (green). Progressive alignments are especially prone to this type of error.</small>
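Whether a swap like this really improves a column can be quantified with a simple conservation measure. The sketch below uses the fraction of rows that carry the most frequent residue of a column, with gaps never counting as conserved. This measure is a deliberately crude assumption for illustration; it ignores residue similarity, which a scoring matrix would take into account. The rows are the CLUSTAL excerpt above with the markup removed, and column index 6 (counting from 0) is the column that receives the "Q".

```python
from collections import Counter

BEFORE = [
    "MKHEKVQ------GGYGRFQ---GTW",  # 4966_CANGL
    "KIKNVVK------VGSMNLK---GVW",  # 1513_CANAL
    "VDSKHP-----------QID---GVW",  # 6132_SCHPO (as aligned by CLUSTAL)
    "EICHSIT------GGALAAQ---GYW",  # 1244_ASPFU
]

# Same rows after moving the Q of 6132_SCHPO to the left of the gap.
AFTER = list(BEFORE)
AFTER[2] = "VDSKHPQ-----------ID---GVW"


def column_identity(rows, column, gap="-"):
    """Fraction of rows holding the most common residue in one column."""
    residues = [row[column] for row in rows if row[column] != gap]
    if not residues:
        return 0.0  # a column of gaps only is not conserved at all
    most_common_count = Counter(residues).most_common(1)[0][1]
    return most_common_count / len(rows)


print(column_identity(BEFORE, 6), column_identity(AFTER, 6))  # prints: 0.25 0.5
```

With only four sequences the numbers are coarse, but the direction of the change is what matters here.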
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
* Conserve motifs
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
From CLUSTAL:
+
<td>-</td>
6166_SCHPO      --DKR<span style="color:#FF0000;">V</span>A---<span style="color:#FF0000;">G</span>LWVPP      --DKR<span style="color:#FF0000;">V</span>A--<span style="color:#FF0000;">G</span>-LWVPP
+
<td>-</td>
XBP1_SACCE      GGYIK<span style="color:#FF0000;">I</span>Q---<span style="color:#FF0000;">G</span>TWLPM      GGYIK<span style="color:#FF0000;">I</span>Q--<span style="color:#FF0000;">G</span>-TWLPM
+
<td>-</td>
6355_ASPTE      --DE<span style="color:#FF0000;">I</span>A<span style="color:#FF0000;">G</span>---NVWISP  ->  ---DE<span style="color:#FF0000;">I</span>A--<span style="color:#FF0000;">G</span>NVWISP
+
<td bgcolor="#fcbfc1">V</td>
5262_KLULA      GGYIK<span style="color:#FF0000;">I</span>Q---<span style="color:#FF0000;">G</span>TWLPY      GGYIK<span style="color:#FF0000;">I</span>Q--<span style="color:#FF0000;">G</span>-TWLPY
+
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#ababff">K</td>
  
<small>The first of the two residues marked in red is a conserved, solvent exposed hydrophobic residue that may mediate domain interactions. The second residue is the conserved glycine in a beta turn that cannot be mutated without structural disruption. Changing the position of a gap and insertion in one sequence improves the conservation of both motifs.</small>
+
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#bfbfff">R</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#c2abe8">P</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#f7abb2">L</td>
 +
<td bgcolor="#dd99b9">A</td>
  
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">K</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">1SW6/203-232&nbsp;&nbsp;</td>
 +
<td>L</td>
 +
<td bgcolor="#d4d2fc">D</td>
  
&nbsp;
+
<td bgcolor="#fbd2d5">L</td>
&nbsp;
+
<td bgcolor="#d2d2ff">K</td>
 +
<td bgcolor="#e2d2ef">W</td>
 +
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td>-</td>
 +
<td>-</td>
  
Please consider the following excerpts from the alignments:
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
PSI-BLAST
+
<td>-</td>
'''MBP1_SACCE    SIMKRKKDDWVNATHILKA------A----------NFA--------KAKRTR-----'''
+
<td>-</td>
2599_ASPTE    -IMWDYNIGLVRTTPLFRS------Q----------NYS--------KTTPAK-----
+
<td>-</td>
9773_DEBHA    -IIWDYETGFVHLTGIWKA------S----------INDEVNTHRNLKADIVK-----
+
<td bgcolor="#ebbfd3">M</td>
0918_CANAL    -VIWDYETGWVHLTGIWKA------SLTIDGSNVSPSHL--------KADIVK-----
+
<td bgcolor="#f9bfc4">L</td>
9901_DEBHA    -ILRRVQDSYINISQLF--------SILLKIG----HLS--------EAQLTN-----
+
<td bgcolor="#c2bffc">N</td>
7766_ASPNI    -LMRRSKDGYVSATGMFKI------A-----------FP--------WAKLEEERSER
+
<td bgcolor="#f0d2e0">A</td>
5459_GIBZE    -LMRRSYDGFVSATGMFKASFPYAEA----------SDE--------DAERKY-----
+
<td bgcolor="#d4d2fc">Q</td>
2267_NEUCR    -LMRRSQDGYISATGMFKA------TFPYASQ----EEE--------EAERKY-----
+
<td bgcolor="#afabfa">D</td>
3510_ASPFU    -LMRRSKDGYVSATGMFKI------A-----------FP--------WAK--------
 
3762_MAGGR    -LMRRSSDGYVSATGMFKATFPYADA----------EDE--------EAERNY-----
 
3412_CANAL    -VLRRVQDSFVNVTQLFQI------LIKLE------VLP--------TSQVDN-----
 
  
 +
<td bgcolor="#caabe0">S</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#eaabbf">C</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#9d99f9">N</td>
 +
<td bgcolor="#ffabab">I</td>
  
CLUSTAL
+
<td bgcolor="#dd99b9">A</td>
'''MBP1_SACCE    SIMKRKKDDWVNATHILKAAN----------FAKAKRTRILE----------KEVLKETHE'''
+
<td bgcolor="#dd99b9">A</td>
2599_ASPTE    -IMWDYNIGLVRTTPLFRSQ----------NYSKTTPAKVLDAN--------P-GLREISH
+
<td bgcolor="#9999ff">R</td>
9773_DEBHA    -IIWDYETGFVHLTGIWKASIN-DEVNTHR-NLKADIVKLLEST--------PKQYHQHIK
+
<td bgcolor="#f7abb2">L</td>
0918_CANAL    -VIWDYETGWVHLTGIWKASLTIDGSNVSPSHLKADIVKLLEST--------PKEYQQYIK
+
<td bgcolor="#e4d2ec">G</td>
9901_DEBHA    -ILRRVQDSYINISQLFSILL----------KIGHLSEAQLTNFLNNEILTNTQYLSSGGS
+
<td bgcolor="#d4d2fc">N</td>
7766_ASPNI    -LMRRSKDGYVSATGMFKIAF----------PWAKLEEERSE----------REYLKTRPE
+
</tr>
5459_GIBZE    -LMRRSYDGFVSATGMFKASF----------PYAEASDEDAE----------RKYIKSLPT
+
<tr><td nowrap="nowrap">SecStruc/203-232&nbsp;&nbsp;</td>
2267_NEUCR    -LMRRSQDGYISATGMFKATF----------PYASQEEEEAE----------RKYIKSIPT
+
<td>t</td>
3510_ASPFU    -LMRRSKDGYVSATGMFKIAF----------PWAKLEEEKAE----------REYLKTREG
 
3762_MAGGR    -LMRRSSDGYVSATGMFKATF----------PYADAEDEEAE----------RNYIKSLPA
 
3412_CANAL    -VLRRVQDSFVNVTQLFQILI----------KLEVLPTSQVDNYFDNEILSNLKYFGSSSN
 
  
 +
<td bgcolor="#e6d2e9">_</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td>-</td>
  
Probcons
+
<td>-</td>
'''MBP1_SACCE    SIMKRKKDDWVNATHILKAANF----AKA----------KRTRILEKE-V-LKETH--E'''
+
<td>-</td>
2599_ASPTE    -IMWDYNIGLVRTTPLFRSQNY----SKT----------TPAKVLDAN-PGLREIS--H
+
<td>-</td>
9773_DEBHA    -IIWDYETGFVHLTGIWKASIN----DEV--NTHRNLKADIVKLLESTPKQYHQHI--K
+
<td>-</td>
0918_CANAL    -VIWDYETGWVHLTGIWKASLT----IDGSNVSPSHLKADIVKLLESTPKEYQQYI--K
+
<td>-</td>
9901_DEBHA    -ILRRVQDSYINISQLFSILLKIGHLSEA----------QLTNFLNNE-I-LTNTQYLS
+
<td>-</td>
7766_ASPNI    -LMRRSKDGYVSATGMFKIAFP----WAK----------LEEERSERE-Y-LK-----T
+
<td>-</td>
5459_GIBZE    -LMRRSYDGFVSATGMFKASFP----YAE----------ASDEDAERK-Y-IK-----S
+
<td>-</td>
2267_NEUCR    -LMRRSQDGYISATGMFKATFP----YAS----------QEEEEAERK-Y-IK-----S
+
<td>-</td>
3510_ASPFU    -LMRRSKDGYVSATGMFKIAFP----WAK----------LEEEKAERE-Y-LK-----T
 
3762_MAGGR    -LMRRSSDGYVSATGMFKATFP----YAD----------AEDEEAERN-Y-IK-----S
 
3412_CANAL    -VLRRVQDSFVNVTQLFQILIKLEVLPTS----------QVDNYFDNE-I-LSNLKYFG
 
  
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
+
<td>-</td>
*In any '''one''' of these excerpts, find at least one example where the alignment could be manually improved. Show the original version, the improved version and highlight the changes in red. (1 mark)
+
<td>-</td>
</div>
+
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dcbfe1">_</td>
 +
<td bgcolor="#dcbfe1">_</td>
 +
<td bgcolor="#dcbfe1">_</td>
 +
<td bgcolor="#e6d2e9">_</td>
 +
<td bgcolor="#e6d2e9">_</td>
  
 +
<td bgcolor="#d2abd8">_</td>
 +
<td bgcolor="#cbabdf">t</td>
 +
<td bgcolor="#e6d2e9">_</td>
 +
<td bgcolor="#c799cf">_</td>
 +
<td bgcolor="#dcbfe1">_</td>
 +
<td bgcolor="#d2abd8">_</td>
 +
<td bgcolor="#b2abf7">H</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#a199f6">H</td>
  
The fact that such improvements usually are not hard to find teaches us to be cautious with the results. Not in all cases will lack of conservation in a particular column mean that a residue has changed in evolution - sometimes this is simply a consequence of misalignment. MSAs can only take sequence information into account, while we may have additional information on structural and functional conservation patterns. This may include secondary structure (gaps should be moved out of regions of secondary structure, where possible), structurally required residues (expected to be conserved across all structurally similar sequences) and functionally conserved residues (expected to have a high likelihood of being conserved within groups of orthologues, but varying between orthologues and paralogues).
+
<td bgcolor="#b2abf7">H</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#b2abf7">H</td>
 +
<td bgcolor="#e6d2e9">_</td>
 +
<td bgcolor="#e6d2e9">_</td>
 +
</tr>
 +
</table>
 +
</td></tr>
  
In terms of structural conservation, we expect motif or consistency based alignments to be more accurate since they align to the "big picture". In terms of functional variation we expect progressive alignments to be more accurate, since they align to local similarities.
+
</table>
 +
;Aligned sequence after editing. A significant cleanup of the frayed region is possible. Now there is only one insertion event, and it is placed into the loop that connects two helices of the 1SW6 structure.
  
&nbsp;
 
&nbsp;
 
  
 
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
 
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
  
===(4.2)  Residue conservation  (1 mark)===
+
===(2.5) Final analysis===
 
</div>
 
</div>
&nbsp;<br>
 
  
Let us finally interpret the alignments in terms of their biological relevance. I have transferred the ligand-binding annotations for the yeast Mbp1 APSES domain into the multiple sequence alignments by color coding the charged residues that putatively could bind DNA <span style="color:#FF0000;">'''red'''</span> (-) and <span style="color:#0066FF;">'''blue'''</span> (+).  Thus these residues label columns in which we expect ''functional'' conservation. I have labeled two residues that are associated with important structural features <span style="color:#00AA33;">'''green'''</span>. These two residues are G75, a mandatory glycine in the third position of a particular type of beta-turn, and W77, a key component of the domain's hydrophobic core. Thus these two residues label columns in which we expect ''structural'' conservation. Let's assume that all the APSES domains fold into similar structures and that they all bind DNA, although not necessarily the same cognate sequence. This should allow you to answer the following questions:
+
 
 +
<div style="padding: 5px; background: #EEEEEE;">
 +
* Compare the distribution of indels in the ankyrin repeat regions of your alignments. '''Review''' whether the indels in this region are concentrated in segments that connect the helices, or if they are more or less evenly distributed along the entire region of similarity. Think about whether the assertion that ''indels should not be placed in elements of secondary structure'' has merit in your alignment. Recognize that an indel in an element of secondary structure could be interpreted in a number of different ways:
 +
** The alignment is correct, the annotation is correct too: the indel is tolerated in that particular case, for example by extending the length of an &alpha;-helix or &beta;-strand;
 +
** The alignment algorithm has made an error, the structural annotation is correct: the indel should be moved a few residues;
 +
** The alignment is correct, the structural annotation is wrong, this is not a secondary structure element after all;
 +
** Both the algorithm and the annotation are probably wrong, but we have no data to improve the situation.  
 +
 
 +
(<small>NB: remember that the structural annotations have been made for the yeast protein and might have turned out differently for the other proteins...</small>)
 +
 
 +
You should be able to analyse discrepancies between annotation and expectation in a structured and systematic way. In particular if you notice indels that have been placed into structurally annotated regions of secondary structure, you should be able to comment on whether the location of the indel has strong support from aligned sequence motifs, or whether the indel could possibly be moved into a different location without much loss in alignment quality.
 +
</div>
  
  
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
+
<div style="padding: 5px; background: #FFCC99;">
Consider any '''one''' of the three APSES domain MSAs. 
+
;Analysis (2 marks)
  
*Are the patterns of sequence variation for functionally conserved residues compatible with different binding specificities for different APSES domains? State briefly (but with reference to specific residues) what you would expect and what you find.
+
*Considering the whole alignment and your experience with editing, please note in your assignment your assessment of whether the position of indels relative to structural features of the ankyrin domains in your organism's Mbp1 protein is reliable.  
  
*Are the patterns of sequence variation for structurally conserved residues compatible with a common fold of different APSES domains? State briefly (but with reference to specific residues) what you would expect and what you find. (1 mark)
+
*CDD extends the ankyrin domain annotation beyond the 1SW6 domain boundaries. Given your assessment of conservation in that region, do you think that this is reasonable in your organism's protein? Is there evidence for this in the alignment of the CD00204 consensus with well aligned blocks of sequence beyond the positions that match Swi6?
 
</div>
 
</div>
  
&nbsp;
 
&nbsp;
 
  
 
<div style="padding: 5px; background: #BDC3DC;  border:solid 1px #AAAAAA;">
 
<div style="padding: 5px; background: #BDC3DC;  border:solid 1px #AAAAAA;">
==(5) Summary of Resources==
+
 
 +
==(3) Summary of Resources==
 
</div>
 
</div>
 
&nbsp;<br>
 
&nbsp;<br>
  
 
;Links
 
;Links
:* [[Organism_list_2006|Assigned Organisms]]
 
 
:* [http://www.ncbi.nlm.nih.gov/blast '''BLAST''']
 
:* [http://www.ncbi.nlm.nih.gov/blast '''BLAST''']
:* [http://www.pir.uniprot.org/search/idmapping.shtml '''Uniprot ID mapping''' service]
+
:* [http://www.pir.uniprot.org/?tab=mapping '''Uniprot ID mapping''' service]
 
:* [http://www.ncbi.nlm.nih.gov/sutils/blink.cgi?pid=68465419  A '''BLink''' example]
 
:* [http://www.ncbi.nlm.nih.gov/sutils/blink.cgi?pid=68465419  A '''BLink''' example]
 
:* [http://www.ebi.ac.uk/clustalw/ EBI '''CLUSTAL-W''' server]
 
:* [http://www.ebi.ac.uk/clustalw/ EBI '''CLUSTAL-W''' server]
Line 677: Line 3,741:
 
:* [http://www.ebi.ac.uk/thornton-srv/databases/sas/ '''SAS''']
 
:* [http://www.ebi.ac.uk/thornton-srv/databases/sas/ '''SAS''']
  
;Sequences
+
;Lists
:* [[All_Mbp1_proteins|'''All Mbp1 proteins''']]
+
:* [[Species list]]
:* [[All_APSES_domains|'''All APSES domains''']]
+
:* [[Mbp1_RBM_reference_sequences|'''A page of reference sequence of Mbp1 proteins''']]
 +
:* [[Mbp1_annotation|'''A page of text-based annotations for the yeast Mbp1 protein''']]
  
;Alignments
 
:'''Mbp1 proteins:'''
 
:* [[All_Mbp1_CLUSTAL|Mbp1 proteins '''CLUSTAL''' aligned]]
 
:* [[All_Mbp1_MUSCLE|Mbp1 proteins '''MUSCLE''' aligned]]
 
:* [[All_Mbp1_T-COFFEE|Mbp1 proteins '''T-Coffee''' aligned (text version)]]
 
:* [[All_Mbp1_T-COFFEE_scores|Mbp1 proteins '''T-Coffee''' aligned (coloured according to scores)]]
 
  
:'''APSES domains:'''
+
:'''Further reading'''
:* [[APSES_domains_PSI-BLAST|All APSES domains - alignment based on '''PSI-BLAST''' results]]
+
:* [http://bioinformatics.oxfordjournals.org/content/24/3/319.full Moreno-Hagelsieb &amp; Latimer compare Reciprocal Best Match vs. a related concept: Reciprocal Smallest Distance]
:* [[APSES_domains_CLUSTAL|All APSES domains - '''CLUSTAL-W''' alignment]]
 
:* [[APSES_domains_probcons|All APSES domains -  '''probcons''' alignment]]
 
  
 
+
&nbsp;<br>
&nbsp;
 
&nbsp;
 
  
 
<div style="padding: 5px; background: #D3D8E8;  border:solid 1px #AAAAAA;">
 
<div style="padding: 5px; background: #D3D8E8;  border:solid 1px #AAAAAA;">
Line 701: Line 3,756:
 
</div>
 
</div>
  
If you have any questions at all, don't hesitate to mail me at [mailto:boris.steipe@utoronto.ca boris.steipe@utoronto.ca] or post your question to the [mailto:bch441_2006@googlegroups.com Course Mailing List]
+
&nbsp;<br>
 +
 
 +
If you have any questions at all, don't hesitate to mail me at [mailto:boris.steipe@utoronto.ca boris.steipe@utoronto.ca] or post your question to the [mailto:bch441_2011@googlegroups.com Course Mailing List]

Latest revision as of 23:32, 21 September 2012

Note! This assignment is currently active. All significant changes will be announced on the mailing list.

 
 

 

 

Assignment 3 (last: 2011) - Multiple Sequence Alignment

 

Preparation, submission and due date

Read carefully.
Be sure you have understood all parts of the assignment and cover all questions in your answers! Sadly, we always get assignments back in which people have simply overlooked crucial questions. Sadly, we always get assignments back in which people have not described procedural details. If you did not notice that the above were two different sentences, you are still not reading carefully enough.

Review the guidelines for preparation and submission of BCH441 assignments.

The due date for the assignment is Monday, November 21. at 12:00.

   

Your documentation for the procedures you follow in this assignment will be worth 1 mark.


 

Introduction

 

Take care of things, and they will take care of you.
Shunryu Suzuki

Much of what we know about a protein's physiological function is based on the conservation of that function as the species evolves. We assess conservation by comparison to related proteins. Conservation - or variability - is a consequence of selection under constraints: the multiple effects on a species' fitness function that are induced through changes to the structural or functional features of a protein. Conservation patterns can thus provide evidence for many different questions: structural conservation among proteins with similar 3D-structures, functional conservation among homologues with comparable roles, peaks of sequence variability that indicate domain boundaries in multi-domain proteins, or amino acid propensities as predictors for protein engineering and design tasks.

Measuring conservation requires alignment. Therefore a carefully done multiple sequence alignment (MSA) is a cornerstone for the annotation of the essential properties of a gene or protein. MSAs are also useful to resolve ambiguities in the precise placement of indels and to ensure that columns in alignments actually contain amino acids that evolve in a similar context. MSAs serve as input for

  • functional annotation;
  • protein homology modeling;
  • phylogenetic analyses, and
  • sensitive homology searches in databases.


As a first step, we will explore the search and retrieval of fungal proteins that are orthologous to yeast Mbp1, and of the APSES domains they contain. Each student is being assigned one genome-sequenced fungus. Briefly, you will

  1. Collect sequence identifiers for all APSES domain transcription factors in your assigned species;
  2. Retrieve the sequences;
  3. Perform a multiple sequence alignment with these, and a number of reference domains;
  4. Edit the alignment and annotate.


Multiple sequence alignment is not a solved computational problem, and a significant number of alignment tools exist, each with different strengths and objectives. It is remarkable that by far the most frequently used MSA algorithm is CLUSTAL, a procedure that was first published for the microprocessors of the late 1980s, has been surpassed in performance many times, and has been shown to be significantly inferior to more modern approaches when aligning sequences with 30% identity or less. In this assignment we will encounter various approaches to multiple alignment:

  • A model-based approach (based on the PSSM that PSI-BLAST generates)
  • Progressive alignments - CLUSTAL and MAFFT
  • Consistency based alignment - T-Coffee and MUSCLE


(1) Mbp1 homologues


(1.1) Retrieving sequences


In Assignment 2 you retrieved the protein sequences of Saccharomyces cerevisiae Mbp1 and defined its APSES (KilA-N) domain. Let us now search for an orthologue of this sequence in Your Species. More precisely, you should identify proteins that fulfill the Reciprocal Best Match criterion.

First, we need to define the sequence you will use to find Mbp1 homologues. Since Mbp1 contains the very widely distributed Ankyrin motifs, a BLAST search with full length sequences will pick up a large number of Ankyrin-repeat containing proteins that are otherwise unrelated to our query. We will instead search for homologues using only the APSES domain as a query. However, the Pfam definition of the APSES domain (or KilA-N family, as it is now called) does not cover the entire length of the domain that has been crystallized. Therefore, we will use the sequence of the crystallized protein instead of the Pfam alignment. One of the results of our analysis will be whether APSES domains in fungi all have the same length as the Mbp1 domain, or whether some are indeed much shorter, as suggested by the Pfam alignment. To remind you, here is the full sequence of the 1MB1 structure (Note that the C-terminal His6 tag that has been added for purification is not part of the Mbp1 protein sequence.) ...


>PDB:1MB1
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPL
NIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDHHHHHH


... and, for comparison, this is the corresponding alignment with the Pfam KilA-N model obtained from a RPS-BLAST search of the above sequence against the CDD database:


                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
1MB1            19 IHSTGSIMKRKKDDWVNATHILKAANFAKaKRTRILEKEVLKETHEKVQ----------------GGFGKYQGTWVPLNI 82

Cdd:pfam04383    3 YNDFEIIIRRDKDGYINATKLCKAAGATK-RFRNWLRLESTKELIEELSkennidvliievenkkGKNGRLQGTYVHPDL 81


                           90
                   ....*....|....*
1MB1            83 AKQLA----EKFSVY 93

Cdd:pfam04383   82 ALAIAswisPEFALK 96


As you can see, the Pfam alignment is 18 amino acids shorter at the N-terminus and 31 amino acids shorter at the C-terminus.
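
The two numbers can be checked directly from the alignment coordinates. The following is a minimal sketch, not part of the assignment workflow: the function name `uncovered_residues` is made up for illustration, and the aligned range 19..93 is read off the RPS-BLAST output above.

```python
# Sequence of the crystallized construct, copied from the 1MB1 record above.
SEQ_1MB1 = (
    "MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKYQGTWVPL"
    "NIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDHHHHHH"
)
HIS_TAG = "HHHHHH"  # C-terminal purification tag, not part of Mbp1


def uncovered_residues(seq, first, last, tag=HIS_TAG):
    """Count residues before and after the 1-based aligned range first..last.

    The His-tag is removed first, because it does not belong to the protein.
    """
    if seq.endswith(tag):
        seq = seq[: -len(tag)]
    return first - 1, len(seq) - last


# The Pfam model aligns to residues 19..93 of 1MB1 (see the alignment above).
n_term, c_term = uncovered_residues(SEQ_1MB1, first=19, last=93)
print(n_term, c_term)
```

The same function can be reused to compare the Pfam coverage for any of the sequences you retrieve later, as long as you supply the aligned range yourself.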


Find APSES domain proteins in your species
  1. Access the species list and identify the species that has been assigned to you.
  2. Navigate to the NCBI's main page.
  3. In the left-hand menu of links, follow the link to Genomes & Maps.
  4. Under the Databases tab, follow the link to Genome.
  5. In the Genome tools section of that page, follow the link to Genomic groups BLAST.
  6. Click on the link to the eukaryotic genomes tree, then on the link for the text table. This produces a BLAST interface to a list of species for which whole-genome sequences have been sequenced, annotated and entered into the various databases.
  7. Paste the FASTA sequence of the structurally defined Mbp1 APSES domain (e.g. from 1MB1) into the search field (excluding the His-tag, of course), set the parameters correctly for a Protein search against Protein sequences using blastp. Then find your assigned species in the table and check the box next to its name. Remember to record the parameters for your search. I expect you to understand which parameters would be needed in order to make this search reproducible. Run the search.
  8. On the next screen, check the box next to Format for: PSI-BLAST. Then click on View report to show the results of the first PSI-BLAST iteration.
  9. Run subsequent iterations of PSI-BLAST simply by clicking on Go after checking the sequences that have been included.
  10. Iterate the PSI-BLAST search until convergence (i.e. until no more new sequences are added); make sure to include only sequences for which the E-value is small (smaller than about 10e-03 should be safe). Sequences with borderline E-values that improve significantly in an iteration are probably homologues. Sequences with borderline E-values that do not improve much, or for which the E-value increases are probably not homologues. If this step does not work for you or the results are not what you expect, please contact your TA right away.
  • Note: Please spend a little time on each page to understand its contents. Ask, if the page contains resources or features you don't understand. Think about what you are doing. If you simply click on the links I provide, you will miss the opportunity to understand how the resources fit into the workflow you are working on, and to be able to execute similar processes yourself. Questions on page contents can potentially appear on quizzes and exam.
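
The logic of "iterate until convergence" can be written down as a short loop. This is only a sketch of the bookkeeping, under stated assumptions: `run_iteration` is a hypothetical stand-in for one PSI-BLAST round (it is not an NCBI function), and the cutoff simply mirrors the value quoted in step 10.

```python
E_CUTOFF = 10e-03  # threshold quoted in step 10 above


def iterate_to_convergence(run_iteration, e_cutoff=E_CUTOFF, max_rounds=20):
    """Repeat a profile search until no new sequences are added.

    run_iteration(included) is a hypothetical callable that performs one
    search round with the currently included sequences and returns a dict
    {sequence_id: e_value}.
    Returns the final set of included ids and the number of rounds run.
    """
    included = set()
    for round_no in range(1, max_rounds + 1):
        hits = run_iteration(frozenset(included))
        accepted = {sid for sid, e_value in hits.items() if e_value < e_cutoff}
        if accepted <= included:      # nothing new: the search has converged
            return included, round_no
        included |= accepted
    raise RuntimeError("search did not converge within max_rounds")
```

Note that this sketch never removes a sequence once it has been included; when you run the real search you should still inspect borderline hits by eye, as described above.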


Familiarize yourself with the output form you obtain; this is by far the most frequently used bioinformatics result page. You may want to refer to the NCBI explanation.

Here is a list of things to look for, all of which I expect you to know and understand. (However you do not need to comment on these points in your submission.)

On the alignment image
  • What do the different colored bars mean?
  • What information do you get when you "mouse-over" a colored bar on the alignment image?
  • What happens when you click on one of the bars?
In the description list
  • Where does the link next to an identifier take you?
  • Where does the link in the "score" column take you?
  • What does the icon at the end of each row mean? What other icons could appear there?
In the alignment section
  • What do the alignment metrics mean:
    • Score?
    • Expect (E-value)?
    • Identities?
    • Positives?
    • Gaps?
  • What is the alignment length?
  • Which sequence is labeled Query and which one is labelled Sbjct?
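
To make some of these metrics concrete, here is a minimal sketch that counts identities and gaps for a pair of aligned rows. The function name is made up for illustration, and the example rows are the last block of the 1MB1 / pfam04383 alignment shown earlier (with the Pfam row written in upper case). "Positives" are not computed, because they depend on the scoring matrix used by BLAST.

```python
def alignment_metrics(query, subject, gap="-"):
    """Count identities and gapped columns in a pairwise alignment.

    Both rows must have the same length (one character per column).
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    columns = list(zip(query, subject))
    identities = sum(1 for q, s in columns if q == s and q != gap)
    gaps = sum(1 for q, s in columns if q == gap or s == gap)
    return {"length": len(columns), "identities": identities, "gaps": gaps}


metrics = alignment_metrics("AKQLA----EKFSVY", "ALAIASWISPEFALK")
print(metrics)
```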


Next
retrieve the sequences that have E-values low enough to make you conclude they contain APSES domain homologues.
  1. Review the sequences you have found: they should all be significantly similar to the query profile. In some of the assigned species you will find one hit for each distinct sequence in the genome, in others, you will find several versions of essentially the same gene (e.g. refseq and other accession numbers).
  2. Explore the relationship between the hits by clicking on select all sequences, then choosing Distance tree of results at the top or bottom of your search results to visualize a tree representation of similarity. Highly similar sequences will be collapsed into the same node in the distance tree; you can expand those nodes to list all the node's members.
  3. Identify one representative for each distinct protein you have found. If possible, use proteins with refseq identifiers. Avoid duplicates or nearly identical variants. If there are length differences, use the longer version (shorter versions may contain only partial sequences). Click on the checkbox next to each protein you have identified.
  4. Click on get selected sequences at the top or bottom of the page. Note and record the GIs for your sequences that are listed in the Search details box, you can use them to easily reproduce your results by pasting them into any Entrez search. Also note the URL that this has produced (in your browser's URL bar). As you see, you can retrieve a list of sequences from NCBI simply by adding a list of comma-separated GI numbers to the URL of the protein database.
  5. Click on Display settings and choose FASTA (text).
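
The same idea, a comma-separated list of identifiers appended to a URL, also works for scripted retrieval. The sketch below only builds the URL and does not contact NCBI; it assumes the E-utilities `efetch` endpoint rather than the web page you just used, and the second identifier is a placeholder, not one of your results.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fasta_url(ids):
    """Build an efetch URL that requests protein records as FASTA text."""
    params = {
        "db": "protein",
        "id": ",".join(str(i) for i in ids),  # comma-separated identifier list
        "rettype": "fasta",
        "retmode": "text",
    }
    # safe="," keeps the commas readable instead of encoding them as %2C
    return EFETCH + "?" + urlencode(params, safe=",")


# 68465419 is the identifier used in the BLink example link; 12345 is a placeholder.
print(fasta_url([68465419, 12345]))
```

Accession numbers can be passed in the same way as GI numbers. Recording the resulting URL in your notes is one simple way to document exactly which sequences you retrieved.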

If you want, for comparison, you can run a multiple alignment with an NCBI-developed MSA tool: COBALT. On the sequence list page, in the right-hand column, in the section Analyze these sequences, click on Align sequences with COBALT. It is a convenient way to get a quick first look at an alignment of NCBI retrieved sequences.

You now have a collection of APSES domain-containing homologues in your organism. There are two more tasks we need to address before we can compute alignments and analyze them. (A) we need to rename our sequences, and (B) we need to define the boundaries of their APSES domains.


(1.2) Renaming Sequences

A phylogenetic tree or multiple alignment is not really informative if it displays GI numbers or other abstract identifiers as labels of rows or nodes. The relationship between species is fundamental to the variation we observe and we need to make this relationship explicit.

Imagine that the rows in an MSA were completely unlabeled, or the nodes in the tree would be just circles: we would have a very hard time relating the computed relationships back to the biology they represent. Abstract identifiers like NP_010227 are not much better.

Typically, the information that programs use to label sequences is taken from the FASTA header. This provides us with an easy way to make sure they display the information we need and that we can interpret. Typically such programs will use the first few (often ten) characters they find. We will therefore design short strings that identify potential gene family relationships as well as species.
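
Editing headers by hand works well for a handful of sequences. For larger sets, a few lines of code do the same job. This is a minimal sketch under assumptions: the function name `relabel_fasta` is made up, and the pairing of an accession with a label in the usage example is purely illustrative.

```python
def relabel_fasta(fasta_text, new_names, width=10):
    """Put a short label at the start of each matching FASTA header.

    new_names maps a substring of the old header (e.g. an accession)
    to the new label. The old header is kept after the label, so no
    information is lost. Labels longer than `width` are rejected,
    because many programs only display the first ten characters.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            old_header = line[1:].strip()
            key = next((k for k in new_names if k in old_header), None)
            if key is not None:
                label = new_names[key]
                if len(label) > width:
                    raise ValueError(f"label {label!r} exceeds {width} characters")
                line = f">{label} {old_header}"
        out.append(line)
    return "\n".join(out) + "\n"


example = ">ref|NP_010227| some description\nMSNQIYSARY\n"
print(relabel_fasta(example, {"NP_010227": "MBP1_SACCE"}))
```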


Species codes

The scientific name of a species is formed according to Linnaean binomial nomenclature and Swissprot has for a long time condensed species names into mnemonic five-character codes, taking the first three from the genus name and the last two from the specific name. For example Saccharomyces cerevisiae is abbreviated as SACCE and Lachancea thermotolerans is LACTH. For the most part, this creates unique strings that are good mnemonic labels for the species. I have added these "codes" to the Species list.
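
The rule is simple enough to express in a few lines. The following sketch (the function name is made up for illustration) applies the three-plus-two rule described above; it does not check whether the resulting code is unique or whether it matches the code actually used by Swissprot for a given species.

```python
def species_code(binomial):
    """Five-character mnemonic: first three letters of the genus name
    plus first two letters of the specific name, in upper case."""
    genus, species = binomial.split()[:2]
    return (genus[:3] + species[:2]).upper()


for name in ("Saccharomyces cerevisiae", "Lachancea thermotolerans"):
    print(name, "->", species_code(name))
```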


Gene families

Most yeast genes have traditional names, like mbp1 or sok2. These names are convenient family labels since Saccharomyces cerevisiae is one of the best studied model organisms. Therefore, once we identify a protein family that includes a yeast gene, we can easily access expert knowledge in textbooks or manuscripts. Of course, such labels are arbitrary - whether we call a gene Mbp1 or WXYZ makes no difference - as long as all genes that we presume to be family members carry the same label. For higher eukaryotes, I would probably choose human gene names as a reference point, for bacteria I would choose E. coli.

To define which gene belongs into which family, we can align all newly found genes with all yeast APSES domain homologues, to find out which ones they are most similar to. This creates common family labels. We can use these as provisional family names for the encoded proteins, even though we may want to revise them once we have mapped out explicit phylogenetic trees.


Identifying APSES domains (general procedure).

In order to identify the APSES domain boundaries, you can simply run a multiple sequence alignment of the structurally defined APSES domain sequence (e.g. taken from PDB-ID 1MB1) against all sequences you have found. The boundaries of the aligned APSES domain then define the domain boundaries in the aligned proteins.
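
Reading boundaries off an alignment means translating alignment columns back into residue numbers. A minimal sketch of that bookkeeping follows; the function name and the toy rows are made up for illustration and are not real APSES sequences.

```python
def domain_boundaries(ref_row, target_row, gap="-"):
    """Map the span of the reference domain onto the target sequence.

    ref_row and target_row are two rows of the same alignment.
    Returns 1-based (start, end) residue numbers in the ungapped target
    that fall within the columns covered by the reference, or None if
    the target has no residues there.
    """
    ref_cols = [i for i, c in enumerate(ref_row) if c != gap]
    if not ref_cols:
        raise ValueError("reference row contains only gaps")
    first, last = ref_cols[0], ref_cols[-1]

    positions = []
    residue_no = 0
    for i, c in enumerate(target_row):
        if c != gap:
            residue_no += 1                  # residue number in the ungapped target
            if first <= i <= last:
                positions.append(residue_no)
    return (positions[0], positions[-1]) if positions else None


# Toy example: the reference domain covers alignment columns 4 to 10.
print(domain_boundaries("---ABCD-EF--", "MKLAB-DQEFGH"))
```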


Identifying family relationships (in the same run)

However, for efficiency, we can also determine family relationships in the same alignment that we use to define domain boundaries, if we simply include all yeast APSES domains in the MSA. Then we can judge similarity simply from examining the guide tree of the alignment and label the families accordingly. This has the added advantage that the domain boundaries are more securely defined, since we include more sequence information into the alignment.

Proceed as follows.
  1. Open the Muscle MSA input page at the EBI.
  2. Access the Yeast APSES domain collection I have prepared and copy the FASTA sequences. Paste them into the sequence field of the MUSCLE program input form.
  3. Copy the FASTA sequences of the full length APSES domain protein sequence collection from your PSI-BLAST search (above) and paste them into the MUSCLE input form as well.
  4. Set the following parameters:
OUTPUT FORMAT: CLUSTALW2
OUTPUT TREE: from second iteration
OUTPUT ORDER: aligned
  5. Click on Submit.


The output should show the MSA. The overlap of the yeast APSES domains with your sequences defines the domain boundaries. Moreover, a tree has been calculated and you can view the tree to identify family relationships.

Visualize the alignment tree and decide on names

Click on the link to the Guide tree. This is the so-called Newick tree format and there are a large number of online tree viewers to visualize such trees. The MUSCLE form will display one tree for you.

You could also navigate (for example) to the proWeb Tree viewer and paste the tree data into the User-supplied Newick Tree input field. Choose any graphics format your browser can handle (JPEG is a pretty safe bet) and click on View tree.
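
If you are curious what a viewer has to read, the Newick format is plain text: nested parentheses for the branching, labels for the leaves, and optional branch lengths after a colon. The sketch below only extracts the leaf labels with a regular expression; it is not a full Newick parser (quoted labels are not handled), and the example tree is invented.

```python
import re


def leaf_labels(newick):
    """Return the leaf labels of a Newick string in their order of appearance.

    A leaf label directly follows '(' or ','. Labels of internal nodes
    follow ')' and are therefore not picked up.
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


tree = "((MBP1_SACCE:0.1,prot_A:0.2):0.05,SOK2_SACCE:0.3);"  # invented example
print(leaf_labels(tree))
```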


  1. Interpret the tree to decide on the protein family names for your sequences:
    1. If a yeast protein is grouped with exactly one of your proteins, your protein gets the same name.
    2. If a yeast protein is grouped with more than one of your proteins, replace the number in the yeast protein with a, b, c ..., from most similar to least similar for your protein. For example: if one Aspergillus fumigatus protein is most similar to yeast Mbp1, you will give it the name MBP1_ASPFU. If two proteins are both most similar to yeast Sok2, you will name them SOKA_ASPFU and SOKB_ASPFU. Try to get it approximately right but remember that this is a process of estimation - we are not accurately measuring distances (yet).
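
The naming rule can be summarized in code. This is a sketch of the convention only: the function name is made up, deciding which proteins group with which yeast protein (and their order of similarity) remains your interpretation of the tree, and the rule of stripping trailing digits is my reading of "replace the number".

```python
import string


def family_names(yeast_name, n_proteins, code):
    """Build labels for the proteins of one species that group with one yeast protein.

    yeast_name : name of the yeast family member, e.g. "Sok2"
    n_proteins : number of your proteins in that group (most similar first)
    code       : five-character species code, e.g. "ASPFU"
    """
    base = yeast_name.upper()
    if n_proteins == 1:
        return [f"{base}_{code}"]
    stem = base.rstrip(string.digits)        # "SOK2" -> "SOK"
    return [f"{stem}{string.ascii_uppercase[i]}_{code}" for i in range(n_proteins)]


print(family_names("Mbp1", 1, "ASPFU"))
print(family_names("Sok2", 2, "ASPFU"))
```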

That done, edit your FASTA headers and save your APSES domain sequence set. We will need them for the next assignment.


(2) Align and Annotate

 


(2.1) Review of domain annotations

APSES domains are relatively easy to identify and annotate but we have had problems with the ankyrin domains in Mbp1 homologues. Both CDD as well as SMART have identified such domains, but while the domain model was based on the same Pfam profile for both, and both annotated approximately the same regions, the details of the alignments and the extent of the predicted region were different.

Mbp1 forms heterodimeric complexes with a homologue, Swi6. Swi6 does not have an APSES domain, thus it does not bind DNA. But it is similar to Mbp1 in the region spanning the ankyrin domains and in 1999 Foord et al. published its crystal structure (1SW6). This structure is a good model for Ankyrin repeats in Mbp1. For details, please refer to the consolidated Mbp1 annotation page I have prepared.

In what follows, we will use the program JALVIEW - a Java based multiple sequence alignment editor to load and align sequences and to consider structural similarity between yeast Mbp1 and its closest homologue in your organism.

In this part of the assignment,

  1. You will load sequences that are most similar to Mbp1 into an MSA editor;
  2. You will add sequences of ankyrin domain models;
  3. You will perform a multiple sequence alignment;
  4. You will try to improve the alignment manually.


(2.2) Jalview, loading sequences

Geoff Barton's lab in Dundee has developed an integrated MSA editor and sequence annotation workbench with a number of very useful functions. It is written in Java and should run on Mac, Linux and Windows platforms without modifications. We will use this tool for this assignment and explore its features as we go along.

  1. Navigate to the Jalview homepage, click on Download, install Jalview on your computer and start it. A number of windows that showcase the program's abilities will load; you can close these.
  2. Prepare homologous Mbp1 sequences for alignment:
    1. Find the sequence in your assigned species that fulfills the Reciprocal Best Match criterion with yeast Mbp1.
    2. Open the Mbp1 RBM reference sequences page.
    3. Copy the FASTA sequences of the reference proteins, return to Jalview and select File → Input Alignment → from Textbox and paste the sequences into the textbox.
    4. Also paste a FASTA sequence of your species' Mbp1 protein into the window.
    5. Finally copy the sequences for ankyrin domain models (below) and paste them into the Jalview textbox as well. Paste two separate copies of the CD00204 consensus sequence and one copy of 1SW6.
    6. When all the sequences are present, click on New Window. Jalview gives you all the sequences, but of course this is not yet an alignment.
Ankyrin domain models
>CD00204 ankyrin repeat consensus sequence from CDD
NARDEDGRTPLHLAASNGHLEVVKLLLENGADVNAKDNDGRTPLHLAAKNGHLEIVKLLL
EKGADVNARDKDGNTPLHLAARNGNLDVVKLLLKHGADVNARDKDGRTPLHLAAKNGHL
>1SW6 from PDB - unstructured loops replaced with xxxx
GPIITFTHDLTSDFLSSPLKIMKALPSPVVNDNEQKMKLEAFLQRLLFxxxxSFDSLLQE
VNDAFPNTQLNLNIPVDEHGNTPLHWLTSIANLELVKHLVKHGSNRLYGDNMGESCLVKA
VKSVNNYDSGTFEALLDYLYPCLILEDSMNRTILHHIIITSGMTGCSAAAKYYLDILMGW
IVKKQNRPIQSGxxxxDSILENLDLKWIIANMLNAQDSNGDTCLNIAARLGNISIVDALL
DYGADPFIANKSGLRPVDFGAG


(2.3) Computing alignments

Sequence alignments can be calculated directly from Jalview.

  1. In Jalview, select Web Service → Alignment → MAFFT Multiple Protein Sequence Alignment. The alignment is calculated in a few minutes and displayed in a new window.
  2. Choose Colour → Hydrophobicity and → by Conservation. Then select Modify Conservation Threshold... and adjust the slider left or right to see which columns are highly conserved. You will notice that the Swi6 sequence that was supposed to align only to the ankyrin domains was in fact aligned to other parts of the sequence as well. This is one part of the MSA that we will have to correct manually and a common problem when aligning sequences of different lengths.
  3. Other alignment algorithms are available and you may wish to explore whether the alignments differ significantly.


(2.4) Editing ankyrin domain alignments


A good MSA comprises only columns of residues that play similar roles in the proteins' mechanism and/or that evolve in a comparable structural context. Since it is a result of biological selection and conservation, it has relatively few indels and the indels it has are usually not placed into elements of secondary structure or into functional motifs. The contiguous features annotated for Mbp1 are expected to be left intact by a good alignment.

A poor MSA has many errors in its columns; these contain residues that actually have different functions or structural roles, even though they may look similar according to a (pairwise!) scoring matrix. A poor MSA also may have introduced indels in biologically irrelevant positions, to maximize spurious sequence similarities. Some of the features annotated for Mbp1 will be disrupted in a poor alignment and residues that are conserved may be placed into different columns.

Often errors or inconsistencies are easy to spot, and manually editing an MSA is not generally frowned upon, even though this is not a strictly objective procedure. The main goal of manual editing is to make an alignment biologically more plausible. Most commonly this means to minimize the number of rare evolutionary events that the alignment suggests and/or to emphasize conservation of known functional motifs. Here are some examples for what one might aim for in manually editing an alignment:

Reduce number of indels
From a Probcons alignment:
0447_DEBHA    ILKTE-K-T---K--SVVK      ILKTE----KTK---SVVK
9978_GIBZE    MLGLN-PGLKEIT--HSIT      MLGLNPGLKEIT---HSIT
1513_CANAL    ILKTE-K-I---K--NVVK      ILKTE----KIK---NVVK
6132_SCHPO    ELDDI-I-ESGDY--ENVD      ELDDI-IESGDY---ENVD
1244_ASPFU    ----N-PGLREIC--HSIT  ->  ----NPGLREIC---HSIT
0925_USTMA    LVKTC-PALDPHI--TKLK      LVKTCPALDPHI---TKLK
2599_ASPTE    VLDAN-PGLREIS--HSIT      VLDANPGLREIS---HSIT
9773_DEBHA    LLESTPKQYHQHI--KRIR      LLESTPKQYHQHI--KRIR
0918_CANAL    LLESTPKEYQQYI--KRIR      LLESTPKEYQQYI--KRIR

Gaps marked in red were moved. The sequence similarity in the alignment does not change considerably, however the total number of indels in this excerpt is reduced to 13 from the original 22.
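
The indel counts quoted here can be reproduced by counting, in each row, the number of uninterrupted runs of gap characters. This is a minimal sketch under one assumption: every run of gaps (including a leading one) is counted as a single indel event. The rows are copied from the excerpt above.

```python
import re

BEFORE = [
    "ILKTE-K-T---K--SVVK",
    "MLGLN-PGLKEIT--HSIT",
    "ILKTE-K-I---K--NVVK",
    "ELDDI-I-ESGDY--ENVD",
    "----N-PGLREIC--HSIT",
    "LVKTC-PALDPHI--TKLK",
    "VLDAN-PGLREIS--HSIT",
    "LLESTPKQYHQHI--KRIR",
    "LLESTPKEYQQYI--KRIR",
]
AFTER = [
    "ILKTE----KTK---SVVK",
    "MLGLNPGLKEIT---HSIT",
    "ILKTE----KIK---NVVK",
    "ELDDI-IESGDY---ENVD",
    "----NPGLREIC---HSIT",
    "LVKTCPALDPHI---TKLK",
    "VLDANPGLREIS---HSIT",
    "LLESTPKQYHQHI--KRIR",
    "LLESTPKEYQQYI--KRIR",
]


def count_indels(rows):
    """Total number of gap runs over all rows of an alignment block."""
    return sum(len(re.findall(r"-+", row)) for row in rows)


print(count_indels(BEFORE), "->", count_indels(AFTER))
```

A count like this is a convenient way to document the effect of your own edits, but a lower number is not automatically better: the edited alignment still has to make biological sense.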


Move indels to more plausible position
From a CLUSTAL alignment:
4966_CANGL     MKHEKVQ------GGYGRFQ---GTW      MKHEKVQ------GGYGRFQ---GTW
1513_CANAL     KIKNVVK------VGSMNLK---GVW      KIKNVVK------VGSMNLK---GVW
6132_SCHPO     VDSKHP-----------QID---GVW  ->  VDSKHPQ-----------ID---GVW
1244_ASPFU     EICHSIT------GGALAAQ---GYW      EICHSIT------GGALAAQ---GYW

The two characters marked in red were swapped. This does not change the number of indels but places the "Q" into a column in which it is more highly conserved (green). Progressive alignments are especially prone to this type of error.

Conserve motifs
From a CLUSTAL alignment:
6166_SCHPO      --DKRVA---GLWVPP      --DKRVA--G-LWVPP
XBP1_SACCE      GGYIKIQ---GTWLPM      GGYIKIQ--G-TWLPM
6355_ASPTE      --DEIAG---NVWISP  ->  ---DEIA--GNVWISP
5262_KLULA      GGYIKIQ---GTWLPY      GGYIKIQ--G-TWLPY

The first of the two residues marked in red is a conserved, solvent exposed hydrophobic residue that may mediate domain interactions. The second residue is the conserved glycine in a beta turn that cannot be mutated without structural disruption. Changing the position of a gap and insertion in one sequence improves the conservation of both motifs.


The Ankyrin domains are quite highly diverged, the boundaries not well defined and not even CDD, SMART and SAS agree on the precise annotations. We expect there to be alignment errors in this region. Nevertheless we would hope that a good alignment would recognize homology in that region and that ideally the required indels would be placed between the secondary structure elements, not in their middle. But from the sequence alignment alone, we cannot judge where the secondary structure elements ought to be. You should therefore add the following "sequence" to the alignment; it contains exactly as many characters as the Swi6 sequence above and annotates the secondary structure elements. I have derived it from the 1SW6 structure.

>SecStruc 1SW6 E: strand   t: turn   H: helix   _: irregular
_EEE__tt___ttt______EE_____t___HHHHHHHHHHHHHHHH_xxxx_HHHHHHH
HHHH_t_____t_____t____HHHHHHH__tHHHHHHHHH____t___tt____HHHHH
HH__HHHH___HHHHHHHHHHHHHEE_t____HHHHHHHHH__t__HHHHHHHHHHHHHH
HHHHHH__EEE_xxxx_HHHHHt_HHHHHHH______t____HHHHHHHH__HHHHHHHH
H____t____t____HHHH___
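
To get an overview of where the elements lie before you start editing, you can collapse the annotation string into runs. This is a minimal Python sketch; `structure_segments` is a made-up helper, and the positions it reports count characters of the annotation string, not residue numbers of the 1SW6 coordinate file.

```python
from itertools import groupby


def structure_segments(annotation):
    """Collapse an annotation string into (symbol, start, end) runs; 1-based, inclusive."""
    segments = []
    position = 1
    for symbol, run in groupby(annotation):
        length = len(list(run))
        segments.append((symbol, position, position + length - 1))
        position += length
    return segments


# Last line of the annotation above. For the full record, join all
# annotation lines (without the ">SecStruc" header) into one string.
tail = "H____t____t____HHHH___"
for symbol, start, end in structure_segments(tail):
    print(symbol, start, end)
```

Comparing `len()` of the joined annotation with the length of the Swi6 sequence is a quick way to confirm that nothing was lost when copying.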


To proceed:

  1. You should manually align the Swi6 sequence with yeast Mbp1
  2. You should bring the Secondary structure annotation into its correct alignment with Swi6
  3. You should bring both CDD ankyrin profiles into the correct alignment with yeast Mbp1

Proceed along the following steps:

  1. Add the secondary structure annotation to the sequence alignment in Jalview. Copy the annotation, select File → Add sequences → from Textbox and paste the sequence.
  2. Select Help → Documentation and read about Editing Alignments, Cursor Mode and Key strokes.
  3. Click on the yeast Mbp1 sequence row to select the entire row. Then use the cursor key to move that sequence directly above the 1SW6 sequence. Select the row of 1SW6 and use shift/mouse to move the sequence elements and realign them with yeast Mbp1. Refer to the alignment given in the Mbp1 annotation page.
  4. Align the secondary structure elements with the 1SW6 sequence: Every character of 1SW6 should be matched with either E, t, H, or _. The result should be similar to the Mbp1 annotation page. If you need to insert gaps into all sequences in the alignment, simply drag your mouse over all row headers - movement of sequences is constrained to selected regions, the rest is locked into place to prevent inadvertent misalignments. Remember to save your project from time to time: File → save so you can reload a previous state if anything goes wrong and can't be fixed with Edit → Undo.
  5. Finally align the two CD00204 consensus sequences to their correct positions (again, refer to the Mbp1 annotation page).
  6. You can now consider the principles stated above and see if you can improve the alignment, for example by moving indels out of regions of secondary structure if that is possible without changing the character of the aligned columns significantly. Select blocks within which to work to leave the remaining alignment unchanged. So that this does not become tedious, you can restrict your editing to one Ankyrin repeat that is structurally defined in Swi6. You may want to open the 1SW6 structure in VMD to define the boundaries of one such repeat. You can copy and paste sections from Jalview into your assignment for documentation or export sections of the alignment to HTML (see the example below).
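
The requirement in step 4, that every character of 1SW6 is matched with an annotation symbol, can be verified outside Jalview once you have copied the two rows into text. The sketch below is an illustration under that assumption; `out_of_register` is a hypothetical helper name, and the two rows are taken from the edited sample further down this page.

```python
def out_of_register(sequence_row, annotation_row, gap="-"):
    """Return 0-based columns where exactly one of the two rows has a gap."""
    if len(sequence_row) != len(annotation_row):
        raise ValueError("rows must have the same aligned length")
    return [col for col, (res, ann) in enumerate(zip(sequence_row, annotation_row))
            if (res == gap) != (ann == gap)]


# 1SW6 and SecStruc rows from the edited sample alignment on this page.
sw6 = "LDLKWIIAN" + "-" * 14 + "MLNAQDSNGDTCLNIAARLGN"
sec = "t_HHHHHHH" + "-" * 14 + "______t____HHHHHHHH__"

# An empty list means every residue has a symbol and every gap has a gap.
print(out_of_register(sw6, sec))
```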


(2.4.1) Editing ankyrin domain alignments - Sample

This sample was created by

  1. Editing the alignments as described above;
  2. Copying a block of aligned sequence;
  3. Pasting it To New Alignment;
  4. Colouring the residues by Hydrophobicity and setting the colour saturation according to Conservation;
  5. Choosing File → Export Image → HTML and pasting the resulting HTML source into this Wikipage.


(Alignment column ruler: 10 | 20 | 30 | 40)
MBP1_USTMA/341-368   - - Y G D Q L - - - A D - - - - - - - - - - I L - - - - N F Q D D E G E T P L T M A A R A R S
MBP1B_SCHCO/470-498   - R E D G D Y - - - K S - - - - - - - - - - F L - - - - D L Q D E H G D T A L N I A A R V G N
MBP1_ASHGO/465-494   F S P Q Y R I - - - E T - - - - - - - - - - L I - - - - N A Q D C K G S T P L H I A A M N R D
MBP1_CLALU/550-586   G N Q N G N S N D K K E - - - - - - - - - - L I S K F L N H Q D N E G N T A F H I A A Y N M S
MBPA_COPCI/514-542   - H E G G D F - - - R S - - - - - - - - - - L V - - - - D L Q D E H G D T A I N I A A R V G N
MBP1_DEBHA/507-550   I R D S Q E I - - - E N K K L S L S D K K E L I A K F I N H Q D I D G N T A F H I V A Y N L N
MBP1A_SCHCO/388-415   - - Y P K E L - - - A D - - - - - - - - - - V L - - - - N F Q D E D G E T A L T M A A R C R S
MBP1_AJECA/374-403   T L P P H Q I - - - S M - - - - - - - - - - L L - - - - S S Q D S N G D T A A L A A A K N G C
MBP1_PARBR/380-409   I L P P H Q I - - - S L - - - - - - - - - - L L - - - - S S Q D S N G D T A A L A A A K N G C
MBP1_NEOFI/363-392   T C S Q D E I - - - D L - - - - - - - - - - L L - - - - S C Q D S N G D T A A L V A A R N G A
MBP1_ASPNI/365-394   T F S P E E V - - - D L - - - - - - - - - - L L - - - - S C Q D S V G D T A V L V A A R N G V
MBP1_UNCRE/377-406   M Y P H H E V - - - G L - - - - - - - - - - L L - - - - A S Q D S N G D T A A L T A A K N G C
MBP1_PENCH/439-468   T C S Q D E I - - - Q M - - - - - - - - - - L L - - - - S C Q D Q N G D T A V L V A A R N G A
MBPA_TRIVE/407-436   V F P R H E I - - - S L - - - - - - - - - - L L - - - - S S Q D A N G D T A A L T A A K N G C
MBP1_PHANO/400-429   T W I P E E V - - - T R - - - - - - - - - - L L - - - - N A Q D Q N G D T A I M I A A R N G A
MBPA_SCLSC/294-313   - - - - - - - - - - - - - - - - - - - - - - - L - - - - D A R D I N G N T A I H I A A K N K A
MBPA_PYRIS/363-392   T W I P E E V - - - T R - - - - - - - - - - L L - - - - N A A D Q N G D T A I M I A A R N G A
MBP1_/361-390   - - - N H S L G V L S Q - - - - - - - - - - F M - - - - D T Q N N E G D T A L H I L A R S G A
MBP1_ASPFL/328-364   T E Q P G E V I T L G R - - - - - - - - - - F I S E I V N L R D D Q G D T A L N L A G R A R S
MBPA_MAGOR/375-404   Q H D P N F V - - - Q Q - - - - - - - - - - L L - - - - D A Q D N D G N T A V H L A A Q R G S
MBP1_CHAGL/361-390   S R S A D E L - - - Q Q - - - - - - - - - - L L - - - - D S Q D N E G N T A V H L A A M R D A
MBP1_PODAN/372-401   V R Q P E E V - - - Q A - - - - - - - - - - L L - - - - D A Q D E E G N T A L H L A A R V N A
MBP1_LACTH/458-487   F S P R Y R I - - - E N - - - - - - - - - - L I - - - - N A Q D Q N G D T A V H L A A Q N G D
MBP1_FILNE/433-460   - - Y P Q E L - - - A D - - - - - - - - - - V I - - - - N F Q D E E G E T A L T I A A R A R S
MBP1_KLULA/477-506   F T P Q Y R I - - - D V - - - - - - - - - - L I - - - - N Q Q D N D G N S P L H Y A A T N K D
MBP1_SCHST/468-501   A K D P D N K - - - K D - - - - - - - - - - L I A K F I N H Q D S D G N T A F H I C S H N L N
MBP1_SACCE/496-525   F S P Q Y R I - - - E L - - - - - - - - - - L L - - - - N T Q D K N G D T A L H I A S K N G D
CD00204/1-19   - - - - - - - - - - - - - - - - - - - - - - - - - - - - N A R D E D G R T P L H L A A S N G H
CD00204/99-118   - - - - - - - - - - - - - - - - - - - - - - - V - - - - N A R D K D G R T P L H L A A K N G H
1SW6/203-232   L D L K W I I - - - A N - - - - - - - - - - M L - - - - N A Q D S N G D T C L N I A A R L G N
SecStruc/203-232   t _ H H H H H - - - H H - - - - - - - - - - _ _ - - - - _ _ _ _ t _ _ _ _ H H H H H H H H _ _
Aligned sequences before editing. The algorithm has placed gaps into the Swi6 helix LKWIIAN, and the four-residue gaps before the block of well-aligned sequence on the right are poorly supported.


(Alignment column ruler: 10 | 20 | 30 | 40)
MBP1_USTMA/341-368   - - Y G D Q L A D - - - - - - - - - - - - - - I L N F Q D D E G E T P L T M A A R A R S
MBP1B_SCHCO/470-498   - R E D G D Y K S - - - - - - - - - - - - - - F L D L Q D E H G D T A L N I A A R V G N
MBP1_ASHGO/465-494   F S P Q Y R I E T - - - - - - - - - - - - - - L I N A Q D C K G S T P L H I A A M N R D
MBP1_CLALU/550-586   G N Q N G N S N D K K E - - - - - - - L I S K F L N H Q D N E G N T A F H I A A Y N M S
MBPA_COPCI/514-542   - H E G G D F R S - - - - - - - - - - - - - - L V D L Q D E H G D T A I N I A A R V G N
MBP1_DEBHA/507-550   I R D S Q E I E N K K L S L S D K K E L I A K F I N H Q D I D G N T A F H I V A Y N L N
MBP1A_SCHCO/388-415   - - Y P K E L A D - - - - - - - - - - - - - - V L N F Q D E D G E T A L T M A A R C R S
MBP1_AJECA/374-403   T L P P H Q I S M - - - - - - - - - - - - - - L L S S Q D S N G D T A A L A A A K N G C
MBP1_PARBR/380-409   I L P P H Q I S L - - - - - - - - - - - - - - L L S S Q D S N G D T A A L A A A K N G C
MBP1_NEOFI/363-392   T C S Q D E I D L - - - - - - - - - - - - - - L L S C Q D S N G D T A A L V A A R N G A
MBP1_ASPNI/365-394   T F S P E E V D L - - - - - - - - - - - - - - L L S C Q D S V G D T A V L V A A R N G V
MBP1_UNCRE/377-406   M Y P H H E V G L - - - - - - - - - - - - - - L L A S Q D S N G D T A A L T A A K N G C
MBP1_PENCH/439-468   T C S Q D E I Q M - - - - - - - - - - - - - - L L S C Q D Q N G D T A V L V A A R N G A
MBPA_TRIVE/407-436   V F P R H E I S L - - - - - - - - - - - - - - L L S S Q D A N G D T A A L T A A K N G C
MBP1_PHANO/400-429   T W I P E E V T R - - - - - - - - - - - - - - L L N A Q D Q N G D T A I M I A A R N G A
MBPA_SCLSC/294-313   - - - - - - - - - - - - - - - - - - - - - - - - L D A R D I N G N T A I H I A A K N K A
MBPA_PYRIS/363-392   T W I P E E V T R - - - - - - - - - - - - - - L L N A A D Q N G D T A I M I A A R N G A
MBP1_/361-390   N H S L G V L S Q - - - - - - - - - - - - - - F M D T Q N N E G D T A L H I L A R S G A
MBP1_ASPFL/328-364   T E Q P G E V I T L G R F I S E - - - - - - - I V N L R D D Q G D T A L N L A G R A R S
MBPA_MAGOR/375-404   Q H D P N F V Q Q - - - - - - - - - - - - - - L L D A Q D N D G N T A V H L A A Q R G S
MBP1_CHAGL/361-390   S R S A D E L Q Q - - - - - - - - - - - - - - L L D S Q D N E G N T A V H L A A M R D A
MBP1_PODAN/372-401   V R Q P E E V Q A - - - - - - - - - - - - - - L L D A Q D E E G N T A L H L A A R V N A
MBP1_LACTH/458-487   F S P R Y R I E N - - - - - - - - - - - - - - L I N A Q D Q N G D T A V H L A A Q N G D
MBP1_FILNE/433-460   - - Y P Q E L A D - - - - - - - - - - - - - - V I N F Q D E E G E T A L T I A A R A R S
MBP1_KLULA/477-506   F T P Q Y R I D V - - - - - - - - - - - - - - L I N Q Q D N D G N S P L H Y A A T N K D
MBP1_SCHST/468-501   A K D P D N K K D - - - - - - - - - - L I A K F I N H Q D S D G N T A F H I C S H N L N
MBP1_SACCE/496-525   F S P Q Y R I E L - - - - - - - - - - - - - - L L N T Q D K N G D T A L H I A S K N G D
CD00204/1-19   - - - - - - - - - - - - - - - - - - - - - - - - - N A R D E D G R T P L H L A A S N G H
CD00204/99-118   - - - - - - - - - - - - - - - - - - - - - - - - V N A R D K D G R T P L H L A A K N G H
1SW6/203-232   L D L K W I I A N - - - - - - - - - - - - - - M L N A Q D S N G D T C L N I A A R L G N
SecStruc/203-232   t _ H H H H H H H - - - - - - - - - - - - - - _ _ _ _ _ _ t _ _ _ _ H H H H H H H H _ _
Aligned sequence after editing. A significant cleanup of the frayed region is possible. Now there is only one insertion event, and it is placed into the loop that connects two helices of the 1SW6 structure.
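
The difference between the two SecStruc rows can also be found programmatically. The Python sketch below flags gap runs in the annotation row that are flanked on both sides by the same helix or strand symbol, i.e. insertions that split an element. It is an illustration only: `insertions_inside_elements` is a made-up name, and the test is deliberately simple. It does not detect deletions, which show up as gaps in other rows, and it does not flag gap runs at the edge of an element.

```python
import re

ELEMENTS = {"H", "E"}  # helix and strand symbols from the SecStruc legend


def insertions_inside_elements(annotation_row):
    """Return (start, end) 0-based column spans of gap runs flanked by the same H or E."""
    hits = []
    for match in re.finditer(r"-+", annotation_row):
        start, end = match.start(), match.end()   # end is exclusive
        if start == 0 or end == len(annotation_row):
            continue                               # leading/trailing gaps have only one flank
        left, right = annotation_row[start - 1], annotation_row[end]
        if left == right and left in ELEMENTS:
            hits.append((start, end - 1))
    return hits


# SecStruc rows of the sample above, before and after editing.
before = "t_HHHHH---HH" + "-" * 10 + "__----____t____HHHHHHHH__"
after = "t_HHHHHHH" + "-" * 14 + "______t____HHHHHHHH__"

print(insertions_inside_elements(before))   # gap runs that split a helix or strand
print(insertions_inside_elements(after))
```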


(2.5) Final analysis


  • Compare the distribution of indels in the ankyrin repeat regions of your alignments. Review whether the indels in this region are concentrated in segments that connect the helices, or if they are more or less evenly distributed along the entire region of similarity. Think about whether the assertion that indels should not be placed in elements of secondary structure has merit in your alignment. Recognize that an indel in an element of secondary structure could be interpreted in a number of different ways:
    • The alignment is correct, the annotation is correct too: the indel is tolerated in that particular case, for example by extending the length of an α-helix or β-strand;
    • The alignment algorithm has made an error, the structural annotation is correct: the indel should be moved a few residues;
    • The alignment is correct, the structural annotation is wrong, this is not a secondary structure element after all;
    • Both the algorithm and the annotation are probably wrong, but we have no data to improve the situation.

(NB: remember that the structural annotations have been made for the yeast protein and might have turned out differently for the other proteins...)

You should be able to analyse discrepancies between annotation and expectation in a structured and systematic way. In particular if you notice indels that have been placed into structurally annotated regions of secondary structure, you should be able to comment on whether the location of the indel has strong support from aligned sequence motifs, or whether the indel could possibly be moved into a different location without much loss in alignment quality.
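
One way to make this comparison systematic is to tabulate, for each sequence, which annotation symbol sits above each of its gap columns. The following Python sketch assumes you have copied the aligned SecStruc row and one sequence row out of Jalview as plain strings; the function name and the toy rows are hypothetical and only show the bookkeeping.

```python
from collections import Counter


def gaps_by_structure(annotation_row, sequence_row, gap="-"):
    """Count gap columns of `sequence_row` by the annotation symbol in the same column."""
    if len(annotation_row) != len(sequence_row):
        raise ValueError("rows must have the same aligned length")
    return Counter(ann for ann, res in zip(annotation_row, sequence_row)
                   if res == gap and ann != gap)


# Hypothetical toy rows, not taken from the Mbp1 alignment.
annotation = "HHHH__tt__EEE"
sequence = "AC-DE--FGHI-K"
print(gaps_by_structure(annotation, sequence))
```

Many gap columns under `H` or `E` relative to `_` and `t` suggest that indels sit inside annotated elements. Whether that reflects an alignment error, an annotation error, or a tolerated indel is the judgment the list above asks you to make.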


Analysis (2 marks)
  • Considering the whole alignment and your experience with editing, please note in your assignment your assessment of whether the position of indels relative to structural features of the ankyrin domains in your organism's Mbp1 protein is reliable.
  • CDD extends the ankyrin domain annotation beyond the 1SW6 domain boundaries. Given your assessment of conservation in that region, do you think that this is reasonable in your organism's protein? Is there evidence for this in the alignment of the CD00204 consensus with well aligned blocks of sequence beyond the positions that match Swi6?


(3) Summary of Resources

 

Links
Lists


Further reading

 

[End of assignment]

 

If you have any questions at all, don't hesitate to mail me at boris.steipe@utoronto.ca or post your question to the Course Mailing List.