Difference between revisions of "BIN-ALI-MSA"

From "A B C"
 
(33 intermediate revisions by the same user not shown)
Line 1: Line 1:
<div id="BIO">
+
<div id="ABC">
  <div class="b1">
+
<div style="padding:5px; border:4px solid #000000; background-color:#b3dbce; font-size:300%; font-weight:400; color: #000000; width:100%;">
Multile Sequence Alignment
+
Multiple Sequence Alignment
  </div>
+
<div style="padding:5px; margin-top:20px; margin-bottom:10px; background-color:#b3dbce; font-size:30%; font-weight:200; color: #000000; ">
 +
(Multiple sequence alignment)
 +
</div>
 +
</div>
  
  {{Vspace}}
+
{{Smallvspace}}
 
+
 
<div class="keywords">
+
 
<b>Keywords:</b>&nbsp;
+
<div style="padding:5px; border:1px solid #000000; background-color:#b3dbce33; font-size:85%;">
Multiple sequence alignment
+
<div style="font-size:118%;">
 +
<b>Abstract:</b><br />
 +
<section begin=abstract />
 +
A carefully produced multiple sequence alignment is an indispensable, extraordinarily valuable asset for the analysis of sequence features. Fully automated methods are regularly inferior to knowledgeable manual curation of alignments. In this unit we will discuss the concepts, practice producing MSAs online and in R, and analyze, write and display alignments. The goal is to empower you to produce the best alignments possible.
 +
<section end=abstract />
 +
</div>
 +
<!-- ============================  -->
 +
<hr>
 +
<table>
 +
<tr>
 +
<td style="padding:10px;">
 +
<b>Objectives:</b><br />
 +
This unit will ...
 +
* ... introduce the benefits of multiple sequence alignments (MSA), the objective functions they pursue, algorithms and methods, practical considerations, and the analysis of alignments;
 +
* ... demonstrate Web services that calculate MSAs;
 +
* ... teach how to compute and analyze MSAs in R.
 +
</td>
 +
<td style="padding:10px;">
 +
<b>Outcomes:</b><br />
 +
After working through this unit you ...
 +
* ... can critically assess available options for producing Multiple Sequence Alignments;
 +
* ... are familiar with online and R programming tools to produce alignments;
 +
* ... have aligned the full length sequence of the MYSPE Mbp1 orthologue to a selected set of reference sequences.
 +
</td>
 +
</tr>
 +
</table>
 +
<!-- ============================  -->
 +
<hr>
 +
<b>Deliverables:</b><br />
 +
<section begin=deliverables />
 +
<li><b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.</li>
 +
<li><b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.</li>
 +
<li><b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].</li>
 +
<section end=deliverables />
 +
<!-- ============================  -->
 +
<hr>
 +
<section begin=prerequisites />
 +
<b>Prerequisites:</b><br />
 +
This unit builds on material covered in the following prerequisite units:<br />
 +
*[[BIN-ALI-PSI-BLAST|BIN-ALI-PSI-BLAST (PSI-BLAST)]]
 +
*[[FND-STA-Information_theory|FND-STA-Information_theory (Concepts of Information Theory)]]
 +
<section end=prerequisites />
 +
<!-- ============================  -->
 
</div>
 
</div>
  
{{Vspace}}
+
{{Smallvspace}}
  
  
__TOC__
 
  
{{Vspace}}
+
{{Smallvspace}}
  
  
{{DEV}}
+
__TOC__
  
 
{{Vspace}}
 
{{Vspace}}
  
  
</div>
+
=== Evaluation ===
<div id="ABC-unit-framework">
+
This learning unit can be evaluated for a maximum of 5 marks. There are several options for submission. Choose one option, then ...
== Abstract ==
+
<ol>
<!-- included from "../components/BIN-ALI-MSA.components.wtxt", section: "abstract" -->
+
<li>Create a new page on the student Wiki as a subpage of your User Page.</li>
...
+
<li>Put all of your writing to submit on this one page.</li>
 +
<li>When you are done with everything, go to the [https://q.utoronto.ca/courses/180416/assignments Quercus '''Assignments''' page] and open the first Learning Unit that you have not submitted yet. Paste the URL of your Wiki page into the form, and click on '''Submit Assignment'''.</li>
 +
</ol>
  
{{Vspace}}
+
Your link can be submitted only once and cannot be edited, but you may change your Wiki page at any time. However, only the last version before the due date will be marked; all later edits will be silently ignored.
  
 +
{{Smallvspace}}
  
== This unit ... ==
+
; Short Report option
=== Prerequisites ===
+
:'''1.''' Create a new page on the student Wiki as a subpage of your User Page.
<!-- included from "../components/BIN-ALI-MSA.components.wtxt", section: "prerequisites" -->
+
:'''2.''' Write a short report on '''one of the five following topics - A, B, C, D, or E'''. (All reports must have the R code you wrote in an appendix.)
<!-- included from "ABC-unit_components.wtxt", section: "notes-prerequisites" -->
 
You need to complete the following units before beginning this one:
 
*[[BIN-ALI-Optimal_sequence_alignment]]
 
*[[BIN-ALI-PSI-BLAST]]
 
*[[FND-STA-Information_theory]]
 
  
{{Vspace}}
+
::'''A - Publication quality plot'''
 +
:::'''A.1''' Create a publication quality figure and figure caption of an MSA of Mbp1 orthologue sequences '''including MYSPE''', that covers the APSES domain only. Produce this as a single page PDF using the <tt>msa::</tt> package <code>msa::msaPrettyPrint()</code> function, and upload to the Student Wiki.
 +
:::'''A.2''' In your report, document the procedure and discuss how you have chosen the color parameters to illustrate interesting points about the domain.
  
 +
::'''B - Algorithm Comparison: MAFFT'''
 +
:::'''B.1''' At the EBI, produce an MSA of the full-length Mbp1 orthologues of the reference species plus MYSPE, using the MAFFT algorithm, a good general-purpose MSA algorithm.
 +
:::'''B.2''' Import the alignment to R and evaluate its quality relative to the MUSCLE alignment with default parameters. Report your findings.
  
=== Objectives ===
+
::'''C - Algorithm Comparison: WebPRANK'''
<!-- included from "../components/BIN-ALI-MSA.components.wtxt", section: "objectives" -->
+
:::'''C.1''' At the EBI, produce an MSA of the Mbp1 orthologues of the reference species plus MYSPE, using the WebPRANK algorithm, which has an interesting approach to defining indels from computed phylogenetic relationships.
...
+
:::'''C.2''' Import the alignment to R and evaluate its quality relative to the MUSCLE alignment with default parameters. Report your findings.
  
{{Vspace}}
+
::'''D - Algorithm Comparison: PRALINE'''
 +
:::'''D.1''' PRALINE reportedly produces some of the best alignments due to its (slow) PSI-BLAST profile pre-processing step, which pulls in additional homologues to increase the information that goes into the alignment. Access the [http://www.ibi.vu.nl/programs/pralinewww/ '''PRALINE Web Server'''] and produce a high-quality MSA of the Mbp1 orthologues of the reference species plus MYSPE.
 +
:::'''D.2''' Import the alignment to R and evaluate its quality relative to the MUSCLE alignment with default parameters. Report your findings.
  
 +
::'''E - Algorithm Parameters: MUSCLE'''
 +
:::'''E.1''' MUSCLE has a large number of additional parameters to tweak alignments. Discuss their use, and try different variations on the MSA of the Mbp1 orthologues of the reference species plus MYSPE<ref>A good example of how systematic tweaking of parameters can improve alignments is here: {{#pmid:27376004}}</ref>.
 +
:::'''E.2''' Report on the results of your experiments.
  
=== Outcomes ===
+
:'''3.''' When you are done, submit the link to your page via Quercus as described above.
<!-- included from "../components/BIN-ALI-MSA.components.wtxt", section: "outcomes" -->
 
...
 
  
{{Vspace}}
+
{{Smallvspace}}
  
 +
<!--
 +
; Tasks submission option
 +
:# Create a new page on the student Wiki as a subpage of your User Page.
 +
:# There are a number of tasks in which you are explicitly asked to submit code or other text for credit. Put all of these submissions on this one page.
 +
:# When you are done with everything, add the following category tag '''to the end of page''':
 +
::<code><nowiki>[[Category:EVAL-BIN-ALI-MSA]]</nowiki></code>.
  
=== Deliverables ===
+
Once the page has been saved with this tag, it is considered "submitted".
<!-- included from "../components/BIN-ALI-MSA.components.wtxt", section: "deliverables" -->
+
'''Do not''' change your submission after this tag has been added. The page will be marked and the category tag will be removed by the instructor.
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-time_management" -->
+
-->
*<b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-journal" -->
 
*<b>Journal</b>: Document your progress in your [[FND-Journal|course journal]].
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-insights" -->
 
*<b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|insights! page]].
 
  
{{Vspace}}
+
<!--
 +
; Quiz option
 +
: Open the [http://steipe.biochemistry.utoronto.ca/abc/students/index.php/Signup-BIN-ALI-MSA_Quiz  '''signup-page for the quiz for this unit (linked from here)'''] and add your name. Your name must be signed up by 12:00 of the day of the Quiz to ensure copies of the quiz are available for all participants.
 +
<div style="margin-left: 2rem;">Quizzes will be written in class, back-to-back if there is more than one quiz scheduled. We may begin at any time. We will have an open-ended Q&A session before the quiz. You can't take the quiz if you are not present in class when the question sheets are handed out, so don't be late. Once all scheduled quizzes are written, we will discuss and mark them. You will mark your own quiz. All marking must be done with a red pen - so you '''must''' bring a red pen to class in order to participate. The mark you give yourself may be revised by the instructor after spot-checking quizzes. If this is necessary, you will be notified. You must mark your quiz correctly and honestly - don't get into trouble with academic integrity rules: it will be an academic offence if you mark questions as correct that were discussed in class and should have been marked incorrect. When in doubt, ask.</div>
 +
-->
  
 +
{{Smallvspace}}
  
=== Evaluation ===
+
; R-code option
<!-- included from "../components/BIN-ALI-MSA.components.wtxt", section: "evaluation" -->
+
:Alignments can get very long, so it would be great to have an overview plot of the full-length alignment in one image. Your task is to write a function for that.
<!-- included from "ABC-unit_components.wtxt", section: "eval-none" -->
+
:Submit code according to the following requirements. Make sure your code is documented and that you have tested your functions to be correct.
<b>Evaluation: NA</b><br />
+
:*Write a function that takes an MsaAAMultipleAlignment object as input and produces a plot of the entire alignment. Sections of gaps shall be shown as continuous lines (<code>segments()</code>). Aligned residues shall be shown as rectangles (<code>rect()</code>). Provide an option to define line colors (e.g. default: "lightgrey"). Provide an option to define fill colors for residue rectangles (e.g. default: "skyblue"). Provide an option to color alignment columns with a color gradient according to the alignment score instead. Here is some code for inspiration of how to work with a color palette:
:This unit is not evaluated for course marks.
+
<source lang="R">
 
+
# v is the vector of moving-average scores of msaMScores
{{Vspace}}
+
lev <- cut(v, labels = FALSE, breaks = 10)
 +
myPal <- colorRampPalette(c("#e8e8e8", "#d6d6d6","#c4c4c4", "#b2b2b2",
 +
                            "#f4a582", "#d6604d", "#b2182b"))
 +
myCol <- myPal(max(lev))
  
 +
barplot(msaMScores, col=myCol[lev], border = NA)
 +
</source>
 +
:*Create a new page on the student Wiki as a subpage of your User Page. Put your documented code and instructions there.
 +
:* When you are done, submit the link to your page via Quercus as described above.
 +
{{Smallvspace}}
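The palette snippet above assumes that <code>v</code> and <code>msaMScores</code> already exist in your session. Here is a minimal, self-contained sketch of the same idea that you can run as is. Note the assumptions: the scores are random numbers standing in for real per-column alignment scores, and <code>movingAverage()</code> is a hypothetical helper written just for this illustration, not a function of the <tt>msa</tt> package.

<source lang="R">
# ASSUMPTION: synthetic scores stand in for real per-column alignment scores.
set.seed(112358)
scores <- runif(60)

# Hypothetical helper: centred moving average with windows truncated at the ends,
# so that the result has no NA values.
movingAverage <- function(x, w = 5) {
  half <- w %/% 2
  vapply(seq_along(x),
         function(i) mean(x[max(1, i - half):min(length(x), i + half)]),
         numeric(1))
}

v <- movingAverage(scores)

# Bin the smoothed scores into 10 levels and map each level to a colour.
lev   <- cut(v, breaks = 10, labels = FALSE)
myPal <- colorRampPalette(c("#e8e8e8", "#d6d6d6", "#c4c4c4", "#b2b2b2",
                            "#f4a582", "#d6604d", "#b2182b"))
myCol <- myPal(10)

barplot(v, col = myCol[lev], border = NA)
</source>

Since <code>cut()</code> with <code>breaks = 10</code> divides the range of <code>v</code> into ten equal-width bins, <code>lev</code> holds integers from 1 to 10, and these index directly into the ten colours of the palette.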
  
</div>
+
;Option to write a "Self-Evaluation Question"
<div id="BIO">
+
: You can submit a "Self-Evaluation Question" for at most '''one''' of your assignments.
 +
:Write a "Self-evaluation Question" (with a model solution) that explores the '''interpretation''' of an MSA. The goal is for the learner to think about the biological interpretation of a multiple sequence alignment. Questions that I find interesting often explain the context of a biological fact (e.g. a phosphorylation site, a ligand binding site, a domain boundary, a frameshift mutation etc.), then ask to interpret an MSA as to how it represents information about the fact. Apply the [[ABC-Rubrics| '''marking rubrics''']] in spirit to satisfy yourself of the quality of your question. Use the format and code templates that you find on the [[Self_evaluation_questions|'''Self evaluation questions page''']] - but don't assume those examples are already models of excellent contributions. This will be a short-answer format question. Note: assume that approximately the same amount of work is expected for all evaluation options. Consequently, the standard of excellence for this option will be quite high.
 +
:* Create a new page on the student Wiki as a subpage of your User Page. Develop your question there.
 +
:* When you are done, submit the link to your page via Quercus as described above.
 
== Contents ==
 
== Contents ==
<!-- included from "../components/BIN-ALI-MSA.components.wtxt", section: "contents" -->
 
  
  
Line 92: Line 158:
  
  
Multiple sequence alignments ('''MSAs''') are further useful to resolve ambiguities in the precise placement of  "indels"<ref>"indel": '''in'''sertion / '''del'''etion – a difference in sequence length between two aligned sequences that is accommodated by gaps in the alignment. Since we can't tell from the comparison of two sequences whether such a change was introduced by ''insertion into'' or ''deletion from'' the ancestral sequence, we join both into a {{WP|Portmanteau|''portmanteau''}}.</ref> and to ensure that columns in alignments actually contain amino acids that evolve in a similar context. MSAs serve as input for
+
Multiple sequence alignments ('''MSAs''') are enormously useful to resolve ambiguities in the precise placement of  "indels"<ref>"indel": '''in'''sertion / '''del'''etion – a difference in sequence length between two aligned sequences that is accommodated by gaps in the alignment. Since we can't tell from the comparison of two sequences whether such a change was introduced by ''insertion into'' or ''deletion from'' the ancestral sequence, we join both into a {{WP|Portmanteau|''portmanteau''}}.</ref> and to ensure that columns in alignments actually contain amino acids that evolve in a similar context. MSAs serve as input for
 
* functional annotation;
 
* functional annotation;
 
* protein homology modelling;
 
* protein homology modelling;
* phylogenetic analyses, and
+
* phylogenetic analyses;
* sensitive homology searches in databases.
+
* sensitive homology searches in databases;
 +
* and more.
  
  
Line 109: Line 176:
 
*selection bias may weight our results toward sequences that are over-represented and do not provide a fair representation of evolutionary divergence.
 
*selection bias may weight our results toward sequences that are over-represented and do not provide a fair representation of evolutionary divergence.
  
&nbsp;<br>
+
{{Vspace}}
  
 +
===MSA's on the web at the EBI===
  
 
{{Vspace}}
 
{{Vspace}}
  
===Computing an MSA in R===
+
The EBI hosts a number of excellent MSA programs on their Website. Let's perform an MSA of full length MBP1 orthologues:
 +
 
 +
 
 +
{{task|1=
 +
 
 +
* Navigate to the [https://www.ncbi.nlm.nih.gov/protein/ NCBI protein database] and paste the MBP1 protein RefSeq IDs from our database into the search form:
 +
NP_010227 NP_593032 XP_660758 XP_007682304 XP_955821 XP_001837394
 +
XP_569090 XP_003327086 XP_011392621 XP_006957051
 +
(add your MBP1_MYSPE RefSeq ID too!)
 +
 
 +
* This will give you a page with links to the retrieved sequences. Click on '''Summary''' and choose FASTA(text) as the '''Format''' to retrieve all sequences at once as a multi-FASTA formatted page (this is useful, remember it!)
 +
* Open another browser window and navigate to the [https://www.ebi.ac.uk/Tools/msa/ '''EBI MSA tools'''] page.
 +
* Click on '''Launch T-coffee'''.
 +
* Copy the FASTA sequences from the NCBI page, and paste them into the form at the EBI's T-Coffee page. Click '''Submit'''.
 +
* The result should show you the aligned sequences, with three blocks of high similarity:
 +
** The most N-terminal block is the APSES domain - the main DNA binding domain of these transcription factors.
 +
** In the middle, we have Ankyrin domains: these are protein-protein interaction modules that Mbp1 uses to recruit other proteins to the bound complex.
 +
** At the end, there is one additional, shorter segment of high similarity.
 +
 
 +
* Explore the tabs that are available, in particular note that you can save the result to a file.
 +
* Click on the '''Download Alignment File''' tab to load the alignment as text into a browser window. Then save the file into your project directory with a filename of <code>msaT.aln</code>. (<code>.aln</code> is the standard extension for CLUSTAL formatted alignment files, so it helps if we give the file that extension. Of course you know better than to '''rely''' on an extension to signal the filetype and format.)
 +
 
 +
}}
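
Once <code>msaT.aln</code> is saved, you may want to look at it from R. The following is a minimal sketch in base R of how the CLUSTAL text layout can be parsed. It assumes the usual layout: a header line that begins with <code>CLUSTAL</code>, followed by blocks of lines that each hold an identifier and a sequence segment, and conservation lines that start with blanks. <code>parseClustal()</code> is a hypothetical helper written for this illustration, and the toy input is made up.

<source lang="R">
# Hypothetical helper: parse CLUSTAL formatted text lines into a named
# character vector of aligned sequences.
parseClustal <- function(lines) {
  # Don't rely on the file extension: check the header line instead.
  if (length(lines) == 0 || !grepl("^CLUSTAL", lines[1])) {
    stop("Input does not start with a CLUSTAL header line.")
  }
  body <- lines[-1]
  # Keep only sequence lines (identifier, whitespace, segment). Blank lines
  # and conservation lines, which begin with blanks, are dropped.
  body   <- body[grepl("^\\S+\\s+\\S+", body)]
  fields <- strsplit(body, "\\s+")
  ids    <- vapply(fields, `[`, character(1), 1)
  segs   <- vapply(fields, `[`, character(1), 2)
  # Concatenate the segments of each sequence, keeping first-seen order.
  vapply(split(segs, factor(ids, levels = unique(ids))),
         paste, character(1), collapse = "")
}

# Toy input (made up), two sequences in two blocks.
toy <- c("CLUSTAL W (1.83) multiple sequence alignment",
         "",
         "seqA            MKV-LT",
         "seqB            MKVALT",
         "                *** **",
         "",
         "seqA            GG--W",
         "seqB            GGHHW",
         "                **  *")

aln <- parseClustal(toy)
print(aln)
</source>

For your own file, something like <code>parseClustal(readLines("msaT.aln"))</code> should work under the same assumptions. The Bioconductor <tt>Biostrings</tt> package also provides <code>readAAMultipleAlignment()</code> with a <code>format = "clustal"</code> option, which is the more robust choice for real work.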
  
{{Vspace}}
 
  
  
Let's use the Bioconductor msa package to align the sequences we have. Study and run the following code
 
  
 +
===MSA's in R===
  
{{task|1 =
+
{{Vspace}}
  
* Return to your RStudio session.
+
Let's move to our RStudio project to explore producing and analyzing multiple sequence alignments in R.
* Make sure you have saved <code>myDB</code> as instructed previously.
 
* Bring code and data resources up to date:
 
** '''pull''' the most recent version of the project from GitHub
 
** type <code>init()</code> to load the most recent files and functions
 
** re-merge your current <code>myDB</code>
 
* Study and work through the code in the <code>Multiple sequence alignments</code> section of the <code>BCH441_A04.R</code> script.
 
* Note that the final task asks you to print out some results and bring them to class for the next quiz.
 
  
}}
+
{{Smallvspace}}
 +
 
 +
{{ABC-unit|BIN-ALI-MSA.R}}
  
 
{{Vspace}}
 
{{Vspace}}
Line 143: Line 227:
 
Really excellent software tools have been written that help you visualize and manually curate multiple sequence alignments. If anything, I think they tend to do too much. Past versions of the course have used Jalview, but I have heard good things of AliView <small>(and if you are on a Mac [https://github.com/4ment/seqotron seqotron] might interest you, but I only cover software that is free and runs on all three major platforms)</small>.
 
Really excellent software tools have been written that help you visualize and manually curate multiple sequence alignments. If anything, I think they tend to do too much. Past versions of the course have used Jalview, but I have heard good things of AliView <small>(and if you are on a Mac [https://github.com/4ment/seqotron seqotron] might interest you, but I only cover software that is free and runs on all three major platforms)</small>.
  
Right now, I am just mentioning the two alignment editors. If you have experience with comparing them, let us know.
+
Here, I am just mentioning the two alignment editors and encourage you to explore and use them. If you have experience with comparing them, let us know.
  
 
* [[http://www.jalview.org/ '''Jalview''']] an integrated MSA editor and sequence annotation workbench from the Barton lab in Dundee. Lots of functions.
 
* [[http://www.jalview.org/ '''Jalview''']] an integrated MSA editor and sequence annotation workbench from the Barton lab in Dundee. Lots of functions.
 
* [[http://www.ormbunkar.se/aliview/ '''AliView''']] from Uppsala: fast, lean, looks to be very practical.
 
* [[http://www.ormbunkar.se/aliview/ '''AliView''']] from Uppsala: fast, lean, looks to be very practical.
  
 +
However: we should spend a moment considering the kind of improvements '''manual editing''' of alignments can aim for.
 +
 +
{{Vspace}}
 +
 +
====Alignment Editing====
 +
 +
 +
A '''good''' MSA comprises only columns of residues that play similar roles in the proteins' mechanism and/or that evolve in a comparable structural context. Since the alignment reflects the result of biological selection and conservation, it has relatively few indels and the indels it has are usually not placed into elements of secondary structure or into functional motifs. For example, the contiguous features annotated for Mbp1 are expected to be left intact by a good alignment.
 +
 +
A '''poor''' MSA has many errors in its columns; these contain residues that actually have different functions or structural roles, even though they may look similar according to a (pairwise!) scoring matrix. A poor MSA also may have introduced indels in biologically irrelevant positions, to maximize spurious sequence similarities. Some of the features annotated for Mbp1 will be disrupted in a poor alignment and residues that are conserved may be placed into different columns.
 +
 +
Often errors or inconsistencies are easy to spot. The main goal of manual editing is to make an alignment biologically more plausible. Most commonly this means to minimize the number of rare evolutionary events that the alignment suggests and/or to emphasize conservation of known functional motifs. Here are some examples:
 +
 +
;Reduce number of indels
 +
From a Probcons alignment:
 +
0447_DEBHA    ILKTE-K<span style="color: rgb(255, 0, 0);">-</span>T<span style="color: rgb(255, 0, 0);">---</span>K--SVVK      ILKTE----KTK---SVVK
 +
9978_GIBZE    MLGLN<span style="color: rgb(255, 0, 0);">-</span>PGLKEIT--HSIT      MLGLNPGLKEIT---HSIT
 +
1513_CANAL    ILKTE-K<span style="color: rgb(255, 0, 0);">-</span>I<span style="color: rgb(255, 0, 0);">---</span>K--NVVK      ILKTE----KIK---NVVK
 +
6132_SCHPO    ELDDI-I<span style="color: rgb(255, 0, 0);">-</span>ESGDY--ENVD      ELDDI-IESGDY---ENVD
 +
1244_ASPFU    ----N<span style="color: rgb(255, 0, 0);">-</span>PGLREIC--HSIT  -&gt;  ----NPGLREIC---HSIT
 +
0925_USTMA    LVKTC<span style="color: rgb(255, 0, 0);">-</span>PALDPHI--TKLK      LVKTCPALDPHI---TKLK
 +
2599_ASPTE    VLDAN<span style="color: rgb(255, 0, 0);">-</span>PGLREIS--HSIT      VLDANPGLREIS---HSIT
 +
9773_DEBHA    LLESTPKQYHQHI--KRIR      LLESTPKQYHQHI--KRIR
 +
0918_CANAL    LLESTPKEYQQYI--KRIR      LLESTPKEYQQYI--KRIR
 +
 +
<small>Gaps marked in red were moved. The sequence similarity in the alignment does not change considerably; however, the total number of indels in this excerpt is reduced from the original 22 to 13.</small>
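
The indel counts quoted for this excerpt can be reproduced by counting every contiguous run of gap characters in each row, including leading runs. The sketch below does this in base R; the sequences are copied from the excerpt with the colour markup removed, and <code>countGapRuns()</code> is a hypothetical helper written for this illustration.

<source lang="R">
# Hypothetical helper: count contiguous runs of "-" over all rows.
countGapRuns <- function(aln) {
  sum(lengths(regmatches(aln, gregexpr("-+", aln))))
}

before <- c("ILKTE-K-T---K--SVVK",   # 0447_DEBHA
            "MLGLN-PGLKEIT--HSIT",   # 9978_GIBZE
            "ILKTE-K-I---K--NVVK",   # 1513_CANAL
            "ELDDI-I-ESGDY--ENVD",   # 6132_SCHPO
            "----N-PGLREIC--HSIT",   # 1244_ASPFU
            "LVKTC-PALDPHI--TKLK",   # 0925_USTMA
            "VLDAN-PGLREIS--HSIT",   # 2599_ASPTE
            "LLESTPKQYHQHI--KRIR",   # 9773_DEBHA
            "LLESTPKEYQQYI--KRIR")   # 0918_CANAL

after  <- c("ILKTE----KTK---SVVK",
            "MLGLNPGLKEIT---HSIT",
            "ILKTE----KIK---NVVK",
            "ELDDI-IESGDY---ENVD",
            "----NPGLREIC---HSIT",
            "LVKTCPALDPHI---TKLK",
            "VLDANPGLREIS---HSIT",
            "LLESTPKQYHQHI--KRIR",
            "LLESTPKEYQQYI--KRIR")

countGapRuns(before)   # 22
countGapRuns(after)    # 13

# Editing only moves gaps: the ungapped sequences must be unchanged.
identical(gsub("-", "", before), gsub("-", "", after))   # TRUE
</source>

The last line is a useful sanity check after any manual edit: if removing the gaps does not give back the original sequences, a residue was accidentally changed or lost.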
 +
 +
 +
;Move indels to more plausible position
 +
From a CLUSTAL alignment:
 +
4966_CANGL    MKHEKVQ------GGYGRFQ---GTW      MKHEKV<span style="color: rgb(0, 170, 0);">Q</span>------GGYGRFQ---GTW
 +
1513_CANAL    KIKNVVK------VGSMNLK---GVW      KIKNVV<span style="color: rgb(0, 170, 0);">K</span>------VGSMNLK---GVW
 +
6132_SCHPO    VDSKHP<span style="color: rgb(255, 0, 0);">-</span>----------<span style="color: rgb(255, 0, 0);">Q</span>ID---GVW  -&gt;  VDSKHP<span style="color: rgb(0, 170, 0);">Q</span>-----------ID---GVW
 +
1244_ASPFU    EICHSIT------GGALAAQ---GYW      EICHSI<span style="color: rgb(0, 170, 0);">T</span>------GGALAAQ---GYW
 +
 +
<small>The two characters marked in red were swapped. This does not change the number of indels but places the "Q" into a column in which it is more highly conserved (green). Progressive alignments are especially prone to this type of error.</small>
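
To see what this edit does to the columns, we can turn the excerpt into a character matrix and compare the column that holds the "Q" of 6132_SCHPO before and after the swap. This is a sketch in base R; the sequences are copied from the excerpt with the colour markup removed, and <code>alnMatrix()</code> and <code>countGapRuns()</code> are hypothetical helpers written for this illustration.

<source lang="R">
# Hypothetical helpers for this illustration.
alnMatrix    <- function(aln) do.call(rbind, strsplit(aln, ""))
countGapRuns <- function(aln) sum(lengths(regmatches(aln, gregexpr("-+", aln))))

before <- c("MKHEKVQ------GGYGRFQ---GTW",   # 4966_CANGL
            "KIKNVVK------VGSMNLK---GVW",   # 1513_CANAL
            "VDSKHP-----------QID---GVW",   # 6132_SCHPO
            "EICHSIT------GGALAAQ---GYW")   # 1244_ASPFU

after    <- before
after[3] <- "VDSKHPQ-----------ID---GVW"    # "Q" swapped with the first gap

mBefore <- alnMatrix(before)
mAfter  <- alnMatrix(after)

# Column of the SCHPO "Q" before and after the edit.
qBefore <- which(mBefore[3, ] == "Q")
qAfter  <- which(mAfter[3, ]  == "Q")

mBefore[, qBefore]   # the "Q" is the only one in its column
mAfter[, qAfter]     # now the "Q" shares its column with another "Q"

# The number of gap runs is the same in both versions.
countGapRuns(before) == countGapRuns(after)
</source>

Counting identical residues in a single column is of course a crude measure of conservation; it only serves to make the effect of the edit visible.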
 +
 +
;Conserve motifs
 +
From a CLUSTAL W alignment:
 +
6166_SCHPO      --DKR<span style="color: rgb(255, 0, 0);">V</span>A---<span style="color: rgb(255, 0, 0);">G</span>LWVPP      --DKR<span style="color: rgb(0, 255, 0);">V</span>A--<span style="color: rgb(0, 255, 0);">G</span>-LWVPP
 +
XBP1_SACCE      GGYIK<span style="color: rgb(255, 0, 0);">I</span>Q---<span style="color: rgb(255, 0, 0);">G</span>TWLPM      GGYIK<span style="color: rgb(0, 255, 0);">I</span>Q--<span style="color: rgb(0, 255, 0);">G</span>-TWLPM
 +
6355_ASPTE      --DE<span style="color: rgb(255, 0, 0);">I</span>A<span style="color: rgb(255, 0, 0);">G</span>---NVWISP  -&gt;  ---DE<span style="color: rgb(0, 255, 0);">I</span>A--<span style="color: rgb(0, 255, 0);">G</span>NVWISP
 +
5262_KLULA      GGYIK<span style="color: rgb(255, 0, 0);">I</span>Q---<span style="color: rgb(255, 0, 0);">G</span>TWLPY      GGYIK<span style="color: rgb(0, 255, 0);">I</span>Q--<span style="color: rgb(0, 255, 0);">G</span>-TWLPY
 +
 +
<small>The first of the two residues marked in red is a conserved, solvent exposed hydrophobic residue that may mediate domain interactions. The second residue is the conserved glycine in a beta turn that cannot be mutated without structural disruption. Changing the position of a gap and insertion in one sequence improves the conservation of both motifs.</small>
  
 
{{Vspace}}
 
{{Vspace}}
  
 +
;An example of alignment editing for ankyrin domains.
 +
The example below came from alignment editing in JALVIEW. Columns were coloured by hydrophobicity, and the examples were exported to HTML and then pasted into the page source. Note that the bottom row of the alignment contains a manually added sequence that represents secondary structure elements that were determined by X-ray crystallography of the Swi6 ankyrin domain.
 +
 +
<table border="1"><tr><td>
 +
<table border="0" cellpadding="0" cellspacing="0">
 +
 +
<tr><td colspan="6"></td>
 +
<td colspan="9">10<br>|</td><td></td>
 +
<td colspan="9">20<br>|</td><td></td>
 +
<td colspan="9">30<br>|</td><td></td>
 +
<td colspan="3"></td><td colspan="3">40<br>|</td>
 +
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_USTMA/341-368&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f3eef9">Y</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#ffd8d8">I</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#fbeef1">F</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">E</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#d3c2ee">P</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#ccaddf">T</td>
 +
<td bgcolor="#ecc2d5">M</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
 +
<td bgcolor="#adadff">R</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1B_SCHCO/470-498&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#f3eef9">Y</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeeff">K</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td bgcolor="#f7d8e0">F</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">E</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
 +
<td bgcolor="#b0adfa">N</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">R</td>
 +
<td bgcolor="#fcc2c4">V</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
</tr>
 +
 +
<tr><td nowrap="nowrap">MBP1_ASHGO/465-494&nbsp;&nbsp;</td>
 +
<td>F</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#f3eef9">Y</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#f4eef8">T</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#ffd8d8">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#efc2d0">C</td>
 +
<td bgcolor="#eeeeff">K</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
 +
<td bgcolor="#e6d8f0">S</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#d3c2ee">P</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e5adc6">M</td>
 +
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_CLALU/550-586&nbsp;&nbsp;</td>
 +
<td>G</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td>N</td>
 +
<td>D</td>
 +
<td>K</td>
 +
<td bgcolor="#eeeeff">K</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#ffd8d8">I</td>
 +
<td>S</td>
 +
<td>K</td>
 +
<td>F</td>
 +
<td>L</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#edadbd">F</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#c6ade5">Y</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#f9eef3">M</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
</tr>
 +
 +
<tr><td nowrap="nowrap">MBPA_COPCI/514-542&nbsp;&nbsp;</td>
 +
 +
<td>-</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#fbeef1">F</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fdd8da">V</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">E</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#ffadad">I</td>
 +
<td bgcolor="#b0adfa">N</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">R</td>
 +
<td bgcolor="#fcc2c4">V</td>
 +
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_DEBHA/507-550&nbsp;&nbsp;</td>
 +
<td>I</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td>K</td>
 +
<td>K</td>
 +
 +
<td>L</td>
 +
<td>S</td>
 +
<td>L</td>
 +
<td>S</td>
 +
<td>D</td>
 +
<td>K</td>
 +
<td>K</td>
 +
<td>E</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
 +
<td bgcolor="#ffd8d8">I</td>
 +
<td>A</td>
 +
<td>K</td>
 +
<td>F</td>
 +
<td>I</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#edadbd">F</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
 +
<td bgcolor="#fbadaf">V</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#c6ade5">Y</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1A_SCHCO/388-415&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td bgcolor="#f3eef9">Y</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeeff">K</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fdd8da">V</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#fbeef1">F</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">E</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">E</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#ccaddf">T</td>
 +
<td bgcolor="#ecc2d5">M</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">R</td>
 +
<td bgcolor="#efc2d0">C</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
 +
<td bgcolor="#f4eef8">S</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_AJECA/374-403&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
 +
<td bgcolor="#ffeeee">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#f9eef3">M</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#e6d8f0">S</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#d8c2e8">S</td>
 +
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">K</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#faeef2">C</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_PARBR/380-409&nbsp;&nbsp;</td>
 +
<td>I</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
 +
<td bgcolor="#fdeeef">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#e6d8f0">S</td>
 +
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#d8c2e8">S</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">K</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#faeef2">C</td>
 +
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_NEOFI/363-392&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
<td bgcolor="#faeef2">C</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#e6d8f0">S</td>
 +
<td bgcolor="#faeef2">C</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#d8c2e8">S</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#fcc2c4">V</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
 +
<td bgcolor="#adadff">R</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_ASPNI/365-394&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
<td bgcolor="#fbeef1">F</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#fdeeee">V</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#e6d8f0">S</td>
 +
<td bgcolor="#faeef2">C</td>
 +
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#d8c2e8">S</td>
 +
<td bgcolor="#fdeeee">V</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#fbadaf">V</td>
 +
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#fcc2c4">V</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">R</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#fdeeee">V</td>
 +
</tr>
 +
 +
<tr><td nowrap="nowrap">MBP1_UNCRE/377-406&nbsp;&nbsp;</td>
 +
<td>M</td>
 +
<td bgcolor="#f3eef9">Y</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#fdeeee">V</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f2d8e5">A</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#d8c2e8">S</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">K</td>
 +
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#faeef2">C</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_PENCH/439-468&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
<td bgcolor="#faeef2">C</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#f9eef3">M</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#e6d8f0">S</td>
 +
<td bgcolor="#faeef2">C</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">Q</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#fbadaf">V</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
 +
<td bgcolor="#fcc2c4">V</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">R</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
</tr>
 +
 +
<tr><td nowrap="nowrap">MBPA_TRIVE/407-436&nbsp;&nbsp;</td>
 +
 +
<td>V</td>
 +
<td bgcolor="#fbeef1">F</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td bgcolor="#e6d8f0">S</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">K</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#faeef2">C</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_PHANO/400-429&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
<td bgcolor="#f4eef9">W</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#fdeeee">V</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f4eef8">T</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
 +
<td bgcolor="#c5c2fb">Q</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#ffadad">I</td>
 +
<td bgcolor="#e5adc6">M</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">R</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBPA_SCLSC/294-313&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#ffadad">I</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">K</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#eeeeff">K</td>
 +
 +
<td bgcolor="#f9eef3">A</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBPA_PYRIS/363-392&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
<td bgcolor="#f4eef9">W</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
 +
<td bgcolor="#fdeeee">V</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f4eef8">T</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
  
====Jalview: alignment editor====
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">Q</td>
  
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#ffadad">I</td>
 +
<td bgcolor="#e5adc6">M</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#e4adc7">A</td>
  
Geoff Barton's lab in Dundee has developed an integrated MSA editor and sequence annotation workbench with a number of very useful functions. It is written in Java and should run on Mac, Linux and Windows platforms without modifications.
+
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">R</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_/361-390&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
<td>-</td>
  
 +
<td>-</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td>G</td>
 +
<td>V</td>
 +
<td>L</td>
 +
<td bgcolor="#f4eef8">S</td>
  
{{#pmid: 19151095}}
+
<td bgcolor="#eeeefe">Q</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f7d8e0">F</td>
 +
<td bgcolor="#f3d8e4">M</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">D</td>
  
We will install Jalview now and explore its features further in other assignments.
+
<td bgcolor="#f4eef8">T</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
  
{{task|1=
+
<td bgcolor="#f7adb3">L</td>
#Navigate to the [http://www.jalview.org/ Jalview homepage], click on the '''Download''' link, and install Jalview on your computer. For Mac OS X, use the '''Install Jalview Only''' link.
+
<td bgcolor="#b3adf7">H</td>
# Start Jalview. A number of windows that showcase the program's abilities will load; you can close these.
+
<td bgcolor="#ffc2c2">I</td>
#Select File &rarr; Input Alignment &rarr; from File and open the <code>APSES_proteins.mfa</code> file you have prepared above. An alignment window with sequences should appear.
+
<td bgcolor="#f7adb3">L</td>
# Choose '''Web Service''' &rarr; '''Alignment''' &rarr; '''Tcoffee with Defaults''' to run a Tcoffee MSA on the Barton lab's server. The job should execute remotely and load the aligned results into a new window. Scroll along the window to get a sense of what has and hasn't been aligned.
+
<td bgcolor="#e4adc7">A</td>
#Select File &rarr; Input Alignment &rarr; from File and open the <code>APSES_proteins_muscle.mfa</code> file you have prepared above. An alignment window with your Muscle alignment should appear.
+
<td bgcolor="#adadff">R</td>
#Compare the two alignments and get a sense for how similar or different they are.
+
<td bgcolor="#d8c2e8">S</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
 
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_ASPFL/328-364&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#fdeeee">V</td>
 +
 
 +
<td>I</td>
 +
<td>T</td>
 +
<td>L</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f7d8e0">F</td>
 +
<td bgcolor="#ffd8d8">I</td>
 +
<td>S</td>
 +
 
 +
<td>E</td>
 +
<td>I</td>
 +
<td>V</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
 
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#b0adfa">N</td>
 +
<td bgcolor="#f9c2c7">L</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
 
 +
<td bgcolor="#adadff">R</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBPA_MAGOR/375-404&nbsp;&nbsp;</td>
 +
<td>Q</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
 
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#fbeef1">F</td>
 +
<td bgcolor="#fdeeee">V</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
 
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#fbadaf">V</td>
 +
 
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#f9c2c7">L</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#b0adfa">Q</td>
 +
<td bgcolor="#c2c2ff">R</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
</tr>
 +
 
 +
<tr><td nowrap="nowrap">MBP1_CHAGL/361-390&nbsp;&nbsp;</td>
 +
<td>S</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
 
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#fbadaf">V</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#f9c2c7">L</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e5adc6">M</td>
 +
 
 +
<td bgcolor="#c2c2ff">R</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_PODAN/372-401&nbsp;&nbsp;</td>
 +
<td>V</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
 
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#fdeeee">V</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
 
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">E</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
 
 +
<td bgcolor="#f9c2c7">L</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">R</td>
 +
<td bgcolor="#fcc2c4">V</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
</tr>
 +
 
 +
<tr><td nowrap="nowrap">MBP1_LACTH/458-487&nbsp;&nbsp;</td>
 +
 
 +
<td>F</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#f3eef9">Y</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#ffd8d8">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">Q</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
 
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#fbadaf">V</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#f9c2c7">L</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#b0adfa">Q</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
 
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_FILNE/433-460&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f3eef9">Y</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
 
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fdd8da">V</td>
 +
 
 +
<td bgcolor="#ffd8d8">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#fbeef1">F</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
 
 +
<td bgcolor="#c5c2fb">E</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">E</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#ccaddf">T</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
 
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">R</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_KLULA/477-506&nbsp;&nbsp;</td>
 +
<td>F</td>
 +
 
 +
<td bgcolor="#f4eef8">T</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#f3eef9">Y</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#fdeeee">V</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#ffd8d8">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#d8c2e8">S</td>
 +
 
 +
<td bgcolor="#d3c2ee">P</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#d5c2ec">Y</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#ccaddf">T</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#eeeeff">K</td>
 +
 
 +
<td bgcolor="#eeeefe">D</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_SCHST/468-501&nbsp;&nbsp;</td>
 +
<td>A</td>
 +
<td bgcolor="#eeeeff">K</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
 
 +
<td bgcolor="#eeeeff">K</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeeff">K</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#ffd8d8">I</td>
 +
 
 +
<td>A</td>
 +
<td>K</td>
 +
<td>F</td>
 +
<td>I</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#d8c2e8">S</td>
 +
 
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
<td bgcolor="#edadbd">F</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#eaadc0">C</td>
 +
 
 +
<td bgcolor="#caade0">S</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_SACCE/496-525&nbsp;&nbsp;</td>
 +
<td>F</td>
 +
<td bgcolor="#f4eef8">S</td>
 +
 
 +
<td bgcolor="#f2eefa">P</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#f3eef9">Y</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#eeeefe">E</td>
 +
 
 +
<td bgcolor="#fdeeef">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
 
 +
<td bgcolor="#f4eef8">T</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c2c2ff">K</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#ebc2d5">A</td>
 +
 
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#caade0">S</td>
 +
<td bgcolor="#adadff">K</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
 
 +
</tr>
 +
<tr><td nowrap="nowrap">CD00204/1-19&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c5c2fb">E</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
 
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#d8d8ff">R</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#d3c2ee">P</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#f9c2c7">L</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
 
 +
<td bgcolor="#caade0">S</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#efeefd">H</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">CD00204/99-118&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fdd8da">V</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
 
 +
<td bgcolor="#eeeeff">R</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#c2c2ff">K</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
<td bgcolor="#d8d8ff">R</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#d3c2ee">P</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
 
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#f9c2c7">L</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">K</td>
 +
<td bgcolor="#c5c2fb">N</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#efeefd">H</td>
 +
</tr>
 +
 
 +
<tr><td nowrap="nowrap">1SW6/203-232&nbsp;&nbsp;</td>
 +
<td>L</td>
 +
<td bgcolor="#eeeefe">D</td>
 +
<td bgcolor="#fdeeef">L</td>
 +
<td bgcolor="#eeeeff">K</td>
 +
<td bgcolor="#f4eef9">W</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td bgcolor="#ffeeee">I</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f3d8e4">M</td>
 +
<td bgcolor="#fbd8db">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dad8fd">N</td>
 +
<td bgcolor="#f9eef3">A</td>
 +
<td bgcolor="#eeeefe">Q</td>
 +
<td bgcolor="#c5c2fb">D</td>
 +
<td bgcolor="#d8c2e8">S</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
<td bgcolor="#cfaddc">G</td>
 +
 
 +
<td bgcolor="#dad8fd">D</td>
 +
<td bgcolor="#d9c2e7">T</td>
 +
<td bgcolor="#efc2d0">C</td>
 +
<td bgcolor="#f7adb3">L</td>
 +
<td bgcolor="#b0adfa">N</td>
 +
<td bgcolor="#ffc2c2">I</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#e4adc7">A</td>
 +
<td bgcolor="#adadff">R</td>
 +
 
 +
<td bgcolor="#f9c2c7">L</td>
 +
<td bgcolor="#f4eef7">G</td>
 +
<td bgcolor="#eeeefe">N</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">SecStruc/203-232&nbsp;&nbsp;</td>
 +
<td>t</td>
 +
<td bgcolor="#f5eef6">_</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#efeefd">H</td>
 +
 
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td bgcolor="#efeefd">H</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td bgcolor="#ead8ed">_</td>
 +
<td bgcolor="#ead8ed">_</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#ead8ed">_</td>
 +
<td bgcolor="#f5eef6">_</td>
 +
<td bgcolor="#f5eef6">_</td>
 +
 
 +
<td bgcolor="#dec2e3">_</td>
 +
<td bgcolor="#d9c2e7">t</td>
 +
<td bgcolor="#f5eef6">_</td>
 +
<td bgcolor="#d2add8">_</td>
 +
<td bgcolor="#ead8ed">_</td>
 +
<td bgcolor="#dec2e3">_</td>
 +
<td bgcolor="#c7c2f9">H</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
 
 +
<td bgcolor="#c7c2f9">H</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#b3adf7">H</td>
 +
<td bgcolor="#c7c2f9">H</td>
 +
<td bgcolor="#f5eef6">_</td>
 +
<td bgcolor="#f5eef6">_</td>
 +
</tr>
 +
</table>
 +
</td></tr>
 +
 
 +
</table>
 +
;Aligned sequences before editing. The algorithm has placed gaps into the Swi6 helix <code>LKWIIAN</code>, and the four-residue gaps before the block of well-aligned sequence on the right are poorly supported.
 +
 
 +
 
 +
<table border="1"><tr><td>
 +
<table border="0" cellpadding="0" cellspacing="0">
 +
 
 +
<tr><td colspan="6"></td>
 +
<td colspan="9">10<br>|</td><td></td>
 +
<td colspan="9">20<br>|</td><td></td>
 +
 
 +
<td colspan="9">30<br>|</td><td></td>
 +
<td colspan="3"></td><td colspan="3">40<br>|</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_USTMA/341-368&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dfd2f0">Y</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
 
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
 
 +
<td>-</td>
 +
<td bgcolor="#ffbfbf">I</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#f5d2db">F</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
 
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">E</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#c2abe8">P</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#bf99d7">T</td>
 +
<td bgcolor="#e5abc5">M</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
  
}}
+
<td bgcolor="#9999ff">R</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1B_SCHCO/470-498&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#d4d2fc">E</td>
  
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#dfd2f0">Y</td>
 +
<td bgcolor="#d2d2ff">K</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
===Computing alignments===
+
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f2bfcc">F</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">E</td>
  
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#9d99f9">N</td>
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#dd99b9">A</td>
  
Try two MSA algorithms and load the results in Jalview.
+
<td bgcolor="#dd99b9">A</td>
Locally: which one do you prefer? Modify the consensus. Annotate domains.
+
<td bgcolor="#9999ff">R</td>
 +
<td bgcolor="#fcabae">V</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_ASHGO/465-494&nbsp;&nbsp;</td>
 +
<td>F</td>
 +
<td bgcolor="#e2d2ee">S</td>
  
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#dfd2f0">Y</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#e2d2ed">T</td>
 +
<td>-</td>
 +
<td>-</td>
  
The EBI has a very convenient [http://www.ebi.ac.uk/Tools/msa/ page to access a number of MSA algorithms]. This is especially useful when you want to compare, e.g., T-Coffee, Muscle and MAFFT results to see which regions of your alignment are robust. You could use any of these tools: just paste your sequences into a Web form, download the results and load them into Jalview. Easy.
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
But even easier is to calculate the alignments directly from Jalview, if its Web services are available. (Not today. <small>Bummer.</small>)
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#ffbfbf">I</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
  
 +
<td bgcolor="#eaabbf">C</td>
 +
<td bgcolor="#d2d2ff">K</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#d6bfe7">S</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#c2abe8">P</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#ffabab">I</td>
  
No. Calculate an external alignment and import it.
+
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#df99b8">M</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_CLALU/550-586&nbsp;&nbsp;</td>
 +
<td>G</td>
  
;Calculate a MAFFT alignment using the Jalview Web service option:
+
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td>K</td>
  
{{task|1=
+
<td>K</td>
#In Jalview, select '''Web Service &rarr; Alignment &rarr; MAFFT with defaults...'''. The alignment is calculated in a few minutes and displayed in a new window.
+
<td>E</td>
}}
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
;Calculate a MAFFT alignment when the Jalview Web service is NOT available:
+
<td>L</td>
 +
<td>I</td>
 +
<td>S</td>
 +
<td>K</td>
 +
<td bgcolor="#f2bfcc">F</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d4d2fc">Q</td>
  
{{task|1=
+
<td bgcolor="#afabfa">D</td>
#In Jalview, select '''File &rarr; Output to Textbox &rarr; FASTA'''
+
<td bgcolor="#afabfa">N</td>
#Copy the sequences.
+
<td bgcolor="#d4d2fc">E</td>
#Navigate to the [http://www.ebi.ac.uk/Tools/msa/mafft/ '''MAFFT Input form'''] at the EBI.
+
<td bgcolor="#c399d4">G</td>
#Paste your sequences into the form.
+
<td bgcolor="#c2bffc">N</td>
#Click on '''Submit'''.
+
<td bgcolor="#cbabdf">T</td>
#Close the Jalview sequence window and either save your MAFFT alignment to file and load it in Jalview, or simply select '''File &rarr; Input Alignment &rarr; from Textbox''', paste and click '''New Window'''.
+
<td bgcolor="#e3abc6">A</td>
}}
+
<td bgcolor="#e999ad">F</td>
 +
<td bgcolor="#a199f6">H</td>
  
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#b899df">Y</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#f0d2df">M</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBPA_COPCI/514-542&nbsp;&nbsp;</td>
  
In any case, you should now have an alignment.
+
<td>-</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#f5d2db">F</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#e2d2ee">S</td>
  
{{task|1=
+
<td>-</td>
#Choose '''Colour &rarr; Hydrophobicity''' and '''&rarr; by Conservation'''. Then adjust the slider left or right to see which columns are highly conserved. You will notice that the Swi6 sequence that was supposed to align only to the ankyrin domains was in fact aligned to other parts of the sequence as well. This is one part of the MSA that we will have to correct manually and a common problem when aligning sequences of different lengths.
+
<td>-</td>
}}
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#fcbfc1">V</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#fbd2d5">L</td>
  
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">E</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#ff9999">I</td>
  
==R code: load alignment and compute information scores==
+
<td bgcolor="#9d99f9">N</td>
<!-- Add sequence weighting and sampling bias correction ? -->
+
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">R</td>
 +
<td bgcolor="#fcabae">V</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
</tr>
  
As discussed in the lecture, Shannon information is calculated as the difference between expected and observed entropy, where entropy is the negative sum over probabilities times the log of those probabilities:
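Written out, the relationship described above can be sketched as follows. The symbols follow the script's own comment <code>I = H_ref - H_obs</code>; the index <code>i</code> running over the amino acids is an assumption made for this illustration.

```latex
I = H_{\mathrm{ref}} - H_{\mathrm{obs}}, \qquad
H = -\sum_{i} p_i \log_2 p_i
```

Here <code>p_i</code> is the probability of amino acid <code>i</code>, estimated as its observed frequency, and the base-2 logarithm gives values in bits. A fully conserved column has an observed entropy of zero, so its information equals the entropy of the reference distribution.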
+
<tr><td nowrap="nowrap">MBP1_DEBHA/507-550&nbsp;&nbsp;</td>
 +
<td>I</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#d4d2fc">E</td>
  
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td>K</td>
 +
<td>K</td>
 +
<td>L</td>
 +
<td>S</td>
 +
<td>L</td>
 +
<td>S</td>
 +
<td>D</td>
 +
<td>K</td>
  
*a review of the regex quantifiers <code>+</code>, <code>?</code>, <code>*</code> and <code>{min,max}</code>, and of greedy matching.
+
<td>K</td>
*build an AT-hook motif matcher https://en.wikipedia.org/wiki/AT-hook
+
<td>E</td>
 +
<td>L</td>
 +
<td>I</td>
 +
<td>A</td>
 +
<td>K</td>
 +
<td bgcolor="#f2bfcc">F</td>
 +
<td bgcolor="#ffbfbf">I</td>
 +
<td bgcolor="#c2bffc">N</td>
  
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
  
Here we compute Shannon information scores for aligned positions of the APSES domain, and plot the values in '''R'''. You can try this with any part of your alignment, but I have used only the aligned residues for the APSES domain for my example. This is a good choice for a first try, since there are (almost) no gaps.
+
<td bgcolor="#e999ad">F</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#fb999c">V</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#b899df">Y</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td bgcolor="#d4d2fc">N</td>
  
{{task|1=
+
</tr>
# Export only the sequences of the aligned APSES domains to a file on your computer, in FASTA format as explained below. You could call this: <code>MBP1_All_APSES.fa</code>.
+
<tr><td nowrap="nowrap">MBP1A_SCHCO/388-415&nbsp;&nbsp;</td>
##Use your mouse to click and drag to ''select'' the aligned APSES domains in the alignment window.
+
<td>-</td>
##Copy your selection to the clipboard.
+
<td>-</td>
##Use the main menu (not the menu of your alignment window) and select '''File &rarr; Input alignment &rarr; from Textbox'''; paste the selection into the textbox and click '''New Window'''.
+
<td bgcolor="#dfd2f0">Y</td>
##Use '''File &rarr; Save as''' to save the aligned sequences in multi-FASTA format under the filename you want in your '''R''' project directory.
+
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d2d2ff">K</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#fbd2d5">L</td>
  
# Explore the R code below. Be sure that you understand it correctly. Note that this code does not implement any sampling bias correction, so positions with large numbers of gaps will receive artificially high scores (the column is scored as if the gap character were a conserved character).
+
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fcbfc1">V</td>
 +
<td bgcolor="#f9bfc4">L</td>
  
<source lang="rsplus">
+
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#f5d2db">F</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">E</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">E</td>
 +
<td bgcolor="#cbabdf">T</td>
  
# CalculateInformation.R
+
<td bgcolor="#e3abc6">A</td>
# Calculate Shannon information for positions in a multiple sequence alignment.
+
<td bgcolor="#f699a1">L</td>
# Requires: an MSA in multi FASTA format
+
<td bgcolor="#bf99d7">T</td>
 +
<td bgcolor="#e5abc5">M</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">R</td>
 +
<td bgcolor="#eaabbf">C</td>
 +
<td bgcolor="#d2d2ff">R</td>
  
# It is good practice to set variables you might want to change
+
<td bgcolor="#e2d2ee">S</td>
# in a header block so you don't need to hunt all over the code
+
</tr>
# for strings you need to update.
 
#
 
setwd("/your/R/working/directory")
 
mfa      <- "MBP1_All_APSES.fa"
 
  
# ================================================
+
<tr><td nowrap="nowrap">MBP1_AJECA/374-403&nbsp;&nbsp;</td>
#   Read sequence alignment fasta file
+
<td>T</td>
# ================================================
+
<td bgcolor="#fbd2d5">L</td>
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d4d2fc">Q</td>
  
# read MFA datafile using seqinr function read.alignment()
+
<td bgcolor="#ffd2d2">I</td>
library(seqinr)
+
<td bgcolor="#e2d2ee">S</td>
tmp  <- read.alignment(mfa, format="fasta")
+
<td bgcolor="#f0d2df">M</td>
MSA  <- as.matrix(tmp)  # convert the list into a characterwise matrix
+
<td>-</td>
                        # with appropriate row and column names using
+
<td>-</td>
                        # the seqinr function as.matrix.alignment()
+
<td>-</td>
                        # You could have a look under the hood of this
+
<td>-</td>
                        # function to understand better how to convert a
+
<td>-</td>
                        # list into something else ... simply type
+
<td>-</td>
                        # "as.matrix.alignment" - without the parentheses
 
                        # to retrieve the function source code (as for any
 
                        # function btw).
 
  
### Explore contents of and access to the matrix of sequences
+
<td>-</td>
MSA
+
<td>-</td>
MSA[1,]
+
<td>-</td>
MSA[,1]
+
<td>-</td>
length(MSA[,1])
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
  
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#d6bfe7">S</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#caabe0">S</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">D</td>
  
# ================================================
+
<td bgcolor="#cbabdf">T</td>
#   define function to calculate entropy
+
<td bgcolor="#e3abc6">A</td>
# ================================================
+
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">K</td>
 +
<td bgcolor="#afabfa">N</td>
  
entropy <- function(v) { # calculate Shannon entropy for the aa vector v
+
<td bgcolor="#e4d2ec">G</td>
                    # Note: we are not correcting for small sample sizes
+
<td bgcolor="#f4d2dc">C</td>
                    # here. Thus if there are a large number of gaps in
+
</tr>
                    # the alignment, this will look like small entropy
+
<tr><td nowrap="nowrap">MBP1_PARBR/380-409&nbsp;&nbsp;</td>
                    # since only a few amino acids are present. In the
+
<td>I</td>
                    # extreme case: if a position is only present in
+
<td bgcolor="#fbd2d5">L</td>
                    # one sequence, that one amino acid will be treated
+
<td bgcolor="#ded2f2">P</td>
                    # as 100% conserved - zero entropy. Sampling error
+
<td bgcolor="#ded2f2">P</td>
                    # corrections are discussed eg. in Schneider et al.
+
<td bgcolor="#d5d2fb">H</td>
                    # (1986) JMB 188:414
 
l <- length(v)
 
a <- rep(0, 21)      # initialize a vector with 21 elements (20 aa plus gap)
 
                    # then set the name of each element to its one letter
 
                    # code. Through this, we can access an element by its
 
                    # one letter code.
 
names(a)  <- unlist(strsplit("acdefghiklmnpqrstvwy-", ""))
 
  
for (i in 1:l) {      # for the whole vector of amino acids
+
<td bgcolor="#d4d2fc">Q</td>
c <- v[i]          # retrieve the character
+
<td bgcolor="#ffd2d2">I</td>
a[c] <- a[c] + 1  # increment its count by one
+
<td bgcolor="#e2d2ee">S</td>
} # note: we could also have used the table() function for this
+
<td bgcolor="#fbd2d5">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
tot <- sum(a) - a["-"] # calculate number of observed amino acids
+
<td>-</td>
                      # i.e. subtract gaps
+
<td>-</td>
a <- a/tot            # frequency is observations of one amino acid
+
<td>-</td>
                      # divided by all observations. We assume that
+
<td>-</td>
                      # frequency equals probability.
+
<td>-</td>
a["-"] <- 0
+
<td>-</td>
for (i in 1:length(a)) {
+
<td>-</td>
if (a[i] != 0) { # if a[i] is not zero, otherwise leave as is.
+
<td>-</td>
            # By definition, 0*log(0) = 0  but R calculates
+
<td>-</td>
            # this in parts and returns NaN for log(0).
 
a[i] <- a[i] * (log(a[i])/log(2)) # replace a[i] with
 
                                  # p(i) log_2(p(i))
 
}
 
}
 
return(-sum(a)) # return Shannon entropy
 
}
 
  
# ================================================
+
<td bgcolor="#f9bfc4">L</td>
#   calculate entropy for reference distribution
+
<td bgcolor="#f9bfc4">L</td>
#   (from UniProt, c.f. Assignment 2)
+
<td bgcolor="#d6bfe7">S</td>
# ================================================
+
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#caabe0">S</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#c399d4">G</td>
  
refData <- c(
+
<td bgcolor="#c2bffc">D</td>
    "A"=8.26,
+
<td bgcolor="#cbabdf">T</td>
    "Q"=3.93,
+
<td bgcolor="#e3abc6">A</td>
    "L"=9.66,
+
<td bgcolor="#dd99b9">A</td>
    "S"=6.56,
+
<td bgcolor="#f699a1">L</td>
    "R"=5.53,
+
<td bgcolor="#e3abc6">A</td>
    "E"=6.75,
+
<td bgcolor="#dd99b9">A</td>
    "K"=5.84,
+
<td bgcolor="#dd99b9">A</td>
    "T"=5.34,
+
<td bgcolor="#9999ff">K</td>
    "N"=4.06,
 
    "G"=7.08,
 
    "M"=2.42,
 
    "W"=1.08,
 
    "D"=5.45,
 
    "H"=2.27,
 
    "F"=3.86,
 
    "Y"=2.92,
 
    "C"=1.37,
 
    "I"=5.96,
 
    "P"=4.70,
 
    "V"=6.87
 
    )
 
  
### Calculate the entropy of this distribution
+
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#f4d2dc">C</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_NEOFI/363-392&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
<td bgcolor="#f4d2dc">C</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#d4d2fc">Q</td>
  
H.ref <- 0
+
<td bgcolor="#d4d2fc">D</td>
for (i in 1:length(refData)) {
+
<td bgcolor="#d4d2fc">E</td>
p <- refData[i]/sum(refData) # convert % to probabilities
+
<td bgcolor="#ffd2d2">I</td>
    H.ref <- H.ref - (p * (log(p)/log(2)))
+
<td bgcolor="#d4d2fc">D</td>
}
+
<td bgcolor="#fbd2d5">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
# ================================================
+
<td>-</td>
#    calculate information for each position of
+
<td>-</td>
#    multiple sequence alignment
+
<td>-</td>
# ================================================
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
lAli <- dim(MSA)[2] # length of row in matrix is second element of dim(<matrix>).
+
<td>-</td>
I <- rep(0, lAli)  # initialize result vector
+
<td bgcolor="#f9bfc4">L</td>
for (i in 1:lAli) {
+
<td bgcolor="#f9bfc4">L</td>
I[i] = H.ref - entropy(MSA[,i])  # I = H_ref - H_obs
+
<td bgcolor="#d6bfe7">S</td>
}
+
<td bgcolor="#f4d2dc">C</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#caabe0">S</td>
 +
<td bgcolor="#d4d2fc">N</td>
  
### evaluate I
+
<td bgcolor="#c399d4">G</td>
I
+
<td bgcolor="#c2bffc">D</td>
quantile(I)
+
<td bgcolor="#cbabdf">T</td>
hist(I)
+
<td bgcolor="#e3abc6">A</td>
plot(I)
+
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#fcabae">V</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
  
# you can see that we have quite a large number of columns with the same,
+
<td bgcolor="#9999ff">R</td>
# high value ... what are these?
+
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_ASPNI/365-394&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
<td bgcolor="#f5d2db">F</td>
 +
<td bgcolor="#e2d2ee">S</td>
  
which(I > 4)
+
<td bgcolor="#ded2f2">P</td>
MSA[,which(I > 4)]
+
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#fcd2d3">V</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
# And what is in the columns with low values?
+
<td>-</td>
MSA[,which(I < 1.5)]
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#d6bfe7">S</td>
 +
<td bgcolor="#f4d2dc">C</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#caabe0">S</td>
  
# ===================================================
+
<td bgcolor="#fcd2d3">V</td>
#   plot the information
+
<td bgcolor="#c399d4">G</td>
#   (c.f. Assignment 5, see there for explanations)
+
<td bgcolor="#c2bffc">D</td>
# ===================================================
+
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#fb999c">V</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#fcabae">V</td>
 +
<td bgcolor="#dd99b9">A</td>
  
IP <- (I-min(I))/(max(I) - min(I) + 0.0001)
+
<td bgcolor="#dd99b9">A</td>
nCol <- 15
+
<td bgcolor="#9999ff">R</td>
IP <- floor(IP * nCol) + 1
+
<td bgcolor="#afabfa">N</td>
spect <- colorRampPalette(c("#DD0033", "#00BB66", "#3300DD"), bias=0.6)(nCol)
+
<td bgcolor="#e4d2ec">G</td>
# lets set the information scores from single informations to grey. We
+
<td bgcolor="#fcd2d3">V</td>
# change the highest level of the spectrum to grey.
+
</tr>
#spect[nCol] <- "#CCCCCC"
+
<tr><td nowrap="nowrap">MBP1_UNCRE/377-406&nbsp;&nbsp;</td>
Icol <- vector()
+
<td>M</td>
for (i in 1:length(I)) {
+
<td bgcolor="#dfd2f0">Y</td>
Icol[i] <- spect[ IP[i] ]
 
}
 
  
plot(1,1, xlim=c(0, lAli), ylim=c(-0.5, 5) ,
+
<td bgcolor="#ded2f2">P</td>
    type="n", bty="n", xlab="position in alignment", ylab="Information (bits)")
+
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#fcd2d3">V</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td>-</td>
 +
<td>-</td>
  
# plot as rectangles: height is information and color is coded to information
+
<td>-</td>
for (i in 1:lAli) {
+
<td>-</td>
  rect(i, 0, i+1, I[i], border=NA, col=Icol[i])
+
<td>-</td>
}
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
# As you can see, some of the columns reach very high values, but they are not
+
<td>-</td>
# contiguous in sequence. Are they contiguous in structure? We will find out in
+
<td>-</td>
# a later assignment, when we map computed values to structure.
+
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#eabfd3">A</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
  
</source>
+
<td bgcolor="#caabe0">S</td>
}}
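A comment in <code>entropy()</code> above notes that <code>table()</code> could replace the counting loop. The following is a minimal sketch of that variant, not the course's implementation: the function name <code>entropyTable</code> and its <code>gap</code> argument are made up for this illustration. Like the original, it excludes gaps from the frequencies and applies no sampling bias correction.

```r
# Sketch only: Shannon entropy (in bits) of one alignment column,
# counting residues with table() instead of an explicit loop.
# "entropyTable" and "gap" are hypothetical names for this example.
entropyTable <- function(v, gap = "-") {
  v <- v[v != gap]               # drop gap characters
  if (length(v) == 0) {          # column of gaps only: nothing observed
    return(0)
  }
  p <- table(v) / length(v)      # observed frequencies of each residue
  -sum(p * log2(p))              # table() never reports zero counts for a
}                                # character vector, so log2(0) cannot occur

entropyTable(c("a", "a", "a", "a"))   # fully conserved column: 0 bits
entropyTable(c("a", "c", "a", "c"))   # two equally frequent residues: 1 bit
entropyTable(c("a", "c", "-", "-"))   # gaps are ignored: 1 bit
```

With this function, <code>apply(MSA, 2, entropyTable)</code> should give the observed entropy of every column in one call, assuming <code>MSA</code> is the character matrix created in the script above.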
+
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#cbabdf">T</td>
  
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">K</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#f4d2dc">C</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_PENCH/439-468&nbsp;&nbsp;</td>
 +
<td>T</td>
  
[[Image:InformationPlot.jpg|frame|none|Plot of information vs. sequence position produced by the '''R''' script above, for an alignment of Mbp1 ortholog APSES domains.]]
+
<td bgcolor="#f4d2dc">C</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#f0d2df">M</td>
 +
<td>-</td>
  
== Calculating conservation scores ==
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#d6bfe7">S</td>
 +
<td bgcolor="#f4d2dc">C</td>
 +
<td bgcolor="#d4d2fc">Q</td>
  
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">Q</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#fb999c">V</td>
 +
<td bgcolor="#f699a1">L</td>
  
{{task|1=
+
<td bgcolor="#fcabae">V</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">R</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBPA_TRIVE/407-436&nbsp;&nbsp;</td>
  
* Study this code carefully, execute it, section by section and make sure you understand all of it. Ask on the list if anything is not clear.
+
<td>V</td>
 +
<td bgcolor="#f5d2db">F</td>
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#fbd2d5">L</td>
  
<source lang="R">
+
<td>-</td>
# BiostringsExample.R
+
<td>-</td>
# Short tutorial on sequence alignment with the Biostrings package.
+
<td>-</td>
# Boris Steipe for BCH441, 2013 - 2014
+
<td>-</td>
#
+
<td>-</td>
setwd("~/path/to/your/R_files/")
+
<td>-</td>
# setwd("~/Documents/07.TEACHING/37-BCH441 Bioinformatics 2014/05-Materials/Assignment_5 data")  # author's own path, adapt or ignore
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
# Biostrings is a package within the bioconductor project.
+
<td>-</td>
# Bioconductor packages have their own installation system,
+
<td>-</td>
# they are normally not installed via CRAN.
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#d6bfe7">S</td>
 +
<td bgcolor="#e2d2ee">S</td>
  
# First, you load the BioConductor installer...
+
<td bgcolor="#d4d2fc">Q</td>
source("http://bioconductor.org/biocLite.R")
+
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#dd99b9">A</td>
  
# Then you can install the Biostrings package and all of its dependencies.
+
<td bgcolor="#f699a1">L</td>
biocLite("Biostrings")
+
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">K</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#f4d2dc">C</td>
 +
</tr>
  
# ... and load the library.
+
<tr><td nowrap="nowrap">MBP1_PHANO/400-429&nbsp;&nbsp;</td>
library(Biostrings)
+
<td>T</td>
 +
<td bgcolor="#e2d2ef">W</td>
 +
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#fcd2d3">V</td>
 +
<td bgcolor="#e2d2ed">T</td>
  
# Some basic (technical) information is available ...
+
<td bgcolor="#d2d2ff">R</td>
library(help=Biostrings)
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
# ... but for more in depth documentation, use the
+
<td>-</td>
# so called "vignettes" that are provided with every R package.
+
<td>-</td>
browseVignettes("Biostrings")
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#c2bffc">N</td>
  
# In this code, we mostly use functions that are discussed in the
+
<td bgcolor="#f0d2e0">A</td>
# pairwise alignment vignette.
+
<td bgcolor="#d4d2fc">Q</td>
# Read in two fasta files - you will need to edit this for YFO
+
<td bgcolor="#afabfa">D</td>
sacce <- readAAStringSet("mbp1-sacce.fa", format="fasta")
+
<td bgcolor="#afabfa">Q</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
  
# "USTMA" is used only as an example here - modify for YFO  :-)
+
<td bgcolor="#ff9999">I</td>
ustma <- readAAStringSet("mbp1-ustma.fa", format="fasta")
+
<td bgcolor="#df99b8">M</td>
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">R</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#f0d2e0">A</td>
  
sacce
+
</tr>
names(sacce)
+
<tr><td nowrap="nowrap">MBPA_SCLSC/294-313&nbsp;&nbsp;</td>
names(sacce) <- "Mbp1 SACCE"
+
<td>-</td>
names(ustma) <- "Mbp1 USTMA" # Example only ... modify for YFO
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
width(sacce)
+
<td>-</td>
as.character(sacce)
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
# Biostrings takes a sophisticated approach to sequence alignment ...
+
<td>-</td>
?pairwiseAlignment
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
  
# ... but the use in practice is quite simple:
+
<td bgcolor="#c2bffc">D</td>
ali <- pairwiseAlignment(sacce, ustma, substitutionMatrix = "BLOSUM50")
+
<td bgcolor="#f0d2e0">A</td>
ali
+
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#cbabdf">T</td>
  
pattern(ali)
+
<td bgcolor="#e3abc6">A</td>
subject(ali)
+
<td bgcolor="#ff9999">I</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">K</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#d2d2ff">K</td>
  
writePairwiseAlignments(ali)
+
<td bgcolor="#f0d2e0">A</td>
 +
</tr>
  
p <- aligned(pattern(ali))
+
<tr><td nowrap="nowrap">MBPA_PYRIS/363-392&nbsp;&nbsp;</td>
names(p) <- "Mbp1 SACCE aligned"
+
<td>T</td>
s <- aligned(subject(ali))
+
<td bgcolor="#e2d2ef">W</td>
names(s) <- "Mbp1 USTMA aligned"
+
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#d4d2fc">E</td>
  
# don't overwrite your EMBOSS .fal files
+
<td bgcolor="#fcd2d3">V</td>
writeXStringSet(p, "mbp1-sacce.R.fal", append=FALSE, format="fasta")
+
<td bgcolor="#e2d2ed">T</td>
writeXStringSet(s, "mbp1-ustma.R.fal", append=FALSE, format="fasta")
+
<td bgcolor="#d2d2ff">R</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
# Done.
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
  
</source>
+
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">Q</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">D</td>
  
* Compare the alignment you received from the EMBOSS server with the one you computed using '''R'''. Are they approximately the same? Exactly the same? You used different matrices and gap parameters, so minor differences are to be expected, but by and large you should get the same alignments.
+
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#ff9999">I</td>
 +
<td bgcolor="#df99b8">M</td>
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">R</td>
 +
<td bgcolor="#afabfa">N</td>
  
}}
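
One way to make this comparison concrete is to compute, for each alignment, the fraction of identical residues over the columns in which neither sequence has a gap. The following is a minimal sketch in base R, not part of the original unit: the function name <code>alignmentIdentity</code> is hypothetical, and the two short gapped strings are made up for illustration (they are not Mbp1 data).

<source lang="R">
# Sketch only: alignmentIdentity() is a hypothetical helper.
# a, b: two gapped sequences of equal length, taken from one pairwise alignment.
alignmentIdentity <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  stopifnot(length(x) == length(y))      # aligned rows must have equal length
  aligned <- x != "-" & y != "-"         # columns without a gap in either row
  sum(x[aligned] == y[aligned]) / sum(aligned)
}

# Made-up example: four gap-free columns, three of them identical.
alignmentIdentity("MKV-LA", "MRVQL-")
</source>

Applying the same function to the EMBOSS result and to the '''R''' result gives two numbers that can be compared directly. Note that identical values do not prove that the two alignments place their gaps in the same columns.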
+
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_/361-390&nbsp;&nbsp;</td>
 +
<td>N</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td bgcolor="#e4d2ec">G</td>
  
We will now use the aligned sequences to compute a graphical display of alignment quality.
+
<td bgcolor="#fcd2d3">V</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
{{task|1=
+
<td bgcolor="#f2bfcc">F</td>
 +
<td bgcolor="#ebbfd3">M</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#e2d2ed">T</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#c399d4">G</td>
  
* Study this code carefully, execute it section by section, and make sure you understand all of it. Ask on the list if anything is not clear.
+
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">R</td>
  
<source lang="R">
+
<td bgcolor="#caabe0">S</td>
# aliScore.R
+
<td bgcolor="#e4d2ec">G</td>
# Evaluating an alignment with a sliding window score
+
<td bgcolor="#f0d2e0">A</td>
# Boris Steipe, October 2012. Update October 2013
+
</tr>
setwd("~/path/to/your/R_files/")
+
<tr><td nowrap="nowrap">MBP1_ASPFL/328-364&nbsp;&nbsp;</td>
 +
<td>T</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#ded2f2">P</td>
  
# Scoring matrices can be found at the NCBI.
+
<td bgcolor="#e4d2ec">G</td>
# ftp://ftp.ncbi.nih.gov/blast/matrices/BLOSUM62
+
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#fcd2d3">V</td>
 +
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#e2d2ed">T</td>
 +
<td>L</td>
 +
<td>G</td>
 +
<td>R</td>
 +
<td>F</td>
  
# It is good practice to set variables you might want to change
+
<td>I</td>
# in a header block so you don't need to hunt all over the code
+
<td>S</td>
# for strings you need to update.
+
<td>E</td>
#
+
<td>-</td>
fa1      <- "mbp1-sacce.R.fal"
+
<td>-</td>
fa2      <- "mbp1-ustma.R.fal"
+
<td>-</td>
code1    <- "SACCE"
+
<td>-</td>
code2    <- "USTMA"
+
<td>-</td>
mdmFile  <- "BLOSUM62.mdm"
+
<td>-</td>
window  <- 9  # window-size (should be an odd integer)
 
  
# ================================================
+
<td>-</td>
#   Read data files
+
<td bgcolor="#ffbfbf">I</td>
# ================================================
+
<td bgcolor="#fcbfc1">V</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#d4d2fc">Q</td>
  
# read fasta datafiles using seqinr function read.fasta()
+
<td bgcolor="#c399d4">G</td>
install.packages("seqinr")
+
<td bgcolor="#c2bffc">D</td>
library(seqinr)
+
<td bgcolor="#cbabdf">T</td>
tmp  <- unlist(read.fasta(fa1, seqtype="AA", as.string=FALSE, seqonly=TRUE))
+
<td bgcolor="#e3abc6">A</td>
seq1 <- unlist(strsplit(as.character(tmp), split=""))
+
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#9d99f9">N</td>
 +
<td bgcolor="#f7abb2">L</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#c399d4">G</td>
  
tmp  <- unlist(read.fasta(fa2, seqtype="AA", as.string=FALSE, seqonly=TRUE))
+
<td bgcolor="#9999ff">R</td>
seq2 <- unlist(strsplit(as.character(tmp), split=""))
+
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBPA_MAGOR/375-404&nbsp;&nbsp;</td>
 +
<td>Q</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d4d2fc">D</td>
  
if (length(seq1) != length(seq2)) {
+
<td bgcolor="#ded2f2">P</td>
  stop("Sequences have unequal length!")  # halt: the pairscore loop below assumes equal lengths
+
<td bgcolor="#d4d2fc">N</td>
}
+
<td bgcolor="#f5d2db">F</td>
 +
<td bgcolor="#fcd2d3">V</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
lSeq <- length(seq1)
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
# ================================================
+
<td>-</td>
#   Read scoring matrix
+
<td>-</td>
# ================================================
+
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">N</td>
  
MDM <- read.table(mdmFile, skip=6)
+
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#fb999c">V</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#f7abb2">L</td>
 +
<td bgcolor="#dd99b9">A</td>
  
# This is a dataframe. Study how it can be accessed:
+
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9d99f9">Q</td>
 +
<td bgcolor="#ababff">R</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_CHAGL/361-390&nbsp;&nbsp;</td>
 +
<td>S</td>
 +
<td bgcolor="#d2d2ff">R</td>
  
MDM
+
<td bgcolor="#e2d2ee">S</td>
MDM[1,]
+
<td bgcolor="#f0d2e0">A</td>
MDM[,1]
+
<td bgcolor="#d4d2fc">D</td>
MDM[5,5]  # Cys-Cys
+
<td bgcolor="#d4d2fc">E</td>
MDM[20,20] # Val-Val
+
<td bgcolor="#fbd2d5">L</td>
MDM[,"W"]  # the tryptophan column
+
<td bgcolor="#d4d2fc">Q</td>
MDM["R","W"]  # Arg-Trp pairscore
+
<td bgcolor="#d4d2fc">Q</td>
MDM["W","R"]  # Trp-Arg pairscore: pairscores are symmetric
+
<td>-</td>
 +
<td>-</td>
  
colnames(MDM)  # names of columns
+
<td>-</td>
rownames(MDM)  # names of rows
+
<td>-</td>
colnames(MDM)[3]  # third column
+
<td>-</td>
rownames(MDM)[12]  # twelfth row
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
# change the two "*" names to "-" so we can use them to score
+
<td>-</td>
# indels of the alignment. This is a bit of a hack, since this
+
<td>-</td>
# does not reflect the actual indel penalties (which are, as you
+
<td>-</td>
# remember from your lectures, calculated as a gap opening
+
<td bgcolor="#f9bfc4">L</td>
# + gap extension penalty; it can't be calculated in a pairwise
+
<td bgcolor="#f9bfc4">L</td>
# manner). EMBOSS defaults for BLOSUM62 are opening -10 and
+
<td bgcolor="#c2bffc">D</td>
# extension -0.5 i.e. a gap of size 3 (-11.5) has approximately
+
<td bgcolor="#e2d2ee">S</td>
# the same penalty as a 3-character score of "-" matches (-12)
+
<td bgcolor="#d4d2fc">Q</td>
# so a pairscore of -4 is not entirely unreasonable.
+
<td bgcolor="#afabfa">D</td>
  
colnames(MDM)[24]
+
<td bgcolor="#afabfa">N</td>
rownames(MDM)[24]
+
<td bgcolor="#d4d2fc">E</td>
colnames(MDM)[24] <- "-"
+
<td bgcolor="#c399d4">G</td>
rownames(MDM)[24] <- "-"
+
<td bgcolor="#c2bffc">N</td>
colnames(MDM)[24]
+
<td bgcolor="#cbabdf">T</td>
rownames(MDM)[24]
+
<td bgcolor="#e3abc6">A</td>
MDM["Q", "-"]
+
<td bgcolor="#fb999c">V</td>
MDM["-", "D"]
+
<td bgcolor="#a199f6">H</td>
# so far so good.
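
# A quick check of the arithmetic quoted above (a sketch, not part of the
# original script). The open/extend values are the ones stated in the comment;
# charging one extension per gap position is an assumption that reproduces
# the quoted -11.5.
gapOpen   <- -10
gapExtend <- -0.5
gapLen    <- 3
gapOpen + gapLen * gapExtend   # affine penalty for one gap of length 3
gapLen * -4                    # the same gap scored as three "-" pairscores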
+
<td bgcolor="#f7abb2">L</td>
  
# ================================================
+
<td bgcolor="#dd99b9">A</td>
#    Tabulate pairscores for alignment
+
<td bgcolor="#dd99b9">A</td>
# ================================================
+
<td bgcolor="#df99b8">M</td>
 +
<td bgcolor="#ababff">R</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_PODAN/372-401&nbsp;&nbsp;</td>
 +
<td>V</td>
  
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#fcd2d3">V</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
<td>-</td>
  
# It is trivial to create a pairscore vector along the
+
<td>-</td>
# length of the aligned sequences.
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
PS <- vector()
+
<td>-</td>
for (i in 1:lSeq) {
+
<td>-</td>
  aa1 <- seq1[i]
+
<td>-</td>
  aa2 <- seq2[i]
+
<td>-</td>
  PS[i] = MDM[aa1, aa2]
+
<td bgcolor="#f9bfc4">L</td>
}
+
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#d4d2fc">Q</td>
  
PS
+
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">E</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#a199f6">H</td>
  
 +
<td bgcolor="#f7abb2">L</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#9999ff">R</td>
 +
<td bgcolor="#fcabae">V</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">MBP1_LACTH/458-487&nbsp;&nbsp;</td>
  
# The same vector could be created - albeit perhaps not so
+
<td>F</td>
# easy to understand - with the expression ...
+
<td bgcolor="#e2d2ee">S</td>
MDM[cbind(seq1,seq2)]
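
# A toy illustration of that idiom (a sketch; "toy" is a hypothetical 2 x 2
# matrix with made-up scores, not BLOSUM62). Indexing with a two-column
# character matrix looks up one [row, column] pair per row of the index.
toy <- matrix(c(4, -1, -1, 5), nrow = 2,
              dimnames = list(c("A", "R"), c("A", "R")))
toy[cbind(c("A", "A", "R"), c("A", "R", "R"))]   # pairs A-A, A-R, R-R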
+
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#dfd2f0">Y</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#d4d2fc">N</td>
  
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#ffbfbf">I</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#f0d2e0">A</td>
  
# ================================================
+
<td bgcolor="#d4d2fc">Q</td>
#   Calculate moving averages
+
<td bgcolor="#afabfa">D</td>
# ================================================
+
<td bgcolor="#afabfa">Q</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#fb999c">V</td>
  
# In order to evaluate the alignment, we will calculate a
+
<td bgcolor="#a199f6">H</td>
# sliding window average over the pairscores. Somewhat surprisingly
+
<td bgcolor="#f7abb2">L</td>
# R doesn't (yet) have a native function for moving averages: options
+
<td bgcolor="#dd99b9">A</td>
# that are quoted are:
+
<td bgcolor="#dd99b9">A</td>
#   - rollmean() in the "zoo" package http://rss.acs.unt.edu/Rdoc/library/zoo/html/rollmean.html
+
<td bgcolor="#9d99f9">Q</td>
#   - MovingAverages() in "TTR" http://rss.acs.unt.edu/Rdoc/library/TTR/html/MovingAverages.html
+
<td bgcolor="#afabfa">N</td>
#   - ma() in "forecast"  http://robjhyndman.com/software/forecast/
+
<td bgcolor="#e4d2ec">G</td>
# But since this is easy to code, we shall implement it ourselves.
+
<td bgcolor="#d4d2fc">D</td>
 +
</tr>
  
PSma <- vector()          # will hold the averages
+
<tr><td nowrap="nowrap">MBP1_FILNE/433-460&nbsp;&nbsp;</td>
winS <- floor(window/2)    # span of elements above/below the centre
+
<td>-</td>
winC <- winS+1            # centre of the window
+
<td>-</td>
 +
<td bgcolor="#dfd2f0">Y</td>
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td bgcolor="#f0d2e0">A</td>
  
# extend the vector PS with zeros (virtual observations) above and below
+
<td bgcolor="#d4d2fc">D</td>
PS <- c(rep(0, winS), PS , rep(0, winS))
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
# initialize the window score for the first position
+
<td>-</td>
winScore <- sum(PS[1:window])
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fcbfc1">V</td>
 +
<td bgcolor="#ffbfbf">I</td>
 +
<td bgcolor="#c2bffc">N</td>
  
# write the first score to PSma
+
<td bgcolor="#f5d2db">F</td>
PSma[1] <- winScore
+
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">E</td>
 +
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">E</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
  
# Slide the window along the sequence, and recalculate sum()
+
<td bgcolor="#f699a1">L</td>
# Loop from the next position, to the last position that does not exceed the vector...
+
<td bgcolor="#bf99d7">T</td>
for (i in (winC + 1):(lSeq + winS)) {
+
<td bgcolor="#ffabab">I</td>
  # subtract the value that has just dropped out of the window
+
<td bgcolor="#dd99b9">A</td>
  winScore <- winScore - PS[(i-winS-1)]
+
<td bgcolor="#dd99b9">A</td>
  # add the value that has just entered the window
+
<td bgcolor="#9999ff">R</td>
  winScore <- winScore + PS[(i+winS)]
+
<td bgcolor="#e3abc6">A</td>
  # put score into PSma
+
<td bgcolor="#d2d2ff">R</td>
  PSma[i-winS] <- winScore
+
<td bgcolor="#e2d2ee">S</td>
}
 
  
# convert the sums to averages
+
</tr>
PSma <- PSma / window
+
<tr><td nowrap="nowrap">MBP1_KLULA/477-506&nbsp;&nbsp;</td>
 +
<td>F</td>
 +
<td bgcolor="#e2d2ed">T</td>
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#dfd2f0">Y</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#ffd2d2">I</td>
  
# have a quick look at the score distributions
+
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#fcd2d3">V</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
boxplot(PSma)
+
<td>-</td>
hist(PSma)
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#f9bfc4">L</td>
 +
<td bgcolor="#ffbfbf">I</td>
  
# ================================================
+
<td bgcolor="#c2bffc">N</td>
#   Plot the alignment scores
+
<td bgcolor="#d4d2fc">Q</td>
# ================================================
+
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#caabe0">S</td>
  
# normalize the scores
+
<td bgcolor="#c2abe8">P</td>
PSma <- (PSma-min(PSma))/(max(PSma) - min(PSma) + 0.0001)
+
<td bgcolor="#f699a1">L</td>
# spread the normalized values to a desired range, n
+
<td bgcolor="#a199f6">H</td>
nCol <- 10
+
<td bgcolor="#c5abe5">Y</td>
PSma <- floor(PSma * nCol) + 1
+
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#bf99d7">T</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#d2d2ff">K</td>
  
# Assign a colorspectrum to a vector (with a bit of colormagic,
+
<td bgcolor="#d4d2fc">D</td>
# don't worry about that for now). Dark colors are poor scores,
+
</tr>
# "hot" colors are high scores
 
spect <- colorRampPalette(c("black", "red", "yellow", "white"), bias=0.4)(nCol)
 
  
# Color is an often abused aspect of plotting. One can use color to label
+
<tr><td nowrap="nowrap">MBP1_SCHST/468-501&nbsp;&nbsp;</td>
# *quantities* or *qualities*. For the most part, our pairscores measure amino
+
<td>A</td>
# acid similarity. That is a quantity and with the spectrum that we just defined
+
<td bgcolor="#d2d2ff">K</td>
# we associate the measured quantities with the color of a glowing piece
+
<td bgcolor="#d4d2fc">D</td>
# of metal: we start with black #000000, then first we ramp up the red
+
<td bgcolor="#ded2f2">P</td>
# (i.e. low-energy) part of the visible spectrum to red #FF0000, then we
+
<td bgcolor="#d4d2fc">D</td>
# add and ramp up the green spectrum giving us yellow #FFFF00 and finally we
+
<td bgcolor="#d4d2fc">N</td>
# add blue, giving us white #FFFFFF. Let's have a look at the spectrum:
 
  
s <- rep(1, nCol)
+
<td bgcolor="#d2d2ff">K</td>
barplot(s, col=spect, axes=F, main="Color spectrum")
+
<td bgcolor="#d2d2ff">K</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
# But one aspect of our data is not quantitatively different: indels.
+
<td>-</td>
# We valued indels with pairscores of -4. But indels are not simply poor alignment,
+
<td>-</td>
# rather they are non-alignment. This means stretches of -4 values are really
+
<td>-</td>
# *qualitatively* different. Let's color them differently by changing the lowest
+
<td>-</td>
# level of the spectrum to grey.
+
<td>L</td>
 +
<td>I</td>
 +
<td>A</td>
 +
<td>K</td>
 +
<td bgcolor="#f2bfcc">F</td>
  
spect[1] <- "#CCCCCC"
+
<td bgcolor="#ffbfbf">I</td>
barplot(s, col=spect, axes=F, main="Color spectrum")
+
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#caabe0">S</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">N</td>
  
# Now we can display our alignment score vector with colored rectangles.
+
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#e3abc6">A</td>
 +
<td bgcolor="#e999ad">F</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#e699b1">C</td>
 +
<td bgcolor="#be99d9">S</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#afabfa">N</td>
  
# Convert the integers in PSma to color values from spect
+
<td bgcolor="#fbd2d5">L</td>
PScol <- vector()
+
<td bgcolor="#d4d2fc">N</td>
for (i in 1:length(PSma)) {
+
</tr>
PScol[i] <- spect[ PSma[i] ]  # this is how a value from PSma is used as an index of spect
+
<tr><td nowrap="nowrap">MBP1_SACCE/496-525&nbsp;&nbsp;</td>
}
+
<td>F</td>
 +
<td bgcolor="#e2d2ee">S</td>
 +
<td bgcolor="#ded2f2">P</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#dfd2f0">Y</td>
  
# Plot the scores. The code is similar to the last assignment.
+
<td bgcolor="#d2d2ff">R</td>
# Create an empty plot window of appropriate size
+
<td bgcolor="#ffd2d2">I</td>
plot(1,1, xlim=c(-100, lSeq), ylim=c(0, 2) , type="n", yaxt="n", bty="n", xlab="position in alignment", ylab="")
+
<td bgcolor="#d4d2fc">E</td>
 +
<td bgcolor="#fbd2d5">L</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
# Add a label to the left
+
<td>-</td>
text (-30, 1, adj=1, labels=c(paste("Mbp1:\n", code1, "\nvs.\n", code2)), cex=0.9 )
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
# Loop over the vector and draw boxes  without border, filled with color.
+
<td bgcolor="#f9bfc4">L</td>
for (i in 1:lSeq) {
+
<td bgcolor="#f9bfc4">L</td>
  rect(i, 0.9, i+1, 1.1, border=NA, col=PScol[i])
+
<td bgcolor="#c2bffc">N</td>
}
+
<td bgcolor="#e2d2ed">T</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#ababff">K</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#c399d4">G</td>
  
# Note that the numbers along the X-axis are not sequence numbers, but numbers
+
<td bgcolor="#c2bffc">D</td>
# of the alignment, i.e. sequence number + indel length. That is important to
+
<td bgcolor="#cbabdf">T</td>
# realize: if you would like to add the annotations from the last assignment
+
<td bgcolor="#e3abc6">A</td>
# which I will leave as an exercise, you need to map your sequence numbering
+
<td bgcolor="#f699a1">L</td>
# into alignment numbering. Let me know in case you try that but need some help.
+
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#ffabab">I</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#be99d9">S</td>
 +
<td bgcolor="#9999ff">K</td>
  
</source>
+
<td bgcolor="#afabfa">N</td>
}}
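
The sliding-window average in the script above can also be written without an explicit loop. The following is a minimal sketch, not part of the original script: the function name <code>slidingMean</code> is hypothetical, and it assumes the same zero-padding at both ends and an odd window size as the loop version.

<source lang="R">
# Sketch only: slidingMean() is a hypothetical helper, not part of aliScore.R.
# x:      numeric vector of pairscores
# window: odd window size; the ends are padded with zeros as in the loop version
slidingMean <- function(x, window = 9) {
  stopifnot(window %% 2 == 1)
  half   <- floor(window / 2)
  padded <- c(rep(0, half), x, rep(0, half))   # virtual observations at both ends
  cs     <- cumsum(c(0, padded))               # cs[k + 1] is sum(padded[1:k])
  idx    <- seq_along(x)
  # the window for position i covers padded[i:(i + window - 1)]
  (cs[idx + window] - cs[idx]) / window
}

slidingMean(c(1, 2, 3), window = 3)
</source>

The difference of cumulative sums yields every window sum in one vectorized step. Base R's <code>stats::filter()</code> with equal weights, applied to the padded vector, should give the same values and can serve as a cross-check.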
+
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#d4d2fc">D</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">CD00204/1-19&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#afabfa">E</td>
 +
<td bgcolor="#d4d2fc">D</td>
  
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#bfbfff">R</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#c2abe8">P</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#f7abb2">L</td>
 +
<td bgcolor="#dd99b9">A</td>
 +
<td bgcolor="#dd99b9">A</td>
  
{{Vspace}}
+
<td bgcolor="#be99d9">S</td>
 +
<td bgcolor="#afabfa">N</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">CD00204/99-118&nbsp;&nbsp;</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
== Further reading, links and resources ==
+
<td>-</td>
<!-- {{#pmid: 19957275}} -->
+
<td>-</td>
<!-- {{WWW|WWW_GMOD}} -->
+
<td>-</td>
<!-- <div class="reference-box">[http://www.ncbi.nlm.nih.gov]</div> -->
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
{{Vspace}}
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#fcbfc1">V</td>
 +
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#d2d2ff">R</td>
 +
<td bgcolor="#afabfa">D</td>
 +
<td bgcolor="#ababff">K</td>
  
 +
<td bgcolor="#d4d2fc">D</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#bfbfff">R</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#c2abe8">P</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#f7abb2">L</td>
 +
<td bgcolor="#dd99b9">A</td>
  
== Notes ==
+
<td bgcolor="#dd99b9">A</td>
<!-- included from "../components/BIN-ALI-MSA.components.wtxt", section: "notes" -->
+
<td bgcolor="#9999ff">K</td>
<!-- included from "ABC-unit_components.wtxt", section: "notes" -->
+
<td bgcolor="#afabfa">N</td>
<references />
+
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">1SW6/203-232&nbsp;&nbsp;</td>
 +
<td>L</td>
 +
<td bgcolor="#d4d2fc">D</td>
  
{{Vspace}}
+
<td bgcolor="#fbd2d5">L</td>
 +
<td bgcolor="#d2d2ff">K</td>
 +
<td bgcolor="#e2d2ef">W</td>
 +
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#ffd2d2">I</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td>-</td>
 +
<td>-</td>
  
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
</div>
+
<td>-</td>
<div id="ABC-unit-framework">
+
<td>-</td>
== Self-evaluation ==
+
<td>-</td>
<!-- included from "../components/BIN-ALI-MSA.components.wtxt", section: "self-evaluation" -->
+
<td bgcolor="#ebbfd3">M</td>
<!--
+
<td bgcolor="#f9bfc4">L</td>
=== Question 1===
+
<td bgcolor="#c2bffc">N</td>
 +
<td bgcolor="#f0d2e0">A</td>
 +
<td bgcolor="#d4d2fc">Q</td>
 +
<td bgcolor="#afabfa">D</td>
  
Question ...
+
<td bgcolor="#caabe0">S</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
<td bgcolor="#c399d4">G</td>
 +
<td bgcolor="#c2bffc">D</td>
 +
<td bgcolor="#cbabdf">T</td>
 +
<td bgcolor="#eaabbf">C</td>
 +
<td bgcolor="#f699a1">L</td>
 +
<td bgcolor="#9d99f9">N</td>
 +
<td bgcolor="#ffabab">I</td>
  
<div class="toccolours mw-collapsible mw-collapsed" style="width:800px">
+
<td bgcolor="#dd99b9">A</td>
Answer ...
+
<td bgcolor="#dd99b9">A</td>
<div class="mw-collapsible-content">
+
<td bgcolor="#9999ff">R</td>
Answer ...
+
<td bgcolor="#f7abb2">L</td>
 +
<td bgcolor="#e4d2ec">G</td>
 +
<td bgcolor="#d4d2fc">N</td>
 +
</tr>
 +
<tr><td nowrap="nowrap">SecStruc/203-232&nbsp;&nbsp;</td>
 +
<td>t</td>
  
</div>
+
<td bgcolor="#e6d2e9">_</td>
  </div>
+
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td bgcolor="#d5d2fb">H</td>
 +
<td>-</td>
  
  {{Vspace}}
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
  
-->
+
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td>-</td>
 +
<td bgcolor="#dcbfe1">_</td>
 +
<td bgcolor="#dcbfe1">_</td>
 +
<td bgcolor="#dcbfe1">_</td>
 +
<td bgcolor="#e6d2e9">_</td>
 +
<td bgcolor="#e6d2e9">_</td>
  
{{Vspace}}
+
<td bgcolor="#d2abd8">_</td>
 +
<td bgcolor="#cbabdf">t</td>
 +
<td bgcolor="#e6d2e9">_</td>
 +
<td bgcolor="#c799cf">_</td>
 +
<td bgcolor="#dcbfe1">_</td>
 +
<td bgcolor="#d2abd8">_</td>
 +
<td bgcolor="#b2abf7">H</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#a199f6">H</td>
  
 +
<td bgcolor="#b2abf7">H</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#a199f6">H</td>
 +
<td bgcolor="#b2abf7">H</td>
 +
<td bgcolor="#e6d2e9">_</td>
 +
<td bgcolor="#e6d2e9">_</td>
 +
</tr>
 +
</table>
 +
</td></tr>
  
 +
</table>
 +
;Aligned sequence after editing. A significant cleanup of the frayed region is possible. Now there is only one insertion event, and it is placed into the loop that connects two helices of the 1SW6 structure.
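
The number of insertion or deletion events in an aligned row can be checked by counting runs of consecutive gap characters. The following is a minimal sketch in base R: the function name <code>countGapRuns</code> is hypothetical, and the example string is the MBP1_PHANO row transcribed from the table above.

<source lang="R">
# Sketch only: countGapRuns() is a hypothetical helper.
# Counts runs of consecutive "-" characters in one row of an alignment.
countGapRuns <- function(alignedSeq) {
  chars <- strsplit(alignedSeq, "")[[1]]
  runs  <- rle(chars == "-")     # run-length encoding of gap / non-gap stretches
  sum(runs$values)               # number of runs that consist of gaps
}

# MBP1_PHANO/400-429 as transcribed from the table: 9 residues, 14 gaps, 21 residues
phano <- paste0("TWIPEEVTR", strrep("-", 14), "LLNAQDQNGDTAIMIAARNGA")
countGapRuns(phano)
</source>

A row with a single contiguous gap gives a count of one; a frayed region with several short gaps gives a higher count, which makes the effect of manual editing easy to quantify row by row.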
  
 
{{Vspace}}
 
{{Vspace}}
  
 +
<!--
 +
==Model Based Alignments: PSSMs and HMMs==
  
<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_ask" -->
+
{{Vspace}}
  
----
+
;Position Specific Scoring Matrices (PSSMs)
 +
 
 +
The sensitivity of PSI-BLAST is based on the alignment of profiles of related sequences. The profiles are represented as position specific scoring matrices compiled from the alignment of hits, first to the original sequence and then to the profile. Incidentally, this process can also be turned around, and a collection of pre-compiled PSSMs can be used to annotate protein sequence: this is the principle employed by RPS-BLAST, the tool that identifies conserved domains at the beginning of every BLAST search, and has been used to build the CDD database of conserved domains (for a very informative help-page on CDD [https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd_help.shtml '''see here''']).
 +
-->
  
 
{{Vspace}}
 
{{Vspace}}
  
<b>If in doubt, ask!</b> If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
+
== Further reading, links and resources ==
 +
{{Smallvspace}}
 +
This is a good, current recapitulation of many of the concepts you have encountered in this unit. It is compact, and I highly recommend it to reinforce what you have just learned.
 +
{{#pmid: 27896722}}
 +
{{Smallvspace}}
 +
{{#pmid: 21930656}}
 +
{{#pmid: 24602402}}
 +
{{#pmid: 28884485}}
 +
{{#pmid: 24170395}}
 +
{{#pmid: 17784778}}
 +
<!-- {{WWW|WWW_GMOD}} -->
 +
<!-- <div class="reference-box">[http://www.ncbi.nlm.nih.gov]</div> -->
 +
== Notes ==
 +
<references />
  
----
+
{{Vspace}}
  
{{Vspace}}
 
  
 
<div class="about">
 
<div class="about">
Line 789: Line 3,633:
 
:2017-08-05
 
:2017-08-05
 
<b>Modified:</b><br />
 
<b>Modified:</b><br />
:2017-08-05
+
:2020-10-07
 
<b>Version:</b><br />
 
<b>Version:</b><br />
:0.1
+
:1.1
 
<b>Version history:</b><br />
 
<b>Version history:</b><br />
 +
*1.1 Edit policy update
 +
*1.0 2020 Updates
 
*0.1 First stub
 
*0.1 First stub
 
</div>
 
</div>
[[Category:ABC-units]]
 
<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_footer" -->
 
  
 
{{CC-BY}}
 
{{CC-BY}}
  
 +
[[Category:ABC-units]]
 +
{{EVAL}}
 +
{{LIVE}}
 +
{{EVAL}}
 
</div>
 
</div>
 
<!-- [END] -->
 
<!-- [END] -->

Latest revision as of 05:01, 10 October 2020

Multiple Sequence Alignment

(Multiple sequence alignment)


 


Abstract:

A carefully produced multiple sequence alignment is an indispensable, extraordinarily valuable asset for the analysis of sequence features. Fully automated methods are regularly inferior to knowledgeable manual curation of alignments. In this unit we will discuss the concepts, practice producing MSAs online and in R, and analyze, write and display alignments. The goal is to empower you to produce the best alignments possible.


Objectives:
This unit will ...

  • ... introduce the benefits of multiple sequence alignments (MSA), the objective functions they pursue, algorithms and methods, practical considerations, and the analysis of alignments;
  • ... demonstrate Web services that calculate MSAs;
  • ... teach how to compute and analyze MSAs in R.

Outcomes:
After working through this unit you ...

  • ... can critically assess available options for producing Multiple Sequence Alignments;
  • ... are familiar with online and R programming tools to produce alignments;
  • ... have aligned the full length sequence of the MYSPE Mbp1 orthologue to a selected set of reference sequences.

Deliverables:

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

  • Prerequisites:
    This unit builds on material covered in the following prerequisite units:


     



     



     


    Evaluation

    This learning unit can be evaluated for a maximum of 5 marks. There are several options for submission. Choose one option, then ...

    1. Create a new page on the student Wiki as a subpage of your User Page.
    2. Put all of your writing to submit on this one page.
    3. When you are done with everything, go to the Quercus Assignments page and open the first Learning Unit that you have not submitted yet. Paste the URL of your Wiki page into the form, and click on Submit Assignment.

    Your link can be submitted only once and cannot be edited. You may change your Wiki page at any time, but only the last version before the due date will be marked. All later edits will be silently ignored.


     
    Short Report option
    1. Create a new page on the student Wiki as a subpage of your User Page.
    2. Write a short report on one of the five following topics - A, B, C, D, or E. (All reports must have the R code you wrote in an appendix.)
    A - Publication quality plot
    A.1 Create a publication quality figure and figure caption of an MSA of Mbp1 orthologue sequences including MYSPE, that covers the APSES domain only. Produce this as a single page PDF using the msaPrettyPrint() function of the msa package (msa::msaPrettyPrint()), and upload it to the Student Wiki.
    A.2 In your report, document the procedure and discuss how you have chosen the color parameters to illustrate interesting points about the domain.
    B - Algorithm Comparison: MAFFT
    B.1 At the EBI, produce an MSA of the full-length Mbp1 orthologues of the reference species plus MYSPE, using the MAFFT algorithm - a good, general-purpose MSA algorithm.
    B.2 Import the alignment to R and evaluate its quality relative to the MUSCLE alignment with default parameters. Report your findings.
    C - Algorithm Comparison: WebPRANK
    C.1 At the EBI, produce an MSA of the Mbp1 orthologues of the reference species plus MYSPE, using the WebPRANK algorithm, which has an interesting approach to defining indels from computed phylogenetic relationships.
    C.2 Import the alignment to R and evaluate its quality relative to the MUSCLE alignment with default parameters. Report your findings.
    D - Algorithm Comparison: PRALINE
    D.1 PRALINE reportedly produces some of the best alignments due to its (slow) PSI-BLAST profile pre-processing step, which pulls in additional homologues to increase the information that goes into the alignment. Access the PRALINE Web Server and produce a high-quality MSA of the Mbp1 orthologues of the reference species plus MYSPE.
    D.2 Import the alignment to R and evaluate its quality relative to the MUSCLE alignment with default parameters. Report your findings.
    E - Algorithm Parameters: MUSCLE
    E.1 MUSCLE has a large number of additional parameters to tweak alignments. Discuss their use, and try different variations on the MSA of the Mbp1 orthologues of the reference species plus MYSPE[1].
    E.2 Report on the results of your experiments.
    3. When you are done, submit the link to your page via Quercus as described above.


     



     
    R-code option
    Alignments can get very long, and it would be great to have an overview plot of the full-length alignment in one image. Your task is to write a function for that.
    Submit code according to the following requirements. Make sure your code is documented and that you have tested your functions to be correct.
    • Write a function that takes an MsaAAMultipleAlignment object as input and produces a plot of the entire alignment. Sections of gaps shall be shown as continuous lines (segments()). Aligned residues shall be shown as rectangles (rect()). Provide an option to define line colors (e.g. default: "lightgrey"). Provide an option to define fill colors for residue rectangles (e.g. default: "skyblue"). Provide an option to color alignment columns with a color gradient according to the alignment score instead. Here is some code for inspiration of how to work with a color palette:
    # v is the vector of moving-average scores of msaMScores
    lev <- cut(v, labels = FALSE, breaks = 10)
    myPal <- colorRampPalette(c("#e8e8e8", "#d6d6d6","#c4c4c4", "#b2b2b2",
                                "#f4a582", "#d6604d", "#b2182b"))
    myCol <- myPal(max(lev))
    
    barplot(msaMScores, col=myCol[lev], border = NA)
    • Create a new page on the student Wiki as a subpage of your User Page. Put your documented code and instructions there.
    • When you are done, submit the link to your page via Quercus as described above.
     
    Option to write a "Self-Evaluation Question"
    You can submit a "Self-Evaluation Question" for at most one of your assignments.
    Write a "Self-evaluation Question" (with a model solution) that explores the interpretation of an MSA. The goal is for the learner to think about the biological interpretation of a multiple sequence alignment. Questions that I find interesting often explain the context of a biological fact (e.g. a phosphorylation site, a ligand binding site, a domain boundary, a frameshift mutation etc. etc.), then ask to interpret an MSA as to how it represents information about the fact. Apply the marking rubrics in spirit to satisfy yourself of the quality of your question. Use the format and code templates that you find on the Self evaluation questions page - but don't assume those examples are already models of excellent contributions. This will be a short-answer format question. Note: assume that approximately the same amount of work is expected for all evaluation options. Consequently, the standard of excellence for this option will be quite high.
    • Create a new page on the student Wiki as a subpage of your User Page. Develop your question there.
    • When you are done, submit the link to your page via Quercus as described above.


    Task:


    Multiple sequence alignments (MSAs) are enormously useful to resolve ambiguities in the precise placement of "indels"[2] and to ensure that columns in alignments actually contain amino acids that evolve in a similar context. MSAs serve as input for

    • functional annotation;
    • protein homology modelling;
    • phylogenetic analyses;
    • sensitive homology searches in databases;
    • and more.


    Multiple Sequence Alignment

     

    In order to perform a multiple sequence alignment, we obviously need a set of homologous sequences. This is not trivial. All interpretation of MSA results depends absolutely on how the input sequences were chosen. Should we include only orthologues, or paralogues as well? Should we include only species with fully sequenced genomes, or can we tolerate that some orthologous genes are possibly missing for a species? Should we include all sequences we can lay our hands on, or should we restrict the selection to a manageable number of representative sequences? All of these choices influence our interpretation:

    • orthologues are expected to be functionally and structurally conserved;
    • paralogues may have divergent function but have similar structure;
    • missing genes may make paralogues look like orthologues; and
    • selection bias may weight our results toward sequences that are over-represented and do not provide a fair representation of evolutionary divergence.


     

    MSAs on the web at the EBI

     

    The EBI hosts a number of excellent MSA programs on their website. Let's perform an MSA of full-length MBP1 orthologues:


    Task:

    • Navigate to the NCBI protein database and paste the MBP1 protein RefSeq IDs from our database into the search form:
    NP_010227 NP_593032 XP_660758 XP_007682304 XP_955821 XP_001837394
    XP_569090 XP_003327086 XP_011392621 XP_006957051
    

    (add your MBP1_MYSPE RefSeq ID too!)

    • This will give you a page with links to the retrieved sequences. Click on Summary and choose FASTA(text) as the Format to retrieve all sequences at once as a multi-FASTA formatted page (this is useful, remember it!)
    • Open another browser window and navigate to the EBI MSA tools page.
    • Click on Launch T-coffee.
    • Copy the FASTA sequences from the NCBI page, and paste them into the form at the EBI's T-Coffee page. Click Submit.
    • The result should show you the aligned sequences, with three blocks of high similarity:
      • The most N-terminal block is the APSES domain - the main DNA binding domain of these transcription factors.
      • In the middle, we have Ankyrin domains: these are protein-protein interaction modules that Mbp1 uses to recruit other proteins to the bound complex.
      • At the end, there is one additional, shorter segment of high similarity.
    • Explore the tabs that are available, in particular note that you can save the result to a file.
    • Click on the Download Alignment File tab to load the alignment as text into a browser window. Then save the file into your project directory with a filename of msaT.aln. (.aln is the standard extension for CLUSTAL-formatted alignment files, so it helps if we give the file that extension. Of course you know better than to rely on an extension to signal the filetype and format.)
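    To check what actually ended up in a downloaded file such as msaT.aln, the CLUSTAL layout can be parsed with a few lines of base R. This is only a minimal sketch under stated assumptions: readClustalLines() is a hypothetical helper introduced here (it is not part of the course script or of any package), the toy input stands in for the lines you would get from readLines("msaT.aln"), and the file is assumed to have a header line starting with CLUSTAL, sequence rows of the form "name sequence", and conservation rows that start with whitespace.

```r
# Hypothetical helper (not from the course script): parse CLUSTAL-formatted
# lines into a named character vector of aligned sequences.
readClustalLines <- function(lines) {
  lines <- lines[!grepl("^CLUSTAL", lines)]       # drop the header line
  lines <- lines[grepl("^\\S+\\s+\\S+", lines)]   # keep "name sequence" rows only;
                                                  # conservation rows start with whitespace
  parts <- strsplit(lines, "\\s+")
  ids   <- vapply(parts, `[`, character(1), 1)    # first field: sequence name
  chunk <- vapply(parts, `[`, character(1), 2)    # second field: aligned chunk
  # concatenate the chunks of each sequence across blocks, keeping input order
  vapply(split(chunk, factor(ids, levels = unique(ids))),
         paste, character(1), collapse = "")
}

# Toy input (made up for illustration).
# For a real file one would use: readClustalLines(readLines("msaT.aln"))
toy <- c("CLUSTAL W (1.83) multiple sequence alignment",
         "",
         "seqA      MKV-LA",
         "seqB      MKVQLA",
         "          *** **",
         "",
         "seqA      GT",
         "seqB      G-",
         "          * ")

aln <- readClustalLines(toy)
print(aln)
print(unique(nchar(aln)))   # all rows of an alignment should have the same width
```

    Checking that all rows have the same width is a quick sanity test that the blocks were reassembled correctly.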



    MSAs in R

     

    Let's move to our RStudio project to explore producing and analyzing multiple sequence alignments in R.


     

    Task:

     
    • Open RStudio and load the ABC-units R project. If you have loaded it before, choose File → Recent projects → ABC-Units. If you have not loaded it before, follow the instructions in the RPR-Introduction unit.
    • Choose Tools → Version Control → Pull Branches to fetch the most recent version of the project from its GitHub repository with all changes and bug fixes included.
    • Type init() if requested.
    • Open the file BIN-ALI-MSA.R and follow the instructions.


     

    Note: take care that you understand all of the code in the script. Evaluation in this course is cumulative and you may be asked to explain any part of the code.


     


     

    Sequence alignment editors

     

    Really excellent software tools have been written that help you visualize and manually curate multiple sequence alignments. If anything, I think they tend to do too much. Past versions of the course have used Jalview, but I have heard good things about AliView (and if you are on a Mac, seqotron might interest you, but I only cover software that is free and runs on all three major platforms).

    Here, I am just mentioning the two alignment editors and encourage you to explore and use them. If you have experience with comparing them, let us know.

    • [Jalview] an integrated MSA editor and sequence annotation workbench from the Barton lab in Dundee. Lots of functions.
    • [AliView] from Uppsala: fast, lean, looks to be very practical.

    However: we should spend a moment considering the kind of improvements manual editing of alignments can aim for.


     

    Alignment Editing

    A good MSA comprises only columns of residues that play similar roles in the proteins' mechanism and/or that evolve in a comparable structural context. Since the alignment reflects the result of biological selection and conservation, it has relatively few indels and the indels it has are usually not placed into elements of secondary structure or into functional motifs. For example, the contiguous features annotated for Mbp1 are expected to be left intact by a good alignment.

    A poor MSA has many errors in its columns; these contain residues that actually have different functions or structural roles, even though they may look similar according to a (pairwise!) scoring matrix. A poor MSA also may have introduced indels in biologically irrelevant positions, to maximize spurious sequence similarities. Some of the features annotated for Mbp1 will be disrupted in a poor alignment and residues that are conserved may be placed into different columns.

    Often errors or inconsistencies are easy to spot. The main goal of manual editing is to make an alignment biologically more plausible. Most commonly this means minimizing the number of rare evolutionary events that the alignment suggests and/or emphasizing conservation of known functional motifs. Here are some examples:

    Reduce number of indels
    From a Probcons alignment:
    0447_DEBHA    ILKTE-K-T---K--SVVK      ILKTE----KTK---SVVK
    9978_GIBZE    MLGLN-PGLKEIT--HSIT      MLGLNPGLKEIT---HSIT
    1513_CANAL    ILKTE-K-I---K--NVVK      ILKTE----KIK---NVVK
    6132_SCHPO    ELDDI-I-ESGDY--ENVD      ELDDI-IESGDY---ENVD
    1244_ASPFU    ----N-PGLREIC--HSIT  ->  ----NPGLREIC---HSIT
    0925_USTMA    LVKTC-PALDPHI--TKLK      LVKTCPALDPHI---TKLK
    2599_ASPTE    VLDAN-PGLREIS--HSIT      VLDANPGLREIS---HSIT
    9773_DEBHA    LLESTPKQYHQHI--KRIR      LLESTPKQYHQHI--KRIR
    0918_CANAL    LLESTPKEYQQYI--KRIR      LLESTPKEYQQYI--KRIR
    

    Gaps marked in red were moved. The sequence similarity in the alignment does not change considerably; however, the total number of indels in this excerpt is reduced from the original 22 to 13.
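    The indel counts quoted for this excerpt can be checked by counting maximal runs of gap characters in each row. The following base-R sketch does this; countGapRuns() is a hypothetical helper name introduced here for illustration, and treating each contiguous run of "-" as one indel is the assumption behind it.

```r
# Hypothetical helper: count maximal runs of "-" over all rows of an alignment.
countGapRuns <- function(rows) {
  hits <- gregexpr("-+", rows)                 # one match per gap run; -1 if none
  sum(vapply(hits, function(h) sum(h > 0), integer(1)))
}

# Rows copied from the Probcons excerpt (left: before editing, right: after).
before <- c("ILKTE-K-T---K--SVVK",
            "MLGLN-PGLKEIT--HSIT",
            "ILKTE-K-I---K--NVVK",
            "ELDDI-I-ESGDY--ENVD",
            "----N-PGLREIC--HSIT",
            "LVKTC-PALDPHI--TKLK",
            "VLDAN-PGLREIS--HSIT",
            "LLESTPKQYHQHI--KRIR",
            "LLESTPKEYQQYI--KRIR")

after  <- c("ILKTE----KTK---SVVK",
            "MLGLNPGLKEIT---HSIT",
            "ILKTE----KIK---NVVK",
            "ELDDI-IESGDY---ENVD",
            "----NPGLREIC---HSIT",
            "LVKTCPALDPHI---TKLK",
            "VLDANPGLREIS---HSIT",
            "LLESTPKQYHQHI--KRIR",
            "LLESTPKEYQQYI--KRIR")

print(countGapRuns(before))   # 22
print(countGapRuns(after))    # 13
```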


    Move indels to more plausible position
    From a CLUSTAL alignment:
    4966_CANGL     MKHEKVQ------GGYGRFQ---GTW      MKHEKVQ------GGYGRFQ---GTW
    1513_CANAL     KIKNVVK------VGSMNLK---GVW      KIKNVVK------VGSMNLK---GVW
    6132_SCHPO     VDSKHP-----------QID---GVW  ->  VDSKHPQ-----------ID---GVW
    1244_ASPFU     EICHSIT------GGALAAQ---GYW      EICHSIT------GGALAAQ---GYW
    

    The two characters marked in red were swapped. This does not change the number of indels but places the "Q" into a column in which it is more highly conserved (green). Progressive alignments are especially prone to this type of error.
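    Whether such a swap improves a column can be checked by looking at the residues at that position before and after the edit. A minimal base-R sketch for the CLUSTAL excerpt follows; columnResidues() is a hypothetical helper name, and column 7 is simply the position the "Q" is moved into in this four-sequence excerpt (the full alignment has more rows, so the real conservation may differ).

```r
# Hypothetical helper: extract the residues of one alignment column.
columnResidues <- function(aln, i) substr(aln, i, i)

# Rows copied from the CLUSTAL excerpt, before editing.
before <- c(`4966_CANGL` = "MKHEKVQ------GGYGRFQ---GTW",
            `1513_CANAL` = "KIKNVVK------VGSMNLK---GVW",
            `6132_SCHPO` = "VDSKHP-----------QID---GVW",
            `1244_ASPFU` = "EICHSIT------GGALAAQ---GYW")

# After editing, only the SCHPO row differs: the "Q" moves to column 7.
after <- before
after["6132_SCHPO"] <- "VDSKHPQ-----------ID---GVW"

qBefore <- sum(columnResidues(before, 7) == "Q")
qAfter  <- sum(columnResidues(after, 7) == "Q")
print(c(before = qBefore, after = qAfter))   # before: 1, after: 2

# The edit only moves a gap: the ungapped sequences are unchanged.
print(identical(gsub("-", "", before), gsub("-", "", after)))
```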

    Conserve motifs
    From a CLUSTAL W alignment:
    6166_SCHPO      --DKRVA---GLWVPP      --DKRVA--G-LWVPP
    XBP1_SACCE      GGYIKIQ---GTWLPM      GGYIKIQ--G-TWLPM
    6355_ASPTE      --DEIAG---NVWISP  ->  ---DEIA--GNVWISP
    5262_KLULA      GGYIKIQ---GTWLPY      GGYIKIQ--G-TWLPY
    

    The first of the two residues marked in red is a conserved, solvent exposed hydrophobic residue that may mediate domain interactions. The second residue is the conserved glycine in a beta turn that cannot be mutated without structural disruption. Changing the position of a gap and insertion in one sequence improves the conservation of both motifs.
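    One way to see the effect of this edit is to list the columns in which all sequences carry the same residue. The base-R sketch below does this for the four-row excerpt; conservedColumns() is a hypothetical helper introduced for illustration, and "fully identical, non-gap" is a deliberately strict stand-in for conservation (it ignores conservative substitutions such as V/I).

```r
# Hypothetical helper: indices of columns in which every row has the same
# non-gap residue.
conservedColumns <- function(aln) {
  m <- do.call(rbind, strsplit(aln, ""))   # rows = sequences, columns = positions
  unname(which(apply(m, 2, function(col) col[1] != "-" && all(col == col[1]))))
}

# Rows copied from the CLUSTAL W excerpt (before and after editing).
before <- c(`6166_SCHPO` = "--DKRVA---GLWVPP",
            XBP1_SACCE   = "GGYIKIQ---GTWLPM",
            `6355_ASPTE` = "--DEIAG---NVWISP",
            `5262_KLULA` = "GGYIKIQ---GTWLPY")

after  <- c(`6166_SCHPO` = "--DKRVA--G-LWVPP",
            XBP1_SACCE   = "GGYIKIQ--G-TWLPM",
            `6355_ASPTE` = "---DEIA--GNVWISP",
            `5262_KLULA` = "GGYIKIQ--G-TWLPY")

print(conservedColumns(before))   # 13 (the W column)
print(conservedColumns(after))    # 10 13 (the G column is now conserved too)
```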


     
    An example of alignment editing for ankyrin domains.

    The example below came from alignment editing in JALVIEW. Columns were coloured by hydrophobicity, and the examples were exported to HTML and then pasted into the page source. Note that the bottom row of the alignment contains a manually added sequence that represents secondary structure elements that were determined by X-ray crystallography of the Swi6 ankyrin domain.

    Alignment column ruler: 10, 20, 30, 40
    MBP1_USTMA/341-368   - - Y G D Q L - - - A D - - - - - - - - - - I L - - - - N F Q D D E G E T P L T M A A R A R S
    MBP1B_SCHCO/470-498   - R E D G D Y - - - K S - - - - - - - - - - F L - - - - D L Q D E H G D T A L N I A A R V G N
    MBP1_ASHGO/465-494   F S P Q Y R I - - - E T - - - - - - - - - - L I - - - - N A Q D C K G S T P L H I A A M N R D
    MBP1_CLALU/550-586   G N Q N G N S N D K K E - - - - - - - - - - L I S K F L N H Q D N E G N T A F H I A A Y N M S
    MBPA_COPCI/514-542   - H E G G D F - - - R S - - - - - - - - - - L V - - - - D L Q D E H G D T A I N I A A R V G N
    MBP1_DEBHA/507-550   I R D S Q E I - - - E N K K L S L S D K K E L I A K F I N H Q D I D G N T A F H I V A Y N L N
    MBP1A_SCHCO/388-415   - - Y P K E L - - - A D - - - - - - - - - - V L - - - - N F Q D E D G E T A L T M A A R C R S
    MBP1_AJECA/374-403   T L P P H Q I - - - S M - - - - - - - - - - L L - - - - S S Q D S N G D T A A L A A A K N G C
    MBP1_PARBR/380-409   I L P P H Q I - - - S L - - - - - - - - - - L L - - - - S S Q D S N G D T A A L A A A K N G C
    MBP1_NEOFI/363-392   T C S Q D E I - - - D L - - - - - - - - - - L L - - - - S C Q D S N G D T A A L V A A R N G A
    MBP1_ASPNI/365-394   T F S P E E V - - - D L - - - - - - - - - - L L - - - - S C Q D S V G D T A V L V A A R N G V
    MBP1_UNCRE/377-406   M Y P H H E V - - - G L - - - - - - - - - - L L - - - - A S Q D S N G D T A A L T A A K N G C
    MBP1_PENCH/439-468   T C S Q D E I - - - Q M - - - - - - - - - - L L - - - - S C Q D Q N G D T A V L V A A R N G A
    MBPA_TRIVE/407-436   V F P R H E I - - - S L - - - - - - - - - - L L - - - - S S Q D A N G D T A A L T A A K N G C
    MBP1_PHANO/400-429   T W I P E E V - - - T R - - - - - - - - - - L L - - - - N A Q D Q N G D T A I M I A A R N G A
    MBPA_SCLSC/294-313   - - - - - - - - - - - - - - - - - - - - - - - L - - - - D A R D I N G N T A I H I A A K N K A
    MBPA_PYRIS/363-392   T W I P E E V - - - T R - - - - - - - - - - L L - - - - N A A D Q N G D T A I M I A A R N G A
    MBP1_/361-390   - - - N H S L G V L S Q - - - - - - - - - - F M - - - - D T Q N N E G D T A L H I L A R S G A
    MBP1_ASPFL/328-364   T E Q P G E V I T L G R - - - - - - - - - - F I S E I V N L R D D Q G D T A L N L A G R A R S
    MBPA_MAGOR/375-404   Q H D P N F V - - - Q Q - - - - - - - - - - L L - - - - D A Q D N D G N T A V H L A A Q R G S
    MBP1_CHAGL/361-390   S R S A D E L - - - Q Q - - - - - - - - - - L L - - - - D S Q D N E G N T A V H L A A M R D A
    MBP1_PODAN/372-401   V R Q P E E V - - - Q A - - - - - - - - - - L L - - - - D A Q D E E G N T A L H L A A R V N A
    MBP1_LACTH/458-487   F S P R Y R I - - - E N - - - - - - - - - - L I - - - - N A Q D Q N G D T A V H L A A Q N G D
    MBP1_FILNE/433-460   - - Y P Q E L - - - A D - - - - - - - - - - V I - - - - N F Q D E E G E T A L T I A A R A R S
    MBP1_KLULA/477-506   F T P Q Y R I - - - D V - - - - - - - - - - L I - - - - N Q Q D N D G N S P L H Y A A T N K D
    MBP1_SCHST/468-501   A K D P D N K - - - K D - - - - - - - - - - L I A K F I N H Q D S D G N T A F H I C S H N L N
    MBP1_SACCE/496-525   F S P Q Y R I - - - E L - - - - - - - - - - L L - - - - N T Q D K N G D T A L H I A S K N G D
    CD00204/1-19   - - - - - - - - - - - - - - - - - - - - - - - - - - - - N A R D E D G R T P L H L A A S N G H
    CD00204/99-118   - - - - - - - - - - - - - - - - - - - - - - - V - - - - N A R D K D G R T P L H L A A K N G H
    1SW6/203-232   L D L K W I I - - - A N - - - - - - - - - - M L - - - - N A Q D S N G D T C L N I A A R L G N
    SecStruc/203-232   t _ H H H H H - - - H H - - - - - - - - - - _ _ - - - - _ _ _ _ t _ _ _ _ H H H H H H H H _ _
    Aligned sequences before editing. The algorithm has placed gaps into the Swi6 helix LKWIIAN, and the four-residue gaps before the block of well-aligned sequence on the right are poorly supported.


    Alignment column ruler: 10, 20, 30, 40
    MBP1_USTMA/341-368   - - Y G D Q L A D - - - - - - - - - - - - - - I L N F Q D D E G E T P L T M A A R A R S
    MBP1B_SCHCO/470-498   - R E D G D Y K S - - - - - - - - - - - - - - F L D L Q D E H G D T A L N I A A R V G N
    MBP1_ASHGO/465-494   F S P Q Y R I E T - - - - - - - - - - - - - - L I N A Q D C K G S T P L H I A A M N R D
    MBP1_CLALU/550-586   G N Q N G N S N D K K E - - - - - - - L I S K F L N H Q D N E G N T A F H I A A Y N M S
    MBPA_COPCI/514-542   - H E G G D F R S - - - - - - - - - - - - - - L V D L Q D E H G D T A I N I A A R V G N
    MBP1_DEBHA/507-550   I R D S Q E I E N K K L S L S D K K E L I A K F I N H Q D I D G N T A F H I V A Y N L N
    MBP1A_SCHCO/388-415   - - Y P K E L A D - - - - - - - - - - - - - - V L N F Q D E D G E T A L T M A A R C R S
    MBP1_AJECA/374-403   T L P P H Q I S M - - - - - - - - - - - - - - L L S S Q D S N G D T A A L A A A K N G C
    MBP1_PARBR/380-409   I L P P H Q I S L - - - - - - - - - - - - - - L L S S Q D S N G D T A A L A A A K N G C
    MBP1_NEOFI/363-392   T C S Q D E I D L - - - - - - - - - - - - - - L L S C Q D S N G D T A A L V A A R N G A
    MBP1_ASPNI/365-394   T F S P E E V D L - - - - - - - - - - - - - - L L S C Q D S V G D T A V L V A A R N G V
    MBP1_UNCRE/377-406   M Y P H H E V G L - - - - - - - - - - - - - - L L A S Q D S N G D T A A L T A A K N G C
    MBP1_PENCH/439-468   T C S Q D E I Q M - - - - - - - - - - - - - - L L S C Q D Q N G D T A V L V A A R N G A
    MBPA_TRIVE/407-436   V F P R H E I S L - - - - - - - - - - - - - - L L S S Q D A N G D T A A L T A A K N G C
    MBP1_PHANO/400-429   T W I P E E V T R - - - - - - - - - - - - - - L L N A Q D Q N G D T A I M I A A R N G A
    MBPA_SCLSC/294-313   - - - - - - - - - - - - - - - - - - - - - - - - L D A R D I N G N T A I H I A A K N K A
    MBPA_PYRIS/363-392   T W I P E E V T R - - - - - - - - - - - - - - L L N A A D Q N G D T A I M I A A R N G A
    MBP1_/361-390   N H S L G V L S Q - - - - - - - - - - - - - - F M D T Q N N E G D T A L H I L A R S G A
    MBP1_ASPFL/328-364   T E Q P G E V I T L G R F I S E - - - - - - - I V N L R D D Q G D T A L N L A G R A R S
    MBPA_MAGOR/375-404   Q H D P N F V Q Q - - - - - - - - - - - - - - L L D A Q D N D G N T A V H L A A Q R G S
    MBP1_CHAGL/361-390   S R S A D E L Q Q - - - - - - - - - - - - - - L L D S Q D N E G N T A V H L A A M R D A
    MBP1_PODAN/372-401   V R Q P E E V Q A - - - - - - - - - - - - - - L L D A Q D E E G N T A L H L A A R V N A
    MBP1_LACTH/458-487   F S P R Y R I E N - - - - - - - - - - - - - - L I N A Q D Q N G D T A V H L A A Q N G D
    MBP1_FILNE/433-460   - - Y P Q E L A D - - - - - - - - - - - - - - V I N F Q D E E G E T A L T I A A R A R S
    MBP1_KLULA/477-506   F T P Q Y R I D V - - - - - - - - - - - - - - L I N Q Q D N D G N S P L H Y A A T N K D
    MBP1_SCHST/468-501   A K D P D N K K D - - - - - - - - - - L I A K F I N H Q D S D G N T A F H I C S H N L N
    MBP1_SACCE/496-525   F S P Q Y R I E L - - - - - - - - - - - - - - L L N T Q D K N G D T A L H I A S K N G D
    CD00204/1-19   - - - - - - - - - - - - - - - - - - - - - - - - - N A R D E D G R T P L H L A A S N G H
    CD00204/99-118   - - - - - - - - - - - - - - - - - - - - - - - - V N A R D K D G R T P L H L A A K N G H
    1SW6/203-232   L D L K W I I A N - - - - - - - - - - - - - - M L N A Q D S N G D T C L N I A A R L G N
    SecStruc/203-232   t _ H H H H H H H - - - - - - - - - - - - - - _ _ _ _ _ _ t _ _ _ _ H H H H H H H H _ _
    Aligned sequences after editing. A significant cleanup of the frayed region is possible. Now there is only one insertion event, and it is placed into the loop that connects two helices of the 1SW6 structure.
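    Manual editing should only move gaps, never change residues. A simple safeguard is to compare the ungapped sequences before and after editing. The base-R sketch below does this for three rows copied from the two ankyrin alignments; degap() is a hypothetical helper name introduced here, and the row names are shortened labels rather than the full Jalview identifiers.

```r
# Hypothetical helper: remove all gap characters from aligned sequences.
degap <- function(x) gsub("-", "", x, fixed = TRUE)

# Three rows from the alignment before editing ...
before <- c(MBP1_CLALU = "GNQNGNSNDKKE----------LISKFLNHQDNEGNTAFHIAAYNMS",
            MBP1_DEBHA = "IRDSQEI---ENKKLSLSDKKELIAKFINHQDIDGNTAFHIVAYNLN",
            `1SW6`     = "LDLKWII---AN----------ML----NAQDSNGDTCLNIAARLGN")

# ... and the same rows after editing.
after  <- c(MBP1_CLALU = "GNQNGNSNDKKE-------LISKFLNHQDNEGNTAFHIAAYNMS",
            MBP1_DEBHA = "IRDSQEIENKKLSLSDKKELIAKFINHQDIDGNTAFHIVAYNLN",
            `1SW6`     = "LDLKWIIAN--------------MLNAQDSNGDTCLNIAARLGN")

# TRUE if editing changed only the placement of gaps
print(all(degap(before) == degap(after)))

# Alignment widths before and after editing
print(c(before = unique(nchar(before)), after = unique(nchar(after))))
```

    If the comparison returns FALSE, a residue was accidentally deleted, inserted or overwritten during editing, and the edited alignment should not be used.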


     


     

    Further reading, links and resources

     

    This is a good, current recapitulation of many of the concepts you have encountered in this unit. Compact to read, I highly recommend this paper to reinforce what you have just learned.

    Bawono et al. (2017) Multiple Sequence Alignment. Methods Mol Biol 1525:167-189. (pmid: 27896722)

    The increasing importance of Next Generation Sequencing (NGS) techniques has highlighted the key role of multiple sequence alignment (MSA) in comparative structure and function analysis of biological sequences. MSA often leads to fundamental biological insight into sequence-structure-function relationships of nucleotide or protein sequence families. Significant advances have been achieved in this field, and many useful tools have been developed for constructing alignments, although many biological and methodological issues are still open. This chapter first provides some background information and considerations associated with MSA techniques, concentrating on the alignment of protein sequences. Then, a practical overview of currently available methods and a description of their specific advantages and limitations are given, to serve as a helpful guide or starting point for researchers who aim to construct a reliable MSA.

     
    Benítez-Páez et al. (2012) A practical guide for the computational selection of residues to be experimentally characterized in protein families. Brief Bioinformatics 13:329-36. (pmid: 21930656)

    In recent years, numerous biocomputational tools have been designed to extract functional and evolutionary information from multiple sequence alignments (MSAs) of proteins and genes. Most biologists working actively on the characterization of proteins from a single or family perspective use the MSA analysis to retrieve valuable information about amino acid conservation and the functional role of residues in query protein(s). In MSAs, adjustment of alignment parameters is a key point to improve the quality of MSA output. However, this issue is frequently underestimated and/or misunderstood by scientists and there is no in-depth knowledge available in this field. This brief review focuses on biocomputational approaches complementary to MSA to help distinguish functional residues in protein families. These additional analyses involve issues ranging from phylogenetic to statistical, which address the detection of amino acids pivotal for protein function at any level. In recent years, a large number of tools has been designed for this very purpose. Using some of these relevant, useful tools, we have designed a practical pipeline to perform in silico studies with a view to improving the characterization of family proteins and their functional residues. This review-guide aims to present biologists a set of specially designed tools to study proteins. These tools are user-friendly as they use web servers or easy-to-handle applications. Such criteria are essential for this review as most of the biologists (experimentalists) working in this field are unfamiliar with these biocomputational analysis approaches.

    Pais et al. (2014) Assessing the efficiency of multiple sequence alignment programs. Algorithms Mol Biol 9:4. (pmid: 24602402)

    BACKGROUND: Multiple sequence alignment (MSA) is an extremely useful tool for molecular and evolutionary biology and there are several programs and algorithms available for this purpose. Although previous studies have compared the alignment accuracy of different MSA programs, their computational time and memory usage have not been systematically evaluated. Given the unprecedented amount of data produced by next generation deep sequencing platforms, and increasing demand for large-scale data analysis, it is imperative to optimize the application of software. Therefore, a balance between alignment accuracy and computational cost has become a critical indicator of the most suitable MSA program. We compared both accuracy and cost of nine popular MSA programs, namely CLUSTALW, CLUSTAL OMEGA, DIALIGN-TX, MAFFT, MUSCLE, POA, Probalign, Probcons and T-Coffee, against the benchmark alignment dataset BAliBASE and discuss the relevance of some implementations embedded in each program's algorithm. Accuracy of alignment was calculated with the two standard scoring functions provided by BAliBASE, the sum-of-pairs and total-column scores, and computational costs were determined by collecting peak memory usage and time of execution. RESULTS: Our results indicate that mostly the consistency-based programs Probcons, T-Coffee, Probalign and MAFFT outperformed the other programs in accuracy. Whenever sequences with large N/C terminal extensions were present in the BAliBASE suite, Probalign, MAFFT and also CLUSTAL OMEGA outperformed Probcons and T-Coffee. The drawback of these programs is that they are more memory-greedy and slower than POA, CLUSTALW, DIALIGN-TX, and MUSCLE. CLUSTALW and MUSCLE were the fastest programs, being CLUSTALW the least RAM memory demanding program. CONCLUSIONS: Based on the results presented herein, all four programs Probcons, T-Coffee, Probalign and MAFFT are well recommended for better accuracy of multiple sequence alignments. T-Coffee and recent versions of MAFFT can deliver faster and reliable alignments, which are specially suited for larger datasets than those encountered in the BAliBASE suite, if multi-core computers are available. In fact, parallelization of alignments for multi-core computers should probably be addressed by more programs in a near future, which will certainly improve performance significantly.

    Sievers & Higgins (2018) Clustal Omega for making accurate alignments of many protein sequences. Protein Sci 27:135-145. (pmid: 28884485)

    Clustal Omega is a widely used package for carrying out multiple sequence alignment. Here, we describe some recent additions to the package and benchmark some alternative ways of making alignments. These benchmarks are based on protein structure comparisons or predictions and include a recently described method based on secondary structure prediction. In general, Clustal Omega is fast enough to make very large alignments and the accuracy of protein alignments is high when compared to alternative packages. The package is freely available as executables or source code from www.clustal.org or can be run on-line from a variety of sites, especially the EBI www.ebi.ac.uk.

    Iantorno et al. (2014) Who watches the watchmen? An appraisal of benchmarks for multiple sequence alignment. Methods Mol Biol 1079:59-73. (pmid: 24170395)

    Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Thus alignment accuracy is crucial to a vast range of analyses, often in ways difficult to assess in those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies-based on simulation, consistency, protein structure, and phylogeny-and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should base their choice of benchmark depending on the context of application-with a keen awareness of the assumptions underlying each benchmarking strategy.

    Notredame (2007) Recent evolutions of multiple sequence alignment algorithms. PLoS Comput Biol 3:e123. (pmid: 17784778)


    Notes

    1. A good example of how systematic tweaking of parameters can improve alignments is here:
      Long et al. (2016) Determination of optimal parameters of MAFFT program based on BAliBASE3.0 database. Springerplus 5:736. (pmid: 27376004)

      BACKGROUND: Multiple sequence alignment (MSA) is one of the most important research contents in bioinformatics. A number of MSA programs have emerged. The accuracy of MSA programs highly depends on the parameters setting, mainly including gap open penalties (GOP), gap extension penalties (GEP) and substitution matrix (SM). This research tries to obtain the optimal GOP, GEP and SM rather than MAFFT default parameters. RESULTS: The paper discusses the MAFFT program benchmarked on BAliBASE3.0 database, and the optimal parameters of MAFFT program are obtained, which are better than the default parameters of CLUSTALW and MAFFT program. CONCLUSIONS: The optimal parameters can improve the results of multiple sequence alignment, which is feasible and efficient.

    2. "indel": insertion / deletion – a difference in sequence length between two aligned sequences that is accommodated by gaps in the alignment. Since we can't tell from the comparison of two sequences whether such a change was introduced by insertion into or deletion from the ancestral sequence, we join both into a portmanteau.


     


    About ...
     
    Author:

    Boris Steipe <boris.steipe@utoronto.ca>

    Created:

    2017-08-05

    Modified:

    2020-10-07

    Version:

    1.1

    Version history:

    • 1.1 Edit policy update
    • 1.0 2020 Updates
    • 0.1 First stub

    This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.