Difference between revisions of "Lecture 07"
*[http://prodata.swmed.edu/promals/ Dallas '''PROMALS Web server''']<br>
*[http://www.ebi.ac.uk/clustalw/ EBI '''CLUSTAL''' web server]<br>
*[http://www.ebi.ac.uk/t-coffee/ EBI '''T-Coffee''' Web server]<br>
*[http://www.ebi.ac.uk/muscle/ EBI '''MUSCLE Web server''']<br>
*[http://probcons.stanford.edu Stanford '''PROBCONS server''']<br>
*[http://sparks.informatics.iupui.edu/Softwares-Services_files/spem.htm Indiana '''SPEM server''']<br>
*[http://cbcsrv.watson.ibm.com/Tmsa.html MUSCA, based on the Teiresias pattern discovery algorithm]<br>
*[http://hmmer.janelia.org/ HMMER, a profile hidden Markov model tool]<br>
*[http://bips.u-strasbg.fr/fr/Products/Databases/BAliBASE/ BAliBASE], [http://bips.u-strasbg.fr/fr/Products/Databases/BAliBASE2/ BAliBASE 2.0] and [http://www-bio3d-igbmc.u-strasbg.fr/~julie/balibase/index.html BAliBASE 3.0]<br>
*[http://www.ebi.ac.uk/help/formats_frame.html EBI help page on formats]<br>
*[http://www.jalview.org/ '''Jalview''' home page]<br>
*[http://www.ch.embnet.org/software/BOX_form.html Embnet '''BOXSHADE''' server]<br>
*[http://en.wikipedia.org/wiki/Multiple_sequence_alignment '''Wikipedia''' page on Multiple Sequence Alignment]<br>
<br>
<div style="padding: 10 px; background: #B0B8D7; border:solid 1px #AAAAAA;">

====Exercises====
</div><br>
======Slide 020======
[[Image:07_slide020.jpg|frame|none|Lecture 07, Slide 020<br>
Run MUSCLE alignments via the [http://www.ebi.ac.uk/muscle/ EBI '''MUSCLE Web server'''], which is very easy to use, or via the [http://phylogenomics.berkeley.edu/cgi-bin/muscle/input_muscle.py Berkeley '''MUSCLE server'''], courtesy of Kimmen Sjölander's lab. Source code and compiled binaries can be obtained from the [http://www.drive5.com/muscle/ MUSCLE homepage], and a local installation on UNIX and Windows machines is straightforward. The MUSCLE site also hosts the PREFAB multiple alignment benchmark.
]]
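For a local installation, a run can be scripted. The following is only a minimal sketch: it assumes a MUSCLE 3.x-style command line with <tt>-in</tt> and <tt>-out</tt> options and an executable called <tt>muscle</tt> on the PATH. The helper names <tt>muscle_command</tt> and <tt>run_muscle</tt> are hypothetical, and other MUSCLE releases may use different option names, so check the help output of your installed version first.

```python
import shutil
import subprocess
from pathlib import Path


def muscle_command(fasta_in, aln_out, executable="muscle"):
    """Build the argument list for a MUSCLE run.

    Assumption: MUSCLE 3.x-style options (-in for input, -out for output).
    """
    return [executable, "-in", str(fasta_in), "-out", str(aln_out)]


def run_muscle(fasta_in, aln_out, executable="muscle"):
    """Align the sequences in fasta_in and write the alignment to aln_out."""
    # Fail early with a clear message if the program is not installed.
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable!r} not found on PATH")
    # check=True raises CalledProcessError if MUSCLE exits with an error.
    subprocess.run(muscle_command(fasta_in, aln_out, executable), check=True)
    return Path(aln_out)
```

With MUSCLE installed, <tt>run_muscle("seqs.fa", "seqs.afa")</tt> would write the aligned sequences to <tt>seqs.afa</tt>; both file names are placeholders.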
======Slide 021======
[[Image:07_slide021.jpg|frame|none|Lecture 07, Slide 021<br>
ProbCons is one of the best algorithms for aligning sequences without additional database information. Run it on the web via the [http://probcons.stanford.edu Stanford '''PROBCONS server'''], or download the code and install it locally.
]]
======Slide 032======
[[Image:07_slide032.jpg|frame|none|Lecture 07, Slide 032<br>
The obvious first approach is to search for a recent review. To retrieve the last year of multiple sequence alignment literature from PubMed, search <tt>("multiple sequence alignment"[ti] OR "multiple alignment"[ti]) AND (server OR algorithm) AND "last 1 years"[dp]</tt> or just [http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&Db=pubmed&term=%22multiple+sequence+alignment%22%5Bti%5D+OR+%22multiple+alignment%22%5Bti%5D+AND+%22last+1+Years%22%5Bdp%5D '''click here''']. Note that not all reviews have been tagged as such by the PubMed curators: in the list returned in September 2007, the most recent review was found by the above search strategy, but it appeared in the full list of publications, not in the subset tagged as reviews. Of course, no recent review may be available, or the available reviews may not be very informative. [http://compbiol.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pcbi.0030123 Cedric Notredame's MSA review (2007)] is technical and probably less helpful for the non-expert, although it conveys the paradigm shift towards '''template based alignment''' strategies well. [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=16679011 Edgar and Batzoglou's MSA review (2005)], by the authors of MUSCLE and ProbCons, is much more readable and a good, comprehensive introduction to modern methods.
]]
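The same search can be assembled programmatically, which makes it easy to repeat with a different time window. This is a sketch under assumptions: it targets the NCBI E-utilities <tt>esearch.fcgi</tt> endpoint instead of the Entrez web page linked above, and the function names <tt>pubmed_query</tt> and <tt>esearch_url</tt> are made up for illustration. It only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumption: the E-utilities ESearch endpoint, not the web form used above.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_query(years=1):
    """Return the title search from the slide, limited to the last `years` years."""
    return (
        '("multiple sequence alignment"[ti] OR "multiple alignment"[ti]) '
        "AND (server OR algorithm) "
        f'AND "last {years} years"[dp]'
    )


def esearch_url(term, retmax=20):
    """Build an ESearch URL for PubMed; urlencode handles quotes and brackets."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    print(esearch_url(pubmed_query(years=1)))
```

Because reviews are not always tagged as such, the query deliberately does not filter on publication type; scan the full result list instead.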
======Slide 039======
[[Image:07_slide039.jpg|frame|none|Lecture 07, Slide 039<br>
Three common formats exist for MSA results; a '''CLUSTAL''' formatted alignment is the one in most common use. When formatting input files, take care that the '''first 10 characters of each sequence name are unique''' and contain '''no special characters'''! I have seen programs break on blanks, hyphens and | (pipe). The latter is especially annoying, since the | character is used in NCBI FASTA files to separate the database identifier from the accession number. (More information at the [http://www.ebi.ac.uk/help/formats_frame.html EBI help page on formats].)
]]
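Renaming sequences by hand is error-prone, so it can help to script it. The sketch below is one possible approach, not a standard tool: the function name <tt>short_names</tt>, the choice of underscore as the replacement character and the numeric suffix used to resolve collisions are all assumptions made for illustration.

```python
import re


def short_names(headers, width=10):
    """Turn FASTA header lines into unique names of at most `width` safe characters.

    Returns the new names in the same order as `headers`.
    """
    used = set()
    names = []
    for header in headers:
        # Drop the leading '>' and replace blanks, hyphens, pipes and any
        # other special character with an underscore.
        clean = re.sub(r"[^A-Za-z0-9_]", "_", header.lstrip(">").strip())
        base = clean[:width] or "seq"
        name, n = base, 0
        # If the truncated name is already taken, overwrite its tail with a counter.
        while name in used:
            n += 1
            suffix = str(n)
            name = base[: width - len(suffix)] + suffix
        used.add(name)
        names.append(name)
    return names


if __name__ == "__main__":
    headers = [
        ">gi|12345|ref|NP_000001.1| protein A",
        ">gi|12345|ref|NP_000002.1| protein B",
    ]
    for old, new in zip(headers, short_names(headers)):
        print(new, "<-", old)
```

Truncated names lose information, so keep the printed mapping of new to original headers alongside the alignment.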
Latest revision as of 18:57, 7 October 2007
Multiple Sequence Alignment
Objectives for this part of the course
- Understand that MSA is an unsolved, difficult problem with different "best" solutions for different purposes.
- Be familiar with different biological heuristics that distinguish a "good" alignment from a "poor" alignment.
- Understand the importance of benchmarks for assessing the performance of computational tools.
- Be aware of how different biological priorities have resulted in different algorithmic strategies and some of the available tools that represent them.
- Be aware that the most frequently used and referenced tool - CLUSTAL - is no longer state-of-the-art and know which modern tools are much better.
- Be able to survey recent developments with confidence and choose an appropriate algorithm.
- Be able to perform and interpret MSAs in practice: know how to prepare input, which formats to use, and what common output formats look like.
- Understand strategies to prepare input and improve alignments, based on the requirement of columnwise homology.
- Know about strategies and tools for manual editing of alignments.
Links summary
- Dallas PROMALS Web server
- EBI CLUSTAL web server
- EBI T-Coffee Web server
- EBI MUSCLE Web server
- Stanford PROBCONS server
- Indiana SPEM server
- MUSCA, based on the Teiresias pattern discovery algorithm
- HMMER, a profile hidden Markov model tool
- BAliBASE, BAliBASE 2.0 and BAliBASE 3.0
- EBI help page on formats
- Jalview home page
- Embnet BOXSHADE server
- Wikipedia page on Multiple Sequence Alignment
Exercises
- Read Cedric Notredame's MSA review (2007)
- Read Edgar and Batzoglou's MSA review (2005)
- More exercises will be covered in Assignment 3.
Lecture slides
Uses and Problems
Slide 004
Slide 005
Slide 006
Slide 007
Right, wrong, good and poor
Slide 009
Slide 010
Slide 011
MSA in practice
Slide 013
Slide 014
Slide 015
Slide 016
Slide 017
Slide 019
Slide 020
Slide 021
Slide 022
Slide 023
Slide 024
Slide 025
Slide 026
Slide 027
Slide 028
Slide 029
Slide 030
Slide 031
Slide 032
Slide 033
Slide 034
Slide 035
Editing and printing
Slide 037
Slide 038
Slide 039
Slide 040
Slide 041
Slide 042
Slide 043
Slide 044
Slide 045
Slide 046
Slide 047
Slide 048
Slide 049
Slide 050