Difference between revisions of "Lecture 07"
Jump to navigation
Jump to search
(7 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
+ | |||
+ | <!-- div style="padding: 5px; background: #FF4560; border:solid 2px #000000;"> | ||
+ | '''Update Warning!''' | ||
+ | This page has not been revised yet for the 2008 Fall term. | ||
+ | Some of the slides will probably be reused, but please consider the page as a whole out of date | ||
+ | as long as this warning appears here. Also, the lectures may be taught in a different sequence. | ||
+ | </div --> | ||
+ | | ||
+ | | ||
__NOTOC__ | __NOTOC__ | ||
− | <small>[[Lecture_06|(Previous lecture)]] ... [[Lecture_08|(Next lecture)]]</small> | + | <small>[[Lecture_06|(Previous lecture)]] ... [[Lecture_08|(Next lecture)]]</small> |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | == | + | <br> |
+ | <br> | ||
+ | <div style="padding: 2px; background: #879BFA; border:solid 1px #AAAAAA;"> | ||
+ | ==Multiple Sequence Alignment== | ||
+ | </div><br> | ||
+ | | ||
+ | <br> | ||
+ | <div style="padding: 10 px; background: #B0B8D7; border:solid 1px #AAAAAA;"> | ||
+ | ====Objectives for this part of the course==== | ||
+ | </div><br> | ||
+ | * Understand that MSA is an unsolved, difficult problem with different "best" solutions for different purposes.<br> | ||
+ | * Be familiar with different biological heuristics that distinguish a "good" alignment from a "poor" alignment.<br> | ||
+ | * Understand the importance of benchmarks for assessing the performance of computational tools.<br> | ||
+ | * Be aware of how different biological priorities have resulted in different algorithmic strategies and some of the available tools that represent them.<br> | ||
+ | * Be aware that the most frequently used and referenced tool - CLUSTAL - is no longer state-of-the-art and know which modern tools are much better.<br> | ||
+ | * Confidently be able to survey recent developments and choose an appropriate algorithm.<br> | ||
+ | * Be able to perform and interpret MSAs in practice, know how to prepare input, which formats to use and what common output formats look like.<br> | ||
+ | * Understand strategies to prepare input and improve alignments, based on the requirement of columnwise homology.<br> | ||
+ | * Know about strategies and tools for manual editing of alignments.<br> | ||
− | ====== | + | <br> |
− | [[ | + | <div style="padding: 10 px; background: #B0B8D7; border:solid 1px #AAAAAA;"> |
+ | ====Links summary==== | ||
+ | </div><br> | ||
+ | *[http://prodata.swmed.edu/promals/ Dallas '''PROMALS Web server''']<br> | ||
+ | *[http://www.ebi.ac.uk/clustalw/ EBI '''CLUSTAL''' web server]<br> | ||
+ | *[http://www.ebi.ac.uk/t-coffee/ EBI '''T-Coffee''' Web server]<br> | ||
+ | *[http://www.ebi.ac.uk/muscle/ EBI '''MUSCLE Web server''']<br> | ||
+ | *[http://probcons.stanford.edu Stanford '''PROBCONS server''']<br> | ||
+ | *[http://sparks.informatics.iupui.edu/Softwares-Services_files/spem.htm Indiana '''SPEM server]<br> | ||
+ | *[http://cbcsrv.watson.ibm.com/Tmsa.html MUSCA, based on the Teiresias pattern discovery algorithm]<br> | ||
+ | *[http://hmmer.janelia.org/ HMMER, a profile hidden Markov model tool]<br> | ||
+ | *[http://bips.u-strasbg.fr/fr/Products/Databases/BAliBASE/ BAliBASE], [http://bips.u-strasbg.fr/fr/Products/Databases/BAliBASE2/ BAliBASE 2.0] and [http://www-bio3d-igbmc.u-strasbg.fr/~julie/balibase/index.html BAliBASE 3.0]<br> | ||
+ | *[http://www.ebi.ac.uk/help/formats_frame.html EBI help page on formats]<br> | ||
+ | *[http://www.jalview.org/ '''Jalview''' home page]<br> | ||
+ | *[http://www.ch.embnet.org/software/BOX_form.html Embnet '''BOXSHADE''' server]<br> | ||
+ | *[http://en.wikipedia.org/wiki/Multiple_sequence_alignment '''Wikipedia''' page on Multiple Sequence Alignment]<br> | ||
+ | |||
+ | |||
+ | <br> | ||
+ | <div style="padding: 10 px; background: #B0B8D7; border:solid 1px #AAAAAA;"> | ||
+ | |||
+ | ====Exercises==== | ||
+ | </div><br> | ||
+ | * Read [http://compbiol.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pcbi.0030123 Cedric Notredame's MSA review (2007)]<br> | ||
+ | * Read [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=16679011 Edgar and Batzoglou's MSA review (2005)]<br> | ||
+ | * More exercises will be covered in Assignment 3. | ||
+ | |||
+ | |||
+ | <br> | ||
+ | <div style="padding: 10 px; background: #879BFA; border:solid 1px #AAAAAA;"> | ||
+ | ==Lecture slides== | ||
+ | </div><br> | ||
+ | <br> | ||
− | |||
− | |||
− | |||
− | + | <br> | |
− | ==== | + | <br> |
− | + | <br> | |
+ | <div style="padding: 10 px; background: #B0B8D7; border:solid 1px #AAAAAA;"> | ||
+ | ===Uses and Problems=== | ||
+ | </div><br> | ||
+ | <br> | ||
− | |||
======Slide 004====== | ======Slide 004====== | ||
− | [[Image: | + | [[Image:07_slide004.jpg|frame|none|Lecture 07, Slide 004<br> |
− | + | Multiple sequence alignments don't only match residues. They also give information on how strongly a residue is conserved, what it can be replaced with, which species share particular sequence patterns, and where in the sequence indels can be tolerated. An analysis of conservation even allows to distinguish between structurally and functionally conserved residues! | |
]] | ]] | ||
======Slide 005====== | ======Slide 005====== | ||
− | [[Image: | + | [[Image:07_slide005.jpg|frame|none|Lecture 07, Slide 005<br> |
− | + | Multiple sequence alignments are more accurate than pairwise alignments, thus they are the method of choice for starting homology modeling projects. Their combined information is invaluable for secondary structure prediction and sensitive database searches. They contain the information needed for inferences about evolutionary relationships, i.e. the order in which particular sequence changes occurred. | |
]] | ]] | ||
======Slide 006====== | ======Slide 006====== | ||
− | [[Image: | + | [[Image:07_slide006.jpg|frame|none|Lecture 07, Slide 006<br> |
− | + | Multiple alignments cannot necessarily be '''constructed''' from pairwise alignments. Moreover, it may be impossible to merge three mutually pairwise alignments into a non-contradicting multiple alignment. However the inverse is always possible: a multiple alignment can be '''decomposed''' into pairwise alignments. | |
]] | ]] | ||
======Slide 007====== | ======Slide 007====== | ||
− | [[Image: | + | [[Image:07_slide007.jpg|frame|none|Lecture 07, Slide 007<br> |
+ | Besides being intractable, it is questionable how meaningful the objective function of optimal sequence alignments - the '''pair score''' - is for multiple alignments. For example the pair score does not refelct on the pattern of indel placements, or whether a particular motif is well-conserved. | ||
+ | ]] | ||
− | |||
− | |||
− | |||
− | + | <br> | |
+ | <br> | ||
+ | <br> | ||
+ | <div style="padding: 10 px; background: #B0B8D7; border:solid 1px #AAAAAA;"> | ||
+ | ===Right, wrong, good and poor=== | ||
+ | </div><br> | ||
+ | <br> | ||
+ | |||
======Slide 009====== | ======Slide 009====== | ||
− | [[Image: | + | [[Image:07_slide009.jpg|frame|none|Lecture 07, Slide 009<br> |
− | + | If we want to write an algorithm to optimize anything at all, we first must define how we can measure the quality of the result. This metric defines the '''target function''' or '''objective function'''. | |
]] | ]] | ||
======Slide 010====== | ======Slide 010====== | ||
− | [[Image: | + | [[Image:07_slide010.jpg|frame|none|Lecture 07, Slide 010<br> |
− | + | In practice, each of the biological objectives we can define suggests a different alignment strategy. The most modern algorithms currently available attempt to satisfy these heuristics simultaneously. Note that these are '''heuristics''', they are not the result of some rigorously applied theory, but reflect the complex relationship between protein sequence, structure, evolution and selection. | |
]] | ]] | ||
======Slide 011====== | ======Slide 011====== | ||
− | [[Image: | + | [[Image:07_slide011.jpg|frame|none|Lecture 07, Slide 011<br> |
]] | ]] | ||
− | |||
− | |||
− | + | ||
+ | <br> | ||
+ | <br> | ||
+ | <br> | ||
+ | <div style="padding: 10 px; background: #B0B8D7; border:solid 1px #AAAAAA;"> | ||
+ | ===MSA in practice=== | ||
+ | </div><br> | ||
+ | <br> | ||
+ | |||
======Slide 013====== | ======Slide 013====== | ||
− | [[Image: | + | [[Image:07_slide013.jpg|frame|none|Lecture 07, Slide 013<br> |
− | + | Exact methods certainly have their place where it comes to analyzing and improving algorithms; they are especially of interest to computer science because high-dimensional optimal alignment is a difficult problem. However they cannot compete in terms of result-quality with modern heuristic methods. | |
]] | ]] | ||
======Slide 014====== | ======Slide 014====== | ||
− | [[Image: | + | [[Image:07_slide014.jpg|frame|none|Lecture 07, Slide 014<br> |
− | + | '''Progressive''' alignment is one of three fundamental algorithmic approaches to MSA. The EBI offers [http://www.ebi.ac.uk/clustalw/ Clustal alignments online]. | |
]] | ]] | ||
======Slide 015====== | ======Slide 015====== | ||
− | [[Image: | + | [[Image:07_slide015.jpg|frame|none|Lecture 07, Slide 015<br> |
− | + | '''Consistency''' based multiple alignment is one of three fundamental algorithmic approaches to MSA. Many modern algorithms have a consistency based step included, however none of them relies solely on consistency, since problems from spurious local similarity can corrupt the alignment. [http://cbcsrv.watson.ibm.com/Tmsa.html MUSCA, based on the Teiresias pattern discovery algorithm] is offered through IBM's Watson Labs Web server. Similarly, the [http://meme.sdsc.edu/ MEME algorithm for motif discovery]which is more commonly used in sequence analysis infers a motif-based alignment. | |
]] | ]] | ||
======Slide 016====== | ======Slide 016====== | ||
− | [[Image: | + | [[Image:07_slide016.jpg|frame|none|Lecture 07, Slide 016<br> |
− | + | ''Probabilistic''' multiple alignment is one of three fundamental algorithmic approaches to MSA. A statistical model of the sequences is built, then the alignment can be generated by aligning the sequences to the model. Of course, aligning sequences to a profile is a special case of this procedure: PSI BLAST can thus be used as an alignment algorithm. The most widely used algortihm is Sean Eddy's [http://hmmer.janelia.org/ HMMER, a profile hidden Markov model tool] which is also used in the generation of the [http://pfam.sanger.ac.uk/Pfam Pfam domain database]. | |
]] | ]] | ||
======Slide 017====== | ======Slide 017====== | ||
− | [[Image: | + | [[Image:07_slide017.jpg|frame|none|Lecture 07, Slide 017<br> |
+ | [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=9254694 Altschul ''et al.'' (1998) ''Nucleic Acids Research'' '''25''':3389-3402] | ||
+ | ]] | ||
+ | ======Slide 019====== | ||
+ | [[Image:07_slide019.jpg|frame|none|Lecture 07, Slide 019<br> | ||
+ | I personally rate TCoffee as the most useful and useable tool that is currently available. It is robust, fast, and gives reasonable results for many cases. Usually it is '''very''' noticeably better then CLUSTAL and I would reject any result based on CLUSTAL for that reason. The EBI offers a [http://www.ebi.ac.uk/t-coffee/ T-Coffee Web server] which is very easy to use (although alignment size is limited;). Source code can be obtained and a local installation on UNIX machines is straightforward. The [http://www.tcoffee.org/ '''TCoffee Web page'''] links to another Web server and also offers 3DCoffee, a variant that automatically fetches related structures and incorporates structural alignments for increased accuracy.<br> | ||
+ | <br> | ||
+ | The inset image shows one of the usuful features of TCoffee: an alignment output in which sequence is coloured according to the local quality of the alignment. This makes reliable and unreliable regions easy to spot, and immediately highlights outliers that could for example be due to sequence errors, such as frameshifts in exons. (MSA taken from the Mbp1 full-length alignment). | ||
+ | ]] | ||
+ | ======Slide 020====== | ||
+ | [[Image:07_slide020.jpg|frame|none|Lecture 07, Slide 020<br> | ||
+ | Run the MUSCLE MSAs via the [http://www.ebi.ac.uk/muscle/ EBI '''MUSCLE Web server'''] which is very easy to use, or via the [http://phylogenomics.berkeley.edu/cgi-bin/muscle/input_muscle.py Berkeley '''MUSCLE server'''] courtesy of Kimmen Sjolander's lab. Source code and compiled code can be obtained from the [http://www.drive5.com/muscle/ Muscle homepage] and a local installation on UNIX and Windows machines is straightforward. The site also hosts the PREFAB multiple alignment benchmark. | ||
+ | ]] | ||
+ | ======Slide 021====== | ||
+ | [[Image:07_slide021.jpg|frame|none|Lecture 07, Slide 021<br> | ||
+ | One of the best algorithms that aligns sequences without additional database information. Run it on the web via the [http://probcons.stanford.edu Stanford '''PROBCONS server'''], or download the code and install locally. | ||
]] | ]] | ||
− | |||
− | |||
+ | ======Slide 022====== | ||
+ | [[Image:07_slide022.jpg|frame|none|Lecture 07, Slide 022<br> | ||
+ | SPEM is one of the most accurate algorithms currently available, in particular for sequences of very low similarity. Run alignments via the [http://sparks.informatics.iupui.edu/Softwares-Services_files/spem.htm Indiana '''SPEM server]. | ||
]] | ]] | ||
− | ======Slide | + | ======Slide 023====== |
− | [[Image: | + | [[Image:07_slide023.jpg|frame|none|Lecture 07, Slide 023<br> |
+ | One of the latest additions to the toolkit, '''PROMALS is currently the most accurate MSA tool available'''. Run it on the [http://prodata.swmed.edu/promals/ Dallas '''PROMALS Web server''']. Read the [http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btm017?ijkey=8VzLUe2lszEStAI&keytype=ref PROMALS paper in the 2007 NAR Web server issue]. | ||
+ | ]] | ||
+ | ======Slide 024====== | ||
+ | [[Image:07_slide024.jpg|frame|none|Lecture 07, Slide 024<br> | ||
+ | Just what does PROMALS' improved performance mean, relative to e.g. CLUSTAL? For one, we can see a clear leap in performance through the inclusion of database information and consensus structure predictions (SPEM and PROMALS). On the other hand, regarding the SABmark superfamily dataset that is perhaps most characteristic of "typical" alignment problems with recognizeable, but low % identity, PROMALS achieves a 50% improvement relative to CLUSTAL, a 30% improvement relative to MUSCLE and ProbCons. This is much more than just statistical noise. | ||
+ | ]] | ||
+ | ======Slide 025====== | ||
+ | [[Image:07_slide025.jpg|frame|none|Lecture 07, Slide 025<br> | ||
+ | How do we know that a new algorithm is better than a previous one? Benchmarks, or "Gold Standards" are an essential part of scientific hygiene. We as users must demand objective comparisons to existing methods, as referees we must require them for publication, as members of the research community we must participate in defining them and provide raw data for their construction. But we must also realize that an "arms-race" of sorts may be ensuing: as developers use the benchmarks as a training set, artificially high performance scores may be generated and performance on novel problems may degrade. | ||
+ | ]] | ||
+ | ======Slide 026====== | ||
+ | [[Image:07_slide026.jpg|frame|none|Lecture 07, Slide 026<br> | ||
+ | Access [http://bips.u-strasbg.fr/fr/Products/Databases/BAliBASE/ the original BAliBASE] (1999) here. Two updated versions have been created: [http://bips.u-strasbg.fr/fr/Products/Databases/BAliBASE2/ BAliBASE 2.0] (2000) and [http://www-bio3d-igbmc.u-strasbg.fr/~julie/balibase/index.html BAliBASE 3.0]. Central to BAliBASE is the concept of '''core blocks''' of alignable regions in which a pairwise correspondence of residues can be defined; outside these regions an alignment is not possible since the structural differences are too large. ([http://bioinformatics.oxfordjournals.org/cgi/content/abstract/15/1/87 BAliBASE: Thompson J. ''et al.,'' (1999) ''Bioinformatics'' '''15''':87-88.]). | ||
+ | ]] | ||
+ | ======Slide 027====== | ||
+ | [[Image:07_slide027.jpg|frame|none|Lecture 07, Slide 027<br> | ||
+ | [http://bioinformatics.oxfordjournals.org/cgi/content/full/21/7/1267 SABmark: Van Walle ''et al.'' (2005) ''Bioinformatics'' '''21''':1267-1268]. [http://bioinformatics.vub.ac.be/databases/databases.html SABmark homepage]. | ||
+ | ]] | ||
+ | ======Slide 028====== | ||
+ | [[Image:07_slide028.jpg|frame|none|Lecture 07, Slide 028<br> | ||
+ | Construction of PREFAB is described in [http://nar.oxfordjournals.org/cgi/content/full/32/5/1792?ijkey=48Nmt1tta0fMg&keytype=ref MUSCLE: Edgar (2004) ''Nucl Acids Res'' '''32''':1792-1797]. | ||
+ | ]] | ||
+ | ======Slide 029====== | ||
+ | [[Image:07_slide029.jpg|frame|none|Lecture 07, Slide 029<br> | ||
]] | ]] | ||
− | ======Slide | + | ======Slide 030====== |
− | [[Image: | + | [[Image:07_slide030.jpg|frame|none|Lecture 07, Slide 030<br> |
]] | ]] | ||
− | ======Slide | + | ======Slide 031====== |
− | [[Image: | + | [[Image:07_slide031.jpg|frame|none|Lecture 07, Slide 031<br> |
+ | "Relevance" for Google may not be the same as relevance for your work. For some applications, novelty is more important than cross-references and page-hits. For a more curated view, you can try the [http://en.wikipedia.org/wiki/Multiple_sequence_alignment '''Wikipedia''' page on Multiple Sequence Alignment] or the [http://wikiomics.org/wiki/Multiple_sequence_alignment Wikiomics] page. <small>(Wikiomics is a project you should know about, but it doesn't appear to be catching on very well.)</small> | ||
+ | ]] | ||
+ | ======Slide 032====== | ||
+ | [[Image:07_slide032.jpg|frame|none|Lecture 07, Slide 032<br> | ||
+ | The obvious first approach is to search for a recent review. For the last year of sequence alignment literature in PubMed: search <tt>("multiple sequence alignment"[ti] OR "multiple alignment"[ti]) AND (server OR algorithm) AND "last 1 years"[dp]</tt> or just [http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&Db=pubmed&term=%22multiple+sequence+alignment%22%5Bti%5D+OR+%22multiple+alignment%22%5Bti%5D+AND+%22last+1+Years%22%5Bdp%5D '''click here'''.] Note that not all "reviews" have been tagged by the PubMed curators as such. In the list returned in September 2007, the most recent review was found by the above search strategy, but it was in the list of publications, not in the sub-set of reviews. Of course, no recent review may be available, or the available reviews may not be very informative. [http://compbiol.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pcbi.0030123 Cedric Notredame's MSA review (2007)] is technical and probably less-helpful for the non-expert, although it emphasizes the paradigm shift towards '''template based alignment''' strategies well. [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=16679011 Edgar and Batzoglou's MSA review (2005)], by the authors of MUSCLE and ProbCons, is much more readable and a good, comprehensive introduction to modern methods. | ||
+ | ]] | ||
+ | ======Slide 033====== | ||
+ | [[Image:07_slide033.jpg|frame|none|Lecture 07, Slide 033<br> | ||
+ | An alternative and more exploratory approach is to choose a recent '''highly relevant''' article, then to use the NCBI's "Related Articles" service. This search strategy allows you to search '''forward''' in time from a particular publication. In the above example, a serch for <tt>clustal[ti]</tt> yielded a publication on CLUSTAL from 2003 ... | ||
]] | ]] | ||
− | ======Slide | + | ======Slide 034====== |
− | [[Image: | + | [[Image:07_slide034.jpg|frame|none|Lecture 07, Slide 034<br> |
+ | ... in the list of related articles (in September 2007) the article on PROMALS (2007): was number 5 in the hit-list, SPEM (2005) came as number 50. | ||
+ | ]] | ||
+ | ======Slide 035====== | ||
+ | [[Image:07_slide035.jpg|frame|none|Lecture 07, Slide 035<br> | ||
+ | Spend some time and thought '''before''' you run the MSA to review the sequences that you are planning to align. Including un-alignable sequence '''will''' lead the algorithms astray and has the potential to degrade the entire alignment. | ||
+ | ]] | ||
+ | |||
+ | |||
+ | <br> | ||
+ | <br> | ||
+ | <br> | ||
+ | <div style="padding: 10 px; background: #B0B8D7; border:solid 1px #AAAAAA;"> | ||
+ | ===Editing and printing=== | ||
+ | </div><br> | ||
+ | <br> | ||
+ | ======Slide 037====== | ||
+ | [[Image:07_slide037.jpg|frame|none|Lecture 07, Slide 037<br> | ||
+ | Three common formats exist for MSA results. An aligned '''multi FASTA''' file contains FASTA formatted sequences into which gap characters have been inserted. Of course, multi FASTA files can also be unaligned and they are the most common way of formatting '''input files''' for MSAs. | ||
+ | ]] | ||
+ | ======Slide 038====== | ||
+ | [[Image:07_slide038.jpg|frame|none|Lecture 07, Slide 038<br> | ||
+ | Three common formats exist for MSA results. '''MSF''' is a legacy format from the GCG package of sequence alignments, also produced by the EMBOSS tool EMMA, and supported as a valid input format for many programs. Gaps are denoted by periods and checksums are calculated for the sequences and for the alignment. | ||
+ | ]] | ||
+ | ======Slide 039====== | ||
+ | [[Image:07_slide039.jpg|frame|none|Lecture 07, Slide 039<br> | ||
+ | Three common formats exist for MSA results. A '''CLUSTAL''' formatted alignment is the format in most common use. Take care when formatting input files to ensure the '''first 10 characters in your input file are unique''' and contain '''no special characters'''! I have seen programs break on blanks, hyphens and | (pipe). The latter is especially annoying, since the | character is used in NCBI FASTA files to separate the database identifier from the accession number. (More information at the [http://www.ebi.ac.uk/help/formats_frame.html EBI help page on formats].) | ||
]] | ]] | ||
− | |||
− | |||
+ | ======Slide 040====== | ||
+ | [[Image:07_slide040.jpg|frame|none|Lecture 07, Slide 040<br> | ||
+ | It is common and perfectly permissible to manually edit a MSA with some biologically motivated heuristic in mind '''as long as you document what you have done'''! In the early days of MSAs, editing was simply <u>required</u> since the results were often obviously inadequate. In all cases in which the algorithm uses only the input sequences for the alignment, this still holds true. However, regarding the more modern template-based procedures (e.g. SPEM, PROMALS or PRALINE) I would be more reluctant to edit, since we may be actively ignoring/discarding the additional information the algorithm has used. | ||
]] | ]] | ||
− | ======Slide | + | ======Slide 041====== |
− | [[Image: | + | [[Image:07_slide041.jpg|frame|none|Lecture 07, Slide 041<br> |
]] | ]] | ||
− | ======Slide | + | ======Slide 042====== |
− | [[Image: | + | [[Image:07_slide042.jpg|frame|none|Lecture 07, Slide 042<br> |
+ | Jalview is integrated into the EBI multiple sequence alignment services, or you can access [http://www.jalview.org/ '''Jalview''' home page]. | ||
+ | ]] | ||
+ | ======Slide 043====== | ||
+ | [[Image:07_slide043.jpg|frame|none|Lecture 07, Slide 043<br> | ||
+ | [http://bioinformatics.oxfordjournals.org/cgi/content/full/22/4/504 VMD structural alignment: Eargle ''et al.'' (2006) ''Bioinformatics'' '''22''':504-506] | ||
+ | ]] | ||
+ | ======Slide 044====== | ||
+ | [[Image:07_slide044.jpg|frame|none|Lecture 07, Slide 044<br> | ||
+ | [http://pfaat.sourceforge.net/ The '''PFAAT''' homepage] | ||
+ | ]] | ||
+ | ======Slide 045====== | ||
+ | [[Image:07_slide045.jpg|frame|none|Lecture 07, Slide 045<br> | ||
+ | [http://aig.cs.man.ac.uk/research/utopia/cinema/cinema.php The '''CINEMA''' homepage]. | ||
+ | ]] | ||
+ | ======Slide 046====== | ||
+ | [[Image:07_slide046.jpg|frame|none|Lecture 07, Slide 046<br> | ||
+ | Purely for alignment visualization, run it from the [http://www.ch.embnet.org/software/BOX_form.html Embnet BOXSHADE server] or the [http://bioweb.pasteur.fr/seqanal/interfaces/boxshade.html Pasteur institute BOXSHADE server]. The EMBOSS package has tools with similar functionality. | ||
+ | ]] | ||
+ | ======Slide 047====== | ||
+ | [[Image:07_slide047.jpg|frame|none|Lecture 07, Slide 047<br> | ||
]] | ]] | ||
− | ======Slide | + | ======Slide 048====== |
− | [[Image: | + | [[Image:07_slide048.jpg|frame|none|Lecture 07, Slide 048<br> |
]] | ]] | ||
− | ======Slide | + | ======Slide 049====== |
− | [[Image: | + | [[Image:07_slide049.jpg|frame|none|Lecture 07, Slide 049<br> |
]] | ]] | ||
− | ======Slide | + | ======Slide 050====== |
− | [[Image: | + | [[Image:07_slide050.jpg|frame|none|Lecture 07, Slide 050<br> |
]] | ]] | ||
− | |||
− | |||
− | ]] | + | |
+ | <br> | ||
+ | <br> | ||
+ | ---- | ||
+ | <small>[[Lecture_06|(Previous lecture)]] ... [[Lecture_08|(Next lecture)]]</small> |
Latest revision as of 18:57, 7 October 2007
(Previous lecture) ... (Next lecture)
Multiple Sequence Alignment
Objectives for this part of the course
- Understand that MSA is an unsolved, difficult problem with different "best" solutions for different purposes.
- Be familiar with different biological heuristics that distinguish a "good" alignment from a "poor" alignment.
- Understand the importance of benchmarks for assessing the performance of computational tools.
- Be aware of how different biological priorities have resulted in different algorithmic strategies and some of the available tools that represent them.
- Be aware that the most frequently used and referenced tool - CLUSTAL - is no longer state-of-the-art and know which modern tools are much better.
- Confidently be able to survey recent developments and choose an appropriate algorithm.
- Be able to perform and interpret MSAs in practice, know how to prepare input, which formats to use and what common output formats look like.
- Understand strategies to prepare input and improve alignments, based on the requirement of columnwise homology.
- Know about strategies and tools for manual editing of alignments.
Links summary
- Dallas PROMALS Web server
- EBI CLUSTAL web server
- EBI T-Coffee Web server
- EBI MUSCLE Web server
- Stanford PROBCONS server
- Indiana SPEM server
- MUSCA, based on the Teiresias pattern discovery algorithm
- HMMER, a profile hidden Markov model tool
- BAliBASE, BAliBASE 2.0 and BAliBASE 3.0
- EBI help page on formats
- Jalview home page
- Embnet BOXSHADE server
- Wikipedia page on Multiple Sequence Alignment
Exercises
- Read Cedric Notredame's MSA review (2007)
- Read Edgar and Batzoglou's MSA review (2005)
- More exercises will be covered in Assignment 3.
Lecture slides
Uses and Problems
Slide 004
Slide 005
Slide 006
Slide 007
Right, wrong, good and poor
Slide 009
Slide 010
Slide 011
MSA in practice
Slide 013
Slide 014
Slide 015
Slide 016
Slide 017
Slide 019
Slide 020
Slide 021
Slide 022
Slide 023
Slide 024
Slide 025
Slide 026
Slide 027
Slide 028
Slide 029
Slide 030
Slide 031
Slide 032
Slide 033
Slide 034
Slide 035
Editing and printing
Slide 037
Slide 038
Slide 039
Slide 040
Slide 041
Slide 042
Slide 043
Slide 044
Slide 045
Slide 046
Slide 047
Slide 048
Slide 049
Slide 050