Expected Preparations:

  [BIN-ALI] Optimal_sequence_alignment
 
  The units listed above are part of this course and contain important preparatory material.  

Keywords: BLAST algorithm and Web interface; interpretation of BLAST alignments

Objectives:

This unit will …

  • … introduce the BLAST algorithm;

  • … discuss interpretation of BLAST output;

  • … teach how to perform a BLAST search online and through R scripts.

Outcomes:

After working through this unit you …

  • … can run and interpret BLAST searches;

  • … can use BLAST online;

  • … can compose R scripts to execute BLAST searches.


Deliverables:

Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.

Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don’t overlook these.

Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.


Evaluation:

NA: This unit is not evaluated for course marks.

Contents

This unit introduces the BLAST algorithm and lets you practice BLAST searches, both through the Web interface and scripted in R.

BLAST is by a wide margin the most important computational tool of molecular biology. It is so important that we couldn’t even begin our explorations here without it, and thus we have already used BLAST in the BIN-Storing_data unit to find the sequence in MYSPE that is most similar to MBP1_SACCE.

Task…

 

A good, detailed introduction to finding homologues on a database scale, and to recognizing whether similar sequences are indeed homologous, is Pearson (2013)1.

 

Reciprocal Best Matches

 

In this unit we will use BLAST to perform Reciprocal Best Matches.

One of the important questions of model-organism based inference is: which genes perform the same function in two different organisms? In the absence of other information, our best guess is that these are the two genes that are mutually most similar. The keyword here is mutually. If RES2_SCHPO is the best match for MBP1_SACCE among all S. pombe proteins, the two proteins are mutually most similar only if MBP1_SACCE is in turn the best match for RES2_SCHPO among all S. cerevisiae proteins, i.e. if RES2_SCHPO is more similar to MBP1_SACCE than to any other S. cerevisiae protein. We call this a Reciprocal Best Match, or “RBM”2.
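The mutual criterion can be sketched in a few lines of code. This is a minimal illustration in Python, not the course's R code. The best-hit tables are hypothetical toy values (OTHER_SACCE is a made-up identifier); real tables would be compiled from BLAST searches in each direction.

```python
def is_rbm(a, b, best_a_to_b, best_b_to_a):
    """Return True if a and b are each other's best match.

    best_a_to_b maps each protein of organism A to its best hit in organism B;
    best_b_to_a maps each protein of organism B to its best hit in organism A.
    """
    return best_a_to_b.get(a) == b and best_b_to_a.get(b) == a


# Hypothetical best-hit tables (toy values for illustration only).
best_sacce_to_schpo = {
    "MBP1_SACCE": "RES2_SCHPO",
    "OTHER_SACCE": "RES2_SCHPO",   # made-up protein that also hits RES2_SCHPO
}
best_schpo_to_sacce = {
    "RES2_SCHPO": "MBP1_SACCE",
}

# The forward and backward best hits agree: this pair is an RBM.
print(is_rbm("MBP1_SACCE", "RES2_SCHPO",
             best_sacce_to_schpo, best_schpo_to_sacce))

# The forward hit exists, but the backward best hit is a different protein.
print(is_rbm("OTHER_SACCE", "RES2_SCHPO",
             best_sacce_to_schpo, best_schpo_to_sacce))
```

The second call illustrates why a one-directional best hit is not enough: several proteins can share the same best hit, but only one of them can be its reciprocal partner.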

The argument is summarized in the figure below: genes that evolve under continuous selective pressure on their function accumulate mutations at a relatively lower rate, and thus remain more similar to each other, than genes that undergo neo- or sub-functionalization after duplication.

However, there is a catch: proteins are often composed of multiple domains that implement distinct aspects of their function. Under the assumptions above we could hypothesize:

  • a gene in MYSPE that has the “same” function as the Mbp1 cell-cycle checkpoint switch in yeast should be an RBM to Mbp1;

  • a gene that binds to the same DNA sites as Mbp1 should have a DNA-binding domain that is an RBM to the DNA-binding domain of Mbp1.

Thus we’ll compare RBMs in MYSPE for full-length MBP1_SACCE and for its DNA-binding domain, and see whether the results are the same.

A hypothetical phylogenetic gene tree. “S” is a speciation in the tree, “D” is a duplication within a species. The duplicated gene (teal triangle) evolves towards a different function and thus acquires more mutations than its paralogue (teal circle). If an RBM search starts from the teal triangle, it finds the red circle. However, the reciprocal match finds the teal circle. The red and teal circles fulfill the RBM criterion. (RBM.jpg)

 

Full-length RBM

 

You have already performed the first half of the experiment: matching from S. cerevisiae to MYSPE. The backward match is simple.

Task…

  1. Access BLAST and follow the link to the protein blast program.
  2. Enter the RefSeq ID for MBP1_MYSPE in the Query sequence field.
  3. Select refseq_protein as the database to search in, and enter Saccharomyces cerevisiae (taxid:4932) to restrict the organism for which hits are reported.
  4. Run BLAST. Examine the results.

If your top hit is NP_010227, you have confirmed the RBM between MBP1_SACCE and MBP1_MYSPE. If it is not, let me know: I expect the top hit to be NP_010227 and would like to verify your results if it is not3.
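The same search can in principle be submitted programmatically. The following is a minimal sketch in Python (the course itself uses R scripts) that only assembles the request parameters and does not contact the server. It assumes the parameter names of the NCBI BLAST URL API (CMD, PROGRAM, DATABASE, QUERY, ENTREZ_QUERY); check these against the current NCBI documentation before relying on them. YOUR_REFSEQ_ID is a placeholder for the RefSeq ID of MBP1_MYSPE, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the NCBI BLAST URL API.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blastp_put_params(query_id, taxid):
    """Assemble the parameters for submitting a blastp search.

    query_id: RefSeq ID of the query protein
    taxid:    NCBI taxonomy ID used to restrict the reported hits
    """
    return {
        "CMD": "Put",                          # submit a new search
        "PROGRAM": "blastp",                   # protein vs. protein
        "DATABASE": "refseq_protein",          # database to search in
        "QUERY": query_id,
        "ENTREZ_QUERY": f"txid{taxid}[ORGN]",  # organism restriction
    }


# Placeholder query ID; 4932 is the taxid for Saccharomyces cerevisiae
# given in the task above.
params = blastp_put_params("YOUR_REFSEQ_ID", 4932)
print(BLAST_URL + "?" + urlencode(params))
```

Keeping the parameter assembly separate from the network call makes it easy to inspect exactly what would be sent, and mirrors the four steps of the Web form one-to-one.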

 

RBM for the DNA binding domain

 

The DNA-binding domain of MBP1_SACCE is called an APSES domain. If the RBM of Saccharomyces cerevisiae Mbp1 in MYSPE is truly an orthologue, we expect each of the two proteins’ corresponding domains to have the RBM property as well. But let’s not simply assume what we can easily test. We’ll define the sequence of the APSES domain in MBP1_SACCE and in its MYSPE counterpart and see how these definitions are reflected in a BLAST search.
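Once a domain range has been defined, extracting the domain sequence for use as a BLAST query is a matter of slicing. The following minimal Python sketch assumes 1-based, inclusive coordinates, as is usual in sequence annotations. The sequence and the range are toy values chosen for illustration, not the actual APSES annotation, and the function name is hypothetical.

```python
def subsequence(seq, start, end):
    """Return residues start..end of seq (1-based, inclusive)."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("range lies outside the sequence")
    # Python slices are 0-based and exclude the end index.
    return seq[start - 1:end]


# Toy sequence: the twenty one-letter amino acid codes in alphabetical order.
toy_seq = "ACDEFGHIKLMNPQRSTVWY"

# Toy range, NOT the APSES domain coordinates.
print(subsequence(toy_seq, 4, 8))  # EFGHI
```

The explicit range check guards against the most common mistake when moving between annotation coordinates and string indices, namely an off-by-one shift at either end of the range.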

 

Defining the range of the APSES domain annotation

(Transcluded here: section “CDD_APSES” of the page “Reference annotation yeast Mbp1”.)

 

Further Reading

Contents of the BLAST report

Boratyn, Grzegorz M et al. (2013). “BLAST: a more efficient report with usability improvements”. Nucleic Acids Research 41(Web Server issue):W29–33.
[PMID: 23609542] [DOI: 10.1093/nar/gkt282]

The Basic Local Alignment Search Tool (BLAST) website at the National Center for Biotechnology Information (NCBI) is an important resource for searching and aligning sequences. A new BLAST report allows faster loading of alignments, adds navigation aids, allows easy downloading of subject sequences and reports and has improved usability. Here, we describe these improvements to the BLAST report, discuss design decisions, describe other improvements to the search page and database documentation and outline plans for future development. The NCBI BLAST URL is http://blast.ncbi.nlm.nih.gov.

Questions, comments

If in doubt, ask! If anything about this content is not clear to you, do not proceed but ask for clarification. If you have ideas about how to make this material better, let’s hear them. We are aiming to compile a list of FAQs for all learning units, and your contributions will count towards your participation marks.

Improve this page! If you have questions or comments, please post them on the Quercus Discussion board with a subject line that includes the name of the unit.

References

Page ID: BIN-ALI-BLAST

Author:
Boris Steipe ( <boris.steipe@utoronto.ca> )
Created:
2017-08-05
Last modified:
2022-09-14
Version:
1.1
Version History:
–  1.1 2020 Maintenance
–  1.0 Live version 2017
–  0.1 First stub
Tagged with:
–  Unit
–  Live
–  Has lecture slides
–  Links to R course project
–  Contains images
–  Has further reading

 

[END]


  1. ↩︎

  2. Note that RBMs are usually orthologues, but the definitions of orthologue and RBM are not the same. Most importantly, many orthologues are not RBMs. We will explore this more when we discuss phylogenetic inference.↩︎

  3. One such case we encountered involved a protein that has a corrupted annotation for the DNA binding domain. It appears to be the correct orthologue, but it only has the second highest BLAST score.↩︎