Difference between revisions of "FND-Homology"
Revision as of 12:39, 16 September 2020
Concepts and Consequences of Homology
(Concepts of homology; Orthologs; Paralogs)
Abstract:
Homology is the most important concept for bioinformatics, since shared ancestry allows many inferences about the structure and function of proteins. This unit introduces the concept and explores MBP1_MYSPE relationships.
Objectives:
Outcomes:
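Deliverables:
- Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
- Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
- Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.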
Prerequisites:
You need the following preparation before beginning this unit. If you are not familiar with this material from courses you took previously, you need to prepare yourself from other information sources:
- Biomolecules: The molecules of life; nucleic acids and amino acids; the genetic code; protein folding; post-translational modifications and protein biochemistry; membrane proteins; biological function.
- The Central Dogma: Regulation of transcription and translation; protein biosynthesis and degradation; quality control.
- Evolution: Theory of evolution; variation, neutral drift and selection.
This unit builds on material covered in the following prerequisite units:
Task:
- Read the introductory notes on the concept of homology.
Considerations for the MYSPE "Mbp1"
- Consider!
In the BIN-Storing_data unit you found the MYSPE protein that is most similar to yeast Mbp1. Consider whether this protein is homologous to the yeast protein. For most of these questions, you will probably not know the answer right now, but we will find out more in later units.
- Are the sequences similar?
- Obviously you have found the MYSPE sequence as a result of a BLAST search, and you probably know that BLAST finds similar sequences in large databases. But it will almost always find something, and that could be a chance similarity. Significant similarity could be very high, could extend over the whole length of the protein, or could be restricted to individual domains. When would you say: similar enough?
- Do the proteins have similar structures?
- If your protein happens to have had a part of its structure analyzed by X-ray crystallography, you could compare the structures. However, this is unlikely for the Mbp1 relatives - except for the ankyrin domains. These are ubiquitous protein-protein interaction motifs and won't tell us much more than that. It's unlikely that other parts of the MYSPE protein structure are known.
- What about patterns of conserved residues?
- We need more proteins to consider that - and we need to align them.
- Are the proteins known to perform similar functions?
- That might require function prediction. There might be an annotation in the FASTA header of the MYSPE protein - but it's likely to have been made based on homology to the yeast protein. There could be experimental evidence though - check carefully, just in case.
All of these considerations lead to bioinformatics queries that we will pursue in later units.
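To make the "similar enough?" question more concrete, here is a minimal Python sketch that summarises a pairwise alignment by percent identity and by the fraction of the query it covers. The function name, the toy sequences and the query length are hypothetical and only for illustration. Percent identity is computed here over columns without gaps, which is one of several possible conventions and may differ from what a particular tool reports.

```python
def alignment_stats(aligned_query, aligned_subject, query_length):
    """Summarise a gapped pairwise alignment given as two equal-length strings."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    # Columns in which both sequences have a residue (no gap on either side).
    paired = [(q, s) for q, s in zip(aligned_query, aligned_subject)
              if q != "-" and s != "-"]
    identical = sum(1 for q, s in paired if q == s)
    # Number of query residues that take part in the alignment.
    query_residues = sum(1 for q in aligned_query if q != "-")
    return {
        "percent_identity": 100.0 * identical / len(paired) if paired else 0.0,
        "query_coverage": 100.0 * query_residues / query_length,
    }


# Hypothetical toy alignment: a short local match within a 40-residue query.
stats = alignment_stats("MKT-LLVAGSW", "MKTALLIAG-W", query_length=40)
print(stats)
```

A high identity combined with a low coverage, as in this toy case, is the pattern you would expect when similarity is restricted to an individual domain rather than extending over the whole protein.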
Defining orthologs
For functional inference between organisms, the key is to find orthologs.
To be reasonably certain about orthology relationships, one needs to construct and analyze detailed evolutionary trees. This is computationally expensive and the results are not always unambiguous. But a number of different strategies are available that use approximations or precomputed results to define orthologs. These are especially useful for large, cross-genome surveys. They are less useful for detailed analysis of individual genes.
- Orthologs by RBM (Reciprocal Best Match)
- The RBM criterion is only an approximation to orthology, but computationally very tractable and usually correct[1]. To find an RBM, first search for the best match of a gene in the target genome, then check whether that best match retrieves the original query when it is used to search the source genome. You have already done the first step when you identified the best match of yeast Mbp1 in MYSPE. Now do the second step.
Get the ID for the gene which you have identified and annotated as the best BLAST match for Mbp1 in MYSPE and confirm that this gene has Mbp1 as the most significant hit in the yeast proteome. The results are unambiguous, but there may be residual doubt whether these two best-matching sequences are actually the most similar orthologs.
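The two-step logic of the RBM criterion can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for the BLAST searches in the task below: the gene names "myspe_1" and "myspe_2" and all scores are hypothetical, and the sketch assumes that the similarity scores of both search directions are already available as nested dictionaries.

```python
from typing import Dict

Scores = Dict[str, Dict[str, float]]


def best_hit(hits: Dict[str, float]) -> str:
    """Return the subject with the highest score for one query."""
    return max(hits, key=hits.get)


def is_reciprocal_best_match(gene: str, forward: Scores, backward: Scores) -> bool:
    """True if the best hit of `gene` retrieves `gene` again as its own best hit."""
    if not forward.get(gene):
        return False
    candidate = best_hit(forward[gene])
    if not backward.get(candidate):
        return False
    return best_hit(backward[candidate]) == gene


# Hypothetical scores: yeast queries against MYSPE, and MYSPE queries against yeast.
yeast_vs_myspe = {
    "Mbp1": {"myspe_1": 310.0, "myspe_2": 120.0},
    "Swi4": {"myspe_1": 150.0, "myspe_2": 95.0},
}
myspe_vs_yeast = {
    "myspe_1": {"Mbp1": 305.0, "Swi4": 148.0},
    "myspe_2": {"Mbp1": 118.0, "Swi4": 90.0},
}

print(is_reciprocal_best_match("Mbp1", yeast_vs_myspe, myspe_vs_yeast))
print(is_reciprocal_best_match("Swi4", yeast_vs_myspe, myspe_vs_yeast))
```

In this toy data the second query also has "myspe_1" as its best hit, but the reverse search from "myspe_1" returns Mbp1, so only the first pair satisfies the criterion.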
Task:
- Navigate to the BLAST homepage and access the protein BLAST page.
- Copy the RefSeq identifier for MBP1_MYSPE from your journal into the search field. (You can search directly with an NCBI identifier IF you want to search with the full-length sequence.)
- Set the database to refseq.
- Restrict the species to Saccharomyces cerevisiae.
- Run BLAST.
- Keep the window open for the next task.
The top hit should be yeast Mbp1 (NP_010227). Discuss on the list if it is not.
If the top hit is NP_010227, you have confirmed the RBM or BBH criterion (Reciprocal Best Match or Bidirectional Best Hit, respectively).
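If you prefer to check the top hit programmatically rather than by eye, the following Python sketch shows one way to do it. It assumes that the results have been exported as a tab-separated hit table in the standard 12-column layout (as produced, for example, by BLAST+ with `-outfmt 6`). The query label, the second subject identifier and all numbers in the example table are hypothetical.

```python
import csv

# Column order of the standard 12-column BLAST tabular format.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hit(tabular_text):
    """Return (subject id, E-value) of the hit with the highest bit score."""
    # Skip blank lines and comment lines that some exports add.
    lines = [ln for ln in tabular_text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    rows = list(csv.DictReader(lines, fieldnames=COLUMNS, delimiter="\t"))
    if not rows:
        raise ValueError("no hits found in the table")
    best = max(rows, key=lambda row: float(row["bitscore"]))
    return best["sseqid"], float(best["evalue"])


# Hypothetical hit table with two hits for one query.
hits = "\n".join([
    "# hypothetical hit table",
    "\t".join(["MBP1_MYSPE", "HYPOTHETICAL_2", "31.0", "250", "160", "5",
               "300", "540", "310", "555", "2e-20", "95.1"]),
    "\t".join(["MBP1_MYSPE", "NP_010227.1", "48.5", "130", "60", "2",
               "5", "132", "4", "133", "3e-35", "140.0"]),
])

subject, evalue = top_hit(hits)
# Compare without the version suffix of the accession.
print(subject, evalue, subject.split(".")[0] == "NP_010227")
```

The sketch selects the hit by bit score instead of relying on the row order, since the order of rows in an exported table is an assumption that is easy to get wrong.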
Task:
Explain to someone you know why RBM is expected to find orthologous pairs of genes. Don't paraphrase the fact that they do, or merely describe how an RBM analysis works, but explain why we can expect it to be successful in identifying an evolutionary relationship when all we have are measures of pairwise similarity.
If you can't figure it out, ask on the mailing list.
- Orthology by annotation
- The NCBI precomputes groups of related genes and makes them available via the HomoloGene database from the RefSeq database entry for your protein.
Task:
- Navigate to the RefSeq protein page for MBP1_MYSPE. (There should be a link from the query identifier in your BLAST result page).
- Follow the HomoloGene link in the right-hand menu under Related information. (Follow the link to MBP1_SACCE if your species has not been annotated and there is no HomoloGene link from your protein's page.)
You should see a number of genes that are considered homologous in other fungi, but there is no way to tell whether these are orthologues. The links to proteins with shared domains show you that there are several that share (non-specific) ankyrin domains, and only a few that also have the (highly specific) KilA-N (or APSES) domain.
- Orthologs by eggNOG
- The eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups) database at the EMBL contains orthologous groups of genes. It seems to be continuously updated, and the search functionality is reasonable. Try the search with the MBP1_MYSPE RefSeq identifier. What I see are orthologs annotated in non-fungi, but only on the basis of the ankyrin domain, which is a meaningless relationship. Alignments and trees are also available, as are database downloads for algorithmic analysis.
- Orthologs at OrthoDB
- OrthoDB includes a large number of species, among them all of our protein-sequenced fungi. However, the search function (by keyword - try "Mbp1") retrieves many paralogs together with the orthologs. For example, the yeast Soc2 and Phd1 proteins are found in the same orthologous group, although these two are clearly paralogs. Again, the results focus on ankyrin-domain containing proteins.
- Orthologs at OMA
- OMA (the Orthologous Matrix), maintained at the Swiss Federal Institute of Technology, contains a large number of orthologs from sequenced genomes. Searching with the RefSeq identifier of MBP1_MYSPE will probably retrieve hits that you can access via the "Orthologs" tab. As a whole, this database is well constructed, the output is useful, and data is available for download and API access; this would be my first choice of resource for pre-computed orthology queries.
- Orthologs by syntenic gene order conservation
- OMA also provides synteny information, one hallmark of an orthologous relationship (Why?).
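To see what "conserved gene order" means in practice, here is a minimal Python sketch that counts how many chromosomal neighbours of a gene have their counterparts located next to the candidate ortholog in the other genome. It does not use OMA data or the OMA interface: the gene identifiers, the gene orders and the ortholog mapping are hypothetical, and linear chromosomes with a fixed neighbourhood window are assumed.

```python
def neighbours(order, gene, window=2):
    """Genes within `window` positions of `gene` on a linear chromosome."""
    i = order.index(gene)
    return set(order[max(0, i - window):i] + order[i + 1:i + 1 + window])


def conserved_neighbours(gene_a, gene_b, order_a, order_b, orthologs, window=2):
    """Count neighbours of gene_a whose mapped counterparts lie near gene_b."""
    near_b = neighbours(order_b, gene_b, window)
    return sum(1 for g in neighbours(order_a, gene_a, window)
               if orthologs.get(g) in near_b)


# Hypothetical gene orders in two genomes and a hypothetical ortholog mapping.
order_a = ["a1", "a2", "a3", "a4", "a5"]
order_b = ["b9", "b2", "b3", "b1", "b4", "b7"]
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4", "a5": "b5"}

count = conserved_neighbours("a3", "b3", order_a, order_b, orthologs)
print(count)
```

A count close to the size of the neighbourhood indicates that the surrounding genes have stayed together in both genomes, even if their local order has been shuffled.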
Self-evaluation
Notes
Further reading, links and resources
Koonin (2005) Orthologs, paralogs, and evolutionary genomics. Annu Rev Genet 39:309-38. (pmid: 16285863)
If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
About ...
Author:
- Boris Steipe <boris.steipe@utoronto.ca>
Created:
- 2017-08-05
Modified:
- 2017-09-30
Version:
- 1.0
Version history:
- 1.0 First live version
- 0.1 First stub
This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.