Difference between revisions of "ABC-INT-Phylogeny"

From "A B C"
 
(24 intermediate revisions by the same user not shown)
Line 1: Line 1:
<div id="BIO">
+
<div id="ABC">
  <div class="b1">
+
<div style="padding:5px; border:4px solid #000000; background-color:#e19fa7; font-size:300%; font-weight:400; color: #000000; width:100%;">
Integration Unit: Phylogeny
+
Integrator Unit: Phylogeny
  </div>
+
<div style="padding:5px; margin-top:20px; margin-bottom:10px; background-color:#e19fa7; font-size:30%; font-weight:200; color: #000000; ">
 +
(Integrator unit: calculate and analyse a phylogenetic tree)
 +
</div>
 +
</div>
 +
 
 +
{{Smallvspace}}
 +
 
  
  {{Vspace}}
+
<div style="padding:5px; border:1px solid #000000; background-color:#e19fa733; font-size:85%;">
 
+
<div style="font-size:118%;">
<div class="keywords">
+
<b>Abstract:</b><br />
<b>Keywords:</b>&nbsp;
+
<section begin=abstract />
Integration unit: calculate and analyse a phylogenetic tree
+
This page integrates material from the learning units for working with multiple sequence alignments, and building and analysing phylogenetic trees, in a task for evaluation.
 +
<section end=abstract />
 
</div>
 
</div>
 +
<!-- ============================  -->
 +
<hr>
 +
<b>Deliverables:</b><br />
 +
<section begin=deliverables />
 +
<li><b>Integrator unit</b>: Deliverables can be submitted for course marks. See below for details.</li>
 +
<section end=deliverables />
 +
<!-- ============================  -->
 +
<hr>
 +
<section begin=prerequisites />
 +
<b>Prerequisites:</b><br />
 +
This unit builds on material covered in the following prerequisite units:<br />
 +
*[[BIN-PHYLO-Tree_analysis|BIN-PHYLO-Tree_analysis (Analysing Phylogenetic Trees)]]
 +
<section end=prerequisites />
 +
<!-- ============================  -->
 +
</div>
 +
 +
{{Smallvspace}}
 +
  
{{Vspace}}
+
 
 +
{{Smallvspace}}
  
  
Line 19: Line 45:
  
  
{{STUB}}
+
=== Evaluation ===
 +
This "Integrator Unit" should be submitted for evaluation for a maximum of 13 marks if you choose one of the written deliverables, or 24 marks if you choose it for your oral test<ref>Note: the oral test is cumulative. It will focus on the content of this unit but will also cover other material that leads up to it.</ref>.
 +
:Please note the evaluation types that are available as options for this unit.
 +
:Be mindful of the [[ABC-Rubrics| '''Marking rubrics''']].
 +
:If this is submitted for your oral test, please read the [[BCH441 Oral Test instructions|Oral test instructions]] before you begin.
 +
:If your submission includes R code, please read the [[BCH441 Code submisson instructions|Code submission instructions]] before you begin.
  
{{Vspace}}
+
Once you have chosen an option ...
 +
<ol>
 +
<li>Create a new page on the student Wiki as a subpage of your User Page.</li>
 +
<li>Put all of your writing to submit on this one page.</li>
  
 +
<li>When you are done with everything, go to the [https://q.utoronto.ca/courses/180416/assignments Quercus '''Assignments''' page] and open the appropriate '''Integrator Unit''' assignment. Paste the URL of your Wiki page into the form, and click on '''Submit Assignment'''.</li>
 +
</ol>
  
</div>
+
Your link can be submitted only once and cannot be edited afterwards. You may change your Wiki page at any time, but only the last version before the due date will be marked; all later edits will be silently ignored.
<div id="ABC-unit-framework">
 
== Abstract ==
 
<!-- included from "../components/ABC-INT-Phylogeny.components.wtxt", section: "abstract" -->
 
This page assesses the learning units for working with multiple sequence alignments and structure data.
 
  
*Use an Mbp1 orthologue (RBM)
+
{{Smallvspace}}
*Find all orthologues and paralogues in YFO and reference species
 
*Collect, rename, align
 
*Prepare for phylogenetic analysis
 
*Calculate tree
 
*Analyze informativeness
 
*Prepare reference tree from 28S rRNA (also see https://bmcevolbiol.biomedcentral.com/articles/10.1186/1471-2148-11-152 and https://academic.oup.com/sysbio/article/61/5/835/1736260/Phylogenetic-Signal-and-Noise-Predicting-the-Power)
 
*Interpret
 
  
{{Vspace}}
+
;Report option
 +
* Work through the tasks described below.
 +
* Document your results in a short technical report on a subpage of your User page on the Student Wiki. Describe your methods in enough detail that your analysis can be reproduced exactly. If you write R code, include the code in your report.
 +
* When you are done, submit the link to your page via Quercus as described above.
  
 +
{{Smallvspace}}
  
== This unit ... ==
+
;Interview option
=== Prerequisites ===
+
: Identify a laboratory whose work includes constructing and evaluating phylogenetic trees. Get in touch with the PI, a postdoc, or a senior graduate student in the laboratory and interview them <!-- in person or --> by email. Make sure they understand that this is a for-credit assignment in a course you are taking.<ref>You may CC me on correspondence if you wish.</ref> Devise meaningful questions to find out:
<!-- included from "../components/ABC-INT-Phylogeny.components.wtxt", section: "prerequisites" -->
+
:* why this work is important;
<!-- included from "ABC-unit_components.wtxt", section: "notes-prerequisites" -->
+
:* what methods they employ;
You need to complete the following units before beginning this one:
+
:* in particular, how they define reference species, create alignments, compute phylogenetic trees, estimate significance and prepare figures for publication (get technical on that point: you need to report on software and key parameters);
*[[BIN-PHYLO-Selective_pressure]]
+
:* what they have recently learned;
 +
:* what the major challenges, current discussions, or controversies in their field are.
 +
:* write up your interview on a subpage of your User page of the Student Wiki;
 +
:* add background information that may be required to understand the methodology (assume the level of background knowledge appropriate for a student in this course);
 +
:* make sure that you have included important literature references.
 +
:* Follow up with further questions if additional clarification is needed.
 +
* When you are done, submit the link to your page via Quercus as described above.
  
{{Vspace}}
+
{{Smallvspace}}
  
 +
;Literature research option
 +
:Navigate to the [http://steipe.biochemistry.utoronto.ca/abc/students/index.php/ABC-INT-Phylogeny_topics '''Phylogeny Literature Research Topics'''] page on the Student Wiki.
 +
:* Pick a topic and enter your name in the table to claim it.
 +
:* Write a report on your research. Note: this is not a review, but a report. Think of a "whitepaper", not a publication. Write for a specialist technical audience and be specific enough to provide actionable information.
 +
:* write your report on a subpage of your User page of the Student Wiki;
 +
:* make sure that you have cited all sources in your report and listed them in an appropriate reference section. Use the <tt><nowiki>{{#pmid:0000000}}</nowiki></tt> template. Where appropriate, a reference must point to the page where the information is found, not merely to the paper as a whole.
 +
* When you are done, submit the link to your page via Quercus as described above.
  
=== Objectives ===
+
{{Smallvspace}}
<!-- included from "../components/ABC-INT-Phylogeny.components.wtxt", section: "objectives" -->
 
...
 
  
{{Vspace}}
+
;Oral test option
 +
* Work through the tasks described below. Remember to document your work in your journal, but there is no need to format this specially as a report.
 +
* Part of your task will involve writing R code; refer to the [[BCH441 Code submisson instructions|Code submission instructions]] and link to your page from your Journal.
 +
* Note that the work must be completed [[BCH441 Oral Test instructions| '''before''' your actual test date.]]
  
 +
{{Smallvspace}}
 +
;R code option
 +
* Work through the tasks described in the scenario and develop code as required.
 +
* Put your code and other documentation on a subpage of your User page on the Student Wiki;
 +
* When you are done, submit the link to your page via Quercus as described above.
  
=== Outcomes ===
+
== Contents ==
<!-- included from "../components/ABC-INT-Phylogeny.components.wtxt", section: "outcomes" -->
 
...
 
  
{{Vspace}}
 
  
 +
=== For the Report Option ...===
  
=== Deliverables ===
+
Choose '''one''' of the two tasks below:
<!-- included from "../components/ABC-INT-Phylogeny.components.wtxt", section: "deliverables" -->
+
{{Smallvspace}}
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-milestone" -->
+
;Does masking improve the tree?
*<b>No separate deliverables</b>: This unit collects other units and has no deliverables on its own.
+
{{Smallvspace}}
 +
{{task|
 +
# Produce a phylogenetic tree from full-length Mbp1 orthologues for the reference species and MYSPE. Do not apply masking.
 +
# Produce a second phylogenetic tree from full-length Mbp1 orthologues for the reference species and MYSPE. Apply masking to delete all columns that have more than 2/3 gap characters.
 +
# Determine which tree is "more correct" by calculating tree distances to the species tree.
 +
# Report your findings.
 +
}}
 +
{{Smallvspace}}
 +
;Does adding characters improve the tree?
 +
{{Smallvspace}}
 +
{{task|
 +
# Produce a phylogenetic tree only from '''APSES domains''' of Mbp1 orthologues for the reference species and MYSPE. Do not apply masking.
 +
# Produce a second phylogenetic tree from '''full-length''' Mbp1 orthologues for the reference species and MYSPE. Again, do not apply masking.
 +
# Determine which tree is "more correct" by calculating tree distances to the species tree.
 +
# Report your findings.
 +
}}
  
 
{{Vspace}}
 
{{Vspace}}
  
 +
=== For the Oral Test Option ...===
  
=== Evaluation ===
+
Interpret the full APSES tree.
<!-- included from "../components/ABC-INT-Phylogeny.components.wtxt", section: "evaluation" -->
+
{{Smallvspace}}
<!-- included from "ABC-unit_components.wtxt", section: "eval-none" -->
+
{{task|1=
<b>Evaluation: NA</b><br />
+
* Produce a phylogenetic tree from APSES domains of all proteins in myDB. This includes all APSES domain proteins from MYSPE that you have found with PSI-BLAST. (Caution: the proml program may take quite long to compute this tree: several hours, or overnight. Don't choose this option if you don't have sufficient time before the date of your test.) You will find this bit of code useful to get you started:
:This unit is not evaluated for course marks.
+
<pre>
 +
library(msa)
  
{{Vspace}}
+
# Align all sequences in the database + KILA_ESCCO
 +
mySeq <- myDB$protein$sequence
 +
names(mySeq) <- myDB$protein$name
 +
mySeq <- c(mySeq,
 +
          "IDGEIIHLRAKDGYINATSMCRTAGKLLSDYTRLKTTQEFFDELSRDMGIPISELIQSFKGGRPENQGTWVHPDIAINLAQ")
 +
names(mySeq)[length(mySeq)] <- "KILA_ESCCO"
  
 +
mySeqMSA <- msaClustalOmega(AAStringSet(mySeq)) # too many sequences for MUSCLE
  
</div>
 
<div id="BIO">
 
== Contents ==
 
<!-- included from "../components/ABC-INT-Phylogeny.components.wtxt", section: "contents" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "milestone" -->
 
This is a "milestone unit". Its purpose is merely to collect a number of preparatory units into a single, common prerequisite. It has no contents of its own; you are expected to be familiar and competent with all preparatory material at this point.
 
  
{{Vspace}}
+
# get the sequence of the SACCE APSES domain
 +
sel <- myDB$protein$name == "MBP1_SACCE"
 +
proID <- myDB$protein$ID[sel]
  
 +
sel <- myDB$feature$ID[myDB$feature$name == "APSES fold"]
 +
fanID <- myDB$annotation$ID[myDB$annotation$proteinID == proID &
 +
                            myDB$annotation$featureID == sel]
 +
start <- myDB$annotation$start[fanID]
 +
end  <- myDB$annotation$end[fanID]
  
== Further reading, links and resources ==
+
SACCEapses <- substring(myDB$protein$sequence[proID], start, end)
<!-- {{#pmid: 19957275}} -->
 
<!-- {{WWW|WWW_GMOD}} -->
 
<!-- <div class="reference-box">[http://www.ncbi.nlm.nih.gov]</div> -->
 
  
{{Vspace}}
+
# extract the APSES domains from the MSA
 +
APSESmsa <- fetchMSAmotif(mySeqMSA, SACCEapses)
  
 +
# Produce the phylogenetic tree ...
 +
</pre>
  
== Notes ==
+
* Interpret the tree with two objectives.
<!-- included from "../components/ABC-INT-Phylogeny.components.wtxt", section: "notes" -->
+
:* (A) how many APSES domain proteins did the last common ancestor (LCA) of all fungi have?
<!-- included from "ABC-unit_components.wtxt", section: "notes" -->
+
:* (B) what is the evolutionary history of the APSES domain proteins in MYSPE? Were genes lost? Did duplications occur?
<references />
+
* Annotate your tree. Be prepared to screen-share your annotation document during the test and to discuss your interpretation.
 +
}}
  
 
{{Vspace}}
 
{{Vspace}}
  
 +
=== For the R-code option ...===
  
</div>
+
;Does adding information improve the tree?
<div id="ABC-unit-framework">
+
{{Smallvspace}}
== Self-evaluation ==
+
{{task|1=
<!-- included from "../components/ABC-INT-Phylogeny.components.wtxt", section: "self-evaluation" -->
+
Here we compare our original tree with one that was produced after adding additional sequences to the tree-building step while keeping the same alignment.
<!--
 
=== Question 1===
 
  
Question ...
+
* Produce an MSA from APSES domains of all proteins in myDB.
 +
<pre>
 +
library(msa)
  
<div class="toccolours mw-collapsible mw-collapsed" style="width:800px">
+
# Align all sequences in the database + KILA_ESCCO
Answer ...
+
mySeq <- myDB$protein$sequence
<div class="mw-collapsible-content">
+
names(mySeq) <- myDB$protein$name
Answer ...
+
mySeq <- c(mySeq,
 +
          "IDGEIIHLRAKDGYINATSMCRTAGKLLSDYTRLKTTQEFFDELSRDMGIPISELIQSFKGGRPENQGTWVHPDIAINLAQ")
 +
names(mySeq)[length(mySeq)] <- "KILA_ESCCO"
  
</div>
+
mySeqMSA <- msaClustalOmega(AAStringSet(mySeq)) # too many sequences for MUSCLE
  </div>
 
  
  {{Vspace}}
 
  
-->
+
# get the sequence of the SACCE APSES domain
 +
sel <- myDB$protein$name == "MBP1_SACCE"
 +
proID <- myDB$protein$ID[sel]
  
{{Vspace}}
+
sel <- myDB$feature$ID[myDB$feature$name == "APSES fold"]
 +
fanID <- myDB$annotation$ID[myDB$annotation$proteinID == proID &
 +
                            myDB$annotation$featureID == sel]
 +
start <- myDB$annotation$start[fanID]
 +
end  <- myDB$annotation$end[fanID]
  
 +
SACCEapses <- substring(myDB$protein$sequence[proID], start, end)
  
 +
# extract the APSES domains from the MSA
 +
APSESmsa <- fetchMSAmotif(mySeqMSA, SACCEapses)
 +
</pre>
  
{{Vspace}}
+
* Write an R-script that does the following:
 +
** pick ten random sequences plus the Mbp1 orthologues plus <tt>KILA_ESCCO</tt>
 +
** remove all other sequences from the alignment
 +
** mask all columns that have more than 80% gap characters
 +
** produce a phylogenetic tree from this input data
 +
** drop all tips from your tree that are not Mbp1 orthologues and not <tt>KILA_ESCCO</tt>. This code will be useful:
 +
<pre>
 +
# assuming your new tree is called "allApsTree"
 +
sel <- ! (allApsTree$tip.label %in% fungiTree$tip.label)
 +
newTree <- drop.tip(allApsTree, allApsTree$tip.label[sel])
 +
</pre>
 +
* Is this tree more similar to <tt>fungiTree</tt> than <tt>apsTree</tt> was?
 +
* Submit your script, significant data, and the results. Interpret the outcome.
  
 +
}}
  
<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_ask" -->
+
{{Vspace}}
  
----
+
== Further reading, links and resources ==
 +
<!-- {{#pmid: 19957275}} -->
 +
<!-- {{WWW|WWW_GMOD}} -->
 +
<!-- <div class="reference-box">[http://www.ncbi.nlm.nih.gov]</div> -->
 +
== Notes ==
 +
<references />
  
 
{{Vspace}}
 
{{Vspace}}
  
<b>If in doubt, ask!</b> If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
 
 
----
 
 
{{Vspace}}
 
  
 
<div class="about">
 
<div class="about">
Line 157: Line 252:
 
:2017-08-05
 
:2017-08-05
 
<b>Modified:</b><br />
 
<b>Modified:</b><br />
:2017-08-09
+
:2020-12-07
 
<b>Version:</b><br />
 
<b>Version:</b><br />
:0.1
+
:1.4
 
<b>Version history:</b><br />
 
<b>Version history:</b><br />
 +
*1.4 Regenerated inadvertently deleted "Report option" instructions.
 +
*1.3 Edit policy update
 +
*1.2 2020 Updates
 +
*1.1 Corrected posted marks, which were not consistent with the description in the syllabus.
 +
*1.0 First live version
 
*0.1 First stub
 
*0.1 First stub
 
</div>
 
</div>
[[Category:ABC-units]]
 
<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_footer" -->
 
  
 
{{CC-BY}}
 
{{CC-BY}}
  
 +
[[Category:ABC-units]]
 +
{{INTEGRATOR}}
 +
{{LIVE}}
 +
{{EVAL}}
 
</div>
 
</div>
 
<!-- [END] -->
 
<!-- [END] -->

Latest revision as of 21:27, 8 December 2020

Integrator Unit: Phylogeny

(Integrator unit: calculate and analyse a phylogenetic tree)


 


Abstract:

This page integrates material from the learning units for working with multiple sequence alignments, and building and analysing phylogenetic trees, in a task for evaluation.


Deliverables:

  • Integrator unit: Deliverables can be submitted for course marks. See below for details.

  • Prerequisites:
    This unit builds on material covered in the following prerequisite units:
    • BIN-PHYLO-Tree_analysis (Analysing Phylogenetic Trees)


    Evaluation

    This "Integrator Unit" should be submitted for evaluation for a maximum of 13 marks if you choose one of the written deliverables, or 24 marks if you choose it for your oral test[1].

    Please note the evaluation types that are available as options for this unit.
    Be mindful of the Marking rubrics.
    If this is submitted for your oral test, please read the Oral test instructions before you begin.
    If your submission includes R code, please read the Code submission instructions before you begin.

    Once you have chosen an option ...

    1. Create a new page on the student Wiki as a subpage of your User Page.
    2. Put all of your writing to submit on this one page.
    3. When you are done with everything, go to the Quercus Assignments page and open the appropriate Integrator Unit assignment. Paste the URL of your Wiki page into the form, and click on Submit Assignment.

    Your link can be submitted only once and cannot be edited afterwards. You may change your Wiki page at any time, but only the last version before the due date will be marked; all later edits will be silently ignored.


     
    Report option
    • Work through the tasks described below.
    • Document your results in a short technical report on a subpage of your User page on the Student Wiki. Describe your methods in enough detail that your analysis can be reproduced exactly. If you write R code, include the code in your report.
    • When you are done, submit the link to your page via Quercus as described above.


     
    Interview option
    Identify a laboratory whose work includes constructing and evaluating phylogenetic trees. Get in touch with the PI, a postdoc, or a senior graduate student in the laboratory and interview them by email. Make sure they understand that this is a for-credit assignment in a course you are taking.[2] Devise meaningful questions to find out:
    • why this work is important;
    • what methods they employ;
    • in particular, how they define reference species, create alignments, compute phylogenetic trees, estimate significance and prepare figures for publication (get technical on that point: you need to report on software and key parameters);
    • what they have recently learned;
    • what the major challenges, current discussions, or controversies in their field are.
    • write up your interview on a subpage of your User page of the Student Wiki;
    • add background information that may be required to understand the methodology (assume the level of background knowledge appropriate for a student in this course);
    • make sure that you have included important literature references.
    • Follow up with further questions if additional clarification is needed.
    • When you are done, submit the link to your page via Quercus as described above.


     
    Literature research option
    Navigate to the Phylogeny Literature Research Topics page on the Student Wiki.
    • Pick a topic and enter your name in the table to claim it.
    • Write a report on your research. Note: this is not a review, but a report. Think of a "whitepaper", not a publication. Write for a specialist technical audience and be specific enough to provide actionable information.
    • write your report on a subpage of your User page of the Student Wiki;
    • make sure that you have cited all sources in your report and listed them in an appropriate reference section. Use the {{#pmid:0000000}} template. Where appropriate, a reference must point to the page where the information is found, not merely to the paper as a whole.
    • When you are done, submit the link to your page via Quercus as described above.


     
    Oral test option
    • Work through the tasks described below. Remember to document your work in your journal, but there is no need to format this specially as a report.
    • Part of your task will involve writing R code; refer to the Code submission instructions and link to your page from your Journal.
    • Note that the work must be completed before your actual test date.


     
    R code option
    • Work through the tasks described in the scenario and develop code as required.
    • Put your code and other documentation on a subpage of your User page on the Student Wiki;
    • When you are done, submit the link to your page via Quercus as described above.

    Contents

    For the Report Option ...

    Choose one of the two tasks below:

     
    Does masking improve the tree?
     

    Task:

    1. Produce a phylogenetic tree from full-length Mbp1 orthologues for the reference species and MYSPE. Do not apply masking.
    2. Produce a second phylogenetic tree from full-length Mbp1 orthologues for the reference species and MYSPE. Apply masking to delete all columns that have more than 2/3 gap characters.
    3. Determine which tree is "more correct" by calculating tree distances to the species tree.
    4. Report your findings.
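
The masking step in item 2 can be sketched in base R as follows. This is only an illustration under stated assumptions: the alignment is held as a character matrix with one row per sequence and one column per alignment position, gaps are coded as "-", and `maskGapColumns()` is a hypothetical helper name, not a function from the course materials.

```r
# Hypothetical helper: drop alignment columns whose gap fraction exceeds a limit.
# Assumes the alignment is a character matrix (rows = sequences,
# columns = alignment positions) in which gaps are coded as "-".
maskGapColumns <- function(aliMat, maxGapFraction = 2/3) {
  gapFraction <- colMeans(aliMat == "-")     # fraction of gaps in each column
  aliMat[, gapFraction <= maxGapFraction, drop = FALSE]
}

# Invented toy alignment: four sequences, five columns
toyAli <- rbind(seqA = c("M", "-", "K", "-", "L"),
                seqB = c("M", "-", "R", "A", "L"),
                seqC = c("M", "-", "K", "-", "-"),
                seqD = c("M", "-", "K", "-", "L"))

masked <- maskGapColumns(toyAli)             # columns 2 and 4 exceed 2/3 gaps
maskedSeq <- apply(masked, 1, paste, collapse = "")
maskedSeq
```

Keeping the threshold as a parameter makes it easy to rerun the comparison with a different cutoff and to state the exact value in the report.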
     
    Does adding characters improve the tree?
     

    Task:

    1. Produce a phylogenetic tree only from APSES domains of Mbp1 orthologues for the reference species and MYSPE. Do not apply masking.
    2. Produce a second phylogenetic tree from full-length Mbp1 orthologues for the reference species and MYSPE. Again, do not apply masking.
    3. Determine which tree is "more correct" by calculating tree distances to the species tree.
    4. Report your findings.
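
One way to make "more correct" concrete in item 3 is a topological distance between each gene tree and the species tree. The sketch below uses `dist.topo()` from the ape package with its default method; the prerequisite units may prescribe a different distance measure, so treat the choice of function as an assumption. The three Newick strings are invented toy trees, not results.

```r
library(ape)

# Invented toy trees on the same five tips (unrooted Newick notation)
speciesTree <- read.tree(text = "((A,B),(C,D),E);")
treeOne     <- read.tree(text = "((A,B),(C,D),E);")   # same topology
treeTwo     <- read.tree(text = "((A,C),(B,D),E);")   # different topology

# Topological distance of each candidate tree to the species tree;
# the smaller value marks the tree that agrees better with the reference.
dOne <- as.numeric(dist.topo(speciesTree, treeOne))
dTwo <- as.numeric(dist.topo(speciesTree, treeTwo))
c(treeOne = dOne, treeTwo = dTwo)
```

All trees being compared need the same set of tip labels, so tips that are missing from the species tree have to be dropped before computing the distance.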


     

    For the Oral Test Option ...

    Interpret the full APSES tree.

     

    Task:

    • Produce a phylogenetic tree from APSES domains of all proteins in myDB. This includes all APSES domain proteins from MYSPE that you have found with PSI-BLAST. (Caution: the proml program may take quite long to compute this tree: several hours, or overnight. Don't choose this option if you don't have sufficient time before the date of your test.) You will find this bit of code useful to get you started:
    library(msa)
    
    # Align all sequences in the database + KILA_ESCCO
    mySeq <- myDB$protein$sequence
    names(mySeq) <- myDB$protein$name
    mySeq <- c(mySeq,
               "IDGEIIHLRAKDGYINATSMCRTAGKLLSDYTRLKTTQEFFDELSRDMGIPISELIQSFKGGRPENQGTWVHPDIAINLAQ")
    names(mySeq)[length(mySeq)] <- "KILA_ESCCO"
    
    mySeqMSA <- msaClustalOmega(AAStringSet(mySeq)) # too many sequences for MUSCLE
    
    
    # get the sequence of the SACCE APSES domain
    sel <- myDB$protein$name == "MBP1_SACCE"
    proID <- myDB$protein$ID[sel]
    
    sel <- myDB$feature$ID[myDB$feature$name == "APSES fold"]
    fanID <- myDB$annotation$ID[myDB$annotation$proteinID == proID &
                                myDB$annotation$featureID == sel]
    start <- myDB$annotation$start[fanID]
    end   <- myDB$annotation$end[fanID]
    
    SACCEapses <- substring(myDB$protein$sequence[proID], start, end)
    
    # extract the APSES domains from the MSA
    APSESmsa <- fetchMSAmotif(mySeqMSA, SACCEapses)
    
    # Produce the phylogenetic tree ...
    
    • Interpret the tree with two objectives.
    • (A) how many APSES domain proteins did the last common ancestor (LCA) of all fungi have?
    • (B) what is the evolutionary history of the APSES domain proteins in MYSPE? Were genes lost? Did duplications occur?
    • Annotate your tree. Be prepared to screen-share your annotation document during the test and to discuss your interpretation.


     

    For the R-code option ...

    Does adding information improve the tree?
     

    Task:
    Here we compare our original tree with one that was produced after adding additional sequences to the tree-building step while keeping the same alignment.

    • Produce an MSA from APSES domains of all proteins in myDB.
    library(msa)
    
    # Align all sequences in the database + KILA_ESCCO
    mySeq <- myDB$protein$sequence
    names(mySeq) <- myDB$protein$name
    mySeq <- c(mySeq,
               "IDGEIIHLRAKDGYINATSMCRTAGKLLSDYTRLKTTQEFFDELSRDMGIPISELIQSFKGGRPENQGTWVHPDIAINLAQ")
    names(mySeq)[length(mySeq)] <- "KILA_ESCCO"
    
    mySeqMSA <- msaClustalOmega(AAStringSet(mySeq)) # too many sequences for MUSCLE
    
    
    # get the sequence of the SACCE APSES domain
    sel <- myDB$protein$name == "MBP1_SACCE"
    proID <- myDB$protein$ID[sel]
    
    sel <- myDB$feature$ID[myDB$feature$name == "APSES fold"]
    fanID <- myDB$annotation$ID[myDB$annotation$proteinID == proID &
                                myDB$annotation$featureID == sel]
    start <- myDB$annotation$start[fanID]
    end   <- myDB$annotation$end[fanID]
    
    SACCEapses <- substring(myDB$protein$sequence[proID], start, end)
    
    # extract the APSES domains from the MSA
    APSESmsa <- fetchMSAmotif(mySeqMSA, SACCEapses)
    
    • Write an R-script that does the following:
      • pick ten random sequences plus the Mbp1 orthologues plus KILA_ESCCO
      • remove all other sequences from the alignment
      • mask all columns that have more than 80% gap characters
      • produce a phylogenetic tree from this input data
      • drop all tips from your tree that are not Mbp1 orthologues and not KILA_ESCCO. This code will be useful:
    # assuming your new tree is called "allApsTree"
    sel <- ! (allApsTree$tip.label %in% fungiTree$tip.label)
    newTree <- drop.tip(allApsTree, allApsTree$tip.label[sel])
    
    • Is this tree more similar to fungiTree than apsTree was?
    • Submit your script, significant data, and the results. Interpret the outcome.
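
The first two steps of the script (pick ten random sequences, keep the Mbp1 orthologues and KILA_ESCCO, remove everything else) can be sketched in base R. The sketch assumes that the alignment is available as a character matrix whose row names are the sequence names and that the names of the orthologues are known. `toyAli`, `mustKeep` and all sequence names other than `MBP1_SACCE` and `KILA_ESCCO` are made up for illustration.

```r
set.seed(112358)   # fix the seed so that the random selection is reproducible

# Invented stand-in for the APSES alignment: 20 sequences x 6 columns
allNames <- c("MBP1_SACCE", "MBP1_TOYSP", "KILA_ESCCO",
              sprintf("APS%02d_TOYSP", 1:17))
toyAli <- matrix(sample(c("A", "K", "L", "-"), 20 * 6, replace = TRUE),
                 nrow = 20, dimnames = list(allNames, NULL))

# Sequences that must always be kept (hypothetical orthologue names)
mustKeep <- c("MBP1_SACCE", "MBP1_TOYSP", "KILA_ESCCO")

# Pick ten random sequences from the remaining ones ...
candidates <- setdiff(rownames(toyAli), mustKeep)
randomTen  <- sample(candidates, 10)

# ... and remove all other sequences from the alignment
subAli <- toyAli[c(mustKeep, randomTen), , drop = FALSE]
dim(subAli)
```

Setting the seed explicitly means that a reader can reproduce exactly the same random selection from the submitted script.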


     

    Further reading, links and resources

    Notes

    1. Note: the oral test is cumulative. It will focus on the content of this unit but will also cover other material that leads up to it.
    2. You may CC me on correspondence if you wish.


     


    About ...
     
    Author:

    Boris Steipe <boris.steipe@utoronto.ca>

    Created:

    2017-08-05

    Modified:

    2020-12-07

    Version:

    1.4

    Version history:

    • 1.4 Regenerated inadvertently deleted "Report option" instructions.
    • 1.3 Edit policy update
    • 1.2 2020 Updates
    • 1.1 Corrected posted marks, which were not consistent with the description in the syllabus.
    • 1.0 First live version
    • 0.1 First stub

    This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License.