Expected Preparations:
Keywords: Integrator unit: create a homology model and assess the role of sequence conservation
Objectives:
Outcomes:
Deliverables: Integrator unit: Deliverables can be submitted for course marks. See below for details.
Evaluation: Material based on this Integrator Unit can be submitted for summative feedback (course marks). It will be marked out of a maximum of 18 marks for a regular submission, or 36 marks if you choose this unit for your Oral Test[1]. For your report:
If you choose this unit for your Oral Test option:
This page integrates material from the learning units for working with multiple sequence alignments and structure data in a task for evaluation.
You have collected the APSES domain proteins of MYSPE in your protein database. This is now a useful collection of sequences with a shared fold that have evolved separately, but under similar constraints, for hundreds of millions of years.
We can be very confident about our APSES domain alignments, since there are hardly any indels in these sequences, and given a confident alignment we can arrive at a very reasonable structural model. This would, for example, allow us to look at residues in the APSES recognition domain that are conserved among known Mbp1 orthologues but vary between paralogues. You have all the tools to try this at some point.
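The orthologue-versus-paralogue comparison can be sketched in a few lines of Python. This is a minimal sketch, assuming you have exported the orthologue and paralogue sequences as rows of one multiple sequence alignment (so all rows have the same length and share column coordinates). It uses the simplest possible definitions: a column is "conserved" if every orthologue carries the same residue with no gap, and "variable" if the paralogues show more than one residue. The function name and toy sequences are hypothetical; a real analysis would probably use a graded conservation score instead of strict identity.

```python
def conserved_in_orthologues_variable_in_paralogues(orthologues, paralogues):
    """Return 0-based alignment columns that are invariant and gap-free across
    the orthologues but carry more than one residue across the paralogues.

    All sequences must be rows of the same multiple sequence alignment.
    """
    rows = orthologues + paralogues
    length = len(rows[0])
    if any(len(s) != length for s in rows):
        raise ValueError("all sequences must be rows of one alignment")
    hits = []
    for i in range(length):
        ortho = {s[i] for s in orthologues}
        para = {s[i] for s in paralogues if s[i] != "-"}  # ignore paralogue gaps
        if len(ortho) == 1 and "-" not in ortho and len(para) > 1:
            hits.append(i)
    return hits


# Hypothetical toy alignment rows, not real Mbp1 data.
orthologues = ["ACDE", "ACDE", "ACNE"]
paralogues = ["AKDE", "ARDQ"]
print(conserved_in_orthologues_variable_in_paralogues(orthologues, paralogues))  # [1, 3]
```

The returned indices are alignment columns, not residue numbers; to highlight them on a structure you would still need to map each column to the numbering of one specific sequence.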
For this assignment, however, we are going to look at conservation in the ankyrin domains. Identifying and aligning them is a bigger challenge. Interestingly, an ankyrin domain structure is known for one homologue of these proteins, although that protein is not among our sequences. This is the structure of yeast Swi6, a homologue of Mbp1 that has a non-functional APSES domain. Swi6, too, is involved in cell-cycle regulation, since it dimerizes with Mbp1 in the MBF complex (as well as with Swi4 in the SBF complex).
Ankyrin domains are among the most widely distributed protein interaction modules, and they are found in a wide variety of functionally diverse proteins. One might think that it is an advantage to work with sequences for which much data is available, but in this case the abundance of examples actually turns into a liability: database searches come up with so many hits that the biologically or functionally significant ones easily get drowned out in the noise.
We will compare structural models from two sources. One is the “classic” homology modelling approach: take the known structure of a homologue (the template), substitute the amino acids that differ according to the alignment, then find a reasonable model that shows how these sequence changes would be accommodated. This is the SWISS-MODEL approach.
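The bookkeeping at the heart of this approach (working out which template residue each target residue sits on, and where the two differ) can be sketched as follows. This is an illustration only, assuming a pairwise alignment given as two equal-length strings with `-` for gaps; `aligned_pairs`, `substitutions` and the `template_start` parameter are hypothetical names, and the toy alignment is invented. It builds no coordinates; tools like SWISS-MODEL do that, and a great deal more, internally.

```python
def aligned_pairs(target_aln, template_aln, template_start=1):
    """List (target_no, target_res, template_no, template_res) for every
    alignment column in which neither sequence has a gap.

    Target residues are numbered from 1; template residues from
    template_start (hypothetical parameter, since structure numbering often
    does not begin at 1).
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("alignment rows must have equal length")
    pairs = []
    t_no, m_no = 0, template_start - 1
    for t_res, m_res in zip(target_aln, template_aln):
        if t_res != "-":
            t_no += 1
        if m_res != "-":
            m_no += 1
        if t_res != "-" and m_res != "-":
            pairs.append((t_no, t_res, m_no, m_res))
    return pairs


def substitutions(pairs):
    """Keep only the aligned positions where the amino acids differ."""
    return [p for p in pairs if p[1] != p[3]]


# Hypothetical toy alignment: the target lacks template residue 12.
pairs = aligned_pairs("MK-LV", "MRALV", template_start=10)
print(substitutions(pairs))  # [(2, 'K', 11, 'R')]
```

Residues that fall into gaps (insertions or deletions relative to the template) are exactly the positions where a homology model has to invent geometry, which is why an alignment with few indels gives a more trustworthy model.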
The other approach is less than two years old. In December 2020, a program made headlines as a contender for having solved the greatest challenge of computational biology: the protein folding problem. Indeed, this algorithm, AlphaFold, is widely regarded by the leading researchers in the field as the source of a “revolution” in structural biology. As a recent Nature news feature maps out, its impact can hardly be overstated (Callaway 2022). Do read that article. This is ab initio structure prediction.
There are currently two easy ways to get AlphaFold models. For many sequences, models already exist and can be downloaded from the AlphaFold Protein Structure Database, a large collection of predicted structures hosted by the EBI. For sequences without an existing model, you can run AlphaFold for free and get a prediction (although this may take a few hours). Both procedures are integrated with ChimeraX.
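Outside ChimeraX, the database route can also be scripted. This is a minimal sketch using only the Python standard library, assuming that the AlphaFold DB exposes a REST endpoint of the form `https://alphafold.ebi.ac.uk/api/prediction/<UniProt accession>` that returns a JSON list of records, each with a `pdbUrl` field. Both the endpoint and the field name are assumptions to check against the current AlphaFold DB documentation. The function names and the example accession are hypothetical placeholders.

```python
import json
import urllib.request

# Assumed AlphaFold DB REST endpoint; check the current API documentation.
AFDB_API = "https://alphafold.ebi.ac.uk/api/prediction/{}"


def afdb_api_url(uniprot_accession: str) -> str:
    """Build the (assumed) AlphaFold DB API URL for a UniProt accession."""
    acc = uniprot_accession.strip().upper()
    if not acc.isalnum():
        raise ValueError(f"not a plausible UniProt accession: {uniprot_accession!r}")
    return AFDB_API.format(acc)


def fetch_alphafold_pdb(uniprot_accession: str, out_path: str) -> str:
    """Download the first AlphaFold DB model for an accession as a PDB file.

    If no model exists, the request is expected to fail with
    urllib.error.HTTPError (e.g. 404); that is the point at which you
    would document the problem rather than silently continue.
    """
    with urllib.request.urlopen(afdb_api_url(uniprot_accession), timeout=30) as resp:
        entries = json.load(resp)  # assumed: a JSON list of model records
    if not entries:
        raise LookupError(f"no AlphaFold DB entry for {uniprot_accession}")
    pdb_url = entries[0]["pdbUrl"]  # assumed field name in each record
    urllib.request.urlretrieve(pdb_url, out_path)
    return out_path


if __name__ == "__main__":
    # Placeholder accession; replace it with the accession of your MYSPE protein.
    print(afdb_api_url("P39678"))
```

For this unit, ChimeraX's own `alphafold` command (for example `alphafold fetch`) remains the intended route; the script is only useful if you want to retrieve several models in a batch.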
(If no AlphaFold structure exists at the EBI and you can’t produce a model, document what you did, what you expected to happen, what happened instead, and whether you think this problem can be solved. Then, and only then, use the Saccharomyces cerevisiae model instead.)
If in doubt, ask! If anything about this content is not clear to you, do not proceed; ask for clarification. If you have ideas about how to make this material better, let’s hear them. We are aiming to compile a list of FAQs for all learning units, and your contributions will count towards your participation marks.
Improve this page! If you have questions or comments, please post them on the Quercus Discussion board with a subject line that includes the name of the unit.
[END]
[1] Note: the oral test is cumulative. It will focus on the content of this unit but will also cover other material that leads up to it.
[2] You will probably already have this sequence annotated to MBP1_MYSPE, since SMART annotates it by similarity. However, this time we need a “real” alignment of the entire sequence.
[3] These will be too many sequences for the MUSCLE algorithm; use Clustal Omega instead.