Integrator Unit: Homology Modelling

Contents
Scenario and background
Homology Model: Swiss-Model
Ab initio Model: AlphaFold
Compare
Analyze
Interpret
Questions, comments
References

Expected Preparations:

	[BIN-SX] Homology_modelling
	The units listed above are part of this course and contain important preparatory material.

Keywords: Integrator unit: create a homology model and assess the role of sequence conservation

Objectives:

Outcomes:

Deliverables:

Integrator unit: Deliverables can be submitted for course marks. See below for details.

Evaluation:

Material based on this Integrator Unit can be submitted for summative feedback (course marks). It will be marked out of a maximum of 18 marks for a regular submission, or 36 marks if you choose this unit for your Oral Test ¹.

For your report:

Create a new document in your shared Google drive folder.
Call your document ABC-INT-Homology-modelling-<your name>-2022
Work through the tasks described below.
Document your work and your results. Write at a technical level, as in a lab report, and include all details that are needed to make your work reproducible. Follow the additional instructions for R-code in case you are submitting code.
You are asked to produce several ChimeraX images. Either write the required ChimeraX commands into a script, or use the CX() function to execute them. In either case, you should be able to recreate each image by running your script. No manual commands through the ChimeraX menu interface should be required. Document your script in an appendix to your report.
Include a (CC) license at the end of your document, as instructed at the beginning of the course.
When you are done with everything, go to the Assignments page on Quercus and open the appropriate Integrator Unit submission category. Paste the URL of your report document into the form, and click on Submit Assignment. Your link can be submitted only once and not edited. Also: do not edit your document after it has been submitted.

If you choose this unit for your Oral Test option:

Prepare your report as above.
Be prepared to discuss your findings during the test.
Make sure the report is submitted before your test date (cf. Oral Test instructions).

This page integrates material from the learning units for working with multiple sequence alignments and structure data in a task for evaluation.

Scenario and background

You have collected the APSES domain proteins of MYSPE in your protein database and this is now a useful collection of sequences with a shared fold that have evolved separately - but under similar constraints - for hundreds of millions of years.

We can be very confident about our APSES domain alignments, since there are hardly any indels in these sequences - and given a confident alignment we can arrive at a very reasonable structural model. This would, for example, allow us to look at residues in the APSES recognition domain that are conserved among known Mbp1 orthologues, but vary between paralogues - you have all the tools to try this at some point.

For this assignment, however, we are going to look at conservation in the ankyrin domains. Their identification and alignment are a bigger challenge. Interestingly, an ankyrin domain structure is known for one of the homologues in this set - although it is not in our set of sequences. This is the structure of yeast Swi6, a homologue of Mbp1 that has a non-functional APSES domain; it too is involved in cell-cycle regulation since it dimerizes with Mbp1 in the MBF complex (as well as dimerizing with Swi4 in the SBF complex).

Ankyrin domains are among the most widely distributed protein interaction modules and they are found in a wide variety of functionally diverse proteins. One might think that it is an advantage to work with sequences for which much data is available, but in this case, the abundance of examples actually turns into a liability: database searches come up with so many hits that the biologically or functionally significant ones easily get drowned out in the noise.

We will compare structures from two sources. One is the “classic” homology modelling approach: take a known structure, change some amino acids, then find a reasonable model that shows how the sequence changes would be accommodated. This is the SwissModel approach.

The other approach, ab initio structure prediction, is less than two years old. In December 2020 a program made headlines as a contender for having solved the greatest challenge of computational biology: the protein folding problem. Indeed, this algorithm - AlphaFold - is unanimously considered the source of a “revolution” in structural biology by the most highly regarded researchers in the field. As a recent Nature news feature maps out, its impact can hardly be overstated (Callaway 2022). Do read that article.

Homology Model: Swiss-Model

Produce a multiple sequence alignment that includes the yeast Swi6 sequence from PDB 1SW6.² Make sure you include all MBP1 homologues from the database, not just the Mbp1 orthologues.³
Following the procedures of the Homology Modelling unit, prepare a homology model of the MBP1_MYSPE ankyrin domains based on the 1SW6 structure.

Ab initio Model: AlphaFold

There are currently two easy ways to get AlphaFold models. For many sequences the models already exist and can be pulled from a large database of structures curated by the EBI. For sequences for which no model exists yet, you can run AlphaFold for free and get a prediction (although this may take a few hours). Both procedures are integrated with ChimeraX.

Read about the principle here, and read about the ChimeraX Command: alphafold. This should be straightforward to implement, but if not, please post on the discussion board.
Download a model of the MBP1_MYSPE ankyrin domains as defined for your homology model if one can be found at the EBI; or, if none is available, produce a new model.

(If no AlphaFold structure exists at the EBI and you can’t produce a model, document what you did, what you expected to happen, what happened instead, and whether you think this problem can be solved. Then (and only then) use the Saccharomyces cerevisiae model instead.)
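
A model retrieved for a full-length sequence will usually cover more than the ankyrin domains, so it may help to cut it down to the residue range used for the homology model before comparing the two. The following is a minimal Python sketch of one way to do that, assuming the model is a standard fixed-column PDB file (chain identifier in column 22, residue number in columns 23-26). The function names, the toy coordinates and the residue range are invented for illustration; substitute the range you defined for your own model.

```python
def trim_pdb(pdb_text, first, last, chain=None):
    """Keep ATOM/HETATM records whose residue number lies in [first, last].

    All other record types (e.g. HEADER, END) are passed through unchanged.
    """
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            res_num = int(line[22:26])          # resSeq: columns 23-26
            if not (first <= res_num <= last):
                continue
            if chain is not None and line[21] != chain:  # chainID: column 22
                continue
        kept.append(line)
    return "\n".join(kept) + "\n"


def toy_ca_line(serial, res_name, res_num):
    """Format one fixed-column CA record with invented coordinates (chain A)."""
    return (f"ATOM  {serial:5d}  CA  {res_name:>3} A{res_num:4d}    "
            f"{float(serial):8.3f}{0.0:8.3f}{0.0:8.3f}  1.00  0.00           C")


# Toy input: five alanine CA atoms, residues 1-5 (hypothetical data).
toy_pdb = "\n".join([toy_ca_line(i, "ALA", i) for i in range(1, 6)] + ["END"])

# Hypothetical domain boundaries: keep residues 2-4 only.
trimmed = trim_pdb(toy_pdb, first=2, last=4)
print(trimmed)
```

If you use something like this, record the residue range in your report so that the trimming step is reproducible.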

Compare

Produce an informative stereo image of the superimposed backbone structure of the Swi6 coordinates, the SwissModel, and the AlphaFold model.
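
One way to keep such an image reproducible is to generate the ChimeraX command script from code rather than typing commands by hand. The sketch below is written in Python purely for illustration; the same command strings could equally be placed in a script file directly or sent from R. It assumes a fresh ChimeraX session (so that the three structures become models #1, #2 and #3), and the file names, colours and function name are hypothetical. Check each command against the ChimeraX documentation for your version before relying on it.

```python
def build_cxc(swissmodel_pdb, alphafold_pdb, image_png, stereo_mode="walleye"):
    """Return the text of a ChimeraX command script for a superimposed stereo view."""
    if stereo_mode not in ("walleye", "crosseye"):
        raise ValueError("stereo_mode must be 'walleye' or 'crosseye'")
    commands = [
        "open 1sw6",                    # model #1: Swi6 coordinates from the PDB
        f"open {swissmodel_pdb}",       # model #2: homology model (local file)
        f"open {alphafold_pdb}",        # model #3: AlphaFold model (local file)
        "matchmaker #2 to #1",          # superimpose homology model on Swi6
        "matchmaker #3 to #1",          # superimpose AlphaFold model on Swi6
        "hide atoms",
        "show cartoons",                # backbone-only representation
        "color #1 gray",
        "color #2 orange",
        "color #3 cornflowerblue",
        "set bgColor white",
        f"camera {stereo_mode}",        # side-by-side stereo pair
        "view",
        f"save {image_png} supersample 3",
    ]
    return "\n".join(commands) + "\n"


# Hypothetical file names; replace with the paths of your own models.
script = build_cxc("swissmodel.pdb", "alphafold.pdb", "compare.png")
print(script)
```

Writing the returned text to a file with a .cxc extension and opening that file in ChimeraX should replay the commands in order. Include the script in the appendix of your report.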

Analyze

Considering the columns of your multiple sequence alignment, discuss and document with reference to your two models whether there are solvent-exposed residues that are highly conserved. Take particular note of whether there are any such residues where MBP1_MYSPE differs from the consensus of the other aligned sequences. (Such outliers could point to functionally significant residues.)
Discuss with reference to specific residues whether the two models are different in that respect (e.g. does one model place a residue into the core of a protein whereas the other one predicts it to be exposed?). Support your analysis with stereo images.
Remember to label residues in your images. Remember to document your ChimeraX commands.
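
The search for conserved columns in which MBP1_MYSPE is the outlier can be sketched in a few lines of code. The Python example below is only an illustration under stated assumptions: the alignment is held as a dictionary of equal-length gapped strings, "conserved" means that at least 80% of the other sequences share the same residue in a column, and the toy alignment is invented and is not real ankyrin domain data. The function name and the threshold are hypothetical choices, not part of the course material. Solvent exposure is not assessed here; that still has to be judged from your two models.

```python
from collections import Counter


def conserved_outliers(alignment, target_id, min_fraction=0.8):
    """List columns that are conserved among the other sequences but where
    the target carries a different residue.

    Returns tuples of (alignment column, target residue number,
    target residue, consensus residue, consensus fraction), all 1-based.
    """
    target = alignment[target_id]
    others = [seq for name, seq in alignment.items() if name != target_id]
    outliers = []
    target_pos = 0                      # position in the ungapped target
    for col, target_res in enumerate(target):
        if target_res == "-":
            continue                    # target has a gap: nothing to compare
        target_pos += 1
        residues = [seq[col] for seq in others if seq[col] != "-"]
        if not residues:
            continue
        consensus, count = Counter(residues).most_common(1)[0]
        fraction = count / len(others)  # gaps count against conservation
        if fraction >= min_fraction and target_res != consensus:
            outliers.append((col + 1, target_pos, target_res,
                             consensus, round(fraction, 2)))
    return outliers


# Toy alignment, invented for illustration only.
toy = {
    "MBP1_MYSPE": "MK-LDAGH",
    "SEQ_A":      "MKALNAGH",
    "SEQ_B":      "MKALNAGH",
    "SEQ_C":      "MR-LNAGH",
    "SEQ_D":      "MKALNSGH",
}
outliers = conserved_outliers(toy, "MBP1_MYSPE")
print(outliers)
```

The reported residue number counts along the aligned fragment of the target only. If your alignment does not start at residue 1 of MBP1_MYSPE, add the appropriate offset before selecting and labelling the residue in your models.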

Interpret

Which of the two models is better?

Questions, comments

If in doubt, ask! If anything about this content is not clear to you, do not proceed but ask for clarification. If you have ideas about how to make this material better, let’s hear them. We are aiming to compile a list of FAQs for all learning units, and your contributions will count towards your participation marks.

Improve this page! If you have questions or comments, please post them on the Quercus Discussion board with a subject line that includes the name of the unit.

References

Callaway, Ewen. 2022. “What’s Next for AlphaFold and the AI Protein-Folding Revolution.” Nature 604 (7905): 234–38. https://doi.org/10.1038/d41586-022-00997-5.


¹ Note: the oral test is cumulative. It will focus on the content of this unit but will also cover other material that leads up to it.
² You will probably already have this sequence annotated to MBP1_MYSPE since this is annotated by similarity by SMART. However, we need a “real alignment” of the entire sequence this time.
³ These will be too many sequences for the Muscle algorithm; use CLUSTAL Omega instead.