Difference between revisions of "BIO Assignment Week 5"

From "A B C"
Jump to navigation Jump to search
Line 222: Line 222:
 
*Study and consider whether this is the case here and which residues might be included.
 
*Study and consider whether this is the case here and which residues might be included.
 
}}
 
}}
 +
 +
 +
 +
===APSES domains in Chimera (from A4)===
 +
What precisely constitutes an APSES domain however is a matter of definition, as you can explore in the following (optional) task.
 +
 +
 +
<div class="mw-collapsible mw-collapsed" data-expandtext="Expand" data-collapsetext="Collapse" style="border:#000000 solid 1px; padding: 10px; margin-left:25px; margin-right:25px;">Optional: Load the structure in Chimera, like you did in the last assignment and switch on stereo viewing ... (more) <div  class="mw-collapsible-content">
 +
<ol start="7">
 +
<li>Display the protein in ribbon style, e.g. with the '''Interactive 1''' preset.
 +
<li>Access the '''Interpro''' information page for Mbp1 at the EBI: http://www.ebi.ac.uk/interpro/protein/P39678
 +
<li>In the section '''Domains and repeats''', mouse over the red annotations and note down the residue numbers for the annotated domains. Also follow the links to the respective Interpro domain definition pages.
 +
</ol>
 +
 +
At this point we have definitions for the following regions on the Mbp1 protein ...
 +
*The KilA-N (pfam 04383) domain definition as applied to the Mbp1 protein sequence by CDD;
 +
*The InterPro ''KilA, N-terminal/APSES-type HTH, DNA-binding (IPR018004)'' definition annotated on the Mbp1 sequence;
 +
*The InterPro ''Transcription regulator HTH, APSES-type DNA-binding domain (IPR003163)'' definition annotated on the Mbp1 sequence;
 +
*<small>(... in addition &ndash; without following the source here &ndash; the UniProt record for Mbp1 annotates a "HTH APSES-type" domain from residues 5-111)</small>
 +
 +
... each with its distinct and partially overlapping sequence range. Back to Chimera:
 +
 +
<!-- For reference:
 +
1MB1: 3-100
 +
2BM8: 4-102
 +
CDD KilA-N: 19-93
 +
InterPro KilA-N: 23-88
 +
InterPro APSES: 3-133
 +
Uniprot HTH/APSES: 5-111
 +
-->
 +
 +
<ol start="10">
 +
<li>In the sequence window, select the sequence corresponding to the '''Interpro KilA-N''' annotation and colour this fragment red. <small>Remember that you can get the sequence numbers of a residue in the sequence window when you hover the pointer over it - but do confirm that the sequence numbering that Chimera displays matches the numbering of the Interpro domain definition.</small></li>
 +
 +
<li>Then select the residue range(s) by which the '''CDD KilA-N''' definition is larger, and colour that fragment orange.</li>
 +
 +
<li>Then select the residue range(s) by which the '''InterPro APSES domain''' definition is larger, and colour that fragment yellow.</li>
 +
 +
<li>If the structure contains residues outside these ranges, colour these white.</li>
 +
 +
<li>Study this in a side-by-side stereo view and get a sense for how the ''extra'' sequence beyond the Kil-A N domain(s) is part of the structure, and how the integrity of the folded structure would be affected if these fragments were missing.</li>
 +
 +
<li>Display Hydrogen bonds, to get a sense of interactions between residues from the differently colored parts. First show the protein as a stick model, with sticks that are thicker than the default to give a better sense of sidechain packing:<br />
 +
::(i) '''Select''' &rarr; '''Select all''' <br />
 +
::(ii) '''Actions''' &rarr; '''Ribbon''' &rarr; '''hide''' <br />
 +
::(iii) '''Select''' &rarr; '''Structure''' &rarr; '''protein''' <br />
 +
::(iv) '''Actions''' &rarr; '''Atoms/Bonds''' &rarr; '''show''' <br />
 +
::(v)  '''Actions''' &rarr; '''Atoms/Bonds''' &rarr; '''stick''' <br />
 +
::(vi) click on the looking glass icon at the bottom right of the graphics window to bring up the inspector window and choose '''Inspect ... Bond'''. Change the radius to 0.4.<br />
 +
</li>
 +
 +
<li>Then calculate and display the hydrogen bonds:<br />
 +
::(vii) '''Tools''' &rarr; '''Surface/Binding Analysis''' &rarr; '''FindHbond''' <br />
 +
::(viii) Set the '''Line width''' to 3.0, leave all other parameters with their default values an click '''Apply'''<br />
 +
:: Clear the selection.<br />
 +
Study this view, especially regarding side chain H-bonds. Are there many? Do side chains interact more with other sidechains, or with the backbone?
 +
</li>
 +
 +
<li>Let's now simplify the scene a bit and focus on backbone/backbone H-bonds:<br />
 +
::(ix) '''Select''' &rarr; '''Structure''' &rarr; '''Backbone''' &rarr; '''full'''<br />
 +
::(x)  '''Actions''' &rarr; '''Atoms/Bonds''' &rarr; '''show only'''<br /><br />
 +
:: Clear the selection.<br />
 +
In this way you can appreciate how H-bonds build secondary structure - &alpha;-helices and &beta;-sheets - and how these interact with each other ... in part '''across the KilA N boundary'''.
 +
</li>
 +
 +
 +
<li>Save the resulting image as a jpeg no larger than 600px across and upload it to your Lab notebook on the Wiki.</li>
 +
<li>When you are done, congratulate yourself on having earned a bonus of 10% on the next quiz.</li>
 +
</ol>
 +
 +
</div>
 +
</div>
 +
 +
 +
There is a rather important lesson in this: domain definitions may be fluid, and their boundaries may be computationally derived from sequence comparisons across many families, and do not necessarily correspond to individual structures. Make sure you understand this well.
 +
}}
 +
 +
 +
Given this, it seems appropriate to search the sequence database with the sequence of an Mbp1 structure&ndash;this being a structured, stable, subdomain of the whole that presumably contains the protein's most unique and specific function. Let us retrieve this sequence. All PDB structures have their sequences stored in the NCBI protein database. They can be accessed simply via the PDB-ID, which serves as an identifier both for the NCBI and the PDB databases. However there is a small catch (isn't there always?). PDB files can contain more than one protein, e.g. if the crystal structure contains a complex<ref>Think of the [http://www.pdb.org/pdb/101/motm.do?momID=121 ribosome] or [http://www.pdb.org/pdb/101/motm.do?momID=3 DNA-polymerase] as extreme examples.</ref>. Each of the individual proteins gets a so-called '''chain ID'''&ndash;a one letter identifier&ndash; to identify them uniquely. To find their unique sequence in the database, you need to know the PDB ID as well as the chain ID. If the file contains only a single protein (as in our case), the chain ID is always '''<code>A</code>'''<ref>Otherwise, you need to study the PDB Web page for the structure, or the text in the PDB file itself, to identify which part of the complex is labeled with which chain ID. For example, immunoglobulin structures some time label the ''light-'' and ''heavy chain'' fragments as "L" and "H", and sometimes as "A" and "B"&ndash;there are no fixed rules. You can also load the structure in VMD, color "by chain" and use the mouse to click on residues in each chain to identify it.</ref>. make sure you understand the concept of protein chains, and chain IDs.
 +
 +
 +
 +
 +
  
  

Revision as of 22:11, 11 October 2015

Assignment for Week 5
Structure Analysis

< Assignment 4 Assignment 6 >

Note! This assignment is currently inactive. Major and minor unannounced changes may be made at any time.

 
 

Concepts and activities (and reading, if applicable) for this assignment will be topics on next week's quiz.






The PDB

The search options in the PDB structure database are as sophisticated as those at the NCBI. For now, we will try a simple keyword search to get us started.


Task:

  1. Visit the RCSB PDB website at http://www.pdb.org/
  2. Briefly orient yourself regarding the database contents and its information offerings and services.
  3. Enter Mbp1 into the search field.
  4. In your journal, note down the PDB IDs for the three Saccharomyces cerevisiae Mbp1 transcription factor structures your search has retrieved.
  5. Click on one of the entries and explore the information and services linked from that page.

 




From A1:

  • install the molecular graphics viewer UCSF Chimera[1] on your own computer, work through a tutorial on its use and begin practicing the skill of viewing split-screen stereographic scenes without aids;

Molecular graphics

A molecular viewer is a program that takes protein structure data and allows you to display and explore it. For a number of reasons, I chose to use the UCSF Chimera viewer for this course.


UCSF Chimera

Task:

  • Access the Chimera page.
  • Install the program as per the instructions in the section: "Installing Chimera".
  • Access the Chimera User's Guide tutorial section. The "Getting Started" tutorial is offered in two versions: one for work with the graphical user interface (GUI), i.e. the usual system of windows and drop-down menu selections. The other is a command-line version for the same. What is the difference? In general, GUI interfaces (Menu version) are well suited for beginners who are not yet familiar with all the options. Having the commands and alternatives presented on a menu makes first steps very easy through simple selection of keywords. On the other hand, work from command line interfaces is much faster and more flexible if you know what you are doing and thus much better suited for the experienced user. It is also quite straightforward to execute series of commands in stored scripts, allowing you to automate tasks. For now, we will stay with the menu version but we will use commands later in the course and you are of course welcome to explore.
  • Work through the Chimera tutorial Getting Started - Menu version, Part 1.

Stereo vision

Task:

Access the Stereo Vision tutorial and practice viewing molecular structures in stereo.

Practice at least ...

  • two times daily,
  • for 3-5 minutes each session,

Keep up your practice throughout the course. Stereo viewing will be required in the final exam, but more importantly, it is a wonderful skill that will greatly support any activity of yours related to structural molecular biology. Practice with different molecules and try out different colours and renderings.

Note: do not go through your practice sessions mechanically. If you are not making any progress with stereo vision, contact me so we can help you on the right track.






from A2;


Chimera

In this task we will explore the sequence interface of Chimera, use it to select specific parts of a molecule, and colour specific regions (or residues) of a molecule separately.

 

Task:

  1. Open Chimera.
  2. One of the three yeast Mbp1 fragment structures has the PDB ID 1BM8. Load it in Chimera (simply enter the ID into the appropriate field of the FileFetch by ID... window).
  3. Display the protein in PresetsInteractive 1 mode and familiarize yourself with its topology of helices and strands.
  4. Open the sequence tool: ToolsSequenceSequence. You will see the sequence for each chain - here there is only one chain. By default, coloured rectangles overlay the secondary structure elements of the sequence.
  5. Hover the mouse over some residues and note that the sequence number and chain is shown at the bottom of the window.
  6. Click/drag one residue to select it. (Simply a click wont work, you need to drag a little bit for the selection to catch on.) Note that the residue gets a green overlay in the sequence window, as it also gets selected with a green border in the graphics window.
  7. In the bottom of the sequence window, there are instructions how to select (multiple) regions. Try this: colour the protein white (SelectSelect All; ActionsColorlight gray). Clear the selection. Now select all the helical regions (pale yellow boxes) by click/dragging and using the shift key. Color them red. Then select all the strands by clicking into any of the pale green boxes and color them green.
  8. Finally, generate a stereo-view that shows the molecule well, in which the domain is coloured dark grey, and the APSES domain residues (as defined in the FASTA listing above, from I19 to Y93) are coloured with a colour ramp (ToolsDepictionRainbow)[2]
  9. Show the first and last residue's CA atom[3] as a sphere and colour the first one blue (to mark the N-terminus) and the last one red. E.g.:
    1. SelectAtom specifier:4@CA
    2. ActionsRibbonhide
    3. ActionsAtoms/bondsshow
    4. ActionsAtoms/bondssphere
    5. ActionsColorcornflower blue
    6. Then click on the selection inspector (the green button with the magnifying glass at the lower right of the graphics window) and set the sphere radius to 1.0Å.
  10. Save the image in your Wiki journal in JPEG format (FileSave Image and upload it to the Student Wiki).


 

Stereo vision

Task:

Continue with your stereo practice.

Practice at least ...

  • two times daily,
  • for 3-5 minutes each session.
  • Measure your interocular distance and your fusion distance as explained here on the Student Wiki and add it to the table.

Keep up your practice throughout the course. Once again: do not go through your practice sessions mechanically. If you are not making constant progress in your practice sessions, contact me so we can help you on the right track.

Modeling small molecules (optional)

As an optional part of the assignment, here is a small tutorial for modeling and visualizing "small-molecule" structures.


Defining a molecule

A number of public repositories make small molecule information available, such as PubChem at the NCBI, the ligand collection at the PDB, the ChEBI database at the European Bioinformatics Institute, or the NCI database browser at the US National Cancer Institute. One general way to export topology information from these services is to use SMILES strings—a shorthand notation for the composition and topology of chemical compounds.


Task:

  1. Access each of the databases mentioned above.
  2. Enter "caffeine" as a search term.
  3. Explore the contents of the result, in particular note and copy the SMILES string for the compound.


Alternatively, you can sketch your own compound. Versions of Peter Ertl's Java Molecular Editor (JME) are offered on several websites (e.g. click on Transfer to Java Editor on a NCI results page), and PubChem offers this functionality via its Sketcher tool.

Task:

  1. Navigate to PubChem.
  2. Follow the link to Chemical structure search (in the right hand menu).
  3. Click on the 3D conformer tab and on the Launch button to launch the molecular editor in its own window.
  4. Sketch the structure of caffeine. I find the editor quite intuitive but if you need help, just use the Help button in the editor.
  5. Save the SMILES string of your compound.
  6. Also Export your result in SMILES format as a file.

Translating SMILES to structure

Online services exist to translate SMILES to (idealized) coordinates.

Task:

  1. Access the online SMILES translation service at the NCI.
  2. Paste a caffeine SMILES string into the form, choose the PDB radio button, click on Translate and download your file.
  3. Load the molecule in Chimera.

Chimera also has a function to translate SMILES to coordinates.

Task:

  1. In Chimera:
    1. FileClose Session.
    2. ToolsStructure EditingBuild Structure.
    3. Select SMILES string, paste the string and click Apply.
  2. The caffeine molecule will be generated and visualized in the graphics window.




Introduction

Integrating evolutionary information with structural information allows us to establish which residues are invariant in a family–these are presumably structurally important sites–and which residues are functionally important, since they are invariant within, but changeable between subfamilies.

To visualize these relationships, we will load an MSA of APSES domains with VMD and color it by conservation.



The DNA binding site

Now, that you know how YFO Mbp1 aligns with yeast Mbp1, you can evaluate functional conservation in these homologous proteins. You probably already downloaded the two Biochemistry papers by Taylor et al. (2000) and by Deleeuw et al. (2008) that we encountered in Assignment 2. These discuss the residues involved in DNA binding[4]. In particular the residues between 50-74 have been proposed to comprise the DNA recognition domain.

Task:

  1. Using the APSES domain alignment you have just constructed, find the YFO Mbp1 residues that correspond to the range 50-74 in yeast.
  2. Note whether the sequences are especially highly conserved in this region.
  3. Using Chimera, look at the region. Use the sequence window to make sure that the sequence numbering between the paper and the PDB file are the same (they are often not identical!). Then select the residues - the proposed recognition domain - and color them differently for emphasis. Study this in stereo to get a sense of the spatial relationships. Check where the conserved residues are.
  4. A good representation is stick - but other representations that include sidechains will also serve well.
  5. Calculate a solvent accessible surface of the protein in a separate representation and make it transparent.
  6. You could combine three representations: (1) the backbone (in ribbon view), (2) the sidechains of residues that presumably contact DNA, distinctly colored, and (3) a transparent surface of the entire protein. This image should show whether residues annotated as DNA binding form a contiguous binding interface.


DNA binding interfaces are expected to comprise a number of positively charged amino acids, that might form salt-bridges with the phosphate backbone.


Task:

  • Study and consider whether this is the case here and which residues might be included.


APSES domains in Chimera (from A4)

What precisely constitutes an APSES domain however is a matter of definition, as you can explore in the following (optional) task.


Optional: Load the structure in Chimera, like you did in the last assignment and switch on stereo viewing ... (more)
  1. Display the protein in ribbon style, e.g. with the Interactive 1 preset.
  2. Access the Interpro information page for Mbp1 at the EBI: http://www.ebi.ac.uk/interpro/protein/P39678
  3. In the section Domains and repeats, mouse over the red annotations and note down the residue numbers for the annotated domains. Also follow the links to the respective Interpro domain definition pages.

At this point we have definitions for the following regions on the Mbp1 protein ...

  • The KilA-N (pfam 04383) domain definition as applied to the Mbp1 protein sequence by CDD;
  • The InterPro KilA, N-terminal/APSES-type HTH, DNA-binding (IPR018004) definition annotated on the Mbp1 sequence;
  • The InterPro Transcription regulator HTH, APSES-type DNA-binding domain (IPR003163) definition annotated on the Mbp1 sequence;
  • (... in addition – without following the source here – the UniProt record for Mbp1 annotates a "HTH APSES-type" domain from residues 5-111)

... each with its distinct and partially overlapping sequence range. Back to Chimera:


  1. In the sequence window, select the sequence corresponding to the Interpro KilA-N annotation and colour this fragment red. Remember that you can get the sequence numbers of a residue in the sequence window when you hover the pointer over it - but do confirm that the sequence numbering that Chimera displays matches the numbering of the Interpro domain definition.
  2. Then select the residue range(s) by which the CDD KilA-N definition is larger, and colour that fragment orange.
  3. Then select the residue range(s) by which the InterPro APSES domain definition is larger, and colour that fragment yellow.
  4. If the structure contains residues outside these ranges, colour these white.
  5. Study this in a side-by-side stereo view and get a sense for how the extra sequence beyond the Kil-A N domain(s) is part of the structure, and how the integrity of the folded structure would be affected if these fragments were missing.
  6. Display Hydrogen bonds, to get a sense of interactions between residues from the differently colored parts. First show the protein as a stick model, with sticks that are thicker than the default to give a better sense of sidechain packing:
    (i) SelectSelect all
    (ii) ActionsRibbonhide
    (iii) SelectStructureprotein
    (iv) ActionsAtoms/Bondsshow
    (v) ActionsAtoms/Bondsstick
    (vi) click on the looking glass icon at the bottom right of the graphics window to bring up the inspector window and choose Inspect ... Bond. Change the radius to 0.4.
  7. Then calculate and display the hydrogen bonds:
    (vii) ToolsSurface/Binding AnalysisFindHbond
    (viii) Set the Line width to 3.0, leave all other parameters with their default values an click Apply
    Clear the selection.
    Study this view, especially regarding side chain H-bonds. Are there many? Do side chains interact more with other sidechains, or with the backbone?
  8. Let's now simplify the scene a bit and focus on backbone/backbone H-bonds:
    (ix) SelectStructureBackbonefull
    (x) ActionsAtoms/Bondsshow only

    Clear the selection.
    In this way you can appreciate how H-bonds build secondary structure - α-helices and β-sheets - and how these interact with each other ... in part across the KilA N boundary.
  9. Save the resulting image as a jpeg no larger than 600px across and upload it to your Lab notebook on the Wiki.
  10. When you are done, congratulate yourself on having earned a bonus of 10% on the next quiz.


There is a rather important lesson in this: domain definitions may be fluid, and their boundaries may be computationally derived from sequence comparisons across many families, and do not necessarily correspond to individual structures. Make sure you understand this well. }}


Given this, it seems appropriate to search the sequence database with the sequence of an Mbp1 structure–this being a structured, stable, subdomain of the whole that presumably contains the protein's most unique and specific function. Let us retrieve this sequence. All PDB structures have their sequences stored in the NCBI protein database. They can be accessed simply via the PDB-ID, which serves as an identifier both for the NCBI and the PDB databases. However there is a small catch (isn't there always?). PDB files can contain more than one protein, e.g. if the crystal structure contains a complex[5]. Each of the individual proteins gets a so-called chain ID–a one letter identifier– to identify them uniquely. To find their unique sequence in the database, you need to know the PDB ID as well as the chain ID. If the file contains only a single protein (as in our case), the chain ID is always A[6]. make sure you understand the concept of protein chains, and chain IDs.





 

Chimera "sequence": implicit or explicit ?

We discussed the distinction between implicit and explicit sequence. But which one does the Chimera sequence window display? Let's find out.

Task:

  1. Open Chimera and load the 1BM8 structure from the PDB.
  2. Save the ccordinate file with FileSave PDB ..., use a filename of test.pdb.
  3. Open this file in a plain text editor: notepad, TextEdit, nano or the like, but not MS Word! Make sure you view the file in a fixed-width font, not proportionally spaced, i.e. Courier, not Arial. Otherwise the columns in the file won't line up.
  4. Find the records that begin with SEQRES ... and confirm that the first amino acid is GLN.
  5. Now scroll down to the ATOM section of the file. Identify the records for the first residue in the structure. Delete all lines for side-chain atoms except for the CB atom. This changes the coordinates for glutamine to those of alanine.
  6. Replace the GLN residue name with ALA (uppercase). This relabels the residue as Alanine in the coordinate section. Therefore you have changed the implicit sequence. Implicit and explicit sequence are now different. The second atom record should now look like this:
ATOM 2 CA ALA A 4 -0.575 5.127 16.398 1.00 51.22 C
  1. Save the file and load it in Chimera.
  2. Open the sequence window: does it display Q or A as the first reside?

Therefore, does Chimera use the implicit or explicit sequence in the sequence window?

Coloring by conservation

With VMD, you can import a sequence alignment into the MultiSeq extension and color residues by conservation. The protocol below assumes that an MSA exists - you could have produced it in many different ways, for convenience, I have precalculated one for you. This may not contain the sequences from YFO, if you are curious about these you are welcome to add them and realign.

Task:

Load the Mbp1 APSES alignment into MultiSeq.
  1. Access the set of MUSCLE aligned and edited fungal APSES domains.
  2. Copy the alignment and save it into a convenient directory on your computer as a plain text file. Give it the extension .aln .
  3. Open VMD and load the 1BM8 structure.
  4. As usual, turn the axes off and display your structure in side-by-side stereo.
  5. Visualize the structure as New Cartoon with Index coloring to re-orient yourself. Identify the recognition helix and the "wing".
  6. Open Extensions → Analysis → Multiseq.
  7. You can answer No to download metadata databases, we won't need them here.
  8. In the MultiSeq Window, navigate to File → Import Data...; Choose "From Files" and Browse to the location of the alignment you have saved. The File navigation window gives you options which files to enable: choose to Enable ALN files (these are CLUSTAL formatted multiple sequence alignments).
  9. Open the alignment file, click on Ok to import the data. If the data can't be loaded, the file may have the wrong extension: .aln is required.
  10. find the Mbp1_SACCE sequence in the list, click on it and move it to the top of the Sequences list with your mouse (the list is not static, you can re-order the sequences in any way you like).


You will see that the 1BM8 sequence and the Mbp1_SACCA APSES domain sequence do not match: at the N-terminus the sequence that corresponds to the PDB structure has extra residues, and in the middle the APSES sequences may have gaps inserted.

Task:

Bring the 1MB1 sequence in register with the APSES alignment.
  1. MultiSeq supports typical text-editor selection mechanisms. Clicking on a residue selects it, clicking on a row selects the whole sequence. Dragging with the mouse selects several residues, shift-clicking selects ranges, and option-clicking toggles the selection on or off for individual residues. Using the mouse and/or the shift key as required, select the entire first column of the Sequences you have imported. Note: don't include the 1BM8 sequence - this is just for the aligned sequences.
  2. Select Edit → Enable Editing... → Gaps only to allow changing indels.
  3. Pressing the spacebar once should insert a gap character before the selected column in all sequences. Insert as many gaps as you need to align the beginning of sequences with the corresponding residues of 1BM8: S I M ... . Note: Have patience - the program's response can be a bit sluggish.
  4. Now insert as many gaps as you need into the 1BM8 structure sequence, to align it completely with the Mbp1_SACCE APSES domain sequence. (Simply select residues in the sequence and use the space bar to insert gaps. (Note: I have noticed a bug that sometimes prevents slider or keyboard input to the MultiSeq window; it fails to regain focus after operations in a different window. I don't know whether this is a Mac related problem or a more general bug in MultiSeq. When this happens I quit VMD and restore a saved session. It is a bit annoying but not mission-critical. But to be able to do that, you might want to save your session every now and then.)
  5. When you are done, it may be prudent to save the state of your alignment. Use File → Save Session...


Task:

Color by similarity
  1. Use the View → Coloring → Sequence similarity → BLOSUM30 option to color the residues in the alignment and structure. This clearly shows you where conserved and variable residues are located and allows to analyze their structural context.
  2. Navigate to the Representations window and create a Tube representation of the structure's backbone. Use User coloring to color it according to the conservation score that the Multiseq extension has calculated.
  3. Create a new representation, choose Licorice as the drawing method, User as the coloring method and select (sidechain or name CA) and not element H (note: CA, the C-alpha atom must be capitalized.)
  4. Double-click on the NewCartoon representation to hide it.
  5. You can adjust the color scale in the usual way by navigating to VMD main → Graphics → Colors..., choosing the Color Scale tab and adjusting the scale midpoint.


Study this structure in some detail. If you wish, you could load and superimpose the DNA complexes to determine which conserved residues are in the vicinity of the double helix strands and potentially able to interact with backbone or bases. Note that the most highly conserved residues in the family alignment are all structurally conserved elements of the core. Solvent exposed residues that comprise the surface of the recognition helix are quite variable, especially at the binding site. You may also find - if you load the DNA molecules, that residues that contact the phosphate backbone in general tend to be more highly conserved than residues that contact bases.



Chimera capabilities

Hydrogen bonds

Secondary Structure

Mutations

Minimal changes to structure models can be done directly in Chimera. This illustrates the principle of full-scale modeling quite nicely. For an example, let us consider the residue A 42 of the 1BM8 structure. It is oriented twards the core of the protein, but most other Mbp1 orthologs have a larger amino acid in this position, V, or even I.

Task:

  1. Open 1BM8 in Chimera, hide the ribbons and show all atoms as a stick model.
  2. Color the protein white.
  3. Open the sequence window and select A 42. Color it red. Choose Actions → Set pivot. Then study how nicely the alanine sidechain fits into the cavity formed by its surrounding residues.
  4. To emphasize this better, hide the solvent molecules and select only the protein atoms. Display them as a sphere model to better appreciate the packing, i.e. the Van der Waals contacts we discussed in class. Use the Favorites → Side view panel to move the clipping plane and see a section through the protein. Study the packing, in particular, note that the additional methyl groups of a valine or isoleucine would not have enough space in the structure. Then restore the clipping planes so you can see the whole molecule.
  5. Lets simplify the view: choose Actions → Atoms/Bonds → backbone only → chain trace. Then select A 42 again in the sequence window and choose Actions → Atoms/Bonds → show.
  6. Add the surrounding residues: choose Select → Zone.... In the window, see that the box is checked that selects all atoms at a distance of less then 5Å to the current selection, and check the lower box to select the whole residue of any atom that matches the distance cutoff criterion. Click OK and choose Actions → Atoms/Bonds → show.
  7. Select A 42 again: left-click (control click) on any atom of the alanine to select the atom, then up-arrow to select the entire residue. Now let's mutate this residue to isoleucine.
  8. Choose Tools → Structure Editing → Rotamers and select ILE as the rotamer type. Click OK, a window will pop up that shows you the possible rotamers for isoleucine together with their database-derived probabilities; you can select them in the window and cycle through them with your arrow keys. But note that the probabilities are very different - and thus show you high-energy and low-energy rotamers to choose from. Therefore, unless you have compelling reasons to do otherwise, try to find the highest-probability rotamer that may fit. This is where your stereo viewing practice becomes important, if not essential. It is really, really hard to do this reasonably in a 2D image! It becomes quite obvious in 3D. Btw: I find such "quantitative" work - where the real distances are important - easier in orthographic than in perspective view (cf. the Camera panel).
  9. I find that the first rotamer is actually not such a bad fit. The CD atom comes close to the sidechains of I 25 and L 96. But we can assume that these are somewhat mobile and can accommodate a denser packing, because - as you can easily verify in your Jalview alignment - it is NOT the case that sequences that have I 42, have a smaller residue in position 25 and/or 96. So let's accept the most frequent ILE rotamer by selecting it in the rotamer window and clicking OK (while existing side chain(s): replace is selected).
  10. Done.

If you want to go over this in more detail, check the video tutorial on YouTube published by the NIAID bioinformatics group here. I would also encourage you to go over Part 2 of the video tutorial that discusses how to check for and resolve (by energy minimization) steric clashes. But do remember that it is not clear whether energy minimization will make your structure more correct in the sense of a smaller overall RMSD with the real, mutated protein.

What we have done here with one residue is exactly the way homology modeling works with entire sequences. Let's now build a homology model for YFO Mbp1.


Scripting and Programming

(Code generation with R?)


Links and resources

Gajiwala & Burley (2000) Winged helix proteins. Curr Opin Struct Biol 10:110-6. (pmid: 10679470)

PubMed ] [ DOI ] The winged helix proteins constitute a subfamily within the large ensemble of helix-turn-helix proteins. Since the discovery of the winged helix/fork head motif in 1993, a large number of topologically related proteins with diverse biological functions have been characterized by X-ray crystallography and solution NMR spectroscopy. Recently, a winged helix transcription factor (RFX1) was shown to bind DNA using unprecedented interactions between one of its eponymous wings and the major groove. This surprising observation suggests that the winged helix proteins can be subdivided into at least two classes with radically different modes of DNA recognition.

Aravind et al. (2005) The many faces of the helix-turn-helix domain: transcription regulation and beyond. FEMS Microbiol Rev 29:231-62. (pmid: 15808743)

PubMed ] [ DOI ] The helix-turn-helix (HTH) domain is a common denominator in basal and specific transcription factors from the three super-kingdoms of life. At its core, the domain comprises of an open tri-helical bundle, which typically binds DNA with the 3rd helix. Drawing on the wealth of data that has accumulated over two decades since the discovery of the domain, we present an overview of the natural history of the HTH domain from the viewpoint of structural analysis and comparative genomics. In structural terms, the HTH domains have developed several elaborations on the basic 3-helical core, such as the tetra-helical bundle, the winged-helix and the ribbon-helix-helix type configurations. In functional terms, the HTH domains are present in the most prevalent transcription factors of all prokaryotic genomes and some eukaryotic genomes. They have been recruited to a wide range of functions beyond transcription regulation, which include DNA repair and replication, RNA metabolism and protein-protein interactions in diverse signaling contexts. Beyond their basic role in mediating macromolecular interactions, the HTH domains have also been incorporated into the catalytic domains of diverse enzymes. We discuss the general domain architectural themes that have arisen amongst the HTH domains as a result of their recruitment to these diverse functions. We present a natural classification, higher-order relationships and phyletic pattern analysis of all the major families of HTH domains. This reconstruction suggests that there were at least 6-11 different HTH domains in the last universal common ancestor of all life forms, which covered much of the structural diversity and part of the functional versatility of the extant representatives of this domain. In prokaryotes the total number of HTH domains per genome shows a strong power-equation type scaling with the gene number per genome. However, the HTH domains in two-component signaling pathways show a linear scaling with gene number, in contrast to the non-linear scaling of HTH domains in single-component systems and sigma factors. These observations point to distinct evolutionary forces in the emergence of different signaling systems with HTH transcription factors. The archaea and bacteria share a number of ancient families of specific HTH transcription factors. However, they do not share any orthologous HTH proteins in the basal transcription apparatus. This differential relationship of their basal and specific transcriptional machinery poses an apparent conundrum regarding the origins of their transcription apparatus.


 

 


Footnotes and references

  1. * Previous versions of this course have used the VMD molecular viewer. Material on this is still available at the VMD page.
  2. The Rainbow tool can only create color ramps for an entire molecule. In order to achieve this effect: color the molecule with a color ramp, then select the APSES domain, then invert the selection and color the new selection dark grey.
  3. See here for details of the specification syntax.
  4. (Taylor et al. (2000) Biochemistry 39: 3943-3954 and Deleeuw et al. (2008) Biochemistry. 47:6378-6385)
  5. Think of the ribosome or DNA-polymerase as extreme examples.
  6. Otherwise, you need to study the PDB Web page for the structure, or the text in the PDB file itself, to identify which part of the complex is labeled with which chain ID. For example, immunoglobulin structures some time label the light- and heavy chain fragments as "L" and "H", and sometimes as "A" and "B"–there are no fixed rules. You can also load the structure in VMD, color "by chain" and use the mouse to click on residues in each chain to identify it.


 

Ask, if things don't work for you!

If anything about the assignment is not clear to you, please ask on the mailing list. You can be certain that others will have had similar problems. Success comes from joining the conversation.



< Assignment 4 Assignment 6 >