Difference between revisions of "BIO Assignment Week 5"

From "A B C"
 
(9 intermediate revisions by the same user not shown)
Line 10: Line 10:
  
 
{{Template:Inactive}}
 
{{Template:Inactive}}
 
+
<small>Concepts and activities (and reading, if applicable) for this assignment will be topics on the upcoming quiz.</small>
Concepts and activities (and reading, if applicable) for this assignment will be topics on next week's quiz.  
 
 
 
 
 
 
 
  
  
Line 20: Line 16:
 
__TOC__
 
__TOC__
  
 +
&nbsp;
  
 +
<div style="padding: 2px; background: #F0F1F7;  border:solid 1px #AAAAAA; font-size:125%;color:#444444">
  
  
==The PDB==
+
&nbsp;<br>
  
 
+
;How could the search for ultimate truth have revealed so hideous and visceral-looking an object?
The search options in the PDB structure database are as sophisticated as those at the NCBI. For now, we will try a simple keyword search to get us started.
+
:''[https://en.wikipedia.org/wiki/Max_Perutz Max Perutz]&nbsp;&nbsp;<small>(on his first glimpse of the Hemoglobin structure)</small>''
 +
</div>
  
 
{{task|
 
# Visit the RCSB PDB website at http://www.pdb.org/
 
# Briefly orient yourself regarding the database contents and its information offerings and services.
 
# Enter <code>Mbp1</code> into the search field.
 
# In your journal, note down the PDB IDs for the three ''Saccharomyces cerevisiae'' Mbp1 transcription factor structures your search has retrieved.
 
# Click on one of the entries and explore the information and services linked from that page.
 
}}
 
  
 
&nbsp;
 
&nbsp;
  
 +
==Introduction==
  
  
 +
Where is the hidden beauty in structure, and where, the "ultimate truth"? In the previous assignments we have discovered homologues of APSES domain containing proteins in all fungal species. This makes the domain an ancient protein family that had already duplicated to several paralogues at the time when the cenancestor of all fungi lived, more than 600,000,000 years ago, in the [http://www.ucmp.berkeley.edu/fungi/fungifr.html Vendian period] of the Proterozoic era of Precambrian times.
  
 +
In this assignment we will explore its molecular structure.
  
----
 
From A1:
 
  
* install the molecular graphics viewer '''UCSF Chimera'''<ref>* Previous versions of this course have used the '''VMD''' molecular viewer. Material on this is still available at the [[VMD|'''VMD''' page]].</ref> on your own computer, work through a tutorial on its use and begin practicing the skill of viewing split-screen stereographic scenes without aids;  
+
&nbsp;
  
==Molecular graphics==
+
==Molecular graphics: UCSF Chimera==
  
A molecular viewer is a program that takes protein structure data and allows you to display and explore it. For a number of reasons, I chose to use the UCSF Chimera viewer for this course.
+
To view molecular structures, we need a tool to visualize the three dimensional relationships of atoms. A ''molecular viewer'' is a program that takes 3D structure data and allows you to display and explore it. For a number of reasons, I use the UCSF Chimera viewer for this course:
  
 +
# Chimera is free and open;
 +
# It creates very appealing graphics;
 +
# It is under ongoing development and is well maintained;
 +
# It provides an array of useful utilities for structure analysis; and,
 +
# Besides an intuitive, menu-driven interface, Chimera can be scripted via its command line, or even programmed via its built-in Python interpreter.
  
===UCSF Chimera===
 
  
{{task|
+
{{#lst:UCSF_Chimera|Installation}}
* Access the [[UCSF Chimera|'''Chimera''' page]].
 
* Install the program as per the instructions in the section: "Installing Chimera".
 
* Access the Chimera User's Guide [https://www.rbvi.ucsf.edu/chimera/docs/UsersGuide/frametut.html '''tutorial section''']. The "Getting Started" tutorial is offered in two versions: one for work with the graphical user interface (GUI), ''i.e.'' the usual system of windows and drop-down menu selections. The other is a command-line version for the same. What is the difference? In general, '''GUI interfaces''' (Menu version) are well suited for beginners who are not yet familiar with all the options. Having the commands and alternatives presented on a menu makes first steps very easy through simple selection of keywords. On the other hand, work from '''command line interfaces''' is much faster and more flexible if you know what you are doing and thus much better suited for the experienced user. It is also quite straightforward to execute series of commands in stored scripts, allowing you to automate tasks. For now, we will stay with the menu version but we will use commands later in the course and you are of course welcome to explore.
 
* Work through the Chimera tutorial '''Getting Started - Menu version''', Part 1.
 
}}
 
  
=== Stereo vision ===
 
  
{{task|
+
Let's explore Chimera functions first with a simple small molecule:
Access the '''[[Stereo Vision]]''' tutorial and practice viewing molecular structures in stereo.
 
 
 
Practice at least ...
 
* two times daily,
 
* for 3-5 minutes each session,
 
}}
 
 
 
Keep up your practice throughout the course. '''Stereo viewing will be required in the final exam,''' but more importantly, it is a wonderful skill that will greatly support any activity of yours related to structural molecular biology. Practice with different molecules and try out different colours and renderings.
 
 
 
'''Note: do not go through your practice sessions mechanically. If you are not making any progress with stereo vision, contact me so we can help you on the right track.'''
 
 
 
 
 
* [[UCSF Chimera|'''Chimera page''']]
 
* [[Stereo Vision|'''Stereo vision tutorial''']]
 
 
 
*[http://www.rcsb.org/pdb/static.do?p=software/software_links/molecular_graphics.html Molecular Graphics Software Links]&ndash; a collection of links at the PDB
 
 
 
 
 
 
 
 
 
----
 
 
 
 
 
 
 
from A2;
 
 
 
----
 
 
 
==Chimera==
 
 
 
In this task we will explore the sequence interface of Chimera, use it to select specific parts of a molecule, and colour specific regions (or residues) of a molecule separately.
 
 
 
&nbsp;
 
{{task|
 
# Open Chimera.
 
# One of the three yeast Mbp1 fragment structures has the PDB ID <code>1BM8</code>. Load it in Chimera (simply enter the ID into the appropriate field of the '''File''' &rarr; '''Fetch by ID...''' window).
 
# Display the protein in '''Presets''' &rarr; '''Interactive&nbsp;1''' mode and familiarize yourself with its topology of helices and strands.
 
# Open the sequence tool: '''Tools''' &rarr; '''Sequence''' &rarr; '''Sequence'''. You will see the sequence for each chain - here there is only one chain. By default, coloured rectangles overlay the secondary structure elements of the sequence.
 
# Hover the mouse over some residues and note that the sequence number and chain is shown at the bottom of the window.
 
# Click/drag one residue to select it. <small>(A simple click won't work; you need to drag a little bit for the selection to catch on.)</small> Note that the residue gets a green overlay in the sequence window, and it is also outlined with a green border in the graphics window.
 
# At the bottom of the sequence window, there are instructions on how to select (multiple) regions. Try this: colour the protein light gray ('''Select''' &rarr; '''Select&nbsp;All'''; '''Actions''' &rarr; '''Color''' &rarr; '''light&nbsp;gray'''). Clear the selection. Now select all the helical regions (pale yellow boxes) by click/dragging and using the shift key. Color them red. Then select all the strands by clicking into any of the pale green boxes and color them green.
 
# Finally, generate a stereo-view that shows the molecule well, in which the protein is coloured dark grey, and the APSES domain residues (as defined in the FASTA listing above, from I19 to Y93) are coloured with a colour ramp ('''Tools''' &rarr; '''Depiction''' &rarr; '''Rainbow''').<ref>The [https://www.cgl.ucsf.edu/chimera/1.2065/docs/ContributedSoftware/rainbow/rainbow.html Rainbow tool] can only create color ramps for an entire molecule. In order to achieve this effect: color the molecule with a color ramp, then select the APSES domain, then '''invert the selection''' and color the new selection dark grey.</ref>
 
# Show the first and last residue's CA atom<ref>See [https://www.cgl.ucsf.edu/chimera/docs/UsersGuide/midas/frameatom_spec.html '''here'''] for details of the specification syntax.</ref> as a sphere and colour the first one blue (to mark the N-terminus) and the last one red. E.g.:
 
##'''Select''' &rarr; '''Atom&nbsp;specifier''' &rarr; <code>:4@CA</code>
 
##'''Actions''' &rarr; '''Ribbon''' &rarr; '''hide'''
 
##'''Actions''' &rarr; '''Atoms/bonds''' &rarr; '''show'''
 
##'''Actions''' &rarr; '''Atoms/bonds''' &rarr; '''sphere'''
 
##'''Actions''' &rarr; '''Color''' &rarr; '''cornflower&nbsp;blue'''
 
##Then click on the selection inspector (the green button with the magnifying glass at the lower right of the graphics window) and set the sphere radius to 1.0Å.
 
# Save the image in your Wiki journal in JPEG format ('''File''' &rarr; '''Save&nbsp;Image''' and upload it to the Student Wiki).
 
}}
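The menu steps for marking the termini can also be written down as Chimera commands. The following is only a minimal sketch, not part of the original task: a small Python helper (the name <code>termini_commands</code> is hypothetical) that assembles the command lines as text, so that they can be pasted into the Chimera command line one by one. It assumes the standard Chimera commands <code>open</code>, <code>~ribbon</code>, <code>display</code>, <code>represent</code> and <code>color</code>; the residue number of the last residue is a parameter that you need to read from the sequence window yourself.

```python
def termini_commands(pdb_id, first_res, last_res):
    """Return Chimera command lines that mark the first and last CA atom.

    first_res and last_res are residue numbers as shown in the Chimera
    sequence window; the values are supplied by the caller, not looked up.
    """
    n_term = ":%d@CA" % first_res                 # atom specifier, e.g. :4@CA
    c_term = ":%d@CA" % last_res
    both = ":%d,%d@CA" % (first_res, last_res)    # both residues, CA atoms only
    return [
        "open %s" % pdb_id,                       # fetch the structure by PDB ID
        "~ribbon",                                # hide the ribbon
        "display %s" % both,                      # show the two CA atoms
        "represent sphere %s" % both,             # draw them as spheres
        "color cornflower blue %s" % n_term,      # N-terminus in blue
        "color red %s" % c_term,                  # C-terminus in red
    ]


if __name__ == "__main__":
    # 4 is the first residue used in the task above; 99 is only a placeholder.
    for command in termini_commands("1BM8", 4, 99):
        print(command)
```

The sphere radius is not set here; the selection inspector described in the task remains the simplest way to do that.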
 
  
  
 
&nbsp;
 
&nbsp;
  
== Stereo vision ==
+
=== Modeling small molecules ===
 
 
{{task|
 
Continue with your stereo practice.
 
 
 
Practice at least ...
 
* two times daily,
 
* for 3-5 minutes each session.
 
 
 
* Measure your interocular distance and your fusion distance as explained '''[http://biochemistry.utoronto.ca/steipe/abc/students/index.php/Stereo_vision_data here on the Student Wiki]''' and add it to the table.
 
}}
 
 
 
Keep up your practice throughout the course. '''Once again: do not go through your practice sessions mechanically. If you are not making constant progress in your practice sessions, contact me so we can help you on the right track.'''
 
 
 
== Modeling small molecules (optional) ==
 
 
 
 
 
As an optional part of the assignment, here is a small tutorial for modeling and visualizing "small-molecule" structures.
 
 
 
 
 
 
 
=== Defining a molecule ===
 
  
 +
"Small" molecules are solvent, ligands, substrates, products, prosthetic groups, drugs - in short, essentially everything that is not made by DNA-, RNA-polymerases or the ribosome. Whereas the biopolymers are still front and centre in our quest to understand molecular biology, small molecules are crucial for our quest to interact with the inventory of the cell, create useful products, or advance medicine.
  
A number of public repositories make small molecule information available, such as [http://pubchem.ncbi.nlm.nih.gov/ PubChem] at the NCBI, the ligand collection at the [http://pdb.org '''PDB'''], the [http://www.ebi.ac.uk/chebi/ ChEBI] database at the European Bioinformatics Institute, or the [http://cactus.nci.nih.gov/ncidb2.2/ NCI database browser] at the US National Cancer Institute. One general way to export topology information from these services is to use {{WP|SMILES|SMILES strings}}&mdash;a shorthand notation for the composition and topology of chemical compounds.  
+
A number of public repositories make small-molecule information available, such as [http://pubchem.ncbi.nlm.nih.gov/ PubChem] at the NCBI, the ligand collection at the [http://pdb.org '''PDB'''], the [http://www.ebi.ac.uk/chebi/ ChEBI] database at the European Bioinformatics Institute, the Canadian [http://www.drugbank.ca DrugBank], or the [http://cactus.nci.nih.gov/ncidb2.2/ NCI database browser] at the US National Cancer Institute. One general way to export topology information from these services is to use {{WP|SMILES|SMILES strings}}&mdash;a shorthand notation for the composition and topology of chemical compounds.  
  
  
 
{{task|
 
{{task|
# Access each of the databases mentioned above.
+
# Access [http://pubchem.ncbi.nlm.nih.gov/ PubChem].
# Enter "caffeine" as a search term.
+
# Enter "caffeine" as a search term in the '''Compound''' tab. A number of matches to this keyword search are returned.
# Explore the contents of the result, in particular note and copy the SMILES string for the compound.
+
# Click on the [http://pubchem.ncbi.nlm.nih.gov/compound/2519 top hit - 1,3,7-Trimethylxanthine, the Caffeine molecule]. Note that the page contains among other items:
 +
## A 2D structural sketch;
 +
## An idealized 3D structural conformer, for which you can download coordinates in several formats;
 +
## The IUPAC name: <code>1,3,7-trimethylpurine-2,6-dione</code>;
 +
## The CAS identifier <code>58-08-2</code> which is a unique identifier and can be used as a cross-reference ID;
 +
## The {{WP|SMILES|SMILES strings|SMILES string}} <code>CN1C{{=}}NC2{{=}}C1C({{=}}O)N(C({{=}}O)N2C)C</code>;
 +
## ... and much more.
 
}}
 
}}
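To get a feeling for what a SMILES string encodes, it can help to count the atoms it lists. The sketch below is an illustration under simplifying assumptions, not a SMILES parser: it only recognizes atoms of the so-called organic subset written without square brackets, and it ignores hydrogens, which are implicit in this notation. Note that the <code><nowiki>{{=}}</nowiki></code> in the Wiki source of this page is just an escaped <code>=</code> character. The function name <code>heavy_atom_counts</code> is made up for this example.

```python
import re
from collections import Counter

# Atoms of the SMILES "organic subset", written without brackets.
# Two-letter symbols come first so that "Cl" is not read as "C".
# Lower-case letters denote aromatic atoms.
ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def heavy_atom_counts(smiles):
    """Count non-hydrogen atoms per element in a bracket-free SMILES string."""
    return Counter(symbol.capitalize() for symbol in ATOM.findall(smiles))


if __name__ == "__main__":
    caffeine = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
    counts = heavy_atom_counts(caffeine)
    for element in sorted(counts):
        print(element, counts[element])   # C 8, N 4, O 2
```

The result agrees with the heavy atoms of the molecular formula of caffeine, C<sub>8</sub>H<sub>10</sub>N<sub>4</sub>O<sub>2</sub>. Strings that contain bracket atoms such as <code>[Na+]</code> would be miscounted by this simple pattern.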
  
  
Alternatively, you can sketch your own compound. Versions of Peter Ertl's {{WP|JME_editor|Java Molecular Editor (JME)}} are offered on several websites (e.g. click on '''Transfer to Java Editor''' on a NCI results page), and PubChem offers this functionality via its '''Sketcher''' tool.
+
That's great, but let's sketch our own version of caffeine. Several versions of Peter Ertl's {{WP|JME_editor|Java Molecular Editor (JME)}} are offered online; PubChem offers this functionality via its '''Sketcher''' tool.
  
 
{{task|
 
{{task|
# Navigate to [http://pubchem.ncbi.nlm.nih.gov/ PubChem].
+
# Return to the [http://pubchem.ncbi.nlm.nih.gov/ PubChem homepage].
# Follow the link to '''Chemical structure search''' (in the right hand menu).
+
# Follow the link to '''Structure search''' (in the right hand menu).
 
# Click on the '''3D conformer''' tab and on the '''Launch''' button to launch the molecular editor in its own window.
 
# Click on the '''3D conformer''' tab and on the '''Launch''' button to launch the molecular editor in its own window.
# Sketch the structure of caffeine. I find the editor quite intuitive but if you need help, just use the '''Help''' button in the editor.
+
# Sketch the structure of caffeine. I find the editor quite intuitive but clicking on the '''Help''' button will give you a quick, structured overview. Make sure you define your double-bonds correctly.
# Save the SMILES string of your compound.
+
# '''Export''' the SMILES string of your compound to your project folder.
# Also '''Export''' your result in SMILES format as a file.
 
 
}}
 
}}
 +
  
 
=== Translating SMILES to structure ===
 
=== Translating SMILES to structure ===
  
 
+
Chimera can translate SMILES strings to coordinates<ref>There are several online servers that translate SMILES strings to idealized structures, see e.g. the [http://cactus.nci.nih.gov/translate/ online SMILES translation service] at the NCI.</ref>.
Online services exist to translate SMILES to (idealized) coordinates.
 
  
 
{{task|
 
{{task|
# Access the [http://cactus.nci.nih.gov/translate/ online SMILES translation service] at the NCI.
+
# Open Chimera.
# Paste a caffeine SMILES string into the form, choose the '''PDB''' radio button, click on '''Translate''' and download your file.
+
# Select '''Tools''' &rarr; '''Structure&nbsp;Editing''' &rarr; '''Build&nbsp;Structure'''.
# Load the molecule in Chimera.
+
# In the '''Build Structure''' window, select the '''SMILES string''' button, paste the string from your file, and click '''Apply'''.
}}
+
# The caffeine molecule will be generated and visualized in the graphics window. This is a "stick" representation.
 +
# You can rotate it with your mouse, &lt;command&gt; drag to scale, &lt;shift&gt; drag to translate.
 +
# Use the '''Actions''' &rarr; '''Atoms/Bonds''' &rarr; '''ball &amp; stick''' or '''sphere''' menu items to change appearance.
 +
# Use the '''Actions''' &rarr; '''Color''' &rarr; '''by element''' menu to change colors.
 +
# Change the display back to stick and use '''Actions''' &rarr; '''Surface''' &rarr; '''show''' to add a solvent accessible surface. Choosing this command triggers the calculation of the surface, which is then available as an individually selectable object. However, with default parameters the surface appears a bit rough for this small molecule.
 +
# Change the parameters of this solvent accessible surface:
 +
## Select the surface with &lt;control&gt;&lt;click&gt; (&lt;control&gt;&lt;left mouse button&gt; on Windows). A green contour line appears around selected items – it surrounds the surface in this case.
 +
## Open the selection inspector by clicking on the tiny green icon in the lower-right corner of the window (It has a magnifying glass symbol which means "inspect" for Chimera, not "search").
 +
## Select Inspect ...'''MSMS surface''' and change the '''Vertex density''' value to 50.0 - hit return.
 +
# By default, the surface inherits the colour of the atoms it envelops. To change the colour of the surface, use the '''Actions''' &rarr; '''Color''' &rarr; '''all options''' menu. Click the '''surfaces''' button to indicate that the color choice should be applied to the surface object (note what else you can apply color to...), then choose '''cornflower blue'''.
 +
# Use the '''Actions''' &rarr; '''Surface''' &rarr; '''transparency''' &rarr; '''50%''' menu to see atoms and bonds that are covered by the surface.
 +
# To begin working with molecules in "true" 3D, choose '''Tools''' &rarr; '''Viewing Controls''' &rarr; '''Camera''' and select '''camera mode''' &rarr; '''wall-eye stereo'''. Also, use the '''Effects''' tab of the '''Viewing''' window to switch '''shadows''' off.
 +
# Your structure should look about like what you see below. Save your session with the '''File''' &rarr; '''Save Session''' dialogue so you can easily recreate the scene.
  
Chimera also has a function to translate SMILES to coordinates.
 
  
{{task|
+
{{stereo|Caffeine_stereo.jpg|'''Wall-eye stereo view''' of the caffeine structure, surrounded by a transparent molecular surface. The image for the left eye is on the left side. For instructions on ''stereo-viewing'', see the next section.
# In Chimera:
 
##'''File''' &rarr; '''Close&nbsp;Session'''.
 
##'''Tools''' &rarr; '''Structure&nbsp;Editing''' &rarr; '''Build&nbsp;Structure'''.
 
##Select '''SMILES string''', paste the string and click '''Apply'''.
 
# The caffeine molecule will be generated and visualized in the graphics window.
 
 
}}
 
}}
  
  
 +
}}
  
  
 +
{{Vspace}}
  
 +
=== Stereo vision ===
  
 
+
A simple molecular scene like the caffeine molecule is a great way to practice viewing structures in stereo. This is a learnable skill, but it takes practice.
==Introduction==
 
 
 
Integrating evolutionary information with structural information allows us to establish which residues are invariant in a family&ndash;these are presumably structurally important sites&ndash;and which residues are functionally important, since they are invariant within, but changeable between subfamilies.
 
 
 
To visualize these relationships, we will load an MSA of APSES domains with VMD and color it by conservation.
 
 
 
 
 
 
 
 
 
=== The DNA binding site ===
 
 
 
 
 
Now that you know how YFO Mbp1 aligns with yeast Mbp1, you can evaluate functional conservation in these homologous proteins. You probably already downloaded the two Biochemistry papers by Taylor et al. (2000) and by Deleeuw et al. (2008) that we encountered in Assignment 2. These discuss the residues involved in DNA binding<ref>([http://www.ncbi.nlm.nih.gov/pubmed/10747782 Taylor ''et al.'' (2000) ''Biochemistry'' '''39''': 3943-3954] and [http://www.ncbi.nlm.nih.gov/pubmed/18491920 Deleeuw ''et al.'' (2008) ''Biochemistry'' '''47''': 6378-6385])</ref>. In particular, residues 50-74 have been proposed to comprise the DNA recognition domain.
 
  
 
{{task|
 
{{task|
# Using the APSES domain alignment you have just constructed, find the YFO Mbp1 residues that correspond to the range 50-74 in yeast.
+
Access the '''[[Stereo Vision]]''' tutorial and practice viewing molecular structures in stereo.  
# Note whether the sequences are especially highly conserved in this region.
 
# Using Chimera, look at the region. Use the sequence window '''to make sure''' that the sequence numbering between the paper and the PDB file are the same (they are often not identical!). Then select the residues - the proposed recognition domain -  and color them differently for emphasis. Study this in stereo to get a sense of the spatial relationships. Check where the conserved residues are.
 
# A good representation is '''stick''' - but other representations that include sidechains will also serve well.
 
# Calculate a solvent accessible surface of the protein in a separate representation and make it transparent.
 
# You could  combine three representations: (1) the backbone (in '''ribbon view'''), (2) the sidechains of residues that presumably contact DNA, distinctly colored, and (3) a transparent surface of the entire protein. This image should show whether residues annotated as DNA binding form a contiguous binding interface.
 
}}
 
  
 
+
Practice at least ...
DNA binding interfaces are expected to comprise a number of positively charged amino acids that might form salt bridges with the phosphate backbone.
+
* two times daily,
 
+
* for 3-5 minutes each session,
 
 
{{task|
 
*Study and consider whether this is the case here and which residues might be included.
 
 
}}
 
}}
  
 +
Keep up your practice throughout the course. It is a wonderful skill that will greatly support your understanding of structural molecular biology. Practice with different molecules and try out different colours and renderings.
  
 +
'''Note: do not go through your practice sessions mechanically. If you are not making any progress with stereo vision, contact me so I can help you on the right track.'''
  
===APSES domains in Chimera (from A4)===
+
{{Vspace}}
What precisely constitutes an APSES domain, however, is a matter of definition, as you can explore in the following (optional) task.
 
  
 +
==Global properties==
  
<div class="mw-collapsible mw-collapsed" data-expandtext="Expand" data-collapsetext="Collapse" style="border:#000000 solid 1px; padding: 10px; margin-left:25px; margin-right:25px;">Optional: Load the structure in Chimera, like you did in the last assignment and switch on stereo viewing ... (more) <div  class="mw-collapsible-content">
+
In this series of tasks we will showcase some of the '''globally''' applied tools that help us study molecular structure.
<ol start="7">
 
<li>Display the protein in ribbon style, e.g. with the '''Interactive 1''' preset.
 
<li>Access the '''Interpro''' information page for Mbp1 at the EBI: http://www.ebi.ac.uk/interpro/protein/P39678
 
<li>In the section '''Domains and repeats''', mouse over the red annotations and note down the residue numbers for the annotated domains. Also follow the links to the respective Interpro domain definition pages.
 
</ol>
 
  
At this point we have definitions for the following regions on the Mbp1 protein ...
+
{{Vspace}}
*The KilA-N (pfam 04383) domain definition as applied to the Mbp1 protein sequence by CDD;
 
*The InterPro ''KilA, N-terminal/APSES-type HTH, DNA-binding (IPR018004)'' definition annotated on the Mbp1 sequence;
 
*The InterPro ''Transcription regulator HTH, APSES-type DNA-binding domain (IPR003163)'' definition annotated on the Mbp1 sequence;
 
*<small>(... in addition &ndash; without following the source here &ndash; the UniProt record for Mbp1 annotates a "HTH APSES-type" domain from residues 5-111)</small>
 
  
... each with its distinct and partially overlapping sequence range. Back to Chimera:
+
===A Ramachandran plot===
  
<!-- For reference:
+
{{task|1=
1MB1: 3-100
+
# To reset all views and selections, choose '''Favorites''' &rarr; '''Model Panel'''. Select the 1BM8 model and click the '''close''' button to remove it.
2BM8: 4-102
+
# In the graphics window, click on the "lightning bolt" icon at the bottom. You should see a button labelled 1BM8 on the right. This is where you will find recent structures. Click <code>1BM8</code> to re-load it.
CDD KilA-N: 19-93
+
# Choose '''Presets''' &rarr; '''Interactive 2 (all atoms)''' for a detailed view.
InterPro KilA-N: 23-88
+
# Choose '''Favorites''' &rarr; '''Model Panel'''
InterPro APSES: 3-133
+
# Look for the Option '''Ramachandran plot...''' in the choices on the right.
Uniprot HTH/APSES: 5-111
+
# Click the button and study the result. The dots in this [https://www.cgl.ucsf.edu/chimera/docs/ContributedSoftware/ramachandran/ramachandran.html Ramachandran Plot] represent the phi-psi angle combinations for residue backbones. We see that they are well distributed: this is a high-resolution structure essentially without outliers. Clicking on a dot selects a residue in the structure viewer (selected residues have a green contour).
-->
+
# Choose '''File''' &rarr; '''Fetch by ID''' and fetch <code>1L3G</code>, an NMR structure of the Mbp1 APSES domain. Chimera loads the 19 models that comprise this structure dataset.
 +
# In the '''Favorites''' &rarr; '''Model Panel''', select 1BM8 and click on '''hide'''.
 +
# Then select 1L3G and click '''group/ungroup''' to be able to address the models individually. Select any of the models individually and click again on '''Ramachandran plot'''. You will see that the points are much more dispersed, and there are a number of outliers that have comparatively high-energy conformations.
 +
}}
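Each dot in a Ramachandran plot is a pair of backbone dihedral angles: phi is defined by the atoms C(i-1), N, CA, C and psi by the atoms N, CA, C, N(i+1). As a rough illustration of how such an angle is obtained from coordinates, here is a small sketch that computes the signed dihedral angle for four points. The function name <code>dihedral</code> and the coordinates in the usage example are made up for this illustration; they are not taken from 1BM8 or 1L3G, and this is not how Chimera implements its plot.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) around the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Components of the two outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    # The angle between v and w, with its sign, is the dihedral angle.
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Four made-up points forming a right-angled "crank".
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

To compute phi or psi for a real residue, you would pass the coordinates of the four backbone atoms named above, in that order.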
  
<ol start="10">
 
<li>In the sequence window, select the sequence corresponding to the '''Interpro KilA-N''' annotation and colour this fragment red. <small>Remember that you can get the sequence numbers of a residue in the sequence window when you hover the pointer over it - but do confirm that the sequence numbering that Chimera displays matches the numbering of the Interpro domain definition.</small></li>
 
  
<li>Then select the residue range(s) by which the '''CDD KilA-N''' definition is larger, and colour that fragment orange.</li>
+
&nbsp;
  
<li>Then select the residue range(s) by which the '''InterPro APSES domain''' definition is larger, and colour that fragment yellow.</li>
+
===B-factors===
  
<li>If the structure contains residues outside these ranges, colour these white.</li>
+
{{task|1=
 +
# Choose '''Favorites''' &rarr; '''Model Panel''', click/drag over the 1L3G models and click '''close''' to remove them again.
 +
# To explore B-Factors in the 1BM8 model, click '''show''' to view it again.
 +
# Choose '''Tools''' &rarr; '''Structure Analysis''' &rarr; '''Render by Attribute'''.
 +
# Select '''Attributes of atoms''', '''Model''' 1BM8 and '''Attribute''': '''bfactor'''. A histogram appears with sliders that allow you to render the distribution of values found in the structure for this attribute.
 +
# Let's colour the atoms by B-Factor. Click on the colours tab. A standard colouring scheme is blue - white - red, but you can move the sliders, add new thresholds, and colour them individually by clicking on the colour patch to create your own colour spectrum, e.g. from black via red to white, in a {{WP|Black_body_radiation|black-body spectrum}}. Click '''Apply'''.
 +
# Choose '''Actions''' &rarr; '''Atoms/Bonds''' &rarr; '''stick''' to give the bonds more volume. You will find that the core of the protein has low temperature factors, and the surface has a number of highly mobile sidechains and loops.
  
<li>Study this in a side-by-side stereo view and get a sense for how the ''extra'' sequence beyond the Kil-A N domain(s) is part of the structure, and how the integrity of the folded structure would be affected if these fragments were missing.</li>
+
{{stereo|1BM8_thermal_stereo.jpg|'''Structure of the yeast transcription factor Mbp1 DNA binding domain (1BM8)''' coloured by B-factor (thermal factor). The protein bonds are shown in a "stick" model, coloured with a spectrum that emulates black-body radiation. Note that the interior of the protein is less mobile, some of the surface loops are highly mobile (or statically disordered, X-ray structures can't distinguish that) and the discretely bound water molecules that are visible in this high-resolution structure are generally more mobile than the residues they bind to.
 +
}}
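The B-factors that Chimera renders here are stored in the coordinate file itself: in the PDB format, each <code>ATOM</code> record holds the temperature factor in columns 61 to 66. The following sketch shows how one could read these values and average them per residue. It is an illustration only; the helper names are hypothetical and the three demo records are fabricated with a format string, they are not taken from 1BM8.

```python
def atom_line(serial, name, res_name, chain, res_seq, x, y, z, occ, bfac):
    """Build a demo ATOM record with the fixed PDB column layout."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, name, res_name, chain, res_seq, x, y, z, occ, bfac)


def mean_b_by_residue(lines):
    """Average the B-factor (columns 61-66) per (chain, residue number)."""
    sums = {}
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue                      # skip header and other records
        chain = line[21]                  # column 22
        res_seq = int(line[22:26])        # columns 23-26
        b_factor = float(line[60:66])     # columns 61-66
        total, count = sums.get((chain, res_seq), (0.0, 0))
        sums[(chain, res_seq)] = (total + b_factor, count + 1)
    return {key: total / count for key, (total, count) in sums.items()}


if __name__ == "__main__":
    # Fabricated records, for illustration only.
    demo = [
        atom_line(1, "N", "ASN", "A", 4, 25.0, 10.0, 5.0, 1.0, 20.0),
        atom_line(2, "CA", "ASN", "A", 4, 26.2, 10.5, 5.3, 1.0, 22.0),
        atom_line(3, "N", "GLN", "A", 5, 27.1, 11.0, 6.0, 1.0, 30.0),
    ]
    for key, mean_b in sorted(mean_b_by_residue(demo).items()):
        print(key, mean_b)
```

With a real file, you would pass the lines of the downloaded PDB file instead of the demo list.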
  
<li>Display Hydrogen bonds, to get a sense of interactions between residues from the differently colored parts. First show the protein as a stick model, with sticks that are thicker than the default to give a better sense of sidechain packing:<br />
 
::(i) '''Select''' &rarr; '''Select all''' <br />
 
::(ii) '''Actions''' &rarr; '''Ribbon''' &rarr; '''hide''' <br />
 
::(iii) '''Select''' &rarr; '''Structure''' &rarr; '''protein''' <br />
 
::(iv) '''Actions''' &rarr; '''Atoms/Bonds''' &rarr; '''show''' <br />
 
::(v)  '''Actions''' &rarr; '''Atoms/Bonds''' &rarr; '''stick''' <br />
 
::(vi) click on the looking glass icon at the bottom right of the graphics window to bring up the inspector window and choose '''Inspect ... Bond'''. Change the radius to 0.4.<br />
 
</li>
 
  
<li>Then calculate and display the hydrogen bonds:<br />
+
}}
::(vii) '''Tools''' &rarr; '''Surface/Binding Analysis''' &rarr; '''FindHbond''' <br />
 
::(viii) Set the '''Line width''' to 3.0, leave all other parameters with their default values and click '''Apply'''<br />
 
:: Clear the selection.<br />
 
Study this view, especially regarding side chain H-bonds. Are there many? Do side chains interact more with other sidechains, or with the backbone?
 
</li>
 
  
<li>Let's now simplify the scene a bit and focus on backbone/backbone H-bonds:<br />
 
::(ix) '''Select''' &rarr; '''Structure''' &rarr; '''Backbone''' &rarr; '''full'''<br />
 
::(x)  '''Actions''' &rarr; '''Atoms/Bonds''' &rarr; '''show only'''<br /><br />
 
:: Clear the selection.<br />
 
In this way you can appreciate how H-bonds build secondary structure - &alpha;-helices and &beta;-sheets - and how these interact with each other ... in part '''across the KilA N boundary'''.
 
</li>
 
  
 +
&nbsp;
  
<li>Save the resulting image as a jpeg no larger than 600px across and upload it to your Lab notebook on the Wiki.</li>
+
===Electrostatics===
<li>When you are done, congratulate yourself on having earned a bonus of 10% on the next quiz.</li>
 
</ol>
 
 
 
</div>
 
</div>
 
  
 +
{{task|1=
 +
# To visualize the electrostatic potential of the protein, mapped on the surface, first select '''Presets''' &rarr; '''Interactive 2...''' and '''Actions''' &rarr; '''Color''' &rarr; '''cyan''' for a vividly contrasting color.
 +
# A simple electrostatic potential calculation just assumes Coulomb charges. A more accurate calculation of full Poisson-Boltzmann potentials is [https://www.cgl.ucsf.edu/chimera/current/docs/UsersGuide/tutorials/surfprop.html also available]. Select '''Tools''' &rarr; '''Electrostatic/Binding Analysis''' &rarr; '''Coulombic Surface Coloring'''.
 +
# Make sure the surface object is selected in the form (it should be selected by default since there is only one surface), keep the default parameters and click '''Apply'''.
 +
# Use '''Actions''' &rarr; '''Surface''' &rarr; '''Transparency''' &rarr; '''30%''' to make the protein backbone somewhat visible.
 +
# Open the '''Tools''' &rarr; '''Viewing Controls''' &rarr; '''Lighting''' window &rarr; and set '''Intensity''' from '''two-point''' to '''ambient'''. This reduces shadowing and reflections on the surface and thus emphasizes the color values - here our focus is not on shape, but on property.
 +
# Use the '''Effects''' tab to turn '''shadows''' off and '''depth-cueing''' and '''silhouettes''' on. This recreates visual cues of depth which compensate for the loss of shape information by using a flat lighting model.
  
There is a rather important lesson in this: domain definitions may be fluid, and their boundaries may be computationally derived from sequence comparisons across many families, and do not necessarily correspond to individual structures. Make sure you understand this well.
+
{{stereo|1BM8_coulomb_stereo.jpg|'''Coulomb (electrostatic) potential''' mapped to the solvent accessible surface of the yeast transcription factor Mbp1 DNA binding domain (1BM8). The protein backbone is visible through the transparent surface as a cartoon model, note the helix at the bottom of the structure. This helix has been suggested to play a role in forming the domain's DNA binding site and the positive (blue) electrostatic potential of the region is consistent with binding the negatively charged phosphate backbone of DNA. The other side of the domain has a negative (red) charge excess, which balances the molecule's electric charge overall, but also guides the protein-ligand interaction and supports faster on-rates.
 
}}
 
}}
  
  
Given this, it seems appropriate to search the sequence database with the sequence of an Mbp1 structure&ndash;this being a structured, stable, subdomain of the whole that presumably contains the protein's most unique and specific function. Let us retrieve this sequence. All PDB structures have their sequences stored in the NCBI protein database. They can be accessed simply via the PDB-ID, which serves as an identifier both for the NCBI and the PDB databases. However, there is a small catch (isn't there always?). PDB files can contain more than one protein, e.g. if the crystal structure contains a complex<ref>Think of the [http://www.pdb.org/pdb/101/motm.do?momID=121 ribosome] or [http://www.pdb.org/pdb/101/motm.do?momID=3 DNA-polymerase] as extreme examples.</ref>. Each of the individual proteins gets a so-called '''chain ID'''&ndash;a one letter identifier&ndash; to identify them uniquely. To find their unique sequence in the database, you need to know the PDB ID as well as the chain ID. If the file contains only a single protein (as in our case), the chain ID is always '''<code>A</code>'''<ref>Otherwise, you need to study the PDB Web page for the structure, or the text in the PDB file itself, to identify which part of the complex is labeled with which chain ID. For example, immunoglobulin structures sometimes label the ''light-'' and ''heavy chain'' fragments as "L" and "H", and sometimes as "A" and "B"&ndash;there are no fixed rules. You can also load the structure in VMD, color "by chain" and use the mouse to click on residues in each chain to identify it.</ref>. Make sure you understand the concept of protein chains, and chain IDs.
+
}}
 
 
 
 
 
 
 
 
 
 
  
  
Line 311: Line 202:
 
&nbsp;
 
&nbsp;
  
=== Chimera "sequence": implicit or explicit ? ===
+
===Hydrogen bonds===
  
We discussed the distinction between implicit and explicit sequence. But which one does the Chimera sequence window display? Let's find out.
+
{{task|1=
 +
# Hydrogen bonds encode the basic folding patterns of the protein. To visualize H-bonds select '''Presets''' &rarr; '''Publication 1...''' and '''Actions''' &rarr; '''Color''' &rarr; '''by element'''.
 +
# Use '''Tools''' &rarr; '''Structure Analysis''' &rarr; '''FindHBond''' and '''Apply''' default parameters.
 +
# To emphasize the role of H-bonds in determining the architecture of the protein, select '''Select''' &rarr; '''Structure''' &rarr; '''backbone''' &rarr; '''full''' and  then '''Select''' &rarr; '''Invert (all models)'''. Now  '''Actions''' &rarr; '''Atoms/bonds''' &rarr; '''hide''' will show only the backbone with its H-bonds.
  
{{task|1=
 
# Open Chimera and load the 1BM8 structure from the PDB.
 
# Save the coordinate file with '''File''' &rarr; '''Save PDB ...''', use a filename of <code>test.pdb</code>.
 
# Open this file in a '''plain text''' editor: notepad, TextEdit, nano or the like, but not MS Word! Make sure you view the file in a '''fixed-width font''', not proportionally spaced, i.e. Courier, not Arial. Otherwise the columns in the file won't line up.
 
# Find the records that begin with <code>SEQRES  ...</code> and confirm that the first amino acid is <code>GLN</code>.
 
# Now scroll down to the <code>ATOM  </code> section of the file. Identify the records for the first residue in the structure. Delete all lines for side-chain atoms except for the <code>CB</code> atom. This changes the coordinates for glutamine to those of alanine.
 
# Replace the <code>GLN</code> residue name with <code>ALA</code> (uppercase). This relabels the residue as Alanine in the coordinate section. Therefore you have changed the '''implicit''' sequence. Implicit and explicit sequence are now different. The second atom record should now look like this:<br />
 
:<code>ATOM      2  CA  ALA A  4      -0.575  5.127  16.398  1.00 51.22          C</code>
 
<ol start="7">
 
<li>Save the file and load it in Chimera.
 
<li>Open the sequence window: does it display <code>Q</code> or <code>A</code> as the first residue?
 
</ol>
 
  
Therefore, does Chimera use the '''implicit''' or '''explicit''' sequence in the sequence window?
 
  
 +
{{stereo|1BM8_hbond_stereo.jpg|'''Hydrogen bonds''' shown for the peptide backbone of the yeast transcription factor Mbp1 DNA binding domain (1BM8). This view emphasizes the interactions of secondary structure elements that govern the folding topology of the domain.
 
}}
 
}}
  
==Coloring by conservation==
 
  
With VMD, you can import a sequence alignment into the MultiSeq extension and color residues by conservation. The protocol below assumes that an MSA exists. You could have produced it in many different ways; for convenience, I have precalculated one for you. This may not contain the sequences from YFO; if you are curious about these, you are welcome to add them and realign.
+
}}
  
{{task|1=
 
;Load the Mbp1 APSES alignment into MultiSeq.
 
  
# Access [[Reference alignment for APSES domains (MUSCLE, reference species)|the set of MUSCLE aligned and edited fungal APSES domains]].
 
# Copy the alignment and save it into a convenient directory on your computer as a plain text file. Give it the extension <code>.aln</code> .
 
# Open VMD and load the <code>1BM8</code> structure.
 
# As usual, turn the axes off and display your structure in side-by-side stereo.
 
# Visualize the structure as '''New Cartoon''' with '''Index''' coloring to re-orient yourself. Identify the recognition helix and the "wing".
 
# Open '''Extensions &rarr; Analysis &rarr; Multiseq'''.
 
# You can answer '''No''' to download metadata databases, we won't need them here.
 
# In the MultiSeq Window, navigate to '''File &rarr; Import Data...'''; Choose "From Files" and '''Browse''' to the location of the alignment you have saved. The File navigation window gives you options which files to enable: choose to '''Enable <code>ALN</code>''' files (these are CLUSTAL formatted multiple sequence alignments).
 
# Open the alignment file, click on '''Ok''' to import the data. If the data can't be loaded, the file may have the wrong extension: .aln is required.
 
# Find the <code>Mbp1_SACCE</code> sequence in the list, click on it and move it to the top of the Sequences list with your mouse (the list is not static, you can re-order the sequences in any way you like).
 
}}
 
  
 +
==Chimera sequence interface==
  
You will see that the <code>1BM8</code> sequence and the <code>Mbp1_SACCE</code> APSES domain sequence do not match: at the N-terminus the sequence that corresponds to the PDB structure has extra residues, and in the middle the APSES sequences may have gaps inserted.
+
In this task we will explore the sequence interface of Chimera, use it to select specific parts of a molecule, and colour specific regions (or residues) of a molecule separately.
  
 +
&nbsp;
 
{{task|1=
 
{{task|1=
;Bring the 1MB1 sequence in register with the APSES alignment.
+
# Display the protein in '''Presets''' &rarr; '''Interactive&nbsp;1''' mode and familiarize yourself with its topology of helices and strands.  
 
+
# Now turn hydrogen bonds off: the menu commands of Chimera all have a command line equivalent. Open the command line by clicking on the "computer" icon in the upper left corner of the viewer window. Then type "~hbonds". The "~" undoes previous commands.
# MultiSeq supports typical text-editor selection mechanisms. Clicking on a residue selects it, clicking on a row selects the whole sequence. Dragging with the mouse selects several residues, shift-clicking selects ranges, and option-clicking toggles the selection on or off for individual residues. Using the mouse and/or the shift key as required, select the '''entire first column''' of the '''Sequences''' you have imported. Note: don't include the 1BM8 sequence - this is just for the aligned sequences.
+
# Use '''Tools''' &rarr; '''Depiction ''' &rarr; '''Rainbow''' to color the chain from blue to red. (You need to change the colour patches by clicking on them to open the colour editor. Choose an HSL colour model, use Saturation and Lightness 0.5 to keep the colour to somewhat subdued hues, then use the slider to choose appropriate hue values.) Click '''Apply'''.
# Select '''Edit &rarr; Enable Editing... &rarr; Gaps only''' to allow changing indels.
+
# Open the sequence tool: '''Tools''' &rarr; '''Sequence''' &rarr; '''Sequence'''. By default, coloured rectangles overlay the secondary structure elements of the sequence.
# Pressing the spacebar once should insert a gap character before the '''selected column''' in all sequences. Insert as many gaps as you need to align the beginning of sequences with the corresponding residues of 1BM8: <code>S I M ...</code> . Note: Have patience - the program's response can be a bit sluggish.
+
# Hover the mouse over some residues and note that the sequence number and chain are shown at the bottom of the window.
# Now insert as many gaps as you need into the <code>1BM8</code> structure sequence, to align it completely with the <code>Mbp1_SACCE</code> APSES domain sequence. (Simply select residues in the sequence and use the space bar to insert gaps.) Note: I have noticed a bug that sometimes prevents slider or keyboard input to the MultiSeq window; it fails to regain focus after operations in a different window. I don't know whether this is a Mac related problem or a more general bug in MultiSeq. When this happens I quit VMD and restore a saved session. It is a bit annoying but not mission-critical. But to be able to do that, you might want to save your session every now and then.
+
# Click/drag one residue to select it. <small>(A simple click won't work; you need to drag a little bit for the selection to catch on.)</small> Note that the residue gets a green overlay in the sequence window, and it also gets selected with a green border in the graphics window.
# When you are done, it may be prudent to save the state of your alignment. Use '''File &rarr; Save Session...'''  
+
# In the bottom of the sequence window, there are instructions how to select (multiple) regions. Clear the selection by &lt;control&gt; clicking into an empty spot of the viewer. Now select the region that encompasses the residues that have been reported to form the DNA binding subdomain: <code>KRTRILEKEVLKETHEKVQGGFGKYQ</code> (Taylor 2000). Show the side chains of these residues by clicking on the little green inspector icon on the viewer window, inspecting '''Atom''' and choosing '''displayed: true''', and inspecting '''Bond''' and setting the stick radius to 0.4.
}}
+
# Undisplay the Hydrogen atoms by selecting the element H in the Chemistry option of the Selection Menu, and use the Action menu to '''hide''' them. Then use the effects pane of the Depiction menu to add a contour.
 +
# Finally, give the scene a gradient grey background via the '''Actions''' &rarr; '''Color''' &rarr; '''all options...''' menu.
  
  
{{task|1=
+
{{stereo|1BM8_DNAbindingRegion_stereo.jpg|'''The DNA binding region of Mbp1''' according to NMR measurements of DNA contact by Taylor ''et al.'' (2000). The backbone of 1BM8 is shown with a colour ramp from blue (N-terminus) to red (C-terminus). The side chains of the region 50-74 are shown colored by element.
;Color by similarity
 
  
# Use the '''View &rarr; Coloring &rarr; Sequence similarity &rarr; BLOSUM30''' option to color the residues in the alignment and structure. This clearly shows you where conserved and variable residues are located and allows to analyze their structural context.
+
}}
# Navigate to the '''Representations''' window and create a '''Tube''' representation of the structure's backbone. Use '''User''' coloring to color it according to the conservation score that the Multiseq extension has calculated.
 
# Create a new representation, choose '''Licorice''' as the drawing method, '''User''' as the coloring method and select <code>(sidechain or name CA) and not element H</code> (note: <code>CA</code>, the C-alpha atom must be capitalized.)
 
# Double-click on the NewCartoon representation to hide it.
 
# You can adjust the color scale in the usual way by navigating to '''VMD main &rarr; Graphics &rarr; Colors...''', choosing the Color Scale tab and adjusting the scale midpoint.
 
  
 
}}
 
}}
  
  
Study this structure in some detail. If you wish, you could load and superimpose the DNA complexes to determine which conserved residues are in the vicinity of the double helix strands and potentially able to interact with backbone or bases. Note that the most highly conserved residues in the family alignment are all structurally conserved elements of the core. Solvent exposed residues that comprise the surface of the recognition helix are quite variable, especially at the binding site. You may also find - if you load the DNA molecules, that residues that contact the phosphate backbone in general tend to be more highly conserved than residues that contact bases.
+
&nbsp;
 
 
 
 
 
 
 
 
 
 
==Chimera capabilities==
 
 
 
  
===Hydrogen bonds===
+
== Compute with structures ==
  
 +
{{Vspace}}
  
===Secondary Structure===
+
To practice actual computations with structures we'll use the Grant lab's bio3d package in '''R'''.
  
 +
{{task|1 =
  
===Mutations===
+
* Open an RStudio session, and load the BCH441 project.
Minimal changes to structure models can be done directly in Chimera. This illustrates the principle of full-scale modeling quite nicely. For an example, let us consider the residue <code>A&nbsp;42</code> of the 1BM8 structure. It is oriented towards the core of the protein, but most other Mbp1 orthologs have a larger amino acid in this position, <code>V</code>, or even <code>I</code>.
+
* Bring code and data resources up to date:
 +
** '''pull''' the most recent version of the project from GitHub
 +
** type <code>init()</code> to load the most recent files and functions.
 +
* Study and work through the code in the <code>BCH441_A05.R</code> script.
 +
* There are a number of questions in the code, it would be good if you don't gloss over them but try to answer them for yourself. Especially the questions about the final histogram: without interpretation, without learning something interesting about biology from the plot, all this is just Cargo Cult.
  
{{task|1=
 
# Open <code>1BM8</code> in Chimera, hide the ribbons and show all atoms as a stick model.
 
# Color the protein white.
 
# Open the sequence window and select <code>A&nbsp;42</code>. Color it red. Choose '''Actions&nbsp;&rarr;&nbsp;Set pivot'''. Then study how nicely the alanine sidechain fits into the cavity formed by its surrounding residues.
 
# To emphasize this better, hide the solvent molecules and select only the protein atoms. Display them as a '''sphere''' model to better appreciate the packing, i.e. the Van der Waals contacts we discussed in class. Use the '''Favorites&nbsp;&rarr;&nbsp;Side view''' panel to move the clipping plane and see a section through the protein. Study the packing, in particular, note that the additional methyl groups of a valine or isoleucine would not have enough space in the structure. Then restore the clipping planes so you can see the whole molecule.
 
# Let's simplify the view: choose '''Actions &rarr; Atoms/Bonds &rarr; backbone&nbsp;only &rarr; chain&nbsp;trace'''. Then select <code>A&nbsp;42</code> again in the sequence window and choose '''Actions &rarr; Atoms/Bonds &rarr; show'''.
 
# Add the surrounding residues: choose '''Select &rarr; Zone...'''. In the window, see that the box is checked that selects all atoms at a distance of less than 5&Aring; to the current selection, and check the lower box to select the whole residue of any atom that matches the distance cutoff criterion. Click '''OK''' and choose '''Actions &rarr; Atoms/Bonds &rarr; show'''.
 
#Select <code>A&nbsp;42</code> again: '''left-click''' (control click) on any atom of the alanine to select the atom, then '''up-arrow''' to select the entire residue. Now let's mutate this residue to isoleucine.
 
#Choose '''Tools &rarr; Structure&nbsp;Editing &rarr; Rotamers''' and select <code>ILE</code> as the rotamer type. Click '''OK''', a window will pop up that shows you the possible rotamers for isoleucine together with their database-derived probabilities; you can select them in the window and cycle through them with your arrow keys. But note that the probabilities are '''very''' different - and thus show you high-energy and low-energy rotamers to choose from. Therefore, unless you have compelling reasons to do otherwise, try to find the highest-probability rotamer that may fit. This is where your stereo viewing practice becomes important, if not essential. It is really, really hard to do this reasonably in a 2D image! It becomes quite obvious in 3D. Btw: I find such "quantitative" work - where the real distances are important - easier in '''orthographic''' than in '''perspective''' view (cf. the '''Camera''' panel).
 
#I find that the first rotamer is actually not such a bad fit. The <code>CD</code> atom comes close to the sidechains of <code>I&nbsp;25</code> and <code>L&nbsp;96</code>. But we can assume that these are somewhat mobile and can accommodate a denser packing, because - as you can easily verify in your Jalview alignment - it is '''NOT''' the case that sequences that have <code>I&nbsp;42</code>, have a smaller residue in position <code>25</code> and/or <code>96</code>. So let's accept the most frequent <code>ILE</code> rotamer by selecting it in the rotamer window and clicking '''OK''' (while '''existing side chain(s): replace''' is selected).
 
#Done.
 
 
}}
 
}}
  
If you want to go over this in more detail, check the video tutorial on YouTube published by the NIAID bioinformatics group [http://www.youtube.com/watch?v=bcXMexN6hjY '''here''']. I would also encourage you to go over [http://www.youtube.com/watch?v=eJkrvr-xeXY '''Part 2 of the video tutorial'''] that discusses how to check for and resolve (by energy minimization) steric clashes. But do remember that it is not clear whether energy minimization will make your structure more correct in the sense of a smaller overall RMSD with the real, mutated protein.
+
{{Vspace}}
  
What we have done here with one residue is exactly the way homology modeling works with entire sequences. Let's now build a homology model for YFO Mbp1.
+
;That is all;
  
 +
{{Vspace}}
  
===Scripting and Programming===
+
== Links and resources ==
  
  
(Code generation with '''R'''?)
+
* [[UCSF Chimera|'''Chimera page''']]
 +
* [https://www.cgl.ucsf.edu/chimera/current/docs/UsersGuide/framecontrib.html Chimera Tools Index]
 +
* [[Stereo Vision|'''Stereo vision tutorial''']]
  
 +
*[http://www.rcsb.org/pdb/static.do?p=software/software_links/molecular_graphics.html Molecular Graphics Software Links]&ndash; a collection of links at the PDB.
  
  
== Links and resources ==
+
{{#pmid: 10747782}}
{{#pmid: 10679470}}
 
{{#pmid: 15808743}}
 
  
<!-- {{#pmid: 19957275}} -->
 
 
<!-- {{WWW|WWW_GMOD}} -->
 
<!-- {{WWW|WWW_GMOD}} -->
 
<!-- <div class="reference-box">[http://www.ncbi.nlm.nih.gov]</div> -->
 
<!-- <div class="reference-box">[http://www.ncbi.nlm.nih.gov]</div> -->

Latest revision as of 05:54, 4 December 2016

Assignment for Week 5
Structure Analysis


Note! This assignment is currently inactive. Major and minor unannounced changes may be made at any time.

 
 
Concepts and activities (and reading, if applicable) for this assignment will be topics on the upcoming quiz.


 


 

How could the search for ultimate truth have revealed so hideous and visceral-looking an object?
Max Perutz  (on his first glimpse of the Hemoglobin structure)


 

Introduction

Where is the hidden beauty in structure, and where, the "ultimate truth"? In the previous assignments we have discovered homologues of APSES domain containing proteins in all fungal species. This makes the domain an ancient protein family that had already duplicated to several paralogues at the time when the cenancestor of all fungi lived, more than 600,000,000 years ago, in the Vendian period of the Proterozoic era of Precambrian times.

In this assignment we will explore its molecular structure.


 

Molecular graphics: UCSF Chimera

To view molecular structures, we need a tool to visualize the three dimensional relationships of atoms. A molecular viewer is a program that takes 3D structure data and allows you to display and explore it. For a number of reasons, I use the UCSF Chimera viewer for this course:

  1. Chimera is free and open;
  2. It creates very appealing graphics;
  3. It is under ongoing development and is well maintained;
  4. It provides an array of useful utilities for structure analysis; and,
  5. besides an intuitive, menu driven interface, Chimera can be scripted via its command line, or even programmed via its in-built python interpreter.


Task:

  1. Access the Chimera homepage and navigate to the Download section.
  2. Find the newest version for your platform in the table and click on the file to download it.
  3. Follow the instructions to install Chimera.


Let's explore Chimera functions first with a simple small molecule:


 

Modeling small molecules

"Small" molecules are solvent, ligands, substrates, products, prosthetic groups, drugs - in short, essentially everything that is not made by DNA-, RNA-polymerases or the ribosome. Whereas the biopolymers are still front and centre in our quest to understand molecular biology, small molecules are crucial for our quest to interact with the inventory of the cell, create useful products, or advance medicine.

A number of public repositories make small-molecule information available, such as PubChem at the NCBI, the ligand collection at the PDB, the ChEBI database at the European Bioinformatics Institute, the Canadian DrugBank, or the NCI database browser at the US National Cancer Institute. One general way to export topology information from these services is to use SMILES strings—a shorthand notation for the composition and topology of chemical compounds.


Task:

  1. Access PubChem.
  2. Enter "caffeine" as a search term in the Compound tab. A number of matches to this keyword search are returned.
  3. Click on the top hit - 1,3,7-Trimethylxanthine, the Caffeine molecule. Note that the page contains among other items:
    1. A 2D structural sketch;
    2. An idealized 3D structural conformer, for which you can download coordinates in several formats;
    3. The IUPAC name: 1,3,7-trimethylpurine-2,6-dione;
    4. The CAS identifier 58-08-2 which is a unique identifier and can be used as a cross-reference ID;
    5. The SMILES string CN1C=NC2=C1C(=O)N(C(=O)N2C)C;
    6. ... and much more.
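
To get a feeling for what a SMILES string encodes, here is a minimal Python sketch that counts the heavy atoms in the caffeine SMILES string listed above. The function name `heavy_atom_counts` is made up for this illustration, and the tokenizer is a deliberate simplification: it only recognizes the unbracketed "organic subset" symbols and ignores bracket atoms, isotopes and charges. Hydrogens are implicit in SMILES and are therefore not counted.

```python
import re
from collections import Counter

# SMILES string for caffeine, copied from the PubChem entry described above.
CAFFEINE = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"


def heavy_atom_counts(smiles):
    """Count element symbols of the SMILES organic subset.

    Simplification (assumption of this sketch): bracket atoms such as
    [NH4+], isotopes and charges are not handled.
    """
    # Two-letter symbols come first so "Cl" is not read as "C" plus "l".
    # Lowercase symbols denote aromatic atoms and are counted as well.
    tokens = re.findall(r"Cl|Br|[BCNOPSFI]|[bcnops]", smiles)
    return Counter(token.capitalize() for token in tokens)


counts = heavy_atom_counts(CAFFEINE)
print(dict(counts))
```

The counts of carbon, nitrogen and oxygen can be compared with the molecular formula of caffeine given on the PubChem page.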


That's great, but let's sketch our own version of caffeine. Several versions of Peter Ertl's Java Molecular Editor (JME) are offered online; PubChem offers this functionality via its Sketcher tool.

Task:

  1. Return to the PubChem homepage.
  2. Follow the link to Structure search (in the right hand menu).
  3. Click on the 3D conformer tab and on the Launch button to launch the molecular editor in its own window.
  4. Sketch the structure of caffeine. I find the editor quite intuitive but clicking on the Help button will give you a quick, structured overview. Make sure you define your double-bonds correctly.
  5. Export the SMILES string of your compound to your project folder.


Translating SMILES to structure

Chimera can translate SMILES strings to coordinates[1].

Task:

  1. Open Chimera.
  2. Select Tools → Structure Editing → Build Structure.
  3. In the Build Structure window, select the SMILES string button, paste the string from your file, and click Apply.
  4. The caffeine molecule will be generated and visualized in the graphics window. This is a "stick" representation.
  5. You can rotate it with your mouse, <command> drag to scale, <shift> drag to translate.
  6. Use the Actions → Atoms/Bonds → ball & stick or sphere menu items to change appearance.
  7. Use the Actions → Color → by element menu to change colors.
  8. Change the display back to stick and use Actions → Surface → show to add a solvent accessible surface. Choosing this command triggers the calculation of the surface, which is then available as an individually selectable object. However, with default parameters the surface appears a bit rough for this small molecule.
  9. Change the parameters of this solvent accessible surface:
    1. Select the surface with <control><click> (<control><left mouse button> on windows). A green contour line appears around selected items – it surrounds the surface in this case.
    2. Open the selection inspector by clicking on the tiny green icon in the lower-right corner of the window (It has a magnifying glass symbol which means "inspect" for Chimera, not "search").
    3. Select Inspect ...MSMS surface and change the Vertex density value to 50.0 - hit return.
  10. By default, the surface inherits the colour of the atoms it envelops. To change the colour of the surface, use the Actions → Color → all options menu. Click the surfaces button to indicate that the color choice should be applied to the surface object (note what else you can apply color to...), then choose cornflower blue.
  11. Use the Actions → Surface → transparency → 50% menu to see atoms and bonds that are covered by the surface.
  12. To begin working with molecules in "true" 3D, choose Tools → Viewing Controls → Camera and select camera mode → wall-eye stereo. Also, use the Effects tab of the Viewing window, and check shadows off.
  13. Your structure should look about like what you see below. Save your session with the File → Save Session dialogue so you can easily recreate the scene.


Caffeine stereo.jpg

Wall-eye stereo view of the caffeine structure, surrounded by a transparent molecular surface. The image for the left eye is on the left side. For instructions on stereo-viewing, see the next section.



 

Stereo vision

A simple molecular scene like the caffeine molecule is a great way to practice viewing structures in stereo. This is a learnable skill, but it takes practice.

Task:

Access the Stereo Vision tutorial and practice viewing molecular structures in stereo.

Practice at least ...

  • two times daily,
  • for 3-5 minutes each session,

Keep up your practice throughout the course. It is a wonderful skill that will greatly support your understanding of structural molecular biology. Practice with different molecules and try out different colours and renderings.

Note: do not go through your practice sessions mechanically. If you are not making any progress with stereo vision, contact me so I can help you on the right track.


 

Global properties

In this series of tasks we will showcase some of the globally applied tools that help us study molecular structure.


 

A Ramachandran plot

Task:

  1. To reset all views and selections, choose Favorites → Model Panel. Select the 1BM8 model and click the close button to remove it.
  2. In the graphics window, click on the "lightning bolt" icon at the bottom. You should see a button labelled 1BM8 on the right. This is where you will find recent structures. Click 1BM8 to re-load it.
  3. Choose Presets → Interactive 2 (all atoms) for a detailed view.
  4. Choose Favorites → Model Panel.
  5. Look for the option Ramachandran plot... in the choices on the right.
  6. Click the button and study the result. The dots in this Ramachandran plot represent the phi-psi angle combinations for residue backbones. We see that they are well distributed; this is a high-resolution structure essentially without outliers. Clicking on a dot selects a residue in the structure viewer (selected residues have a green contour).
  7. Choose File → Fetch by ID and fetch 1L3G, an NMR structure of the Mbp1 APSES domain. Chimera loads the 19 models that comprise this structure dataset.
  8. In the Favorites → Model Panel, select 1BM8 and click on hide.
  9. Then select 1L3G and click group/ungroup to be able to address the models individually. Select any of the models individually and click again on Ramachandran plot. You will see that the points are much more dispersed, and there are a number of outliers that have comparatively high-energy conformations.
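
Each dot in a Ramachandran plot is a pair of dihedral angles: phi is defined by the backbone atoms C(i-1), N(i), CA(i), C(i), and psi by N(i), CA(i), C(i), N(i+1). As a rough sketch of the underlying geometry (this is not Chimera's own code, and the function name `dihedral` is just chosen for this illustration), a dihedral angle can be computed from four sets of coordinates in plain Python like this:

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points given as (x, y, z)."""
    b0 = _sub(p0, p1)            # from the second atom back to the first
    b1 = _sub(p2, p1)            # central bond
    b2 = _sub(p3, p2)            # from the third atom on to the fourth
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Remove the components along the central bond, i.e. project b0 and b2
    # onto the plane perpendicular to it.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not taken from 1BM8): a cis and a trans arrangement.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, trans)
```

For phi and psi you would pass the coordinates of the four backbone atoms named above, read from the ATOM records of the coordinate file.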


 

B-factors

Task:

  1. Choose Favorites → Model Panel, click/drag over the 1L3G models and click close to remove them again.
  2. To explore B-Factors in the 1BM8 model, click show to view it again.
  3. Choose Tools → Structure Analysis → Render by Attribute.
  4. Select Attributes of atoms, Model 1BM8 and Attribute: bfactor. A histogram appears with sliders that allow you to render the distribution of values found in the structure for this attribute.
  5. Let's colour the atoms by B-Factor. Click on the colours tab. A standard colouring scheme is blue - white - red, but you can move the sliders, add new thresholds, and colour them individually by clicking on the colour patch to create your own colour spectrum, e.g. from black via red to white, in a black-body spectrum. Click Apply.
  6. Choose Actions → Atoms/Bonds → stick to give the bonds more volume. You will find that the core of the protein has low temperature factors, and the surface has a number of highly mobile sidechains and loops.

1BM8 thermal stereo.jpg

Structure of the yeast transcription factor Mbp1 DNA binding domain (1BM8) coloured by B-factor (thermal factor). The protein bonds are shown in a "stick" model, coloured with a spectrum that emulates black-body radiation. Note that the interior of the protein is less mobile, some of the surface loops are highly mobile (or statically disordered, X-ray structures can't distinguish that) and the discretely bound water molecules that are visible in this high-resolution structure are generally more mobile than the residues they bind to.
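
The B-factors that Chimera renders are simply a column in the coordinate file: in the fixed-width PDB format, the temperature factor of each ATOM record sits in columns 61-66. The following Python sketch averages B-factors per residue. Both helper functions (`format_atom`, `mean_b_by_residue`) are made up for this illustration, and the example records contain illustrative values, not the actual 1BM8 data.

```python
from collections import defaultdict


def format_atom(serial, name, resname, chain, resseq,
                x, y, z, occ, bfac, element):
    """Build one fixed-width PDB ATOM record (illustrative helper)."""
    return ("ATOM  {:5d} {:<4s} {:>3s} {:1s}{:4d}    "
            "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}          {:>2s}"
            ).format(serial, name, resname, chain, resseq,
                     x, y, z, occ, bfac, element)


def mean_b_by_residue(lines):
    """Average the B-factor (columns 61-66) over the atoms of each residue."""
    values = defaultdict(list)
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        chain = line[21]               # column 22: chain ID
        resseq = int(line[22:26])      # columns 23-26: residue number
        values[(chain, resseq)].append(float(line[60:66]))
    return {key: sum(b) / len(b) for key, b in values.items()}


# Made-up records for two residues (values are NOT taken from 1BM8).
records = [
    format_atom(1, " N", "GLN", "A", 4, 0.000, 0.000, 0.000, 1.0, 50.00, "N"),
    format_atom(2, " CA", "GLN", "A", 4, 1.458, 0.000, 0.000, 1.0, 51.22, "C"),
    format_atom(3, " N", "ILE", "A", 5, 2.000, 1.300, 0.000, 1.0, 20.00, "N"),
    format_atom(4, " CA", "ILE", "A", 5, 3.400, 1.500, 0.000, 1.0, 22.00, "C"),
]
means = mean_b_by_residue(records)
for (chain, resseq), b in sorted(means.items()):
    print(chain, resseq, round(b, 2))
```

To try this on a real structure, you could pass the lines of a saved coordinate file instead of the made-up `records` list.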


 

Electrostatics

Task:

  1. To visualize the electrostatic potential of the protein, mapped on the surface, first select Presets → Interactive 2... and Actions → Color → cyan for a vividly contrasting color.
  2. A simple electrostatic potential calculation just assumes Coulomb charges. A more accurate calculation of full Poisson-Boltzmann potentials is also available. Select Tools → Electrostatic/Binding Analysis → Coulombic Surface Coloring.
  3. Make sure the surface object is selected in the form (it should be selected by default since there is only one surface), keep the default parameters and click Apply.
  4. Use Actions → Surface → Transparency → 30% to make the protein backbone somewhat visible.
  5. Open the Tools → Viewing Controls → Lighting window and set Intensity from two-point to ambient. This reduces shadowing and reflections on the surface and thus emphasizes the color values - here our focus is not on shape, but on property.
  6. Use the Effects tab to turn shadows off and depth-cueing and silhouettes on. This recreates visual cues of depth which compensate for the loss of shape information by using a flat lighting model.

1BM8 coulomb stereo.jpg

Coulomb (electrostatic) potential mapped to the solvent accessible surface of the yeast transcription factor Mbp1 DNA binding domain (1BM8). The protein backbone is visible through the transparent surface as a cartoon model, note the helix at the bottom of the structure. This helix has been suggested to play a role in forming the domain's DNA binding site and the positive (blue) electrostatic potential of the region is consistent with binding the negatively charged phosphate backbone of DNA. The other side of the domain has a negative (red) charge excess, which balances the molecule's electric charge overall, but also guides the protein-ligand interaction and supports faster on-rates.
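
What "just assumes Coulomb charges" means can be sketched in a few lines of Python: the potential at a point is the sum of q/(εd) over all point charges. This is an illustration only, not Chimera's implementation. The function name, the toy charges, and the default parameters are assumptions of this sketch; in particular, the dielectric of 4 with distance dependence (ε = 4d) is what I understand Chimera's defaults to be, so check the dialog before relying on it. The factor 332.06 is a commonly used constant that converts to kcal/(mol·e) when distances are in Ångström.

```python
import math

# Commonly used conversion factor: kcal * Angstrom / (mol * e^2).
COULOMB_CONSTANT = 332.06


def coulomb_potential(point, charges, dielectric=4.0, distance_dependent=True):
    """Coulomb potential at `point` in kcal/(mol*e).

    `charges` is a list of ((x, y, z), q) tuples with q in units of e.
    With distance_dependent=True the dielectric grows linearly with the
    distance (eps = dielectric * d); this default is an assumption.
    """
    phi = 0.0
    for position, q in charges:
        d = math.dist(point, position)
        eps = dielectric * d if distance_dependent else dielectric
        phi += COULOMB_CONSTANT * q / (eps * d)
    return phi


# Toy example (hypothetical coordinates): one positive and one negative
# charge, probed at a point close to the positive one.
toy_charges = [((0.0, 0.0, 0.0), +1.0), ((10.0, 0.0, 0.0), -1.0)]
print(coulomb_potential((2.0, 0.0, 0.0), toy_charges))
```

The sign of the result corresponds to the blue (positive) or red (negative) colouring of the surface in the figure.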


 

Hydrogen bonds

Task:

  1. Hydrogen bonds encode the basic folding patterns of the protein. To visualize H-bonds select Presets → Publication 1... and Actions → Color → by element.
  2. Use Tools → Structure Analysis → FindHBond and Apply default parameters.
  3. To emphasize the role of H-bonds in determining the architecture of the protein, select Select → Structure → backbone → full and then Select → Invert (all models). Now Actions → Atoms/bonds → hide will show only the backbone with its H-bonds.


1BM8 hbond stereo.jpg

Hydrogen bonds shown for the peptide backbone of the yeast transcription factor Mbp1 DNA binding domain (1BM8). This view emphasizes the interactions of secondary structure elements that govern the folding topology of the domain.
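
What does it mean to "find" a hydrogen bond from coordinates? A minimal sketch of a purely geometric test is shown below in Python. Note that this is a deliberately simplified criterion and not the one FindHBond applies: the function names are hypothetical, and the thresholds (donor-acceptor distance of at most 3.5 Å, donor-H-acceptor angle of at least 120°) are assumed, commonly quoted rule-of-thumb values. The coordinates are invented.

```python
import math

def angle_deg(a, b, c):
    """Angle at b (in degrees) formed by the points a-b-c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def is_hbond(donor, hydrogen, acceptor, max_dist=3.5, min_angle=120.0):
    """True if the donor-acceptor distance and D-H...A angle pass the
    (assumed) thresholds."""
    return (math.dist(donor, acceptor) <= max_dist
            and angle_deg(donor, hydrogen, acceptor) >= min_angle)

# Invented coordinates (Å): donor N, its hydrogen, and three candidate acceptors.
n_atom = (0.0, 0.0, 0.0)
h_atom = (1.0, 0.0, 0.0)
linear_o = (2.9, 0.0, 0.0)   # close and in line with N-H
bent_o = (1.0, 1.9, 0.0)     # close, but at 90 degrees to N-H
far_o = (5.0, 0.0, 0.0)      # in line, but too far away

print(is_hbond(n_atom, h_atom, linear_o))
print(is_hbond(n_atom, h_atom, bent_o))
print(is_hbond(n_atom, h_atom, far_o))
```

Only the first candidate satisfies both the distance and the angle condition.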


Chimera sequence interface

In this task we will explore the sequence interface of Chimera, use it to select specific parts of a molecule, and colour specific regions (or residues) of a molecule separately.

 

Task:

  1. Display the protein in Presets → Interactive 1 mode and familiarize yourself with its topology of helices and strands.
  2. Now turn hydrogen bonds off. The menu commands of Chimera all have a command line equivalent: open the command line by clicking on the "computer" icon in the upper left corner of the viewer window, then type "~hbonds". A "~" prefix reverses the effect of the corresponding command.
  3. Use Tools → Depiction → Rainbow to color the chain from blue to red. (You need to change the colour patches by clicking on them to open the colour editor. Choose an HSL colour model, use Saturation and Lightness 0.5 to keep the colours to somewhat subdued hues, then use the slider to choose appropriate hue values.) Click Apply.
  4. Open the sequence tool: Tools → Sequence → Sequence. By default, coloured rectangles overlay the secondary structure elements of the sequence.
  5. Hover the mouse over some residues and note that the sequence number and chain is shown at the bottom of the window.
  6. Click/drag one residue to select it. (A simple click won't work; you need to drag a little bit for the selection to catch on.) Note that the residue gets a green overlay in the sequence window, and it also gets selected with a green border in the graphics window.
  7. At the bottom of the sequence window, there are instructions on how to select (multiple) regions. Clear the selection by <control> clicking into an empty spot of the viewer. Now select the region that encompasses the residues that have been reported to form the DNA binding subdomain: KRTRILEKEVLKETHEKVQGGFGKYQ (Taylor 2000). Show the side chains of these residues by clicking on the little green inspector icon on the viewer window, inspecting Atom and choosing displayed: true, and inspecting Bond and setting the stick radius to 0.4.
  8. Hide the hydrogen atoms: select the element H in the Chemistry option of the Select menu, then use the Actions menu to hide them. Then use the effects pane of the Depiction menu to add a contour.
  9. Finally, give the scene a grey gradient background via the Actions → Color → all options... menu.
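
The colour settings in step 3 can be reproduced numerically. The following Python sketch generates a blue-to-red ramp by stepping the hue from 240° to 0° at fixed saturation and lightness of 0.5. It uses the standard library's `colorsys` module; the function name `rainbow_ramp` is hypothetical, and whether Chimera's Rainbow tool interpolates hues in exactly this way is an assumption.

```python
import colorsys

def rainbow_ramp(n, saturation=0.5, lightness=0.5):
    """Return n (r, g, b) tuples running from blue (hue 240°) to red (hue 0°)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    colours = []
    for i in range(n):
        # Fraction of the way along the chain: 0.0 at the start, 1.0 at the end.
        frac = i / (n - 1) if n > 1 else 0.0
        hue = 240.0 * (1.0 - frac) / 360.0
        # Note the argument order: colorsys uses H, L, S (not H, S, L).
        colours.append(colorsys.hls_to_rgb(hue, lightness, saturation))
    return colours

# Five evenly spaced colours: blue, cyan, green, yellow, red (all subdued).
ramp = rainbow_ramp(5)
for r, g, b in ramp:
    print(f"{r:.2f} {g:.2f} {b:.2f}")
```

With saturation and lightness both at 0.5, every RGB component stays between 0.25 and 0.75, which is why the hues look subdued rather than fully saturated.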


1BM8 DNAbindingRegion stereo.jpg

The DNA binding region of Mbp1 according to NMR measurements of DNA contact by Taylor et al. (2000). The backbone of 1BM8 is shown with a colour ramp from blue (N-terminus) to red (C-terminus). The side chains of the region 50-74 are shown colored by element.
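
Selecting a region by its sequence amounts to mapping a substring onto residue numbers. A minimal Python sketch of that mapping follows. The function name `find_region` is hypothetical, the toy sequence is a placeholder and NOT the real 1BM8 sequence, and the sketch assumes contiguous residue numbering without gaps or insertion codes.

```python
def find_region(sequence, motif, first_residue_number=1):
    """Return the (start, end) residue numbers, inclusive, of the first
    occurrence of motif in sequence, or None if the motif is absent."""
    idx = sequence.find(motif)
    if idx < 0:
        return None
    start = first_residue_number + idx
    return start, start + len(motif) - 1

# The region reported by Taylor et al. (2000), as given in the task above.
MOTIF = "KRTRILEKEVLKETHEKVQGGFGKYQ"

# Hypothetical toy sequence: placeholder alanines flank the motif.
toy_sequence = "A" * 10 + MOTIF + "A" * 5

region = find_region(toy_sequence, MOTIF, first_residue_number=1)
print(region)
print(find_region(toy_sequence, "WWW"))
```

With a real structure, `first_residue_number` would have to be the number of the first residue actually present in the coordinate file, which is not necessarily 1.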



 

Compute with structures

 

To practice actual computations with structures we'll use the Grant lab's bio3d package in R.

Task:

  • Open an RStudio session, and load the BCH441 project.
  • Bring code and data resources up to date:
    • pull the most recent version of the project from GitHub
    • type init() to load the most recent files and functions.
  • Study and work through the code in the BCH441_A05.R script.
  • There are a number of questions in the code. Don't gloss over them; try to answer them for yourself. This applies especially to the questions about the final histogram: without interpretation, without learning something interesting about biology from the plot, all this is just Cargo Cult.
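
The course script itself uses R and bio3d. Purely as a language-independent illustration of what "computing with structures" involves, here is a minimal Python sketch that reads C-alpha coordinates from fixed-width PDB ATOM records and measures a distance. The helper names are hypothetical and the coordinates are invented toy values, not taken from 1BM8; the toy records are built with a format string so that the columns line up.

```python
import math

def atom_record(serial, name, res_name, chain, res_seq, x, y, z):
    """Format one fixed-width PDB ATOM record (coordinates in columns 31-54)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

def parse_ca_atoms(pdb_text):
    """Return {(chain, residue number): (x, y, z)} for all CA atoms."""
    coords = {}
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name, column 22 the chain,
        # columns 23-26 the residue number (1-based column counts).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            coords[key] = (float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54]))
    return coords

# Invented toy records: three CA atoms and one backbone N that is skipped.
toy_pdb = "\n".join([
    atom_record(1, "N", "ALA", "A", 1, -1.200, 0.500, 0.000),
    atom_record(2, "CA", "ALA", "A", 1, 0.000, 0.000, 0.000),
    atom_record(3, "CA", "GLY", "A", 2, 3.800, 0.000, 0.000),
    atom_record(4, "CA", "SER", "A", 3, 3.800, 3.800, 0.000),
])

ca = parse_ca_atoms(toy_pdb)
d12 = math.dist(ca[("A", 1)], ca[("A", 2)])
d13 = math.dist(ca[("A", 1)], ca[("A", 3)])
print(f"{d12:.3f} {d13:.3f}")
```

A table of all pairwise distances computed this way is the raw material for a histogram; the interpretation is the part that no code will do for you.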


 
That is all.


 

Links and resources


Taylor et al. (2000) Characterization of the DNA-binding domains from the yeast cell-cycle transcription factors Mbp1 and Swi4. Biochemistry 39:3943-54. (pmid: 10747782)

The minimal DNA-binding domains of the Saccharomyces cerevisiae transcription factors Mbp1 and Swi4 have been identified and their DNA binding properties have been investigated by a combination of methods. An approximately 100 residue region of sequence homology at the N-termini of Mbp1 and Swi4 is necessary but not sufficient for full DNA binding activity. Unexpectedly, nonconserved residues C-terminal to the core domain are essential for DNA binding. Proteolysis of Mbp1 and Swi4 DNA-protein complexes has revealed the extent of these sequences, and C-terminally extended molecules with substantially enhanced DNA binding activity compared to the core domains alone have been produced. The extended Mbp1 and Swi4 proteins bind to their cognate sites with similar affinity [K(A) approximately (1-4) x 10(6) M(-)(1)] and with a 1:1 stoichiometry. However, alanine substitution of two lysine residues (116 and 122) within the C-terminal extension (tail) of Mbp1 considerably reduces the apparent affinity for an MCB (MluI cell-cycle box) containing oligonucleotide. Both Mbp1 and Swi4 are specific for their cognate sites with respect to nonspecific DNA but exhibit similar affinities for the SCB (Swi4/Swi6 cell-cycle box) and MCB consensus elements. Circular dichroism and (1)H NMR spectroscopy reveal that complex formation results in substantial perturbations of base stacking interactions upon DNA binding. These are localized to a central 5'-d(C-A/G-CG)-3' region common to both MCB and SCB sequences consistent with the observed pattern of specificity. Changes in the backbone amide proton and nitrogen chemical shifts upon DNA binding have enabled us to experimentally define a DNA-binding surface on the core N-terminal domain of Mbp1 that is associated with a putative winged helix-turn-helix motif. Furthermore, significant chemical shift differences occur within the C-terminal tail of Mbp1, supporting the notion of two structurally distinct DNA-binding regions within these proteins.


 

 


Footnotes and references

  1. There are several online servers that translate SMILES strings to idealized structures, see e.g. the online SMILES translation service at the NCI.


 

Ask, if things don't work for you!

If anything about the assignment is not clear to you, please ask on the mailing list. You can be certain that others will have had similar problems. Success comes from joining the conversation.


