Difference between revisions of "BIO Assignment Week 5"

From "A B C"
 
<div class="b1">
Assignment for Week 5<br />
<span style="font-size: 70%">Sequence alignment </span>
</div>

<div class="b1">
Assignment for Week 5<br />
<span style="font-size: 70%">Structure Analysis</span>
</div>

<table style="width:100%;"><tr>
<td style="height:30px; vertical-align:middle; text-align:left; font-size:80%;">[[BIO_Assignment_Week_4|&lt;&nbsp;Assignment&nbsp;4]]</td>
<td style="height:30px; vertical-align:middle; text-align:right; font-size:80%;">[[BIO_Assignment_Week_6|Assignment&nbsp;6&nbsp;&gt;]]</td>
</tr></table>
  
 
{{Template:Inactive}}

<small>Concepts and activities (and reading, if applicable) for this assignment will be topics on the upcoming quiz.</small>

__TOC__

&nbsp;
<div style="padding: 2px; background: #F0F1F7;  border:solid 1px #AAAAAA; font-size:125%;color:#444444">
&nbsp;<br>
;How could the search for ultimate truth have revealed so hideous and visceral-looking an object?
:''[https://en.wikipedia.org/wiki/Max_Perutz Max Perutz]&nbsp;&nbsp;<small>(on his first glimpse of the Hemoglobin structure)</small>''
</div>

==Introduction==

In this assignment we will perform an optimal global and local sequence alignment, and use '''R''' to plot the alignment quality as a colored bar graph.

=== Optimal sequence alignments ===

Online programs for optimal sequence alignment are part of the EMBOSS tools. The programs take FASTA files as input.
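
Since the EMBOSS programs take FASTA files as input, it helps to know exactly what that format contains: each record starts with a header line beginning with <code>&gt;</code>, followed by one or more lines of sequence. The base-'''R''' sketch below reads such lines into a named character vector. <code>readFastaLines()</code> is a hypothetical helper written only for illustration (the course code uses <code>seqinr::read.fasta()</code> and <code>Biostrings::readAAStringSet()</code> for this), and the two toy records are invented.

<source lang="R">
# readFastaLines(): hypothetical helper, for illustration only.
# Splits FASTA-formatted text into records and returns a named
# character vector: names are the header lines (without ">"),
# values are the concatenated sequence lines of each record.
readFastaLines <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]              # ignore empty lines
  isHeader <- startsWith(lines, ">")
  recordId <- cumsum(isHeader)               # record number of every line
  nRec <- sum(isHeader)
  # group sequence lines by record; a record without sequence lines gives ""
  groups <- split(lines[!isHeader],
                  factor(recordId[!isHeader], levels = seq_len(nRec)))
  seqs <- vapply(groups, paste, character(1), collapse = "")
  names(seqs) <- sub("^>\\s*", "", lines[isHeader])
  seqs
}

# Two invented toy records; a real file could be read with
# readFastaLines(readLines("mbp1_sacce.fa"))
fa <- c(">toy1 first example",
        "MKVLS",
        "TAR",
        "",
        ">toy2",
        "MVQLSAR")
toySeqs <- readFastaLines(fa)
toySeqs
nchar(toySeqs)   # sequence lengths
</source>

Sequence lines are simply concatenated, so a sequence may be wrapped over any number of lines in the file.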
;Local optimal SEQUENCE alignment "water"

{{task|1=
# Retrieve the FASTA file for the YFO Mbp1 protein and for [http://www.ncbi.nlm.nih.gov/protein/NP_010227?report=fasta&log$=seqview&format=text ''Saccharomyces''].
# Save the files as text files to your computer (if you haven't done so already). You could give them an extension of <code>.fa</code>.
# Access the [http://emboss.bioinformatics.nl/ EMBOSS Explorer site] (if you haven't done so yet, you might want to bookmark it).
# Look for '''ALIGNMENT LOCAL''', click on '''water''', paste your FASTA sequences and run the program with default parameters.
# Study the results. You will probably find that the alignment extends over most of the protein, but does not include the termini.
# Considering the sequence identity cutoff we discussed in class (25% over the length of a domain), do you believe that the APSES domains are homologous?
# Change the '''Gap opening''' and '''Gap extension''' parameters to high values (e.g. 30 and 5). Then run the alignment again.
# Note what is different.
# You could try getting only an alignment for the ankyrin domains, by deleting the approximate region of the APSES domains from your input.
}}
 
 
 
 
 
;Global optimal SEQUENCE alignment "needle"

{{task|1=
# Look for '''ALIGNMENT GLOBAL''', click on '''needle''', paste your FASTA sequences and run the program with default parameters.
# Study the results. You will find that the alignment extends over the entire protein, likely with long indels at the termini.
# Change the '''Output alignment format''' to '''FASTA pairwise simple''' to retrieve the aligned FASTA files with indels.
# Copy the aligned sequences (with indels) and save them to your computer. You could give them an extension of <code>.fal</code> to remind you that they are aligned FASTA sequences.
}}
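
EMBOSS '''water''' and '''needle''' compute optimal local (Smith-Waterman) and global (Needleman-Wunsch) alignments by dynamic programming. The sketch below contrasts the two recurrences in base '''R''' on a toy example. It is a simplification, not the EMBOSS code: it assumes a single linear gap penalty of -4 per position instead of the separate gap opening and gap extension penalties you set above, and it returns only the optimal score, not the alignment itself. <code>alignScore()</code> is a hypothetical helper, and the scoring matrix is the A/R/N/D corner of BLOSUM62.

<source lang="R">
# alignScore(): hypothetical teaching helper, not the EMBOSS implementation.
# Optimal pairwise alignment score by dynamic programming with a
# linear gap penalty (every gap position costs 'gap').
alignScore <- function(x, y, S, gap = -4, type = c("global", "local")) {
  type <- match.arg(type)
  x <- strsplit(x, "")[[1]]
  y <- strsplit(y, "")[[1]]
  n <- length(x)
  m <- length(y)
  # D[i+1, j+1] holds the best score for aligning x[1..i] with y[1..j]
  D <- matrix(0, nrow = n + 1, ncol = m + 1)
  if (type == "global") {
    D[, 1] <- gap * (0:n)    # global: leading gaps are penalized
    D[1, ] <- gap * (0:m)
  }                          # local: first row and column stay 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      best <- max(D[i, j] + S[x[i], y[j]],     # align x[i] with y[j]
                  D[i, j + 1] + gap,           # x[i] against a gap
                  D[i + 1, j] + gap)           # y[j] against a gap
      # a local alignment can restart anywhere, so scores never drop below 0
      D[i + 1, j + 1] <- if (type == "local") max(0, best) else best
    }
  }
  # global: score of the full-length alignment; local: best cell anywhere
  if (type == "global") D[n + 1, m + 1] else max(D)
}

# The A/R/N/D corner of the BLOSUM62 matrix
aa <- c("A", "R", "N", "D")
blosumARND <- matrix(c( 4, -1, -2, -2,
                       -1,  5,  0, -2,
                       -2,  0,  6,  1,
                       -2, -2,  1,  6),
                     nrow = 4, byrow = TRUE, dimnames = list(aa, aa))

alignScore("ARND", "ND", blosumARND, type = "global")  # 4: ND/ND (12) minus two gaps (2 * 4)
alignScore("ARND", "ND", blosumARND, type = "local")   # 12: only the ND/ND segment is kept
</source>

The global score is lower because the global alignment must account for the unmatched residues A and R. This is why needle reports indels at the termini, while water simply leaves the termini out of the alignment.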
 
  
  
 
&nbsp;
  
==Introduction==

Where is the hidden beauty in structure, and where, the "ultimate truth"? In the previous assignments we have discovered homologues of APSES-domain-containing proteins in all fungal species. This makes the domain an ancient protein family that had already duplicated into several paralogues at the time when the cenancestor of all fungi lived, more than 600,000,000 years ago, in the [http://www.ucmp.berkeley.edu/fungi/fungifr.html Vendian period] of the Proterozoic era of Precambrian times.

In this assignment we will explore the molecular structure of the APSES domain.

&nbsp;

== The Mutation Data Matrix ==

The NCBI makes its alignment matrices available by ftp at ftp://ftp.ncbi.nih.gov/blast/matrices/BLOSUM62 . Access that site and download the <code>BLOSUM62</code> matrix to your computer. You could give it a filename of <code>BLOSUM62.mdm</code>.

It should look like this:

<source lang="text">
#  Matrix made by matblas from blosum62.iij
#  * column uses minimum score
#  BLOSUM Clustered Scoring Matrix in 1/2 Bit Units
#  Blocks Database = /data/blocks_5.0/blocks.dat
#  Cluster Percentage: >= 62
#  Entropy =  0.6979, Expected =  -0.5209
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0 -2 -1  0 -4
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3 -1  0 -1 -4
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3  3  0 -1 -4
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3  4  1 -1 -4
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 -3 -3 -2 -4
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2  0  3 -1 -4
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3 -1 -2 -1 -4
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3  0  0 -1 -4
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3 -3 -3 -1 -4
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1 -4 -3 -1 -4
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2  0  1 -1 -4
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1 -3 -1 -1 -4
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1 -3 -3 -1 -4
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2 -2 -1 -2 -4
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2  0  0  0 -4
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0 -1 -1  0 -4
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3 -4 -3 -2 -4
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1 -3 -2 -1 -4
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4 -3 -2 -1 -4
B -2 -1  3  4 -3  0  1 -1  0 -3 -4  0 -3 -3 -2  0 -1 -4 -3 -3  4  1 -1 -4
Z -1  0  0  1 -3  3  4 -2  0 -3 -3  1 -1 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
X  0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1 -1 -1 -4
* -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4  1
</source>

{{task|
* Study this and make sure you understand what this table is, how it can be used, and what a reasonable range of values is for identities, and for pairscores of non-identical, similar and dissimilar residues. Ask on the mailing list in case you have questions.
}}

&nbsp;
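
The header of the matrix file states that the scores are in 1/2-bit units: each entry is approximately 2 &times; log<sub>2</sub> of the ratio between how often a residue pair is observed in aligned blocks and how often it would be expected by chance (approximately, because the scores are rounded to integers). The base-'''R''' sketch below uses only the A, R, N and D rows and columns copied from the BLOSUM62 table above to show how such a table can be read and interpreted; <code>B62</code> and <code>odds</code> are illustrative names.

<source lang="R">
# Parse an excerpt (rows/columns A, R, N, D) of the BLOSUM62 table.
# read.table() takes the row names from the first column because the
# header line has one field fewer than the data lines.
txt <- "
   A  R  N  D
A  4 -1 -2 -2
R -1  5  0 -2
N -2  0  6  1
D -2 -2  1  6
"
B62 <- as.matrix(read.table(text = txt, header = TRUE))

B62["N", "D"]        # pairscore for Asn/Asp
isSymmetric(B62)     # pairscores do not depend on the order of the pair
diag(B62)            # identity scores are all positive

# Scores in 1/2-bit units: score ~ 2 * log2(odds ratio),
# so the odds ratio is approximately 2^(score / 2)
odds <- 2^(B62 / 2)
odds["A", "A"]       # A/A pairs: roughly 4 times more frequent than by chance
odds["R", "D"]       # a negative score means an odds ratio below 1
</source>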
==Molecular graphics: UCSF Chimera==

To view molecular structures, we need a tool to visualize the three-dimensional relationships of atoms. A ''molecular viewer'' is a program that takes 3D structure data and allows you to display and explore it. For a number of reasons, I use the UCSF Chimera viewer for this course:

# Chimera is free and open;
# It creates very appealing graphics;
# It is under ongoing development and is well maintained;
# It provides an array of useful utilities for structure analysis; and,
# besides an intuitive, menu-driven interface, Chimera can be scripted via its command line, or even programmed via its built-in Python interpreter.

&nbsp;

== The DNA binding site ==

Now that you know how YFO Mbp1 aligns with yeast Mbp1, you can evaluate functional conservation in these homologous proteins. You probably already downloaded the two Biochemistry papers by Taylor et al. (2000) and by Deleeuw et al. (2008) that we encountered in Assignment 2. These discuss the residues involved in DNA binding<ref>[http://www.ncbi.nlm.nih.gov/pubmed/10747782 Taylor ''et al.'' (2000) ''Biochemistry'' '''39''': 3943-3954] and [http://www.ncbi.nlm.nih.gov/pubmed/18491920 Deleeuw ''et al.'' (2008) ''Biochemistry'' '''47''': 6378-6385]</ref>. In particular, residues 50-74 have been proposed to comprise the DNA recognition domain.

{{task|
# Using the APSES domain alignment you have just constructed, find the YFO Mbp1 residues that correspond to the range 50-74 in yeast.
# Note whether the sequences are especially highly conserved in this region.
# Using VMD, look at the region. Use the sequence viewer '''to make sure''' that the sequence numbering in the paper and in the PDB file is the same (they are often not identical!). Then select the residues (the proposed recognition domain) and color them differently for emphasis. Study this in stereo to get a sense of the spatial relationships. Check where the conserved residues are.
# A good representation is '''Licorice''', but other representations that include sidechains will also serve well. You may want to reduce the thickness of bonds to declutter the image a bit.
# Calculate a solvent accessible surface of the protein in a separate representation and make it transparent.
# You could combine three representations: (1) the backbone (in '''new cartoon'''), (2) the sidechains of residues that presumably contact DNA, distinctly colored, and (3) a transparent surface of the entire protein. This image should show whether residues annotated as DNA binding form a contiguous binding interface. '''Note:''' VMD makes smart use of the GPU capabilities of your computer. Try setting the VMD graphics parameters to visualize with '''GLSL''': your transparent surface may look '''much''' better.
}}
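
Finding the YFO residues that correspond to yeast residues 50-74 means reading across the alignment column by column, counting residues in each sequence while skipping gap characters. The base-'''R''' sketch below does this bookkeeping, assuming both aligned sequences are available as gapped strings of equal length (for example, copied from your <code>.fal</code> file). <code>mapAlignedPositions()</code> is a hypothetical helper and the toy sequences are invented. The residue numbers it returns count from the first residue of each input sequence, which is not necessarily the numbering used in a paper or in a PDB file.

<source lang="R">
# mapAlignedPositions(): hypothetical helper, for illustration only.
# For every residue of the first aligned sequence, return the number of
# the residue it is aligned to in the second sequence (NA if it faces a gap).
# Names of the result are the residue numbers in the first sequence.
mapAlignedPositions <- function(aliA, aliB, gap = "-") {
  a <- strsplit(aliA, "")[[1]]
  b <- strsplit(aliB, "")[[1]]
  stopifnot(length(a) == length(b))      # aligned sequences have equal length
  numA <- cumsum(a != gap)               # residue number in A at each column
  numB <- cumsum(b != gap)               # residue number in B at each column
  isResA <- a != gap                     # columns that hold a residue of A
  mapped <- ifelse(b[isResA] != gap, numB[isResA], NA)
  names(mapped) <- numA[isResA]
  mapped
}

# Invented toy alignment (not Mbp1)
aliA <- "MKV-LSTAR"
aliB <- "M-VQLS-AR"
m <- mapAlignedPositions(aliA, aliB)
m
m[as.character(4:6)]   # partners of residues 4 to 6 of the first sequence
</source>

For the actual task you would pass the aligned yeast sequence first and read off the entries named 50 to 74; residues that map to <code>NA</code> face an indel in YFO.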
 
  
  
DNA binding interfaces are expected to comprise a number of positively charged amino acids that might form salt bridges with the phosphate backbone.

{{task|
* Study and consider whether this is the case here, and which residues might be included.
}}

&nbsp;

== R code: coloring the alignment by quality ==

{{task|1=
* Study this code carefully, execute it section by section, and make sure you understand all of it. Ask on the list if anything is not clear.

<source lang="R">
# BiostringsExample.R
# Short tutorial on sequence alignment with the Biostrings package.
# Boris Steipe, October 2013
#
setwd("~/path/to/your/R_files/")

# Biostrings is a package within the Bioconductor project.
# Bioconductor packages have their own installation system;
# they are normally not installed via CRAN.
# http://www.bioconductor.org/packages/2.13/bioc/vignettes/Biostrings/inst/doc/PairwiseAlignments.pdf

# Current Bioconductor releases are installed with BiocManager
# (the older biocLite.R installation script is no longer supported).
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("Biostrings")

library(Biostrings)
library(help=Biostrings)

# Read in two fasta files - you will need to edit this for YFO
sacce <- readAAStringSet("mbp1_sacce.fa", format="fasta")
ustma <- readAAStringSet("mbp1_ustma.fa", format="fasta")

sacce
names(sacce)
names(sacce) <- "Mbp1 SACCE"
names(ustma) <- "Mbp1 USTMA" # Example only ... modify for YFO

width(sacce)
as.character(sacce)

# Biostrings takes a sophisticated approach to sequence alignment ...
?pairwiseAlignment

# ... but the use in practice is quite simple:
ali <- pairwiseAlignment(sacce, ustma, substitutionMatrix = "BLOSUM50")
ali

pattern(ali)
subject(ali)

writePairwiseAlignments(ali)

p <- aligned(pattern(ali))
names(p) <- "Mbp1 SACCE aligned"
s <- aligned(subject(ali))
names(s) <- "Mbp1 USTMA aligned"

# don't overwrite your EMBOSS .fal files
writeXStringSet(p, "mbp1_sacce.R.fal", append=FALSE, format="fasta")
writeXStringSet(s, "mbp1_ustma.R.fal", append=FALSE, format="fasta")

# Done.
</source>

* Compare the alignments you received from the EMBOSS server and the alignment you computed using '''R'''. Are they approximately the same? Exactly? You did use different matrices and gap parameters, so minor differences are to be expected, but by and large you should get the same alignments.
}}

{{#lst:UCSF_Chimera|Installation}}

Let's explore Chimera functions first with a simple small molecule:

=== Modeling small molecules ===

"Small" molecules are solvent, ligands, substrates, products, prosthetic groups, drugs - in short, essentially everything that is not made by DNA or RNA polymerases or by the ribosome. Whereas the biopolymers are still front and centre in our quest to understand molecular biology, small molecules are crucial for our quest to interact with the inventory of the cell, create useful products, or advance medicine.

A number of public repositories make small-molecule information available, such as [http://pubchem.ncbi.nlm.nih.gov/ PubChem] at the NCBI, the ligand collection at the [http://pdb.org '''PDB'''], the [http://www.ebi.ac.uk/chebi/ ChEBI] database at the European Bioinformatics Institute, the Canadian [http://www.drugbank.ca DrugBank], or the [http://cactus.nci.nih.gov/ncidb2.2/ NCI database browser] at the US National Cancer Institute. One general way to export topology information from these services is to use {{WP|SMILES|SMILES strings}}: a shorthand notation for the composition and topology of chemical compounds.

{{task|
# Access [http://pubchem.ncbi.nlm.nih.gov/ PubChem].
# Enter "caffeine" as a search term in the '''Compound''' tab. A number of matches to this keyword search are returned.
# Click on the [http://pubchem.ncbi.nlm.nih.gov/compound/2519 top hit - 1,3,7-Trimethylxanthine, the Caffeine molecule]. Note that the page contains, among other items:
## A 2D structural sketch;
## An idealized 3D structural conformer, for which you can download coordinates in several formats;
## The IUPAC name: <code>1,3,7-trimethylpurine-2,6-dione</code>;
## The CAS identifier <code>58-08-2</code>, which is a unique identifier and can be used as a cross-reference ID;
## The {{WP|SMILES|SMILES strings|SMILES string}} <code>CN1C{{=}}NC2{{=}}C1C({{=}}O)N(C({{=}}O)N2C)C</code>;
## ... and much more.
}}

That's great, but let's sketch our own version of caffeine. Several versions of Peter Ertl's {{WP|JME_editor|Java Molecular Editor (JME)}} are offered online; PubChem offers this functionality via its '''Sketcher''' tool.

{{task|
# Return to the [http://pubchem.ncbi.nlm.nih.gov/ PubChem homepage].
# Follow the link to '''Structure search''' (in the right-hand menu).
# Click on the '''3D conformer''' tab and on the '''Launch''' button to launch the molecular editor in its own window.
# Sketch the structure of caffeine. I find the editor quite intuitive, but clicking on the '''Help''' button will give you a quick, structured overview. Make sure you define your double bonds correctly.
# '''Export''' the SMILES string of your compound to your project folder.
}}

=== Translating SMILES to structure ===

Chimera can translate SMILES strings to coordinates<ref>There are several online servers that translate SMILES strings to idealized structures, see e.g. the [http://cactus.nci.nih.gov/translate/ online SMILES translation service] at the NCI.</ref>.

{{task|
# Open Chimera.
# Select '''Tools''' &rarr; '''Structure&nbsp;Editing''' &rarr; '''Build&nbsp;Structure'''.
# In the '''Build Structure''' window, select the '''SMILES string''' button, paste the string from your file, and click '''Apply'''.
# The caffeine molecule will be generated and visualized in the graphics window. This is a "stick" representation.
# You can rotate it with your mouse, &lt;command&gt; drag to scale, and &lt;shift&gt; drag to translate.
# Use the '''Actions''' &rarr; '''Atoms/Bonds''' &rarr; '''ball &amp; stick''' or '''sphere''' menu items to change appearance.
# Use the '''Actions''' &rarr; '''Color''' &rarr; '''by element''' menu to change colors.
# Change the display back to stick and use '''Actions''' &rarr; '''Surface''' &rarr; '''show''' to add a solvent accessible surface. Choosing this command triggers the calculation of the surface, which is then available as an individually selectable object. However, with default parameters the surface appears a bit rough for this small molecule.
# Change the parameters of this solvent accessible surface:
## Select the surface with &lt;control&gt;&lt;click&gt; (&lt;control&gt;&lt;left mouse button&gt; on Windows). A green contour line appears around selected items; in this case it surrounds the surface.
## Open the selection inspector by clicking on the tiny green icon in the lower-right corner of the window (it has a magnifying-glass symbol, which means "inspect" for Chimera, not "search").
## Select Inspect ... '''MSMS surface''' and change the '''Vertex density''' value to 50.0, then hit return.
# By default, the surface inherits the colour of the atoms it envelopes. To change the colour of the surface, use the '''Actions''' &rarr; '''Color''' &rarr; '''all options''' menu. Click the '''surfaces''' button to indicate that the color choice should be applied to the surface object (note what else you can apply color to...), then choose '''cornflower blue'''.
# Use the '''Actions''' &rarr; '''Surface''' &rarr; '''transparency''' &rarr; '''50%''' menu to see atoms and bonds that are covered by the surface.
# To begin working with molecules in "true" 3D, choose '''Tools''' &rarr; '''Viewing Controls''' &rarr; '''Camera''' and select '''camera mode''' &rarr; '''wall-eye stereo'''. Also, use the '''Effects''' tab of the '''Viewing''' window to turn '''shadows''' off.
# Your structure should look about like what you see below. Save your session with the '''File''' &rarr; '''Save Session''' dialogue so you can easily recreate the scene.

{{stereo|Caffeine_stereo.jpg|'''Wall-eye stereo view''' of the caffeine structure, surrounded by a transparent molecular surface. The image for the left eye is on the left side. For instructions on ''stereo-viewing'', see the next section.
}}

}}

{{Vspace}}

=== Stereo vision ===

A simple molecular scene like the caffeine molecule is a great way to practice viewing structures in stereo. This is a learnable skill, but it takes practice.

{{task|
Access the '''[[Stereo Vision]]''' tutorial and practice viewing molecular structures in stereo.

Practice at least ...
* two times daily,
* for 3-5 minutes each session.
}}
  
Keep up your practice throughout the course. It is a wonderful skill that will greatly support your understanding of structural molecular biology. Practice with different molecules and try out different colours and renderings.

'''Note: do not go through your practice sessions mechanically. If you are not making any progress with stereo vision, contact me so I can help you on the right track.'''

{{Vspace}}

==Global properties==

In this series of tasks we will showcase some of the '''globally''' applied tools that help us study molecular structure.

{{Vspace}}

===A Ramachandran plot===

{{task|1=
# To reset all views and selections, choose '''Favorites''' &rarr; '''Model Panel'''. Select the 1BM8 model and click the '''close''' button to remove it.
# In the graphics window, click on the "lightning bolt" icon at the bottom. You should see a button labelled 1BM8 on the right. This is where you will find recent structures. Click <code>1BM8</code> to reload it.
# Choose '''Presets''' &rarr; '''Interactive 2 (all atoms)''' for a detailed view.
# Choose '''Favorites''' &rarr; '''Model Panel'''.
# Look for the option '''Ramachandran plot...''' in the choices on the right.
# Click the button and study the result. The dots in this [https://www.cgl.ucsf.edu/chimera/docs/ContributedSoftware/ramachandran/ramachandran.html Ramachandran plot] represent the phi-psi angle combinations of the residue backbones. We see that they are well distributed; this is a high-resolution structure, essentially without outliers. Clicking on a dot selects a residue in the structure viewer (selected residues have a green contour).
# Choose '''File''' &rarr; '''Fetch by ID''' and fetch <code>1L3G</code>, an NMR structure of the Mbp1 APSES domain. Chimera loads the 19 models that comprise this structure dataset.
# In the '''Favorites''' &rarr; '''Model Panel''', select 1BM8 and click on '''hide'''.
# Then select 1L3G and click '''group/ungroup''' to be able to address the models individually. Select any of the models individually and click again on '''Ramachandran plot'''. You will see that the points are much more dispersed, and there are a number of outliers that have comparatively high-energy conformations.
}}

&nbsp;

===B-factors===

{{task|1=
# Choose '''Favorites''' &rarr; '''Model Panel''', click/drag over the 1L3G models and click '''close''' to remove them again.
# To explore B-factors in the 1BM8 model, click '''show''' to view it again.
# Choose '''Tools''' &rarr; '''Structure Analysis''' &rarr; '''Render by Attribute'''.
# Select '''Attributes of atoms''', '''Model''' 1BM8 and '''Attribute''': '''bfactor'''. A histogram appears with sliders that allow you to render the distribution of values found in the structure for this attribute.
# Let's colour the atoms by B-factor. Click on the colours tab. A standard colouring scheme is blue - white - red, but you can move the sliders, add new thresholds, and colour them individually by clicking on the colour patch to create your own colour spectrum, e.g. from black via red to white, in a {{WP|Black_body_radiation|black-body spectrum}}. Click '''Apply'''.
# Choose '''Actions''' &rarr; '''Atoms/Bonds''' &rarr; '''stick''' to give the bonds more volume. You will find that the core of the protein has low temperature factors, and the surface has a number of highly mobile sidechains and loops.

{{stereo|1BM8_thermal_stereo.jpg|'''Structure of the yeast transcription factor Mbp1 DNA binding domain (1BM8)''' coloured by B-factor (thermal factor). The protein bonds are shown in a "stick" model, coloured with a spectrum that emulates black-body radiation. Note that the interior of the protein is less mobile, some of the surface loops are highly mobile (or statically disordered; X-ray structures can't distinguish the two), and the discretely bound water molecules that are visible in this high-resolution structure are generally more mobile than the residues they bind to.
}}

}}

&nbsp;

===Electrostatics===

{{task|1=
# To visualize the electrostatic potential of the protein, mapped on the surface, first select '''Presets''' &rarr; '''Interactive 2...''' and '''Actions''' &rarr; '''Color''' &rarr; '''cyan''' for a vividly contrasting color.
# A simple electrostatic potential calculation just treats the atoms as Coulomb point charges. A more accurate calculation of full Poisson-Boltzmann potentials is [https://www.cgl.ucsf.edu/chimera/current/docs/UsersGuide/tutorials/surfprop.html also available]. Select '''Tools''' &rarr; '''Electrostatic/Binding Analysis''' &rarr; '''Coulombic Surface Coloring'''.
# Make sure the surface object is selected in the form (it should be selected by default since there is only one surface), keep the default parameters and click '''Apply'''.
# Use '''Actions''' &rarr; '''Surface''' &rarr; '''Transparency''' &rarr; '''30%''' to make the protein backbone somewhat visible.
# Open the '''Tools''' &rarr; '''Viewing Controls''' &rarr; '''Lighting''' window and set '''Intensity''' from '''two-point''' to '''ambient'''. This reduces shadowing and reflections on the surface and thus emphasizes the color values; here our focus is not on shape, but on property.
# Use the '''Effects''' tab to turn '''shadows''' off and '''depth-cueing''' and '''silhouettes''' on. This recreates visual cues of depth, which compensate for the loss of shape information in a flat lighting model.

{{stereo|1BM8_coulomb_stereo.jpg|'''Coulomb (electrostatic) potential''' mapped to the solvent accessible surface of the yeast transcription factor Mbp1 DNA binding domain (1BM8). The protein backbone is visible through the transparent surface as a cartoon model; note the helix at the bottom of the structure. This helix has been suggested to play a role in forming the domain's DNA binding site, and the positive (blue) electrostatic potential of the region is consistent with binding the negatively charged phosphate backbone of DNA. The other side of the domain has a negative (red) charge excess, which balances the molecule's electric charge overall, but also guides the protein-ligand interaction and supports faster on-rates.
}}

}}

&nbsp;

===Hydrogen bonds===

{{task|1=
# Hydrogen bonds encode the basic folding patterns of the protein. To visualize H-bonds, select '''Presets''' &rarr; '''Publication 1...''' and '''Actions''' &rarr; '''Color''' &rarr; '''by element'''.
# Use '''Tools''' &rarr; '''Structure Analysis''' &rarr; '''FindHBond''' and '''Apply''' the default parameters.
# To emphasize the role of H-bonds in determining the architecture of the protein, choose '''Select''' &rarr; '''Structure''' &rarr; '''backbone''' &rarr; '''full''' and then '''Select''' &rarr; '''Invert (all models)'''. Now '''Actions''' &rarr; '''Atoms/bonds''' &rarr; '''hide''' will show only the backbone with its H-bonds.

{{stereo|1BM8_hbond_stereo.jpg|'''Hydrogen bonds''' shown for the peptide backbone of the yeast transcription factor Mbp1 DNA binding domain (1BM8). This view emphasizes the interactions of secondary structure elements that govern the folding topology of the domain.
}}

}}

==Chimera sequence interface==

In this task we will explore the sequence interface of Chimera, use it to select specific parts of a molecule, and colour specific regions (or residues) of a molecule separately.

&nbsp;

{{task|1=
# Display the protein in '''Presets''' &rarr; '''Interactive&nbsp;1''' mode and familiarize yourself with its topology of helices and strands.
# Now turn hydrogen bonds off. All Chimera menu commands have a command-line equivalent: open the command line by clicking on the "computer" icon in the upper left corner of the viewer window, then type "~hbonds". The "~" undoes previous commands.
# Use '''Tools''' &rarr; '''Depiction''' &rarr; '''Rainbow''' to color the chain from blue to red. (You need to change the colour patches by clicking on them to open the colour editor. Choose an HSL colour model, use Saturation and Lightness 0.5 to keep the colours to somewhat subdued hues, then use the slider to choose appropriate hue values.) Click '''Apply'''.
# Open the sequence tool: '''Tools''' &rarr; '''Sequence''' &rarr; '''Sequence'''. By default, coloured rectangles overlay the secondary structure elements of the sequence.
# Hover the mouse over some residues and note that the sequence number and chain are shown at the bottom of the window.
# Click/drag one residue to select it. <small>(A simple click won't work; you need to drag a little for the selection to catch on.)</small> Note that the residue gets a green overlay in the sequence window, and it also gets selected with a green border in the graphics window.
# At the bottom of the sequence window, there are instructions on how to select (multiple) regions. Clear the selection by &lt;control&gt; clicking into an empty spot of the viewer. Now select the region that encompasses the residues that have been reported to form the DNA binding subdomain: <code>KRTRILEKEVLKETHEKVQGGFGKYQ</code> (Taylor 2000). Show the side chains of these residues by clicking on the little green inspector icon on the viewer window, inspecting '''Atom''' and choosing '''displayed: true''', and inspecting '''Bond''' and setting the stick radius to 0.4.
# Undisplay the hydrogen atoms by selecting the element H in the Chemistry option of the Select menu, and use the Actions menu to '''hide''' them. Then use the effects pane of the Depiction menu to add a contour.
# Finally, give the scene a gradient grey background via the '''Actions''' &rarr; '''Color''' &rarr; '''all options...''' menu.

We will now use the aligned sequences to compute a graphical display of alignment quality.

{{task|1=
* Study this code carefully, execute it section by section, and make sure you understand all of it. Ask on the list if anything is not clear.

<source lang="R">
# aliScore.R
# Evaluating an alignment with a sliding window score
# Boris Steipe, October 2012. Update October 2013

# Edit this to point to your own working directory. Note that spaces
# inside an R string must not be escaped with backslashes.
setwd("~/Documents/07.TEACHING/36-BCH441 Bioinformatics 2013/")

# Scoring matrices can be found at the NCBI.
# ftp://ftp.ncbi.nih.gov/blast/matrices/BLOSUM62

# It is good practice to set variables you might want to change
# in a header block so you don't need to hunt all over the code
# for strings you need to update.
#
fa1      <- "mbp1_sacce.R.fal"
fa2      <- "mbp1_ustma.R.fal"
code1    <- "SACCE"
code2    <- "USTMA"
mdmFile  <- "BLOSUM62.mdm"
window   <- 9  # window-size (should be an odd integer)

# ================================================
#   Read data files
# ================================================

# read fasta datafiles using seqinr function read.fasta()
if (!requireNamespace("seqinr", quietly = TRUE)) {
    install.packages("seqinr")
}
library(seqinr)
tmp  <- unlist(read.fasta(fa1, seqtype="AA", as.string=FALSE, seqonly=TRUE))
seq1 <- unlist(strsplit(as.character(tmp), split=""))

tmp  <- unlist(read.fasta(fa2, seqtype="AA", as.string=FALSE, seqonly=TRUE))
seq2 <- unlist(strsplit(as.character(tmp), split=""))

# aligned sequences must have the same length: stop here if they don't
if (length(seq1) != length(seq2)) {
    stop("Sequences have unequal length!")
}

lSeq <- length(seq1)

# ================================================
#   Read scoring matrix
# ================================================

MDM <- read.table(mdmFile, skip=6)

# This is a dataframe. Study how it can be accessed:

MDM
MDM[1,]
MDM[,1]
MDM[5,5]  # Cys-Cys
MDM[20,20] # Val-Val
MDM[,"W"]  # the tryptophan column
MDM["R","W"]  # Arg-Trp pairscore
MDM["W","R"]  # Trp-Arg pairscore: pairscores are symmetric

colnames(MDM)  # names of columns
rownames(MDM)  # names of rows
colnames(MDM)[3]  # third column
rownames(MDM)[12]  # twelfth row

# change the two "*" names to "-" so we can use them to score
# indels of the alignment. This is a bit of a hack, since this
# does not reflect the actual indel penalties (which are, as you
# remember from your lectures, calculated as a gap opening
# + gap extension penalty; they can't be calculated in a pairwise
# manner). EMBOSS defaults for BLOSUM62 are opening -10 and
# extension -0.5, i.e. a gap of size 3 (-11.5) has approximately
# the same penalty as a 3-character score of "-" matches (-12),
# so a pairscore of -4 is not entirely unreasonable.

colnames(MDM)[24]
rownames(MDM)[24]
colnames(MDM)[24] <- "-"
rownames(MDM)[24] <- "-"
colnames(MDM)[24]
rownames(MDM)[24]
MDM["Q", "-"]
MDM["-", "D"]
# so far so good.

# ================================================
#   Tabulate pairscores for alignment
# ================================================

# It is trivial to create a pairscore vector along the
# length of the aligned sequences.

PS <- vector()
for (i in 1:lSeq) {
  aa1 <- seq1[i]
  aa2 <- seq2[i]
  PS[i] <- MDM[aa1, aa2]
}

PS

# ================================================
#    Calculate moving averages
# ================================================

# In order to evaluate the alignment, we will calculate a
# sliding window average over the pairscores. In base R, a
# moving average can be computed with stats::filter(); add-on
# packages offer further options, for example:
#  - rollmean() in the "zoo" package http://rss.acs.unt.edu/Rdoc/library/zoo/html/rollmean.html
#  - MovingAverages() in "TTR"  http://rss.acs.unt.edu/Rdoc/library/TTR/html/MovingAverages.html
#  - ma() in "forecast"  http://robjhyndman.com/software/forecast/
# But since this is easy to code, we shall implement it ourselves.

PSma <- vector()          # will hold the averages
winS <- floor(window/2)   # span of elements above/below the centre
winC <- winS+1            # centre of the window

# extend the vector PS with zeros (virtual observations) above and below
PS <- c(rep(0, winS), PS , rep(0, winS))

# initialize the window score for the first position
winScore <- sum(PS[1:window])

# write the first score to PSma
PSma[1] <- winScore

# Slide the window along the sequence, and recalculate sum()
# Loop from the next position, to the last position that does not exceed the vector...
for (i in (winC + 1):(lSeq + winS)) {
  # subtract the value that has just dropped out of the window
  winScore <- winScore - PS[(i-winS-1)]
  # add the value that has just entered the window
  winScore <- winScore + PS[(i+winS)]
  # put score into PSma
  PSma[i-winS] <- winScore
}

# convert the sums to averages
PSma <- PSma / window

# have a quick look at the score distributions
boxplot(PSma)
hist(PSma)

# ================================================
#    Plot the alignment scores
# ================================================

# normalize the scores
PSma <- (PSma-min(PSma))/(max(PSma) - min(PSma) + 0.0001)
# spread the normalized values to the desired range 1..nCol
nCol <- 10
PSma <- floor(PSma * nCol) + 1

# Assign a color spectrum to a vector (with a bit of color magic,
# don't worry about that for now). Dark colors are poor scores,
# "hot" colors are high scores
spect <- colorRampPalette(c("black", "red", "yellow", "white"), bias=0.4)(nCol)

# Color is an often abused aspect of plotting. One can use color to label
# *quantities* or *qualities*. For the most part, our pairscores measure amino
# acid similarity. That is a quantity and with the spectrum that we just defined
# we associate the measured quantities with the color of a glowing piece
# of metal: we start with black #000000, then first we ramp up the red
# (i.e. low-energy) part of the visible spectrum to red #FF0000, then we
# add and ramp up the green spectrum giving us yellow #FFFF00 and finally we
 
# add blue, giving us white #FFFFFF. Let's have a look at the spectrum:
 
  
s <- rep(1, nCol)
+
{{stereo|1BM8_DNAbindingRegion_stereo.jpg|'''The DNA binding region of Mbp1''' according to NMR measurements of DNA contact by Taylor ''et al. (2000). The backbone of 1BM8 is shown with a colour ramp from blue (N-terminus) to red (C-terminus). The side chains of the region 50-74 are shown colored by element.
barplot(s, col=spect, axes=F, main="Color spectrum")
 
  
# But one aspect of our data is not quantitatively different: indels.
+
}}
# We valued indels with pairscores of -4. But indels are not simply poor alignment,
 
# rather they are non-alignment. This means stretches of -4 values are really
 
# *qualitatively* different. Let's color them differently by changing the lowest
 
# level of the spectrum to grey.
 
  
spect[1] <- "#CCCCCC"
+
}}
barplot(s, col=spect, axes=F, main="Color spectrum")
 
  
# Now we can display our alignment score vector with colored rectangles.
 
  
# Convert the integers in PSma to color values from spect
+
&nbsp;
PScol <- vector()
 
for (i in 1:length(PSma)) {
 
PScol[i] <- spect[ PSma[i] ]  # this is how a value from PSma is used as an index of spect
 
}
 
  
# Plot the scores. The code is similar to the last assignment.
+
== Compute with structures ==
# Create an empty plot window of appropriate size
 
plot(1,1, xlim=c(-100, lSeq), ylim=c(0, 2) , type="n", yaxt="n", bty="n", xlab="position in alignment", ylab="")
 
  
# Add a label to the left
+
{{Vspace}}
text (-30, 1, adj=1, labels=c(paste("Mbp1:\n", code1, "\nvs.\n", code2)), cex=0.9 )
 
  
# Loop over the vector and draw boxes  without border, filled with color.
+
To practice actual computations with structures we'll use the Grant lab's bio3d package in '''R'''.
for (i in 1:lSeq) {
 
  rect(i, 0.9, i+1, 1.1, border=NA, col=PScol[i])
 
}
 
  
# Note that the numbers along the X-axis are not sequence numbers, but numbers
+
{{task|1 =
# of the alignment, i.e. sequence number + indel length. That is important to
 
# realize: if you would like to add the annotations from the last assignment
 
# which I will leave as an exercise, you need to map your sequence numbering
 
# into alignment numbering. Let me know in case you try that but need some help.
 
  
 +
* Open an RStudio session, and load the BCH441 project.
 +
* Bring code and data resources up to date:
 +
** '''pull''' the most recent version of the project from GitHub
 +
** type <code>init()</code> to load the most recent files and functions.
 +
* Study and work through the code in the <code>BCH441_A05.R</code> script.
 +
* There are a number of questions in the code, it would be good if you don't gloss over them but try to answer them for yourself. Especially the questions about the final histogram: without interpretation, without learning something interesting about biology from the plot, all this is just Cargo Cult.
  
</source>
 
 
}}
 
}}
  
 +
{{Vspace}}
  
;That is all.
+
;That is all;
 
 
  
&nbsp;
+
{{Vspace}}
  
 
== Links and resources ==
 
== Links and resources ==
  
<!-- {{#pmid: 19957275}} -->
 
<!-- {{WWW|WWW_GMOD}} -->
 
<!-- <div class="reference-box">[http://www.ncbi.nlm.nih.gov]</div> -->
 
  
 +
* [[UCSF Chimera|'''Chimera page''']]
 +
* [https://www.cgl.ucsf.edu/chimera/current/docs/UsersGuide/framecontrib.html Chimera Tools Index]
 +
* [[Stereo Vision|'''Stereo vision tutorial''']]
  
== Notes and references  ==
+
*[http://www.rcsb.org/pdb/static.do?p=software/software_links/molecular_graphics.html Molecular Graphics Software Links]&ndash; a collection of links at the PDB.
  
  
<references />
+
{{#pmid: 10747782}}
  
 +
<!-- {{WWW|WWW_GMOD}} -->
 +
<!-- <div class="reference-box">[http://www.ncbi.nlm.nih.gov]</div> -->
  
 
&nbsp;
 
&nbsp;
Line 429: Line 287:
 
&nbsp;
 
&nbsp;
 
{{#lst:BIO_Assignment_Week_1|assignment_footer}}
 
{{#lst:BIO_Assignment_Week_1|assignment_footer}}
 +
 +
<table style="width:100%;"><tr>
 +
<td style="height:30px; vertical-align:middle; text-align:left; font-size:80%;">[[BIO_Assignment_Week_4|&lt;&nbsp;Assignment&nbsp;4]]</td>
 +
<td style="height:30px; vertical-align:middle; text-align:right; font-size:80%;">[[BIO_Assignment_Week_6|Assignment&nbsp;6&nbsp;&gt;]]</td>
 +
</tr></table>
  
  

Latest revision as of 05:54, 4 December 2016

Assignment for Week 5
Structure Analysis

< Assignment 4 Assignment 6 >

Note! This assignment is currently inactive. Major and minor unannounced changes may be made at any time.

 
 
Concepts and activities (and reading, if applicable) for this assignment will be topics on the upcoming quiz.


 


 

How could the search for ultimate truth have revealed so hideous and visceral-looking an object?
Max Perutz  (on his first glimpse of the Hemoglobin structure)


 

Introduction

Where is the hidden beauty in structure, and where, the "ultimate truth"? In the previous assignments we have discovered homologues of APSES-domain-containing proteins in all fungal species. This makes the domain an ancient protein family that had already duplicated into several paralogues at the time when the cenancestor of all fungi lived, more than 600,000,000 years ago, in the Vendian period of the Proterozoic era of Precambrian times.

In this assignment we will explore its molecular structure.


 

Molecular graphics: UCSF Chimera

To view molecular structures, we need a tool to visualize the three dimensional relationships of atoms. A molecular viewer is a program that takes 3D structure data and allows you to display and explore it. For a number of reasons, I use the UCSF Chimera viewer for this course:

  1. Chimera is free and open;
  2. It creates very appealing graphics;
  3. It is under ongoing development and is well maintained;
  4. It provides an array of useful utilities for structure analysis; and,
  5. besides an intuitive, menu-driven interface, Chimera can be scripted via its command line, or even programmed via its built-in Python interpreter.


Task:

  1. Access the Chimera homepage and navigate to the Download section.
  2. Find the newest version for your platform in the table and click on the file to download it.
  3. Follow the instructions to install Chimera.


Let's explore Chimera functions first with a simple small molecule:


 

Modeling small molecules

"Small" molecules are solvent, ligands, substrates, products, prosthetic groups, drugs - in short, essentially everything that is not made by DNA-, RNA-polymerases or the ribosome. Whereas the biopolymers are still front and centre in our quest to understand molecular biology, small molecules are crucial for our quest to interact with the inventory of the cell, create useful products, or advance medicine.

A number of public repositories make small-molecule information available, such as PubChem at the NCBI, the ligand collection at the PDB, the ChEBI database at the European Bioinformatics Institute, the Canadian DrugBank, or the NCI database browser at the US National Cancer Institute. One general way to export topology information from these services is to use SMILES strings—a shorthand notation for the composition and topology of chemical compounds.


Task:

  1. Access PubChem.
  2. Enter "caffeine" as a search term in the Compound tab. A number of matches to this keyword search are returned.
  3. Click on the top hit - 1,3,7-Trimethylxanthine, the Caffeine molecule. Note that the page contains among other items:
    1. A 2D structural sketch;
    2. An idealized 3D structural conformer, for which you can download coordinates in several formats;
    3. The IUPAC name: 1,3,7-trimethylpurine-2,6-dione;
    4. The CAS identifier 58-08-2 which is a unique identifier and can be used as a cross-reference ID;
    5. The SMILES string CN1C=NC2=C1C(=O)N(C(=O)N2C)C;
    6. ... and much more.
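To make the idea of SMILES as a shorthand for composition concrete, here is a minimal R sketch that counts the heavy atoms in the caffeine SMILES string above. It is a toy under stated assumptions, not a SMILES parser. It handles only unbracketed atoms of the so-called organic subset. It ignores hydrogens, which SMILES leaves implicit. The function name `countSmilesAtoms()` is hypothetical.

```r
# Hypothetical helper: count heavy atoms in a simple SMILES string.
# Only unbracketed "organic subset" atoms are handled; bracket atoms such as
# [nH] or charges are NOT parsed correctly, and implicit hydrogens are not counted.
countSmilesAtoms <- function(smiles) {
  # two-letter symbols (Cl, Br) come first so they are not split into two atoms;
  # lowercase letters denote aromatic atoms
  pattern <- "Cl|Br|[BCNOPSFI]|[bcnops]"
  atoms <- regmatches(smiles, gregexpr(pattern, smiles))[[1]]
  # map aromatic single-letter atoms to element symbols (c -> C, n -> N, ...)
  atoms <- ifelse(nchar(atoms) == 1, toupper(atoms), atoms)
  tab <- table(atoms)
  setNames(as.integer(tab), names(tab))   # plain named integer vector
}

caffeine <- "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
countSmilesAtoms(caffeine)   # C: 8, N: 4, O: 2
```

This matches the heavy-atom part of caffeine's formula, C8H10N4O2. The ten hydrogens are implied by the valences and do not appear in the string.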


That's great, but let's sketch our own version of caffeine. Several versions of Peter Ertl's Java Molecular Editor (JME) are offered online; PubChem offers this functionality via its Sketcher tool.

Task:

  1. Return to the PubChem homepage.
  2. Follow the link to Structure search (in the right hand menu).
  3. Click on the 3D conformer tab and on the Launch button to launch the molecular editor in its own window.
  4. Sketch the structure of caffeine. I find the editor quite intuitive but clicking on the Help button will give you a quick, structured overview. Make sure you define your double-bonds correctly.
  5. Export the SMILES string of your compound to your project folder.


Translating SMILES to structure

Chimera can translate SMILES strings to coordinates[1].

Task:

  1. Open Chimera.
  2. Select Tools → Structure Editing → Build Structure.
  3. In the Build Structure window, select the SMILES string button, paste the string from your file, and click Apply.
  4. The caffeine molecule will be generated and visualized in the graphics window. This is a "stick" representation.
  5. You can rotate it with your mouse, <command> drag to scale, <shift> drag to translate.
  6. Use the Actions → Atoms/Bonds → ball & stick or sphere menu items to change appearance.
  7. Use the Actions → Color → by element menu to change colors.
  8. Change the display back to stick and use Actions → Surface → show to add a solvent accessible surface. Choosing this command triggers the calculation of the surface, which is then available as an individually selectable object. However, with default parameters the surface appears a bit rough for this small molecule.
  9. Change the parameters of this solvent accessible surface:
    1. Select the surface with <control><click> (<control><left mouse button> on Windows). A green contour line appears around selected items – it surrounds the surface in this case.
    2. Open the selection inspector by clicking on the tiny green icon in the lower-right corner of the window (It has a magnifying glass symbol which means "inspect" for Chimera, not "search").
    3. From the Inspect menu, choose MSMS surface and change the Vertex density value to 50.0, then hit Return.
  10. By default, the surface inherits the colour of the atoms it envelops. To change the colour of the surface, use the Actions → Color → all options menu. Click the surfaces button to indicate that the color choice should be applied to the surface object (note what else you can apply color to...), then choose cornflower blue.
  11. Use the Actions → Surface → transparency → 50% menu to see atoms and bonds that are covered by the surface.
  12. To begin working with molecules in "true" 3D, choose Tools → Viewing Controls → Camera and set camera mode to wall-eye stereo. Also, use the Effects tab of the Viewing window to turn shadows off.
  13. Your structure should look roughly like the image below. Save your session with the File → Save Session dialogue so you can easily recreate the scene.


Caffeine stereo.jpg

Wall-eye stereo view of the caffeine structure, surrounded by a transparent molecular surface. The image for the left eye is on the left side. For instructions on stereo-viewing, see the next section.



 

Stereo vision

A simple molecular scene like the caffeine molecule is a great way to practice viewing structures in stereo. This is a learnable skill, but it takes practice.

Task:

Access the Stereo Vision tutorial and practice viewing molecular structures in stereo.

Practice at least ...

  • two times daily,
  • for 3-5 minutes each session.

Keep up your practice throughout the course. It is a wonderful skill that will greatly support your understanding of structural molecular biology. Practice with different molecules and try out different colours and renderings.

Note: do not go through your practice sessions mechanically. If you are not making any progress with stereo vision, contact me so I can help you get on the right track.


 

Global properties

In this series of tasks we will showcase some of the globally applied tools that help us study molecular structure.


 

A Ramachandran plot

Task:

  1. To reset all views and selections, choose Favorites → Model Panel. Select the 1BM8 model and click the close button to remove it.
  2. In the graphics window, click on the "lightning bolt" icon at the bottom. You should see a button labelled 1BM8 on the right. This is where you will find recent structures. Click 1BM8 to re-load it.
  3. Choose Presets → Interactive 2 (all atoms) for a detailed view.
  4. Choose Favorites → Model Panel.
  5. Look for the option Ramachandran plot... in the choices on the right.
  6. Click the button and study the result. The dots in this Ramachandran plot represent the phi-psi angle combinations of the residue backbones. They are well distributed: this is a high-resolution structure, essentially without outliers. Clicking on a dot selects a residue in the structure viewer (selected residues have a green contour).
  7. Choose File → Fetch by ID and fetch 1L3G, an NMR structure of the Mbp1 APSES domain. Chimera loads the 19 models that comprise this structure dataset.
  8. In the Favorites → Model Panel, select 1BM8 and click hide.
  9. Then select 1L3G and click group/ungroup to be able to address the models individually. Select any of the models individually and click again on Ramachandran plot. You will see that the points are much more dispersed, and there are a number of outliers that have comparatively high-energy conformations.
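The phi and psi values in a Ramachandran plot are ordinary dihedral (torsion) angles. Phi is defined by the backbone atoms C(i-1), N, CA, C, and psi by N, CA, C, N(i+1). The base-R sketch below computes a dihedral from four atom positions with the standard atan2 formulation. The sign is positive when, looking down the p2 → p3 bond, the front bond turns clockwise onto the back bond. The function names are my own. In practice you would read real coordinates from a PDB file, for example with the bio3d package that is used later in this assignment.

```r
# Cross product of two 3-vectors (base R has no built-in for this)
cross3 <- function(a, b) {
  c(a[2] * b[3] - a[3] * b[2],
    a[3] * b[1] - a[1] * b[3],
    a[1] * b[2] - a[2] * b[1])
}

# Dihedral angle in degrees (between -180 and 180) defined by four points,
# e.g. C(i-1), N, CA, C for phi, or N, CA, C, N(i+1) for psi.
dihedral <- function(p1, p2, p3, p4) {
  b1 <- p2 - p1
  b2 <- p3 - p2
  b3 <- p4 - p3
  n1 <- cross3(b1, b2)                 # normal of the plane (p1, p2, p3)
  n2 <- cross3(b2, b3)                 # normal of the plane (p2, p3, p4)
  y <- sqrt(sum(b2^2)) * sum(b1 * n2)  # |b2| * b1 . (b2 x b3)
  x <- sum(n1 * n2)                    # (b1 x b2) . (b2 x b3)
  atan2(y, x) * 180 / pi
}

# toy geometries: cis gives 0, trans gives 180
dihedral(c(1, 0, 0), c(0, 0, 0), c(0, 1, 0), c(1, 1, 0))
dihedral(c(1, 0, 0), c(0, 0, 0), c(0, 1, 0), c(-1, 1, 0))
```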


 

B-factors

Task:

  1. Choose Favorites → Model Panel, click/drag over the 1L3G models and click close to remove them again.
  2. To explore B-factors in the 1BM8 model, click show to view it again.
  3. Choose Tools → Structure Analysis → Render by Attribute.
  4. Select Attributes of atoms, Model 1BM8 and Attribute: bfactor. A histogram appears with sliders that allow you to render the distribution of values found in the structure for this attribute.
  5. Let's colour the atoms by B-factor. Click on the colours tab. A standard colouring scheme is blue - white - red, but you can move the sliders, add new thresholds, and colour them individually by clicking on the colour patch to create your own colour spectrum, e.g. from black via red to white, in a black-body spectrum. Click Apply.
  6. Choose Actions → Atoms/Bonds → stick to give the bonds more volume. You will find that the core of the protein has low temperature factors, and the surface has a number of highly mobile sidechains and loops.
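The B-factor has a simple physical reading. For isotropic motion, B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement of the atom from its average position, in Å² when B is given in Å². The short R sketch below converts B-factors to root-mean-square displacements. The function name is hypothetical, the input values are illustrative and not taken from 1BM8, and the isotropic model is an approximation.

```r
# Convert isotropic B-factors (in A^2) to RMS displacements (in A),
# using B = 8 * pi^2 * <u^2>  =>  u_rms = sqrt(B / (8 * pi^2)).
bToRmsDisplacement <- function(B) {
  sqrt(B / (8 * pi^2))
}

# illustrative values only
B <- c(10, 20, 40, 80)
round(bToRmsDisplacement(B), 2)   # 0.36 0.50 0.71 1.01
```

A B-factor of 80 Å² thus corresponds to an RMS displacement of about 1 Å, which is large compared to a covalent bond length of roughly 1.5 Å.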

1BM8 thermal stereo.jpg

Structure of the yeast transcription factor Mbp1 DNA binding domain (1BM8) coloured by B-factor (thermal factor). The protein bonds are shown in a "stick" model, coloured with a spectrum that emulates black-body radiation. Note that the interior of the protein is less mobile, that some of the surface loops are highly mobile (or statically disordered; X-ray structures cannot distinguish the two), and that the discretely bound water molecules that are visible in this high-resolution structure are generally more mobile than the residues they bind to.


 

Electrostatics

Task:

  1. To visualize the electrostatic potential of the protein, mapped on the surface, first select Presets → Interactive 2... and Actions → Color → cyan for a vividly contrasting color.
  2. A simple electrostatic potential calculation just assumes Coulomb charges. A more accurate calculation of full Poisson-Boltzmann potentials is also available. Select Tools → Electrostatic/Binding Analysis → Coulombic Surface Coloring.
  3. Make sure the surface object is selected in the form (it should be selected by default since there is only one surface), keep the default parameters and click Apply.
  4. Use Actions → Surface → Transparency → 30% to make the protein backbone somewhat visible.
  5. Open the Tools → Viewing Controls → Lighting window and set Intensity from two-point to ambient. This reduces shadowing and reflections on the surface and thus emphasizes the color values - here our focus is not on shape, but on property.
  6. Use the Effects tab to turn shadows off and depth-cueing and silhouettes on. This recreates visual cues of depth which compensate for the loss of shape information by using a flat lighting model.
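The "simple" Coulomb calculation mentioned above sums the contributions of point charges at each surface point. Below is a minimal R sketch under stated assumptions. The potential at a point is φ = k Σ qᵢ/(ε dᵢ), with charges in units of e and distances in Å. The constant k ≈ 332 gives kcal/(mol·e), and the dielectric is distance-dependent, ε = 4d. I believe this is close to what Chimera's Coulombic Surface Coloring does by default, but check the Chimera documentation before relying on that. The function name is hypothetical.

```r
# Coulomb potential at one point from a set of point charges.
#   point : numeric vector of length 3 (x, y, z), in Angstrom
#   xyz   : n x 3 matrix of charge positions, in Angstrom
#   q     : vector of n partial charges, in units of e
# Assumptions: k = 332 converts to kcal/(mol*e); with distDependent = TRUE the
# dielectric is eps = 4 * d, otherwise eps = 1. A point that coincides with a
# charge (d = 0) gives a non-finite result.
coulombPotential <- function(point, xyz, q, k = 332, distDependent = TRUE) {
  d <- sqrt(colSums((t(xyz) - point)^2))   # distance from point to every charge
  eps <- if (distDependent) 4 * d else 1
  sum(k * q / (eps * d))
}

# toy example: one positive and one negative unit charge, 6 A apart
xyz <- rbind(c(0, 0, 0), c(6, 0, 0))
q   <- c(+1, -1)
coulombPotential(c(2, 0, 0), xyz, q)   # 15.5625, closer to the positive charge
```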

1BM8 coulomb stereo.jpg

Coulomb (electrostatic) potential mapped to the solvent accessible surface of the yeast transcription factor Mbp1 DNA binding domain (1BM8). The protein backbone is visible through the transparent surface as a cartoon model; note the helix at the bottom of the structure. This helix has been suggested to play a role in forming the domain's DNA binding site, and the positive (blue) electrostatic potential of the region is consistent with binding the negatively charged phosphate backbone of DNA. The other side of the domain has a negative (red) charge excess, which balances the molecule's electric charge overall, but also guides the protein-ligand interaction and supports faster on-rates.


 

Hydrogen bonds

Task:

  1. Hydrogen bonds encode the basic folding patterns of the protein. To visualize H-bonds select Presets → Publication 1... and Actions → Color → by element.
  2. Use Tools → Structure Analysis → FindHBond and Apply default parameters.
  3. To emphasize the role of H-bonds in determining the architecture of the protein, select Select → Structure → backbone → full and then Select → Invert (all models). Now Actions → Atoms/Bonds → hide will show only the backbone with its H-bonds.
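FindHBond uses geometric criteria involving distances and angles. As a much cruder illustration of the idea, the sketch below flags backbone N···O pairs closer than 3.5 Å as hydrogen-bond candidates. The cutoff, the distance-only criterion, the function name, and the toy coordinates are all assumptions made for illustration. This is not FindHBond's algorithm.

```r
# Flag donor/acceptor pairs within a distance cutoff (Angstrom).
#   donors    : m x 3 matrix of donor atom coordinates (e.g. backbone N)
#   acceptors : n x 3 matrix of acceptor atom coordinates (e.g. backbone O)
# Returns a two-column matrix of (donor index, acceptor index).
# Distance only: angles and same-residue pairs are NOT checked.
hbondCandidates <- function(donors, acceptors, cutoff = 3.5) {
  d <- apply(acceptors, 1, function(a) sqrt(colSums((t(donors) - a)^2)))
  d <- matrix(d, nrow = nrow(donors))   # rows: donors, columns: acceptors
  hits <- which(d <= cutoff, arr.ind = TRUE)
  colnames(hits) <- c("donor", "acceptor")
  hits
}

# toy coordinates, made up for illustration
N <- rbind(c(0, 0, 0), c(10, 0, 0))
O <- rbind(c(2.9, 0, 0), c(20, 0, 0))
hbondCandidates(N, O)   # only donor 1 / acceptor 1 lies within 3.5 Angstrom
```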


1BM8 hbond stereo.jpg

Hydrogen bonds shown for the peptide backbone of the yeast transcription factor Mbp1 DNA binding domain (1BM8). This view emphasizes the interactions of secondary structure elements that govern the folding topology of the domain.


Chimera sequence interface

In this task we will explore the sequence interface of Chimera, use it to select specific parts of a molecule, and colour specific regions (or residues) of a molecule separately.

 

Task:

  1. Display the protein in Presets → Interactive 1 mode and familiarize yourself with its topology of helices and strands.
  2. Now turn hydrogen bonds off: the menu commands of Chimera all have a command line equivalent. Open the command line by clicking on the "computer" icon in the upper left corner of the viewer window. Then type "~hbonds". The "~" prefix reverses the corresponding command; here it removes the hydrogen bonds.
  3. Use Tools → Depiction → Rainbow to color the chain from blue to red. (You need to change the colour patches by clicking on them to open the colour editor. Choose an HSL colour model, set Saturation and Lightness to 0.5 to keep the colours somewhat subdued, then use the slider to choose appropriate hue values.) Click Apply.
  4. Open the sequence tool: Tools → Sequence → Sequence. By default, coloured rectangles overlay the secondary structure elements of the sequence.
  5. Hover the mouse over some residues and note that the sequence number and chain are shown at the bottom of the window.
  6. Click/drag one residue to select it. (A simple click won't work; you need to drag a little bit for the selection to catch on.) Note that the residue gets a green overlay in the sequence window, and it also gets selected with a green border in the graphics window.
  7. At the bottom of the sequence window, there are instructions on how to select (multiple) regions. Clear the selection by <control> clicking into an empty spot of the viewer. Now select the region that encompasses the residues that have been reported to form the DNA binding subdomain: KRTRILEKEVLKETHEKVQGGFGKYQ (Taylor 2000). Show the side chains of these residues by clicking on the little green inspector icon on the viewer window, inspecting Atom and choosing displayed: true, and inspecting Bond and setting the stick radius to 0.4.
  8. Undisplay the hydrogen atoms by selecting the element H in the Chemistry option of the Select menu, and use the Actions menu to hide them. Then use the effects pane of the Depiction menu to add a contour.
  9. Finally, give the scene a grey gradient background via the Actions → Color → all options... menu.
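Selecting a region by its sequence, as in step 7, amounts to finding the motif's start and end positions. The R sketch below does that with base R's `regexpr()`. The function name and the toy sequence are hypothetical. The `firstResidue` argument handles a common pitfall: residue numbers in a PDB file need not start at 1, so sequence index and residue number can differ by an offset.

```r
# Locate an exact motif in a sequence and return its residue range.
#   firstResidue : residue number of the first character of seq
#                  (PDB numbering does not have to start at 1)
motifRange <- function(seq, motif, firstResidue = 1) {
  pos <- regexpr(motif, seq, fixed = TRUE)   # first exact match, -1 if none
  if (pos == -1) return(NULL)
  start <- as.integer(pos) + firstResidue - 1
  c(start = start, end = start + nchar(motif) - 1)
}

# toy sequence, not Mbp1
toySeq <- "AAAAKRTRILEKEVGGG"
motifRange(toySeq, "KRTRILEKEV")                   # start 5, end 14
motifRange(toySeq, "KRTRILEKEV", firstResidue = 3) # start 7, end 16
```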


1BM8 DNAbindingRegion stereo.jpg

The DNA binding region of Mbp1 according to NMR measurements of DNA contact by Taylor et al. (2000). The backbone of 1BM8 is shown with a colour ramp from blue (N-terminus) to red (C-terminus). The side chains of the region 50-74 are shown colored by element.



 

Compute with structures

 

To practice actual computations with structures we'll use the Grant lab's bio3d package in R.

Task:

  • Open an RStudio session, and load the BCH441 project.
  • Bring code and data resources up to date:
    • pull the most recent version of the project from GitHub
    • type init() to load the most recent files and functions.
  • Study and work through the code in the BCH441_A05.R script.
  • There are a number of questions in the code. Don't gloss over them; try to answer them for yourself. This applies especially to the questions about the final histogram: without interpretation, without learning something interesting about biology from the plot, all this is just Cargo Cult.


 
That is all.


 

Links and resources


Taylor et al. (2000) Characterization of the DNA-binding domains from the yeast cell-cycle transcription factors Mbp1 and Swi4. Biochemistry 39:3943-54. (pmid: 10747782)

The minimal DNA-binding domains of the Saccharomyces cerevisiae transcription factors Mbp1 and Swi4 have been identified and their DNA binding properties have been investigated by a combination of methods. An approximately 100 residue region of sequence homology at the N-termini of Mbp1 and Swi4 is necessary but not sufficient for full DNA binding activity. Unexpectedly, nonconserved residues C-terminal to the core domain are essential for DNA binding. Proteolysis of Mbp1 and Swi4 DNA-protein complexes has revealed the extent of these sequences, and C-terminally extended molecules with substantially enhanced DNA binding activity compared to the core domains alone have been produced. The extended Mbp1 and Swi4 proteins bind to their cognate sites with similar affinity [K_A ≈ (1-4) × 10⁶ M⁻¹] and with a 1:1 stoichiometry. However, alanine substitution of two lysine residues (116 and 122) within the C-terminal extension (tail) of Mbp1 considerably reduces the apparent affinity for an MCB (MluI cell-cycle box) containing oligonucleotide. Both Mbp1 and Swi4 are specific for their cognate sites with respect to nonspecific DNA but exhibit similar affinities for the SCB (Swi4/Swi6 cell-cycle box) and MCB consensus elements. Circular dichroism and ¹H NMR spectroscopy reveal that complex formation results in substantial perturbations of base stacking interactions upon DNA binding. These are localized to a central 5'-d(C-A/G-CG)-3' region common to both MCB and SCB sequences consistent with the observed pattern of specificity. Changes in the backbone amide proton and nitrogen chemical shifts upon DNA binding have enabled us to experimentally define a DNA-binding surface on the core N-terminal domain of Mbp1 that is associated with a putative winged helix-turn-helix motif. 
Furthermore, significant chemical shift differences occur within the C-terminal tail of Mbp1, supporting the notion of two structurally distinct DNA-binding regions within these proteins.


 

 


Footnotes and references

  1. There are several online servers that translate SMILES strings to idealized structures, see e.g. the online SMILES translation service at the NCI.


 

Ask, if things don't work for you!

If anything about the assignment is not clear to you, please ask on the mailing list. You can be certain that others will have had similar problems. Success comes from joining the conversation.


