Difference between revisions of "BIO Assignment Week 5"

From "A B C"
 
<div class="b1">

Assignment for Week 5<br />
<span style="font-size: 70%">Sequence alignment </span>

</div>
 
<table style="width:100%;"><tr>
 
<table style="width:100%;"><tr>

__TOC__

----
 
&nbsp;
 
 
==Introduction==
 
 
In this assignment we will perform an optimal global and local sequence alignment, and use '''R''' to plot the alignment quality as a colored bar-graph.
 
 
 
=== Optimal sequence alignments ===
 
 
 
Online programs for optimal sequence alignment are part of the EMBOSS tools. The programs take FASTA files as input.
 
 
;Local optimal SEQUENCE alignment "water"
 
{{task|1=
 
# Retrieve the FASTA files for the YFO Mbp1 protein and for [http://www.ncbi.nlm.nih.gov/protein/NP_010227?report=fasta&log$=seqview&format=text ''Saccharomyces cerevisiae'' Mbp1].

# Save the files as text files on your computer (if you haven't done so already). You could give them an extension of <code>.fa</code>.

# Access the [http://emboss.bioinformatics.nl/ EMBOSS Explorer site] (if you haven't done so yet, you might want to bookmark it).
 
# Look for '''ALIGNMENT LOCAL''', click on '''water''', paste your FASTA sequences and run the program with default parameters.
 
# Study the results. You will probably find that the alignment extends over most of the protein, but does not include the termini.
 
# Considering the sequence identity cutoff we discussed in class (25% over the length of a domain), do you believe that the APSES domains are homologous?
 
# Change the '''Gap opening''' and '''Gap extension''' parameters to high values (e.g. 30 and 5). Then run the alignment again.
 
# Note what is different.
 
# You could try getting only an alignment for the ankyrin domains that you have found in the last assignment, by deleting the approximate region of the APSES domains from your input.
 
}}
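The identity cutoff question above needs a percent-identity value. To see what such a number means, here is a minimal sketch in '''R''' that computes it from two aligned sequences of equal length, with <code>-</code> marking indels. The function name <code>percentIdentity()</code> is hypothetical. Dividing by the number of gap-free columns is only one of several common definitions. Other programs may divide by the full alignment length (gaps included) or by the length of the shorter sequence, which gives different numbers.

<source lang="R">
# percentIdentity(): hypothetical helper, not part of EMBOSS or Biostrings.
# Identity = identical residue pairs / aligned columns in which neither
# sequence has a gap.
percentIdentity <- function(aln1, aln2, gap = "-") {
  a <- strsplit(aln1, "")[[1]]
  b <- strsplit(aln2, "")[[1]]
  if (length(a) != length(b)) {
    stop("Aligned sequences must have the same length.")
  }
  paired <- a != gap & b != gap        # columns where both residues are present
  if (!any(paired)) return(NA_real_)   # nothing aligned: identity undefined
  100 * sum(a[paired] == b[paired]) / sum(paired)
}

# Example with two short, made-up aligned fragments:
# 6 gap-free columns, 4 of them identical
percentIdentity("MKV-LSA", "MRVALSG")
</source>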
 
 
 
;Global optimal SEQUENCE alignment "needle"
 
{{task|1=
 
# Look for '''ALIGNMENT GLOBAL''', click on '''needle''', paste your FASTA sequences and run the program with default parameters.
 
# Study the results. You will find that the alignment extends over the entire protein, likely with long ''indels'' at the termini.
 
# Change the '''Output alignment format''' to '''FASTA pairwise simple''', to retrieve the aligned FASTA files with indels.
 
# Copy the aligned sequences (with indels) and save them to your computer. You could give them an extension of <code>.fal</code> to remind you that they are aligned FASTA sequences.
 
}}
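'''needle''' and '''water''' compute optimal alignments by dynamic programming. The sketch below is a minimal Needleman-Wunsch implementation in base '''R''' and is meant for illustration only. It assumes a linear gap penalty and a toy match/mismatch score, whereas needle uses affine gap penalties (opening + extension) and BLOSUM62. The function and parameter names are hypothetical. A local (Smith-Waterman) version, as in water, would also set negative cells to zero and start the traceback from the highest-scoring cell.

<source lang="R">
# nwAlign(): hypothetical illustration of Needleman-Wunsch global alignment.
# Simplified: linear gap penalty and match/mismatch scores instead of BLOSUM62.
nwAlign <- function(s1, s2, matchScore = 1, mismatchScore = -1, gapScore = -2) {
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  n <- length(a)
  m <- length(b)
  pair <- function(x, y) if (x == y) matchScore else mismatchScore

  # dp[i + 1, j + 1] holds the best score for aligning a[1..i] with b[1..j]
  dp <- matrix(0, nrow = n + 1, ncol = m + 1)
  dp[, 1] <- gapScore * (0:n)
  dp[1, ] <- gapScore * (0:m)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      dp[i + 1, j + 1] <- max(dp[i, j] + pair(a[i], b[j]),  # (mis)match
                              dp[i, j + 1] + gapScore,      # gap in s2
                              dp[i + 1, j] + gapScore)      # gap in s1
    }
  }

  # Traceback from the bottom-right cell recovers one optimal alignment
  al1 <- character(0)
  al2 <- character(0)
  i <- n
  j <- m
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && dp[i + 1, j + 1] == dp[i, j] + pair(a[i], b[j])) {
      al1 <- c(a[i], al1); al2 <- c(b[j], al2); i <- i - 1; j <- j - 1
    } else if (i > 0 && dp[i + 1, j + 1] == dp[i, j + 1] + gapScore) {
      al1 <- c(a[i], al1); al2 <- c("-", al2); i <- i - 1
    } else {
      al1 <- c("-", al1); al2 <- c(b[j], al2); j <- j - 1
    }
  }
  list(score = dp[n + 1, m + 1],
       aln1  = paste(al1, collapse = ""),
       aln2  = paste(al2, collapse = ""))
}

nwAlign("ACGT", "ACT")   # score 1: ACGT aligned with AC-T
</source>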
 
 
 
&nbsp;
 
 
== The Mutation Data Matrix ==
 
 
The NCBI makes its alignment matrices available by ftp. They are located at ftp://ftp.ncbi.nih.gov/blast/matrices; for example, here is a link to the [ftp://ftp.ncbi.nih.gov/blast/matrices/BLOSUM62 '''BLOSUM62 matrix''']<ref>The directory also contains source code to generate the PAM matrices. This may be of interest if you ever want to produce scoring matrices from your own datasets.</ref>. Access that site and download the <code>BLOSUM62</code> matrix to your computer. You could give it a filename of <code>BLOSUM62.mdm</code>.
 
 
It should look like this.
 
 
<source lang="text">
#  Matrix made by matblas from blosum62.iij
#  * column uses minimum score
#  BLOSUM Clustered Scoring Matrix in 1/2 Bit Units
#  Blocks Database = /data/blocks_5.0/blocks.dat
#  Cluster Percentage: >= 62
#  Entropy =  0.6979, Expected =  -0.5209
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0 -2 -1  0 -4
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3 -1  0 -1 -4
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3  3  0 -1 -4
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3  4  1 -1 -4
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 -3 -3 -2 -4
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2  0  3 -1 -4
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3 -1 -2 -1 -4
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3  0  0 -1 -4
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3 -3 -3 -1 -4
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1 -4 -3 -1 -4
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2  0  1 -1 -4
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1 -3 -1 -1 -4
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1 -3 -3 -1 -4
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2 -2 -1 -2 -4
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2  0  0  0 -4
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0 -1 -1  0 -4
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3 -4 -3 -2 -4
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1 -3 -2 -1 -4
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4 -3 -2 -1 -4
B -2 -1  3  4 -3  0  1 -1  0 -3 -4  0 -3 -3 -2  0 -1 -4 -3 -3  4  1 -1 -4
Z -1  0  0  1 -3  3  4 -2  0 -3 -3  1 -1 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
X  0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1 -1 -1 -4
* -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4  1
</source>
 
 
 
{{task|
 
* Study this and make sure you understand what this table is, how it can be used, and what a reasonable range of values for identities and pairscores for non-identical, similar and dissimilar residues is. Ask on the mailing list in case you have questions.
 
}}
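To check your reading of the table, you can load it into '''R''' and query it. The following is a minimal sketch that assumes a 4 x 4 excerpt (A, R, N, D) copied from the BLOSUM62 listing above and supplied as text lines. The same kind of call works on the downloaded file if you skip its six comment lines. The header row has one field fewer than the data rows, so <code>read.table()</code> uses it for the column names and uses the first column for the row names.

<source lang="R">
# 4 x 4 excerpt of BLOSUM62 (rows/columns A, R, N, D), same layout as the file
blosumLines <- c("   A  R  N  D",
                 "A  4 -1 -2 -2",
                 "R -1  5  0 -2",
                 "N -2  0  6  1",
                 "D -2 -2  1  6")
mdm <- as.matrix(read.table(text = blosumLines))

diag(mdm)            # identity scores (A-A, R-R, N-N, D-D)
mdm["N", "D"]        # a positive pairscore: similar residues
mdm["R", "D"]        # a negative pairscore: dissimilar residues
isSymmetric(mdm)     # pairscores are symmetric: score(x, y) == score(y, x)
</source>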
 
 
 
 
&nbsp;
 
== The DNA binding site ==
 
 
 
Now that you know how YFO Mbp1 aligns with yeast Mbp1, you can evaluate functional conservation in these homologous proteins. You probably already downloaded the two Biochemistry papers by Taylor et al. (2000) and by Deleeuw et al. (2008) that we encountered in Assignment 2. These discuss the residues involved in DNA binding<ref>([http://www.ncbi.nlm.nih.gov/pubmed/10747782 Taylor ''et al.'' (2000) ''Biochemistry'' '''39''': 3943-3954] and [http://www.ncbi.nlm.nih.gov/pubmed/18491920 Deleeuw ''et al.'' (2008) ''Biochemistry'' '''47''': 6378-6385])</ref>. In particular, residues 50-74 have been proposed to comprise the DNA recognition domain.
 
 
{{task|
 
# Using the APSES domain alignment you have just constructed, find the YFO Mbp1 residues that correspond to the range 50-74 in yeast.
 
# Note whether the sequences are especially highly conserved in this region.
 
# Using Chimera, look at the region. Use the sequence window '''to make sure''' that the sequence numbering in the paper and in the PDB file is the same (often it is not!). Then select the residues of the proposed recognition domain and color them differently for emphasis. Study this in stereo to get a sense of the spatial relationships. Check where the conserved residues are.
 
# A good representation is '''stick''' - but other representations that include sidechains will also serve well.
 
# Calculate a solvent accessible surface of the protein in a separate representation and make it transparent.
 
# You could combine three representations: (1) the backbone (in '''ribbon view'''), (2) the sidechains of residues that presumably contact DNA, distinctly colored, and (3) a transparent surface of the entire protein. This image should show whether the residues annotated as DNA binding form a contiguous binding interface.
 
}}
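For the first step of this task, you need to translate yeast residue numbers into YFO residue numbers through the alignment. The following is a minimal sketch in '''R'''. It assumes two aligned sequences of equal length whose first residues are residue 1 of each protein, as in a global needle alignment of full-length sequences. For a partial alignment, you would add the offset of its first residue. <code>mapPositions()</code> is a hypothetical helper, and the example sequences are made up.

<source lang="R">
# mapPositions(): hypothetical helper. For every residue of sequence 1,
# return the residue number of the aligned residue in sequence 2
# (NA where sequence 2 has a gap in that column).
mapPositions <- function(aln1, aln2, gap = "-") {
  a <- strsplit(aln1, "")[[1]]
  b <- strsplit(aln2, "")[[1]]
  stopifnot(length(a) == length(b))
  num2 <- cumsum(b != gap)     # residue number of sequence 2 at each column
  cols <- which(a != gap)      # columns that hold residues of sequence 1
  ifelse(b[cols] != gap, num2[cols], NA)
}

# Toy example: residues 2-4 of sequence 1 correspond to residues 1, 2, 4
m <- mapPositions("MKV-LSA", "-RVALSG")
m[2:4]
</source>

If you have your own aligned sequences as single strings, for example <code>aliSacce</code> and <code>aliYfo</code> (hypothetical names), then <code>range(mapPositions(aliSacce, aliYfo)[50:74], na.rm = TRUE)</code> would give the YFO residue range that corresponds to yeast residues 50-74.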
 
 
 
DNA binding interfaces are expected to include a number of positively charged amino acids, which might form salt bridges with the phosphate backbone.
 
 
 
{{task|
 
*Study and consider whether this is the case here and which residues might be included.
 
}}
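To support this consideration with numbers, you could count the charged residues in the putative recognition region. The following is a minimal sketch in '''R'''. <code>countCharged()</code> is a hypothetical helper, and the sequence is a made-up toy example. The sketch assumes that lysine and arginine are positively charged at physiological pH. Histidine is reported separately, because its side chain pKa is close to neutral.

<source lang="R">
# countCharged(): hypothetical helper. Counts charged residues in the
# residue range from..to of a one-letter amino acid sequence.
countCharged <- function(sequence, from, to) {
  aa <- strsplit(sequence, "")[[1]][from:to]
  c(KR = sum(aa %in% c("K", "R")),   # positively charged at physiological pH
    H  = sum(aa == "H"),             # only partially protonated near pH 7
    DE = sum(aa %in% c("D", "E")))   # negatively charged
}

countCharged("MKRHDEAKG", 2, 6)   # residues K R H D E: KR = 2, H = 1, DE = 2
</source>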
 
 
 
&nbsp;
 
== R code: coloring the alignment by quality ==
 
 
 
 
{{task|1=
 
 
* Study this code carefully, execute it, section by section and make sure you understand all of it. Ask on the list if anything is not clear.
 
 
<source lang="R">
 
# BiostringsExample.R
 
# Short tutorial on sequence alignment with the Biostrings package.
 
# Boris Steipe for BCH441, 2013 - 2014
 
#
 
setwd("~/path/to/your/R_files/")  # edit this to point to your own directory
 
 
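# Biostrings is a package within the Bioconductor project.

# Bioconductor packages have their own installation system;

# they are normally not installed via CRAN.


# First, you install the Bioconductor installer, BiocManager (from CRAN),

# if it is not there yet. (Older versions of this script used

# source("http://bioconductor.org/biocLite.R"), which has been retired.)

if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}


# Then you can install the Biostrings package and all of its dependencies.

BiocManager::install("Biostrings")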
 
 
# ... and load the library.
 
library(Biostrings)
 
 
# Some basic (technical) information is available ...
 
library(help=Biostrings)
 
 
# ... but for more in depth documentation, use the
 
# so called "vignettes" that are provided with every R package.
 
browseVignettes("Biostrings")
 
 
# In this code, we mostly use functions that are discussed in the

# pairwise alignment vignette.
 

# Read in two fasta files - you will need to edit this for YFO
 
sacce <- readAAStringSet("mbp1-sacce.fa", format="fasta")
 
 
# "USTMA" is used only as an example here - modify for YFO  :-)
 
ustma <- readAAStringSet("mbp1-ustma.fa", format="fasta")
 
 
sacce
 
names(sacce)
 
names(sacce) <- "Mbp1 SACCE"
 
names(ustma) <- "Mbp1 USTMA" # Example only ... modify for YFO
 
 
width(sacce)
 
as.character(sacce)
 
 
# Biostrings takes a sophisticated approach to sequence alignment ...
 
?pairwiseAlignment
 
 
# ... but the use in practice is quite simple:
 
ali <- pairwiseAlignment(sacce, ustma, substitutionMatrix = "BLOSUM50")
 
ali
 
 
pattern(ali)
 
subject(ali)
 
 
writePairwiseAlignments(ali)
 
 
p <- aligned(pattern(ali))
 
names(p) <- "Mbp1 SACCE aligned"
 
s <- aligned(subject(ali))
 
names(s) <- "Mbp1 USTMA aligned"
 
 
# don't overwrite your EMBOSS .fal files
 
writeXStringSet(p, "mbp1-sacce.R.fal", append=FALSE, format="fasta")
 
writeXStringSet(s, "mbp1-ustma.R.fal", append=FALSE, format="fasta")
 
 
# Done.
 
 
</source>
 
 
* Compare the alignment you received from the EMBOSS server with the one you computed using '''R'''. Are they approximately the same? Exactly the same? You used different matrices and gap parameters, so minor differences are to be expected. But by and large you should get the same alignments.
 
 
}}
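One way to make "approximately the same" precise is to compare which residue pairs the two alignments place in the same column. The following is a minimal sketch in '''R'''. It assumes that both alignments contain the same two full-length sequences as gapped strings of equal length. <code>alignedPairs()</code> and <code>pairAgreement()</code> are hypothetical helpers, and the toy sequences stand in for your EMBOSS and '''R''' <code>.fal</code> files.

<source lang="R">
# alignedPairs(): hypothetical helper. Converts a pairwise alignment (two gapped
# strings of equal length) into the set of aligned residue-number pairs "i:j".
alignedPairs <- function(aln1, aln2, gap = "-") {
  a <- strsplit(aln1, "")[[1]]
  b <- strsplit(aln2, "")[[1]]
  stopifnot(length(a) == length(b))
  i <- cumsum(a != gap)              # residue number in sequence 1 per column
  j <- cumsum(b != gap)              # residue number in sequence 2 per column
  keep <- a != gap & b != gap        # columns where both residues are present
  paste(i[keep], j[keep], sep = ":")
}

# pairAgreement(): fraction of the residue pairs of alignment A
# that alignment B also contains
pairAgreement <- function(pairsA, pairsB) {
  length(intersect(pairsA, pairsB)) / length(pairsA)
}

embossPairs <- alignedPairs("MKV-LSA", "MRVALSG")
rPairs      <- alignedPairs("MKVL-SA", "MRVALSG")
pairAgreement(embossPairs, rPairs)   # 5 of 6 pairs are shared
</source>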
 
 
We will now use the aligned sequences to compute a graphical display of alignment quality.
 
 
 
{{task|1=
 
 
* Study this code carefully, execute it, section by section and make sure you understand all of it. Ask on the list if anything is not clear.
 
 
<source lang="R">
 
# aliScore.R
 
# Evaluating an alignment with a sliding window score
 
# Boris Steipe, October 2012. Update October 2013
 
setwd("~/path/to/your/R_files/")
 
 
# Scoring matrices can be found at the NCBI.
 
# ftp://ftp.ncbi.nih.gov/blast/matrices/BLOSUM62
 
 
# It is good practice to set variables you might want to change
 
# in a header block so you don't need to hunt all over the code
 
# for strings you need to update.
 
#
 
fa1      <- "mbp1-sacce.R.fal"
 
fa2      <- "mbp1-ustma.R.fal"
 
code1    <- "SACCE"
 
code2    <- "USTMA"
 
mdmFile  <- "BLOSUM62.mdm"
 
window  <- 9  # window-size (should be an odd integer)
 
 
# ================================================
 
#    Read data files
 
# ================================================
 
 
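# read fasta datafiles using seqinr function read.fasta()

# (install seqinr only if it is not installed yet)

if (!requireNamespace("seqinr", quietly = TRUE)) {
  install.packages("seqinr")
}

library(seqinr)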
 
tmp  <- unlist(read.fasta(fa1, seqtype="AA", as.string=FALSE, seqonly=TRUE))
 
seq1 <- unlist(strsplit(as.character(tmp), split=""))
 
 
tmp  <- unlist(read.fasta(fa2, seqtype="AA", as.string=FALSE, seqonly=TRUE))
 
seq2 <- unlist(strsplit(as.character(tmp), split=""))
 
 
if (length(seq1) != length(seq2)) {

  stop("Sequences have unequal length! Did you read the aligned (.fal) files?")

}
 
 
lSeq <- length(seq1)
 
 
# ================================================
 
#    Read scoring matrix
 
# ================================================
 
 
MDM <- read.table(mdmFile, skip=6, check.names=FALSE)  # check.names=FALSE keeps "*" as a column name
 
 
# This is a dataframe. Study how it can be accessed:
 
 
MDM
 
MDM[1,]
 
MDM[,1]
 
MDM[5,5]  # Cys-Cys
 
MDM[20,20] # Val-Val
 
MDM[,"W"]  # the tryptophan column
 
MDM["R","W"]  # Arg-Trp pairscore
 
MDM["W","R"]  # Trp-Arg pairscore: pairscores are symmetric
 
 
colnames(MDM)  # names of columns
 
rownames(MDM)  # names of rows
 
colnames(MDM)[3]  # third column
 
rownames(MDM)[12]  # twelfth row
 
 
# change the two "*" names to "-" so we can use them to score

# indels of the alignment. This is a bit of a hack, since this

# does not reflect the actual indel penalties (which, as you

# remember from your lectures, are calculated as a gap opening

# + gap extension penalty and can't be calculated in a pairwise

# manner). EMBOSS defaults for BLOSUM62 are opening -10 and

# extension -0.5, i.e. a gap of size 3 (-11.5) has approximately

# the same penalty as a 3-character score of "-" matches (-12),

# so a pairscore of -4 is not entirely unreasonable.
 
 
colnames(MDM)[24]
 
rownames(MDM)[24]
 
colnames(MDM)[24] <- "-"
 
rownames(MDM)[24] <- "-"
 
colnames(MDM)[24]
 
rownames(MDM)[24]
 
MDM["Q", "-"]
 
MDM["-", "D"]
 
# so far so good.
 
 
# ================================================
 
#    Tabulate pairscores for alignment
 
# ================================================
 
 
 
# It is trivial to create a pairscore vector along the
 
# length of the aligned sequences.
 
 
PS <- vector()
 
for (i in 1:lSeq) {
 
  aa1 <- seq1[i]
 
  aa2 <- seq2[i]
 
  PS[i] = MDM[aa1, aa2]
 
}
 
 
PS
 
 
 
# The same vector could be created - albeit perhaps not so
 
# easy to understand - with the expression ...
 
MDM[cbind(seq1,seq2)]
 
 
 
 
# ================================================
 
#    Calculate moving averages
 
# ================================================
 
 
# In order to evaluate the alignment, we will calculate a

# sliding window average over the pairscores. Base R has no dedicated

# moving-average function (stats::filter() can be used to compute one);

# options in add-on packages include:
 
#  - rollmean() in the "zoo" package http://rss.acs.unt.edu/Rdoc/library/zoo/html/rollmean.html
 
#  - MovingAverages() in "TTR"  http://rss.acs.unt.edu/Rdoc/library/TTR/html/MovingAverages.html
 
#  - ma() in "forecast"  http://robjhyndman.com/software/forecast/
 
# But since this is easy to code, we shall implement it ourselves.
 
 
PSma <- vector()          # will hold the averages
 
winS <- floor(window/2)    # span of elements above/below the centre
 
winC <- winS+1            # centre of the window
 
 
# extend the vector PS with zeros (virtual observations) above and below
 
PS <- c(rep(0, winS), PS , rep(0, winS))
 
 
# initialize the window score for the first position
 
winScore <- sum(PS[1:window])
 
 
# write the first score to PSma
 
PSma[1] <- winScore
 
 
# Slide the window along the sequence, and recalculate sum()
 
# Loop from the next position, to the last position that does not exceed the vector...
 
for (i in (winC + 1):(lSeq + winS)) {
 
  # subtract the value that has just dropped out of the window
 
  winScore <- winScore - PS[(i-winS-1)]
 
  # add the value that has just entered the window
 
  winScore <- winScore + PS[(i+winS)] 
 
  # put score into PSma
 
  PSma[i-winS] <- winScore
 
}
 
 
# convert the sums to averages
 
PSma <- PSma / window
 
 
# have a quick look at the score distributions
 
 
boxplot(PSma)
 
hist(PSma)
 
 
# ================================================
 
#    Plot the alignment scores
 
# ================================================
 
 
# normalize the scores
 
PSma <- (PSma-min(PSma))/(max(PSma) - min(PSma) + 0.0001)
 
# spread the normalized values to a desired range, n
 
nCol <- 10
 
PSma <- floor(PSma * nCol) + 1
 
 
# Assign a colorspectrum to a vector (with a bit of colormagic,
 
# don't worry about that for now). Dark colors are poor scores,
 
# "hot" colors are high scores
 
spect <- colorRampPalette(c("black", "red", "yellow", "white"), bias=0.4)(nCol)
 
 
# Color is an often abused aspect of plotting. One can use color to label
 
# *quantities* or *qualities*. For the most part, our pairscores measure amino
 
# acid similarity. That is a quantity and with the spectrum that we just defined
 
# we associate the measured quantities with the color of a glowing piece
 
# of metal: we start with black #000000, then first we ramp up the red
 
# (i.e. low-energy) part of the visible spectrum to red #FF0000, then we
 
# add and ramp up the green spectrum giving us yellow #FFFF00 and finally we
 
# add blue, giving us white #FFFFFF. Let's have a look at the spectrum:
 
 
s <- rep(1, nCol)
 
barplot(s, col=spect, axes=F, main="Color spectrum")
 
 
# But one aspect of our data is not just quantitatively different: indels.
 
# We valued indels with pairscores of -4. But indels are not simply poor alignment,
 
# rather they are non-alignment. This means stretches of -4 values are really
 
# *qualitatively* different. Let's color them differently by changing the lowest
 
# level of the spectrum to grey.
 
 
spect[1] <- "#CCCCCC"
 
barplot(s, col=spect, axes=F, main="Color spectrum")
 
 
# Now we can display our alignment score vector with colored rectangles.
 
 
# Convert the integers in PSma to color values from spect
 
PScol <- vector()
 
for (i in 1:length(PSma)) {
 
PScol[i] <- spect[ PSma[i] ]  # this is how a value from PSma is used as an index of spect
 
}
 
 
# Plot the scores. The code is similar to the last assignment.
 
# Create an empty plot window of appropriate size
 
plot(1,1, xlim=c(-100, lSeq), ylim=c(0, 2) , type="n", yaxt="n", bty="n", xlab="position in alignment", ylab="")
 
 
# Add a label to the left
 
text (-30, 1, adj=1, labels=c(paste("Mbp1:\n", code1, "\nvs.\n", code2)), cex=0.9 )
 
 
# Loop over the vector and draw boxes  without border, filled with color.
 
for (i in 1:lSeq) {
 
  rect(i, 0.9, i+1, 1.1, border=NA, col=PScol[i])
 
}
 
 
# Note that the numbers along the X-axis are not sequence numbers, but numbers
 
# of the alignment, i.e. sequence number + indel length. That is important to
 
# realize: if you would like to add the annotations from the last assignment
 
# which I will leave as an exercise, you need to map your sequence numbering
 
# into alignment numbering. Let me know in case you try that but need some help.
 
 
</source>
 
}}
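The loop in <code>aliScore.R</code> is easy to follow, but you can cross-check it against two vectorised formulations. The following is a minimal sketch that assumes a made-up pairscore vector and uses the same zero padding as the script. The variable names carry a <code>Demo</code> suffix so that they do not overwrite <code>PS</code> or <code>window</code> in your session. <code>stats::filter()</code> is part of base '''R'''. The cumulative-sum version computes each window sum as the difference of two prefix sums.

<source lang="R">
# Made-up pairscore vector standing in for PS, padded with zeros as in aliScore.R
PSdemo     <- c(4, -1, 5, -4, -4, 6, 1, 2, -3, 0, 4)
windowDemo <- 5                          # odd window size
winSdemo   <- floor(windowDemo / 2)      # span above/below the centre
padded     <- c(rep(0, winSdemo), PSdemo, rep(0, winSdemo))

# (1) stats::filter() with equal weights and sides = 2 gives a centred average;
#     the first and last winSdemo values are NA and are dropped here.
maFilter <- as.numeric(stats::filter(padded, rep(1 / windowDemo, windowDemo), sides = 2))
maFilter <- maFilter[(winSdemo + 1):(winSdemo + length(PSdemo))]

# (2) Cumulative sums: each window sum is the difference of two prefix sums.
cs <- cumsum(c(0, padded))
maCumsum <- (cs[(windowDemo + 1):length(cs)] - cs[1:(length(cs) - windowDemo)]) / windowDemo

all.equal(maFilter, maCumsum)   # TRUE
</source>

If you apply either version to the padded <code>PS</code> from the script, the result should agree with <code>PSma</code> after the sums are converted to averages and before the normalization step.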
 
 
 
;That is all.
 
 
 
&nbsp;
 
  
 
== Links and resources ==
 

Revision as of 14:28, 2 October 2015

Assignment for Week 5
Structure Analysis

< Assignment 4 Assignment 6 >

Note! This assignment is currently inactive. Major and minor unannounced changes may be made at any time.

 
 

Concepts and activities (and reading, if applicable) for this assignment will be topics on next week's quiz.





From A1:

  • install the molecular graphics viewer UCSF Chimera[1] on your own computer, work through a tutorial on its use and begin practicing the skill of viewing split-screen stereographic scenes without aids;

Molecular graphics

A molecular viewer is a program that takes protein structure data and allows you to display and explore it. For a number of reasons, I chose to use the UCSF Chimera viewer for this course.


UCSF Chimera

Task:

  • Access the Chimera page.
  • Install the program as per the instructions in the section: "Installing Chimera".
  • Access the Chimera User's Guide tutorial section. The "Getting Started" tutorial is offered in two versions: one for work with the graphical user interface (GUI), i.e. the usual system of windows and drop-down menu selections, and one for work with the command line. What is the difference? In general, GUIs (the Menu version) are well suited for beginners who are not yet familiar with all the options: having the commands and alternatives presented on a menu makes first steps very easy through simple selection of keywords. On the other hand, working from a command-line interface is much faster and more flexible if you know what you are doing, and thus much better suited for the experienced user. It is also quite straightforward to execute series of commands in stored scripts, allowing you to automate tasks. For now, we will stay with the menu version, but we will use commands later in the course and you are of course welcome to explore.
  • Work through the Chimera tutorial Getting Started - Menu version, Part 1.

Stereo vision

Task:

Access the Stereo Vision tutorial and practice viewing molecular structures in stereo.

Practice at least ...

  • two times daily,
  • for 3-5 minutes each session.

Keep up your practice throughout the course. Stereo viewing will be required in the final exam, but more importantly, it is a wonderful skill that will greatly support any activity of yours related to structural molecular biology. Practice with different molecules and try out different colours and renderings.

Note: do not go through your practice sessions mechanically. If you are not making any progress with stereo vision, contact me so we can get you on the right track.






From A2:



Structure search

The search options in the PDB structure database are as sophisticated as those at the NCBI. For now, we will try a simple keyword search to get us started.


Task:

  1. Visit the RCSB PDB website at http://www.pdb.org/
  2. Briefly orient yourself regarding the database contents and its information offerings and services.
  3. Enter Mbp1 into the search field.
  4. In your journal, note down the PDB IDs for the three Saccharomyces cerevisiae Mbp1 transcription factor structures your search has retrieved.
  5. Click on one of the entries and explore the information and services linked from that page.

 

Chimera

In this task we will explore the sequence interface of Chimera, use it to select specific parts of a molecule, and colour specific regions (or residues) of a molecule separately.

 

Task:

  1. Open Chimera.
  2. One of the three yeast Mbp1 fragment structures has the PDB ID 1BM8. Load it in Chimera (simply enter the ID into the appropriate field of the File → Fetch by ID... window).
  3. Display the protein in Presets → Interactive 1 mode and familiarize yourself with its topology of helices and strands.
  4. Open the sequence tool: Tools → Sequence → Sequence. You will see the sequence for each chain; here there is only one chain. By default, coloured rectangles overlay the secondary structure elements of the sequence.
  5. Hover the mouse over some residues and note that the sequence number and chain are shown at the bottom of the window.
  6. Click/drag one residue to select it. (A simple click won't work; you need to drag a little for the selection to catch on.) Note that the residue gets a green overlay in the sequence window and is also selected with a green border in the graphics window.
  7. At the bottom of the sequence window, there are instructions on how to select (multiple) regions. Try this: colour the protein light gray (Select → Select All; Actions → Color → light gray). Clear the selection. Now select all the helical regions (pale yellow boxes) by click/dragging and using the shift key. Colour them red. Then select all the strands by clicking into any of the pale green boxes and colour them green.
  8. Finally, generate a stereo view that shows the molecule well, in which the molecule is coloured dark grey except for the APSES domain residues (as defined in the FASTA listing above, from I19 to Y93), which are coloured with a colour ramp (Tools → Depiction → Rainbow)[2].
  9. Show the CA atoms[3] of the first and last residues as spheres, and colour the first one blue (to mark the N-terminus) and the last one red. E.g.:
    1. Select → Atom specifier, then enter :4@CA
    2. Actions → Ribbon → hide
    3. Actions → Atoms/bonds → show
    4. Actions → Atoms/bonds → sphere
    5. Actions → Color → cornflower blue
    6. Then click on the selection inspector (the green button with the magnifying glass at the lower right of the graphics window) and set the sphere radius to 1.0Å.
  10. Save the image in JPEG format (File → Save Image) and upload it to your journal on the Student Wiki.


 

Stereo vision

Task:

Continue with your stereo practice.

Practice at least ...

  • two times daily,
  • for 3-5 minutes each session.
  • Measure your interocular distance and your fusion distance as explained here on the Student Wiki and add them to the table.

Keep up your practice throughout the course. Once again: do not go through your practice sessions mechanically. If you are not making constant progress in your practice sessions, contact me so we can get you on the right track.

Modeling small molecules (optional)

As an optional part of the assignment, here is a small tutorial for modeling and visualizing "small-molecule" structures.


Defining a molecule

A number of public repositories make small molecule information available, such as PubChem at the NCBI, the ligand collection at the PDB, the ChEBI database at the European Bioinformatics Institute, or the NCI database browser at the US National Cancer Institute. One general way to export topology information from these services is to use SMILES strings—a shorthand notation for the composition and topology of chemical compounds.


Task:

  1. Access each of the databases mentioned above.
  2. Enter "caffeine" as a search term.
  3. Explore the contents of the result, in particular note and copy the SMILES string for the compound.


Alternatively, you can sketch your own compound. Versions of Peter Ertl's Java Molecular Editor (JME) are offered on several websites (e.g. click on Transfer to Java Editor on an NCI results page), and PubChem offers this functionality via its Sketcher tool.

Task:

  1. Navigate to PubChem.
  2. Follow the link to Chemical structure search (in the right hand menu).
  3. Click on the 3D conformer tab and on the Launch button to launch the molecular editor in its own window.
  4. Sketch the structure of caffeine. I find the editor quite intuitive but if you need help, just use the Help button in the editor.
  5. Save the SMILES string of your compound.
  6. Also export your result in SMILES format as a file.

Translating SMILES to structure

Online services exist to translate SMILES to (idealized) coordinates.

Task:

  1. Access the online SMILES translation service at the NCI.
  2. Paste a caffeine SMILES string into the form, choose the PDB radio button, click on Translate and download your file.
  3. Load the molecule in Chimera.

Chimera also has a function to translate SMILES to coordinates.

Task:

  1. In Chimera:
    1. File → Close Session.
    2. Tools → Structure Editing → Build Structure.
    3. Select SMILES string, paste the string and click Apply.
  2. The caffeine molecule will be generated and visualized in the graphics window.




Links and resources

 

 


Footnotes and references

  1. Previous versions of this course have used the VMD molecular viewer. Material on this is still available at the VMD page.
  2. The Rainbow tool can only create color ramps for an entire molecule. In order to achieve this effect: color the molecule with a color ramp, then select the APSES domain, then invert the selection and color the new selection dark grey.
  3. See here for details of the specification syntax.


 

Ask, if things don't work for you!

If anything about the assignment is not clear to you, please ask on the mailing list. You can be certain that others will have had similar problems. Success comes from joining the conversation.


