Difference between revisions of "Database Exam Questions"

Revision as of 14:44, 11 December 2006

One aspect of Bioinformatics concerns itself with the storage, organisation, and retrieval of biological information. The questions in this section consider the contents and use of some of the key abstractions (sequences, structures, graphs ...) that we deal with, and the databases we store them in.

2003

In the excerpt from the PDBsum database shown here, please comment briefly on the following points. Questions correspond to the numbers shown on the image. The region within the circle is an enlargement of the original, for better legibility.

(5.1) What is the relationship between PDBsum and the database linked from this button ("PDB")?
(5.2) What do these terms mean and what use can you make of this information when you analyse the structure ("Resolution", "R-Factor", "R-free")?
(5.3) What is the purpose of the database linked from this button and what use can you make of its contents when you analyse the structure ("CATH")?
(5.4) What is this information and what use can you make of it ("Residue interactions: * with DNA + with ligand")?
(5.5) What is this sequence and why is it here?

PDBsum page for Glutamyl-tRNA Synthetase tRNA complex 1EAD

2004 - PDB Format

Despite its many shortcomings and inconsistencies, the PDB format for coordinate datasets is still the most widely accepted format, chiefly due to the large number of legacy programs that use it, but also because it is human readable. The following is an excerpt from the PDB file of pea defensin (1JKZ.PDB).

[...]
ATOM    404  N   ALA A  28       1.084   7.614   2.493  1.00  0.00           N  
ATOM    405  CA  ALA A  28       0.164   7.660   3.616  1.00  0.00           C  
ATOM    406  C   ALA A  28       0.842   7.090   4.856  1.00  0.00           C  
ATOM    407  O   ALA A  28       0.731   5.902   5.139  1.00  0.00           O  
ATOM    408  CB  ALA A  28      -1.123   6.911   3.287  1.00  0.00           C  
ATOM    409  H   ALA A  28       1.535   6.768   2.288  1.00  0.00           H  
ATOM    410  HA  ALA A  28      -0.085   8.696   3.802  1.00  0.00           H  
ATOM    411 1HB  ALA A  28      -1.278   6.918   2.218  1.00  0.00           H  
ATOM    412 2HB  ALA A  28      -1.957   7.396   3.773  1.00  0.00           H  
ATOM    413 3HB  ALA A  28      -1.047   5.891   3.634  1.00  0.00           H  
[...]

Which atom numbers correspond to the backbone atoms and which atom numbers correspond to the sidechain of this amino acid?
Describe the information in the following columns (indicated by the values in the first record): " N ", "A", "1.084", "1.00", "0.00".
Are any of these columns optional, and if yes, would their absence shift the positions of the other columns?
Briefly discuss the relationship between SEQRES records in a PDB file, the genetic sequence of a protein, and the sequence that can be derived from the coordinate records.
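
(For readers who want to inspect such records programmatically, here is a minimal Python sketch that slices the first ATOM record of the excerpt at fixed column positions. The column ranges follow the published PDB coordinate-record layout as I understand it; the names AtomRecord and parse_atom_line are made up for this illustration and do not come from any library. It is a study aid under those assumptions, not a complete PDB parser.)

```python
from typing import NamedTuple


class AtomRecord(NamedTuple):
    """Fields of one ATOM/HETATM line (hypothetical helper type)."""
    serial: int
    name: str
    alt_loc: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float
    element: str


def parse_atom_line(line: str) -> AtomRecord:
    """Slice a coordinate record by column position, not by whitespace."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM or HETATM record")
    line = line.ljust(80)  # pad so short lines can still be sliced safely
    return AtomRecord(
        serial=int(line[6:11]),          # columns 7-11
        name=line[12:16].strip(),        # columns 13-16
        alt_loc=line[16].strip(),        # column 17, often blank
        res_name=line[17:20].strip(),    # columns 18-20
        chain_id=line[21].strip(),       # column 22, may be blank
        res_seq=int(line[22:26]),        # columns 23-26
        x=float(line[30:38]),            # columns 31-38
        y=float(line[38:46]),            # columns 39-46
        z=float(line[46:54]),            # columns 47-54
        occupancy=float(line[54:60]),    # columns 55-60
        temp_factor=float(line[60:66]),  # columns 61-66
        element=line[76:78].strip(),     # columns 77-78
    )


# First record of the 1JKZ excerpt shown above.
EXAMPLE = "ATOM    404  N   ALA A  28       1.084   7.614   2.493  1.00  0.00           N  "

record = parse_atom_line(EXAMPLE)
print(record)
```

(The sketch slices by position instead of calling str.split() because a whitespace split miscounts fields as soon as one column is left blank.)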

Here is a sequence file for this protein.

>gi|20139322|sp|P81929|PSD1_PEA Defense-related peptide 1
KTCEHLADTYRGVCFTNASCDDHCKNKAHLISGTCHNWKCFCTQNC

What is the name of this file format?
Find the amino acid for which the coordinates were given above, in this sequence. (Write its one-letter code into your exam booklet together with the preceding and the following amino acid and underline it, e.g. ABC.)

In the coordinate file of the immunoglobulin domain 2IMM.pdb you find the following record.

HETATM  877  O   HOH     1      -4.169  60.050  40.145  1.00  3.00           O

What does this record describe?
When you display the structure of 2IMM.pdb with RasMol, the protein is displayed as a wireframe model but you see nothing that corresponds to the above record. What do you need to do?

(Indeed, since the RasMol tutorial was a task of the first assignment, a question like this may turn up every now and then.)

2003 - Entrez

This is a screenshot of the result of searching the NCBI Entrez database with the search string "sh3".

Briefly discuss each database that the following terms link to, what relationship the results have to the search term and what use one can make of the links.

PubMed
Protein
CDD
OMIM

(A similar question was given in the 2004 practice exam and I like the following format much better:)

Discuss briefly which of the links you would follow to solve the following problems and summarize what the respective database contains and how you would use it. (Be reasonably complete, more than one link may be needed or helpful. Assume you know nothing about the problem but what is stated in the question.)

Retrieve 1 kb of upstream sequence for each yeast protein that contains an SH3 domain.
Check whether a mutation in a residue of a yeast protein is known to cause disease in its closest human homologue.

2002

File:Stereo 000000.jpg

Caption.

Task

Comment

