Database Exam Questions

From "A B C"
Revision as of 05:09, 11 December 2006 by Boris

   

One aspect of bioinformatics concerns itself with the storage, organisation, and retrieval of biological information. The questions in this section consider the contents and use of some of the key abstractions (sequences, structures, graphs, ...) that we deal with, and the databases we store them in.

   

2003

In the excerpt from the PDBsum database shown here, please comment briefly on the following points. Questions correspond to the numbers shown on the image. The region within the circle is an enlargement of the original, for better legibility.


  • (5.1) What is the relationship between PDBsum and the database linked from this button ("PDB")?
  • (5.2) What do these terms mean and what use can you make of this information when you analyse the structure ("Resolution", "R-Factor", "R-free")?
  • (5.3) What is the purpose of the database linked from this button and what use can you make of its contents when you analyse the structure ("CATH")?
  • (5.4) What is this information and what use can you make of it ("Residue interactions: * with DNA + with ligand")?
  • (5.5) What is this sequence and why is it here?
PDBsum page for Glutamyl-tRNA Synthetase tRNA complex 1EAD


2004 - PDB Format

Despite its many shortcomings and inconsistencies, the PDB format is still the most widely accepted format for coordinate datasets, chiefly because of the large number of legacy programs that use it, but also because it is human-readable. The following is an excerpt from the PDB file of pea defensin (1JKZ.PDB).

[...]
ATOM    404  N   ALA A  28       1.084   7.614   2.493  1.00  0.00           N  
ATOM    405  CA  ALA A  28       0.164   7.660   3.616  1.00  0.00           C  
ATOM    406  C   ALA A  28       0.842   7.090   4.856  1.00  0.00           C  
ATOM    407  O   ALA A  28       0.731   5.902   5.139  1.00  0.00           O  
ATOM    408  CB  ALA A  28      -1.123   6.911   3.287  1.00  0.00           C  
ATOM    409  H   ALA A  28       1.535   6.768   2.288  1.00  0.00           H  
ATOM    410  HA  ALA A  28      -0.085   8.696   3.802  1.00  0.00           H  
ATOM    411 1HB  ALA A  28      -1.278   6.918   2.218  1.00  0.00           H  
ATOM    412 2HB  ALA A  28      -1.957   7.396   3.773  1.00  0.00           H  
ATOM    413 3HB  ALA A  28      -1.047   5.891   3.634  1.00  0.00           H  
[...]


  • Which atom numbers correspond to the backbone atoms and which atom numbers correspond to the sidechain of this amino acid?
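
As an aside on reading such records: the PDB format assigns every field to fixed character columns, so a program should slice each line by position rather than split it on whitespace (a blank chain identifier, for instance, would shift every later field). The following Python sketch illustrates this for two lines of the excerpt above. The names AtomRecord and parse_atom_line are hypothetical, chosen only for illustration, and the column ranges are assumed to be those of the standard ATOM/HETATM record; this is a minimal sketch, not how any particular legacy program reads the file.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    """Hypothetical container for the fields of one ATOM/HETATM line."""
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float
    element: str


def parse_atom_line(line: str) -> AtomRecord:
    """Parse one coordinate line by fixed column position."""
    if line[0:6].strip() not in ("ATOM", "HETATM"):
        raise ValueError("not an ATOM or HETATM record")
    line = line.ljust(80)  # pad short lines so every slice is valid
    return AtomRecord(
        serial=int(line[6:11]),            # columns 7-11
        name=line[12:16].strip(),          # columns 13-16
        res_name=line[17:20].strip(),      # columns 18-20
        chain_id=line[21].strip(),         # column 22 (may be blank)
        res_seq=int(line[22:26]),          # columns 23-26
        x=float(line[30:38]),              # columns 31-38
        y=float(line[38:46]),              # columns 39-46
        z=float(line[46:54]),              # columns 47-54
        occupancy=float(line[54:60]),      # columns 55-60
        temp_factor=float(line[60:66]),    # columns 61-66
        element=line[76:78].strip(),       # columns 77-78
    )


# Two lines copied from the 1JKZ excerpt above.
EXCERPT = [
    "ATOM    405  CA  ALA A  28       0.164   7.660   3.616  1.00  0.00           C  ",
    "ATOM    411 1HB  ALA A  28      -1.278   6.918   2.218  1.00  0.00           H  ",
]

for text in EXCERPT:
    atom = parse_atom_line(text)
    print(atom.serial, atom.name, atom.res_name, atom.chain_id, atom.res_seq,
          atom.x, atom.y, atom.z, atom.element)
```

Slicing by column keeps the atom name, residue name and residue number in separate fields even when neighbouring columns are empty or run together.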

Here is a sequence file for this protein.

>gi|20139322|sp|P81929|PSD1_PEA Defense-related peptide 1
KTCEHLADTYRGVCFTNASCDDHCKNKAHLISGTCHNWKCFCTQNC

  • What is the name of this file format?
  • Find the amino acid for which the coordinates were given above in this sequence. (Write its one-letter code into your exam booklet together with the preceding and the following amino acid, and underline it, e.g. ABC with the B underlined.)

   

In the coordinate file of the immunoglobulin domain 2IMM.pdb you find the following record.

HETATM  877  O   HOH     1      -4.169  60.050  40.145  1.00  3.00           O 
  • What does this record describe?
  • When you display the structure of 2IMM.pdb with RasMol, the protein is displayed as a wireframe model but you see nothing that corresponds to the above record. What do you need to do?

(Indeed, since the RasMol tutorial was a task of the first assignment, a question like this may turn up every now and then.)


2002


Task

Comment