Difference between revisions of "BIN-SX-Domains"

From "A B C"
m (Created page with "<div id="BIO"> <div class="b1"> Structure domains </div> {{Vspace}} <div class="keywords"> <b>Keywords:</b>  Structural domains, domain databases - CATH, SCOP, ...")
 
m
Line 19: Line 19:
  
  
{{STUB}}
+
{{DEV}}
  
 
{{Vspace}}
 
{{Vspace}}
Line 38: Line 38:
 
<!-- included from "ABC-unit_components.wtxt", section: "notes-prerequisites" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "notes-prerequisites" -->
 
You need to complete the following units before beginning this one:
 
You need to complete the following units before beginning this one:
*[[BIN-PDB]]
+
*[[BIN-SX-Chimera]]
  
 
{{Vspace}}
 
{{Vspace}}
Line 82: Line 82:
 
== Contents ==
 
== Contents ==
 
<!-- included from "../components/BIN-SX-Domains.components.wtxt", section: "contents" -->
 
<!-- included from "../components/BIN-SX-Domains.components.wtxt", section: "contents" -->
...
+
 
 +
{{Task|1=
 +
*Read the introductory notes on {{ABC-PDF|BIN-SX-Domains|protein domains defined by 3D structure analysis}}.
 +
}}
 +
 
 +
 
 +
{{Vspace}}
 +
 
 +
 
 +
===APSES domains in Chimera (from A4)===
 +
What precisely constitutes an APSES domain, however, is a matter of definition, as you can explore in the following (optional) task.
 +
 
 +
 
 +
<div class="mw-collapsible mw-collapsed" data-expandtext="Expand" data-collapsetext="Collapse" style="border:#000000 solid 1px; padding: 10px; margin-left:25px; margin-right:25px;">Optional: Load the structure in Chimera, like you did in the last assignment and switch on stereo viewing ... (more) <div  class="mw-collapsible-content">
 +
<ol start="7">
 +
<li>Display the protein in ribbon style, e.g. with the '''Interactive 1''' preset.
 +
<li>Access the '''Interpro''' information page for Mbp1 at the EBI: http://www.ebi.ac.uk/interpro/protein/P39678
 +
<li>In the section '''Domains and repeats''', mouse over the red annotations and note down the residue numbers for the annotated domains. Also follow the links to the respective Interpro domain definition pages.
 +
</ol>
 +
 
 +
At this point we have definitions for the following regions on the Mbp1 protein ...
 +
*The KilA-N (pfam 04383) domain definition as applied to the Mbp1 protein sequence by CDD;
 +
*The InterPro ''KilA, N-terminal/APSES-type HTH, DNA-binding (IPR018004)'' definition annotated on the Mbp1 sequence;
 +
*The InterPro ''Transcription regulator HTH, APSES-type DNA-binding domain (IPR003163)'' definition annotated on the Mbp1 sequence;
 +
*<small>(... in addition &ndash; without following the source here &ndash; the UniProt record for Mbp1 annotates a "HTH APSES-type" domain from residues 5-111)</small>
 +
 
 +
... each with its distinct and partially overlapping sequence range. Back to Chimera:
 +
 
 +
<!-- For reference:
 +
1MB1: 3-100
 +
2BM8: 4-102
 +
CDD KilA-N: 19-93
 +
InterPro KilA-N: 23-88
 +
InterPro APSES: 3-133
 +
Uniprot HTH/APSES: 5-111
 +
-->
 +
 
 +
<ol start="10">
 +
<li>In the sequence window, select the sequence corresponding to the '''Interpro KilA-N''' annotation and colour this fragment red. <small>Remember that you can get the sequence numbers of a residue in the sequence window when you hover the pointer over it - but do confirm that the sequence numbering that Chimera displays matches the numbering of the Interpro domain definition.</small></li>
 +
 
 +
<li>Then select the residue range(s) by which the '''CDD KilA-N''' definition is larger, and colour that fragment orange.</li>
 +
 
 +
<li>Then select the residue range(s) by which the '''InterPro APSES domain''' definition is larger, and colour that fragment yellow.</li>
 +
 
 +
<li>If the structure contains residues outside these ranges, colour these white.</li>
 +
 
 +
<li>Study this in a side-by-side stereo view and get a sense for how the ''extra'' sequence beyond the KilA-N domain(s) is part of the structure, and how the integrity of the folded structure would be affected if these fragments were missing.</li>
 +
 
 +
<li>Display hydrogen bonds to get a sense of the interactions between residues from the differently coloured parts. First show the protein as a stick model, with sticks that are thicker than the default to give a better sense of side chain packing:<br />
 +
::(i) '''Select''' &rarr; '''Select all''' <br />
 +
::(ii) '''Actions''' &rarr; '''Ribbon''' &rarr; '''hide''' <br />
 +
::(iii) '''Select''' &rarr; '''Structure''' &rarr; '''protein''' <br />
 +
::(iv) '''Actions''' &rarr; '''Atoms/Bonds''' &rarr; '''show''' <br />
 +
::(v)  '''Actions''' &rarr; '''Atoms/Bonds''' &rarr; '''stick''' <br />
 +
::(vi) click on the looking glass icon at the bottom right of the graphics window to bring up the inspector window and choose '''Inspect ... Bond'''. Change the radius to 0.4.<br />
 +
</li>
 +
 
 +
<li>Then calculate and display the hydrogen bonds:<br />
 +
::(vii) '''Tools''' &rarr; '''Surface/Binding Analysis''' &rarr; '''FindHbond''' <br />
 +
::(viii) Set the '''Line width''' to 3.0, leave all other parameters at their default values and click '''Apply'''<br />
 +
:: Clear the selection.<br />
 +
Study this view, especially regarding side chain H-bonds. Are there many? Do side chains interact more with other sidechains, or with the backbone?
 +
</li>
 +
 
 +
<li>Let's now simplify the scene a bit and focus on backbone/backbone H-bonds:<br />
 +
::(ix) '''Select''' &rarr; '''Structure''' &rarr; '''Backbone''' &rarr; '''full'''<br />
 +
::(x)  '''Actions''' &rarr; '''Atoms/Bonds''' &rarr; '''show only'''<br /><br />
 +
:: Clear the selection.<br />
 +
In this way you can appreciate how H-bonds build secondary structure - &alpha;-helices and &beta;-sheets - and how these interact with each other ... in part '''across the KilA-N boundary'''.
 +
</li>
 +
 
 +
 
 +
<li>Save the resulting image as a jpeg no larger than 600px across and upload it to your Lab notebook on the Wiki.</li>
 +
<li>When you are done, congratulate yourself on having earned a bonus of 10% on the next quiz.</li>
 +
</ol>
 +
 
 +
</div>
 +
</div>
 +
 
 +
 
 +
There is a rather important lesson in this: domain definitions may be fluid. Their boundaries may be derived computationally from sequence comparisons across many families, and they do not necessarily correspond to the boundaries seen in individual structures. Make sure you understand this well.
 +
}}
 +
 
 +
 
 +
Given this, it seems appropriate to search the sequence database with the sequence of an Mbp1 structure&ndash;this being a structured, stable subdomain of the whole that presumably contains the protein's most specific function. Let us retrieve this sequence. All PDB structures have their sequences stored in the NCBI protein database. They can be accessed simply via the PDB ID, which serves as an identifier both for the NCBI and the PDB databases. However, there is a small catch (isn't there always?). PDB files can contain more than one protein, e.g. if the crystal structure contains a complex<ref>Think of the [http://www.pdb.org/pdb/101/motm.do?momID=121 ribosome] or [http://www.pdb.org/pdb/101/motm.do?momID=3 DNA-polymerase] as extreme examples.</ref>. Each of the individual proteins gets a so-called '''chain ID'''&ndash;a one-letter identifier&ndash;to identify it uniquely. To find its sequence in the database, you need to know the PDB ID as well as the chain ID. If the file contains only a single protein (as in our case), the chain ID is usually '''<code>A</code>'''<ref>Otherwise, you need to study the PDB Web page for the structure, or the text in the PDB file itself, to identify which part of the complex is labeled with which chain ID. For example, immunoglobulin structures sometimes label the ''light-'' and ''heavy chain'' fragments as "L" and "H", and sometimes as "A" and "B"&ndash;there are no fixed rules. You can also load the structure in VMD, color "by chain" and use the mouse to click on residues in each chain to identify it.</ref>. Make sure you understand the concept of protein chains and chain IDs.
 +
 
 +
 
 +
 
 +
 
 +
 
 +
 
  
 
{{Vspace}}
 
{{Vspace}}

Revision as of 04:31, 31 August 2017

Structure domains


 

Keywords:  Structural domains, domain databases - CATH, SCOP, cDART


 



 


Caution!

This unit is under development. There is some content here, but it is incomplete and/or may change significantly: links may lead nowhere, the content is likely to be rearranged, and objectives, deliverables etc. may be incomplete or missing. Do not work with this material until it is updated to "live" status.


 


Abstract

...


 


This unit ...

Prerequisites

You need to complete the following units before beginning this one:

  • BIN-SX-Chimera


 


Objectives

...


 


Outcomes

...


 


Deliverables

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your course journal.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.


 


Evaluation

Evaluation: NA

This unit is not evaluated for course marks.


 


Contents

Task:

  • Read the introductory notes on protein domains defined by 3D structure analysis.


 


APSES domains in Chimera (from A4)

What precisely constitutes an APSES domain, however, is a matter of definition, as you can explore in the following (optional) task.


Optional: Load the structure in Chimera, like you did in the last assignment and switch on stereo viewing ... (more)
  1. Display the protein in ribbon style, e.g. with the Interactive 1 preset.
  2. Access the InterPro information page for Mbp1 at the EBI: http://www.ebi.ac.uk/interpro/protein/P39678
  3. In the section Domains and repeats, mouse over the red annotations and note down the residue numbers for the annotated domains. Also follow the links to the respective InterPro domain definition pages.

At this point we have definitions for the following regions on the Mbp1 protein ...

  • The KilA-N (pfam 04383) domain definition as applied to the Mbp1 protein sequence by CDD;
  • The InterPro KilA, N-terminal/APSES-type HTH, DNA-binding (IPR018004) definition annotated on the Mbp1 sequence;
  • The InterPro Transcription regulator HTH, APSES-type DNA-binding domain (IPR003163) definition annotated on the Mbp1 sequence;
  • (... in addition – without following the source here – the UniProt record for Mbp1 annotates a "HTH APSES-type" domain from residues 5-111)

... each with its distinct and partially overlapping sequence range. Back to Chimera:


  1. In the sequence window, select the sequence corresponding to the InterPro KilA-N annotation and colour this fragment red. Remember that you can get the sequence number of a residue in the sequence window when you hover the pointer over it, but do confirm that the sequence numbering that Chimera displays matches the numbering of the InterPro domain definition.
  2. Then select the residue range(s) by which the CDD KilA-N definition is larger, and colour that fragment orange.
  3. Then select the residue range(s) by which the InterPro APSES domain definition is larger, and colour that fragment yellow.
  4. If the structure contains residues outside these ranges, colour these white.
  5. Study this in a side-by-side stereo view and get a sense for how the extra sequence beyond the KilA-N domain(s) is part of the structure, and how the integrity of the folded structure would be affected if these fragments were missing.
  6. Display hydrogen bonds to get a sense of the interactions between residues from the differently coloured parts. First show the protein as a stick model, with sticks that are thicker than the default to give a better sense of side chain packing:
    (i) Select → Select all
    (ii) Actions → Ribbon → hide
    (iii) Select → Structure → protein
    (iv) Actions → Atoms/Bonds → show
    (v) Actions → Atoms/Bonds → stick
    (vi) Click on the magnifying glass icon at the bottom right of the graphics window to bring up the inspector window and choose Inspect ... Bond. Change the radius to 0.4.
  7. Then calculate and display the hydrogen bonds:
    (vii) Tools → Surface/Binding Analysis → FindHbond
    (viii) Set the Line width to 3.0, leave all other parameters at their default values and click Apply
    Clear the selection.
    Study this view, especially regarding side chain H-bonds. Are there many? Do side chains interact more with other side chains, or with the backbone?
  8. Let's now simplify the scene a bit and focus on backbone/backbone H-bonds:
    (ix) Select → Structure → Backbone → full
    (x) Actions → Atoms/Bonds → show only

    Clear the selection.
    In this way you can appreciate how H-bonds build secondary structure - α-helices and β-sheets - and how these interact with each other ... in part across the KilA-N boundary.
  9. Save the resulting image as a JPEG no larger than 600px across and upload it to your Lab notebook on the Wiki.
  10. When you are done, congratulate yourself on having earned a bonus of 10% on the next quiz.
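
The colouring steps above amount to a small exercise in nested residue ranges. The following is a minimal sketch, not part of the original unit, that works out which residues would receive which colour. It assumes the residue ranges recorded in the reference comment of this page's source (InterPro KilA-N 23-88, CDD KilA-N 19-93, InterPro APSES 3-133, structure 1MB1 3-100) and assumes that the structure numbering matches the sequence numbering; confirm both against the databases before relying on the result. The function and variable names are illustrative only.

```python
# Residue ranges (inclusive) copied from the reference comment in the page source.
# Treat them as assumptions and confirm them against InterPro and CDD.
DOMAINS = {
    "InterPro KilA-N": (23, 88),
    "CDD KilA-N": (19, 93),
    "InterPro APSES": (3, 133),
}
STRUCTURE_1MB1 = (3, 100)  # residues assumed to be present in the 1MB1 coordinates


def residues(span):
    """Expand an inclusive (start, end) pair into a set of residue numbers."""
    start, end = span
    return set(range(start, end + 1))


def to_ranges(numbers):
    """Collapse residue numbers into sorted, inclusive (start, end) runs."""
    runs = []
    for n in sorted(numbers):
        if runs and n == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], n)  # extend the current run
        else:
            runs.append((n, n))  # start a new run
    return runs


def colour_plan(structure=STRUCTURE_1MB1):
    """Assign each residue of the structure to the smallest definition containing it."""
    present = residues(structure)
    red = residues(DOMAINS["InterPro KilA-N"]) & present
    orange = (residues(DOMAINS["CDD KilA-N"]) & present) - red
    yellow = (residues(DOMAINS["InterPro APSES"]) & present) - red - orange
    white = present - red - orange - yellow
    return {
        "red": to_ranges(red),
        "orange": to_ranges(orange),
        "yellow": to_ranges(yellow),
        "white": to_ranges(white),
    }


if __name__ == "__main__":
    for colour, runs in colour_plan().items():
        print(colour, runs)
```

Working with sets of residue numbers rather than range arithmetic keeps the logic easy to check by eye, and passing a different `structure` tuple (for example the 2BM8 range from the same comment) reuses the same plan.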


There is a rather important lesson in this: domain definitions may be fluid. Their boundaries may be derived computationally from sequence comparisons across many families, and they do not necessarily correspond to the boundaries seen in individual structures. Make sure you understand this well.


Given this, it seems appropriate to search the sequence database with the sequence of an Mbp1 structure–this being a structured, stable subdomain of the whole that presumably contains the protein's most specific function. Let us retrieve this sequence. All PDB structures have their sequences stored in the NCBI protein database. They can be accessed simply via the PDB ID, which serves as an identifier both for the NCBI and the PDB databases. However, there is a small catch (isn't there always?). PDB files can contain more than one protein, e.g. if the crystal structure contains a complex[1]. Each of the individual proteins gets a so-called chain ID–a one-letter identifier–to identify it uniquely. To find its sequence in the database, you need to know the PDB ID as well as the chain ID. If the file contains only a single protein (as in our case), the chain ID is usually A[2]. Make sure you understand the concept of protein chains and chain IDs.
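
To make the chain ID concept concrete, here is a minimal sketch that is not part of the original unit. In PDB-format coordinate files, the chain ID is the single character in column 22 of each ATOM record. The hypothetical helpers below list the chain IDs found in PDB-format text and pair each with the PDB ID in an `ID_chain` style; that identifier style is an assumption to verify against the NCBI record you actually retrieve. The sample coordinates and the PDB ID `1abc` are invented placeholders.

```python
def chain_ids(pdb_text):
    """Return the chain IDs found in ATOM records, in order of first appearance."""
    seen = []
    for line in pdb_text.splitlines():
        # The chain ID sits in column 22 (string index 21) of fixed-width ATOM records.
        if line.startswith("ATOM") and len(line) > 21:
            chain = line[21]
            if chain not in seen:
                seen.append(chain)
    return seen


def chain_identifiers(pdb_id, pdb_text):
    """Pair a PDB ID with each chain ID, e.g. 'XXXX_A' (format is an assumption)."""
    return [f"{pdb_id.upper()}_{chain}" for chain in chain_ids(pdb_text)]


# Invented ATOM records for illustration only: two atoms in chain L, one in chain H.
SAMPLE = "\n".join([
    "ATOM      1  N   ASP L   1      11.104  13.207   2.100  1.00 20.00           N",
    "ATOM      2  CA  ASP L   1      12.560  13.100   2.300  1.00 20.00           C",
    "ATOM      3  N   GLU H   1      30.000  10.000   5.000  1.00 20.00           N",
])

if __name__ == "__main__":
    print(chain_ids(SAMPLE))
    print(chain_identifiers("1abc", SAMPLE))
```

Reading the fixed column rather than splitting on whitespace matters here, because PDB fields can run together or be blank, which would shift whitespace-separated tokens.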





 


Further reading, links and resources

 


Notes

  1. Think of the ribosome or DNA-polymerase as extreme examples.
  2. Otherwise, you need to study the PDB Web page for the structure, or the text in the PDB file itself, to identify which part of the complex is labeled with which chain ID. For example, immunoglobulin structures sometimes label the light- and heavy chain fragments as "L" and "H", and sometimes as "A" and "B"–there are no fixed rules. You can also load the structure in VMD, color "by chain" and use the mouse to click on residues in each chain to identify it.


 


Self-evaluation

 



 




 

If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.



 

About ...
 
Author:

Boris Steipe <boris.steipe@utoronto.ca>

Created:

2017-08-05

Modified:

2017-08-05

Version:

0.1

Version history:

  • 0.1 First stub

This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License.