Difference between revisions of "BIN-SX-Domains"

From "A B C"
 
(11 intermediate revisions by the same user not shown)
Line 1: Line 1:
<div id="BIO">
+
<div id="ABC">
  <div class="b1">
+
<div style="padding:5px; border:1px solid #000000; background-color:#b3dbce; font-size:300%; font-weight:400; color: #000000; width:100%;">
 
Structure domains
 
Structure domains
  </div>
+
<div style="padding:5px; margin-top:20px; margin-bottom:10px; background-color:#b3dbce; font-size:30%; font-weight:200; color: #000000; ">
 +
(Structural domains, domain databases - CATH, SCOP, CDART)
 +
</div>
 +
</div>
  
  {{Vspace}}
+
{{Smallvspace}}
 
 
<div class="keywords">
 
<b>Keywords:</b>&nbsp;
 
Structural domains, domain databases - CATH, SCOP, cDART
 
</div>
 
  
{{Vspace}}
 
  
 +
<div style="padding:5px; border:1px solid #000000; background-color:#b3dbce33; font-size:85%;">
 +
<div style="font-size:118%;">
 +
<b>Abstract:</b><br />
 +
<section begin=abstract />
 +
Structural definition of domains allows a classification of protein structures, which in turn supports the discovery of distant relationships.
 +
<section end=abstract />
 +
</div>
 +
<!-- ============================  -->
 +
<hr>
 +
<table>
 +
<tr>
 +
<td style="padding:10px;">
 +
<b>Objectives:</b><br />
  
__TOC__
+
This unit will ...
 +
* ... introduce concepts of structural domains, the hierarchical nature of protein structure; that domains are folding units, units of inheritance and functional modules; that domains are ubiquitous in proteins; that domain assignment allows us to organize structures into domain databases;
 +
* ... teach how to access domain databases and get the data for structural domain annotations;
 +
</td>
 +
<td style="padding:10px;">
 +
<b>Outcomes:</b><br />
 +
After working through this unit you ...
 +
* ... are familiar with domain annotations obtained via the PDB or CDD, and derived from CATH or Pfam, and know how to annotate proteins based on those.
 +
</td>
 +
</tr>
 +
</table>
 +
<!-- ============================  -->
 +
<hr>
 +
<b>Deliverables:</b><br />
 +
<section begin=deliverables />
 +
<li><b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.</li>
 +
<li><b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.</li>
 +
<li><b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].</li>
 +
<section end=deliverables />
 +
<!-- ============================  -->
 +
<hr>
 +
<section begin=prerequisites />
 +
<b>Prerequisites:</b><br />
 +
This unit builds on material covered in the following prerequisite units:<br />
 +
*[[BIN-SX-Chimera|BIN-SX-Chimera (UCSF ChimeraX: Structure Visualization and Analysis)]]
 +
<section end=prerequisites />
 +
<!-- ============================  -->
 +
</div>
  
{{Vspace}}
+
{{Smallvspace}}
  
  
{{DEV}}
 
  
{{Vspace}}
+
{{Smallvspace}}
  
  
</div>
+
__TOC__
<div id="ABC-unit-framework">
 
== Abstract ==
 
<!-- included from "../components/BIN-SX-Domains.components.wtxt", section: "abstract" -->
 
...
 
  
 
{{Vspace}}
 
{{Vspace}}
  
  
== This unit ... ==
+
=== Evaluation ===
=== Prerequisites ===
+
<b>Evaluation: NA</b><br />
<!-- included from "../components/BIN-SX-Domains.components.wtxt", section: "prerequisites" -->
+
<div style="margin-left: 2rem;">This unit is not evaluated for course marks.</div>
<!-- included from "ABC-unit_components.wtxt", section: "notes-prerequisites" -->
+
== Contents ==
You need to complete the following units before beginning this one:
 
*[[BIN-SX-Chimera]]
 
  
{{Vspace}}
 
  
 +
{{Task|1=
 +
*Read the introductory notes on {{ABC-PDF|BIN-SX-Domains|protein domains defined by 3D structure analysis}}.
 +
}}
  
=== Objectives ===
 
<!-- included from "../components/BIN-SX-Domains.components.wtxt", section: "objectives" -->
 
...
 
  
 
{{Vspace}}
 
{{Vspace}}
  
 +
===CATH===
  
=== Outcomes ===
+
The '''Annotations''' tab of PDB entries allows you to search for SCOP, CATH and Pfam annotations - i.e. clicking on the respective categories finds all PDB structures that share the same category. But let's have a quick look at CATH itself.
<!-- included from "../components/BIN-SX-Domains.components.wtxt", section: "outcomes" -->
 
...
 
  
{{Vspace}}
+
{{task|1=
  
 +
* Access CATH at http://www.cathdb.info/
 +
* Search for 1BM8
  
=== Deliverables ===
+
There are not many members of this family, so the information we get is not much more than what we got at the PDB. But we can see where our domain falls in the CATH hierarchy by noting the CATH ID <code>3.10.260.10</code>.
<!-- included from "../components/BIN-SX-Domains.components.wtxt", section: "deliverables" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-time_management" -->
 
*<b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-journal" -->
 
*<b>Journal</b>: Document your progress in your [[FND-Journal|course journal]].
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-insights" -->
 
*<b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|insights! page]].
 
  
{{Vspace}}
+
* Click on the [http://www.cathdb.info/version/latest/superfamily/3.10.260.10 <code>3.10.260.10</code> ID]
 +
* Click on the [http://www.cathdb.info/version/v4_2_0/superfamily/3.10.260.10/classification '''Classification / Domains''' section link] in the left menu
 +
* Continue to [http://www.cathdb.info/browse/sunburst?from_cath_id=3 '''Class 3 Alpha Beta''']
 +
* Hover over the sections of the spokes-graph - this will show a small image of what the representative structures of the various '''Architectures''' in that Class look like. Spend some time with that to get a sense of the diversity of protein structure. Also note that some Architectures do not have many member domains at all - and some have tens of thousands.
 +
* Focus on '''Architecture 3.10. Roll'''
 +
* Explore the different Topologies it contains. One can argue that at the Topology level, CATH categorizes non-homologous structures. You will realize that these structures are very diverse, but they are all organized along the general principle of a twisted beta-sheet with helices usually on both faces.
 +
* Focus on '''Topology 3.10.260 Mlu-1 box binding protein'''.
 +
* Homology group 3.10.260.10 contains 1BM8
  
 +
}}
  
=== Evaluation ===
+
Through this exploration you can get a sense of where this fold fits in the "structural domain universe".
<!-- included from "../components/BIN-SX-Domains.components.wtxt", section: "evaluation" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "eval-none" -->
 
<b>Evaluation: NA</b><br />
 
:This unit is not evaluated for course marks.
 
  
 
{{Vspace}}
 
{{Vspace}}
  
 +
===CDD===
  
</div>
+
{{task|1=
<div id="BIO">
+
* Navigate to the NCBI entry for the MYSPE Mbp1 orthologue.
== Contents ==
+
* Click on '''CDD Search Results''' in the right hand column.
<!-- included from "../components/BIN-SX-Domains.components.wtxt", section: "contents" -->
+
* Explore the page options. Note that you can click on '''Zoom to residue level''', which simplifies defining exactly where the domain boundaries are in your sequence.
 +
* Note that you can expand the domain annotations, to show how the actual sequence aligns with the domain profiles.
 +
* Explore the linked pages for the KilA-N and the Ankyrin superfamily folds.
 +
* Finally click on '''Search for similar domain architectures''' which spawns a search at CDART, the Conserved Domain Architecture Retrieval Tool. Remember this well, it is a '''very''' useful tool: the definition of functional domains and the arrangement of domain modules may allow mechanistic insight into domain function. You will find a number of proteins that share the KilA-N - Ankyrin architecture, but some of the families (and possibly your MYSPE protein) have interesting accessory domains.
  
{{Task|1=
 
*Read the introductory notes on {{ABC-PDF|BIN-SX-Domains|protein domains defined by 3D structure analysis}}.
 
 
}}
 
}}
  
 +
{{Vspace}}
  
{{Vspace}}
+
===APSES and KilA-N domain boundaries===
  
 +
{{Smallvspace}}
  
===APSES domains in Chimera (from A4)===
+
What precisely constitutes an APSES domain is a matter of definition, as we will explore in the following task.
What precisely constitutes an APSES domain however is a matter of definition, as you can explore in the following (optional) task.
 
  
 +
{{task|1=
  
<div class="mw-collapsible mw-collapsed" data-expandtext="Expand" data-collapsetext="Collapse" style="border:#000000 solid 1px; padding: 10px; margin-left:25px; margin-right:25px;">Optional: Load the structure in Chimera, like you did in the last assignment and switch on stereo viewing ... (more) <div  class="mw-collapsible-content">
+
* Access the '''Interpro''' information page for Mbp1 at the EBI: http://www.ebi.ac.uk/interpro/protein/P39678
<ol start="7">
+
* Mouse over the domain annotations and '''note down the residue ranges for the annotated domains covering the N-terminus'''. You should find:
<li>Display the protein in ribbon style, e.g. with the '''Interactive 1''' preset.
+
** IPR003163 <!--  5 - 111 -->(InterPro) ''Transcription regulator HTH, APSES-type DNA-binding domain (IPR003163)'' annotated on the Mbp1 sequence;
<li>Access the '''Interpro''' information page for Mbp1 at the EBI: http://www.ebi.ac.uk/interpro/protein/P39678
+
** IPR018004 <!-- 22 - 105 -->(InterPro; same as SMART SM01252) The  ''KilA, N-terminal/APSES-type HTH, DNA-binding '' definition annotated on the Mbp1 sequence;
<li>In the section '''Domains and repeats''', mouse over the red annotations and note down the residue numbers for the annotated domains. Also follow the links to the respective Interpro domain definition pages.
+
** PF04383  <!-- 23 -  88 -->(Pfam): the KilA-N domain definition, which is also the one that is annotated to the Mbp1 protein sequence by CDD.
</ol>
 
  
At this point we have definitions for the following regions on the Mbp1 protein ...
+
* Follow the links to the respective Interpro and Pfam domain definition pages and read about the domain. Each domain definition describes essentially the same biomolecule, but they have distinct and partially overlapping sequence ranges.
*The KilA-N (pfam 04383) domain definition as applied to the Mbp1 protein sequence by CDD;
 
*The InterPro ''KilA, N-terminal/APSES-type HTH, DNA-binding (IPR018004)'' definition annotated on the Mbp1 sequence;
 
*The InterPro ''Transcription regulator HTH, APSES-type DNA-binding domain (IPR003163)'' definition annotated on the Mbp1 sequence;
 
*<small>(... in addition &ndash; without following the source here &ndash; the UniProt record for Mbp1 annotates a "HTH APSES-type" domain from residues 5-111)</small>
 
  
... each with its distinct and partially overlapping sequence range. Back to Chimera:
+
* Navigate to the [https://www.ncbi.nlm.nih.gov/protein/NP_010227 NCBI page for the Mbp1 protein] and click on '''CDD Search Results'''.
 +
* Hover over the Pfam KilA-N annotation in the linked page, note the highlight in the table below, and note down the annotated range - called "Interval" on this page. Hint: it is different from the annotation you find at Interpro. <!-- 19 - 93 -->
  
<!-- For reference:
+
* Open ChimeraX and load the 1BM8 structure.
1MB1: 3-100
+
* Type <code>camera sbs</code> to turn stereo viewing on.
2BM8: 4-102
+
* Select the entire protein chain and colour it white (residues 4 to 102, practically identical to the IPR003163 APSES domain definition.)
CDD KilA-N: 19-93
 
InterPro KilA-N: 23-88
 
InterPro APSES: 3-133
 
Uniprot HTH/APSES: 5-111
 
-->
 
  
<ol start="10">
+
Next, use the "Sequence Window" to select specific residue ranges:
<li>In the sequence window, select the sequence corresponding to the '''Interpro KilA-N''' annotation and colour this fragment red. <small>Remember that you can get the sequence numbers of a residue in the sequence window when you hover the pointer over it - but do confirm that the sequence numbering that Chimera displays matches the numbering of the Interpro domain definition.</small></li>
 
  
<li>Then select the residue range(s) by which the '''CDD KilA-N''' definition is larger, and colour that fragment orange.</li>
+
* Choose '''Tools''' &rarr; '''Sequence''' &rarr; '''Show Sequence Viewer''' to open the sequence window, select the sequence corresponding to the '''IPR018004''' (KilA-N) annotation and colour this fragment yellow. <small>You can get the sequence numbers of a residue in the sequence window when you hover the pointer over it - but do confirm that the sequence numbering that ChimeraX displays matches the numbering of the Interpro domain definition.</small>
  
<li>Then select the residue range(s) by which the '''InterPro APSES domain''' definition is larger, and colour that fragment yellow.</li>
+
* Then select the residue range of Pfam 04383, the '''KilA-N domain as defined by CDD''' and colour that fragment orange.
  
<li>If the structure contains residues outside these ranges, colour these white.</li>
+
* Finally, choose the residues for PF04383, the '''KilA-N domain as defined by InterPro''' and color them red.
  
<li>Study this in a side-by-side stereo view and get a sense for how the ''extra'' sequence beyond the Kil-A N domain(s) is part of the structure, and how the integrity of the folded structure would be affected if these fragments were missing.</li>
+
* Study this in a side-by-side stereo view and get a sense for how the ''extra'' sequence beyond the Kil-A N domain(s) is part of the structure, and how the integrity of the folded structure would be affected if these fragments were missing.
  
<li>Display Hydrogen bonds, to get a sense of interactions between residues from the differently colored parts. First show the protein as a stick model, with sticks that are thicker than the default to give a better sense of sidechain packing:<br />
+
* Display Hydrogen bonds, to get a sense of interactions between residues from the differently colored parts. First show the protein as a stick model, with sticks that are thicker than the default to give a better sense of sidechain packing:<br />
 
::(i) '''Select''' &rarr; '''Select all''' <br />
 
::(i) '''Select''' &rarr; '''Select all''' <br />
::(ii) '''Actions''' &rarr; '''Ribbon''' &rarr; '''hide''' <br />
+
::(ii) '''Actions''' &rarr; '''Cartoon''' &rarr; '''hide''' <br />
::(iii) '''Select''' &rarr; '''Structure''' &rarr; '''protein''' <br />
+
::(iii) '''Select''' &rarr; '''Chemistry''' &rarr; '''Protein''' <br />
 
::(iv) '''Actions''' &rarr; '''Atoms/Bonds''' &rarr; '''show''' <br />
 
::(iv) '''Actions''' &rarr; '''Atoms/Bonds''' &rarr; '''show''' <br />
::(v) '''Actions''' &rarr; '''Atoms/Bonds''' &rarr; '''stick''' <br />
+
::(vi) type <code>size stickRadius 0.4</code> to give the bonds more volume<br />
::(vi) click on the looking glass icon at the bottom right of the graphics window to bring up the inspector window and choose '''Inspect ... Bond'''. Change the radius to 0.4.<br />
 
</li>
 
  
<li>Then calculate and display the hydrogen bonds:<br />
+
 
::(vii) '''Tools''' &rarr; '''Surface/Binding Analysis''' &rarr; '''FindHbond''' <br />
+
* Then calculate and display the hydrogen bonds:<br />
::(viii) Set the '''Line width''' to 3.0, leave all other parameters with their default values an click '''Apply'''<br />
+
::(vii) '''Tools''' &rarr; '''Structure Analysis''' &rarr; '''H-bond''' <br />
 +
::(viii) Set the '''Radius''' to 0.2 &Aring;, the colour to bright green, leave all other parameters at their default values and click '''Apply'''<br />
 
:: Clear the selection.<br />
 
:: Clear the selection.<br />
 
Study this view, especially regarding side chain H-bonds. Are there many? Do side chains interact more with other sidechains, or with the backbone?
 
Study this view, especially regarding side chain H-bonds. Are there many? Do side chains interact more with other sidechains, or with the backbone?
</li>
 
  
<li>Let's now simplify the scene a bit and focus on backbone/backbone H-bonds:<br />
+
 
::(ix) '''Select''' &rarr; '''Structure''' &rarr; '''Backbone''' &rarr; '''full'''<br />
+
* Let's now simplify the scene a bit and focus on backbone/backbone H-bonds:<br />
 +
::(ix) '''Select''' &rarr; '''Structure''' &rarr; '''Backbone'''<br />
 
::(x)  '''Actions''' &rarr; '''Atoms/Bonds''' &rarr; '''show only'''<br /><br />
 
::(x)  '''Actions''' &rarr; '''Atoms/Bonds''' &rarr; '''show only'''<br /><br />
 
:: Clear the selection.<br />
 
:: Clear the selection.<br />
 
In this way you can appreciate how H-bonds build secondary structure - &alpha;-helices and &beta;-sheets - and how these interact with each other ... in part '''across the KilA N boundary'''.
 
In this way you can appreciate how H-bonds build secondary structure - &alpha;-helices and &beta;-sheets - and how these interact with each other ... in part '''across the KilA N boundary'''.
</li>
 
  
 
+
* Orient the protein well, save the resulting image as a jpeg and upload it to your Journal on the Wiki.
<li>Save the resulting image as a jpeg no larger than 600px across and upload it to your Lab notebook on the Wiki.</li>
 
<li>When you are done, congratulate yourself on having earned a bonus of 10% on the next quiz.</li>
 
</ol>
 
 
 
</div>
 
</div>
 
 
 
 
 
There is a rather important lesson in this: domain definitions may be fluid, and their boundaries may be computationally derived from sequence comparisons across many families, and do not necessarily correspond to individual structures. Make sure you understand this well.
 
 
}}
 
}}
  
 +
{{Vspace}}
  
Given this, it seems appropriate to search the sequence database with the sequence of an Mbp1 structure&ndash;this being a structured, stable, subdomain of the whole that presumably contains the protein's most unique and specific function. Let us retrieve this sequence. All PDB structures have their sequences stored in the NCBI protein database. They can be accessed simply via the PDB-ID, which serves as an identifier both for the NCBI and the PDB databases. However there is a small catch (isn't there always?). PDB files can contain more than one protein, e.g. if the crystal structure contains a complex<ref>Think of the [http://www.pdb.org/pdb/101/motm.do?momID=121 ribosome] or [http://www.pdb.org/pdb/101/motm.do?momID=3 DNA-polymerase] as extreme examples.</ref>. Each of the individual proteins gets a so-called '''chain ID'''&ndash;a one letter identifier&ndash; to identify them uniquely. To find their unique sequence in the database, you need to know the PDB ID as well as the chain ID. If the file contains only a single protein (as in our case), the chain ID is always '''<code>A</code>'''<ref>Otherwise, you need to study the PDB Web page for the structure, or the text in the PDB file itself, to identify which part of the complex is labeled with which chain ID. For example, immunoglobulin structures sometimes label the ''light-'' and ''heavy chain'' fragments as "L" and "H", and sometimes as "A" and "B"&ndash;there are no fixed rules. You can also load the structure in VMD, color "by chain" and use the mouse to click on residues in each chain to identify it.</ref>. Make sure you understand the concept of protein chains, and chain IDs.
+
There is a rather important lesson in this: domain definitions may be fluid, their boundaries may be computationally derived from sequence comparisons across many families, and they do not necessarily correspond to the situation in specific structures. In our example, you saw that the more restrictive KilA-N domain definitions omit two beta-strands at the N-terminus that are well integrated with the rest of the structure - and may even modulate DNA binding through their interactions with the back of the "wing" domain. Database definitions of structural domains are important guides, but they cannot replace your detailed judgement. Make sure you understand this well.
 
 
 
 
 
 
 
 
 
 
 
 
  
 
{{Vspace}}
 
{{Vspace}}
 
  
 
== Further reading, links and resources ==
 
== Further reading, links and resources ==
<!-- {{#pmid: 19957275}} -->
+
{{#pmid: 27899584}}
 +
{{#pmid: 26434392}}
 +
{{#pmid: 25348408}}
 
<!-- {{WWW|WWW_GMOD}} -->
 
<!-- {{WWW|WWW_GMOD}} -->
 
<!-- <div class="reference-box">[http://www.ncbi.nlm.nih.gov]</div> -->
 
<!-- <div class="reference-box">[http://www.ncbi.nlm.nih.gov]</div> -->
 
{{Vspace}}
 
 
 
 
== Notes ==
 
== Notes ==
<!-- included from "../components/BIN-SX-Domains.components.wtxt", section: "notes" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "notes" -->
 
 
<references />
 
<references />
  
 
{{Vspace}}
 
{{Vspace}}
  
 
</div>
 
<div id="ABC-unit-framework">
 
== Self-evaluation ==
 
<!-- included from "../components/BIN-SX-Domains.components.wtxt", section: "self-evaluation" -->
 
<!--
 
=== Question 1===
 
 
Question ...
 
 
<div class="toccolours mw-collapsible mw-collapsed" style="width:800px">
 
Answer ...
 
<div class="mw-collapsible-content">
 
Answer ...
 
 
</div>
 
  </div>
 
 
  {{Vspace}}
 
 
-->
 
 
{{Vspace}}
 
 
 
 
{{Vspace}}
 
 
 
<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_ask" -->
 
 
----
 
 
{{Vspace}}
 
 
<b>If in doubt, ask!</b> If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
 
 
----
 
 
{{Vspace}}
 
  
 
<div class="about">
 
<div class="about">
Line 241: Line 201:
 
:2017-08-05
 
:2017-08-05
 
<b>Modified:</b><br />
 
<b>Modified:</b><br />
:2017-08-05
+
:2020-09-26
 
<b>Version:</b><br />
 
<b>Version:</b><br />
:0.1
+
:1.1
 
<b>Version history:</b><br />
 
<b>Version history:</b><br />
 +
*1.1 2020 updates to CATH; now using ChimeraX
 +
*1.0 First live version
 
*0.1 First stub
 
*0.1 First stub
 
</div>
 
</div>
[[Category:ABC-units]]
 
<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_footer" -->
 
  
 
{{CC-BY}}
 
{{CC-BY}}
  
 +
[[Category:ABC-units]]
 +
{{UNIT}}
 +
{{LIVE}}
 
</div>
 
</div>
 
<!-- [END] -->
 
<!-- [END] -->

Latest revision as of 15:57, 26 September 2020

Structure domains

(Structural domains, domain databases - CATH, SCOP, CDART)


 


Abstract:

Structural definition of domains allows a classification of protein structures, which in turn supports the discovery of distant relationships.


Objectives:

This unit will ...

  • ... introduce concepts of structural domains, the hierarchical nature of protein structure; that domains are folding units, units of inheritance and functional modules; that domains are ubiquitous in proteins; that domain assignment allows us to organize structures into domain databases;
  • ... teach how to access domain databases and get the data for structural domain annotations;

Outcomes:
After working through this unit you ...

  • ... are familiar with domain annotations obtained via the PDB or CDD, and derived from CATH or Pfam, and know how to annotate proteins based on those.

Deliverables:

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

  • Prerequisites:
    This unit builds on material covered in the following prerequisite units:

    • BIN-SX-Chimera (UCSF ChimeraX: Structure Visualization and Analysis)


     



     



     


    Evaluation

    Evaluation: NA

    This unit is not evaluated for course marks.

    Contents

    Task:

    • Read the introductory notes on protein domains defined by 3D structure analysis.


     

    CATH

    The Annotations tab of PDB entries allows you to search for SCOP, CATH and Pfam annotations - i.e. clicking on the respective categories finds all PDB structures that share the same category. But let's have a quick look at CATH itself.

    Task:

    • Access CATH at http://www.cathdb.info/
    • Search for 1BM8

    There are not many members of this family, so the information we get is not much more than what we got at the PDB. But we can see where our domain falls in the CATH hierarchy by noting the CATH ID 3.10.260.10.

    • Click on the 3.10.260.10 ID
    • Click on the Classification / Domains section link in the left menu
    • Continue to Class 3 Alpha Beta
    • Hover over the sections of the spokes-graph - this will show a small image of what the representative structures of the various Architectures in that Class look like. Spend some time with that to get a sense of the diversity of protein structure. Also note that some Architectures do not have many member domains at all - and some have tens of thousands.
    • Focus on Architecture 3.10. Roll
    • Explore the different Topologies it contains. One can argue that at the Topology level, CATH categorizes non-homologous structures. You will realize that these structures are very diverse, but they are all organized along the general principle of a twisted beta-sheet with helices usually on both faces.
    • Focus on Topology 3.10.260 Mlu-1 box binding protein.
    • Homology group 3.10.260.10 contains 1BM8

    Through this exploration you can get a sense of where this fold fits in the "structural domain universe".
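
    The way a CATH ID encodes its own position in the hierarchy can be made explicit with a few lines of Python. This is only an illustrative sketch: the function name `cath_lineage` is hypothetical, and the level names are the ones used in the task above (Class, Architecture, Topology, Homology group). It does not query CATH; it only splits the ID string.

```python
# Level names as used in this unit for the four CATH hierarchy levels.
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homology group")


def cath_lineage(cath_id):
    """Return (level name, partial ID) pairs from Class down to the full ID.

    Hypothetical helper for illustration; it performs no database lookup.
    """
    parts = cath_id.split(".")
    if len(parts) != len(CATH_LEVELS) or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated numbers, got {cath_id!r}")
    # Each level is identified by the ID truncated to that depth.
    return [(level, ".".join(parts[:i + 1]))
            for i, level in enumerate(CATH_LEVELS)]


if __name__ == "__main__":
    for level, node in cath_lineage("3.10.260.10"):
        print(f"{level:15s} {node}")
```

    Reading the pairs from top to bottom retraces the path followed in the task: Class 3, Architecture 3.10, Topology 3.10.260 and finally the group 3.10.260.10 that contains 1BM8.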


     

    CDD

    Task:

    • Navigate to the NCBI entry for the MYSPE Mbp1 orthologue.
    • Click on CDD Search Results in the right hand column.
    • Explore the page options. Note that you can click on Zoom to residue level, which simplifies defining exactly where the domain boundaries are in your sequence.
    • Note that you can expand the domain annotations, to show how the actual sequence aligns with the domain profiles.
    • Explore the linked pages for the KilA-N and the Ankyrin superfamily folds.
    • Finally click on Search for similar domain architectures which spawns a search at CDART, the Conserved Domain Architecture Retrieval Tool. Remember this well, it is a very useful tool: the definition of functional domains and the arrangement of domain modules may allow mechanistic insight into domain function. You will find a number of proteins that share the KilA-N - Ankyrin architecture, but some of the families (and possibly your MYSPE protein) have interesting accessory domains.


     

    APSES and KilA-N domain boundaries

     

    What precisely constitutes an APSES domain is a matter of definition, as we will explore in the following task.

    Task:

    • Access the Interpro information page for Mbp1 at the EBI: http://www.ebi.ac.uk/interpro/protein/P39678
    • Mouse over the domain annotations and note down the residue ranges for the annotated domains covering the N-terminus. You should find:
      • IPR003163 (InterPro) Transcription regulator HTH, APSES-type DNA-binding domain (IPR003163) annotated on the Mbp1 sequence;
      • IPR018004 (InterPro; same as SMART SM01252) The KilA, N-terminal/APSES-type HTH, DNA-binding definition annotated on the Mbp1 sequence;
      • PF04383 (Pfam): the KilA-N domain definition, which is also the one that is annotated to the Mbp1 protein sequence by CDD.
    • Follow the links to the respective Interpro and Pfam domain definition pages and read about the domain. Each domain definition describes essentially the same biomolecule, but they have distinct and partially overlapping sequence ranges.
    • Navigate to the NCBI page for the Mbp1 protein and click on CDD Search Results.
    • Hover over the Pfam KilA-N annotation in the linked page, note the highlight in the table below, and note down the annotated range - called "Interval" on this page. Hint: it is different from the annotation you find at Interpro.
    • Open ChimeraX and load the 1BM8 structure.
    • Type camera sbs to turn stereo viewing on.
    • Select the entire protein chain and colour it white (residues 4 to 102, practically identical to the IPR003163 APSES domain definition.)

    Next, use the "Sequence Window" to select specific residue ranges:

    • Choose Tools → Sequence → Show Sequence Viewer to open the sequence window, select the sequence corresponding to the IPR018004 (KilA-N) annotation and colour this fragment yellow. You can get the sequence numbers of a residue in the sequence window when you hover the pointer over it - but do confirm that the sequence numbering that ChimeraX displays matches the numbering of the Interpro domain definition.
    • Then select the residue range of Pfam 04383, the KilA-N domain as defined by CDD and colour that fragment orange.
    • Finally, choose the residues for PF04383, the KilA-N domain as defined by InterPro and color them red.
    • Study this in a side-by-side stereo view and get a sense for how the extra sequence beyond the Kil-A N domain(s) is part of the structure, and how the integrity of the folded structure would be affected if these fragments were missing.
    • Display Hydrogen bonds, to get a sense of interactions between residues from the differently colored parts. First show the protein as a stick model, with sticks that are thicker than the default to give a better sense of sidechain packing:
    (i) Select → Select all
    (ii) Actions → Cartoon → hide
    (iii) Select → Chemistry → Protein
    (iv) Actions → Atoms/Bonds → show
    (vi) type size stickRadius 0.4 to give the bonds more volume


    • Then calculate and display the hydrogen bonds:
    (vii) Tools → Structure Analysis → H-bond
    (viii) Set the Radius to 0.2 Å, the colour to bright green, leave all other parameters at their default values and click Apply
    Clear the selection.

    Study this view, especially regarding side chain H-bonds. Are there many? Do side chains interact more with other sidechains, or with the backbone?


    • Let's now simplify the scene a bit and focus on backbone/backbone H-bonds:
    (ix) Select → Structure → Backbone
    (x) Actions → Atoms/Bonds → show only

    Clear the selection.

    In this way you can appreciate how H-bonds build secondary structure - α-helices and β-sheets - and how these interact with each other ... in part across the KilA N boundary.

    • Orient the protein well, save the resulting image as a jpeg and upload it to your Journal on the Wiki.
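
    The three colouring steps in the task above can also be issued from the ChimeraX command line rather than through the sequence window. The Python sketch below only builds the command strings; it does not run ChimeraX. Treat it as an assumption-laden illustration: the residue ranges are taken from reference comments in the source of this page and should be confirmed against what InterPro and CDD currently show, the helper name `colour_commands` is hypothetical, and the command spelling (`color :start-end name`) is written from ChimeraX's general command syntax and should be checked in your version.

```python
# (label, first residue, last residue, colour), in the order the task applies them.
# Ranges are assumptions copied from reference notes; verify them yourself.
DOMAIN_COLOURS = [
    ("IPR018004 (InterPro KilA-N/APSES)", 22, 105, "yellow"),
    ("PF04383 as annotated by CDD", 19, 93, "orange"),
    ("PF04383 as annotated by InterPro", 23, 88, "red"),
]


def colour_commands(domains, base_colour="white"):
    """Build ChimeraX command strings: base colour first, then each range.

    Later commands overpaint earlier ones where the ranges overlap,
    so the widest definition should come first.
    """
    commands = [f"color protein {base_colour}"]
    for _label, first, last, colour in domains:
        if first > last:
            raise ValueError(f"invalid range {first}-{last}")
        commands.append(f"color :{first}-{last} {colour}")
    return commands


if __name__ == "__main__":
    for command in colour_commands(DOMAIN_COLOURS):
        print(command)
```

    Because the ranges are nested, applying them from widest to narrowest leaves each colour visible only where that definition extends beyond the next narrower one.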


     

    There is a rather important lesson in this: domain definitions may be fluid, their boundaries may be computationally derived from sequence comparisons across many families, and they do not necessarily correspond to the situation in specific structures. In our example, you saw that the more restrictive KilA-N domain definitions omit two beta-strands at the N-terminus that are well integrated with the rest of the structure - and may even modulate DNA binding through their interactions with the back of the "wing" domain. Database definitions of structural domains are important guides, but they cannot replace your detailed judgement. Make sure you understand this well.
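
    How much of the structure each definition leaves out can be counted directly from the residue ranges. The following is a minimal sketch, assuming the 1BM8 chain spans residues 4 to 102 (as stated in the task) and using domain ranges taken from reference comments in the source of this page; the function name `uncovered` and the dictionary layout are hypothetical, and the ranges should be re-checked against the live databases.

```python
# Residues present in the 1BM8 structure, as stated in the task.
STRUCTURE = (4, 102)

# Domain ranges on the Mbp1 sequence; assumptions to be verified at the source.
DEFINITIONS = {
    "IPR003163 (APSES)": (5, 111),
    "IPR018004 (KilA-N/APSES)": (22, 105),
    "PF04383 via CDD": (19, 93),
    "PF04383 via InterPro": (23, 88),
}


def uncovered(structure, domain):
    """Count structure residues lying before and after the domain range.

    Returns (n_terminal_count, c_terminal_count); ranges are inclusive.
    """
    s_start, s_end = structure
    d_start, d_end = domain
    # Residues from s_start up to, but not including, the domain start.
    n_term = max(0, min(d_start, s_end + 1) - s_start)
    # Residues after the domain end, up to and including s_end.
    c_term = max(0, s_end - max(d_end, s_start - 1))
    return n_term, c_term


if __name__ == "__main__":
    for name, rng in DEFINITIONS.items():
        n_term, c_term = uncovered(STRUCTURE, rng)
        print(f"{name:28s} N-terminal: {n_term:2d}  C-terminal: {c_term:2d}")
```

    Under these assumed ranges, the narrowest KilA-N definition leaves 19 N-terminal and 14 C-terminal residues of the structure unannotated, while the APSES definition omits only a single residue.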


     

    Further reading, links and resources

    Dawson et al. (2017) CATH: an expanded resource to predict protein function through structure and sequence. Nucleic Acids Res 45:D289-D295. (pmid: 27899584)

    The latest version of the CATH-Gene3D protein structure classification database has recently been released (version 4.1, http://www.cathdb.info). The resource comprises over 300 000 domain structures and over 53 million protein domains classified into 2737 homologous superfamilies, doubling the number of predicted protein domains in the previous version. The daily-updated CATH-B, which contains our very latest domain assignment data, provides putative classifications for over 100 000 additional protein domains. This article describes developments to the CATH-Gene3D resource over the last two years since the publication in 2015, including: significant increases to our structural and sequence coverage; expansion of the functional families in CATH; building a support vector machine (SVM) to automatically assign domains to superfamilies; improved search facilities to return alignments of query sequences against multiple sequence alignments; the redesign of the web pages and download site.

    Das & Orengo (2016) Protein function annotation using protein domain family resources. Methods 93:24-34. (pmid: 26434392)

    As a result of the genome sequencing and structural genomics initiatives, we have a wealth of protein sequence and structural data. However, only about 1% of these proteins have experimental functional annotations. As a result, computational approaches that can predict protein functions are essential in bridging this widening annotation gap. This article reviews the current approaches of protein function prediction using structure and sequence based classification of protein domain family resources with a special focus on functional families in the CATH-Gene3D resource.

    Sillitoe et al. (2015) CATH: comprehensive structural and functional annotations for genome sequences. Nucleic Acids Res 43:D376-81. (pmid: 25348408)

    The latest version of the CATH-Gene3D protein structure classification database (4.0, http://www.cathdb.info) provides annotations for over 235,000 protein domain structures and includes 25 million domain predictions. This article provides an update on the major developments in the 2 years since the last publication in this journal including: significant improvements to the predictive power of our functional families (FunFams); the release of our 'current' putative domain assignments (CATH-B); a new, strictly non-redundant data set of CATH domains suitable for homology benchmarking experiments (CATH-40) and a number of improvements to the web pages.

    Notes


     


    About ...
     
    Author:

    Boris Steipe <boris.steipe@utoronto.ca>

    Created:

    2017-08-05

    Modified:

    2020-09-26

    Version:

    1.1

    Version history:

    • 1.1 2020 updates to CATH; now using ChimeraX
    • 1.0 First live version
    • 0.1 First stub

    This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.