Difference between revisions of "FND-CSC-Data models"

From "A B C"
Latest revision as of 09:27, 25 September 2020

Relational Data Models

(Relational data models - what, why, how)


 


Abstract:

Computational work with data often begins with data modeling: parsing facts about the world into a set of entities, their attributes, and their relationships. These are usually represented in a "relational data model". This unit introduces the concept, discusses pitfalls in creating such models and how they can be addressed, and provides practice in designing and evaluating data models.
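
To make the terms concrete, here is a minimal sketch of two entities, their attributes, and one relationship between them. It is only an illustration: the entity names (Species, Protein), the attribute names, and the toy values are hypothetical and are not taken from the course's "proteinDB" model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Species:
    """Entity: one species (hypothetical example entity)."""
    species_id: int   # primary key: uniquely identifies one species
    name: str         # attribute


@dataclass(frozen=True)
class Protein:
    """Entity: one protein (hypothetical example entity)."""
    protein_id: int   # primary key: uniquely identifies one protein
    name: str         # attribute
    species_id: int   # foreign key: each protein refers to exactly one species


# A tiny in-memory "database": one dict per entity, keyed by primary key.
# All values are made-up placeholders.
species = {1: Species(1, "Species X")}
proteins = {
    10: Protein(10, "Protein A", 1),
    11: Protein(11, "Protein B", 1),
}


def species_of(protein):
    """Follow the relationship from a protein to its (single) species."""
    return species[protein.species_id]


def proteins_of(species_id):
    """Follow the relationship the other way: one species, many proteins."""
    return [p for p in proteins.values() if p.species_id == species_id]
```

In this sketch the relationship has a one-to-many cardinality: `species_of()` always returns one species, while `proteins_of()` may return any number of proteins.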


Objectives:
This unit will ...

  • Introduce the concept of relational data models and the terms we use to describe them;
  • Provide a template that you can use for modeling;
  • Present problems for you to think through in designing your own data models.

Outcomes:
After working through this unit you ...

  • can interpret and critically evaluate an entity-relationship diagram (ERD);
  • have practiced building your own data models;
  • have participated in the discussion and improvement of three models motivated by common bioinformatics tasks.

Deliverables:

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

Prerequisites:
This unit builds on material covered in the following prerequisite units:

  • RPR-Introduction (Introduction to R)

Evaluation

Evaluation: NA

This unit is not evaluated for course marks.

Contents

Task:

  • Read the introductory notes on data models.

In the PDF notes, a protein data model is developed. You can access a sketch of the data model by clicking on the image:


 

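[Image: ProteinDBschema.svg, a sketch of the protein data model; links to https://docs.google.com/presentation/d/13vWaVcFpWEOGeSNhwmqugj2qTQuH1eZROgxWdHGEMr0]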


 

Open the slide, save it, and edit it. (Or download it to use with any other drawing tool.) To practice data modeling, think about and try modeling the following extensions to the "proteinDB" model (in a new slide):

  • Some of the proteins that you might want to store are transcription factors. A transcription factor has a canonical binding site sequence, and there are sequences to which it has actually been observed to bind. The actual binding instances in specific locations may have genes associated with them, which encode proteins. You might come up with other facts that are important too.
  • A "systems model" would group a number of proteins into a system such as "G1/S checkpoint control", "cell-wall repair", "acid/base homeostasis" etc., i.e. a set of proteins that collaborate towards a common goal. Within that system a protein performs one or more functions that may be associated with specific states of the protein (like its intracellular location, or its post-translational modification state). Proteins may be structurally part of any number of systems, and they may shuttle between systems. Systems can overlap, and sometimes we might want to group systems in a hierarchical fashion.
  • A protein-protein interaction database stores interaction information. Interactions may be observed by a number of different experimental methods, and thus several different interactions may be reported for the same protein pair. Moreover, there may be meta-information, such as a confidence score that evaluates whether two proteins functionally interact with each other in the cell, rather than the observation being an experimental artefact. Some of the interactions may be between a protein and a complex, or between two complexes, and can't be further resolved. But if an interaction is between two distinct proteins, and one of them is part of a complex, that too is important to know. Some interactions may be permanent, and some may be transient, i.e. depend on particular conditions.
  • Sketch each of these data models on your own. Think about the principles that were discussed in the introduction. You will probably start by listing the entities first, then the attributes of the entities, then the relationships that you need to represent the facts. Don't forget the cardinalities.
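
Once you have a sketch, it can help to check it against a few rows of toy data. The following is one possible starting point for a small part of the interaction model, written with Python's built-in sqlite3 module. It is not a reference solution: the table names, column names, method names, and scores are all assumptions made for illustration, and it deliberately leaves out complexes and transient interactions.

```python
import sqlite3

# Hypothetical schema: three entities and the relationships between them.
SCHEMA = """
CREATE TABLE protein (
    protein_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
CREATE TABLE method (
    method_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
-- One row per reported observation, so the same protein pair can appear
-- several times, once for each experimental method that reported it.
CREATE TABLE interaction (
    interaction_id INTEGER PRIMARY KEY,
    protein_a_id   INTEGER NOT NULL REFERENCES protein(protein_id),
    protein_b_id   INTEGER NOT NULL REFERENCES protein(protein_id),
    method_id      INTEGER NOT NULL REFERENCES method(method_id),
    confidence     REAL CHECK (confidence BETWEEN 0 AND 1),
    -- Store each unordered pair in one orientation only. Note that this
    -- simplification also rules out self-interactions.
    CHECK (protein_a_id < protein_b_id)
);
"""


def build_demo_db():
    """Create an in-memory database and fill it with made-up toy rows."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite ignores foreign keys unless enabled
    con.executescript(SCHEMA)
    con.executemany(
        "INSERT INTO protein VALUES (?, ?)",
        [(1, "P1"), (2, "P2"), (3, "P3")],
    )
    con.executemany(
        "INSERT INTO method VALUES (?, ?)",
        [(1, "yeast two-hybrid"), (2, "co-immunoprecipitation")],
    )
    con.executemany(
        "INSERT INTO interaction (protein_a_id, protein_b_id, method_id, confidence) "
        "VALUES (?, ?, ?, ?)",
        [(1, 2, 1, 0.6), (1, 2, 2, 0.9), (2, 3, 1, 0.4)],
    )
    return con


def observations_per_pair(con):
    """Return (name_a, name_b, number_of_observations) for each protein pair."""
    return con.execute(
        """
        SELECT a.name, b.name, COUNT(*)
        FROM interaction AS i
        JOIN protein AS a ON a.protein_id = i.protein_a_id
        JOIN protein AS b ON b.protein_id = i.protein_b_id
        GROUP BY a.name, b.name
        ORDER BY a.name, b.name
        """
    ).fetchall()
```

Calling `observations_per_pair(build_demo_db())` lists each protein pair together with the number of observations reported for it. The interaction table carries two foreign keys into the same protein table, which is how the many-to-many relationship between proteins is resolved; whether an observation should be its own entity, as assumed here, is exactly the kind of design decision to discuss.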


 



 


About ...
 
Author:

Boris Steipe <boris.steipe@utoronto.ca>

Created:

2017-08-05

Modified:

2020-09-20

Version:

1.1

Version history:

  • 1.1 New schema sketch
  • 1.0 First live version
  • 0.1 First stub

This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License.