Difference between revisions of "BIN-Data integration"

Latest revision as of 16:32, 24 September 2020

Data Integration

(Integration of biological data; Identifier mapping; Entrez; UniProt; BioMart. ID mapping service and match() function.)


 


Abstract:

Data integration is a challenging problem. This unit discusses the issues and how the large databases address them with NCBI's Entrez system and the EBI's UniProt Knowledgebase and BioMart system. R coding exercises put some of the technical issues into practice.


Objectives:
This unit will ...

  • ... introduce the issue of database integration and how the NCBI and the EBI address it;
  • ... demonstrate use of Entrez, UniProt and BioMart;
  • ... teach ID mapping techniques with R.

Outcomes:
After working through this unit you ...

  • ... are familiar with the NCBI and EBI query and retrieval systems;
  • ... can use BioMart both online and in R code;
  • ... can retrieve ID cross references via scripts and match IDs in large tables with R's match() function.

Deliverables:

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

  • Prerequisites:
    This unit builds on material covered in the following prerequisite units:

    • BIN-EBI (Databases and services at the EBI)
    • BIN-FUNC-Databases (Molecular Function Databases)


    Evaluation

    Evaluation: NA

    This unit is not evaluated for course marks.

    Contents


     

    Task:

    • Visit the UniProt ID mapping service, enter NP_010227 into the identifier field, select the options to map from RefSeq Protein to UniProtKB, and click Go.
    • Confirm that this retrieved the right identifier.
    • Also note that you could have searched with a list of IDs, and downloaded the results, e.g. for further processing in R.
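
The last point can be illustrated with a minimal sketch of how a downloaded mapping table might be processed with R's match() function. The table, the variable names (idMap, myIDs, idx, mapped) and all identifiers are made-up placeholders for illustration only; they are not taken from the course script or from UniProt.

```r
# Hypothetical mapping table, standing in for a result downloaded from an
# ID mapping service. All identifiers are made-up placeholders.
idMap <- data.frame(
  refseq  = c("NP_demo3", "NP_demo1", "NP_demo2"),
  uniprot = c("UP_demo3", "UP_demo1", "UP_demo2"),
  stringsAsFactors = FALSE
)

# IDs we want to translate; "NP_demo9" is deliberately absent from the table.
myIDs <- c("NP_demo1", "NP_demo9", "NP_demo3")

# match() returns, for each query ID, the row index of its first occurrence
# in the table column, or NA if the ID does not occur there.
idx <- match(myIDs, idMap$refseq)

# Indexing with NA yields NA, so unmapped IDs remain visible in the result
# instead of being dropped silently.
mapped <- data.frame(
  refseq  = myIDs,
  uniprot = idMap$uniprot[idx],
  stringsAsFactors = FALSE
)

print(mapped)
```

Because match() reports only the first hit for each query, a table with one-to-many mappings would need an additional check, for example with duplicated() on the key column, or a different approach such as merge().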


     

    Task:

     
    • Open RStudio and load the ABC-units R project. If you have loaded it before, choose File → Recent projects → ABC-Units. If you have not loaded it before, follow the instructions in the RPR-Introduction unit.
    • Choose Tools → Version Control → Pull Branches to fetch the most recent version of the project from its GitHub repository with all changes and bug fixes included.
    • Type init() if requested.
    • Open the file BIN-Data_integration.R and follow the instructions.


     

    Note: take care that you understand all of the code in the script. Evaluation in this course is cumulative and you may be asked to explain any part of the code.


     


     

    Task:
    The biomartr Bioconductor package is a second-generation R interface to BioMart that extends the biomaRt package. It has a good quick start introduction to "Functional Annotation".


    Further reading, links and resources

    UniProt - NCBI ID mapping - detailed information on how it works.
    Xie & Ahn (2010) Statistical methods for integrating multiple types of high-throughput data. Methods Mol Biol 620:511-29. (pmid: 20652519)


    Notes


     


    About ...
     
    Author:

    Boris Steipe <boris.steipe@utoronto.ca>

    Created:

    2017-08-05

    Modified:

    2020-09-24

    Version:

    1.1

    Version history:

    • 1.1 2020 Maintenance
    • 1.0 First live version.
    • 0.1 First stub

    This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License.