Difference between revisions of "BIN-Data integration"
Revision as of 12:38, 16 September 2020
Data Integration
(Integration of biological data; Identifier mapping; Entrez; UniProt; BioMart. ID mapping service and match() function.)
Abstract:
Data integration is a challenging problem. This unit discusses the issues involved and how the large databases address them: the NCBI with its Entrez system, and the EBI with the UniProt Knowledgebase and the BioMart system. R coding exercises put some of the technical issues into practice.
Objectives:
Outcomes:
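Deliverables:
- Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
- Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
- Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.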
Prerequisites:
This unit builds on material covered in the following prerequisite units:
- BIN-EBI (Databases and services at the EBI)
- BIN-FUNC-Databases (Molecular Function Databases)
Task:
- Read the introductory notes on concepts and approaches to data integration in bioinformatics.
Task:
- Visit the UniProt ID mapping service, enter NP_010227 into the identifier field, select the options from RefSeq Protein to UniProtKB, and click Go.
- Confirm that this retrieved the right identifier.
- Also note that you could have searched with a list of IDs and downloaded the results, e.g. for further processing in R.
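To illustrate the kind of further processing in R mentioned above, here is a minimal sketch of using match() to translate a list of identifiers through a mapping table. It is not taken from the course script. The tab-separated text stands in for a downloaded results file, the column names From and To are assumptions, and every identifier is a made-up placeholder rather than a real database entry.

```r
# Stand-in for a downloaded ID-mapping results file (tab-separated).
# Column names "From"/"To" are assumed; all identifiers are placeholders.
tsv <- paste0("From\tTo\n",
              "REFSEQ_A\tUNIPROT_A\n",
              "REFSEQ_B\tUNIPROT_B\n",
              "REFSEQ_C\tUNIPROT_C\n")

# Read the table; for a real download, use read.delim("<your file>") instead.
idMap <- read.delim(text = tsv, stringsAsFactors = FALSE)

# IDs to translate. "REFSEQ_X" is deliberately absent from the table.
query <- c("REFSEQ_C", "REFSEQ_X", "REFSEQ_A")

# match() returns, for each element of query, the index of its first
# occurrence in idMap$From, or NA if it does not occur at all.
idx <- match(query, idMap$From)

# Indexing with NA yields NA, so unmapped IDs remain visible in the result
# and the output stays in the same order and length as the query.
mapped <- idMap$To[idx]
names(mapped) <- query
print(mapped)
```

Because the result has one element per query ID, in query order, it can be added directly as a new column to a table that holds the original identifiers. IDs that could not be mapped show up as NA and can be found with is.na(mapped).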
Task:
- Open RStudio and load the ABC-units R project. If you have loaded it before, choose File → Recent projects → ABC-Units. If you have not loaded it before, follow the instructions in the RPR-Introduction unit.
- Choose Tools → Version Control → Pull Branches to fetch the most recent version of the project from its GitHub repository, with all changes and bug fixes included.
- Type init() if requested.
- Open the file BIN-Data_integration.R and follow the instructions.
Note: take care that you understand all of the code in the script. Evaluation in this course is cumulative and you may be asked to explain any part of the code.
Task:
The biomartr package (available from CRAN) is a second-generation R interface to BioMart that extends the Bioconductor biomaRt package. Its "Functional Annotation" vignette is a good quick-start introduction.
- Navigate to https://cran.r-project.org/web/packages/biomartr/vignettes/Functional_Annotation.html
- Work through the tutorial.
Self-evaluation
Notes
Further reading, links and resources
Xie & Ahn (2010) Statistical methods for integrating multiple types of high-throughput data. Methods Mol Biol 620:511-29. (pmid: 20652519)
If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
About ...
Author:
- Boris Steipe <boris.steipe@utoronto.ca>
Created:
- 2017-08-05
Modified:
- 2017-08-05
Version:
- 1.0
Version history:
- 1.0 First live version.
- 0.1 First stub
This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.