Difference between revisions of "BIN-Data integration"
Line 41: | Line 41: | ||
<b>Deliverables:</b><br /> | <b>Deliverables:</b><br /> | ||
<section begin=deliverables /> | <section begin=deliverables /> | ||
− | < | + | <li><b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.</li> |
− | + | <li><b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.</li> | |
− | < | + | <li><b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].</li> |
<section end=deliverables /> | <section end=deliverables /> | ||
<!-- ============================ --> | <!-- ============================ --> | ||
Line 52: | Line 49: | ||
<section begin=prerequisites /> | <section begin=prerequisites /> | ||
<b>Prerequisites:</b><br /> | <b>Prerequisites:</b><br /> | ||
− | + | This unit builds on material covered in the following prerequisite units:<br /> | |
− | This unit builds on material covered in the following prerequisite units: | ||
*[[BIN-EBI|BIN-EBI (Databases and services at the EBI)]] | *[[BIN-EBI|BIN-EBI (Databases and services at the EBI)]] | ||
*[[BIN-FUNC-Databases|BIN-FUNC-Databases (Molecular Function Databases)]] | *[[BIN-FUNC-Databases|BIN-FUNC-Databases (Molecular Function Databases)]] | ||
Line 75: | Line 71: | ||
+ | === Evaluation === | ||
+ | <b>Evaluation: NA</b><br /> | ||
+ | <div style="margin-left: 2rem;">This unit is not evaluated for course marks.</div> | ||
== Contents == | == Contents == | ||
− | |||
{{Task|1= | {{Task|1= | ||
Line 105: | Line 103: | ||
== Further reading, links and resources == | == Further reading, links and resources == | ||
Line 132: | Line 109: | ||
{{#pmid: 20652519}} | {{#pmid: 20652519}} | ||
+ | == Notes == | ||
+ | <references /> | ||
{{Vspace}} | {{Vspace}} | ||
<div class="about"> | <div class="about"> | ||
Line 156: | Line 123: | ||
:2017-08-05 | :2017-08-05 | ||
<b>Modified:</b><br /> | <b>Modified:</b><br /> | ||
− | : | + | :2020-09-24 |
<b>Version:</b><br /> | <b>Version:</b><br /> | ||
− | :1. | + | :1.1 |
<b>Version history:</b><br /> | <b>Version history:</b><br /> | ||
+ | *1.1 2020 Maintenance | ||
*1.0 First live version. | *1.0 First live version. | ||
*0.1 First stub | *0.1 First stub | ||
</div> | </div> | ||
{{CC-BY}} | {{CC-BY}} | ||
+ | [[Category:ABC-units]] | ||
+ | {{UNIT}} | ||
+ | {{LIVE}} | ||
</div> | </div> | ||
<!-- [END] --> | <!-- [END] --> |
Latest revision as of 16:32, 24 September 2020
Data Integration
(Integration of biological data; Identifier mapping; Entrez; UniProt; BioMart. ID mapping service and match() function.)
Abstract:
Data integration is a challenging problem. This unit discusses the issues and how the large databases address them with NCBI's Entrez system and the EBI's UniProt Knowledgebase and BioMart system. R coding exercises put some of the technical issues into practice.
Objectives:
Outcomes:
Deliverables:
Prerequisites:
This unit builds on material covered in the following prerequisite units:
Evaluation
Evaluation: NA
Contents
Task:
- Read the introductory notes on concepts and approaches to data integration in bioinformatics.
Task:
- Visit the UniProt ID mapping service, enter NP_010227 into the identifier field, select the options from RefSeq Protein to UniProtKB, and click Go.
- Confirm that this retrieved the right identifier.
- Also note that you could have searched with a list of IDs and downloaded the results, e.g. for further processing in R.
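Once a mapping table has been downloaded, the lookup itself can be done with the base R match() function. The following is a minimal sketch, not the course script: the table idMap, its column names, and all identifiers except NP_010227 are hypothetical, and the UniProt-side values are placeholders rather than real accessions.

```r
# Hypothetical mapping table, standing in for results downloaded from an
# ID mapping service. Only NP_010227 comes from this unit; the other RefSeq
# IDs and all UniProt-side values are made-up placeholders.
idMap <- data.frame(
  refseq  = c("NP_fake_01", "NP_010227", "NP_fake_03"),
  uniprot = c("PLACEHOLDER_A", "PLACEHOLDER_B", "PLACEHOLDER_C"),
  stringsAsFactors = FALSE
)

# Query IDs in arbitrary order; the second one is not in the table.
query <- c("NP_010227", "NP_fake_99", "NP_fake_01")

# match() returns, for each query, the index of its first occurrence in
# idMap$refseq, or NA if the query is absent.
idx <- match(query, idMap$refseq)

# Subsetting with an NA index yields NA, so unmapped IDs remain visible
# in the result instead of being silently dropped.
mapped <- idMap$uniprot[idx]
names(mapped) <- query
print(mapped)
```

Because the result has the same length and order as the query vector, it can be added directly as a new column to whatever data frame the query IDs came from. Note that match() reports only the first hit, so a table with one-to-many mappings would need a different approach such as merge().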
Task:
- Open RStudio and load the ABC-units R project. If you have loaded it before, choose File → Recent projects → ABC-Units. If you have not loaded it before, follow the instructions in the RPR-Introduction unit.
- Choose Tools → Version Control → Pull Branches to fetch the most recent version of the project from its GitHub repository with all changes and bug fixes included.
- Type init() if requested.
- Open the file BIN-Data_integration.R and follow the instructions.
Note: take care that you understand all of the code in the script. Evaluation in this course is cumulative and you may be asked to explain any part of the code.
Task:
The biomartr package, distributed via CRAN, is a second-generation R interface to BioMart that extends the Bioconductor biomaRt package. It has a good quick-start introduction to "Functional Annotation".
- Navigate to https://cran.r-project.org/web/packages/biomartr/vignettes/Functional_Annotation.html
- Work through the tutorial.
Further reading, links and resources
Xie & Ahn (2010) Statistical methods for integrating multiple types of high-throughput data. Methods Mol Biol 620:511-29. (pmid: 20652519)
Notes
About ...
Author:
- Boris Steipe <boris.steipe@utoronto.ca>
Created:
- 2017-08-05
Modified:
- 2020-09-24
Version:
- 1.1
Version history:
- 1.1 2020 Maintenance
- 1.0 First live version.
- 0.1 First stub
This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.