BIN-Data integration
Latest revision as of 16:32, 24 September 2020
Data Integration
(Integration of biological data; Identifier mapping; Entrez; UniProt; BioMart. ID mapping service and match() function.)
Abstract:
Data integration is a challenging problem. This unit discusses the issues involved and how the large databases address them with NCBI's Entrez system and the EBI's UniProt Knowledge Base and BioMart system. R coding exercises put some of the technical issues into practice.
Objectives:
This unit will ...
- ... introduce the issue of database integration and how the NCBI and the EBI address it;
- ... demonstrate use of Entrez, UniProt and BioMart;
- ... teach ID mapping techniques with R.
Outcomes:
After working through this unit you ...
- ... are familiar with the NCBI and EBI query and retrieval systems;
- ... can use BioMart both online and in R code;
- ... can retrieve ID cross references via scripts and match IDs in large tables with R's match() function.
Deliverables:
- Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
- Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
- Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.
Prerequisites:
This unit builds on material covered in the following prerequisite units:
- BIN-EBI (Databases and services at the EBI)
- BIN-FUNC-Databases (Molecular Function Databases)
- BIN-Miscellaneous_DB (Miscellaneous Databases for Bioinformatics)
- BIN-NCBI (The NCBI Database and Services)
- BIN-PDB (The RCSB-PDB Structure Database)
Evaluation
Evaluation: NA
This unit is not evaluated for course marks.
Contents
Task:
- Read the introductory notes on concepts and approaches to data integration in bioinformatics.
Task:
- Visit the UniProt ID mapping service, enter NP_010227 into the identifier field, select options from RefSeq Protein to UniProtKB and click Go.
- Confirm that this retrieved the right identifier.
- Also note that you could have searched with a list of IDs, and downloaded the results, e.g. for further processing in R.
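As a minimal sketch of what such "further processing in R" might look like: the example below reads a small tab-delimited mapping table and uses match() to attach the mapped IDs to a separate data table. The column names ("From", "To"), the layout of the table, and all identifiers are made up for illustration; a real download from the ID mapping service may look different.

```r
# Hypothetical mapping table in a tab-delimited "From"/"To" layout.
# All identifiers are invented placeholders, not real database IDs.
mappingText <- "From\tTo
REFSEQ_A\tUNIPROT_A
REFSEQ_B\tUNIPROT_B
REFSEQ_C\tUNIPROT_C"

idMap <- read.delim(text = mappingText, stringsAsFactors = FALSE)

# Our own data: keyed by the source IDs, in a different order, and
# including one ID that has no entry in the mapping table.
myData <- data.frame(refSeqID = c("REFSEQ_C", "REFSEQ_A", "REFSEQ_Z"),
                     score    = c(0.7, 0.2, 0.9),
                     stringsAsFactors = FALSE)

# match() returns, for each element of its first argument, the index of
# its first occurrence in the second argument, or NA if there is none.
idx <- match(myData$refSeqID, idMap$From)

# Use the index vector to pull the corresponding target IDs into our table.
# Rows without a mapping receive NA.
myData$uniProtID <- idMap$To[idx]

print(myData)
```

Because match() is vectorized, the same three lines work unchanged for tables with many thousands of rows. Unmapped IDs show up as NA and can be found with is.na(myData$uniProtID).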
Task:
- Open RStudio and load the ABC-units R project. If you have loaded it before, choose File → Recent projects → ABC-Units. If you have not loaded it before, follow the instructions in the RPR-Introduction unit.
- Choose Tools → Version Control → Pull Branches to fetch the most recent version of the project from its GitHub repository with all changes and bug fixes included.
- Type init() if requested.
- Open the file BIN-Data_integration.R and follow the instructions.
Note: take care that you understand all of the code in the script. Evaluation in this course is cumulative and you may be asked to explain any part of the code.
Task:
The biomartr package is a second-generation R interface to BioMart that extends the Bioconductor biomaRt package. It has a good quick start introduction to "Functional Annotation".
- Navigate to https://cran.r-project.org/web/packages/biomartr/vignettes/Functional_Annotation.html
- Work through the tutorial.
Further reading, links and resources
Xie & Ahn (2010) Statistical methods for integrating multiple types of high-throughput data. Methods Mol Biol 620:511-29. (pmid: 20652519)
Notes
About ...
Author:
- Boris Steipe <boris.steipe@utoronto.ca>
Created:
- 2017-08-05
Modified:
- 2020-09-24
Version:
- 1.1
Version history:
- 1.1 2020 Maintenance
- 1.0 First live version.
- 0.1 First stub
This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.