Expected Preparations:
Keywords: Integration of biological data; Identifier mapping; Entrez; UniProt; BioMart ID mapping service and match() function.
Objectives:
This unit will …
Outcomes:
After working through this unit you …
Deliverables:
Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don’t overlook these.
Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.
Evaluation: NA: This unit is not evaluated for course marks.
Data integration is a challenging problem. This unit discusses the issues involved and how the large databases address them: the NCBI with its Entrez system, and the EBI with the UniProt Knowledgebase and the BioMart system. R coding exercises put some of the technical issues into practice.
Task…
Task…
NP_010227 into the identifier field, select options from RefSeq Protein to UniProtKB and click Go.
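Once a mapping table has been retrieved from an ID mapping service, the translation itself can be done in base R with match(), as named in the keywords of this unit. The following is a minimal sketch, not the course script: the data frame idMap, its column names, and every identifier other than NP_010227 are hypothetical placeholders, and the UniProt values are not real accessions.

```r
# Toy lookup table. ASSUMPTION: all values in "uniprot" are placeholders,
# not real accessions; only NP_010227 is taken from the task above.
idMap <- data.frame(
  refseq  = c("NP_fake01", "NP_010227", "NP_fake03"),
  uniprot = c("UP_fake01", "UP_fake02", "UP_fake03"),
  stringsAsFactors = FALSE
)

# IDs we want to translate; the last one is absent from the table.
query <- c("NP_010227", "NP_fake03", "NP_missing")

# match() returns, for each element of query, the index of its first
# occurrence in idMap$refseq, or NA if it does not occur.
idx <- match(query, idMap$refseq)

# Use those indices to pull the corresponding target IDs.
# NA indices yield NA, so unmapped queries stay visible.
mapped <- idMap$uniprot[idx]
names(mapped) <- query

print(mapped)
```

Because match() reports only the first hit, this approach silently ignores one-to-many mappings. If a source ID can map to several target IDs, a join such as merge() may be the more appropriate tool.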
Task…
Load the ABC-units R project. If you have loaded it before, choose File ▸ Recent projects ▸ ABC-Units. If you have not loaded it before, follow the instructions in the RPR-Introduction unit.
Type init() if requested.
Open the file BIN-Data_integration.R and follow the instructions.
Note: take care that you understand all of the code in the script. Evaluation in this course is cumulative and you may be asked to explain any part of the code.
Task…
The biomartr package (distributed via CRAN, as the URL below shows) is a second-generation R interface to BioMart that extends the Bioconductor biomaRt package. It has a good quick-start introduction to “Functional Annotation”.
Navigate to: https://cran.r-project.org/web/packages/biomartr/vignettes/Functional_Annotation.html
Work through the tutorial.
UniProt - NCBI ID mapping - detailed information on how it works.
Xie, Yang and Chul Ahn (2010). “Statistical methods for integrating multiple types of high-throughput data”. Methods in Molecular Biology (Clifton, N.J.) 620:511–29. [PMID: 20652519] [DOI: 10.1007/978-1-60761-580-4_19]
If in doubt, ask! If anything about this content is not clear to you, do not proceed; ask for clarification. If you have ideas about how to make this material better, let’s hear them. We are aiming to compile a list of FAQs for all learning units, and your contributions will count towards your participation marks.
Improve this page! If you have questions or comments, please post them on the Quercus Discussion board with a subject line that includes the name of the unit.
[END]