BIN-EBI

From "A B C"

Revision as of 03:07, 3 October 2017

Databases and services at the EBI


 

Keywords: The EBI databases and services, UniProt

Caution!

This unit is under development. There is some content here, but it is incomplete and/or may change significantly: links may lead nowhere, the content is likely to be rearranged, and objectives, deliverables, etc. may be incomplete or missing. Do not work with this material until it has been updated to "live" status.


 


Abstract

...


 


This unit ...

Prerequisites

You need to complete the following units before beginning this one:


 


Objectives

...


 


Outcomes

...


 


Deliverables

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then record in your course journal the number of hours you estimated, the number of hours you actually worked on the unit, and the amount of time that passed between starting and completing it.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note on your insights! page.


 


Evaluation

Evaluation: Integrated Unit

This unit should be submitted for evaluation for a maximum of 10 marks. Details TBD.


 


Contents

The EBI (European Bioinformatics Institute, https://www.ebi.ac.uk/) is one of the two largest international providers of data for genomics and molecular biology (the NCBI is the other). It runs a cutting-edge data-management program at the largest scale, with a special focus on data integration and services; it makes data, services, and educational resources freely and openly available over the Internet; and it runs significant in-house research projects.

In this unit we explore some of the offerings of the EBI that can contribute to our objective of studying a particular gene in an organism of interest.


Task:

  • Read the introductory article on EBI databases and services:
Cook et al. (2016) The European Bioinformatics Institute in 2016: Data growth and integration. Nucleic Acids Res 44:D20-6. (pmid: 26673705)

New technologies are revolutionising biological research and its applications by making it easier and cheaper to generate ever-greater volumes and types of data. In response, the services and infrastructure of the European Bioinformatics Institute (EMBL-EBI, www.ebi.ac.uk) are continually expanding: total disk capacity increases significantly every year to keep pace with demand (75 petabytes as of December 2015), and interoperability between resources remains a strategic priority. Since 2014 we have launched two new resources: the European Variation Archive for genetic variation data and EMPIAR for two-dimensional electron microscopy data, as well as a Resource Description Framework platform. We also launched the Embassy Cloud service, which allows users to run large analyses in a virtual environment next to EMBL-EBI's vast public data resources.


 


Task:

  • Read the following articles on UniProt and InterPro:
The UniProt Consortium (2017) UniProt: the universal protein knowledgebase. Nucleic Acids Res 45:D158-D169. (pmid: 27899622)

The UniProt knowledgebase is a large resource of protein sequences and associated detailed annotation. The database contains over 60 million sequences, of which over half a million sequences have been curated by experts who critically review experimental and predicted data for each protein. The remainder are automatically annotated based on rule systems that rely on the expert curated knowledge. Since our last update in 2014, we have more than doubled the number of reference proteomes to 5631, giving a greater coverage of taxonomic diversity. We implemented a pipeline to remove redundant highly similar proteomes that were causing excessive redundancy in UniProt. The initial run of this pipeline reduced the number of sequences in UniProt by 47 million. For our users interested in the accessory proteomes, we have made available sets of pan proteome sequences that cover the diversity of sequences for each species that is found in its strains and sub-strains. To help interpretation of genomic variants, we provide tracks of detailed protein information for the major genome browsers. We provide a SPARQL endpoint that allows complex queries of the more than 22 billion triples of data in UniProt (http://sparql.uniprot.org/). UniProt resources can be accessed via the website at http://www.uniprot.org/.

Finn et al. (2017) InterPro in 2017-beyond protein family and domain annotations. Nucleic Acids Res 45:D190-D199. (pmid: 27899635)

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.

Task:

  1. In the BIN-Storing_data unit, you found the protein in YFO that is most similar to yeast Mbp1.
  2. Access the UniProt ID mapping service to retrieve the UniProt ID for this protein: paste its RefSeq ID, then choose RefSeq Protein as the From: option and UniProtKB as the To: option.
If the mapping works, the UniProt ID will be in the Entry: column of the table that is returned. Click the link and have a look at the UniProt entry page while you're there.
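The mapping in step 2 can also be scripted. The following is a minimal Python sketch, not part of the original unit, assuming UniProt's current REST ID-mapping endpoints under `https://rest.uniprot.org/idmapping` (these postdate this 2017 unit). It assumes that a job is submitted to `/run` with the form fields `from=RefSeq_Protein`, `to=UniProtKB` and `ids`, that `/status/{jobId}` is polled, and that a finished job redirects to its results page. The helper names `build_run_payload`, `entries_from_results` and `map_refseq_to_uniprot` are hypothetical; check the UniProt API documentation before relying on the field values.

```python
# A sketch only: assumes UniProt's current REST ID-mapping API at
# rest.uniprot.org (not the 2017 web form this unit describes).
import json
import time
import urllib.parse
import urllib.request

BASE = "https://rest.uniprot.org/idmapping"  # assumed endpoint root


def build_run_payload(refseq_ids):
    """Form-encode a RefSeq Protein -> UniProtKB job (assumed field values)."""
    return urllib.parse.urlencode({
        "from": "RefSeq_Protein",
        "to": "UniProtKB",
        "ids": ",".join(refseq_ids),
    }).encode("ascii")


def entries_from_results(results_json):
    """Return (RefSeq ID, UniProt accession) pairs from one results page."""
    pairs = []
    for hit in results_json.get("results", []):
        target = hit["to"]
        # Assumed: UniProtKB results carry full entries (dicts with a
        # "primaryAccession" key); plain results carry bare accessions.
        accession = target["primaryAccession"] if isinstance(target, dict) else target
        pairs.append((hit["from"], accession))
    return pairs


def map_refseq_to_uniprot(refseq_ids, poll_seconds=2.0, max_polls=30):
    """Submit a mapping job, poll until it finishes, and parse the results."""
    # Passing data= makes urlopen send a form-encoded POST request.
    with urllib.request.urlopen(f"{BASE}/run", data=build_run_payload(refseq_ids)) as resp:
        job_id = json.load(resp)["jobId"]
    for _ in range(max_polls):
        # Assumed: a finished job redirects from /status to its results page,
        # and urlopen follows that redirect, so the body has no "jobStatus".
        with urllib.request.urlopen(f"{BASE}/status/{job_id}") as resp:
            body = json.load(resp)
        if "jobStatus" not in body:
            return entries_from_results(body)
        if body["jobStatus"] not in ("NEW", "RUNNING"):
            raise RuntimeError(f"ID mapping job {job_id} ended with {body['jobStatus']}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"ID mapping job {job_id} did not finish in time")
```

Called with a list containing your protein's RefSeq ID, `map_refseq_to_uniprot` should return a list of `(RefSeq ID, UniProt accession)` pairs; the accession corresponds to the Entry: column of the web interface. The sketch uses only the standard library and reads only the first page of results, which is enough for a single ID.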




 


Further reading, links and resources

 


Notes


 


Self-evaluation

 



 




 

If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.



 

About ...
 
Author:

Boris Steipe <boris.steipe@utoronto.ca>

Created:

2017-08-05

Modified:

2017-08-07

Version:

0.1

Version history:

  • 0.1 First stub

This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.