BIN-Miscellaneous DB

From "A B C"
Revision as of 17:32, 7 September 2017 by Boris

Miscellaneous Databases for Bioinformatics


 

Keywords: SGD; HUGO HGNC; GeneCards; OMIM; STRING; ...


 



 


Caution!

This unit is under development. There is some content here, but it is incomplete and/or may change significantly: links may lead nowhere, the content is likely to be rearranged, and objectives, deliverables etc. may be incomplete or missing. Do not work with this material until it is updated to "live" status.


 


Abstract

This unit collects short explorations of a number of databases. It is probably best not to work through them all in one go, but to go through each one in the context of an actual use case, when you need information from that database.


 


This unit ...

Prerequisites

You need to complete the following units before beginning this one:


 


Objectives

...


 


Outcomes

...


 


Deliverables

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.


 


Evaluation

Evaluation: NA

This unit is not evaluated for course marks.


 


Contents

SGD - a Yeast Model Organism Database

Yeast happens to have a very well-maintained model organism database - a Web resource dedicated to Saccharomyces cerevisiae. Where such dedicated resources are available, they are very useful for the community. In the general case, however, we need to work with one of the large, general data providers - the NCBI and the EBI. But to get a sense of the type of data that is available, let's explore SGD.

Task:
Access the information page on Mbp1 at the Saccharomyces Genome Database.

  1. Browse through the Summary page and note the available information. You should see:
    • Information about the gene and the protein;
    • Information about its roles in the cell, curated at the Gene Ontology database;
    • Information about knock-out phenotypes; (Amazing. Would you have imagined that this is a non-essential gene?)
    • Information about protein-protein interactions;
    • Regulation and expression;
    • A curators' summary of our understanding of the protein. Mandatory reading.
    • And key references.
  2. Access the Protein tab and note the much more detailed information.
    • Domains and their classification;
    • Sequence;
    • Shared domains;
    • and much more...

You will notice that some of this information relates to the molecule itself, and some of it relates to its relationship with other molecules. Some of it is stored at SGD, and some of it is cross-referenced from other databases. And we have textual data, numeric data, and images.

How would you store such data to use it in your project?
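One possible starting point, sketched in Python under our own assumptions (the class and field names are hypothetical and do not mirror SGD's internal schema): keep data about the molecule itself separate from data about its relationships, and store cross-references as (database, accession) pairs that point to the external source instead of copying its content.

```python
from dataclasses import dataclass, field

# Hypothetical record types; names and fields are illustrative only.


@dataclass
class CrossReference:
    database: str    # name of the external database, e.g. "UniProt"
    accession: str   # identifier of the entry in that database


@dataclass
class Protein:
    """Data about the molecule itself."""
    name: str
    organism: str
    sequence: str = ""
    summary: str = ""                                  # textual data
    properties: dict = field(default_factory=dict)     # numeric data
    images: list = field(default_factory=list)         # file paths or URLs
    xrefs: list = field(default_factory=list)          # CrossReference objects


@dataclass
class Relationship:
    """Data about how two molecules relate to each other."""
    source: str
    target: str
    kind: str                                          # e.g. "physical interaction"


@dataclass
class Store:
    proteins: dict = field(default_factory=dict)       # name -> Protein
    relationships: list = field(default_factory=list)  # Relationship objects

    def add_protein(self, protein):
        self.proteins[protein.name] = protein

    def add_relationship(self, rel):
        # Refuse links to molecules that have not been added yet.
        missing = [n for n in (rel.source, rel.target) if n not in self.proteins]
        if missing:
            raise KeyError(f"unknown protein(s): {missing}")
        self.relationships.append(rel)

    def partners(self, name):
        """Names of all molecules linked to `name`, in sorted order."""
        return sorted({r.target if r.source == name else r.source
                       for r in self.relationships
                       if name in (r.source, r.target)})
```

For example, after adding `Protein("Mbp1", "Saccharomyces cerevisiae")` and a second protein, `add_relationship` records the link and `partners("Mbp1")` lists it. Keeping relationships in their own list, rather than inside each protein record, means an interaction is stored once instead of twice.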


 


 

If we were working on yeast, most of the data we need would be right here: curated, kept current and consistent, referenced to the literature and ready to use. But if you are working on a different species (some "YFO"), you need to integrate the data yourself, from sources such as the NCBI or UniProt. The upside is that most information of this kind is available for many, many species. The downside is that you have to integrate it from many different sources, essentially "by hand".
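Integrating "by hand" usually comes down to collecting records for the same gene from several sources and reconciling them. The minimal Python sketch below assumes that the hardest step, mapping identifiers between databases, has already been done, so every source uses a shared gene ID; the source names, IDs and fields in it are hypothetical. It records which source supplied each value, so that disagreements remain visible rather than being silently overwritten.

```python
def merge_records(sources):
    """Merge per-gene records from several sources, keeping provenance.

    sources: {source name: {gene ID: {field: value}}}
    returns: {gene ID: {field: [(value, source name), ...]}}
    """
    merged = {}
    for source_name, records in sources.items():
        for gene_id, fields in records.items():
            gene = merged.setdefault(gene_id, {})
            for key, value in fields.items():
                gene.setdefault(key, []).append((value, source_name))
    return merged


def conflicts(merged):
    """Return (gene ID, field) pairs for which the sources disagree.

    Values are compared via a set, so they must be hashable.
    """
    return [(gene_id, key)
            for gene_id, fields in merged.items()
            for key, values in fields.items()
            if len({value for value, _ in values}) > 1]


# Hypothetical records from two made-up sources.
example = {
    "sourceA": {"gene1": {"length": 100, "symbol": "sym1"}},
    "sourceB": {"gene1": {"length": 101, "symbol": "sym1"}},
}
merged = merge_records(example)
print(conflicts(merged))  # [('gene1', 'length')]
```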



 

The state of integrated online viewers for interaction data leaves room for improvement. Have a look at this article, which discusses the gap between what one would need to do and what is offered:

Jeanquartier et al. (2015) Integrated web visualizations for protein-protein interaction databases. BMC Bioinformatics 16:195. (pmid: 26077899)

BACKGROUND: Understanding living systems is crucial for curing diseases. To achieve this task we have to understand biological networks based on protein-protein interactions. Bioinformatics has come up with a great amount of databases and tools that support analysts in exploring protein-protein interactions on an integrated level for knowledge discovery. They provide predictions and correlations, indicate possibilities for future experimental research and fill the gaps to complete the picture of biochemical processes. There are numerous and huge databases of protein-protein interactions used to gain insights into answering some of the many questions of systems biology. Many computational resources integrate interaction data with additional information on molecular background. However, the vast number of diverse Bioinformatics resources poses an obstacle to the goal of understanding. We present a survey of databases that enable the visual analysis of protein networks. RESULTS: We selected M=10 out of N=53 resources supporting visualization, and we tested against the following set of criteria: interoperability, data integration, quantity of possible interactions, data visualization quality and data coverage. The study reveals differences in usability, visualization features and quality as well as the quantity of interactions. StringDB is the recommended first choice. CPDB presents a comprehensive dataset and IntAct lets the user change the network layout. A comprehensive comparison table is available via web. The supplementary table can be accessed on http://tinyurl.com/PPI-DB-Comparison-2015. CONCLUSIONS: Only some web resources featuring graph visualization can be successfully applied to interactive visual analysis of protein-protein interaction. Study results underline the necessity for further enhancements of visualization integration in biochemical analysis tools. Identified challenges are data comprehensiveness, confidence, interactive feature and visualization maturing.


 


The online resource that comes out best is the String database.

Task:

  • Navigate to the String database and search for Saccharomyces cerevisiae Mbp1 interactors.
  • Visualize the network. Add a few proteins by clicking the (+) button two or three times.
  • Click on a node to get a synopsis of its function.
  • Explore the "confidence", "evidence" and "actions" networks for the retrieved interactors.
  • Not all interacting proteins are also predicted to have a functional relationship with Mbp1. Do you agree?
  • Explore the clustering and layout options. Do you understand what they do?
  • Explore the views on:
    • Neighborhood (not relevant for our query, though)
    • Fusion (also not relevant for our query)
    • Occurrence
    • Coexpression
    • Experiments
    • Database, and
    • Textmining
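The same kind of query can also be run from a script. This minimal Python sketch assumes String's REST API as documented on its website at the time of writing: a "network" request of the form `https://string-db.org/api/tsv/network?identifiers=MBP1&species=4932` that returns tab-separated rows with a combined `score` and one column per evidence channel (`nscore`, `fscore`, `pscore`, `ascore`, `escore`, `dscore`, `tscore`). The URL pattern, the column names and the mapping of those columns to the view names listed above are assumptions to verify against the current API documentation. 4932 is the NCBI taxonomy ID for Saccharomyces cerevisiae.

```python
import csv
import io
import urllib.parse
import urllib.request

# Assumed base URL; the path pattern /api/<format>/<method> follows
# String's published API conventions (verify against the current docs).
STRING_API = "https://string-db.org/api"

# Assumed mapping of per-channel TSV columns to the web interface views.
CHANNELS = {
    "nscore": "Neighborhood",
    "fscore": "Fusion",
    "pscore": "Occurrence",
    "ascore": "Coexpression",
    "escore": "Experiments",
    "dscore": "Database",
    "tscore": "Textmining",
}


def network_url(identifiers, species=4932):
    """Build a 'network' query URL for one or more protein identifiers."""
    params = {
        "identifiers": "\r".join(identifiers),  # multiple IDs separated by %0D
        "species": species,                      # NCBI taxonomy ID
    }
    return f"{STRING_API}/tsv/network?{urllib.parse.urlencode(params)}"


def parse_network_tsv(text):
    """Parse TSV output into one dict per edge, with per-channel scores."""
    edges = []
    for record in csv.DictReader(io.StringIO(text), delimiter="\t"):
        edge = {
            "a": record["preferredName_A"],
            "b": record["preferredName_B"],
            "score": float(record["score"]),     # combined score
        }
        for column, view in CHANNELS.items():
            edge[view] = float(record.get(column) or 0)
        edges.append(edge)
    return edges


def fetch_network(identifiers, species=4932):
    """Download and parse the network (requires internet access)."""
    with urllib.request.urlopen(network_url(identifiers, species)) as response:
        return parse_network_tsv(response.read().decode("utf-8"))
```

Calling `fetch_network(["MBP1"])` returns one dictionary per edge; sorting the edges by their `"Occurrence"` or `"Coexpression"` value is one way to see which kind of evidence drives each predicted partnership.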

Each of these views corresponds to a method for predicting functional relationships. Figure out how each one contributes to the evidence for a functional interaction between Mbp1 and its predicted functional partners. I find the Occurrence view a unique and intriguing tool: visualizing in which organisms groups of genes are either all absent or all present makes it possible to quickly establish functional clusters.
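The idea behind the Occurrence view is known as phylogenetic profiling: genes that are consistently present together, or absent together, across many genomes are candidates for a functional link. The minimal Python sketch below groups genes whose presence/absence patterns are identical. The gene and organism names are hypothetical, and real methods generally use a similarity measure that tolerates a few mismatches rather than requiring an exact match.

```python
from collections import defaultdict


def group_by_profile(presence):
    """Group genes that share an identical presence/absence profile.

    presence: {gene: {organism: True if an ortholog is present}}
    Organisms missing from a gene's dict count as "absent".
    Returns sorted groups of two or more genes.
    """
    organisms = sorted({org for profile in presence.values() for org in profile})
    groups = defaultdict(list)
    for gene, profile in presence.items():
        # Encode the profile as a 0/1 tuple in a fixed organism order.
        key = tuple(int(profile.get(org, False)) for org in organisms)
        groups[key].append(gene)
    return sorted(sorted(genes) for genes in groups.values() if len(genes) > 1)


# Hypothetical profiles across three hypothetical organisms.
profiles = {
    "geneA": {"org1": True, "org2": True, "org3": False},
    "geneB": {"org1": True, "org2": True, "org3": False},
    "geneC": {"org1": False, "org2": True, "org3": True},
}
print(group_by_profile(profiles))  # [['geneA', 'geneB']]
```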

In summary, String is a convincingly well-built tool for exploring functional relationships between proteins.
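String's "confidence" view draws edges according to a combined score computed from the individual evidence channels; String describes this score as combining the channel probabilities and correcting for the probability of observing an association at random. The minimal Python sketch below shows only the core combination step. It assumes the channels are independent (a so-called noisy-OR) and omits String's random-association correction, so its values will not match String's exactly.

```python
from math import prod


def combine_scores(channel_scores):
    """Combine per-channel probabilities into one score (noisy-OR).

    Assumes the evidence channels are independent: the combined score is
    the probability that not every channel is wrong, 1 - prod(1 - s_i).
    Unlike String's own scoring, no correction for random association is applied.
    """
    scores = list(channel_scores)
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score {s} is outside [0, 1]")
    return 1.0 - prod(1.0 - s for s in scores)


# Two moderately supportive channels reinforce each other.
print(combine_scores([0.5, 0.5]))  # 0.75
```

Under this combination, an additional independent line of evidence can only raise the combined score, and a single channel with a score of 1 forces the result to 1.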


 



 


Further reading, links and resources

 


Notes


 


Self-evaluation

 



 




 

If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.



 

About ...
 
Author:

Boris Steipe <boris.steipe@utoronto.ca>

Created:

2017-08-05

Modified:

2017-08-05

Version:

0.1

Version history:

  • 0.1 First stub

This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License.