Difference between revisions of "BIO Assignment Week 11"

Revision as of 11:08, 19 September 2016

Assignment for Week 11
Protein-Protein Interactions

Note! This assignment is currently inactive. Major and minor unannounced changes may be made at any time.

Concepts and activities (and reading, if applicable) for this assignment will be topics on next week's quiz.

Data Sources

Interaction databases face problems similar to those of sequence databases: the need for standards for abstracting biological concepts into computable objects, data integrity, search and retrieval, and the metrics of comparison. There is, however, an added complication: interactions are rarely all-or-none, and the high-throughput experimental methods have large false-positive and false-negative rates. This makes it necessary to define confidence scores for interactions. On top of experimental methods, there is also a variety of methods for computational interaction prediction. However, even though the "gold standard" is the careful, small-scale laboratory experiment, different curation efforts on the same experimental publication usually lead to different results - with as little as 42% overlap between databases being reported.
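To make the idea of a confidence score concrete, here is a minimal sketch of one simple way to combine several imperfect observations of the same interaction: a "noisy-OR", which assumes that each observation is independently correct with some probability. The function name, the independence assumption and the example values are illustrative assumptions only. This is not the scoring scheme of any particular database.

```python
from math import prod

def combined_confidence(scores):
    """Noisy-OR: probability that at least one of several independent
    lines of evidence is correct. `scores` are probabilities in [0, 1]."""
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    # 1 minus the probability that every single observation is wrong.
    return 1.0 - prod(1.0 - s for s in scores)

# Hypothetical reliabilities of two experiments reporting the same interaction.
print(round(combined_confidence([0.5, 0.5]), 3))  # 0.75
```

Two weak observations add up to a stronger one, but only if they really are independent. Two database entries curated from the same publication are not.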

Currently, the best integrated protein-protein interaction database is probably iRefWeb, built on the iRefIndex (which, incidentally, is available via an R package on CRAN). Funding and support for interaction databases are very patchy, and we have seen far, far too many promising resources fall into irrelevance for lack of updating. iRef is a case in point. The Wodak lab's iRefWeb represents iRefIndex version 13 - from 2013. iRefIndex has since been updated to version 14.0 (in 2015), but Ian Donaldson, who built this resource, is now a freelance research scientist...

Another excellent database - and perhaps the one with the most stable, continuous curation effort - is the EBI's IntAct database.

Task:

Find interactors for yeast Mbp1 in both iRefWeb and IntAct.
Are they largely the same?
The various visualization options of iRefWeb are currently not working, but the ones at IntAct are. Click on the Graph tab.
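Once you have copied the two interactor lists out of the web pages, the question "Are they largely the same?" can be answered with a few set operations. The following is a minimal sketch. The function names are made up for this example, the identifiers are placeholders rather than real query results, and matching by upper-cased name is an assumption: in practice the two databases may label the same protein differently, and you may need to map identifiers first.

```python
def normalize(names):
    """Upper-case and strip identifiers so that trivial spelling
    differences do not count as disagreements."""
    return {n.strip().upper() for n in names if n.strip()}

def compare_interactors(a, b):
    """Return shared and unique interactors plus the Jaccard index."""
    a, b = normalize(a), normalize(b)
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }

# Placeholder identifiers, NOT real database output.
irefweb_hits = ["geneA", "geneB", "geneC"]
intact_hits = ["GENEB", "geneC ", "geneD"]

result = compare_interactors(irefweb_hits, intact_hits)
print(result["shared"], result["jaccard"])  # ['GENEB', 'GENEC'] 0.5
```

The Jaccard index (shared divided by union) is only one of several possible overlap measures. It is used here because it does not depend on which database you list first.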

Then what?

If you are like me, you would now like to be able to link expression profiles, information about known complexes, GO annotations, knock-out phenotypes etc. etc. Too bad.

Data visualization and analysis

If you are serious about working with interaction networks, sooner or later you will be working with Cytoscape. It is more or less the standard among "professional" systems biologists. But it is not an online tool.

Task:

Navigate to the Cytoscape homepage and inform yourself about what the program does and how to install it. Many tutorials are available online, but this is software that needs to be downloaded and installed, and it definitely has a learning curve.

The state of integrated online interaction viewers these days is actually pretty dismal. Have a look at this article that discusses the gap between what one would need to do, and what is offered:

Jeanquartier et al. (2015) Integrated web visualizations for protein-protein interaction databases. BMC Bioinformatics 16:195. (pmid: 26077899)

[ PubMed ] [ DOI ] BACKGROUND: Understanding living systems is crucial for curing diseases. To achieve this task we have to understand biological networks based on protein-protein interactions. Bioinformatics has come up with a great amount of databases and tools that support analysts in exploring protein-protein interactions on an integrated level for knowledge discovery. They provide predictions and correlations, indicate possibilities for future experimental research and fill the gaps to complete the picture of biochemical processes. There are numerous and huge databases of protein-protein interactions used to gain insights into answering some of the many questions of systems biology. Many computational resources integrate interaction data with additional information on molecular background. However, the vast number of diverse Bioinformatics resources poses an obstacle to the goal of understanding. We present a survey of databases that enable the visual analysis of protein networks. RESULTS: We selected M=10 out of N=53 resources supporting visualization, and we tested against the following set of criteria: interoperability, data integration, quantity of possible interactions, data visualization quality and data coverage. The study reveals differences in usability, visualization features and quality as well as the quantity of interactions. StringDB is the recommended first choice. CPDB presents a comprehensive dataset and IntAct lets the user change the network layout. A comprehensive comparison table is available via web. The supplementary table can be accessed on http://tinyurl.com/PPI-DB-Comparison-2015. CONCLUSIONS: Only some web resources featuring graph visualization can be successfully applied to interactive visual analysis of protein-protein interaction. Study results underline the necessity for further enhancements of visualization integration in biochemical analysis tools. Identified challenges are data comprehensiveness, confidence, interactive feature and visualization maturing.

The online resource that comes out as the best is the one at the String database.

Task:

Navigate to the String database and search for Saccharomyces cerevisiae Mbp1 interactors.
Visualize the network. Add a few proteins by clicking the (+) button two or three times.
Click on a node to get a synopsis of its function.
Explore the "confidence", "evidence" and "actions" networks for the retrieved interactors.
Not all interacting proteins are also predicted to have a functional relationship with Mbp1. Do you agree?
Explore the clustering and layout options. Do you understand what they do?
Explore the Views on

Neighborhood (not relevant for our query though)
Fusion (also not relevant for our query)
Occurrence
Coexpression
Experiments
Database, and
Textmining

Each of these is a method for predicting functional relationships. Figure out how each one contributes to evidence of a functional interaction between Mbp1 and its predicted functional partners. I find the Occurrence view a unique and intriguing tool: visualizing in which organisms groups of genes are either all absent or all present allows one to quickly establish functional clusters.
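The idea behind the Occurrence view can be sketched in a few lines: genes that are present in exactly the same set of genomes get grouped together. This is a toy illustration under stated assumptions. The function name, the gene names and the species labels are hypothetical, and requiring identical profiles is a deliberate simplification. I do not claim that String computes its view this way; a real analysis would more likely use a similarity measure between profiles.

```python
from collections import defaultdict

def cluster_by_profile(presence):
    """Group genes whose presence/absence pattern across genomes is identical.

    `presence` maps gene name -> set of genomes in which an ortholog is found.
    Returns the groups with at least two genes, each sorted alphabetically.
    """
    groups = defaultdict(list)
    for gene, genomes in presence.items():
        # frozenset makes the profile hashable, so it can serve as a dict key.
        groups[frozenset(genomes)].append(gene)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)

# Hypothetical genes and genomes, for illustration only.
toy = {
    "gene1": {"sp1", "sp2"},
    "gene2": {"sp1", "sp2"},
    "gene3": {"sp3"},
    "gene4": {"sp1", "sp2", "sp3"},
}
print(cluster_by_profile(toy))  # [['gene1', 'gene2']]
```

Here gene1 and gene2 are "all present" in sp1 and sp2 and "all absent" in sp3, so they end up in one candidate functional cluster. gene4 is present everywhere and therefore tells us nothing specific.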

In summary, String is a convincingly well-built tool for exploring functional relationships between proteins.

Links and resources

Razick et al. (2008) iRefIndex: a consolidated protein interaction database with provenance. BMC Bioinformatics 9:405. (pmid: 18823568)

[ PubMed ] [ DOI ] BACKGROUND: Interaction data for a given protein may be spread across multiple databases. We set out to create a unifying index that would facilitate searching for these data and that would group together redundant interaction data while recording the methods used to perform this grouping. RESULTS: We present a method to generate a key for a protein interaction record and a key for each participant protein. These keys may be generated by anyone using only the primary sequence of the proteins, their taxonomy identifiers and the Secure Hash Algorithm. Two interaction records will have identical keys if they refer to the same set of identical protein sequences and taxonomy identifiers. We define records with identical keys as a redundant group. Our method required that we map protein database references found in interaction records to current protein sequence records. Operations performed during this mapping are described by a mapping score that may provide valuable feedback to source interaction databases on problematic references that are malformed, deprecated, ambiguous or unfound. Keys for protein participants allow for retrieval of interaction information independent of the protein references used in the original records. CONCLUSION: We have applied our method to protein interaction records from BIND, BioGrid, DIP, HPRD, IntAct, MINT, MPact, MPPI and OPHID. The resulting interaction reference index is provided in PSI-MITAB 2.5 format at http://irefindex.uio.no. This index may form the basis of alternative redundant groupings based on gene identifiers or near sequence identity groupings.
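The key idea in this abstract, identical keys for records that refer to the same sequences and taxonomy identifiers, can be sketched as follows. This is a minimal illustration, not the iRefIndex implementation: the function names, the choice of SHA-1 with base64 encoding, the sequence normalization and the way participant keys are concatenated are all assumptions made for the example. The sequences are made up.

```python
import base64
import hashlib

def protein_key(sequence, taxon_id):
    """Key for one protein: hash of the normalized sequence plus taxon ID."""
    seq = "".join(sequence.split()).upper()  # drop whitespace, ignore case
    digest = hashlib.sha1(seq.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=") + str(taxon_id)

def interaction_key(participants):
    """Key for an interaction: hash of the sorted participant keys, so the
    order in which a source database lists the proteins does not matter."""
    keys = sorted(protein_key(seq, tax) for seq, tax in participants)
    digest = hashlib.sha1("".join(keys).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")

# Made-up toy sequences; 4932 is the NCBI taxonomy ID of S. cerevisiae.
a = ("MKTAYIAK", 4932)
b = ("mkvl tsp", 4932)
print(interaction_key([a, b]) == interaction_key([b, a]))  # True
```

Because the key depends only on sequence and taxonomy identifier, two source databases that refer to the same protein by different accession numbers would still produce the same key.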

Mora & Donaldson (2011) iRefR: an R package to manipulate the iRefIndex consolidated protein interaction database. BMC Bioinformatics 12:455. (pmid: 22115179)

[ PubMed ] [ DOI ] BACKGROUND: The iRefIndex addresses the need to consolidate protein interaction data into a single uniform data resource. iRefR provides the user with access to this data source from an R environment. RESULTS: The iRefR package includes tools for selecting specific subsets of interest from the iRefIndex by criteria such as organism, source database, experimental method, protein accessions and publication identifier. Data may be converted between three representations (MITAB, edgeList and graph) for use with other R packages such as igraph, graph and RBGL. The user may choose between different methods for resolving redundancies in interaction data and how n-ary data is represented. In addition, we describe a function to identify binary interaction records that possibly represent protein complexes. We show that the user choice of data selection, redundancy resolution and n-ary data representation all have an impact on graphical analysis. CONCLUSIONS: The package allows the user to control how these issues are dealt with and communicate them via an R-script written using the iRefR package - this will facilitate communication of methods, reproducibility of network analyses and further modification and comparison of methods by researchers.

Footnotes and references

Ask, if things don't work for you!

If anything about the assignment is not clear to you, please ask on the mailing list. You can be certain that others will have had similar problems. Success comes from joining the conversation.

Do consider how to ask your questions so that a meaningful answer is possible:
- How to create a Minimal, Complete, and Verifiable example on stackoverflow and ...
- How to make a great R reproducible example are required reading.

< Assignment 10

@@ Line 9: / Line 9: @@
 </tr></table>
-{{Template:Active}}
+{{Template:Inactive}}
 Concepts and activities (and reading, if applicable) for this assignment will be topics on next week's quiz.
