BIO Assignment Week 11

Assignment for Week 11
Protein-Protein Interactions

Note! This assignment is currently active. All significant changes will be announced on the mailing list.

Concepts and activities (and reading, if applicable) for this assignment will be topics on next week's quiz.

Data Sources

Interaction databases share many of the problems of sequence databases: the need for standards for abstracting biological concepts into computable objects, data integrity, search and retrieval, and metrics of comparison. There is, however, an added complication: interactions are rarely all-or-none, and high-throughput experimental methods have large false-positive and false-negative rates. This makes it necessary to define confidence scores for interactions. In addition to experimental methods, there is also a variety of methods for computational interaction prediction. However, even though careful, small-scale laboratory experiments are the "gold standard", different curation efforts on the same experimental publication usually lead to different results, with as little as 42% overlap between databases being reported.
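To make the idea of "overlap between databases" concrete, here is a minimal sketch in Python. It assumes that overlap is measured as the Jaccard index over unordered interaction pairs, which is only one possible definition and not necessarily the one behind the 42% figure. The protein names and both interaction lists are invented for illustration.

```python
def normalize(pairs):
    """Treat each interaction as an unordered pair, so (A, B) equals (B, A)."""
    return {frozenset(pair) for pair in pairs}


def jaccard_overlap(pairs_a, pairs_b):
    """Shared interactions divided by all interactions seen in either set."""
    a, b = normalize(pairs_a), normalize(pairs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Invented data: two hypothetical curation efforts on the same publication.
db_one = [("P1", "P2"), ("P1", "P3"), ("P2", "P4")]
db_two = [("P2", "P1"), ("P3", "P1"), ("P1", "P5")]

print(f"overlap: {jaccard_overlap(db_one, db_two):.2f}")  # overlap: 0.50
```

Normalizing to unordered pairs matters: without it, the same interaction recorded as (A, B) in one database and (B, A) in the other would be counted as a disagreement.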

Currently, likely the best integrated protein-protein interaction database is iRefWeb, built on the iRefIndex (which, incidentally, is available via an R package on CRAN). Funding and support for interaction databases are very patchy, and we have seen far, far too many promising resources fall into irrelevance for lack of updating. iRef is a case in point. The Wodak lab's iRefWeb represents iRefIndex version 13, from 2013. iRefIndex has since been updated to version 14.0 (in 2015), but Ian Donaldson, who built this resource, is now a freelance research scientist...

Another excellent database, and perhaps the one with the most stable, continuous curation effort, is the EBI's IntAct database.

Task:

Find interactors for yeast Mbp1 in both iRefWeb and IntAct.
Are they largely the same?
The various visualization options of iRefWeb are currently not working.
... but the ones at IntAct are. Click on the Graph tab.
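If you would rather compare interactor lists programmatically than by eye, the following is a minimal sketch. It assumes the interactions have been exported in the tab-separated PSI-MI TAB 2.5 layout, in which the first two columns hold the interactor identifiers and the fifteenth holds confidence values. The score label "intact-miscore" is treated here as an assumption, and all identifiers and scores in the sample are invented placeholders, not real records.

```python
import csv
import io

CONFIDENCE_COLUMN = 14  # 15th column of PSI-MI TAB 2.5, zero-based


def make_row(id_a, id_b, confidence):
    """Build one 15-column MITAB-style line; unused columns are set to '-'."""
    return "\t".join([id_a, id_b] + ["-"] * 12 + [confidence])


# Invented rows: identifiers and scores are placeholders, not database records.
SAMPLE = "\n".join([
    make_row("uniprotkb:QUERY1", "uniprotkb:PARTNER1", "intact-miscore:0.76"),
    make_row("uniprotkb:PARTNER2", "uniprotkb:QUERY1", "intact-miscore:0.40"),
    make_row("uniprotkb:QUERY1", "uniprotkb:PARTNER1",
             "author score:3|intact-miscore:0.55"),
    make_row("uniprotkb:OTHER1", "uniprotkb:OTHER2", "intact-miscore:0.90"),
])


def parse_score(field, label="intact-miscore"):
    """Return the value stored under `label` in a '|'-separated field, or None."""
    for item in field.split("|"):
        name, _, value = item.partition(":")
        if name == label:
            try:
                return float(value)
            except ValueError:
                return None
    return None


def interactors(mitab_text, query):
    """Map each partner of `query` to the highest confidence reported for it."""
    best = {}
    reader = csv.reader(io.StringIO(mitab_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) <= CONFIDENCE_COLUMN or row[0].startswith("#"):
            continue  # skip header lines and malformed rows
        id_a, id_b = row[0], row[1]
        if query not in (id_a, id_b):
            continue
        partner = id_b if id_a == query else id_a
        score = parse_score(row[CONFIDENCE_COLUMN])
        previous = best.get(partner)
        if partner not in best or (
            score is not None and (previous is None or score > previous)
        ):
            best[partner] = score
    return best


found = interactors(SAMPLE, "uniprotkb:QUERY1")
for partner, score in sorted(found.items(), key=lambda kv: kv[1] or 0.0,
                             reverse=True):
    print(partner, score)
```

Running the same function on exports from two databases and comparing the resulting key sets is one way to answer "are they largely the same?", provided both exports use the same identifier type.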

Then what?

If you are like me, you would now like to be able to link expression profiles, information about known complexes, GO annotations, knock-out phenotypes etc. etc. Too bad.

Data visualization and analysis

If you are serious about working with interaction networks, sooner or later you will be working with Cytoscape. It is more or less the standard among "professional" systems biologists. But it is not an online tool.

Task:

Navigate to the Cytoscape homepage and inform yourself about what the program does and how to install it. Many tutorials are available online. But this is software that needs to be downloaded and installed, and it definitely has a learning curve.

The state of integrated online interaction viewers these days is actually pretty dismal. Have a look at this article that discusses the gap between what one would need to do, and what is offered:

Jeanquartier et al. (2015) Integrated web visualizations for protein-protein interaction databases. BMC Bioinformatics 16:195. (pmid: 26077899)

Abstract:

BACKGROUND: Understanding living systems is crucial for curing diseases. To achieve this task we have to understand biological networks based on protein-protein interactions. Bioinformatics has come up with a great amount of databases and tools that support analysts in exploring protein-protein interactions on an integrated level for knowledge discovery. They provide predictions and correlations, indicate possibilities for future experimental research and fill the gaps to complete the picture of biochemical processes. There are numerous and huge databases of protein-protein interactions used to gain insights into answering some of the many questions of systems biology. Many computational resources integrate interaction data with additional information on molecular background. However, the vast number of diverse Bioinformatics resources poses an obstacle to the goal of understanding. We present a survey of databases that enable the visual analysis of protein networks.

RESULTS: We selected M=10 out of N=53 resources supporting visualization, and we tested against the following set of criteria: interoperability, data integration, quantity of possible interactions, data visualization quality and data coverage. The study reveals differences in usability, visualization features and quality as well as the quantity of interactions. StringDB is the recommended first choice. CPDB presents a comprehensive dataset and IntAct lets the user change the network layout. A comprehensive comparison table is available via web. The supplementary table can be accessed on http://tinyurl.com/PPI-DB-Comparison-2015.

CONCLUSIONS: Only some web resources featuring graph visualization can be successfully applied to interactive visual analysis of protein-protein interaction. Study results underline the necessity for further enhancements of visualization integration in biochemical analysis tools. Identified challenges are data comprehensiveness, confidence, interactive feature and visualization maturing.

The online resource that comes out as the best is the one at the String database.

Task:

Navigate to the String database and search for Saccharomyces cerevisiae Mbp1 interactors.
Visualize the network. Add a few proteins by clicking the (+) button two or three times.
Click on a node to get a synopsis of its function.
Explore the "confidence", "evidence" and "actions" networks for the retrieved interactors.
Not all interacting proteins are also predicted to have a functional relationship with Mbp1. Do you agree?
Explore the clustering and layout options. Do you understand what they do?
Explore the views on:

Neighborhood (not relevant for our query though)
Fusion (also not relevant for our query)
Occurrence
Coexpression
Experiments
Database, and
Textmining

Each of these is a method for predicting functional relationships. Figure out how each one contributes to evidence of a functional interaction between Mbp1 and its predicted functional partners. I find the Occurrence view a unique and intriguing tool: visualizing in which organisms groups of genes are either all absent or all present allows one to quickly establish functional clusters.
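The idea behind the Occurrence view can be sketched in a few lines of Python. This is a simplified illustration, not String's actual scoring: it assumes each gene is described by a presence (1) or absence (0) call per genome, groups genes with identical patterns, and measures how often two patterns agree. Gene names and profiles are invented.

```python
from collections import defaultdict

# Invented presence (1) / absence (0) calls across five hypothetical genomes.
PROFILES = {
    "geneA": (1, 1, 0, 0, 1),
    "geneB": (1, 1, 0, 0, 1),
    "geneC": (1, 1, 0, 1, 1),
    "geneD": (0, 0, 1, 1, 0),
    "geneE": (0, 0, 1, 1, 0),
}


def group_by_profile(profiles):
    """Return groups of genes that share an identical presence/absence pattern."""
    groups = defaultdict(list)
    for gene, profile in profiles.items():
        groups[tuple(profile)].append(gene)
    return [sorted(genes) for genes in groups.values() if len(genes) > 1]


def agreement(p, q):
    """Fraction of genomes in which two genes are both present or both absent."""
    if len(p) != len(q) or not p:
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(x == y for x, y in zip(p, q)) / len(p)


print(group_by_profile(PROFILES))  # [['geneA', 'geneB'], ['geneD', 'geneE']]
print(agreement(PROFILES["geneA"], PROFILES["geneC"]))  # 0.8
```

Genes that are always gained and lost together, like geneA and geneB here, are candidates for a shared function. A near match such as geneC shows why a graded similarity measure is more useful in practice than exact grouping.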

In summary, String is a convincingly well built tool to explore functional relationships between proteins.

Links and resources

Razick et al. (2008) iRefIndex: a consolidated protein interaction database with provenance. BMC Bioinformatics 9:405. (pmid: 18823568)

Mora & Donaldson (2011) iRefR: an R package to manipulate the iRefIndex consolidated protein interaction database. BMC Bioinformatics 12:455. (pmid: 22115179)

Footnotes and references

Ask, if things don't work for you!

If anything about the assignment is not clear to you, please ask on the mailing list. You can be certain that others will have had similar problems. Success comes from joining the conversation.

Do consider how to ask your questions so that a meaningful answer is possible:
- How to create a Minimal, Complete, and Verifiable example on stackoverflow and ...
- How to make a great R reproducible example are required reading.
