Information Sources for Bioinformatics

Contents
- Bioinformatics: Concepts
- Information Sources
- Questions, comments
- References

Expected Preparations:

Begin right away: This unit needs no specific preparations.

Keywords: Wikipedia; NAR; Bioinformatics.ca; PubMed; Citation index

Objectives:

Get a sense for the academic field of bioinformatics and its community.

Outcomes:

You are familiar with the sites that facilitate much of the ongoing discussion in the field.

Deliverables:

Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.

Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don’t overlook these.

Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

Evaluation:

NA: This unit is not evaluated for course marks.

This unit introduces key information sources for bioinformatics: journals, forums, and supporting sites.

Bioinformatics: Concepts

Let us begin with some general observations on bioinformatics, to set the stage.

Molecular biology is an information science, just as much as a molecular science. The 20th century has seen profound advances in our knowledge of life’s processes, and our understanding of its principles, and we have discovered that at its basis there is a flow of pure information.

Information is stored as nucleotide sequences in DNA, expressed and regulated as messenger RNA, and translated into sequences of amino acids. Randomly configured polypeptide chains spontaneously fold into a defined three-dimensional structure along a gradient that minimizes their free energy. That structure is the basis of a protein’s function.

Changes in the nucleotide sequence can thus entail changes in protein function, and protein function determines which sequences are more or less likely to be propagated in a population. This causes the nucleotide sequences to evolve over time.
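
As a toy illustration of this flow of information, the following Python sketch transcribes a short DNA coding strand, translates it with the standard genetic code, and compares the peptides obtained after single-nucleotide changes. The example sequences and the function names transcribe and translate are made up for illustration; this is a minimal sketch, not a substitute for an established sequence-analysis library.

```python
from itertools import product

# Standard genetic code in the conventional compact layout:
# codons enumerated with bases in the order T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA sequence for a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA sequence codon by codon until the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # the table is keyed by DNA codons
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Hypothetical example sequences, chosen only for illustration.
    original = "ATGGCCAAGTAA"
    missense = "ATGGTCAAGTAA"  # GCC -> GTC changes the encoded amino acid
    silent = "ATGGCTAAGTAA"    # GCC -> GCT encodes the same amino acid
    for label, dna in [("original", original), ("missense", missense), ("silent", silent)]:
        print(label, dna, translate(transcribe(dna)))
```

In this sketch the first substitution replaces an alanine with a valine, while the second leaves the peptide unchanged: not every change in the nucleotide sequence changes the protein.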

Abstractions and models that focus on inheritable information, rather than on the details of its representation, have proven to be remarkably powerful in explaining the basic features of life, such as robust self-organization and the process, and consequences, of evolution.

Proteins assemble into complexes, complexes form pathways, pathways build cells, cells make organisms. Non-reducible complexity arises at every level.

In principle, all the information that is required to specify an organism is contained in its genome. This is trivially implied by the successful whole-genome synthesis experiments. The genome itself can be fully sequenced; therefore, the information it contains is easily accessible to us. However, the expression of the information is organized in a hierarchical fashion, in complex, interacting subsystems. Knowledge of a DNA sequence does not (yet) allow us to predict the encoded protein’s structure. Knowledge of a protein’s structure does not (yet) allow us to predict its interactions and assembly into molecular “machines”. Knowledge of these complexes does not (yet) allow us to piece together their functional connections, as they build up the metabolic or regulatory systems, or the structural framework of a cell.

At each level, incomplete information prevents us from predicting the next-higher level of organization from its components. The sheer volume of data - though it is indeed challenging - is merely a technical obstacle.

When studying molecular biology with computational tools, we are reminded that knowledge in science is always dynamic. The field changes, and the methods change as well. Once we solve a problem, we move on to the next one. Our targets are moving. The information in bioinformatics may have a half-life of only two years or so. Therefore: learning how to learn is as important as learning any particular fact.

What is Bioinformatics?

As I see it, Bioinformatics is a set of paradigms that can be roughly said to span two poles:

Data management is the fundamental task of bioinformatics.

This is the pole of bioinformatics. If we look at the practice of bioinformatics, with its many on-line databases that need to be curated, integrated, kept consistent and made queryable, and with the industrial-scale databases of genome sequences, we might conclude that biological data management is what bioinformatics is all about.

Important examples of such data resources include the US NCBI (National Center for Biotechnology Information) and the EBI (European Bioinformatics Institute) as major centres for molecular data, especially sequence and genome data. But their multitude of services, data, and tools needs to be constructed, populated and maintained - and used judiciously.
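
Resources like these can usually be queried programmatically as well as through a browser. As a minimal sketch, assuming the NCBI E-utilities ESearch endpoint and its db, term, retmode and retmax parameters, the following Python code builds a PubMed search URL and, optionally, retrieves the matching record identifiers. The function names build_pubmed_query and search_pubmed are hypothetical, and the code only illustrates the idea; consult the NCBI documentation for current usage policies and rate limits before sending requests.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: NCBI E-utilities ESearch.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(term: str, max_results: int = 5) -> str:
    """Build an ESearch URL that searches PubMed for the given term."""
    params = {
        "db": "pubmed",         # database to search
        "term": term,           # query string, as typed into the PubMed search box
        "retmode": "json",      # ask for a JSON response
        "retmax": max_results,  # maximum number of identifiers to return
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def search_pubmed(term: str, max_results: int = 5):
    """Send the query (network access required); return (total hits, list of PubMed IDs)."""
    with urlopen(build_pubmed_query(term, max_results), timeout=30) as response:
        result = json.load(response)["esearchresult"]
    return int(result["count"]), result["idlist"]
```

Calling search_pubmed("bioinformatics") would then return the total number of matching PubMed records and the first few identifiers. Building the URL in a separate function makes it easy to inspect the query before anything is sent.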

The sheer number of information sources is not even the only problem! Other challenges for the computational biologist include …

Data overload - the number of freely available public databases runs into the thousands;
Service overload - the number of free online services runs into the thousands;
Poor integration of databases that are competing for recognition and funding;
Lack of peer review and quality control;
Cultural gap between the life sciences and computer science.

To find direction, it is important not to focus too much on methods, but to ask: How can bioinformatics help to understand biology? That is, the question is not “What can you do?” but “What should you do?”

Data does not explain itself. While we must apply modern concepts and tools to manage our data, the crucial endpoint is always our interpretation in the context of biological insight. Keeping up with the tools, however, is a challenge. What is the best available data repository? What is the best available tool to search for information? What is the best way to access data? It is hard to compare data resources, for example, to rank the quality of their curation. And by the time we have reviewed all relevant databases, many will already have become obsolete. This problem holds for the data sources, as well as for analysis tools and services. To survive in this domain, we need to focus on objectives, not on methodology.

Therefore, we could say …

Modeling is the fundamental task of bioinformatics.

This is the pole of computational biology. Looking beyond data management, we can take bioinformatics as a way to study biology. This aspect has a lot more to do with modeling, and with the question of understanding biology, than with managing large amounts of data. Understanding biology means being able to abstract from apparent complexity, in order to interpret observations in the framework of simple, fundamental principles. Such understanding should allow us to make precise, confident predictions.

This involves abstraction, and working with abstractions means we are working with models.

Information Sources

On the Web…

Google’s search algorithm often understands what I am looking for better than I could express it in my query. It’s a remarkable piece of engineering. Try: bioinformatics or “computational biology”. Among the hits I see the Wikipedia article on bioinformatics, and a Nature subjects page on the topic, which compiles recently published articles into a single access point.

Many more links to resources are listed below.

Journals

Task…

Visit the Nucleic Acids Research (NAR) journal site and find the current database issue and the Web service issue.
Find at least one database and one service that interest you, visit them and poke around. You should aim to develop an intuition for what to expect from such resources and how to use the services.

Incidentally: you can subscribe to regular Table of Contents updates from any journal. Nature and Science should for sure be in your inbox, but subscribe to some of the bioinformatics journal alerts too - such as Bioinformatics - at least for this term.

Among the top-ranked journals (according to ISI), we have PLoS Computational Biology, BMC Bioinformatics, npj Systems Biology and Applications, and Briefings in Bioinformatics. Explore.

Forums

Much current, active exchange of bioinformatics knowledge happens on non-traditional platforms:

Task…

Visit each of the forums below and find (at least) one item that interests you.

BioStars: General bioinformatics, computational-, and systems biology questions (timesink warning!)

Reddit: the bioinformatics “subreddit” (timesink warning!)

R-help: The R programming language

Stack Overflow: R-related questions

Bioconductor Support: for all questions about the Bioconductor Project

Cross Validated: statistics-related questions on Stack Exchange

Questions, comments

If in doubt, ask! If anything about this content is not clear to you, do not proceed but ask for clarification. If you have ideas about how to make this material better, let’s hear them. We are aiming to compile a list of FAQs for all learning units, and your contributions will count towards your participation marks.

Improve this page! If you have questions or comments, please post them on the Quercus Discussion board with a subject line that includes the name of the unit.

References


[END]