Systems curation

From "A B C"

Notes on curating a biological system.


 


 


Biocuration

Curation[1] differs from ordinary day-to-day reading and from the kind of report writing you may be used to. Curation not only collects and presents facts, it also ensures that the facts are valid, complete and verifiable.

To collect and present your facts you:

  • plan explicitly what information items you need to collect (and record your plan); this includes enumerating the components and their observable behaviour, and annotating their roles and relationships;
  • define how your information will be structured;
  • commit your information to an appropriate resource: a structured document, a spreadsheet, or a database, where it can be found and retrieved.

To ensure your information is valid you:

  • define your terms in an ontology (preferred) or a controlled vocabulary, so that all terms across a curation project are used consistently, with identical semantics;
  • work from reliable sources;
  • carefully assess facts for consistency (and record conflicts).

To ensure your results are complete you:

  • employ multiple, current information sources;
  • explicitly express your expectations about complete information so you recognize when information is missing (i.e. you work with SyRO);
  • record open questions.

To ensure your results are verifiable you:

  • record the process: where you found the information and why you used that resource;
  • record the evidence, including the evidence that the information is reliable.
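
The verifiability items above can be captured as a small record type. The following is a minimal sketch only: the class name `CuratedFact`, its field names, and the example values are hypothetical and are not part of any existing curation tool.

```python
from dataclasses import dataclass, field


@dataclass
class CuratedFact:
    """One curated statement together with its provenance (hypothetical schema)."""
    statement: str
    source: str            # where the information was found
    source_rationale: str  # why that resource was used
    evidence: str          # what supports the statement and its reliability
    open_questions: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of provenance fields that were left empty."""
        required = ("source", "source_rationale", "evidence")
        return [name for name in required if not getattr(self, name).strip()]


# Illustrative values only; the rationale was deliberately left blank.
fact = CuratedFact(
    statement="Component X participates in the system",
    source="hypothetical database entry",
    source_rationale="",
    evidence="hypothetical experimental report",
)
print(fact.missing_fields())  # ['source_rationale']
```

A check like `missing_fields` makes gaps in the recorded process visible at the time of entry rather than at review.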


 

System

 

A system maps a set of collaborating components to its emergent behaviour.

This operational definition applies both when we consider instances of systems "bottom up" (synthetic: how is the system built from its components?) and "top down" (analytic: how can the behaviour of the system be explained from its constituents?). It emphasizes observables (behaviour) and clarifies the conceptual nature of a system (it maps). It also implies that a system has a scope and boundaries (a set), is decomposable (components), has a structure (induced by collaborations), and that its components are necessary and sufficient (leading to emergent behaviour).

Through curation, we instantiate this abstract concept with concrete facts:

  • We enumerate the components – genes, proteins, complexes and assemblies, metabolites, even specific environmental conditions if they are a cause or a result of the system's activity. Components can be atomic (individual molecules) or composed (complexes, assemblies ...); each component has a role in the system;
  • We describe collaborations in terms of observables, taking special note of behaviour through which components complement each other;
  • A feature of a system is emergent behaviour: we note especially how the components collaborate in such a way that their joint activity cannot be described as the sum of each individual activity. Not all collaborations result in emergent behaviour;
  • We describe behaviour in terms of observables, as well as mapping those to specific, conceptual roles.
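
One way to make these points concrete is to hold components and collaborations in a simple data model. This is a sketch under stated assumptions: the class names `Component`, `Collaboration` and `System`, their fields, and the example entries are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    name: str
    role: str                     # conceptual role of the component in the system
    parts: tuple[str, ...] = ()   # empty for atomic components

    @property
    def is_atomic(self) -> bool:
        return not self.parts


@dataclass
class Collaboration:
    members: tuple[str, ...]      # names of the collaborating components
    observable: str               # the behaviour that was observed
    emergent: bool = False        # not every collaboration is emergent


@dataclass
class System:
    name: str
    components: list[Component] = field(default_factory=list)
    collaborations: list[Collaboration] = field(default_factory=list)

    def unknown_members(self) -> set[str]:
        """Names used in collaborations that are not enumerated as components."""
        known = {c.name for c in self.components}
        used = {m for collab in self.collaborations for m in collab.members}
        return used - known


# Hypothetical example: "C" collaborates but was never enumerated.
system = System(
    name="example system",
    components=[
        Component("A", role="sensor"),
        Component("AB complex", role="effector", parts=("A", "B")),
    ],
    collaborations=[
        Collaboration(members=("A", "C"), observable="hypothetical observable", emergent=True),
    ],
)
print(system.unknown_members())  # {'C'}
```

Checking collaborations against the enumerated components is one cheap way to notice that the set of components is not yet complete.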


 

System curation goals

 

We collect information in order to support our description of the system architecture. Refer to the PHALY model for an example.



 
General goal: System Architecture

A system architecture describes the system’s behaviour in terms of its subsystems and their relationships, given its context, within its boundaries.


 
Deliverables: Contents
  • A structured description of the system, including its name, definition, description, associated GO terms, an initial set of computationally defined genes it contains, and references to a seed set of literature articles that will be used for curation;
  • A description of important concepts, including the biological context and background knowledge about the components;
  • An enumeration of components from:
    • literature review;
    • direct annotation, i.e. genes discovered because they have been annotated with a relationship to the system, in a database such as UniProt, NCBI-Protein or any of the three GO ontologies represented in GOA (GO annotations);
    • network and pathway annotation, i.e. genes discovered in the network neighbourhood of system components, in a database like STRING or IntAct, or in pathways such as KEGG or Reactome;
    • phenotype and behaviour, i.e. genes annotated to a related phenotype in OMIM or the GWAS catalog;
    • ... each with a note on the type and quality of evidence that supports their inclusion.
  • Completion of role annotation: each component has one role annotated to it (list a component more than once if several distinct roles relate to the same, or overlapping, entities); list roles that are expected or required but have no components associated with them;
  • A system architecture sketch that integrates the system information;
  • A formatted set of system data, ready to be imported into a system database.
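
The role-annotation deliverable above can be checked mechanically. The sketch below assumes annotations are kept as (component, role) pairs, one pair per role, and that the expected roles are listed separately. The function name `role_report`, the output keys, and the example data are hypothetical, and JSON is used only as one plausible import format.

```python
import json
from collections import defaultdict


def role_report(annotations, expected_roles):
    """Group components by role and list expected roles that have no component.

    annotations: iterable of (component, role) pairs; a component appears
    once per distinct role it fulfils.
    """
    by_role = defaultdict(list)
    for component, role in annotations:
        by_role[role].append(component)
    return {
        "components_by_role": {
            role: sorted(names) for role, names in sorted(by_role.items())
        },
        "roles_without_components": sorted(set(expected_roles) - set(by_role)),
    }


# Hypothetical data: gene1 is listed twice because it has two distinct roles.
annotations = [("gene1", "sensor"), ("gene2", "effector"), ("gene1", "effector")]
expected_roles = ["sensor", "effector", "regulator"]

report = role_report(annotations, expected_roles)
print(json.dumps(report, indent=2))
```

In this example the report lists "regulator" under `roles_without_components`, which is the kind of gap the deliverable asks to be recorded.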



 


Resources

 
Systems

Biocuration


 

Notes

 
  1. The word curator ultimately derives from Latin curator (guardian, agent), a derivation of curare (to care for, to cure).