Autonomous agents


This page is a placeholder, or under current development; it is here principally to establish the logical framework of the site. The material on this page is correct, but incomplete.


Autonomous agents for bioinformatics.



 

Introductory reading

Merelli et al. (2007) Agents in bioinformatics, computational and systems biology. Brief Bioinformatics 8:45-59. (pmid: 16772270)

[ PubMed ] [ DOI ] The adoption of agent technologies and multi-agent systems constitutes an emerging area in bioinformatics. In this article, we report on the activity of the Working Group on Agents in Bioinformatics (BIOAGENTS) founded during the first AgentLink III Technical Forum meeting on the 2nd of July, 2004, in Rome. The meeting provided an opportunity for seeding collaborations between the agent and bioinformatics communities to develop a different (agent-based) approach of computational frameworks both for data analysis and management in bioinformatics and for systems modelling and simulation in computational and systems biology. The collaborations gave rise to applications and integrated tools that we summarize and discuss in context of the state of the art in this area. We investigate on future challenges and argue that the field should still be explored from many perspectives ranging from bio-conceptual languages for agent-based simulation, to the definition of bio-ontology-based declarative languages to be used by information agents, and to the adoption of agents for computational grids.


 

Levels of software coupling

Considering flexibility in the design and development of software introduces the notion of coupling between components: how widely and deeply components depend on each other. Tight coupling may lead to better performance; loose coupling may lead to higher flexibility. Dependencies can exist along many dimensions: coupling can be structural (a component includes another component), explicit (two components use each other), or implicit (components share resources, must communicate through a common language or standards, or assume some (or no) synchronicity or sequence of execution). As a general rule: unnecessary coupling is always bad.
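
To make this concrete, here is a minimal sketch (in Python, with invented function and variable names; it is not part of the original text) contrasting a tightly coupled design, in which two functions implicitly share global state, with a loosely coupled one, in which the only dependency is the declared interface:

# Tight, implicit coupling: both functions read and write a shared global
# and silently depend on each other's side effects.
state = {"sequence": "MKTAYIAK", "score": None}   # hypothetical data

def compute_score():
    state["score"] = len(state["sequence"])       # writes shared state

def report():
    print("score:", state["score"])               # assumes compute_score() ran first

compute_score()
report()

# Loose, explicit coupling: each function receives what it needs and
# returns its result; the only dependency is the parameter list.
def compute_score_loose(sequence):
    return len(sequence)

def report_loose(score):
    print("score:", score)

report_loose(compute_score_loose("MKTAYIAK"))

Both versions produce the same output; the difference lies entirely in how much each piece needs to know about the rest of the program.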

One can identify degrees of coupling with programming paradigms as follows:

  • Sequential (unstructured) programming ([1])
Instructions are written and executed one after another. Intermediate data is not isolated but kept in variables in memory. Everything is tightly coupled. This was the traditional way to develop code. Advantage: can be quick to develop and very efficient to run (little overhead). Disadvantage: code is hard to maintain and not easily reusable; changes often have unanticipated side effects.
  • Procedural programming ([2])
Code is broken up into modules that communicate through well-defined interfaces. Advantages: code becomes much easier to structure and to maintain as projects become more complex. Rather than requiring awareness of the entire state of the program, the procedures (or functions, or subroutines ...) need only be aware of the parameters that are passed to them. Disadvantages: parameters can still move out of synchrony regarding their syntax (their data types or data structures) or their semantics (their meaning), since they are not defined and maintained in parallel with the procedures that use them.
  • Object oriented programming ([3])
To further insulate code components from side effects and inadvertent change, to support code reuse, and to simplify maintenance and extension, the idea of objects was introduced. An object contains both the description of parameters (attributes, properties ...) and of the functions that operate on the object (methods). The object oriented paradigm is usually said to facilitate three goals: encapsulation (no need to concern oneself with the internal workings of a procedure if the interface is specified), polymorphism (the same request can have different results depending on its context, e.g. an object may support a method multiply() that behaves differently depending on whether an instance of the object is a scalar or a matrix), and inheritance (classes of objects can be defined based on the properties of other classes, which they inherit). Advantages: an emphasis on modeling and structured design supports tackling very complex problems through iterated development. Disadvantages: encapsulation can make code hard to debug, polymorphism can make code hard to read, and inheritance may not be all that useful in the real world and may introduce side effects (changing code in base classes affects all derived classes). OO is not a panacea and not a substitute for clear thinking. (A short sketch after this list contrasts sequential, procedural and object oriented organization of the same small task.)
  • Distributed computing ([4])
In the quest for increased computing resources, distributed computing schemes have been developed that farm out parts of a larger computation across a network to other machines, typically ones that have nothing to do at the moment. Since the code is executed on remote machines, it needs to be sufficiently independent. Structural or procedural coupling is avoided, but implicit coupling can be significant. Advantages: cheap access to resources; easily scalable; redundancy and fault tolerance. Disadvantages: security concerns; not all problems can be divided up into distributable tasks; development overhead for scheduling, communication and integration of results.
  • Autonomous agent systems ([5])
The loosest coupling could be achieved if software components could act totally autonomously. Such "autonomous" components can be called agents. Agents are abstract concepts, not recipes for implementation. The emphasis is on behaviour, not data or method. The many existing definitions for agents usually include concepts such as persistence (code is not executed on demand but runs continuously and decides for itself when it should perform some activity), autonomy (agents have capabilities of task selection, prioritization, goal-directed behaviour, and decision-making without human intervention), social ability (agents are able to engage other components through some sort of communication and coordination), and reactivity (agents perceive the context in which they operate and react to it appropriately). Advantages: the most flexible of all programming paradigms, the weakest coupling, easily able to integrate a wide variety of standards, resources and languages. Disadvantages: hype has obscured concepts; computations are no longer strictly deterministic (since they depend on external, changing context) and may thus not be reproducible; it may be difficult to keep track of task progress; scheduling overhead may be significant. (A minimal agent-loop sketch follows this list.)
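
As a concrete illustration of the first three paradigms, the following sketch (Python, with invented names; not part of the original text) organizes the same toy task, counting alanine residues in a short sequence, first as sequential code, then as a procedure, then as an object:

# 1. Sequential: instructions and intermediate data share one scope;
#    everything is tightly coupled.
sequence = "MKTAYIAK"
count = 0
for residue in sequence:
    if residue == "A":
        count += 1
print(count)

# 2. Procedural: the logic moves behind a well-defined interface;
#    callers only need to know the parameters.
def count_residue(sequence, residue):
    return sum(1 for r in sequence if r == residue)

print(count_residue("MKTAYIAK", "A"))

# 3. Object oriented: data and the methods that act on it are bundled
#    (encapsulation); a subclass could override count() (polymorphism,
#    inheritance).
class Sequence:
    def __init__(self, residues):
        self._residues = residues

    def count(self, residue):
        return sum(1 for r in self._residues if r == residue)

print(Sequence("MKTAYIAK").count("A"))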

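The agent idea is harder to pin down in a few lines, but the following toy sketch (Python; the blackboard, task names and selection strategy are invented for illustration and do not describe any particular system) shows the persistence, autonomy and reactivity ingredients of a minimal agent loop:

import random
import time

class Blackboard:
    """Shared context that agents can perceive and act on."""
    def __init__(self, tasks):
        self.pending = list(tasks)
        self.done = []

class Agent:
    def __init__(self, board):
        self.board = board

    def perceive(self):
        # reactivity: inspect the current state of the shared context
        return list(self.board.pending)

    def decide(self, tasks):
        # autonomy: the agent selects its own next task (here, at random)
        return random.choice(tasks) if tasks else None

    def act(self, task):
        self.board.pending.remove(task)
        self.board.done.append(task)
        print("completed:", task)

    def run(self, cycles=5):
        # persistence: the agent keeps running and decides for itself
        # when there is something to do
        for _ in range(cycles):
            task = self.decide(self.perceive())
            if task is not None:
                self.act(task)
            time.sleep(0.1)

Agent(Blackboard(["align genomes", "infer homologs"])).run()

In a real multi-agent setting, several such agents would watch the same blackboard and coordinate through it; this is, in outline, the blackboard pattern described in the eHive paper below.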

   

Further reading and resources

Severin et al. (2010) eHive: an artificial intelligence workflow system for genomic analysis. BMC Bioinformatics 11:240. (pmid: 20459813)

[ PubMed ] [ DOI ] BACKGROUND: The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future. RESULTS: We present eHive, a new fault tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real case scenarios. CONCLUSIONS: eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.

Karasavvas et al. (2005) A criticality-based framework for task composition in multi-agent bioinformatics integration systems. Bioinformatics 21:3155-63. (pmid: 15890745)

[ PubMed ] [ DOI ] MOTIVATION: During task composition, such as can be found in distributed query processing, workflow systems and AI planning, decisions have to be made by the system and possibly by users with respect to how a given problem should be solved. Although there is often more than one correct way of solving a given problem, these multiple solutions do not necessarily lead to the same result. Some researchers are addressing this problem by providing data provenance information. Others use expert advice encoded in a supporting knowledge-base. In this paper, we propose an approach that assesses the importance of such decisions with respect to the overall result. We present a way of measuring decision criticality and describe its potential use. RESULTS: A multi-agent bioinformatics integration system is used as the basis of a framework that facilitates such functionality. We propose an agent architecture, and a concrete bioinformatics example (prototype) is used to show how certain decisions may not be critical in the context of more complex tasks.

Karasavvas et al. (2004) Bioinformatics integration and agent technology. J Biomed Inform 37:205-19. (pmid: 15196484)

[ PubMed ] [ DOI ] Vast amounts of life sciences data are scattered around the world in the form of a variety of heterogeneous data sources. The need to be able to co-relate relevant information is fundamental to increase the overall knowledge and understanding of a specific subject. Bioinformaticians aspire to find ways to integrate biological data sources for this purpose and system integration is a very important research topic. The purpose of this paper is to provide an overview of important integration issues that should be considered when designing a bioinformatics integration system. The currently prevailing approach for integration is presented with examples of bioinformatics information systems together with their main characteristics. Here, we introduce agent technology and we argue why it provides an appropriate solution for designing bioinformatics integration systems.