Autonomous agent

From "A B C"
Revision as of 00:54, 1 February 2012 by Boris (talk | contribs) (→‎Contents)
Jump to navigation Jump to search

Autonomous Agent


Autonomous agents are software programs that perform unsupervised (and perhaps collaborative) tasks by flexibly responding to changing contexts.



 

Introductory reading

Severin et al. (2010) eHive: an artificial intelligence workflow system for genomic analysis. BMC Bioinformatics 11:240. (pmid: 20459813)

[ PubMed ] [ DOI ] BACKGROUND: The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future. RESULTS: We present eHive, a new fault tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real case scenarios. CONCLUSIONS: eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.
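
To make the architecture concrete, here is a minimal Python sketch of the blackboard pattern the abstract describes: a shared database acts as the blackboard, and each worker process - the autonomous agent - claims pending jobs and runs them. This is an illustration only; the schema, the job table and the use of SQLite are invented for this sketch, and eHive itself is a Perl system backed by MySQL.

# Sketch of a blackboard worker: a shared job table plays the blackboard,
# and each worker claims and runs jobs on its own initiative.
import sqlite3
import time

def setup(conn):
    # The shared "blackboard": a table of jobs that any worker may claim.
    conn.execute("CREATE TABLE IF NOT EXISTS job "
                 "(id INTEGER PRIMARY KEY, payload TEXT, status TEXT DEFAULT 'READY')")
    conn.commit()

def claim_job(conn):
    # Claim one READY job; the status check in the UPDATE ensures that
    # two workers will not both succeed in claiming the same job.
    row = conn.execute("SELECT id, payload FROM job "
                       "WHERE status = 'READY' LIMIT 1").fetchone()
    if row is None:
        return None
    job_id, payload = row
    cur = conn.execute("UPDATE job SET status = 'CLAIMED' "
                       "WHERE id = ? AND status = 'READY'", (job_id,))
    conn.commit()
    return (job_id, payload) if cur.rowcount == 1 else None

def worker(db_path="blackboard.db", poll_seconds=1.0):
    # The agent: runs persistently, queries the blackboard, and decides
    # for itself when there is work to do.
    conn = sqlite3.connect(db_path)
    setup(conn)
    while True:
        job = claim_job(conn)
        if job is None:
            time.sleep(poll_seconds)     # nothing to do; check again later
            continue
        job_id, payload = job
        print("running job", job_id, ":", payload)   # stand-in for real work
        conn.execute("UPDATE job SET status = 'DONE' WHERE id = ?", (job_id,))
        conn.commit()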


 

Contents

Considering flexibility in the design and development of software introduces the notion of coupling between components. Coupling describes how widely and how deeply components depend on each other. Tight coupling may lead to better performance; loose coupling may lead to higher flexibility. Dependencies can exist along many dimensions. Coupling can thus be structural (a component includes another component), explicit (two components use each other), or implicit (components share resources, which in turn requires them to communicate through a common protocol[1]). As a general rule: unnecessary coupling is always bad.
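
The distinction is easy to see in code. The following toy Python example contrasts implicit coupling through a shared resource with a looser design in which the dependency is passed through a narrow interface; all names here are invented for illustration.

shared_buffer = []                  # shared resource: implicit coupling

def producer_tight():
    shared_buffer.append("data")    # silently depends on the module-level global

def consumer_tight():
    return shared_buffer.pop()      # breaks if the producer changes its behaviour

def producer_loose(sink):
    sink.append("data")             # the dependency is explicit in the interface

def consumer_loose(source):
    return source.pop()

queue = []                          # the shared state is now visible at the call site
producer_loose(queue)
print(consumer_loose(queue))        # -> data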

One can identify degrees of coupling with specific programming paradigms as follows:

Unstructured, sequential programming: instructions are written and executed one after another. Intermediate data is not isolated but kept in variables in memory; everything is tightly coupled. This was the traditional way to develop code. Advantage: quick to develop and very efficient to run (little overhead). Disadvantage: the code is hard to maintain and not easily reusable; changes often have unanticipated side-effects.
Procedural programming: code is broken up into modules that communicate through well-defined interfaces. Advantages: code becomes much easier to structure and to maintain as projects grow more complex. Rather than requiring awareness of the entire state of the program, the procedures (or functions, or subroutines ...) need only be aware of the parameters that are passed to them. Disadvantages: parameters can still fall out of sync in their syntax (their datatypes or data structures) or their semantics (their meaning), since they are not defined and maintained in parallel with the procedures that use them.
Object-oriented programming: to further insulate code components from side-effects and inadvertent change, to support code reuse, and to simplify maintenance and extensibility, the idea of objects was introduced. An object contains both the description of parameters (attributes, properties ...) and of the functions that operate on the object (methods). The object-oriented paradigm is usually said to facilitate three goals: encapsulation (there is no need to concern oneself with the internal workings of a procedure if the interface is specified), polymorphism (the same request can have different results depending on its context, e.g. an object may support a method multiply() that behaves differently depending on whether an instance of the object is a scalar or a matrix; a sketch of this appears after this list), and inheritance (classes of objects can be defined based on the properties of other classes, which they inherit). Advantages: an emphasis on modelling and structured design helps to address very complex problems through iterated development. Disadvantages: encapsulation can make code hard to debug, polymorphism can make code hard to read, and inheritance may not be all that useful in the real world and may introduce side-effects (changing code in base classes affects all derived classes). OO is not a panacea and not a substitute for clear thinking.
Distributed computing: in the quest for increased computing resources, distributed computing schemes have been developed that farm out parts of a larger computation across a network to other machines, typically ones that are idle at the moment. Since the code is executed on remote machines, it needs to be sufficiently independent. Structural or procedural coupling is avoided, but implicit coupling can be significant. Interesting recent developments use the label cloud computing. Advantages: cheap access to resources; easily scalable; redundancy and fault-tolerance. Disadvantages: security concerns; not all problems can be divided into distributable tasks; development overhead for scheduling, communication and integration of results.
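
As an illustration of the polymorphism point above, here is a small Python sketch in which the same multiply() request behaves differently depending on the class of the object that receives it; the class names are invented for this example.

# Polymorphism: one request, multiply(), dispatches to class-specific behaviour.
class Scalar:
    def __init__(self, value):
        self.value = value
    def multiply(self, other):
        return Scalar(self.value * other.value)
    def __repr__(self):
        return "Scalar(%r)" % self.value

class Matrix:
    def __init__(self, rows):
        self.rows = rows                      # a list of row lists
    def multiply(self, other):
        cols = range(len(other.rows[0]))
        return Matrix([[sum(a * other.rows[k][j] for k, a in enumerate(row))
                        for j in cols] for row in self.rows])
    def __repr__(self):
        return "Matrix(%r)" % self.rows

print(Scalar(3).multiply(Scalar(4)))                                # Scalar(12)
print(Matrix([[1, 0], [0, 1]]).multiply(Matrix([[5, 6], [7, 8]])))  # Matrix([[5, 6], [7, 8]])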


The loosest coupling could be achieved if software components could act totally autonomously. Such "autonomous" components have been called agents. Agents are abstract concepts, not recipes for implementation; the emphasis is on behaviour, not on data or methods. The many existing definitions of agents usually include concepts such as persistence (code is not executed on demand but runs continuously and decides for itself when it should perform some activity), autonomy (agents are capable of task selection, prioritization, goal-directed behaviour and decision-making without human intervention), social ability (agents are able to engage other components through some sort of communication and coordination), and reactivity (agents perceive the context in which they operate and react to it appropriately). Advantages: the most flexible of all programming paradigms, the weakest coupling, and the ability to integrate a wide variety of standards, resources and languages. Disadvantages: hype has obscured the underlying concepts; computations are no longer strictly deterministic (since they depend on an external, changing context) and may thus not be reproducible; it may be difficult to keep track of task progress; and the scheduling overhead may be significant.
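
To ground these terms, here is a toy Python sketch of an agent exhibiting the behaviours listed above - persistence, autonomy and reactivity. The environment and the decision rules are invented for illustration and are not drawn from any particular agent framework.

# A toy agent: runs persistently, perceives a changing context, and
# selects its own actions without outside scheduling.
import random
import time

def perceive(environment):
    return environment["load"]                 # reactivity: observe the context

def decide(load):
    # autonomy: the agent prioritizes its own work, without human intervention
    return "process_jobs" if load > 0 else "idle"

def act(action, environment):
    if action == "process_jobs":
        environment["load"] -= 1
        print("processed one job, load now", environment["load"])
    else:
        print("idle")

def run_agent(steps=5):
    environment = {"load": 2}
    for _ in range(steps):                     # persistence: a continuous loop
        act(decide(perceive(environment)), environment)
        environment["load"] += random.choice([0, 1])   # the context changes externally
        time.sleep(0.1)

run_agent()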


   

References

  1. Such a protocol may itself be explicit - as in the http or ftp protocols - or implicit, e.g. if components take turns using a resource, which requires synchronizing their behaviour.


 

Further reading and resources

Ye et al. (2008) Adaptive clustering algorithm for community detection in complex networks. Phys Rev E Stat Nonlin Soft Matter Phys 78:046115. (pmid: 18999501)

[ PubMed ] [ DOI ] Community structure is common in various real-world networks; methods or algorithms for detecting such communities in complex networks have attracted great attention in recent years. We introduced a different adaptive clustering algorithm capable of extracting modules from complex networks with considerable accuracy and robustness. In this approach, each node in a network acts as an autonomous agent demonstrating flocking behavior where vertices always travel toward their preferable neighboring groups. An optimal modular structure can emerge from a collection of these active nodes during a self-organization process where vertices constantly regroup. In addition, we show that our algorithm appears advantageous over other competing methods (e.g., the Newman-fast algorithm) through intensive evaluation. The applications in three real-world networks demonstrate the superiority of our algorithm to find communities that are parallel with the appropriate organization in reality.

Kauffman (2003) Molecular autonomous agents. Philos Trans A Math Phys Eng Sci 361:1089-99. (pmid: 12816601)

[ PubMed ] [ DOI ] I consider an autonomous agent to be a physical system able to act on its own behalf, such as a bacterium swimming up a glucose gradient. I tentatively define an autonomous agent to be a system capable of self-reproduction and at least capable of performing one thermodynamic work cycle. I give a hypothetical chemical example. I then explore the increasingly odd implications of this definition.

I have added this article even though it is not about the software concept, but about the idea of autonomy in principle. A pleasure to read.