Autonomous agents

From "A B C"
Revision as of 13:57, 14 September 2013 by Boris (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Autonomous agents


This page is a placeholder, or under current development; it is here principally to establish the logical framework of the site. The material on this page is correct, but incomplete.


Autonomous agents for bioinformatics.



 

Introductory reading

Merelli et al. (2007) Agents in bioinformatics, computational and systems biology. Brief Bioinformatics 8:45-59. (pmid: 16772270)

The adoption of agent technologies and multi-agent systems constitutes an emerging area in bioinformatics. In this article, we report on the activity of the Working Group on Agents in Bioinformatics (BIOAGENTS) founded during the first AgentLink III Technical Forum meeting on the 2nd of July, 2004, in Rome. The meeting provided an opportunity for seeding collaborations between the agent and bioinformatics communities to develop a different (agent-based) approach of computational frameworks both for data analysis and management in bioinformatics and for systems modelling and simulation in computational and systems biology. The collaborations gave rise to applications and integrated tools that we summarize and discuss in context of the state of the art in this area. We investigate on future challenges and argue that the field should still be explored from many perspectives ranging from bio-conceptual languages for agent-based simulation, to the definition of bio-ontology-based declarative languages to be used by information agents, and to the adoption of agents for computational grids.


 

Levels of software coupling

Considering flexibility in the design and development of software introduces the notion of coupling between components: a measure of how widely and how deeply components depend on each other. Tight coupling may lead to better performance; loose coupling leads to higher flexibility. Dependencies can exist along many dimensions: coupling can be structural (one component includes another), explicit (two components use each other), or implicit (components share resources, must communicate through a common language or standards, or assume some, or no, synchronicity or sequence of execution). As a general rule, unnecessary coupling is always bad.
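
As a minimal illustration, compare a function that is implicitly coupled to shared global state with one that declares all of its dependencies in its interface. This is a hypothetical Python sketch, not code from this site; the function names are invented for the example.

  # Implicit (tight) coupling: the function silently depends on a shared global.
  sequence = "ATGCGC"

  def gc_count_tight():
      # Reads the module-level variable 'sequence'; breaks if it is renamed,
      # removed, or changed by any other part of the program.
      return sequence.count("G") + sequence.count("C")

  # Loose coupling: every dependency is passed explicitly through the interface.
  def gc_count_loose(seq):
      return seq.count("G") + seq.count("C")

  print(gc_count_tight())           # works only in the context of this module
  print(gc_count_loose("ATGCGC"))   # reusable and testable in isolation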

One can identify degrees of coupling with programming paradigms as follows:

  • Sequential (unstructured) programming ([1])
Instructions are written and executed one after another. Intermediate data is not isolated, but kept in variables in memory; everything is tightly coupled. This was the traditional way to develop code. Advantage: can be quick to develop and very efficient to run (little overhead). Disadvantage: code is hard to maintain and not easily reusable; changes often have unanticipated side effects.
  • Procedural programming ([2])
Code is broken up into modules that communicate through well-defined interfaces. Advantages: code becomes much easier to structure and to maintain as projects become more complex. Rather than requiring awareness of the entire state of the program, the procedures (or functions, or subroutines ...) need only be aware of the parameters that are passed to them. Disadvantages: parameters can still move out of synchrony regarding their syntax (their datatypes or data structures) or their semantics (their meaning), since they are not defined and maintained in parallel with the procedures that use them.
  • Object oriented programming ([3])
To further insulate code components from side effects and inadvertent change, to support code reuse, and to simplify maintenance and extensibility, the idea of objects was introduced. An object contains both the description of parameters (attributes, properties ...) and of the functions that operate on the object (methods). The object-oriented paradigm is usually said to facilitate three goals: encapsulation (no need to concern oneself with the internal workings of a procedure if the interface is specified), polymorphism (the same request can have different results depending on its context, e.g. an object may support the method multiply() that behaves differently depending on whether an instance of the object is a scalar or a matrix), and inheritance (classes of objects can be defined based on the properties of other classes, which they inherit). Advantages: an emphasis on modeling and structured design supports tackling very complex problems through iterated development. Disadvantages: encapsulation can make code hard to debug, polymorphism can make code hard to read, and inheritance may not be all that useful in the real world and may introduce side effects (changing code in base classes affects all derived classes). OO is not a panacea and not a substitute for clear thinking. A minimal sketch of these three ideas follows this list.
  • Distributed computing ([4])
In the quest for increased computing resources, distributed computing schemes have been developed that farm out parts of a larger computation across a network to other machines, typically ones that are idle at the moment. Since the code is executed on remote machines, it needs to be sufficiently independent. Structural or procedural coupling is avoided, but implicit coupling can be significant. Advantages: cheap access to resources; easy scalability; redundancy and fault tolerance. Disadvantages: security concerns; not all problems can be divided into distributable tasks; development overhead for scheduling, communication, and integration of results. A worker-pool sketch follows this list.
  • Autonomous agent systems ([5])
The loosest coupling could be achieved if software components could act fully autonomously. Such "autonomous" components can be called agents. Agents are abstract concepts, not recipes for implementation; the emphasis is on behaviour, not on data or methods. The many existing definitions of agents usually include concepts such as persistence (code is not executed on demand but runs continuously and decides for itself when it should perform some activity), autonomy (agents are capable of task selection, prioritization, goal-directed behaviour and decision-making without human intervention), social ability (agents are able to engage other components through some sort of communication and coordination), and reactivity (agents perceive the context in which they operate and react to it appropriately). Advantages: the most flexible of all programming paradigms; the weakest coupling; easily able to integrate a wide variety of standards, resources and languages. Disadvantages: hype has obscured the concepts; computations are no longer strictly deterministic (since they depend on an external, changing context) and may thus not be reproducible; it may be difficult to keep track of task progress; scheduling overhead may be significant. A minimal agent loop is sketched below.
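
To make the three object-oriented goals concrete, here is a minimal, hypothetical Python sketch; the class and method names are invented for the example. The base class encapsulates its state behind a small interface, the subclass inherits that interface, and overriding composition() illustrates polymorphism: the same request behaves differently depending on the class of the instance.

  class Sequence:
      """Encapsulation: the raw string is internal state behind a small interface."""
      def __init__(self, seq):
          self._seq = seq.upper()

      def length(self):
          return len(self._seq)

      def composition(self):
          return {c: self._seq.count(c) for c in sorted(set(self._seq))}

  class DNASequence(Sequence):
      """Inheritance: __init__() and length() are reused; composition() is overridden."""
      def composition(self):
          # Polymorphism: the same request reports the four bases explicitly.
          return {c: self._seq.count(c) for c in "ACGT"}

      def gc_content(self):
          return (self._seq.count("G") + self._seq.count("C")) / len(self._seq)

  # The same request behaves differently depending on the class of the instance:
  for s in (Sequence("MKVL"), DNASequence("ATGCGC")):
      print(type(s).__name__, s.length(), s.composition())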
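
The worker-pool sketch below uses only Python's standard multiprocessing module to hint at the distributed idea: work units must be self-contained so that each can run in a separate process (or, in a real grid, on a separate machine). Real distributed systems add scheduling, communication and fault tolerance on top of this; the function and inputs are invented for the example.

  from multiprocessing import Pool

  def gc_content(seq):
      # A self-contained work unit: no shared state with the caller.
      return (seq.count("G") + seq.count("C")) / len(seq)

  if __name__ == "__main__":
      tasks = ["ATGCGC", "ATATAT", "GGGCCC"]     # independent, distributable tasks
      with Pool(processes=3) as pool:
          results = pool.map(gc_content, tasks)  # farm out, then integrate results
      print(results)                             # [0.666..., 0.0, 1.0]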
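
Finally, a minimal sketch of the agent idea itself, assuming a hypothetical Environment object that stands in for the agent's context (for example, a job queue). It illustrates persistence (the agent keeps running rather than being called on demand), reactivity (it perceives its context) and autonomy (it decides for itself whether and how to act). This illustrates the concepts above; it is not a recipe for implementation.

  import random
  import time

  class Environment:
      """Hypothetical stand-in for the agent's context."""
      def observe(self):
          return random.choice([None, "new_sequence", "stale_result"])

  class Agent:
      def __init__(self, env):
          self.env = env

      def decide(self, percept):
          # Autonomy: the agent selects and prioritizes its own tasks.
          if percept == "new_sequence":
              return "analyze"
          if percept == "stale_result":
              return "recompute"
          return None                        # nothing worth doing right now

      def run(self, cycles=5):
          # Persistence: the agent runs continuously (bounded here for the demo).
          for _ in range(cycles):
              percept = self.env.observe()   # reactivity: perceive the context
              action = self.decide(percept)
              if action is not None:
                  print("perceived", percept, "- acting:", action)
              time.sleep(0.1)

  Agent(Environment()).run()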


   

Further reading and resources

Hassanien et al. (2013) Computational intelligence techniques in bioinformatics. Comput Biol Chem 47:37-47. (pmid: 23891719)

Computational intelligence (CI) is a well-established paradigm with current systems having many of the characteristics of biological computers and capable of performing a variety of tasks that are difficult to do using conventional techniques. It is a methodology involving adaptive mechanisms and/or an ability to learn that facilitate intelligent behavior in complex and changing environments, such that the system is perceived to possess one or more attributes of reason, such as generalization, discovery, association and abstraction. The objective of this article is to present to the CI and bioinformatics research communities some of the state-of-the-art in CI applications to bioinformatics and motivate research in new trend-setting directions. In this article, we present an overview of the CI techniques in bioinformatics. We will show how CI techniques including neural networks, restricted Boltzmann machine, deep belief network, fuzzy logic, rough sets, evolutionary algorithms (EA), genetic algorithms (GA), swarm intelligence, artificial immune systems and support vector machines, could be successfully employed to tackle various problems such as gene expression clustering and classification, protein sequence classification, gene selection, DNA fragment assembly, multiple sequence alignment, and protein function prediction and its structure. We discuss some representative methods to provide inspiring examples to illustrate how CI can be utilized to address these problems and how bioinformatics data can be characterized by CI. Challenges to be addressed and future directions of research are also presented and an extensive bibliography is included.

Su & Huang (2012) Cooperative output regulation with application to multi-agent consensus under switching network. IEEE Trans Syst Man Cybern B Cybern 42:864-75. (pmid: 22311865)

In this paper, we consider the cooperative output regulation of linear multi-agent systems under switching network. The problem can be viewed as a generalization of the leader-following consensus problem of multi-agent systems. Due to the limited information exchanges of different subsystems, the problem cannot be solved by the decentralized approach and is not allowed to be solved by the centralized control. By devising a distributed observer network, we can solve the problem by both dynamic state feedback control and dynamic measurement output feedback control. As an application of our main result, we show that a special case of our results leads to the solution of the leader-following consensus problem of linear multi-agent systems.

Holcombe et al. (2012) Modelling complex biological systems using an agent-based approach. Integr Biol (Camb) 4:53-64. (pmid: 22052476)

Many of the complex systems found in biology are comprised of numerous components, where interactions between individual agents result in the emergence of structures and function, typically in a highly dynamic manner. Often these entities have limited lifetimes but their interactions both with each other and their environment can have profound biological consequences. We will demonstrate how modelling these entities, and their interactions, can lead to a new approach to experimental biology bringing new insights and a deeper understanding of biological systems.

Severin et al. (2010) eHive: an artificial intelligence workflow system for genomic analysis. BMC Bioinformatics 11:240. (pmid: 20459813)

BACKGROUND: The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future. RESULTS: We present eHive, a new fault tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real case scenarios. CONCLUSIONS: eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.

García-Sánchez et al. (2008) Combining Semantic Web technologies with Multi-Agent Systems for integrated access to biological resources. J Biomed Inform 41:848-59. (pmid: 18585096)

The increasing volume and diversity of information in biomedical research is demanding new approaches for data integration in this domain. Semantic Web technologies and applications can leverage the potential of biomedical information integration and discovery, facing the problem of semantic heterogeneity of biomedical information sources. In such an environment, agent technology can assist users in discovering and invoking the services available on the Internet. In this paper we present SEMMAS, an ontology-based, domain-independent framework for seamlessly integrating Intelligent Agents and Semantic Web Services. Our approach is backed with a proof-of-concept implementation where the breakthrough and efficiency of integrating disparate biomedical information sources have been tested.

Ren et al. (2008) Multi-agent-based bio-network for systems biology: protein-protein interaction network as an example. Amino Acids 35:565-72. (pmid: 18425405)

Recently, a collective effort from multiple research areas has been made to understand biological systems at the system level. This research requires the ability to simulate particular biological systems as cells, organs, organisms, and communities. In this paper, a novel bio-network simulation platform is proposed for system biology studies by combining agent approaches. We consider a biological system as a set of active computational components interacting with each other and with an external environment. Then, we propose a bio-network platform for simulating the behaviors of biological systems and modelling them in terms of bio-entities and society-entities. As a demonstration, we discuss how a protein-protein interaction (PPI) network can be seen as a society of autonomous interactive components. From interactions among small PPI networks, a large PPI network can emerge that has a remarkable ability to accomplish a complex function or task. We also simulate the evolution of the PPI networks by using the bio-operators of the bio-entities. Based on the proposed approach, various simulators with different functions can be embedded in the simulation platform, and further research can be done from design to development, including complexity validation of the biological system.

Bartocci et al. (2007) An agent-based multilayer architecture for bioinformatics grids. IEEE Trans Nanobioscience 6:142-8. (pmid: 17695749)

Due to the huge volume and complexity of biological data available today, a fundamental component of biomedical research is now in silico analysis. This includes modelling and simulation of biological systems and processes, as well as automated bioinformatics analysis of high-throughput data. The quest for bioinformatics resources (including databases, tools, and knowledge) becomes therefore of extreme importance. Bioinformatics itself is in rapid evolution and dedicated Grid cyberinfrastructures already offer easier access and sharing of resources. Furthermore, the concept of the Grid is progressively interleaving with those of Web Services, semantics, and software agents. Agent-based systems can play a key role in learning, planning, interaction, and coordination. Agents constitute also a natural paradigm to engineer simulations of complex systems like the molecular ones. We present here an agent-based, multilayer architecture for bioinformatics Grids. It is intended to support both the execution of complex in silico experiments and the simulation of biological systems. In the architecture a pivotal role is assigned to an "alive" semantic index of resources, which is also expected to facilitate users' awareness of the bioinformatics domain.

Bartocci et al. (2007) BioWMS: a web-based Workflow Management System for bioinformatics. BMC Bioinformatics 8 Suppl 1:S2. (pmid: 17430564)

BACKGROUND: An in-silico experiment can be naturally specified as a workflow of activities implementing, in a standardized environment, the process of data and control analysis. A workflow has the advantage to be reproducible, traceable and compositional by reusing other workflows. In order to support the daily work of a bioscientist, several Workflow Management Systems (WMSs) have been proposed in bioinformatics. Generally, these systems centralize the workflow enactment and do not exploit standard process definition languages to describe, in order to be reusable, workflows. While almost all WMSs require heavy stand-alone applications to specify new workflows, only few of them provide a web-based process definition tool. RESULTS: We have developed BioWMS, a Workflow Management System that supports, through a web-based interface, the definition, the execution and the results management of an in-silico experiment. BioWMS has been implemented over an agent-based middleware. It dynamically generates, from a user workflow specification, a domain-specific, agent-based workflow engine. Our approach exploits the proactiveness and mobility of the agent-based technology to embed, inside agents behaviour, the application domain features. Agents are workflow executors and the resulting workflow engine is a multiagent system - a distributed, concurrent system--typically open, flexible, and adaptative. A demo is available at http://litbio.unicam.it:8080/biowms. CONCLUSION: BioWMS, supported by Hermes mobile computing middleware, guarantees the flexibility, scalability and fault tolerance required to a workflow enactment over distributed and heterogeneous environment. BioWMS is funded by the FIRB project LITBIO (Laboratory for Interdisciplinary Technologies in Bioinformatics).

Alonso-Calvo et al. (2007) An agent- and ontology-based system for integrating public gene, protein, and disease databases. J Biomed Inform 40:17-29. (pmid: 16621723)

In this paper, we describe OntoFusion, a database integration system. This system has been designed to provide unified access to multiple, heterogeneous biological and medical data sources that are publicly available over Internet. Many of these databases do not offer a direct connection, and inquiries must be made via Web forms, returning results as HTML pages. A special module in the OntoFusion system is needed to integrate these public 'Web-based' databases. Domain ontologies are used to do this and provide database mapping and unification. We have used the system to integrate seven significant and widely used public biomedical databases: OMIM, PubMed, Enzyme, Prosite and Prosite documentation, PDB, SNP, and InterPro. A case study is detailed in depth, showing system performance. We analyze the system's architecture and methods and discuss its use as a tool for biomedical researchers.

Keele & Wray (2005) Software agents in molecular computational biology. Brief Bioinformatics 6:370-9. (pmid: 16420735)

Progress made in applying agent systems to molecular computational biology is reviewed and strategies by which to exploit agent technology to greater advantage are investigated. Communities of software agents could play an important role in helping genome scientists design reagents for future research. The advent of genome sequencing in cattle and swine increases the complexity of data analysis required to conduct research in livestock genomics. Databases are always expanding and semantic differences among data are common. Agent platforms have been developed to deal with generic issues such as agent communication, life cycle management and advertisement of services (white and yellow pages). This frees computational biologists from the drudgery of having to re-invent the wheel on these common chores, giving them more time to focus on biology and bioinformatics. Agent platforms that comply with the Foundation for Intelligent Physical Agents (FIPA) standards are able to interoperate. In other words, agents developed on different platforms can communicate and cooperate with one another if domain-specific higher-level communication protocol details are agreed upon between different agent developers. Many software agent platforms are peer-to-peer, which means that even if some of the agents and data repositories are temporarily unavailable, a subset of the goals of the system can still be met. Past use of software agents in bioinformatics indicates that an agent approach should prove fruitful. Examination of current problems in bioinformatics indicates that existing agent platforms should be adaptable to novel situations.

Karasavvas et al. (2005) A criticality-based framework for task composition in multi-agent bioinformatics integration systems. Bioinformatics 21:3155-63. (pmid: 15890745)

MOTIVATION: During task composition, such as can be found in distributed query processing, workflow systems and AI planning, decisions have to be made by the system and possibly by users with respect to how a given problem should be solved. Although there is often more than one correct way of solving a given problem, these multiple solutions do not necessarily lead to the same result. Some researchers are addressing this problem by providing data provenance information. Others use expert advice encoded in a supporting knowledge-base. In this paper, we propose an approach that assesses the importance of such decisions with respect to the overall result. We present a way of measuring decision criticality and describe its potential use. RESULTS: A multi-agent bioinformatics integration system is used as the basis of a framework that facilitates such functionality. We propose an agent architecture, and a concrete bioinformatics example (prototype) is used to show how certain decisions may not be critical in the context of more complex tasks.

Karasavvas et al. (2004) Bioinformatics integration and agent technology. J Biomed Inform 37:205-19. (pmid: 15196484)

Vast amounts of life sciences data are scattered around the world in the form of a variety of heterogeneous data sources. The need to be able to co-relate relevant information is fundamental to increase the overall knowledge and understanding of a specific subject. Bioinformaticians aspire to find ways to integrate biological data sources for this purpose and system integration is a very important research topic. The purpose of this paper is to provide an overview of important integration issues that should be considered when designing a bioinformatics integration system. The currently prevailing approach for integration is presented with examples of bioinformatics information systems together with their main characteristics. Here, we introduce agent technology and we argue why it provides an appropriate solution for designing bioinformatics integration systems.

Franklin & Graesser: Is it an Agent or a Program ... ? – whitepaper at IIS, Memphis
FIPA (Foundation for Intelligent Physical Agents) – A rich resource of specifications for a well thought-out agent standard.