Communication in Multi-Agent Systems

Abstractly, an agent is "a computer system that is situated in some environment, and that is capable of autonomous action in this environment in order to meet its design objective" (Wooldridge, 2009, p. 15; Wooldridge, 1995). Just as humans must negotiate and cooperate to achieve shared goals, so too must agents within multi-agent systems (Wooldridge, 2009, pp. 24-25). This is only possible if agents have an effective means of communication. To communicate, they must have (R1) a shared conceptual understanding of the topic on which they communicate (e.g. two agents can only communicate about weather if they both 'know' what it means to be sunny, raining and overcast, and have a notion of temperature and a measure - such as degrees Celsius - by which to describe it) and (R2) a means for the sender (resp. receiver) to encode (resp. decode) messages that transmit these concepts between agents. We shall refer to the requirement that agents are "capable of autonomous action [execution]" (Wooldridge, 2009, p. 15) as (R1E).

Since the emergence of autonomous agents in the 1980s (Wooldridge, 2009, p. 304), there have been numerous developments in the ways in which agents understand (R1) and transmit (R2) concepts.

First came systems in which human programmers encoded specific execution semantics directly in code (R1E): for instance, the "concept of temperature" would be defined by a function that takes a reading from a temperature sensor. These systems would communicate (R2) using a structured encoding, such as JSON-encoded objects - containing a "temperature" key - sent over the Web. In this paradigm, the conceptual understanding (e.g. of temperature) (R1) did not fully live within the system; rather, it was implicitly communicated between systems in the form of API documentation, where the developers implementing each system would explain to each other that "the temperature key in the JSON object is a 32-bit floating point value representing the temperature, in central London, as measured in degrees Celsius". Naturally, these primitive agents had very limited versatility, as human developers were required to define the execution semantics and the conceptual semantics (documentation) one precisely specified concept at a time. Moreover, these agents, which are still widespread in the form of Web APIs, face an interoperability problem: there are countless Web APIs that each return a "temperature" key in their JSON objects, but with different meanings, because the semantics of the metric, collection location, collection date, etc. are implicitly encoded in the API documentation. This tightly couples the client agent to a single server agent; to communicate with more "agents", the developer must manually write code to decode the different encodings of different server agents and align the conceptual understanding across different pieces of documentation.
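As a minimal sketch of this first paradigm - with a hypothetical endpoint URL and key name - the following Python client shows how the execution semantics (R1E) live in developer-written code, while the conceptual semantics (R1) survive only as comments that mirror the API documentation:

```python
import json
from urllib.request import urlopen

# Hypothetical endpoint; the meaning of the fields it returns lives
# only in its human-readable API documentation.
WEATHER_API = "https://api.example.org/weather/current"

def current_temperature() -> float:
    """Fetch the current temperature from one specific Web API.

    Per the documentation (and nowhere in the data itself): the
    "temperature" key is a 32-bit float, in degrees Celsius, as
    measured in central London.
    """
    with urlopen(WEATHER_API) as response:
        payload = json.load(response)
    # Nothing in the payload says Celsius, London, or when the reading
    # was taken - the client is coupled to this one server's documentation.
    return payload["temperature"]
```

A second provider that also returned a "temperature" key, but in Fahrenheit, would require a separate hand-written decoder; nothing in the payloads themselves can align the two.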

Subsequent developments saw attempts to migrate this conceptual understanding (R1) from documentation to the data that is sent between agents (Verborgh et al., 2013; Verborgh, 2013; Verborgh, 2021). In domains such as the Semantic Web, conceptual understanding is described symbolically using ontologies (Wooldridge, 2009, p. 180) and rules (Wooldridge, 1991; Verborgh, 2015) (R1), and encoded in highly generic RDF syntaxes (R2). Because these agents possess a symbolic understanding of concepts, they are able to precisely describe and reason about the information they send and receive. This increases agent interoperability and portability by making explicit the semantics that are often implicitly encoded in API documentation; moreover, agent versatility increases with the ability to describe requests and responses on the fly, without being constrained to the finite set of concepts listed in API documentation. However, ontologists are still required to build the vocabularies and rules for concept description (R1): someone must manually define "degrees Celsius" as a "unit of measure" for "temperature", and someone must provide the rules that map between "degrees Celsius" and "Fahrenheit" so that systems using the two different measures can interoperate. Likewise, developers are still required to write the code that defines the execution semantics (R1E): a system can semantically formulate the question "what is the current temperature in London in degrees Celsius", but a developer must still write the code that fetches the temperature from the sensor. Furthermore, in this paradigm, "If two agents are to communicate about some domain, then it is necessary for them to agree on the terminology that they use to describe this domain" (Wooldridge, 2009), often requiring agents to share the same vocabularies and hence "worldviews".
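As a sketch of the same temperature reading in this second paradigm, the snippet below uses the rdflib Python library with a hypothetical example.org vocabulary (standing in for published ontologies such as SOSA/SSN for observations or QUDT for units) to make the unit, location and observed property explicit in the data itself:

```python
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

# Hypothetical vocabulary; an ontologist would normally reuse terms
# from published ontologies such as SOSA/SSN or QUDT instead.
EX = Namespace("http://example.org/weather#")

g = Graph()
g.bind("ex", EX)

obs = EX.observation1
g.add((obs, RDF.type, EX.TemperatureObservation))
g.add((obs, EX.numericValue, Literal(18.5, datatype=XSD.float)))
# The semantics that previously lived only in API documentation now
# travel with the message itself:
g.add((obs, EX.unit, EX.DegreeCelsius))
g.add((obs, EX.location, EX.CentralLondon))

print(g.serialize(format="turtle"))

# Note: someone must still hand-author EX.DegreeCelsius and a rule such
# as fahrenheit = celsius * 9 / 5 + 32 before Celsius- and
# Fahrenheit-speaking agents can interoperate (R1), and developer code
# must still fetch the reading from the sensor (R1E).
```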

Now, LLM-powered agents are emerging, many of them built upon LLMs trained on textual corpora containing a vast array of human knowledge. Consequently, these agents demonstrate a greater breadth and depth of conceptual understanding (R1) than human developers could feasibly encode by hand in formal ontologies. Moreover, LLMs can encode and decode these concepts in natural language (as well as, increasingly, in machine syntaxes) (R2). Conversational LLMs also possess inherent execution semantics (R1E), with the capability to formulate a written response to any input they are given. These execution semantics, however, do not extend to access of system-level knowledge or resources (such as temperature sensors), and they are less reliable and less deterministic than production-ready implementations of previous generations of agents. The trade-off for the breadth and depth of LLM understanding is an increased vagueness of concepts and the presence of internal inconsistencies in LLM conceptualisations. Moreover, there is no single worldview, per se, that the language model holds; just as with humans, the facts that an LLM asserts are highly dependent on the context of the conversation in which they occur.
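To make the (R1E) limitation concrete, the sketch below wires a developer-written sensor reader to a conversational model; llm_complete and read_sensor are hypothetical stand-ins, not any real library's API:

```python
def llm_complete(prompt: str) -> str:
    """Hypothetical stand-in for a call to any conversational LLM API."""
    raise NotImplementedError("substitute a call to your LLM provider")

def read_sensor() -> float:
    """Developer-written execution semantics (R1E): the LLM cannot
    reach the temperature sensor itself, so this code must still exist."""
    return 18.5  # e.g. read from hardware or a local service

def answer(question: str) -> str:
    # The LLM supplies the conceptual understanding (R1) and the
    # natural-language encoding/decoding (R2)...
    reading = read_sensor()
    # ...but access to system-level resources is still wired in by hand.
    return llm_complete(
        f"A sensor in central London currently reads {reading} degrees "
        f"Celsius. Using that reading, answer: {question}"
    )
```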