Basic Research Agents

(Part 3 of a four-part series presented at the Masterclass “From chatbots to personalized research assistants: a journey through the new ecosystems related to Large Language Models” at the Medien Triennale Südwest 2023)

  • Journalistic use cases: Support for complex research tasks
  • Take-away message: Large Language Models can handle planning and result synthesis within an agent that uses different tools. Agents are still unreliable and should be replaced by suitable chains of Large Language Models for most production applications. However, this may change relatively quickly.

The old dream of autonomous agents – of programs that interact independently with a (virtual) world in order to tackle a wide variety of problems according to their own plans, without outside help – is being dreamed more often again, because Large Language Models, with their flexibility and their tolerance of ambiguous situations and inputs, provide new means of realizing such agents.

In Autonomous LLM-Based Agents, a language model is first provided with an identity or role, a task with targets, and background information about the task. Equipped with this information, the language model prompts itself in a defined way (auto-prompting) to find a way to solve the task under the defined objectives. This basic architecture can be adapted and extended in several ways: by linking multiple agents with different roles and functions, by designing the memory that agents have, and by giving access to external tools such as web search, database access, programming environments, or other programs.
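As a rough illustration (not code from the masterclass), the initial prompt of such an agent could be assembled along the following lines; the role, task, and background texts are placeholders:

# Illustrative sketch of how an agent's initial prompt might be assembled.
# All texts are placeholders, not the prompts used in the demo below.
role = "You are a research assistant supporting journalists."
task = "Find out whether LK-99 is a superconductor and summarize the evidence."
background = "LK-99 is a material that was claimed in 2023 to be a room-temperature superconductor."

initial_prompt = (
    f"{role}\n\n"
    f"Task: {task}\n\n"
    f"Background: {background}\n\n"
    "Think step by step, decide which actions to take, and keep prompting "
    "yourself until the task is solved."
)
print(initial_prompt)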

Plan-and-Execute Agents

We will now look at an agent architecture inspired by BabyAGI and the “Plan-and-Solve” paper; it has already been implemented in an experimental part of LangChain.

So-called plan-and-execute agents achieve a goal by first planning what to do and then executing the subtasks. Planning and execution are performed by two different agents. Planning is almost always done by a simple agent using a Large Language Model. Execution is usually done by a separate agent that has access to different defined tools.

Before we get into the details of this architecture and the execution steps of the agents, the following demo will give you a basic idea of how agents work. In this example, the agent is started with the task of answering whether LK-99 is a superconductor. After creating a plan of the steps needed to answer this question, these steps are processed by an agent that has access to a search engine and various other data sources (arXiv, PubMed, and Wikipedia).

Planning Agent

The planning stage is quite simple: a system prompt instructs the model to generate the shortest possible list of individual steps for a given task. The user part of the prompt then describes the task to be solved, e.g. “Is LK-99 a superconductor?”

The response generated by the language model is converted into a data structure in which the individual steps are separate elements. These elements are then passed step by step to the execution agent.
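A minimal sketch of what this planning stage could look like if implemented directly with LangChain's chat model and message classes; the wording of the system prompt and the simple list parser are illustrative assumptions, not the exact internals of load_chat_planner:

import re
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage

# Illustrative planner prompt; load_chat_planner uses its own, differently worded prompt.
planner_system_prompt = (
    "Devise a plan for the given task as a numbered list of steps. "
    "Use as few steps as possible. Output only the list."
)

llm = ChatOpenAI(temperature=0)
response = llm([
    SystemMessage(content=planner_system_prompt),
    HumanMessage(content="Is LK-99 a superconductor?"),
])

# Convert the numbered list into a data structure with one element per step.
steps = [
    re.sub(r"^\s*\d+\.\s*", "", line).strip()
    for line in response.content.splitlines()
    if re.match(r"^\s*\d+\.", line)
]
print(steps)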

Execution Agent

The prompt for the Execution Agent, on the other hand, is quite complex. However, this complexity is easy to unravel if we look at the prompt's individual components.

The system prompt first describes the role and general task of the agent. This is followed by a description of the available tools (which we will look at in detail in a moment) and the desired way of using these tools by defining a specific format for calling them.

Then the so-called thought-action-observation loop is defined: thoughts about the posed question lead to the selection of an action, which in turn yields an observation, which triggers new thoughts that select the next action, and so on, until eventually a final answer is generated.

The user prompt contains a list of the already processed steps with their answers, as well as the current objective, i.e. the step currently being processed.
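Taken together, the executor prompt can be pictured roughly as the following template. The wording is an illustrative approximation, not the verbatim prompt that LangChain builds from the tool definitions:

# Illustrative approximation of the executor prompt; the template that LangChain
# actually builds from the tool definitions is worded differently.
executor_prompt_template = """You are an agent that works on one objective at a time.

You have access to the following tools:
{tool_descriptions}

Use the following format:
Thought: reflect on what to do next
Action: the tool to call, as JSON with the tool name and its input
Observation: the result returned by the tool
... (Thought/Action/Observation can repeat several times)
Final Answer: the answer to the current objective

Previous steps and their results:
{previous_steps}

Current objective: {current_step}
"""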

A tool description consists of a precise description of the intended use, including the questions and domains for which this tool can be used, and a specification of the arguments that a tool expects.

For example, the duckduckgo_search tool is described as “A wrapper around DuckDuckGo Search. Useful when you need to answer questions about current events. Input should be a search query.” And the format of the input is defined so that there must be a key “query” with a value of type string.
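With LangChain's predefined tools, these descriptions and argument specifications can be inspected directly. A small sketch, assuming the same tools that are loaded in the LangChain implementation below:

from langchain.agents import load_tools

# Print the description and argument schema that end up in the executor prompt.
tools = load_tools(["ddg-search", "arxiv", "pubmed", "wikipedia"])
for tool in tools:
    print(tool.name)
    print(tool.description)
    print(tool.args)  # e.g. {'query': {'title': 'Query', 'type': 'string'}} for the search tool
    print()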

Execution steps

In an execution environment, the Execution Agent is run for each planned step. The results of the previous steps and the current step serve as input for deciding which action to execute. The results of the action are observed and combined into a response.
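Conceptually, this execution environment is just a loop over the planned steps. The following sketch illustrates the idea; executor_agent is a hypothetical callable standing in for the Execution Agent, and the real PlanAndExecute chain handles this loop internally:

# Conceptual sketch of the plan-and-execute loop; PlanAndExecute in
# langchain_experimental implements this logic internally.
previous_steps = []            # list of (step, result) pairs
for step in steps:             # 'steps' as produced by the planning agent
    result = executor_agent(   # hypothetical callable wrapping the Execution Agent
        previous_steps=previous_steps,
        current_step=step,
    )
    previous_steps.append((step, result))

final_answer = previous_steps[-1][1]  # the last step produces the response to the user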

In the first step, a web search for “LK-99 superconductor” is performed to obtain up-to-date information on this issue.

In the second step, the knowledge from the web search is combined with the planned instruction to check reliable scientific sources and databases for information on the superconducting properties of LK-99, and a query to arXiv is generated.

With the knowledge from the web search and the search in arXiv, the third planned step is processed: “Analyze the gathered information to determine if LK-99 is indeed a superconductor.” The agent does not see any further need for information that would have to be provided by a tool, and generates a final answer.

The final step uses all the information gathered in the previous steps to generate the final answer as a response to the user.

Agents are fascinating because of their autonomous approach to problem solving. They try to move freely through problem spaces and gather relevant information. Sometimes they are successful in doing so. Sometimes they fail because they find the wrong sources or because they simply get caught in unproductive loops that do not advance the understanding of a problem. Autonomous LLM-based agents are still a nascent technology that is too unreliable for most production applications. However, this could change relatively quickly.

LangChain Implementation

Since there is an experimental implementation of Plan-and-Execute Agents in LangChain, the code to set up the agent used in this example is very short.

from langchain.chat_models import ChatOpenAI
from langchain_experimental.plan_and_execute import PlanAndExecute, load_agent_executor, load_chat_planner
from langchain.agents import load_tools
from langchain.callbacks import StdOutCallbackHandler

# this is an overly verbose implementation so that all intermediate steps and prompts are visible to the user 
# (see the output panel after the code block for the full output)

# Large Language Model to use
llm = ChatOpenAI(temperature=0, verbose=True)
# Load predefined tools
tools = load_tools(["ddg-search", "arxiv", "pubmed", "wikipedia"])
# Create planner agent
planner = load_chat_planner(llm)
planner.llm_chain.verbose = True
# Create executor agent
executor = load_agent_executor(llm, tools, verbose=True)
# Stitch together Plan-And-Execute Agent
agent = PlanAndExecute(planner=planner, executor=executor, verbose=True)
# Define callbacks for extensive logging
callbacks = [StdOutCallbackHandler()]

# Run agent with task
agent.run("Is LK-99 a superconductor?", callbacks=callbacks)

Output