Tools for Large Language Model Agents
Language models are an essential backbone of future AI systems, with its great natural langauge understanding capabilities and world model learned from carefully curated set of massive amount of data. More over, they are also few-shot learners that could learn from given prompts. However, they do suffer from many drawbacks, such as lack of access to the current or proprietary information sources, lack of the ability to reason or to plan, and hallucinations. In order to build a more robust system, we do need other mechanisms to provide tools to language models as agents.
Now, that said, what exactly is a tool and how does a language model ‘use’ tools?
Before we get inundated by marketing communications that uses words very liberally, let’s first look at some of the earlier research literatures, and then review the current industry implementations.
Academic Research
Pre ChatGPT
Back in May 2022, in the good old days of pre-ChatGPT, AI21 Labs in Israeli published a paper called MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning. In this paper, it suggested that we could augment language models with external neural modules or symbolic modules. Neural modules include be other language models and symbolic modules include callables such as a math calculator, currency converter, or API calls. It proposed to use LLM to generate an input adapter, and then use the input adapter to use the expert modules, and then use the output of the expert modules to generate the final output. It can be visualized as the following.
Simultaneously, there are other efforts to equip large language models with Web Browsing (Internet-Augmented Dialogue Generation) and Python Interpreters (PAL: Program-aided Language Models).
Post ChatGPT
In February 2023, Meta AI Research further developed the idea to teach LLMs to use tools in Toolformer: Language Models Can Teach Themselves to Use Tools that showcased that large language models can be trained to use tools, including calculators and search engines over API calls. A team in Berkeley further extend the idea of tool use to design a system where LLM leverages a retriever to retrieve from a large set of tools in Gorilla: Large Language Model Connected with Massive APIs, where tools are sampled from Huggingface APIs.
Industry Implementation
Now we have some brief history of LLM agents tool use in academia, we can turn our attention to how the leading large language model developers designs their API for tool use.
OpenAI
In June 2023, OpenAI released its LLMs with capability of tool use by having an optional parameter tools
in its Chat Completion API. The tools
parameter provides function specifications. The purpose of this is to enable models to generate function arguments which adhere to the provided specifications. Note that the API will not actually execute any function calls. It is up to developers to execute function calls using model outputs.
In other words, the detailed descriptions are the prompts to large language model to generate the correct function parameters.
OpenAI’s definition of tools as follows:
tools = [
{
"type": "function",
"function": {
"name": "tool name",
"description": "detailed description of the tool",
"parameters": { // input parameters to the tool
"type": "object",
"properties": {
"param_1": {
"type": "string",
"description": "detailed description of param_1",
},
"param_2": {
"type": "string",
"enum": ["enum_1", "enum_2"], // bounded output of param_2
"description": "detailed description of param_1",
},
...
},
"required": ["param_1", "param_2", ...], // required output parameters
},
}
},
...
]
Gemini, Anthropic, Cohere, and Langchain
Gemini
Not to be left behind, Google’s Gemini Pro announced in November 2023 has the capability of tool use. The way it uses tools is exactly the same as OpenAI’s schema, except function
is named as functionDeclarations
and without defining the type
of the tool.
tools = [
{
"functionDeclarations": [
{
"name": string,
"description": string,
"parameters": {
object (OpenAPI Object Schema)
}
}
]
}
]
Anthropic
Anthropic also announced tool calling API functionalities beta starting in April 2024, with the identical schema except parameters
field became input_schema
and without defining the type
of the tool.. It’s implemtation is as follows:
tools = [
{
"name": "get_weather",
"description": "Get the current weather in a given location",
"input_schema": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA"
}
},
"required": ["location"]
}
}
],
Cohere
Cohere also announced their model with tool use capabilities in April 2024, using the same schema, except with parameters
fields became parameter_definitions
and without defining the type
of the tool.. It’s implementation sample is as follows:
tools = [
{
"name": "query_daily_sales_report",
"description": "Connects to a database to retrieve overall sales volumes and sales information for a given day.",
"parameter_definitions": {
"day": {
"description": "Retrieves sales data for this day, formatted as YYYY-MM-DD.",
"type": "str",
"required": True
}
}
},
{
"name": "query_product_catalog",
"description": "Connects to a a product catalog with information about all the products being sold, including categories, prices, and stock levels.",
"parameter_definitions": {
"category": {
"description": "Retrieves product information data for all products in this category.",
"type": "str",
"required": True
}
}
}
]
Langchain
With tool use now becoming a standard, Langchain added generic support to LLM Tool Use across all models in April 2024. Langchain tool use can be implemented
- using its tool decorator, or
from langchain_core.tools import tool
@tool
def func(param_1, ...):
...
- extending from
langchain_core.tools.BaseTool
class.
Note, regardless of the implementation, the upmost important factor is the descriptions of individual tools or parameters, as they are the prompts that LLM uses to understand how to generate the best input parameters.
Conclusion
With the above research and industry implementations, we can now define a Tool for large language models as a callable (function, API, SQL query, etc) with the following:
- name
- description
- clearly defined input json schema
In addition, LLM does not use tools. It only generate the input parameters to the tool. It is the developer’s responsibility to use the generated parameters to call the tool, and append the result to the conversation history for the LLM to generate the final output.
References
- MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning
- Toolformer: Language Models Can Teach Themselves to Use Tools
- Gorilla: Large Language Model Connected with Massive APIs
@article{
leehanchung,
author = {Lee, Hanchung},
title = {Tools for Large Language Model Agents},
year = {2024},
month = {05},
howpublished = {\url{https://leehanchung.github.io}},
url = {https://leehanchung.github.io/blogs/2024/05/09/tools-for-llms/}
}