Build a Local LLM Tool-Using Agent with OpenAI SDK

Introduction to Building a Local LLM Tool-Using Agent

Welcome to this comprehensive tutorial on building a powerful, lightweight research agent powered by a local Large Language Model (LLM) and orchestrated using the OpenAI Agents SDK. In an era where AI is becoming increasingly integrated into our workflows, understanding how to deploy and manage AI agents locally offers unparalleled benefits in terms of privacy, cost, and customization. This guide will walk you through setting up a robust development environment and crafting an agent capable of interacting with external tools, all from the comfort of your own machine.

By the end of this article, you will have a functional tool-using agent that leverages the open-source Gemma model via Ollama and the versatile OpenAI Agents SDK. This setup provides a fantastic foundation for developers looking to experiment with AI agents without relying solely on cloud-based services. We'll emphasize practical application, ensuring you gain hands-on experience in bringing your AI projects to life.

Prerequisites:

Basic familiarity with Python programming and command-line interface (CLI).
Python 3.8+ installed on your system.
A stable internet connection for initial downloads.

Time Estimate: Approximately 60-90 minutes, depending on your system's download speeds and prior setup.

How to Build a Tool-Using AI Agent?

Building a tool-using AI agent involves more than just interacting with a large language model. It's about empowering the LLM with the ability to *reason*, *plan*, and *execute* actions in the real world or digital environments by leveraging specialized functions, which we call "tools." These tools extend the LLM's capabilities beyond pure text generation, allowing it to perform calculations, search the web, interact with APIs, or even control other software.

The core idea is to create an intelligent loop: the agent receives a prompt, the LLM processes it and decides if a tool is needed. If so, it selects the appropriate tool, generates the necessary inputs for it, executes the tool, and then incorporates the tool's output back into its reasoning process to formulate a final answer or decide on the next action. This iterative process is what makes agents incredibly powerful and versatile.

Our approach will utilize the OpenAI Agents SDK, a powerful framework designed to streamline this process. While often associated with OpenAI's proprietary models, the SDK is flexible enough to work seamlessly with local LLMs, providing a unified interface for agent development. This allows us to harness the sophisticated orchestration capabilities of the SDK while maintaining the privacy and control of a local model.

Can I Run LLMs Locally? Absolutely: Introducing Ollama

One of the most exciting developments in the AI landscape is the increasing feasibility of running powerful Large Language Models directly on your personal computer. This capability democratizes AI development, offering significant advantages over cloud-based solutions, including enhanced privacy, reduced costs, and the ability to operate offline. For our tutorial, we will leverage Ollama, an incredible open-source platform designed to make running, creating, and sharing large language models simple and accessible.

Ollama packages LLMs into easily digestible bundles, providing a straightforward command-line interface and an API endpoint that mimics the OpenAI API. This compatibility is crucial for our project, as it allows the OpenAI Agents SDK to interact with our local model as if it were a remote OpenAI service. We'll be using Gemma, a lightweight yet capable open-source model developed by Google, perfect for local experimentation.

Installation Steps for Ollama and Gemma:

Install Ollama:
Visit the official Ollama website (ollama.com/download) and download the installer for your operating system (macOS, Windows, or Linux). Follow the on-screen instructions to complete the installation.
[IMAGE: Ollama download page screenshot]
Verify Ollama Installation:
Open your terminal or command prompt and run:
```
ollama --version
```
You should see the installed Ollama version. If not, troubleshoot your installation.
Download the Gemma Model:
With Ollama installed, downloading a model is as simple as running a single command. We'll use the gemma:2b model for its balance of performance and resource requirements, making it ideal for local development.
```
ollama run gemma:2b
```
The first time you run this command, Ollama will download the model, which might take a few minutes depending on your internet speed. Once downloaded, you'll see a prompt, indicating Gemma is ready to receive input. You can type "hi" and press Enter to test it.
[IMAGE: Terminal showing 'ollama run gemma:2b' downloading and then a prompt]

Keep your Ollama server running in the background. If you close the terminal where you ran ollama run gemma:2b, the model will stop serving. For continuous use, you can simply ensure the Ollama application is running (it often runs as a background service after installation).

What is OpenAI Agents SDK?

The OpenAI Agents SDK is a powerful Python library designed to help developers build sophisticated AI agents that can interact with external tools, manage state, and engage in multi-turn conversations. While its name suggests a tight coupling with OpenAI's cloud models, a key advantage of this SDK is its flexibility: it can be configured to work with any API-compatible LLM, including local models served by Ollama. This makes it an excellent choice for our "build local LLM agent" project.

At its core, the SDK provides abstractions for defining agents, tools, and the interaction loop between them. It simplifies the complex process of agent orchestration, allowing you to focus on defining the agent's capabilities and the tools it can use, rather than implementing the intricate logic of how the agent decides which tool to use and when. It handles the parsing of agent thoughts, tool calls, and tool outputs, presenting a clean interface for developers.

Key features of the OpenAI Agents SDK include:

Agent Definition: Easily define an agent's persona, system instructions, and available tools.
Tool Management: Register custom Python functions as tools, complete with descriptions that the LLM can interpret.
Execution Loop: Manages the back-and-forth between the LLM and tool execution, including handling tool outputs and subsequent LLM calls.
Observability: Provides insights into the agent's thought process, showing how it uses tools and arrives at decisions.

This SDK acts as the central brain for our agent, translating our high-level instructions into concrete actions by leveraging the local LLM and the tools we provide. Its ability to integrate seamlessly with Ollama's API endpoint is what enables us to build a robust, local tool-using agent without significant architectural overhead.

Step-by-Step Guide: Building Your Local LLM Tool-Using Agent

Now that we have Ollama and Gemma set up, and a good understanding of the OpenAI Agents SDK, let's dive into the practical implementation. We'll set up our Python environment, define some simple tools, configure the OpenAI client to use Ollama, and finally, create and run our tool-using agent.

1. Set Up Your Python Environment

First, create a new directory for your project and set up a virtual environment. This keeps your project dependencies isolated and organized.

mkdir local_llm_agent
cd local_llm_agent
python3 -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`

Next, install the necessary Python packages. We'll need the openai library, which includes the Agents SDK.

pip install openai

2. Define Your Tools

Tools are regular Python functions that your agent can call. The key is to provide a clear docstring that describes what the tool does, its arguments, and what it returns. The LLM uses this description to decide when and how to use the tool. Let's create a simple calculator and a placeholder for a web search tool.

Create a file named agent_app.py and add the following code:

# agent_app.py

import os
import openai
from openai.beta.agents import Agent, Tool

# --- 1. Define Tools ---
def add(a: float, b: float) -> float:
    """Adds two numbers together.

    Args:
        a: The first number.
        b: The second number.

    Returns:
        The sum of the two numbers.
    """
    return a + b

def subtract(a: float, b: float) -> float:
    """Subtracts the second number from the first.

    Args:
        a: The number to subtract from.
        b: The number to subtract.

    Returns:
        The difference between the two numbers.
    """
    return a - b

def multiply(a: float, b: float) -> float:
    """Multiplies two numbers together.

    Args:
        a: The first number.
        b: The second number.
        
    Returns:
        The product of the two numbers.
    """
    return a * b

def divide(a: float, b: float) -> float:
    """Divides the first number by the second.

    Args:
        a: The numerator.
        b: The denominator.
        
    Returns:
        The result of the division.
    
    Raises:
        ValueError: If the denominator is zero.
    """
    if b == 0:
        raise ValueError("Cannot divide by zero.")
    return a / b

def web_search(query: str) -> str:
    """Performs a web search for the given query and returns a summary of the results.
    This is a placeholder tool and will always return a predefined string.

    Args:
        query: The search query.

    Returns:
        A summary of the web search results.
    """
    print(f"DEBUG: Performing web search for: '{query}'")
    return f"Search results for '{query}': According to Wikipedia, the capital of France is Paris."

# Organize tools in a dictionary for easy access
available_tools = {
    "add": add,
    "subtract": subtract,
    "multiply": multiply,
    "divide": divide,
    "web_search": web_search,
}

# Create Tool objects for the OpenAI Agents SDK
tools_for_agent = [
    Tool(
        type="function",
        function={
            "name": "add",
            "description": add.__doc__,
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "number", "description": "The first number."},
                    "b": {"type": "number", "description": "The second number."},
                },
                "required": ["a", "b"],
            },
        },
    ),
    Tool(
        type="function",
        function={
            "name": "subtract",
            "description": subtract.__doc__,
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "number", "description": "The number to subtract from."},
                    "b": {"type": "number", "description": "The number to subtract."},
                },
                "required": ["a", "b"],
            },
        },
    ),
    Tool(
        type="function",
        function={
            "name": "multiply",
            "description": multiply.__doc__,
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "number", "description": "The first number."},
                    "b": {"type": "number", "description": "The second number."},
                },
                "required": ["a", "b"],
            },
        },
    ),
    Tool(
        type="function",
        function={
            "name": "divide",
            "description": divide.__doc__,
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "number", "description": "The numerator."},
                    "b": {"type": "number", "description": "The denominator."},
                },
                "required": ["a", "b"],
            },
        },
    ),
    Tool(
        type="function",
        function={
            "name": "web_search",
            "description": web_search.__doc__,
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The search query."},
                },
                "required": ["query"],
            },
        },
    ),
]

Notice how each function has a clear docstring. The Tool objects are then created, linking the function's name and description to the agent. The parameters field uses JSON Schema to define the expected arguments for each tool.

3. Initialize the Local LLM Client

This is where we tell the OpenAI SDK to communicate with our local Ollama server instead of OpenAI's cloud API. We achieve this by setting the base_url of the OpenAI client to Ollama's local endpoint, which is typically http://localhost:11434/v1. We also specify the model name that Ollama is serving (gemma:2b).

Add the following to your agent_app.py file, after the tool definitions:

# --- 2. Initialize Local LLM Client ---
# Point the OpenAI client to your local Ollama instance
# Ensure Ollama is running and has 'gemma:2b' pulled
client = openai.OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama", # Can be any placeholder, Ollama doesn't use it
)

# Specify the model served by Ollama
local_llm_model = "gemma:2b"

It's crucial that your Ollama server is running with the gemma:2b model active (e.g., by having run ollama run gemma:2b in another terminal or ensuring the Ollama service is active) for this client to connect successfully.

4. Create the Agent

Now, let's instantiate our agent. We'll give it a system prompt to define its role and provide it with the tools we created.

Continue adding to agent_app.py:

# --- 3. Create the Agent ---
# Define the agent's system instructions and available tools
research_agent = Agent(
    client=client,
    tools=tools_for_agent,
    model=local_llm_model,
    system_prompt=(
        "You are a helpful research assistant. Your primary goal is to answer user questions "
        "accurately and concisely, using the tools provided to gather information or perform calculations. "
        "If a question requires mathematical operations, use the calculator tools (add, subtract, multiply, divide). "
        "If a question requires external knowledge, use the web_search tool. "
        "Always state which tool you are using if you use one. "
        "When performing calculations, show your steps clearly. "
        "If you use web_search, clearly state that the information comes from a search."
    ),
)

The system_prompt is vital; it guides the LLM on how to behave and when to use its tools. A clear and descriptive prompt significantly improves agent performance.

5. Run the Agent

Finally, let's put our agent to the test! We'll use a simple loop to allow for multiple turns of conversation.

Append this to agent_app.py:

# --- 4. Run the Agent ---
print(f"Local LLM Agent (using {local_llm_model} via Ollama) is ready! Type 'exit' to quit.")

while True:
    user_query = input("\nYour query: ")
    if user_query.lower() == 'exit':
        print("Exiting agent. Goodbye!")
        break

    print(f"Agent processing query: '{user_query}'...")
    try:
        # Stream the agent's responses and tool calls
        for event in research_agent.run(user_query, stream=True):
            if event.type == "text_delta":
                print(event.text_delta, end="", flush=True)
            elif event.type == "tool_code":
                print(f"\n--- Agent calling tool: {event.tool_code} ---")
                # Execute the tool code
                tool_name = event.tool_code.split('(')[0]
                if tool_name in available_tools:
                    try:
                        # Safely evaluate the tool call (be cautious with eval in production)
                        tool_output = eval(event.tool_code, {"__builtins__": None}, available_tools)
                        print(f"--- Tool output: {tool_output} ---")
                        # Provide the tool output back to the agent
                        for output_event in research_agent.tool_output(tool_output, stream=True):
                            if output_event.type == "text_delta":
                                print(output_event.text_delta, end="", flush=True)
                    except Exception as e:
                        print(f"--- Error executing tool {tool_name}: {e} ---")
                        # Provide error feedback to the agent
                        for output_event in research_agent.tool_output(f"Error: {e}", stream=True):
                            if output_event.type == "text_delta":
                                print(output_event.text_delta, end="", flush=True)
                else:
                    print(f"--- Error: Tool '{tool_name}' not found. ---")
                    for output_event in research_agent.tool_output(f"Error: Tool '{tool_name}' not found.", stream=True):
                        if output_event.type == "text_delta":
                            print(output_event.text_delta, end="", flush=True)
            elif event.type == "message_delta":
                # This event type might also contain text deltas, depending on SDK version
                if hasattr(event.delta, 'content') and event.delta.content:
                    print(event.delta.content, end="", flush=True)
            else:
                # Handle other event types if necessary for debugging
                # print(f"\nDEBUG: Unhandled event type: {event.type} - {event}\n")
                pass

    except openai.APIConnectionError as e:
        print(f"\nERROR: Could not connect to Ollama. Is it running? Details: {e}")
    except Exception as e:
        print(f"\nAn unexpected error occurred: {e}")
    finally:
        print("\n") # Add a newline for better readability after each turn

Run your script from the terminal:

python agent_app.py

You can now interact with your agent! Try queries like:

"What is 123 plus 456?"
"Subtract 50 from 100."
"What is the capital of France?"
"Multiply 7 by 8, then add 5."

Observe how the agent uses the `tool_code` event to indicate it's calling a tool, and then processes the `tool_output` before formulating its final response.

[IMAGE: Terminal output showing an agent interaction, including tool_code and tool_output]

How Do AI Agents Use External Tools? A Deep Dive into the Reasoning Loop

The magic behind an AI agent's ability to use external tools lies in its sophisticated reasoning loop, often referred to as the "Observe-Orient-Decide-Act" (OODA) loop. When an agent receives a user query, it doesn't just generate a direct response. Instead, it enters a multi-step process, powered by the underlying LLM and orchestrated by the Agents SDK.

First, the LLM observes the input query and its internal knowledge base (its training data and system prompt). It then orients itself by analyzing whether the query requires external information or computation that its core language model capabilities cannot provide. This is where the tool descriptions become critical: the LLM reads these descriptions to understand what each tool does and what arguments it expects.

If the LLM determines a tool is necessary, it then decides which tool to use and formulates the precise arguments for that tool. This decision manifests as a structured tool call, often in a JSON-like format or a specific function call syntax, which the Agents SDK intercepts. The SDK then takes this tool_code, executes the corresponding Python function (the tool), and captures its output. This execution step effectively allows the agent to "act" in the external environment.

Finally, the tool_output is fed back into the LLM as additional context. The LLM then re-evaluates the original query, now armed with the new information obtained from the tool. It might decide to use another tool, synthesize the information into a final answer, or ask for clarification. This continuous feedback loop of observation, orientation, decision, and action is how agents effectively leverage external capabilities to solve complex problems that are beyond the scope of a standalone LLM.

"The power of a tool-using agent lies not just in its ability to generate text, but in its strategic application of external functions to augment its intelligence and interact with the world."

Tips & Best Practices for Local LLM Agent Development

Building effective local LLM agents requires more than just connecting components. Here are some tips and best practices to help you get the most out of your development:

Craft Clear Tool Descriptions: The LLM relies heavily on the docstrings and descriptions you provide for your tools. Make them as precise and unambiguous as possible, detailing what the tool does, its inputs, and its outputs. Think of it as writing API documentation for an AI.
Granular Tools: Design tools that perform single, well-defined tasks. Instead of a single "research" tool, consider separate tools for "web_search", "summarize_text", "extract_entities", etc. This gives the LLM more fine-grained control and improves its decision-making.
Robust Error Handling in Tools: Your tools should be resilient. Implement proper error handling (e.g., try-except blocks) within your tool functions. If a tool fails, it should return a clear, informative error message that the agent can then process and potentially report back to the user or attempt a different strategy.
Iterative Prompt Engineering: The agent's system_prompt is crucial. Experiment with different phrasings, instructions, and examples to guide the LLM's behavior. Explicitly tell the agent when and how to use tools, and what format to expect for answers. For example, "Always use the calculator for math problems."
Observability and Debugging: Leverage the streaming output (stream=True) and print statements within your tools (like our DEBUG print in web_search) to observe the agent's thought process. This helps you understand why it chooses certain tools or makes specific decisions, making debugging much easier.
Consider Memory: For multi-turn conversations, agents often need memory. While not explicitly covered in this basic tutorial, the OpenAI Agents SDK supports message history. For more advanced agents, explore how to pass conversation history to your agent to maintain context across turns.
Resource Management: Local LLMs consume significant CPU/GPU and RAM. Monitor your system resources. If your agent is slow or crashes, consider using a smaller LLM (e.g., gemma:2b is a good start, but there are even smaller models) or upgrading your hardware.

Common Issues and Troubleshooting

Developing with local LLMs and agent frameworks can sometimes present unique challenges. Here are some common issues you might encounter and how to troubleshoot them:

Ollama Not Running or Model Not Loaded:
- Symptom: openai.APIConnectionError: Connection refused or similar error when running your Python script.
- Solution: Ensure Ollama is running in the background. If you previously ran ollama run gemma:2b in a terminal, that terminal needs to remain open. Alternatively, verify the Ollama application itself is active (it often runs as a background service). You can check its status via ollama list in your terminal. If the model isn't listed, run ollama run gemma:2b again to download and load it.
Incorrect API Base URL or Model Name:
- Symptom: openai.BadRequestError: Error code: 404 - {'error': 'The model `gemma:2b` does not exist.'} or similar.
- Solution: Double-check that your client.base_url is set correctly to "http://localhost:11434/v1". Also, ensure the local_llm_model variable exactly matches the model name you've pulled with Ollama (e.g., "gemma:2b"). Case sensitivity matters.
Agent Not Using Tools or Using Them Incorrectly:
- Symptom: The agent generates text instead of calling a tool when it should, or calls a tool with incorrect arguments.
- Solution: Review your tool descriptions and the agent's system_prompt. Make sure the descriptions are clear, concise, and accurately reflect the tool's purpose. Enhance the system prompt to explicitly instruct the agent on when to use specific tools (e.g., "Always use the add tool for addition"). Sometimes, adding examples to the system prompt can help the LLM understand better.
Tool Execution Errors:
- Symptom: The agent calls a tool, but the output in the terminal shows a Python error from within your tool function.
- Solution: Debug your tool function directly. Run the function with the arguments the agent tried to pass to it to identify the bug. Ensure type hints are correct and that the tool can handle various inputs, including edge cases (like division by zero).
Agent Getting Stuck in a Loop:
- Symptom: The agent repeatedly calls the same tool or generates similar responses without progressing to a solution.
- Solution: This often indicates an issue with the tool's output or the agent's interpretation of it. Ensure tool outputs are clear and informative. Revise the system prompt to include instructions on how to handle specific tool outputs or to avoid redundant actions. Sometimes, a more capable LLM or a more specific system prompt can resolve this.