How to Build a Multi-Agent Research Assistant in Python

In this article, you will learn how to build a multi-agent AI research assistant using the OpenAI Agents SDK, the GPT-5.4 mini model, and the Olostep Web API. You will wire together a manager agent, specialist sub-agents, and live web tools to produce structured, source-grounded research reports.

Topics we will cover include:

  • How to define a manager agent that orchestrates a judge agent and an analyst agent to progressively gather and evaluate evidence.
  • How to integrate Olostep’s Answer, Search, Search-with-Scrape, and Scrape APIs as callable tools inside the OpenAI Agents SDK workflow.
  • How to expose the finished research assistant as an interactive web application built with Reflex, complete with PDF export.

Introduction

I have been experimenting with the OpenAI Agents SDK, and it has quickly become one of my favorite ways to build agentic AI applications. What stood out to me is how simple it is to create a multi-agent workflow: you define a manager agent, connect it with specialist sub-agents and tools, and let it decide how to complete the task.

The manager agent can delegate work to other agents, call tools directly, and coordinate the overall research process. This makes it possible to build AI applications that do more than generate text: they can search the web, gather information, organize findings, and produce grounded outputs.

In this guide, we will build a multi-agent AI research assistant using the OpenAI Agents SDK, the GPT-5.4 mini model, and the Olostep Web API. The assistant will generate a structured research report that is grounded in web data, easy to read, and produced in just a few seconds.

The guide also includes the full code, a web app, and a link to the deployed version so you can test the system yourself. By the end, you will understand how multiple agents can work together to create a practical research assistant from scratch.

1. Set Up the Environment

Before building the multi-agent research assistant, we need to set up the Python environment and configure the required API keys.

We will use four main packages:

  • openai-agents for building and running the multi-agent workflow
  • olostep for accessing live web data
  • pydantic for defining structured outputs
  • python-dotenv for loading API keys from a .env file

Install all four packages with pip by running pip install openai-agents olostep pydantic python-dotenv in your terminal.

Next, create a .env file in your project directory. This file stores your API keys outside your code, so you do not need to hardcode them inside your notebook or application. Add .env to your .gitignore so the keys are never committed.

You can create your OpenAI API key from the OpenAI Platform. Sign in to your OpenAI account, go to the API keys section, and generate a new key for this project. Make sure your OpenAI API account has billing enabled and at least $5 in credits available before running the examples. You may also need to complete account verification to access the latest models.

For Olostep, create a free account from the Olostep website and generate an API key from your dashboard. The free plan includes 500 successful requests with no credit card required, which is enough to test the research assistant in this guide.

Once your keys are ready, we will start in a Jupyter Notebook by importing the required libraries and loading the environment variables. This setup prepares the notebook to work with OpenAI, Olostep, structured outputs, and tracing.
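As a rough illustration of the key check that could follow loading the .env file (for example with python-dotenv's load_dotenv()), here is a minimal standard-library sketch. The variable name OLOSTEP_API_KEY and the helper missing_api_keys are assumptions made for this example, not names confirmed by the article.

```python
import os

# OPENAI_API_KEY is the name the OpenAI client reads by default.
# OLOSTEP_API_KEY is an assumed name chosen for this sketch.
REQUIRED_KEYS = ("OPENAI_API_KEY", "OLOSTEP_API_KEY")


def missing_api_keys(env=None):
    """Return the required key names that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not str(env.get(name, "")).strip()]


missing = missing_api_keys()
if missing:
    print("Missing keys:", ", ".join(missing))
else:
    print("All API keys are set.")
```

Running a check like this at the top of the notebook surfaces a missing key before any agent or tool call fails with a less obvious error.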

2. Test Olostep Search with Scraping

Before building the full multi-agent workflow, it is useful to test whether Olostep can search the web and scrape the returned pages successfully. This step confirms that your API key is working and that the search results include enough page content for downstream analysis.

The Olostep Search API is especially useful because it can return search results with a built-in scraping option. Instead of only receiving page titles, snippets, and links, you can ask Olostep to scrape the returned URLs and provide the extracted content in formats such as Markdown.

This means the agent can work with high-quality page content directly, rather than relying only on search snippets. It also saves time because you do not need to build a separate search-and-scrape pipeline yourself.

In the test request, scraping is enabled so that Olostep scrapes each returned page and provides the extracted content in Markdown format. Markdown is useful because it keeps the content readable while removing unnecessary page clutter. The timeout value gives Olostep enough time to fetch and process each page.

After the search is complete, we loop through the returned links and print each URL along with the number of characters extracted from the page.
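The loop described above might look like the following sketch. It works on plain dictionaries rather than a live Olostep response, and the field names url and markdown_content are assumptions about the result shape, so adjust them to whatever the SDK actually returns.

```python
def summarize_links(links):
    """Return (url, character_count) pairs for scraped search results."""
    summary = []
    for link in links:
        url = link.get("url", "")
        # `markdown_content` is an assumed field name for the scraped text.
        content = link.get("markdown_content") or ""
        summary.append((url, len(content)))
    return summary


# Hypothetical sample shaped like a search-with-scrape result.
sample_links = [
    {"url": "https://example.com/a", "markdown_content": "# Title\n\nBody text."},
    {"url": "https://example.com/b", "markdown_content": None},
]

for url, chars in summarize_links(sample_links):
    print(f"{url} -> {chars} characters")
```

A count of zero is a quick signal that a page was found but no content was extracted from it.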

3. Add Helper Functions

Before creating the agents and tools, we need to add a few helper functions. These utilities keep the rest of the code cleaner and make the workflow easier to debug.

The helper functions will handle six things:

  • Check whether the Olostep API key is available
  • Create a reusable Olostep client
  • Convert SDK responses into standard Python dictionaries
  • Compress large JSON outputs so they are easier to inspect
  • Add the current date and year as context for the agents
  • Normalize search results into a simpler format for the agents
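A minimal sketch of four of these helpers (response conversion, JSON compaction, date context, and result normalization) is shown here using only the standard library. All function names, the truncation limit, and the assumed result fields (title, url, description, snippet) are illustrative choices, not the article's exact code.

```python
import json
from datetime import date


def to_plain_dict(obj):
    """Best-effort conversion of an SDK response object to a dict."""
    if isinstance(obj, dict):
        return obj
    for method in ("model_dump", "to_dict", "dict"):
        fn = getattr(obj, method, None)
        if callable(fn):
            return fn()
    return dict(vars(obj))  # fall back to instance attributes


def compact_json(data, max_chars=2000):
    """Serialize without extra whitespace and truncate long output."""
    text = json.dumps(data, separators=(",", ":"), default=str, ensure_ascii=False)
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "...[truncated]"


def date_context(today=None):
    """Build a short sentence that tells the agents the current date."""
    today = today or date.today()
    return f"Today's date is {today.isoformat()}. The current year is {today.year}."


def normalize_results(raw_results, limit=5):
    """Keep only title, url, and snippet for the first `limit` results."""
    normalized = []
    for item in raw_results[:limit]:
        normalized.append({
            "title": (item.get("title") or "").strip(),
            "url": item.get("url") or "",
            "snippet": (item.get("description") or item.get("snippet") or "").strip(),
        })
    return normalized
```

Keeping these utilities free of agent logic means they can be tested on sample dictionaries before any API call is made.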

4. Define Structured Output Models

Next, define the structured outputs that the agents will return. These models make the workflow more reliable because each agent must return information in a consistent format.

The judge agent uses the Judgment model to decide whether the gathered evidence is strong enough. The analyst agent uses the MarkdownResearchReport model to return the final report as polished Markdown.

Judgment works as the quality-control schema. It records whether the evidence is strong enough to stop or whether the manager agent should continue searching.

MarkdownResearchReport defines the final research output. Since the final app only needs the completed report, this model keeps a single markdown_report field instead of extra metadata fields. This makes the output simpler and easier to display in the notebook, web app, and PDF export.
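The two models could be sketched with Pydantic roughly as follows. The class names come from the article, as does the single markdown_report field; the Judgment field names and types are assumptions based on the description of what the judge returns (a verdict, a score, a reason, and missing information).

```python
from typing import List

from pydantic import BaseModel, Field


class Judgment(BaseModel):
    """Quality-control verdict returned by the judge agent."""

    # Field names below are assumptions made for this sketch.
    is_good_enough: bool = Field(description="True when research can stop.")
    quality_score: int = Field(description="Overall evidence quality score.")
    reason: str = Field(description="Short justification for the verdict.")
    missing_information: List[str] = Field(
        description="Gaps that further research should address."
    )


class MarkdownResearchReport(BaseModel):
    """Final output of the analyst agent."""

    markdown_report: str = Field(description="Complete report in Markdown.")


verdict = Judgment(
    is_good_enough=False,
    quality_score=4,
    reason="The answer cites no sources.",
    missing_information=["primary sources", "publication dates"],
)
print(verdict.is_good_enough, verdict.missing_information)
```

Because every field is typed, a malformed judge response fails validation instead of silently steering the manager agent in the wrong direction.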

5. Create Olostep Tool Functions

Now create the tools that the manager agent can call during the research process. These tools wrap Olostep’s Answer API, Search API, Search with Scrape, and Scrape API.

Each tool includes tracing spans so you can inspect what happened during execution in the OpenAI trace viewer. This is useful for debugging because you can see which tool was called, what input it received, and how the manager agent moved through the workflow.

Answer Query Tool

This tool asks Olostep for a quick answer to the user’s research question. It is used as the first step in the workflow before deciding whether more research is needed.

Search Web Tool

This tool runs a standard web search and returns normalized results. It is useful when the assistant needs to discover additional sources before scraping specific pages.

The output is intentionally compact. Instead of returning the full raw API response, the tool returns the query and a cleaned list of search results. This makes the response easier for the manager agent to read and reduces unnecessary context.

Search with Scrape Tool

This tool searches the web and scrapes the returned pages in one step. It gives the research assistant richer evidence than search snippets alone.

This is one of the most important tools in the project because it lets the agent retrieve both links and page content from a single call. Instead of first searching the web, selecting URLs, and then scraping each one separately, the tool can return usable Markdown content directly from the discovered pages.

Scrape URL Tool

This tool scrapes a single URL and returns compact page content. The manager agent uses it when it needs deeper evidence from selected sources.

For example, the manager may first use search_web to find relevant pages, then use scrape_url to retrieve the full content from the most useful links.

These tools give the manager agent different ways to gather evidence. It can start with a quick answer, search for more sources, use Search with Scrape for richer context, or scrape a specific URL when it needs more detail.
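To make the idea of a compact tool result concrete, here is a sketch of the scrape tool's shape. It does not call Olostep: scrape_fn is a hypothetical callable (URL in, Markdown string out) standing in for the real Scrape API call, and the max_chars budget and output keys are assumptions.

```python
import json
from urllib.parse import urlparse


def scrape_url_tool(url, scrape_fn, max_chars=4000):
    """Scrape one URL through `scrape_fn` and return a compact JSON string.

    `scrape_fn` is a hypothetical stand-in for the Olostep Scrape API call.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return json.dumps({"url": url, "error": "invalid URL"})
    try:
        markdown = scrape_fn(url) or ""
    except Exception as exc:
        # Report the failure to the agent as data instead of raising.
        return json.dumps({"url": url, "error": str(exc)})
    return json.dumps({
        "url": url,
        "total_chars": len(markdown),
        "truncated": len(markdown) > max_chars,
        "content": markdown[:max_chars],
    })


# Usage with a fake scraper so the example runs offline.
print(scrape_url_tool("https://example.com", lambda u: "# Example\n\nHello."))
```

Returning errors as JSON rather than raising lets the manager agent read the failure and choose another source. In the OpenAI Agents SDK, a function with this role would typically be registered with the function_tool decorator.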

6. Build the Specialist Agents

The workflow uses two specialist agents: a judge agent and an analyst agent.

The judge agent checks whether the gathered evidence is strong enough to answer the user’s question. The analyst agent then turns the approved evidence into a polished Markdown research report.

First, define the model that both specialist agents will use:

Judge Agent

The judge agent evaluates answer quality. It checks whether the answer is specific, current, source-backed, and complete enough to stop the research process.

This is important because the manager agent should not produce a final report from weak evidence. If the answer is vague, outdated, unsupported, or missing key details, the judge agent will reject it and the manager agent can continue searching.

The judge agent returns a Judgment object. This includes whether the evidence is good enough, a quality score, a short reason, and any missing information that still needs to be addressed.

Analyst Agent

The analyst agent writes the final Markdown report. It turns the gathered evidence into a readable research brief with clear sections, source notes, and references.

This agent is responsible for making the output useful for a professional reader. Instead of simply summarizing raw tool outputs, it organizes the findings into a complete report that explains the topic, highlights the most important evidence, and cites the sources used.

The analyst agent returns a MarkdownResearchReport object with one field: markdown_report. This keeps the final output simple because the complete title, summary, findings, analysis, source notes, and references are all included inside the Markdown report itself.

7. Create the Manager Agent

The manager agent is the orchestrator. It controls the full research workflow and decides which tool or specialist agent should run next.

The workflow follows this pattern:

  1. Start with a quick answer using the Olostep Answer API
  2. Ask the judge agent whether the answer is good enough
  3. If not, run Search with Scrape for stronger evidence
  4. Ask the judge agent again
  5. If the evidence is still weak, run targeted searches and scrape the most relevant pages
  6. Ask the analyst agent to write the final report
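The steps above can be sketched as plain control flow. In the article the manager is an LLM agent that makes these decisions itself, so this deterministic function is only a stand-in for the escalation pattern. Every callable argument, the Verdict class, and the step labels are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Verdict:
    """Simplified stand-in for the judge agent's structured output."""
    good_enough: bool
    missing: List[str] = field(default_factory=list)


def run_research(question, answer, judge, search_with_scrape, targeted_scrape, analyst):
    """Escalate research depth only while the judge rejects the evidence.

    Returns the final report and the ordered list of steps that ran.
    """
    steps = ["answer_query"]
    evidence = [answer(question)]                      # step 1: quick answer

    verdict = judge(question, evidence)                # step 2: first check
    steps.append("judge")

    if not verdict.good_enough:
        evidence.append(search_with_scrape(question))  # step 3: richer evidence
        steps.append("search_with_scrape")
        verdict = judge(question, evidence)            # step 4: second check
        steps.append("judge")

    if not verdict.good_enough:
        # step 5: targeted search and scrape, guided by what is still missing
        evidence.append(targeted_scrape(question, verdict.missing))
        steps.append("targeted_scrape")

    report = analyst(question, evidence)               # step 6: final report
    steps.append("analyst")
    return report, steps
```

With stub functions in place of the tools, the returned step list makes it easy to see that a strong first answer skips straight to the analyst, while a weak one triggers the deeper stages.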

First, convert the judge and analyst agents into tools. This allows the manager agent to call them during the workflow.

Now define the manager agent. This agent does not answer from memory. Instead, it follows a clear research process: answer, judge, search, judge again, scrape if needed, and then write the final report.

The manager agent is the core of the system. It decides when the first answer is enough, when more evidence is needed, and when to call the analyst agent to produce the final report.

The key idea is that the manager does not rely on a single tool call. It starts with a fast answer, checks quality, and only does deeper research when needed. This keeps the workflow efficient while still allowing the assistant to gather stronger evidence for more complex questions.

8. Run the Research Assistant with Tracing

The final function runs the manager agent and returns a structured research report. It also creates an OpenAI trace ID so you can inspect the full workflow, including manager decisions, specialist agent calls, tool usage, and Olostep spans.

Tracing is especially useful when debugging multi-agent systems because it shows exactly what happened at each step. You can see whether the manager followed the required workflow, which tools were called, what evidence was gathered, how the judge evaluated the answer, and when the analyst produced the final report.

The function first checks that both API keys are available. It then creates a trace ID and prints a trace URL so you can inspect the full run in the OpenAI trace viewer.

The trace() block groups the full workflow, while custom_span() marks the manager-agent run. Since the Olostep tools also use custom spans, you can see both agent decisions and API calls in one place.

Finally, flush_traces() sends the trace data to OpenAI so the run is available for review.
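To show how an outer trace and nested spans relate, here is a toy recorder built with the standard library. It is not the OpenAI Agents SDK tracing API: TraceRecorder, its span() method, and the ID format are hypothetical stand-ins that only mimic the grouping behavior described above.

```python
import time
import uuid
from contextlib import contextmanager


class TraceRecorder:
    """Toy stand-in for a trace that groups nested spans."""

    def __init__(self, workflow_name):
        self.workflow_name = workflow_name
        self.trace_id = f"trace_{uuid.uuid4().hex}"  # illustrative ID format
        self.spans = []    # finished spans, in completion order
        self._stack = []   # names of currently open spans

    @contextmanager
    def span(self, name, **data):
        """Record a named span and remember which span enclosed it."""
        parent = self._stack[-1] if self._stack else None
        self._stack.append(name)
        started = time.perf_counter()
        try:
            yield
        finally:
            self._stack.pop()
            self.spans.append({
                "name": name,
                "parent": parent,
                "data": data,
                "seconds": time.perf_counter() - started,
            })


recorder = TraceRecorder("research_assistant")
with recorder.span("manager_run", question="What changed this year?"):
    with recorder.span("olostep.answer_query"):
        pass  # a tool call would happen here

for span in recorder.spans:
    print(span["name"], "<-", span["parent"])
```

The parent links are what let a trace viewer draw tool calls underneath the manager run that triggered them.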

9. Test the Multi-Agent Research Assistant

Now test the full workflow with a sample research question. The assistant will run the manager agent, gather evidence, judge the quality of that evidence, and return a structured Markdown report.

When you run this cell, the notebook prints an OpenAI trace ID and a trace URL.

Click the trace URL to open the run in the OpenAI trace viewer. This lets you inspect the full execution path across the manager agent, judge agent, analyst agent, Olostep tools, and scraping calls.

The trace is useful because it shows the actual orchestration process. You can see when the manager agent starts with answer_query, when it sends the result to the judge agent, and how the judge decides whether the evidence is strong enough. If the first answer is not enough, the trace also shows when the manager uses search_with_scrape or targeted scraping to gather better evidence.

After the manager completes the research process, the notebook displays the final Markdown report.

The final report is written in a clean, reader-friendly format with an executive summary, key findings, context, analysis, source notes, and references.

If you face any issues running the notebook or reproducing the results above, you can review the full Jupyter Notebook on GitHub. It includes the complete setup, helper functions, agent definitions, tool calls, tracing workflow, and example output: multi_agent_research_assistant_openai_agents_olostep.ipynb

10. Build a Web UI with Reflex

After testing the code in the notebook, you can turn the research assistant into a simple web app using Reflex, a Python web framework for building interactive user interfaces.

The web app focuses on creating a clean interface where users can:

  • Enter a research question
  • Run the multi-agent workflow
  • View the agent activity logs
  • Read the final research report
  • Download the report as a PDF

You can find the app code in the project repository: Multi-Agent-Research-Assistant/app

To run the app locally, clone the project repository, move into the project folder, and install the required dependencies. Next, create a .env file from the provided template and add your API keys, then run the Reflex app. Once the app starts, open the local URL printed in your terminal.

You now have a working web interface for your multi-agent research assistant.

The UI is designed to be simple, fast, and practical. When you enter a question and click the search button, the app shows the workflow logs so you can follow what the assistant is doing.

The assistant starts by calling the Olostep Answer API to get an initial answer. It then sends that evidence to the judge agent to check whether the answer is strong enough. If the judge decides more evidence is needed, the manager agent continues with web search, scraping, and additional source gathering before sending everything to the analyst agent for the final report.

The final report is displayed in a clean, professional format that is easy to read. You can also download the generated research report as a PDF file, making it easier to save, share, or review later.

If you do not want to build the app locally and only want to try it quickly, you can use the deployed Hugging Face Space: Multi-Agent Research Assistant – a Hugging Face Space by kingabzpro

Final Thoughts

Building this multi-agent research assistant showed how easy it is to create a practical agentic workflow using specialist agents and multiple tools. Instead of relying on one large, expensive research run every time, the system uses a manager agent to choose the right path based on the quality of the evidence.

The workflow is designed to balance speed, cost, and accuracy. It starts with the Olostep Answer API for a fast first response. If the judge agent gives that answer a strong score, the analyst agent immediately turns it into a final report. This keeps simple research tasks fast and cost-effective.

If the judge decides the first answer is not strong enough, the manager agent moves to Search with Scrape. This gives the system richer evidence without jumping straight into a deeper and more expensive research process. The judge then checks the evidence again. If it is good enough, the analyst writes the report.

Only when the evidence is still weak does the manager agent run targeted searches and scrape selected pages. This means the system can still produce a more accurate report when the question is complex, current, or missing important context.

The best part is that not every query costs the same or takes the same amount of time. Simple questions can finish quickly, while harder questions get more research depth when needed. This makes the assistant more efficient, more reliable, and better suited for real-world research workflows.

Machine Learning Mastery is part of Guiding Tech Media, a leading digital media publisher focused on helping people figure out technology. Visit our corporate website to learn more about our mission and team.