Open-source testing for AI agents

Catch silent agent failures before production

Simulate LLM responses, messy tool outputs, and failure cases so you can verify what your agent did, not just what it returned.

$ pip install tenro

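@tenro.simulate
def test_cancel_order_simulation():
    # Script the LLM: first a tool call, then the final reply.
    llm.simulate(
        Provider.ANTHROPIC,
        responses=[
            ToolCall(cancel_order, order_id="123"),
            "Done! Order #123 has been cancelled.",
        ],
    )
    # Simulate the tool's output so no real order is touched.
    tool.simulate(cancel_order, result={"status": "cancelled"})

    # Run the agent against the simulated LLM and tool.
    support_agent.run("Cancel my order #123")

    # Verify what the agent did, then what it returned.
    tool.verify(cancel_order, order_id="123")
    agent.verify(support_agent, result="Done! Order #123 has been cancelled.")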

Why simulation

Catch failures normal tests miss

An agent run can look fine while the agent used the wrong tool, trusted bad context, or chose the wrong action.

01 args failed

Tools use the wrong details

The call looks fine, but the agent picked the wrong customer, date range, amount, or action.
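To make this failure mode concrete, here is a minimal sketch using only Python's standard `unittest.mock`. It is not Tenro's API. The `refund` tool, `toy_agent`, and the request fields are hypothetical names invented for illustration. The agent's reply looks fine, and only a check on the recorded arguments reveals that it refunded the wrong amount.

```python
from unittest.mock import Mock

# Hypothetical stand-in for a real refund tool (illustration only, not Tenro's API).
refund = Mock(return_value={"status": "refunded"})


def toy_agent(request: dict) -> str:
    """Toy agent with a deliberate bug: it refunds the order total,
    not the amount the customer asked for."""
    refund(customer_id=request["customer_id"], amount=request["order_total"])
    return "Your refund has been issued."


reply = toy_agent(
    {"customer_id": "c-42", "order_total": 80.0, "requested_refund": 20.0}
)

# Inspect the arguments the tool actually received.
kwargs = refund.call_args.kwargs
amount_is_correct = kwargs["amount"] == 20.0
```

A test that only compares `reply` would pass here. One that asserts on `kwargs["amount"]` would fail and point straight at the bug.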

02 wrong steps

Answers hide bad decisions

The final response sounds right, but the agent took steps you would never want in production.
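As a rough illustration, assuming nothing about Tenro's API, the sketch below uses the standard library's `unittest.mock` to show how a correct-sounding reply can hide an unwanted step. The tool names (`lookup_order`, `issue_refund`, `cancel_order`) and `toy_agent` are hypothetical.

```python
from unittest.mock import Mock

# One parent mock records every hypothetical tool call made through it.
tools = Mock()
tools.lookup_order.return_value = {"id": "123", "status": "open"}


def toy_agent(order_id: str) -> str:
    """Toy agent with a deliberate bug: it issues a refund nobody asked for."""
    tools.lookup_order(order_id=order_id)
    tools.issue_refund(order_id=order_id)  # unwanted side effect
    tools.cancel_order(order_id=order_id)
    return f"Done! Order #{order_id} has been cancelled."


reply = toy_agent("123")

# The reply gives no hint of the refund; the recorded calls do.
refund_was_issued = tools.issue_refund.called
```

In a real test the check would be `tools.issue_refund.assert_not_called()`, which raises `AssertionError` for this buggy agent.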

03 case missed

Demos skip messy cases

Clean demos pass. Real agent runs hit empty results, delays, bad data, and partial answers.
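The messy cases above can be scripted deterministically. The following is a sketch with Python's standard `unittest.mock`, not Tenro's own simulation API. `toy_agent` and the `search` tool are hypothetical. A `side_effect` list lets a fake tool raise a timeout on one call and return an empty result on the next.

```python
from unittest.mock import Mock


def toy_agent(search, query: str, max_attempts: int = 3) -> str:
    """Toy agent that retries on timeouts and reports empty results honestly."""
    for _ in range(max_attempts):
        try:
            results = search(query=query)
        except TimeoutError:
            continue  # simulated delay: try again
        if not results:
            return "No matching orders found."
        return f"Found {len(results)} order(s)."
    return "The search service is unavailable. Please try again later."


# Case 1: one timeout, then an empty result.
timeout_then_empty = Mock(side_effect=[TimeoutError("simulated delay"), []])
reply_empty = toy_agent(timeout_then_empty, "order 123")

# Case 2: the tool never responds in time.
always_times_out = Mock(side_effect=TimeoutError("simulated delay"))
reply_down = toy_agent(always_times_out, "order 123")
```

Each case runs the same way every time, so an edge case that happened once in production can become a permanent regression test.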

04 money wasted

Loops burn through tokens

The agent retries, replans, and fills context without getting closer to the task.
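One way to catch runaway loops in a test is to cap the number of LLM calls and assert on the count. The sketch below uses the standard library only. `toy_agent`, `StepBudgetExceeded`, and the `"replan"` / `"done"` decisions are hypothetical and are not part of Tenro.

```python
from unittest.mock import Mock


class StepBudgetExceeded(RuntimeError):
    """Raised when the toy agent uses up its step budget without finishing."""


def toy_agent(llm, task: str, max_steps: int = 5) -> int:
    """Ask the LLM what to do until it says 'done'; return the steps used."""
    for step in range(1, max_steps + 1):
        if llm(task) == "done":
            return step
    raise StepBudgetExceeded(f"no result after {max_steps} steps")


# A simulated LLM that never makes progress.
stuck_llm = Mock(return_value="replan")
try:
    toy_agent(stuck_llm, "cancel order 123")
    budget_enforced = False
except StepBudgetExceeded:
    budget_enforced = True

# A simulated LLM that replans once, then finishes.
healthy_llm = Mock(side_effect=["replan", "done"])
steps_used = toy_agent(healthy_llm, "cancel order 123")
```

Asserting on `stuck_llm.call_count` turns "the agent burned tokens" from a billing surprise into a failing test.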

How it works

Link. Simulate. Verify.

Turn the failure modes you worry about into repeatable agent tests.

Link

@link_agent | @link_tool

Mark the agents, tools, and LLM calls you want to test.

Simulate

llm.simulate(...)

Control LLM responses, tool outputs, timeouts, empty results, and malformed payloads.

Verify

agent.verify(...)

Assert the final result, tool calls, arguments, call order, and number of attempts.

Test the shape of the run

For agent tests, the final answer is not enough. Tenro helps you check which tools were called, what arguments were passed, how many times the agent retried, and whether it handled bad data correctly.
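To illustrate what asserting the shape of a run can mean, here is a minimal sketch built on `unittest.mock` from the standard library. It does not use Tenro's API, and the tools and `toy_agent` are hypothetical. It compares the full recorded sequence of tool calls, including one retry after a simulated timeout, against the sequence you expect.

```python
from unittest.mock import Mock, call

tools = Mock()
tools.lookup_order.return_value = {"id": "123", "status": "open"}
# First cancel attempt times out, second succeeds.
tools.cancel_order.side_effect = [
    TimeoutError("simulated delay"),
    {"status": "cancelled"},
]


def toy_agent(order_id: str) -> str:
    """Toy agent: look the order up, then cancel it with one retry."""
    order = tools.lookup_order(order_id=order_id)
    if order["status"] != "open":
        return "This order can no longer be cancelled."
    for _ in range(2):
        try:
            tools.cancel_order(order_id=order_id)
            return f"Done! Order #{order_id} has been cancelled."
        except TimeoutError:
            continue
    return "Could not cancel the order. Please try again later."


reply = toy_agent("123")

# Expected shape: one lookup, then two cancel attempts, in that order.
expected_shape = [
    call.lookup_order(order_id="123"),
    call.cancel_order(order_id="123"),
    call.cancel_order(order_id="123"),
]
shape_matches = tools.mock_calls == expected_shape
```

The single comparison covers tool choice, arguments, order, and retry count at once.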

API calls in tests

100% reproducible

Tool outputs simulated

Run shape asserted

Integrations

No provider lock-in

Simulate behavior around the providers and frameworks you already use.

LLM Providers

OpenAI

Anthropic

Google Gemini

Agent Frameworks

LangChain

LangGraph

LlamaIndex

CrewAI

AutoGen

PydanticAI

Custom

Early access

Test agents with repeatable simulations

Get release notes, examples, and practical testing patterns for agent teams.