# Custom agents

Test agents built with raw LLM SDK calls (OpenAI, Anthropic, Google, etc.).
> **When to use this guide**
>
> Use this guide if you're calling LLM APIs directly (e.g., `openai.chat.completions.create()`). If you're using a framework like LangChain or CrewAI, see the framework guides instead.
## What you'll use

| Decorator | Purpose |
|---|---|
| `@link_tool` | Your custom tools (search, database, APIs) |
| `@link_llm` | Your direct LLM calls (optional, for cleaner verification) |
| `@link_agent` | Entry point for tracing |
## Why `@link_llm` is optional

Tenro intercepts LLM calls at the HTTP level. This means `llm.simulate()` works whether or not you use `@link_llm`:

```python
from tenro import Provider
from tenro.simulate import llm

# This works WITHOUT @link_llm - Tenro intercepts HTTP directly
llm.simulate(Provider.OPENAI, response="Hello!")

# Your OpenAI SDK call is intercepted at the network level
response = openai.chat.completions.create(...)  # Intercepted!
```
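To build intuition for how transport-level interception can work without touching application code, here is a minimal, framework-free sketch using only the standard library. It is illustrative only, not Tenro's implementation: the `sdk_chat_completion` function and the endpoint URL are made up, and `unittest.mock` stands in for the real interception layer.

```python
"""Minimal sketch of HTTP-level interception (illustrative only; not Tenro's code)."""
from unittest import mock
import urllib.request


def sdk_chat_completion(prompt: str) -> str:
    """Stand-in for an LLM SDK call that goes over the network."""
    req = urllib.request.Request("https://api.example.com/v1/chat", data=prompt.encode())
    with urllib.request.urlopen(req) as resp:  # the network hop that gets intercepted
        return resp.read().decode()


# Patch at the transport layer: the "SDK" code above is untouched,
# yet the call never reaches the network - the same idea behind llm.simulate().
canned = mock.MagicMock()
canned.__enter__.return_value.read.return_value = b"Hello!"

with mock.patch("urllib.request.urlopen", return_value=canned):
    result = sdk_chat_completion("Hi")

print(result)  # -> Hello!
```

Because the patch sits below the SDK, the calling code needs no changes and no decorators, which is why `@link_llm` stays optional for httpx-based SDKs.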
What `@link_llm` does:

- Creates an `LLMScope` to trace which function made each LLM call
- Enables targeted simulation with `target=` when you have multiple LLM functions
- Provides better error messages showing which function failed

Benefits of using `@link_llm`:

- Route simulations to specific functions with `target=`
- Explicit tracing of LLM call origins for debugging
- Works when HTTP interception isn't available (non-httpx SDKs)
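The routing idea behind `target=` can be sketched in plain Python: a registry maps each linked function to a canned response, and a decorator records which linked function is currently on the call stack so the interception layer can dispatch. Everything below (`link_llm`, `simulate`, `intercepted_llm_call`, `summarize`, `classify`) is a toy stand-in, not Tenro's internals.

```python
"""Conceptual sketch of target= routing (illustrative only; not Tenro's internals)."""
from functools import wraps

_simulations: dict = {}   # linked function -> canned response
_current_target = None    # which linked function is currently executing


def link_llm(fn):
    """Toy stand-in for @link_llm: records the active function as the target."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        global _current_target
        _current_target = wrapper  # remember who is making the LLM call
        try:
            return fn(*args, **kwargs)
        finally:
            _current_target = None
    return wrapper


def simulate(target, response):
    """Register a canned response for one specific linked function."""
    _simulations[target] = response


def intercepted_llm_call(prompt: str) -> str:
    """Stand-in for the interception layer: routes by the active target."""
    return _simulations.get(_current_target, "default response")


@link_llm
def summarize(text: str) -> str:
    return intercepted_llm_call(f"Summarize: {text}")


@link_llm
def classify(text: str) -> str:
    return intercepted_llm_call(f"Classify: {text}")


simulate(summarize, "A short summary.")
simulate(classify, "positive")

print(summarize("long doc"))  # -> A short summary.
print(classify("great!"))     # -> positive
```

With two linked functions, each simulation reaches only its intended caller, which is the problem `target=` solves when an agent makes several different kinds of LLM calls.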
## Customer support example

A customer support agent with direct OpenAI SDK calls.

```python
"""Customer Support: Testing knowledge base retrieval with custom OpenAI agents."""

from __future__ import annotations

from examples.myapp import CustomerSupportAgent, generate_response, search_knowledge_base
from tenro import Provider
from tenro.simulate import llm, tool
from tenro.testing import tenro


@tenro
def test_customer_support_answers_question() -> None:
    """Test customer support agent uses knowledge base and LLM."""
    tool.simulate(
        search_knowledge_base,
        result=[{"title": "Refund Policy", "content": "Full refunds within 30 days."}],
    )
    llm.simulate(
        Provider.OPENAI,
        target=generate_response,
        response="You can get a full refund within 30 days of purchase.",
    )

    CustomerSupportAgent().run("How do I get a refund?")

    tool.verify_many(search_knowledge_base, count=1)
    llm.verify(Provider.OPENAI)
    llm.verify(output_contains="refund")
```
## RAG pipeline example

A retrieval-augmented generation pipeline with direct LLM calls.

```python
"""RAG Pipeline: Testing document retrieval with custom OpenAI agents."""

from __future__ import annotations

from examples.myapp import RAGPipeline, fetch_documents, generate_response
from tenro import Provider
from tenro.simulate import llm, tool
from tenro.testing import tenro


@tenro
def test_rag_pipeline_synthesizes_answer() -> None:
    """Test RAG pipeline fetches documents and generates answer."""
    tool.simulate(
        fetch_documents,
        result=[
            {"id": "doc1", "text": "Machine learning uses algorithms to learn."},
            {"id": "doc2", "text": "Deep learning is a subset of ML."},
        ],
    )
    llm.simulate(
        Provider.OPENAI,
        target=generate_response,
        response="Machine learning is a field where algorithms learn patterns from data.",
    )

    RAGPipeline().run("What is machine learning?", "AI")

    tool.verify_many(fetch_documents, count=1)
    llm.verify(Provider.OPENAI)
    llm.verify(output_contains="Machine learning")
```
## Multi-turn conversation example

An agent handling multi-turn conversations.

```python
"""Multi-Turn Conversation: Testing sequential LLM calls with custom OpenAI agents."""

from __future__ import annotations

from examples.myapp import ConversationAgent
from tenro import Provider
from tenro.simulate import llm
from tenro.testing import tenro


@tenro
def test_multi_turn_conversation() -> None:
    """Test agent handles multi-turn conversation with context."""
    llm.simulate(
        Provider.OPENAI,
        responses=[
            "A list in Python is created with square brackets: my_list = [1, 2, 3]",
            "To add items, use append(): my_list.append(4)",
        ],
    )

    responses = ConversationAgent().run(
        ["How do I create a list in Python?", "How do I add items to it?"]
    )

    assert len(responses) == 2
    llm.verify_many(Provider.OPENAI, count=2)
    llm.verify(output_contains="list", call_index=0)
    llm.verify(output_contains="append", call_index=1)
```
## More examples

### Research assistant

```python
"""Basic example: Research assistant agent.

Tests an agent that searches the web and summarizes findings.
"""

from __future__ import annotations

from tenro import Provider, link_agent, link_llm, link_tool
from tenro.simulate import llm, tool
from tenro.testing import tenro

# APPLICATION CODE


@link_tool("web_search")
def web_search(query: str, num_results: int = 5) -> list[dict]:
    """Search the web for information."""
    return [{"title": "Result 1", "snippet": "...", "url": "https://..."}]


@link_llm(Provider.OPENAI)
def synthesize_findings(query: str, search_results: list[dict]) -> str:
    """Synthesize search results into a coherent summary."""
    import openai

    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "user",
                "content": f"Query: {query}\nResults: {search_results}\n\nSynthesize.",
            }
        ],
    )
    return response.choices[0].message.content


@link_agent("ResearchAssistantAgent")
class ResearchAssistantAgent:
    """Agent that researches a topic and provides a summary with sources."""

    def run(self, question: str) -> dict:
        """Run the research assistant agent."""
        results = web_search(question, num_results=5)
        summary = synthesize_findings(question, results)
        return {
            "summary": summary,
            "sources": [r["url"] for r in results],
        }


# TESTS


@tenro
def test_research_agent_finds_and_summarizes():
    """Test that agent searches and synthesizes results."""
    # Control what tools and LLMs return
    tool.simulate(
        web_search,
        result=[
            {
                "title": "AI Trends 2025",
                "snippet": "AI agents are...",
                "url": "https://example.com/1",
            },
            {"title": "Future of AI", "snippet": "Agentic AI...", "url": "https://example.com/2"},
        ],
    )
    llm.simulate(
        Provider.OPENAI,
        response="AI agents are becoming the dominant paradigm in 2025.",
    )

    # Run the agent
    ResearchAssistantAgent().run("What are the AI trends in 2025?")

    # Verify behavior
    tool.verify_many(web_search, count=1)
    llm.verify_many(Provider.OPENAI, at_least=1)


@tenro
def test_research_agent_handles_no_results():
    """Test agent behavior when search returns nothing."""
    # Simulate empty search results
    tool.simulate(web_search, result=[])
    llm.simulate(
        Provider.OPENAI,
        response="I couldn't find relevant information on this topic.",
    )

    # Run the agent
    ResearchAssistantAgent().run("Very obscure topic")

    # Verify graceful fallback
    tool.verify_many(web_search, count=1)
    llm.verify(output_contains="couldn't find")
```
### Code review agent

```python
"""Intermediate example: Code review agent.

Tests an agent that reviews pull requests and suggests improvements.
"""

from __future__ import annotations

from tenro import Provider, link_agent, link_llm, link_tool
from tenro.simulate import llm, tool
from tenro.testing import tenro

# APPLICATION CODE


@link_tool("fetch_pr_diff")
def fetch_pr_diff(pr_url: str) -> str:
    """Fetch the diff from a pull request."""
    return "diff --git a/main.py..."


@link_tool("post_review_comment")
def post_review_comment(pr_url: str, comment: str) -> bool:
    """Post a review comment on the PR."""
    return True


@link_llm(Provider.OPENAI)
def analyze_code(diff: str) -> dict:
    """Analyze code changes for issues and improvements."""
    import openai

    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"Review this code diff:\n{diff}"}],
    )
    return {"review": response.choices[0].message.content}


@link_agent("CodeReviewAgent")
class CodeReviewAgent:
    """Agent that reviews PRs and posts feedback."""

    def run(self, pr_url: str) -> dict:
        """Run the code review agent."""
        diff = fetch_pr_diff(pr_url)
        analysis = analyze_code(diff)
        post_review_comment(pr_url, analysis["review"])
        return analysis


# TESTS


@tenro
def test_code_review_agent_finds_security_issue():
    """Test that agent identifies code issues."""
    # Control what tools and LLMs return
    tool.simulate(
        fetch_pr_diff,
        result="+ def process(data):\n+ eval(data) # dangerous!",
    )
    tool.simulate(post_review_comment, result=True)

    # Simulate at HTTP level - response becomes the message content
    llm.simulate(
        Provider.OPENAI,
        response="Security issue: Using eval() is dangerous. Use ast.literal_eval().",
    )

    # Run the agent
    CodeReviewAgent().run("https://github.com/org/repo/pull/123")

    # Verify issue was detected
    tool.verify_many(fetch_pr_diff, count=1)
    tool.verify_many(post_review_comment, count=1)
    llm.verify(output_contains="Security issue")


@tenro
def test_code_review_agent_approves_clean_code():
    """Test agent with well-formatted code."""
    # Simulate well-formatted PR
    tool.simulate(
        fetch_pr_diff,
        result="+ def add(a: int, b: int) -> int:\n+ return a + b",
    )
    tool.simulate(post_review_comment, result=True)
    llm.simulate(
        Provider.OPENAI,
        response="LGTM! Good implementation with type hints.",
    )

    # Run the agent
    CodeReviewAgent().run("https://github.com/org/repo/pull/456")

    # Verify approval given
    tool.verify_many(fetch_pr_diff, count=1)
    llm.verify(output_contains="LGTM")
```
## Key patterns

### Agentic loop (LLM calls tool)

When the LLM decides to call a tool, then responds with the result:

```python
from tenro import Provider, ToolCall
from tenro.simulate import llm, tool

# Assuming web_search is defined with @link_tool("web_search")

# 1. Set up tool result (use function reference)
tool.simulate(web_search, result={"title": "AI Trends", "content": "..."})

# 2. Set up LLM responses: first triggers tool, second is final response
llm.simulate(Provider.OPENAI, responses=[
    ToolCall(web_search, query="AI trends 2025"),
    "Based on my research, AI trends in 2025 include...",
])
```
### Using `@link_llm` for tracing

```python
from tenro import link_llm, Provider


@link_llm(Provider.OPENAI)
def call_gpt(prompt: str) -> str:
    """Direct LLM call - you control this."""
    import openai

    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```
### Targeted simulation

```python
from tenro import Provider
from tenro.simulate import llm

# Route simulation to a specific @link_llm function
llm.simulate(
    Provider.OPENAI,
    target=call_gpt,
    response="Expected answer",
)
```
### When to use `@link_llm`
| Scenario | Use @link_llm? |
|---|---|
| Single LLM function, httpx SDK | Optional (HTTP interception works) |
| Multiple LLM functions | Recommended (for targeted simulation) |
| Non-httpx SDK | Required (HTTP interception unavailable) |
| Complex agent with many LLM calls | Recommended (for tracing) |
## See also

- Testing patterns - Common testing recipes
- How Tenro works - HTTP interception explained
- API reference: linking - Decorator documentation