# LLM Application Patterns

**Problem:** You need proven architectural patterns and tracing strategies for building complex LLM applications like agents, RAG systems, and multi-step reasoning workflows.

**Solution:** Use these battle-tested LLM-specific patterns with HoneyHive tracing to build observable, maintainable, and debuggable AI systems.

This guide focuses on LLM-specific architectures and patterns, not generic software patterns.
## Agent Architecture Patterns

### Pattern 1: ReAct (Reasoning + Acting)

**Use Case:** Agents that alternate between reasoning about the problem and taking actions with tools.

**Architecture:**
```mermaid
graph TD
    A[User Query] --> B[Reasoning Step]
    B --> C{Need Tool?}
    C -->|Yes| D[Tool Call]
    C -->|No| E[Final Answer]
    D --> F[Observe Result]
    F --> B
    E --> G[Response]
```
**Implementation with Tracing:**

```python
from honeyhive import HoneyHiveTracer, trace, enrich_span
from honeyhive.models import EventType
import openai

tracer = HoneyHiveTracer.init(project="react-agent")

@trace(tracer=tracer, event_type=EventType.chain)
def react_agent(query: str, max_steps: int = 5) -> str:
    """ReAct agent with reasoning and acting."""
    enrich_span({
        "agent.type": "react",
        "agent.query": query,
        "agent.max_steps": max_steps
    })

    conversation_history = []
    for step in range(max_steps):
        # Reasoning step
        thought = reason_about_problem(query, conversation_history, step)

        if thought["action"] == "final_answer":
            enrich_span({"agent.steps_used": step + 1})
            return thought["answer"]

        # Acting step
        observation = execute_tool(thought["tool"], thought["input"])
        conversation_history.append({
            "step": step,
            "thought": thought,
            "observation": observation
        })

    return "Max steps reached"

@trace(tracer=tracer, event_type=EventType.model)
def reason_about_problem(query: str, history: list, step: int) -> dict:
    """Reasoning step using LLM."""
    enrich_span({"reasoning.step": step, "reasoning.history_length": len(history)})

    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Think step by step. Decide action: use tool or give final answer."},
            {"role": "user", "content": f"Query: {query}\nHistory: {history}"}
        ]
    )

    # Parse response into thought/action/input
    return parse_reasoning(response.choices[0].message.content)
```
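The `parse_reasoning` helper is left abstract above. A minimal sketch, assuming the model was prompted to reply in a `Action`/`Input`/`Final Answer` line format (that format, and the fallback values, are assumptions):

```python
import re

def parse_reasoning(text: str) -> dict:
    """Parse the model's reply into the dict shape react_agent expects (format is an assumption)."""
    answer_match = re.search(r"Final Answer:\s*(.+)", text, re.DOTALL)
    if answer_match:
        return {"action": "final_answer", "answer": answer_match.group(1).strip()}

    action_match = re.search(r"Action:\s*(.+)", text)
    input_match = re.search(r"Input:\s*(.+)", text)
    return {
        "action": "tool_call",
        "tool": action_match.group(1).strip() if action_match else "unknown",
        "input": input_match.group(1).strip() if input_match else text,
    }
```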
**Trace Hierarchy:**

```
Session: react_agent
├── Model: reason_about_problem (step 1)
├── Tool: execute_tool (step 1)
├── Model: reason_about_problem (step 2)
├── Tool: execute_tool (step 2)
└── Model: reason_about_problem (final)
```
**Tradeoffs:**

- ✅ **Pros:** Flexible, handles dynamic situations, transparent reasoning
- ❌ **Cons:** Higher token cost (multiple LLM calls), slower than pre-planned approaches
- 💡 **When to Use:** Open-ended problems, unpredictable tool needs, exploratory tasks
- 🚫 **When to Avoid:** Latency-sensitive paths, tight token budgets, predictable workflows
### Pattern 2: Plan-and-Execute

**Use Case:** Complex queries requiring upfront planning before execution.

**Implementation:**
```python
@trace(tracer=tracer, event_type=EventType.chain)
def plan_and_execute_agent(query: str) -> str:
    """Agent that plans first, then executes."""
    enrich_span({"agent.type": "plan_and_execute", "agent.query": query})

    # Phase 1: Planning
    plan = create_execution_plan(query)
    enrich_span({"agent.plan_steps": len(plan["steps"])})

    # Phase 2: Execution
    results = []
    for i, step in enumerate(plan["steps"]):
        result = execute_step(step, results)
        results.append(result)
        enrich_span({f"agent.step_{i}_status": "complete"})

    # Phase 3: Synthesis
    final_answer = synthesize_results(query, results)
    return final_answer

@trace(tracer=tracer, event_type=EventType.model)
def create_execution_plan(query: str) -> dict:
    """Create step-by-step execution plan."""
    enrich_span({"planning.query_complexity": estimate_complexity(query)})

    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Create a step-by-step plan for: {query}"
        }]
    )

    plan = parse_plan(response.choices[0].message.content)
    enrich_span({"planning.steps_generated": len(plan["steps"])})
    return plan
```
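`parse_plan` is not shown above. A minimal sketch, assuming the model returns a numbered list with one step per line (the list format is an assumption):

```python
def parse_plan(text: str) -> dict:
    """Turn a numbered-list plan ("1. ...", "2. ...") into the dict create_execution_plan returns."""
    steps = []
    for line in text.splitlines():
        line = line.strip()
        # Keep lines that start with a step number, e.g. "3. Summarize findings"
        if line and line[0].isdigit() and "." in line:
            steps.append(line.split(".", 1)[1].strip())
    return {"steps": steps}
```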
**Tradeoffs:**

- ✅ **Pros:** Better for complex tasks, clear execution path, easier to debug
- ❌ **Cons:** Less flexible, planning overhead, struggles with dynamic environments
- 💡 **When to Use:** Multi-step tasks, parallel execution needs, known problem space
- 🚫 **When to Avoid:** Rapidly changing conditions, simple single-step tasks
### Pattern 3: Reflexion (Self-Reflection)

**Use Case:** Agents that critique and improve their own outputs.

**Implementation:**
```python
@trace(tracer=tracer, event_type=EventType.chain)
def reflexion_agent(query: str, max_iterations: int = 3) -> str:
    """Agent that reflects on and improves its output."""
    enrich_span({
        "agent.type": "reflexion",
        "agent.max_iterations": max_iterations
    })

    current_answer = generate_initial_answer(query)

    for iteration in range(max_iterations):
        critique = self_critique(query, current_answer)

        if critique["quality_score"] >= 0.9:
            enrich_span({"agent.converged_at_iteration": iteration})
            break

        current_answer = improve_answer(query, current_answer, critique)

    return current_answer

@trace(tracer=tracer, event_type=EventType.model)
def self_critique(query: str, answer: str) -> dict:
    """Self-critique the current answer."""
    enrich_span({"critique.answer_length": len(answer)})

    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Critique this answer to '{query}': {answer}\nScore 0-1 for quality."
        }]
    )

    critique = parse_critique(response.choices[0].message.content)
    enrich_span({"critique.quality_score": critique["quality_score"]})
    return critique
```
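`parse_critique` depends on the critique prompt's output format. A minimal sketch that assumes the model includes a line like `Score: 0.8` (the convention is an assumption):

```python
import re

def parse_critique(text: str) -> dict:
    """Extract a 0-1 quality score from the critique (the "Score:" convention is an assumption)."""
    match = re.search(r"Score:\s*([01](?:\.\d+)?)", text, re.IGNORECASE)
    score = float(match.group(1)) if match else 0.0  # treat unparseable critiques as failing
    return {"quality_score": score, "feedback": text}
```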
**Tradeoffs:**

- ✅ **Pros:** Higher quality outputs, self-correction, learns from mistakes
- ❌ **Cons:** Expensive (multiple critique cycles), slow convergence possible
- 💡 **When to Use:** Quality-critical tasks, creative work, complex reasoning
- 🚫 **When to Avoid:** Real-time applications, simple factual queries, tight budgets
### Pattern 4: Multi-Agent Collaboration

**Use Case:** Multiple specialized agents working together.

**Implementation:**
```python
@trace(tracer=tracer, event_type=EventType.chain)
def multi_agent_system(task: str) -> str:
    """System with multiple specialized agents."""
    enrich_span({"system.type": "multi_agent", "system.task": task})

    # Agent 1: Research specialist
    research = research_agent(task)

    # Agent 2: Analysis specialist
    analysis = analysis_agent(research)

    # Agent 3: Synthesis specialist
    final_output = synthesis_agent(task, research, analysis)

    enrich_span({"system.agents_used": 3})
    return final_output

@trace(tracer=tracer, event_type=EventType.model)
def research_agent(task: str) -> dict:
    """Specialized research agent."""
    enrich_span({"agent.role": "researcher", "agent.specialty": "information_gathering"})
    # Research logic...
    return {"findings": [...]}
```
**Tradeoffs:**

- ✅ **Pros:** Specialized expertise, parallel execution, diverse perspectives
- ❌ **Cons:** Complex coordination, high resource usage, potential conflicts
- 💡 **When to Use:** Multi-domain problems, need for specialization, parallel work
- 🚫 **When to Avoid:** Simple tasks, tight latency requirements, limited resources
### Pattern 5: Tool-Using Agents

**Use Case:** Agents that can discover and use external tools dynamically.

**Implementation:**
```python
@trace(tracer=tracer, event_type=EventType.chain)
def tool_using_agent(query: str, available_tools: list) -> str:
    """Agent that selects and uses appropriate tools."""
    enrich_span({
        "agent.type": "tool_user",
        "agent.available_tools": len(available_tools),
        "agent.tool_names": [t.name for t in available_tools]
    })

    # Select the appropriate tool
    selected_tool = select_tool(query, available_tools)
    enrich_span({"agent.selected_tool": selected_tool.name})

    # Use the tool
    result = execute_tool_with_llm(query, selected_tool)
    return result

@trace(tracer=tracer, event_type=EventType.model)
def select_tool(query: str, tools: list) -> object:
    """LLM selects the best tool for the query."""
    tool_descriptions = "\n".join([f"- {t.name}: {t.description}" for t in tools])
    enrich_span({"tool_selection.options": len(tools)})

    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Select best tool for: {query}\n\nTools:\n{tool_descriptions}"
        }]
    )

    selected = parse_tool_selection(response.choices[0].message.content, tools)
    enrich_span({"tool_selection.chosen": selected.name})
    return selected
```
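`execute_tool_with_llm` is left abstract above. A minimal sketch, assuming each tool object exposes a `run(input)` method alongside `name` and `description` (the `run` interface is an assumption):

```python
@trace(tracer=tracer, event_type=EventType.tool)
def execute_tool_with_llm(query: str, tool: object) -> str:
    """Have the LLM phrase the tool input, then invoke the tool."""
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Rewrite this query as input for the tool '{tool.name}' ({tool.description}):\n{query}"
        }]
    )
    tool_input = response.choices[0].message.content
    enrich_span({"tool.generated_input": tool_input})
    return tool.run(tool_input)  # run() is an assumed tool interface
```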
### Pattern 6: Memory-Augmented Agents

**Use Case:** Agents that maintain and query long-term memory.

**Implementation:**
```python
@trace(tracer=tracer, event_type=EventType.chain)
def memory_augmented_agent(query: str, user_id: str) -> str:
    """Agent with long-term memory."""
    enrich_span({
        "agent.type": "memory_augmented",
        "agent.user_id": user_id
    })

    # Retrieve relevant memories
    relevant_memories = retrieve_memories(user_id, query)
    enrich_span({"agent.memories_retrieved": len(relevant_memories)})

    # Generate response with memory context
    response = generate_with_memory(query, relevant_memories)

    # Store new memory
    store_memory(user_id, query, response)

    return response

@trace(tracer=tracer, event_type=EventType.tool)
def retrieve_memories(user_id: str, query: str) -> list:
    """Retrieve relevant memories from vector store."""
    enrich_span({
        "memory.user_id": user_id,
        "memory.query_embedding": "generated"
    })

    # Vector similarity search
    memories = vector_store.search(user_id, query, top_k=5)

    enrich_span({"memory.results_found": len(memories)})
    return memories
```
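`store_memory` mirrors retrieval. A minimal sketch, assuming the same `vector_store` exposes an `add(user_id, text)` method (an assumed API):

```python
@trace(tracer=tracer, event_type=EventType.tool)
def store_memory(user_id: str, query: str, response: str) -> None:
    """Persist the exchange so future queries can retrieve it."""
    memory_text = f"User asked: {query}\nAgent answered: {response}"
    vector_store.add(user_id, memory_text)  # add() is an assumed vector-store API
    enrich_span({"memory.stored_chars": len(memory_text)})
```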
**Tradeoffs:**

- ✅ **Pros:** Personalization, context preservation, improves over time
- ❌ **Cons:** Privacy concerns, storage costs, retrieval accuracy challenges
- 💡 **When to Use:** Conversational agents, personalized systems, long-term interactions
- 🚫 **When to Avoid:** Stateless services, privacy-sensitive domains, simple one-shot tasks
## LLM Workflow Patterns

### Pattern 1: RAG (Retrieval-Augmented Generation)

**Implementation:**
```python
@trace(tracer=tracer, event_type=EventType.chain)
def rag_pipeline(query: str, knowledge_base: str) -> str:
    """RAG pipeline with full tracing."""
    enrich_span({
        "workflow.type": "rag",
        "workflow.query": query,
        "workflow.kb": knowledge_base
    })

    # Stage 1: Retrieval
    documents = retrieve_documents(query, knowledge_base)

    # Stage 2: Context building
    context = build_context(documents)

    # Stage 3: Generation
    response = generate_with_context(query, context)

    return response

@trace(tracer=tracer, event_type=EventType.tool)
def retrieve_documents(query: str, kb: str) -> list:
    """Retrieve relevant documents."""
    enrich_span({
        "retrieval.query_length": len(query),
        "retrieval.kb": kb
    })

    # Vector search
    docs = vector_search(query, kb, top_k=5)

    enrich_span({
        "retrieval.docs_found": len(docs),
        "retrieval.avg_relevance": calculate_avg_relevance(docs)
    })
    return docs
```
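`build_context` is not shown above. A minimal sketch, assuming each retrieved document is a dict with `text` and `score` fields (an assumed shape):

```python
@trace(tracer=tracer, event_type=EventType.tool)
def build_context(documents: list) -> str:
    """Join retrieved documents into one prompt context, highest relevance first."""
    ranked = sorted(documents, key=lambda d: d.get("score", 0.0), reverse=True)
    context = "\n\n---\n\n".join(d["text"] for d in ranked)
    enrich_span({"context.num_docs": len(ranked), "context.chars": len(context)})
    return context
```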
**Trace Hierarchy:**

```mermaid
graph TD
    A[RAG Pipeline] --> B[Retrieve Documents]
    A --> C[Build Context]
    A --> D[Generate with Context]
    B --> E[Vector Search]
    D --> F[LLM Generation]
```
**Tradeoffs:**

- ✅ **Pros:** Factual accuracy, up-to-date information, reduces hallucinations
- ❌ **Cons:** Retrieval quality dependency, increased latency, context window limits
- 💡 **When to Use:** Knowledge-intensive tasks, factual QA, domain-specific content
- 🚫 **When to Avoid:** Creative generation, general reasoning, low-latency needs
### Pattern 2: Chain-of-Thought

**Implementation:**
```python
@trace(tracer=tracer, event_type=EventType.model)
def chain_of_thought_reasoning(problem: str) -> str:
    """LLM uses chain-of-thought prompting."""
    enrich_span({
        "workflow.type": "chain_of_thought",
        "workflow.problem_complexity": estimate_complexity(problem)
    })

    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "system",
            "content": "Think step-by-step. Show your reasoning."
        }, {
            "role": "user",
            "content": problem
        }]
    )

    reasoning = response.choices[0].message.content
    steps = extract_reasoning_steps(reasoning)

    enrich_span({
        "workflow.reasoning_steps": len(steps),
        "workflow.tokens_used": len(reasoning.split())  # rough estimate: word count, not true tokens
    })
    return reasoning
```
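`extract_reasoning_steps` is usually a simple heuristic. A sketch that assumes the model numbers its steps or prefixes them with "Step" (both assumptions about the output format):

```python
def extract_reasoning_steps(reasoning: str) -> list:
    """Split chain-of-thought output into discrete steps (heuristic; format is an assumption)."""
    lines = [line.strip() for line in reasoning.splitlines() if line.strip()]
    steps = [l for l in lines if l[0].isdigit() or l.lower().startswith("step")]
    return steps or lines  # fall back to treating every line as a step
```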
### Pattern 3: Self-Correction Loops

**Implementation:**
```python
@trace(tracer=tracer, event_type=EventType.chain)
def self_correcting_generation(task: str) -> str:
    """Generate, validate, and correct output."""
    enrich_span({"workflow.type": "self_correction"})

    max_attempts = 3
    for attempt in range(max_attempts):
        output = generate_output(task)
        validation = validate_output(output, task)

        if validation["is_valid"]:
            enrich_span({"workflow.succeeded_at_attempt": attempt + 1})
            return output

        # Self-correct based on validation feedback
        task = f"{task}\n\nPrevious attempt had issues: {validation['issues']}"

    return output  # Return the last attempt if none validated
```
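`validate_output` can be rule-based or an LLM-as-judge call. A minimal judge-style sketch (the prompt wording and the `VALID` reply convention are assumptions):

```python
@trace(tracer=tracer, event_type=EventType.model)
def validate_output(output: str, task: str) -> dict:
    """LLM-as-judge validation (verdict format here is illustrative)."""
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                f"Task: {task}\n\nOutput: {output}\n\n"
                "Reply 'VALID' if the output fulfils the task, otherwise list the issues."
            )
        }]
    )
    verdict = response.choices[0].message.content
    is_valid = verdict.strip().upper().startswith("VALID")
    return {"is_valid": is_valid, "issues": "" if is_valid else verdict}
```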
### Pattern 4: Prompt Chaining

**Implementation:**
```python
@trace(tracer=tracer, event_type=EventType.chain)
def prompt_chain_workflow(input_text: str) -> str:
    """Chain multiple prompts for complex tasks."""
    enrich_span({
        "workflow.type": "prompt_chain",
        "workflow.input_length": len(input_text)
    })

    # Step 1: Extract key information
    key_info = extract_information(input_text)

    # Step 2: Analyze extracted info
    analysis = analyze_information(key_info)

    # Step 3: Generate final output
    final_output = generate_final_response(analysis)

    enrich_span({"workflow.chain_steps": 3})
    return final_output
```
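Each link in the chain is its own traced model call, so failures localize to a step. A sketch of the first link (the prompt wording is illustrative; `analyze_information` and `generate_final_response` follow the same shape):

```python
@trace(tracer=tracer, event_type=EventType.model)
def extract_information(input_text: str) -> str:
    """First link in the chain: distill the input to its key facts."""
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Extract the key facts from the following text as bullet points:\n\n{input_text}"
        }]
    )
    return response.choices[0].message.content
```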
### Pattern 5: Dynamic Few-Shot Learning

**Implementation:**
```python
@trace(tracer=tracer, event_type=EventType.model)
def dynamic_few_shot(query: str, example_pool: list) -> str:
    """Select relevant examples dynamically."""
    enrich_span({
        "workflow.type": "dynamic_few_shot",
        "workflow.example_pool_size": len(example_pool)
    })

    # Select most relevant examples
    selected_examples = select_relevant_examples(query, example_pool, k=3)
    enrich_span({"workflow.examples_selected": len(selected_examples)})

    # Build few-shot prompt
    prompt = build_few_shot_prompt(query, selected_examples)

    # Generate with examples
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )

    return response.choices[0].message.content
```
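`select_relevant_examples` and `build_few_shot_prompt` are left abstract above. A common selection strategy is embedding similarity; a sketch assuming each example is a dict with `input` and `output` fields (an assumed shape), using OpenAI embeddings and numpy:

```python
import numpy as np

def select_relevant_examples(query: str, example_pool: list, k: int = 3) -> list:
    """Rank examples by similarity between query and example-input embeddings."""
    client = openai.OpenAI()
    texts = [query] + [ex["input"] for ex in example_pool]
    result = client.embeddings.create(model="text-embedding-3-small", input=texts)
    vectors = [np.array(item.embedding) for item in result.data]
    query_vec, example_vecs = vectors[0], vectors[1:]
    # OpenAI embeddings are unit-normalized, so the dot product is cosine similarity
    sims = [float(v @ query_vec) for v in example_vecs]
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    return [example_pool[i] for i in top]

def build_few_shot_prompt(query: str, examples: list) -> str:
    """Assemble the prompt from (input, output) example pairs."""
    shots = "\n\n".join(f"Input: {ex['input']}\nOutput: {ex['output']}" for ex in examples)
    return f"{shots}\n\nInput: {query}\nOutput:"
```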
## Best Practices for LLM Applications

### Always Enrich with Agent Context

```python
enrich_span({
    "agent.type": "react",
    "agent.step": current_step,
    "agent.decision": "tool_call",
    "agent.confidence": 0.95
})
```
Track Workflow Performance
import time
start = time.time()
result = execute_workflow()
enrich_span({
"workflow.duration_ms": (time.time() - start) * 1000,
"workflow.steps_executed": step_count,
"workflow.cost_estimate": calculate_cost()
})
### Use Consistent Event Types

- `EventType.chain` - Multi-step workflows
- `EventType.model` - LLM calls
- `EventType.tool` - Tool/function executions
- `EventType.session` - Complete user sessions
### Implement Fallbacks with Tracing

```python
@trace(tracer=tracer, event_type=EventType.chain)
def resilient_agent(query: str) -> str:
    strategies = ["gpt-4", "gpt-3.5-turbo", "claude-3"]

    for i, model in enumerate(strategies):
        try:
            result = try_model(query, model)
            enrich_span({
                "resilience.succeeded_with": model,
                "resilience.attempts": i + 1
            })
            return result
        except Exception as e:
            enrich_span({f"resilience.attempt_{i}_failed": str(e)})
            continue

    raise Exception("All strategies failed")
```
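`try_model` is left abstract above. For brevity this sketch routes every model name through the OpenAI client; in practice `claude-3` would go through Anthropic's client, so treat the routing as an assumption:

```python
@trace(tracer=tracer, event_type=EventType.model)
def try_model(query: str, model: str) -> str:
    """Single attempt against one model; exceptions propagate to the fallback loop."""
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": query}],
        timeout=30,  # fail fast so the next strategy gets a chance
    )
    return response.choices[0].message.content
```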
## Next Steps

- **Production Deployment Guide** - Production deployment patterns
- **Span Enrichment Patterns** - Advanced enrichment patterns
- **Custom Span Management** - Custom span creation
- **Tutorials** - Complete LLM application tutorials
**Key Takeaway:** LLM applications require specialized architectural patterns. Use these proven agent and workflow patterns with comprehensive tracing to build observable, debuggable AI systems. ✨