LLM Application Patterns
========================

**Problem:** You need proven architectural patterns and tracing strategies for building complex LLM applications like agents, RAG systems, and multi-step reasoning workflows.

**Solution:** Use these battle-tested LLM-specific patterns with HoneyHive tracing to build observable, maintainable, and debuggable AI systems.

This guide focuses on LLM-specific architectures and patterns, not generic software patterns.

.. contents:: Quick Navigation
   :local:
   :depth: 2

Agent Architecture Patterns
---------------------------

Pattern 1: ReAct (Reasoning + Acting)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Use Case:** Agents that alternate between reasoning about the problem and taking actions with tools.

**Architecture:**

.. mermaid::

   graph TD
       A[User Query] --> B[Reasoning Step]
       B --> C{Need Tool?}
       C -->|Yes| D[Tool Call]
       C -->|No| E[Final Answer]
       D --> F[Observe Result]
       F --> B
       E --> G[Response]

**Implementation with Tracing:**

.. code-block:: python

   from honeyhive import HoneyHiveTracer, trace, enrich_span
   from honeyhive.models import EventType
   import openai
   
   tracer = HoneyHiveTracer.init(project="react-agent")
   
   @trace(tracer=tracer, event_type=EventType.chain)
   def react_agent(query: str, max_steps: int = 5) -> str:
       """ReAct agent with reasoning and acting."""
       enrich_span({
           "agent.type": "react",
           "agent.query": query,
           "agent.max_steps": max_steps
       })
       
       conversation_history = []
       
       for step in range(max_steps):
           # Reasoning step
           thought = reason_about_problem(query, conversation_history, step)
           
           if thought["action"] == "final_answer":
               enrich_span({"agent.steps_used": step + 1})
               return thought["answer"]
           
           # Acting step  
           observation = execute_tool(thought["tool"], thought["input"])
           conversation_history.append({
               "step": step,
               "thought": thought,
               "observation": observation
           })
       
       return "Max steps reached"
   
   @trace(tracer=tracer, event_type=EventType.model)
   def reason_about_problem(query: str, history: list, step: int) -> dict:
       """Reasoning step using LLM."""
       enrich_span({"reasoning.step": step, "reasoning.history_length": len(history)})
       
       client = openai.OpenAI()
       response = client.chat.completions.create(
           model="gpt-4",
           messages=[
               {"role": "system", "content": "Think step by step. Decide action: use tool or give final answer."},
               {"role": "user", "content": f"Query: {query}\nHistory: {history}"}
           ]
       )
       
       # Parse response into thought/action/input
       return parse_reasoning(response.choices[0].message.content)

**Trace Hierarchy:**

- Session: `react_agent`
  - Chain: `reason_about_problem` (step 1)
  - Tool: `execute_tool` (step 1)
  - Chain: `reason_about_problem` (step 2)
  - Tool: `execute_tool` (step 2)
  - Chain: `reason_about_problem` (final)

**Tradeoffs:**

- ✅ **Pros**: Flexible, handles dynamic situations, transparent reasoning
- ❌ **Cons**: Higher token cost (multiple LLM calls), slower than pre-planned approaches
- 💡 **When to Use**: Open-ended problems, unpredictable tool needs, exploratory tasks
- 🚫 **When to Avoid**: High-latency sensitivity, token budget constraints, predictable workflows

Pattern 2: Plan-and-Execute
^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Use Case:** Complex queries requiring upfront planning before execution.

**Implementation:**

.. code-block:: python

   @trace(tracer=tracer, event_type=EventType.chain)
   def plan_and_execute_agent(query: str) -> str:
       """Agent that plans first, then executes."""
       enrich_span({"agent.type": "plan_and_execute", "agent.query": query})
       
       # Phase 1: Planning
       plan = create_execution_plan(query)
       enrich_span({"agent.plan_steps": len(plan["steps"])})
       
       # Phase 2: Execution
       results = []
       for i, step in enumerate(plan["steps"]):
           result = execute_step(step, results)
           results.append(result)
           enrich_span({f"agent.step_{i}_status": "complete"})
       
       # Phase 3: Synthesis
       final_answer = synthesize_results(query, results)
       return final_answer
   
   @trace(tracer=tracer, event_type=EventType.model)
   def create_execution_plan(query: str) -> dict:
       """Create step-by-step execution plan."""
       enrich_span({"planning.query_complexity": estimate_complexity(query)})
       
       client = openai.OpenAI()
       response = client.chat.completions.create(
           model="gpt-4",
           messages=[{
               "role": "user",
               "content": f"Create a step-by-step plan for: {query}"
           }]
       )
       
       plan = parse_plan(response.choices[0].message.content)
       enrich_span({"planning.steps_generated": len(plan["steps"])})
       return plan

**Tradeoffs:**

- ✅ **Pros**: Better for complex tasks, clear execution path, easier to debug
- ❌ **Cons**: Less flexible, planning overhead, struggles with dynamic environments
- 💡 **When to Use**: Multi-step tasks, parallel execution needs, known problem space
- 🚫 **When to Avoid**: Rapidly changing conditions, simple single-step tasks

Pattern 3: Reflexion (Self-Reflection)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Use Case:** Agents that critique and improve their own outputs.

**Implementation:**

.. code-block:: python

   @trace(tracer=tracer, event_type=EventType.chain)
   def reflexion_agent(query: str, max_iterations: int = 3) -> str:
       """Agent that reflects on and improves its output."""
       enrich_span({
           "agent.type": "reflexion",
           "agent.max_iterations": max_iterations
       })
       
       current_answer = generate_initial_answer(query)
       
       for iteration in range(max_iterations):
           critique = self_critique(query, current_answer)
           
           if critique["quality_score"] >= 0.9:
               enrich_span({"agent.converged_at_iteration": iteration})
               break
           
           current_answer = improve_answer(query, current_answer, critique)
       
       return current_answer
   
   @trace(tracer=tracer, event_type=EventType.model)
   def self_critique(query: str, answer: str) -> dict:
       """Self-critique the current answer."""
       enrich_span({"critique.answer_length": len(answer)})
       
       client = openai.OpenAI()
       response = client.chat.completions.create(
           model="gpt-4",
           messages=[{
               "role": "user",
               "content": f"Critique this answer to '{query}': {answer}\nScore 0-1 for quality."
           }]
       )
       
       critique = parse_critique(response.choices[0].message.content)
       enrich_span({"critique.quality_score": critique["quality_score"]})
       return critique

**Tradeoffs:**

- ✅ **Pros**: Higher quality outputs, self-correction, learns from mistakes
- ❌ **Cons**: Expensive (multiple critique cycles), slow convergence possible
- 💡 **When to Use**: Quality-critical tasks, creative work, complex reasoning
- 🚫 **When to Avoid**: Real-time applications, simple factual queries, tight budgets

Pattern 4: Multi-Agent Collaboration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Use Case:** Multiple specialized agents working together.

**Implementation:**

.. code-block:: python

   @trace(tracer=tracer, event_type=EventType.chain)
   def multi_agent_system(task: str) -> str:
       """System with multiple specialized agents."""
       enrich_span({"system.type": "multi_agent", "system.task": task})
       
       # Agent 1: Research specialist
       research = research_agent(task)
       
       # Agent 2: Analysis specialist
       analysis = analysis_agent(research)
       
       # Agent 3: Synthesis specialist
       final_output = synthesis_agent(task, research, analysis)
       
       enrich_span({"system.agents_used": 3})
       return final_output
   
   @trace(tracer=tracer, event_type=EventType.model)
   def research_agent(task: str) -> dict:
       """Specialized research agent."""
       enrich_span({"agent.role": "researcher", "agent.specialty": "information_gathering"})
       # Research logic...
       return {"findings": [...]}

**Tradeoffs:**

- ✅ **Pros**: Specialized expertise, parallel execution, diverse perspectives
- ❌ **Cons**: Complex coordination, high resource usage, potential conflicts
- 💡 **When to Use**: Multi-domain problems, need for specialization, parallel work
- 🚫 **When to Avoid**: Simple tasks, tight latency requirements, limited resources

Pattern 5: Tool-Using Agents
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Use Case:** Agents that can discover and use external tools dynamically.

**Implementation:**

.. code-block:: python

   @trace(tracer=tracer, event_type=EventType.chain)
   def tool_using_agent(query: str, available_tools: list) -> str:
       """Agent that selects and uses appropriate tools."""
       enrich_span({
           "agent.type": "tool_user",
           "agent.available_tools": len(available_tools),
           "agent.tool_names": [t.name for t in available_tools]
       })
       
       # Select appropriate tool
       selected_tool = select_tool(query, available_tools)
       enrich_span({"agent.selected_tool": selected_tool.name})
       
       # Use the tool
       result = execute_tool_with_llm(query, selected_tool)
       
       return result
   
   @trace(tracer=tracer, event_type=EventType.model)
   def select_tool(query: str, tools: list) -> object:
       """LLM selects the best tool for the query."""
       tool_descriptions = "\n".join([f"- {t.name}: {t.description}" for t in tools])
       
       enrich_span({"tool_selection.options": len(tools)})
       
       client = openai.OpenAI()
       response = client.chat.completions.create(
           model="gpt-4",
           messages=[{
               "role": "user",
               "content": f"Select best tool for: {query}\n\nTools:\n{tool_descriptions}"
           }]
       )
       
       selected = parse_tool_selection(response.choices[0].message.content, tools)
       enrich_span({"tool_selection.chosen": selected.name})
       return selected

Pattern 6: Memory-Augmented Agents
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Use Case:** Agents that maintain and query long-term memory.

**Implementation:**

.. code-block:: python

   @trace(tracer=tracer, event_type=EventType.chain)
   def memory_augmented_agent(query: str, user_id: str) -> str:
       """Agent with long-term memory."""
       enrich_span({
           "agent.type": "memory_augmented",
           "agent.user_id": user_id
       })
       
       # Retrieve relevant memories
       relevant_memories = retrieve_memories(user_id, query)
       enrich_span({"agent.memories_retrieved": len(relevant_memories)})
       
       # Generate response with memory context
       response = generate_with_memory(query, relevant_memories)
       
       # Store new memory
       store_memory(user_id, query, response)
       
       return response
   
   @trace(tracer=tracer, event_type=EventType.tool)
   def retrieve_memories(user_id: str, query: str) -> list:
       """Retrieve relevant memories from vector store."""
       enrich_span({
           "memory.user_id": user_id,
           "memory.query_embedding": "generated"
       })
       
       # Vector similarity search
       memories = vector_store.search(user_id, query, top_k=5)
       
       enrich_span({"memory.results_found": len(memories)})
       return memories

**Tradeoffs:**

- ✅ **Pros**: Personalization, context preservation, improves over time
- ❌ **Cons**: Privacy concerns, storage costs, retrieval accuracy challenges
- 💡 **When to Use**: Conversational agents, personalized systems, long-term interactions
- 🚫 **When to Avoid**: Stateless services, privacy-sensitive domains, simple one-shot tasks

LLM Workflow Patterns
---------------------

Pattern 1: RAG (Retrieval-Augmented Generation)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Implementation:**

.. code-block:: python

   @trace(tracer=tracer, event_type=EventType.chain)
   def rag_pipeline(query: str, knowledge_base: str) -> str:
       """RAG pipeline with full tracing."""
       enrich_span({
           "workflow.type": "rag",
           "workflow.query": query,
           "workflow.kb": knowledge_base
       })
       
       # Stage 1: Retrieval
       documents = retrieve_documents(query, knowledge_base)
       
       # Stage 2: Context building
       context = build_context(documents)
       
       # Stage 3: Generation
       response = generate_with_context(query, context)
       
       return response
   
   @trace(tracer=tracer, event_type=EventType.tool)
   def retrieve_documents(query: str, kb: str) -> list:
       """Retrieve relevant documents."""
       enrich_span({
           "retrieval.query_length": len(query),
           "retrieval.kb": kb
       })
       
       # Vector search
       docs = vector_search(query, kb, top_k=5)
       
       enrich_span({
           "retrieval.docs_found": len(docs),
           "retrieval.avg_relevance": calculate_avg_relevance(docs)
       })
       
       return docs

**Trace Hierarchy:**

.. mermaid::

   graph TD
       A[RAG Pipeline] --> B[Retrieve Documents]
       A --> C[Build Context]
       A --> D[Generate with Context]
       B --> E[Vector Search]
       D --> F[LLM Generation]

**Tradeoffs:**

- ✅ **Pros**: Factual accuracy, up-to-date information, reduces hallucinations
- ❌ **Cons**: Retrieval quality dependency, increased latency, context window limits
- 💡 **When to Use**: Knowledge-intensive tasks, factual QA, domain-specific content
- 🚫 **When to Avoid**: Creative generation, general reasoning, low-latency needs

Pattern 2: Chain-of-Thought
^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Implementation:**

.. code-block:: python

   @trace(tracer=tracer, event_type=EventType.model)
   def chain_of_thought_reasoning(problem: str) -> str:
       """LLM uses chain-of-thought prompting."""
       enrich_span({
           "workflow.type": "chain_of_thought",
           "workflow.problem_complexity": estimate_complexity(problem)
       })
       
       client = openai.OpenAI()
       response = client.chat.completions.create(
           model="gpt-4",
           messages=[{
               "role": "system",
               "content": "Think step-by-step. Show your reasoning."
           }, {
               "role": "user",
               "content": problem
           }]
       )
       
       reasoning = response.choices[0].message.content
       steps = extract_reasoning_steps(reasoning)
       
       enrich_span({
           "workflow.reasoning_steps": len(steps),
           "workflow.tokens_used": len(reasoning.split())
       })
       
       return reasoning

Pattern 3: Self-Correction Loops
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Implementation:**

.. code-block:: python

   @trace(tracer=tracer, event_type=EventType.chain)
   def self_correcting_generation(task: str) -> str:
       """Generate, validate, and correct output."""
       enrich_span({"workflow.type": "self_correction"})
       
       max_attempts = 3
       for attempt in range(max_attempts):
           output = generate_output(task)
           validation = validate_output(output, task)
           
           if validation["is_valid"]:
               enrich_span({"workflow.succeeded_at_attempt": attempt + 1})
               return output
           
           # Self-correct based on validation feedback
           task = f"{task}\n\nPrevious attempt had issues: {validation['issues']}"
       
       return output  # Return best attempt

Pattern 4: Prompt Chaining
^^^^^^^^^^^^^^^^^^^^^^^^^^

**Implementation:**

.. code-block:: python

   @trace(tracer=tracer, event_type=EventType.chain)
   def prompt_chain_workflow(input_text: str) -> str:
       """Chain multiple prompts for complex tasks."""
       enrich_span({
           "workflow.type": "prompt_chain",
           "workflow.input_length": len(input_text)
       })
       
       # Step 1: Extract key information
       key_info = extract_information(input_text)
       
       # Step 2: Analyze extracted info
       analysis = analyze_information(key_info)
       
       # Step 3: Generate final output
       final_output = generate_final_response(analysis)
       
       enrich_span({"workflow.chain_steps": 3})
       return final_output

Pattern 5: Dynamic Few-Shot Learning
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Implementation:**

.. code-block:: python

   @trace(tracer=tracer, event_type=EventType.model)
   def dynamic_few_shot(query: str, example_pool: list) -> str:
       """Select relevant examples dynamically."""
       enrich_span({
           "workflow.type": "dynamic_few_shot",
           "workflow.example_pool_size": len(example_pool)
       })
       
       # Select most relevant examples
       selected_examples = select_relevant_examples(query, example_pool, k=3)
       enrich_span({"workflow.examples_selected": len(selected_examples)})
       
       # Build few-shot prompt
       prompt = build_few_shot_prompt(query, selected_examples)
       
       # Generate with examples
       client = openai.OpenAI()
       response = client.chat.completions.create(
           model="gpt-4",
           messages=[{"role": "user", "content": prompt}]
       )
       
       return response.choices[0].message.content

Best Practices for LLM Applications
-----------------------------------

1. **Always Enrich with Agent Context**

.. code-block:: python

   enrich_span({
       "agent.type": "react",
       "agent.step": current_step,
       "agent.decision": "tool_call",
       "agent.confidence": 0.95
   })

2. **Track Workflow Performance**

.. code-block:: python

   import time
   
   start = time.time()
   result = execute_workflow()
   
   enrich_span({
       "workflow.duration_ms": (time.time() - start) * 1000,
       "workflow.steps_executed": step_count,
       "workflow.cost_estimate": calculate_cost()
   })

3. **Use Consistent Event Types**

- `EventType.chain` - Multi-step workflows
- `EventType.model` - LLM calls
- `EventType.tool` - Tool/function executions
- `EventType.session` - Complete user sessions

4. **Implement Fallbacks with Tracing**

.. code-block:: python

   @trace(tracer=tracer, event_type=EventType.chain)
   def resilient_agent(query: str) -> str:
       strategies = ["gpt-4", "gpt-3.5-turbo", "claude-3"]
       
       for i, model in enumerate(strategies):
           try:
               result = try_model(query, model)
               enrich_span({
                   "resilience.succeeded_with": model,
                   "resilience.attempts": i + 1
               })
               return result
           except Exception as e:
               enrich_span({f"resilience.attempt_{i}_failed": str(e)})
               continue
       
       raise Exception("All strategies failed")

Next Steps
----------

- :doc:`/how-to/deployment/production` - Production deployment patterns
- :doc:`/how-to/advanced-tracing/span-enrichment` - Advanced enrichment patterns
- :doc:`/how-to/advanced-tracing/custom-spans` - Custom span creation
- :doc:`/tutorials/index` - Complete LLM application tutorials

**Key Takeaway:** LLM applications require specialized architectural patterns. Use these proven agent and workflow patterns with comprehensive tracing to build observable, debuggable AI systems. ✨