Tracing Fundamentals
====================
.. note::
This document explains the fundamental concepts of distributed tracing and how they apply to LLM applications.
.. seealso::
**HoneyHive Tracer Architecture**
For a deep dive into how the HoneyHive SDK implements these concepts with a modular, mixin-based architecture, see :doc:`/reference/api/tracer-architecture`.
What is Distributed Tracing?
----------------------------
Distributed tracing is a method for tracking requests as they flow through complex systems. It provides:
- **End-to-end visibility** into request execution
- **Performance insights** at each step
- **Error correlation** across system boundaries
- **Context propagation** between services
**Traditional Web Application Tracing:**
.. code-block:: text
User Request → Load Balancer → Web Server → Database → Response
[-------------- Single Trace --------------]
**LLM Application Tracing:**
.. code-block:: text
User Query → Preprocessing → LLM Call → Post-processing → Response
[-------------- Enhanced with AI Context --------------]
Core Tracing Concepts
---------------------
**Traces**
A trace represents a complete request journey:
.. code-block:: text
# Example trace hierarchy
customer_support_request # Root span
├── validate_input # Child span
├── classify_query # Child span
├── llm_completion # Child span
│ ├── prompt_preparation
│ └── api_call
└── format_response # Child span
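
In code this hierarchy falls out of nesting: any span started while another span is active becomes its child. Below is a minimal sketch using the ``@trace`` decorator and ``tracer.trace`` context manager shown later in this document; the helper functions (``validate_input``, ``classify_query``, ``build_prompt``, ``llm_call``, ``format_response``) are illustrative and assumed to be decorated with ``@trace`` themselves.

.. code-block:: python

   @trace(tracer=tracer)
   def customer_support_request(query: str) -> str:  # Root span
       validated = validate_input(query)              # Child span
       intent = classify_query(validated)             # Child span
       answer = llm_completion(intent)                # Child span with children
       return format_response(answer)                 # Child span

   @trace(tracer=tracer)
   def llm_completion(intent: str) -> str:
       # Spans opened while this span is active become its children
       with tracer.trace("prompt_preparation"):
           prompt = build_prompt(intent)
       with tracer.trace("api_call"):
           return llm_call(prompt)
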
**Spans**
Individual operations within a trace:
.. code-block:: python
# Each span contains:
{
"span_id": "abc123",
"trace_id": "xyz789",
"parent_id": "parent456",
"operation_name": "llm_completion",
"start_time": "2024-01-15T10:30:00Z",
"end_time": "2024-01-15T10:30:02Z",
"duration": 2000, # milliseconds
"attributes": {
"llm.model": "gpt-4",
"llm.tokens.input": 45,
"llm.tokens.output": 67
},
"status": "ok"
}
**Attributes**
Key-value metadata attached to spans:
.. code-block:: python

   # Example span attributes, grouped by origin
   {
       # Standard attributes
       "http.method": "POST",
       "http.status_code": 200,

       # LLM-specific attributes
       "llm.model": "gpt-3.5-turbo",
       "llm.temperature": 0.7,
       "llm.tokens.prompt": 150,
       "llm.tokens.completion": 89,

       # Business attributes
       "customer.id": "cust_123",
       "support.priority": "high",
   }
**Context Propagation**
How trace context flows between operations:
.. code-block:: python
def parent_function():
with tracer.trace("parent_operation") as span:
span.set_attribute("operation.type", "parent")
child_function() # Automatically inherits context
def child_function():
with tracer.trace("child_operation") as span:
span.set_attribute("operation.type", "child")
# This span is automatically a child of parent_operation
**Unified Enrichment Architecture**
The HoneyHive SDK routes all span and session enrichment through a single, unified implementation that supports multiple usage patterns while maintaining backwards compatibility:
.. mermaid::
%%{init: {'theme':'base', 'themeVariables': {'primaryColor': '#4F81BD', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#333333', 'lineColor': '#333333', 'mainBkg': 'transparent', 'secondBkg': 'transparent', 'tertiaryColor': 'transparent', 'clusterBkg': 'transparent', 'clusterBorder': '#333333', 'edgeLabelBackground': 'transparent', 'background': 'transparent'}, 'flowchart': {'linkColor': '#333333', 'linkWidth': 2}}}%%
graph TB
subgraph "Enrichment Entry Points"
EP1["from tracer
import enrich_span"]
EP2["from decorators
import enrich_span"]
EP3["from otel
import enrich_span"]
end
subgraph "Unified Implementation"
UI["otel_tracer.enrich_span()
(Main Implementation)"]
subgraph "Pattern Detection Logic"
PD["if context_manager_args:
return context_manager
else:
return direct_call"]
end
end
subgraph "Execution Paths"
CM["Context Manager Pattern
_enrich_span_context_manager()
• Sets span attributes
• Yields context
• Rich experiments"]
DC["Direct Method Call
HoneyHiveTracer.enrich_span()
• Updates HH events
• Returns boolean
• Direct API calls"]
end
subgraph "OpenTelemetry Integration"
SPAN["Span Creation & Attributes"]
OTEL["OpenTelemetry Tracer"]
end
EP1 ==> UI
EP2 ==> UI
EP3 ==> UI
UI ==> PD
PD ==> CM
PD ==> DC
CM ==> SPAN
DC ==> SPAN
SPAN ==> OTEL
classDef entryPoint fill:#01579b,stroke:#ffffff,stroke-width:4px,color:#ffffff
classDef unified fill:#e65100,stroke:#ffffff,stroke-width:4px,color:#ffffff
classDef pattern fill:#4a148c,stroke:#ffffff,stroke-width:4px,color:#ffffff
classDef execution fill:#1b5e20,stroke:#ffffff,stroke-width:4px,color:#ffffff
classDef otel fill:#ad1457,stroke:#ffffff,stroke-width:4px,color:#ffffff
class EP1,EP2,EP3 entryPoint
class UI unified
class PD pattern
class CM,DC execution
class SPAN,OTEL otel
**Key Benefits:**
1. **Single Source of Truth** - All enrichment logic centralized in ``otel_tracer.py``
2. **No Circular Imports** - Clean dependency flow from decorators → otel_tracer
3. **Consistent Behavior** - Same functionality regardless of import path
4. **Pattern Detection** - Automatic detection of usage pattern based on arguments
5. **Full Backwards Compatibility** - All existing code continues to work unchanged
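
Both call styles resolve to the same implementation. The sketch below shows the two patterns described above, following the positional-dictionary convention used elsewhere in this document; the exact import path and accepted arguments depend on your SDK version, so treat the details as illustrative.

.. code-block:: python

   @trace(tracer=tracer)
   def answer_question(query: str) -> str:
       # Direct method call: enrich the current span/event immediately
       # (returns a boolean success flag)
       enrich_span({"customer.id": "cust_123", "support.priority": "high"})

       # Context manager: attributes apply to the span that is active
       # for the duration of the block
       with enrich_span({"experiment.variant": "prompt_v2"}):
           return llm_call(query)
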
LLM-Specific Tracing Considerations
-----------------------------------
**Token-Level Observability**
Unlike traditional requests, LLM calls have unique characteristics:
.. code-block:: python
# Traditional API call
{
"operation": "database_query",
"duration": 50, # milliseconds
"rows_returned": 25
}
# LLM API call
{
"operation": "llm_completion",
"duration": 1500, # milliseconds
"tokens": {
"prompt": 150,
"completion": 89,
"total": 239
},
"cost_usd": 0.00478,
"model": "gpt-3.5-turbo"
}
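
To capture these fields in practice, record them as span attributes as soon as the provider response is available. The sketch below uses the ``tracer.trace`` context manager from earlier together with the OpenAI client; the per-token price is illustrative, not current pricing.

.. code-block:: python

   from openai import OpenAI

   client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

   def traced_completion(prompt: str) -> str:
       with tracer.trace("llm_completion") as span:
           response = client.chat.completions.create(
               model="gpt-3.5-turbo",
               messages=[{"role": "user", "content": prompt}],
           )
           usage = response.usage
           span.set_attribute("llm.model", "gpt-3.5-turbo")
           span.set_attribute("llm.tokens.prompt", usage.prompt_tokens)
           span.set_attribute("llm.tokens.completion", usage.completion_tokens)
           span.set_attribute("llm.tokens.total", usage.total_tokens)
           # Rough cost estimate; substitute your model's actual pricing
           span.set_attribute("llm.cost_usd", usage.total_tokens * 0.00002)
           return response.choices[0].message.content
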
**Prompt Engineering Context**
Tracking how different prompts affect outcomes:
.. code-block:: python
from honeyhive.models import EventType
@trace(tracer=tracer, event_type=EventType.tool)
def test_prompt_variations(query: str):
"""Test different prompt strategies."""
prompts = {
"basic": f"Answer: {query}",
"detailed": f"Provide a detailed answer to: {query}",
"step_by_step": f"Think step by step and answer: {query}"
}
results = {}
for strategy, prompt in prompts.items():
with tracer.trace(f"prompt_strategy_{strategy}") as span:
span.set_attribute("prompt.strategy", strategy)
span.set_attribute("prompt.length", len(prompt))
result = llm_call(prompt)
span.set_attribute("response.length", len(result))
span.set_attribute("response.quality_score", evaluate_quality(result))
results[strategy] = result
return results
**Quality and Evaluation Tracking**
Embedding evaluation directly in traces:
.. code-block:: python
@trace(tracer=tracer)
@evaluate(evaluator=quality_evaluator)
def generate_response(prompt: str) -> str:
"""Generate response with automatic quality evaluation."""
response = llm_call(prompt)
# Evaluation results automatically added to span:
# - evaluation.score: 8.5
# - evaluation.feedback: "Clear and helpful response"
# - evaluation.criteria_scores: {...}
return response
Sampling and Performance
------------------------
**Why Sampling Matters**
High-volume applications need intelligent sampling:
.. code-block:: python

   import random

   # Sampling strategies

   # 1. Percentage-based sampling
   # A decorator chosen at definition time only samples once per function
   # *definition*; wrap the call so the decision happens on every invocation.
   def sampled_trace(tracer, rate=0.1):
       def decorator(func):
           traced = trace(tracer=tracer)(func)

           def wrapper(*args, **kwargs):
               if random.random() < rate:
                   return traced(*args, **kwargs)  # Trace ~10% of calls
               return func(*args, **kwargs)

           return wrapper
       return decorator

   @sampled_trace(tracer, rate=0.1)
   def high_volume_function():
       pass

   # 2. Conditional sampling
   def should_trace(request):
       # Always trace errors
       if request.get("error"):
           return True

       # Always trace premium customers
       if request.get("customer_tier") == "premium":
           return True

       # Sample 1% of regular requests
       return random.random() < 0.01

   # 3. Adaptive sampling
   def adaptive_trace(tracer, request):
       current_load = get_system_load()
       sample_rate = 0.1 if current_load < 0.7 else 0.01

       if random.random() < sample_rate:
           return trace(tracer=tracer)
       return lambda f: f
**Performance Best Practices**
.. code-block:: python
# Good: Selective attribute collection
@trace(tracer=tracer)
def optimized_function(large_data: dict):
# Don't trace large objects directly
enrich_span({
"data.size_mb": len(str(large_data)) / 1024 / 1024,
"data.keys_count": len(large_data),
"data.type": type(large_data).__name__
})
# Process large_data...
# Bad: Tracing large objects
@trace(tracer=tracer)
def unoptimized_function(large_data: dict):
enrich_span({
"data.full_content": large_data # This could be huge!
})
Trace Analysis Patterns
-----------------------
**Finding Performance Bottlenecks**
.. code-block:: python
# Query traces to find slow operations
slow_traces = tracer.query_traces(
time_range="last_24h",
filter="duration > 5000", # Slower than 5 seconds
group_by="operation_name"
)
for operation, traces in slow_traces.items():
avg_duration = sum(t.duration for t in traces) / len(traces)
print(f"{operation}: {avg_duration}ms average")
**Error Pattern Analysis**
.. code-block:: python
# Find common error patterns
error_traces = tracer.query_traces(
time_range="last_7d",
filter="status = error",
group_by=["error.type", "llm.model"]
)
for (error_type, model), count in error_traces.items():
print(f"Model {model}: {count} {error_type} errors")
**Cost Analysis**
.. code-block:: python
# Track LLM costs over time
cost_data = tracer.query_traces(
time_range="last_30d",
filter="llm.cost_usd > 0",
aggregate=["sum(llm.cost_usd)", "avg(llm.tokens.total)"],
group_by=["llm.model", "date"]
)
Integration with Monitoring Systems
-----------------------------------
**Metrics from Traces**
Convert trace data into monitoring metrics:
.. code-block:: python

   from collections import Counter, defaultdict

   # Example: Generate metrics from trace data
   def generate_metrics_from_traces():
       recent_traces = tracer.get_traces(hours=1)

       metrics = {
           "llm_requests_total": len(recent_traces),
           "llm_requests_by_model": Counter(),
           "llm_avg_latency": {},
           "llm_error_rate": {},
           "llm_cost_per_hour": 0.0,
       }

       latencies = defaultdict(list)
       errors = Counter()

       for t in recent_traces:
           model = t.get_attribute("llm.model")
           if model:
               metrics["llm_requests_by_model"][model] += 1

               # Track latency and errors per model
               latencies[model].append(t.duration)
               if t.status == "error":
                   errors[model] += 1

           # Track costs
           metrics["llm_cost_per_hour"] += t.get_attribute("llm.cost_usd", 0)

       # Reduce raw latency/error data into per-model summary metrics
       for model, durations in latencies.items():
           metrics["llm_avg_latency"][model] = sum(durations) / len(durations)
           metrics["llm_error_rate"][model] = errors[model] / len(durations)

       return metrics
**Alerting Integration**
.. code-block:: python

   def check_trace_health():
       """Monitor trace data for alerting conditions."""
       recent_traces = tracer.get_traces(minutes=15)
       if not recent_traces:
           return  # Nothing to evaluate in this window

       # Check error rate
       error_rate = sum(1 for t in recent_traces if t.status == "error") / len(recent_traces)
       if error_rate > 0.05:  # 5% error rate
           send_alert(f"High error rate: {error_rate:.2%}")

       # Check latency
       avg_latency = sum(t.duration for t in recent_traces) / len(recent_traces)
       if avg_latency > 5000:  # 5 seconds
           send_alert(f"High latency: {avg_latency:.0f}ms")

       # Check cost burn rate
       hourly_cost = sum(t.get_attribute("llm.cost_usd", 0) for t in recent_traces) * 4  # 15 min → 1 hour
       if hourly_cost > 10:  # $10/hour
           send_alert(f"High cost burn rate: ${hourly_cost:.2f}/hour")
Best Practices Summary
----------------------
**1. Start Simple**
- Begin with basic @trace decorators
- Add complexity gradually
- Focus on business-critical operations
**2. Balance Detail with Performance**
- Use sampling for high-volume operations
- Avoid tracing large data objects
- Focus on actionable metrics
**3. Structure Your Traces**
- Use consistent naming conventions
- Add business context with attributes
- Maintain clear span hierarchies
**4. Monitor Your Monitoring**
- Track tracing overhead
- Monitor data volume and costs
- Set up alerting on trace health
**5. Use Traces for Improvement**
- Analyze patterns regularly
- Use data to optimize prompts
- Feed insights back into development
See Also
--------
- :doc:`llm-observability` - LLM-specific observability concepts
- :doc:`../architecture/overview` - Overall system architecture
- :doc:`../../tutorials/01-setup-first-tracer` - Practical tracing tutorial