Tracing Fundamentals
Note
This document explains the fundamental concepts of distributed tracing and how they apply to LLM applications.
See also
HoneyHive Tracer Architecture
For a deep dive into how the HoneyHive SDK implements these concepts with a modular, mixin-based architecture, see Tracer Architecture Overview.
What is Distributed Tracing?
Distributed tracing is a method for tracking requests as they flow through complex systems. It provides:
End-to-end visibility into request execution
Performance insights at each step
Error correlation across system boundaries
Context propagation between services
Traditional Web Application Tracing:
User Request → Load Balancer → Web Server → Database → Response
[-------------- Single Trace --------------]
LLM Application Tracing:
User Query → Preprocessing → LLM Call → Post-processing → Response
[-------------- Enhanced with AI Context --------------]
Core Tracing Concepts
Traces
A trace represents a complete request journey:
# Example trace hierarchy
customer_support_request            # Root span
├── validate_input                  # Child span
├── classify_query                  # Child span
├── llm_completion                  # Child span
│   ├── prompt_preparation
│   └── api_call
└── format_response                 # Child span
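As a sketch of how such a hierarchy arises, nesting the tracer.trace context manager (used throughout this document) produces exactly this parent/child structure. The helpers validate, classify, build_prompt, llm_call, and format_answer are illustrative placeholders:

def customer_support_request(query: str) -> str:
    with tracer.trace("customer_support_request"):      # Root span
        with tracer.trace("validate_input"):
            validated = validate(query)                 # hypothetical helper
        with tracer.trace("classify_query"):
            category = classify(validated)              # hypothetical helper
        with tracer.trace("llm_completion"):
            with tracer.trace("prompt_preparation"):
                prompt = build_prompt(validated, category)
            with tracer.trace("api_call"):
                answer = llm_call(prompt)
        with tracer.trace("format_response"):
            return format_answer(answer)                # hypothetical helper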
Spans
Individual operations within a trace:
# Each span contains:
{
    "span_id": "abc123",
    "trace_id": "xyz789",
    "parent_id": "parent456",
    "operation_name": "llm_completion",
    "start_time": "2024-01-15T10:30:00Z",
    "end_time": "2024-01-15T10:30:02Z",
    "duration": 2000,  # milliseconds
    "attributes": {
        "llm.model": "gpt-4",
        "llm.tokens.input": 45,
        "llm.tokens.output": 67
    },
    "status": "ok"
}
Attributes
Key-value metadata attached to spans:
# Standard attributes
"http.method": "POST"
"http.status_code": 200
# LLM-specific attributes
"llm.model": "gpt-3.5-turbo"
"llm.temperature": 0.7
"llm.tokens.prompt": 150
"llm.tokens.completion": 89
# Business attributes
"customer.id": "cust_123"
"support.priority": "high"
Context Propagation
How trace context flows between operations:
def parent_function():
    with tracer.trace("parent_operation") as span:
        span.set_attribute("operation.type", "parent")
        child_function()  # Automatically inherits context

def child_function():
    with tracer.trace("child_operation") as span:
        span.set_attribute("operation.type", "child")
        # This span is automatically a child of parent_operation
Unified Enrichment Architecture
The HoneyHive SDK routes all span and session enrichment through a single implementation that supports multiple usage patterns while maintaining backwards compatibility:
graph TB
    subgraph "Enrichment Entry Points"
        EP1["from tracer<br/>import enrich_span"]
        EP2["from decorators<br/>import enrich_span"]
        EP3["from otel<br/>import enrich_span"]
    end

    subgraph "Unified Implementation"
        UI["otel_tracer.enrich_span()<br/>(Main Implementation)"]
        subgraph "Pattern Detection Logic"
            PD["if context_manager_args:<br/>return context_manager<br/>else:<br/>return direct_call"]
        end
    end

    subgraph "Execution Paths"
        CM["Context Manager Pattern<br/>_enrich_span_context_manager()<br/>• Sets span attributes<br/>• Yields context<br/>• Rich experiments"]
        DC["Direct Method Call<br/>HoneyHiveTracer.enrich_span()<br/>• Updates HH events<br/>• Returns boolean<br/>• Direct API calls"]
    end

    subgraph "OpenTelemetry Integration"
        SPAN["Span Creation & Attributes"]
        OTEL["OpenTelemetry Tracer"]
    end

    EP1 ==> UI
    EP2 ==> UI
    EP3 ==> UI
    UI ==> PD
    PD ==> CM
    PD ==> DC
    CM ==> SPAN
    DC ==> SPAN
    SPAN ==> OTEL
Key Benefits:
Single Source of Truth - All enrichment logic centralized in otel_tracer.py
No Circular Imports - Clean dependency flow from decorators → otel_tracer
Consistent Behavior - Same functionality regardless of import path
Pattern Detection - Automatic detection of usage pattern based on arguments
Full Backwards Compatibility - All existing code continues to work unchanged
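To make the pattern detection concrete, here is a minimal, generic sketch of a single entry point that returns either a context manager or a direct boolean result depending on its arguments. This illustrates the technique only; _span_state, _enrich_context, and the argument convention are assumptions, not the actual HoneyHive internals:

from contextlib import contextmanager

_span_state = {}  # stand-in for real span attributes

@contextmanager
def _enrich_context(attributes):
    # Context manager path: attributes apply only for the duration of the block
    _span_state.update(attributes)
    try:
        yield _span_state
    finally:
        for key in attributes:
            _span_state.pop(key, None)

def enrich_span(attributes=None, **direct_updates):
    # Pattern detection based on arguments: a positional dict selects the
    # context manager pattern; keyword updates select the direct call,
    # which applies immediately and returns a boolean
    if attributes is not None:
        return _enrich_context(attributes)
    _span_state.update(direct_updates)
    return True

# Context manager pattern
with enrich_span({"experiment": "v2"}):
    pass  # attributes active inside the block

# Direct method call
ok = enrich_span(user_id="u_123")  # returns True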
LLM-Specific Tracing Considerations
Token-Level Observability
Unlike traditional requests, LLM calls have unique characteristics:
# Traditional API call
{
    "operation": "database_query",
    "duration": 50,  # milliseconds
    "rows_returned": 25
}

# LLM API call
{
    "operation": "llm_completion",
    "duration": 1500,  # milliseconds
    "tokens": {
        "prompt": 150,
        "completion": 89,
        "total": 239
    },
    "cost_usd": 0.00478,
    "model": "gpt-3.5-turbo"
}
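Because cost is a function of token counts, the cost_usd attribute can be derived at trace time. A minimal sketch, using illustrative per-1K-token prices (placeholders, not current provider rates):

# Illustrative prices per 1K tokens; real rates vary by provider and date
PRICE_PER_1K_TOKENS = {
    "gpt-3.5-turbo": {"prompt": 0.0015, "completion": 0.002},
}

def estimate_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    rates = PRICE_PER_1K_TOKENS[model]
    return (prompt_tokens / 1000) * rates["prompt"] + \
           (completion_tokens / 1000) * rates["completion"]

Recording the result as an llm.cost_usd attribute is what makes the cost analysis queries later in this document possible.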
Prompt Engineering Context
Tracking how different prompts affect outcomes:
from honeyhive.models import EventType

@trace(tracer=tracer, event_type=EventType.tool)
def test_prompt_variations(query: str):
    """Test different prompt strategies."""
    prompts = {
        "basic": f"Answer: {query}",
        "detailed": f"Provide a detailed answer to: {query}",
        "step_by_step": f"Think step by step and answer: {query}"
    }

    results = {}
    for strategy, prompt in prompts.items():
        with tracer.trace(f"prompt_strategy_{strategy}") as span:
            span.set_attribute("prompt.strategy", strategy)
            span.set_attribute("prompt.length", len(prompt))

            result = llm_call(prompt)

            span.set_attribute("response.length", len(result))
            span.set_attribute("response.quality_score", evaluate_quality(result))
            results[strategy] = result

    return results
Quality and Evaluation Tracking
Embedding evaluation directly in traces:
@trace(tracer=tracer)
@evaluate(evaluator=quality_evaluator)
def generate_response(prompt: str) -> str:
    """Generate response with automatic quality evaluation."""
    response = llm_call(prompt)

    # Evaluation results automatically added to span:
    # - evaluation.score: 8.5
    # - evaluation.feedback: "Clear and helpful response"
    # - evaluation.criteria_scores: {...}
    return response
Sampling and Performance
Why Sampling Matters
High-volume applications need intelligent sampling:
# Sampling strategies
import functools
import random

# 1. Percentage-based sampling
# A conditional decorator expression (e.g. `@trace(...) if random.random() < 0.1
# else (lambda f: f)`) would decide once, at definition time. To trace 10% of
# calls, decide per call instead:
def sampled_trace(tracer, rate):
    def decorator(func):
        traced = trace(tracer=tracer)(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if random.random() < rate:
                return traced(*args, **kwargs)  # traced execution
            return func(*args, **kwargs)  # untraced execution
        return wrapper
    return decorator

@sampled_trace(tracer, rate=0.1)
def high_volume_function():
    pass  # Only ~10% of calls are traced

# 2. Conditional sampling
def should_trace(request):
    # Always trace errors
    if request.get("error"):
        return True
    # Always trace premium customers
    if request.get("customer_tier") == "premium":
        return True
    # Sample 1% of regular requests
    return random.random() < 0.01

# 3. Adaptive sampling
def adaptive_trace(tracer, request):
    current_load = get_system_load()
    sample_rate = 0.1 if current_load < 0.7 else 0.01
    if random.random() < sample_rate:
        return trace(tracer=tracer)
    return lambda f: f
Performance Best Practices
# Good: Selective attribute collection
@trace(tracer=tracer)
def optimized_function(large_data: dict):
    # Don't trace large objects directly
    enrich_span({
        "data.size_mb": len(str(large_data)) / 1024 / 1024,
        "data.keys_count": len(large_data),
        "data.type": type(large_data).__name__
    })
    # Process large_data...

# Bad: Tracing large objects
@trace(tracer=tracer)
def unoptimized_function(large_data: dict):
    enrich_span({
        "data.full_content": large_data  # This could be huge!
    })
Trace Analysis Patterns
Finding Performance Bottlenecks
# Query traces to find slow operations
slow_traces = tracer.query_traces(
    time_range="last_24h",
    filter="duration > 5000",  # Slower than 5 seconds
    group_by="operation_name"
)

for operation, traces in slow_traces.items():
    avg_duration = sum(t.duration for t in traces) / len(traces)
    print(f"{operation}: {avg_duration}ms average")
Error Pattern Analysis
# Find common error patterns
error_traces = tracer.query_traces(
    time_range="last_7d",
    filter="status = error",
    group_by=["error.type", "llm.model"]
)

for (error_type, model), count in error_traces.items():
    print(f"Model {model}: {count} {error_type} errors")
Cost Analysis
# Track LLM costs over time
cost_data = tracer.query_traces(
    time_range="last_30d",
    filter="llm.cost_usd > 0",
    aggregate=["sum(llm.cost_usd)", "avg(llm.tokens.total)"],
    group_by=["llm.model", "date"]
)
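A hedged usage sketch for the query result. The shape assumed here (a dict keyed by the group_by tuple, with aggregate names as columns) is an illustration, not a documented return type:

# Assumed result shape for illustration only
for (model, date), row in cost_data.items():
    print(f"{date} {model}: ${row['sum(llm.cost_usd)']:.4f} "
          f"(avg {row['avg(llm.tokens.total)']:.0f} tokens/call)")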
Integration with Monitoring Systems
Metrics from Traces
Convert trace data into monitoring metrics:
# Example: Generate metrics from trace data
from collections import Counter

def generate_metrics_from_traces():
    recent_traces = tracer.get_traces(hours=1)

    metrics = {
        "llm_requests_total": len(recent_traces),
        "llm_requests_by_model": Counter(),
        "llm_avg_latency": {},
        "llm_error_rate": {},
        "llm_cost_per_hour": 0
    }

    for trace in recent_traces:
        model = trace.get_attribute("llm.model")
        if model:
            metrics["llm_requests_by_model"][model] += 1

            # Track latency
            if model not in metrics["llm_avg_latency"]:
                metrics["llm_avg_latency"][model] = []
            metrics["llm_avg_latency"][model].append(trace.duration)

        # Track costs
        cost = trace.get_attribute("llm.cost_usd", 0)
        metrics["llm_cost_per_hour"] += cost

    return metrics
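Since llm_avg_latency accumulates raw durations per model, a short post-processing step produces the actual averages:

metrics = generate_metrics_from_traces()
for model, durations in metrics["llm_avg_latency"].items():
    avg_ms = sum(durations) / len(durations)
    print(f"{model}: {metrics['llm_requests_by_model'][model]} requests, "
          f"{avg_ms:.0f}ms average latency")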
Alerting Integration
def check_trace_health():
    """Monitor trace data for alerting conditions."""
    recent_traces = tracer.get_traces(minutes=15)

    # Check error rate
    error_rate = sum(1 for t in recent_traces if t.status == "error") / len(recent_traces)
    if error_rate > 0.05:  # 5% error rate
        send_alert(f"High error rate: {error_rate:.2%}")

    # Check latency
    avg_latency = sum(t.duration for t in recent_traces) / len(recent_traces)
    if avg_latency > 5000:  # 5 seconds
        send_alert(f"High latency: {avg_latency}ms")

    # Check cost burn rate
    hourly_cost = sum(t.get_attribute("llm.cost_usd", 0) for t in recent_traces) * 4  # 15 min → 1 hr
    if hourly_cost > 10:  # $10/hour
        send_alert(f"High cost burn rate: ${hourly_cost:.2f}/hour")
Best Practices Summary
1. Start Simple
   - Begin with basic @trace decorators
   - Add complexity gradually
   - Focus on business-critical operations

2. Balance Detail with Performance
   - Use sampling for high-volume operations
   - Avoid tracing large data objects
   - Focus on actionable metrics

3. Structure Your Traces
   - Use consistent naming conventions
   - Add business context with attributes
   - Maintain clear span hierarchies

4. Monitor Your Monitoring
   - Track tracing overhead
   - Monitor data volume and costs
   - Set up alerting on trace health

5. Use Traces for Improvement
   - Analyze patterns regularly
   - Use data to optimize prompts
   - Feed insights back into development
See Also
LLM Observability Concepts - LLM-specific observability concepts
Architecture Overview - Overall system architecture
Set Up Your First Tracer - Practical tracing tutorial