How to Export Traces
Problem: You need to export trace data from HoneyHive for analysis, backup, or integration with other tools.
Solution: Use the HoneyHive CLI or API to export traces in multiple formats.
Overview
HoneyHive provides multiple ways to export trace data:
CLI Export: Quick command-line exports for ad-hoc analysis
API Export: Programmatic access for automated pipelines
Multiple Formats: JSON, JSONL, CSV, Parquet for different use cases
Flexible Filtering: Time ranges, operations, status filters
When to Export Traces
Common Use Cases:
Data Analysis: Export for Jupyter notebooks, pandas analysis
Backup & Archival: Long-term storage of trace data
Compliance: Audit trail requirements
ML Training: Export traces for model training datasets
Debugging: Detailed offline analysis of specific issues
Cost Analysis: Export for billing and usage analytics
Export Methods
CLI Export (Recommended for Ad-Hoc)
Basic Export:
# Export all traces from last 24 hours
honeyhive export traces traces.jsonl
# Export as CSV
honeyhive export traces traces.csv --format csv
# Export with time range
honeyhive export traces traces.jsonl \
    --since "2024-01-20T00:00:00Z" \
    --until "2024-01-21T00:00:00Z"
Filtered Exports:
# Export only error traces
honeyhive trace search --query "status:error" --format json > errors.json
# Export specific operations
honeyhive trace search \
    --query "operation:llm_call" \
    --format jsonl > llm_calls.jsonl
# Export with metadata
honeyhive export traces full_traces.jsonl --include all
Note
CLI Installation Required
Install the HoneyHive CLI: pip install "honeyhive[cli]" (the quotes keep shells such as zsh from treating the brackets as a glob pattern)
API Export (Recommended for Automation)
Using Python SDK:
from honeyhive import HoneyHive
import json
from datetime import datetime, timedelta
# Initialize client
client = HoneyHive(api_key="your-api-key")
# Query traces from last 7 days
end_date = datetime.now()
start_date = end_date - timedelta(days=7)
# Get sessions (traces) with filtering
sessions = client.sessions.get_sessions(
    project="your-project",
    filters={
        "start_time": {
            "gte": start_date.isoformat(),
            "lte": end_date.isoformat()
        },
        "source": "production"
    },
    limit=1000  # Adjust as needed
)
# Export to file, one JSON object per line
with open("traces_export.jsonl", "w") as f:
    for session in sessions:
        f.write(json.dumps(session.model_dump()) + "\n")
print(f"✅ Exported {len(sessions)} traces")
Paginated Export (Large Datasets):
from honeyhive import HoneyHive
import json
client = HoneyHive(api_key="your-api-key")
def export_all_traces(project: str, output_file: str):
    """Export all traces with pagination."""
    page = 0
    page_size = 100
    total_exported = 0
    with open(output_file, "w") as f:
        while True:
            # Get page of sessions
            sessions = client.sessions.get_sessions(
                project=project,
                offset=page * page_size,
                limit=page_size
            )
            if not sessions:
                break  # No more data
            # Write to file
            for session in sessions:
                f.write(json.dumps(session.model_dump()) + "\n")
                total_exported += 1
            print(f"Exported page {page + 1} ({total_exported} traces so far)")
            page += 1
    print(f"✅ Total exported: {total_exported} traces")
# Run export
export_all_traces("your-project", "all_traces.jsonl")
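The loop above stops as soon as a page comes back empty. The same offset/limit pattern can be sketched as a reusable generator. This is a minimal illustration, not part of the HoneyHive SDK: `iter_pages` and `fake_fetch` are hypothetical names, and `fake_fetch` is an in-memory stand-in for whatever paginated call is being wrapped.

```python
from typing import Callable, Iterator, List


def iter_pages(fetch: Callable[[int, int], List[dict]], page_size: int = 100) -> Iterator[dict]:
    """Yield records one by one, requesting up to `page_size` records per call.

    `fetch(offset, limit)` is a hypothetical callable wrapping a paginated API;
    it must return an empty list once the data is exhausted.
    """
    offset = 0
    while True:
        page = fetch(offset, page_size)
        if not page:
            break  # No more data
        yield from page
        offset += len(page)  # Advance by what was actually returned


# Stand-in data source for illustration only (not a HoneyHive call).
_FAKE_DATA = [{"session_id": f"session_{i}"} for i in range(250)]


def fake_fetch(offset: int, limit: int) -> List[dict]:
    return _FAKE_DATA[offset:offset + limit]


records = list(iter_pages(fake_fetch, page_size=100))
print(len(records))
```

Advancing by `len(page)` rather than by a fixed page size keeps the offset correct even if a backend returns a short page before the final one.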
Export Formats
JSONL (Recommended)
Best for:
Large datasets
Streaming processing
Line-by-line parsing
honeyhive export traces traces.jsonl --format jsonl
Advantages:
One trace per line
Easy to stream/process incrementally
Standard format for data pipelines
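To illustrate the streaming advantage, here is a minimal sketch that aggregates over a JSONL export one line at a time, so memory use stays flat regardless of file size. The helper names and the demo records are made up for illustration; `source` is borrowed from the filter example in this guide, and whether it appears as a field in exported records is an assumption.

```python
import json
import os
import tempfile
from collections import Counter
from typing import Iterator


def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL file."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_by_field(path: str, field: str) -> Counter:
    """Count records by the value of `field` without loading the whole file."""
    counts = Counter()
    for record in iter_jsonl(path):
        counts[record.get(field, "<missing>")] += 1
    return counts


# Demo with a throwaway file; the records are invented for illustration.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "traces.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write('{"session_id": "a", "source": "production"}\n')
        f.write('{"session_id": "b", "source": "staging"}\n')
        f.write('{"session_id": "c", "source": "production"}\n')
    source_counts = count_by_field(demo_path, "source")

print(source_counts)
```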
JSON
Best for:
Small datasets
Pretty printing
Direct API integration
honeyhive export traces traces.json --format json
Structure:
{
  "traces": [
    {
      "session_id": "session_123",
      "start_time": "2024-01-20T10:30:00Z",
      "spans": [] // Array of span objects
    }
  ]
}
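If a JSON export follows the structure shown above (a top-level `traces` array), it can be converted to JSONL after the fact. The sketch below assumes exactly that structure; the function name and the second sample record (`session_456`) are illustrative only.

```python
import json


def json_export_to_jsonl_lines(json_text: str) -> list:
    """Turn a JSON export with a top-level "traces" array into JSONL lines."""
    document = json.loads(json_text)  # The whole document is parsed at once
    return [json.dumps(trace) for trace in document.get("traces", [])]


# Sample input mirroring the structure above (values are illustrative).
sample = """
{
  "traces": [
    {"session_id": "session_123", "start_time": "2024-01-20T10:30:00Z", "spans": []},
    {"session_id": "session_456", "start_time": "2024-01-20T11:00:00Z", "spans": []}
  ]
}
"""
lines = json_export_to_jsonl_lines(sample)
print("\n".join(lines))
```

Because `json.loads` needs the complete document in memory, this also shows why plain JSON suits small datasets better than large ones.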
CSV
Best for:
Excel analysis
Spreadsheet tools
Business intelligence
honeyhive export traces traces.csv --format csv
Note: Complex nested data is flattened or JSON-encoded in CSV format.
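To make the flattening concrete, here is one possible scheme, sketched with the standard library: nested objects become dotted column names and lists are JSON-encoded into a single cell. This is an assumption for illustration, not necessarily the rule the HoneyHive CLI applies, and the `metadata` fields in the sample are invented.

```python
import csv
import io
import json


def flatten(record: dict, parent: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts into dotted column names; JSON-encode lists."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)  # Lists have no natural column form
        else:
            flat[name] = value
    return flat


def to_csv(records: list) -> str:
    """Render flattened records as CSV text with a sorted, stable header."""
    rows = [flatten(r) for r in records]
    columns = sorted({col for row in rows for col in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative records; only session_id and spans come from this guide.
records = [
    {"session_id": "session_123", "metadata": {"user": "u1"}, "spans": []},
    {"session_id": "session_456", "metadata": {"user": "u2", "tier": "pro"}, "spans": []},
]
csv_text = to_csv(records)
print(csv_text)
```

Columns missing from a record are left empty, which keeps rows aligned when traces carry different metadata.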
Parquet
Best for:
Data lakes
Big data processing
Columnar analytics
honeyhive export traces traces.parquet --format parquet
Advantages:
Efficient compression
Fast columnar queries
Industry standard for analytics
Advanced Export Patterns
Filtered Export by Status
# Export only successful traces
sessions = client.sessions.get_sessions(
project="your-project",
filters={"status": "success"},
limit=1000
)
Export with Span Details
from honeyhive import HoneyHive
import json
client = HoneyHive(api_key="your-api-key")
def export_with_events(project: str, session_id: str):
    """Export session with all events (spans)."""
    # Get session details
    session = client.sessions.get_session(session_id)
    # Get all events for this session
    events = client.events.get_events(
        project=project,
        filters={"session_id": session_id}
    )
    # Combine data
    export_data = {
        "session": session.model_dump(),
        "events": [event.model_dump() for event in events]
    }
    with open(f"session_{session_id}.json", "w") as f:
        json.dump(export_data, f, indent=2)
    return export_data
# Export specific session with all spans
export_with_events("your-project", "session_abc123")
Scheduled Exports
Daily Export Script:
#!/usr/bin/env python3
"""Daily trace export for archival."""
from honeyhive import HoneyHive
import json
from datetime import datetime, timedelta
def daily_export():
    client = HoneyHive(api_key="your-api-key")
    # Export yesterday's data: reset microseconds too, so the window
    # covers the whole day from 00:00:00.000000 to 23:59:59.999999
    yesterday = datetime.now() - timedelta(days=1)
    start = yesterday.replace(hour=0, minute=0, second=0, microsecond=0)
    end = yesterday.replace(hour=23, minute=59, second=59, microsecond=999999)
    sessions = client.sessions.get_sessions(
        project="production-app",
        filters={
            "start_time": {
                "gte": start.isoformat(),
                "lte": end.isoformat()
            }
        }
    )
    # Save to dated file
    filename = f"traces_{yesterday.strftime('%Y%m%d')}.jsonl"
    with open(filename, "w") as f:
        for session in sessions:
            f.write(json.dumps(session.model_dump()) + "\n")
    print(f"✅ Exported {len(sessions)} traces to {filename}")
if __name__ == "__main__":
    daily_export()
Cron Schedule:
# Run daily at 1 AM
0 1 * * * /path/to/venv/bin/python /path/to/daily_export.py
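A scheduled job depends on getting the day boundaries right. As a sketch, the previous day can be computed as a half-open UTC interval, which avoids both gaps and overlaps between consecutive daily runs. The function name is hypothetical, and using UTC is an assumption: this guide does not state how the API interprets timestamps without an offset.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Optional, Tuple


def previous_day_window(today: Optional[date] = None) -> Tuple[datetime, datetime]:
    """Return (start, end) covering the whole UTC day before `today`.

    `end` is exclusive: it is midnight at the start of `today`.
    """
    if today is None:
        today = datetime.now(timezone.utc).date()
    end = datetime.combine(today, time.min, tzinfo=timezone.utc)
    start = end - timedelta(days=1)
    return start, end


# Fixed date so the result is reproducible.
start, end = previous_day_window(date(2024, 1, 21))
print(start.isoformat(), end.isoformat())
```

If the filter only offers an inclusive upper bound such as `lte`, subtracting one microsecond from `end` gives an equivalent closed interval.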
Export Performance Tips
For Large Datasets:
Use Pagination: Process in chunks of 100-1000 traces
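Use JSONL: Records are written and parsed one line at a time, so the full dataset never has to be held in memory as it does with a single JSON document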
Filter by Time: Export specific time ranges
Use Compression: Gzip output files for storage
import gzip
import json
# Export with compression
with gzip.open("traces.jsonl.gz", "wt") as f:
    for session in sessions:
        f.write(json.dumps(session.model_dump()) + "\n")
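Compressed exports can be read back the same way they were written, since `gzip.open` in text mode behaves like a regular file. The round trip below is a small self-contained sketch; the helper names and sample records are illustrative.

```python
import gzip
import json
import os
import tempfile


def write_jsonl_gz(path: str, records: list) -> None:
    """Write records as gzip-compressed JSONL, one JSON object per line."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl_gz(path: str) -> list:
    """Read gzip-compressed JSONL back into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Round trip through a throwaway file.
sample_records = [{"session_id": "session_1"}, {"session_id": "session_2"}]
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "traces.jsonl.gz")
    write_jsonl_gz(archive, sample_records)
    restored = read_jsonl_gz(archive)

print(restored == sample_records)
```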
For Real-Time Export:
import time
from datetime import datetime
from honeyhive import HoneyHive
client = HoneyHive(api_key="your-api-key")
last_export_time = datetime.now()
while True:
    # Export new traces every 5 minutes
    time.sleep(300)
    now = datetime.now()
    sessions = client.sessions.get_sessions(
        project="your-project",
        filters={
            # Bound the window on both sides so the next poll
            # starts where this one ends
            "start_time": {
                "gte": last_export_time.isoformat(),
                "lte": now.isoformat()
            }
        }
    )
    # Process new sessions...
    last_export_time = now
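Polling by time window can return the same session more than once, for example when two consecutive polls share a boundary timestamp. A minimal way to guard against that is to remember which IDs were already processed. The sketch below uses plain dicts and a hypothetical `take_unseen` helper; it assumes each record carries a `session_id`, as in the JSON structure shown earlier in this guide.

```python
from typing import Iterable, List, Set


def take_unseen(records: Iterable[dict], seen_ids: Set[str], key: str = "session_id") -> List[dict]:
    """Return records whose `key` has not been seen yet, and remember them."""
    fresh = []
    for record in records:
        record_id = record[key]
        if record_id not in seen_ids:
            seen_ids.add(record_id)
            fresh.append(record)
    return fresh


seen: Set[str] = set()
first_poll = [{"session_id": "a"}, {"session_id": "b"}]
second_poll = [{"session_id": "b"}, {"session_id": "c"}]  # "b" overlaps

batch_1 = take_unseen(first_poll, seen)
batch_2 = take_unseen(second_poll, seen)
print([r["session_id"] for r in batch_2])
```

The `seen` set grows for as long as the loop runs, so a long-lived exporter would want to prune IDs older than the polling window.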
Troubleshooting
Export Fails with “Too Many Results”:
Use pagination:
# Bad: Trying to get everything at once
sessions = client.sessions.get_sessions(project="your-project", limit=100000)  # ❌ Too large
# Good: Use pagination (offset advances by one page of 100 each time)
for offset in range(0, 1000, 100):
    sessions = client.sessions.get_sessions(project="your-project", offset=offset, limit=100)
Missing Span Data:
Ensure you’re exporting both sessions and events:
# Export sessions (traces)
sessions = client.sessions.get_sessions(project="your-project")
# Also export events (spans) for each session
for session in sessions:
    events = client.events.get_events(
        project="your-project",
        filters={"session_id": session.session_id}
    )
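When sessions and events have been exported separately, they can also be joined offline. The following sketch groups events by `session_id` and attaches them to the matching session. It works on plain dicts (for example, records loaded from JSONL files); `attach_events` and the `event_id` field are hypothetical and used only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def attach_events(sessions: Iterable[dict], events: Iterable[dict]) -> List[dict]:
    """Return a copy of each session with its events under an "events" key."""
    by_session: Dict[str, List[dict]] = defaultdict(list)
    for event in events:
        by_session[event["session_id"]].append(event)
    # Sessions without events get an empty list rather than a missing key.
    return [{**s, "events": by_session.get(s["session_id"], [])} for s in sessions]


# Illustrative data only.
sessions = [{"session_id": "s1"}, {"session_id": "s2"}]
events = [
    {"session_id": "s1", "event_id": "e1"},
    {"session_id": "s1", "event_id": "e2"},
]
combined = attach_events(sessions, events)
print([(s["session_id"], len(s["events"])) for s in combined])
```

A session that ends up with an empty `events` list is a quick signal that its spans were not exported.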
Slow Exports:
Reduce time range
Use filters to limit results
Export during off-peak hours
Use JSONL instead of JSON
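One way to reduce the time range per request is to split a long export window into smaller consecutive chunks and export them one at a time. The helper below is a hypothetical sketch using only the standard library; each yielded pair could be passed as the lower and upper bound of a `start_time` filter.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple


def split_range(start: datetime, end: datetime, step: timedelta) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (chunk_start, chunk_end) pairs covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        chunk_end = min(cursor + step, end)  # The last chunk may be shorter
        yield cursor, chunk_end
        cursor = chunk_end


range_start = datetime(2024, 1, 20, tzinfo=timezone.utc)
range_end = datetime(2024, 1, 21, tzinfo=timezone.utc)
chunks = list(split_range(range_start, range_end, timedelta(hours=6)))
print(len(chunks))
```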
Next Steps
Build Custom Tracing - Advanced tracing patterns
CLI Reference - Complete CLI reference
Key Takeaway: HoneyHive provides flexible export options, from ad-hoc CLI exports to automated production pipelines. Choose the format and method that fit your data volume and downstream tools. ✨