How to Export Traces

Problem: You need to export trace data from HoneyHive for analysis, backup, or integration with other tools.

Solution: Use the HoneyHive CLI or API to export traces in multiple formats.

Overview 

HoneyHive provides multiple ways to export trace data:

CLI Export: Quick command-line exports for ad-hoc analysis
API Export: Programmatic access for automated pipelines
Multiple Formats: JSON, JSONL, CSV, Parquet for different use cases
Flexible Filtering: Time ranges, operations, status filters

When to Export Traces 

Common Use Cases:

Data Analysis: Export for Jupyter notebooks, pandas analysis
Backup & Archival: Long-term storage of trace data
Compliance: Audit trail requirements
ML Training: Export traces for model training datasets
Debugging: Detailed offline analysis of specific issues
Cost Analysis: Export for billing and usage analytics

Export Methods 

CLI Export (Recommended for Ad-Hoc)

Basic Export:

# Export all traces from last 24 hours
honeyhive export traces traces.jsonl

# Export as CSV
honeyhive export traces traces.csv --format csv

# Export with time range
honeyhive export traces traces.jsonl \
  --since "2024-01-20T00:00:00Z" \
  --until "2024-01-21T00:00:00Z"

Filtered Exports:

# Export only error traces
honeyhive trace search --query "status:error" --format json > errors.json

# Export specific operations
honeyhive trace search \
  --query "operation:llm_call" \
  --format jsonl > llm_calls.jsonl

# Export with metadata
honeyhive export traces full_traces.jsonl --include all

Note

CLI Installation Required

Install the HoneyHive CLI: pip install "honeyhive[cli]" (the quotes stop shells such as zsh from treating the brackets as a glob pattern)

API Export (Recommended for Automation)

Using Python SDK:

from honeyhive import HoneyHive
import json
from datetime import datetime, timedelta, timezone

# Initialize client
client = HoneyHive(api_key="your-api-key")

# Query traces from the last 7 days (timezone-aware UTC, so the ISO strings carry an offset)
end_date = datetime.now(timezone.utc)
start_date = end_date - timedelta(days=7)

# Get sessions (traces) with filtering
sessions = client.sessions.get_sessions(
    project="your-project",
    filters={
        "start_time": {
            "gte": start_date.isoformat(),
            "lte": end_date.isoformat()
        },
        "source": "production"
    },
    limit=1000  # Adjust as needed
)

# Export to file
with open("traces_export.jsonl", "w") as f:
    for session in sessions:
        f.write(json.dumps(session.model_dump()) + "\n")

print(f"✅ Exported {len(sessions)} traces")

Paginated Export (Large Datasets):

from honeyhive import HoneyHive
import json

client = HoneyHive(api_key="your-api-key")

def export_all_traces(project: str, output_file: str):
    """Export all traces with pagination."""
    page = 0
    page_size = 100
    total_exported = 0

    with open(output_file, "w") as f:
        while True:
            # Get page of sessions
            sessions = client.sessions.get_sessions(
                project=project,
                offset=page * page_size,
                limit=page_size
            )

            if not sessions:
                break  # No more data

            # Write to file
            for session in sessions:
                f.write(json.dumps(session.model_dump()) + "\n")
                total_exported += 1

            print(f"Exported page {page + 1} ({total_exported} traces so far)")
            page += 1

    print(f"✅ Total exported: {total_exported} traces")

# Run export
export_all_traces("your-project", "all_traces.jsonl")

Export Formats 

JSONL (Recommended)

Best for:

Large datasets
Streaming processing
Line-by-line parsing

honeyhive export traces traces.jsonl --format jsonl

Advantages:

One trace per line
Easy to stream/process incrementally
Standard format for data pipelines
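
Because every line is a complete JSON document, a JSONL export can be scanned without holding the whole file in memory. A minimal sketch, assuming one JSON object per line; `iter_jsonl` and `count_by_field` are illustrative helper names, and the `source` field in the usage note is an assumption about what your export contains:

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline
            yield json.loads(line)


def count_by_field(lines: Iterable[str], field: str) -> Counter:
    """Count records by the (scalar) value of `field`, one line at a time."""
    counts: Counter = Counter()
    for record in iter_jsonl(lines):
        counts[record.get(field, "<missing>")] += 1
    return counts
```

For example, `count_by_field(open("traces.jsonl"), "source")` would tally traces per source while reading one line at a time.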

JSON 

Best for:

Small datasets
Pretty printing
Direct API integration

honeyhive export traces traces.json --format json

Structure (each trace's "spans" field is an array of span objects):

{
  "traces": [
    {
      "session_id": "session_123",
      "start_time": "2024-01-20T10:30:00Z",
      "spans": []
    }
  ]
}
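
If a JSON export later needs to be streamed, it can be rewritten as JSONL. A minimal sketch, assuming the top-level `traces` array shown above; `json_export_to_jsonl` is an illustrative helper, not a CLI or SDK feature:

```python
import json
from typing import IO


def json_export_to_jsonl(src: IO[str], dst: IO[str]) -> int:
    """Rewrite a {"traces": [...]} document as one JSON object per line."""
    document = json.load(src)  # loads the whole file, so best for small exports
    traces = document.get("traces", [])
    for trace in traces:
        dst.write(json.dumps(trace) + "\n")
    return len(traces)
```

The function takes open text handles, so it works equally with files opened via `open(...)` or with in-memory buffers.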

CSV 

Best for:

Excel analysis
Spreadsheet tools
Business intelligence

honeyhive export traces traces.csv --format csv

Note: Complex nested data is flattened or JSON-encoded in CSV format.
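
To make the flattening concrete, here is one way it could work: nested objects become dotted column names and lists are JSON-encoded into a single cell. This is an illustrative sketch only, not the CLI's actual algorithm; `flatten`, `to_csv`, and the dotted naming scheme are assumptions.

```python
import csv
import io
import json
from typing import Any, Dict, List


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys; JSON-encode lists."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def to_csv(records: List[Dict[str, Any]]) -> str:
    """Render records as CSV text using the union of all flattened columns."""
    rows = [flatten(r) for r in records]
    # Keep columns in first-seen order across all rows.
    columns = list(dict.fromkeys(col for row in rows for col in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Rows that lack a column get an empty cell, which is how spreadsheet tools usually expect missing values.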

Parquet 

Best for:

Data lakes
Big data processing
Columnar analytics

honeyhive export traces traces.parquet --format parquet

Advantages:

Efficient compression
Fast columnar queries
Industry standard for analytics

Advanced Export Patterns 

Filtered Export by Status 

# Export only successful traces
sessions = client.sessions.get_sessions(
    project="your-project",
    filters={"status": "success"},
    limit=1000
)

Export with Span Details 

from honeyhive import HoneyHive
import json

client = HoneyHive(api_key="your-api-key")

def export_with_events(project: str, session_id: str):
    """Export session with all events (spans)."""
    # Get session details
    session = client.sessions.get_session(session_id)

    # Get all events for this session
    events = client.events.get_events(
        project=project,
        filters={"session_id": session_id}
    )

    # Combine data
    export_data = {
        "session": session.model_dump(),
        "events": [event.model_dump() for event in events]
    }

    with open(f"session_{session_id}.json", "w") as f:
        json.dump(export_data, f, indent=2)

    return export_data

# Export specific session with all spans
export_with_events("your-project", "session_abc123")

Scheduled Exports 

Daily Export Script:

#!/usr/bin/env python3
"""Daily trace export for archival."""
from honeyhive import HoneyHive
import json
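from datetime import datetime, timedelta, timezone

def daily_export():
    client = HoneyHive(api_key="your-api-key")

    # Export yesterday's data (UTC calendar day)
    yesterday = datetime.now(timezone.utc) - timedelta(days=1)
    # Set microseconds explicitly; replace() leaves unspecified fields untouched
    start = yesterday.replace(hour=0, minute=0, second=0, microsecond=0)
    end = yesterday.replace(hour=23, minute=59, second=59, microsecond=999999)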

    sessions = client.sessions.get_sessions(
        project="production-app",
        filters={
            "start_time": {
                "gte": start.isoformat(),
                "lte": end.isoformat()
            }
        }
    )

    # Save to dated file
    filename = f"traces_{yesterday.strftime('%Y%m%d')}.jsonl"
    with open(filename, "w") as f:
        for session in sessions:
            f.write(json.dumps(session.model_dump()) + "\n")

    print(f"✅ Exported {len(sessions)} traces to {filename}")

if __name__ == "__main__":
    daily_export()

Cron Schedule:

# Run daily at 1 AM
0 1 * * * /path/to/venv/bin/python /path/to/daily_export.py

Export Performance Tips 

For Large Datasets:

Use Pagination: Process in chunks of 100-1000 traces
Use JSONL: Each trace is written and parsed on its own line, so memory use stays flat as the export grows
Filter by Time: Export specific time ranges
Use Compression: Gzip output files for storage

import gzip
import json

# Export with compression (`sessions` comes from an earlier get_sessions() call)
with gzip.open("traces.jsonl.gz", "wt") as f:
    for session in sessions:
        f.write(json.dumps(session.model_dump()) + "\n")
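
The compressed file can be read back the same way, without unpacking it to disk first. A small sketch; `read_gzipped_jsonl` is an illustrative helper name:

```python
import gzip
import json
from typing import Iterator


def read_gzipped_jsonl(path: str) -> Iterator[dict]:
    """Yield records from a gzip-compressed JSONL file, one line at a time."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():  # ignore blank lines
                yield json.loads(line)
```

Because it is a generator, `for record in read_gzipped_jsonl("traces.jsonl.gz")` decompresses and parses lazily.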

For Real-Time Export:

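import time
from datetime import datetime, timezone  # datetime was used below but never imported
from honeyhive import HoneyHive

client = HoneyHive(api_key="your-api-key")
last_export_time = datetime.now(timezone.utc)

while True:
    # Export new traces every 5 minutes
    time.sleep(300)

    now = datetime.now(timezone.utc)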
    sessions = client.sessions.get_sessions(
        project="your-project",
        filters={
            "start_time": {"gte": last_export_time.isoformat()}
        }
    )

    # Process new sessions...
    last_export_time = now
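
The loop above keeps `last_export_time` only in memory, so a restart loses its place, and the inclusive `gte` bound can return a boundary session twice. One possible refinement, sketched under assumptions: `fetch_since` is a hypothetical callable wrapping the SDK query, records are plain dicts with a `session_id` key, and the checkpoint file location is up to you.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Iterable, List, Optional, Set


def load_checkpoint(path: Path, default: datetime) -> datetime:
    """Return the saved export time, or `default` if no checkpoint exists yet."""
    if path.exists():
        saved = json.loads(path.read_text())
        return datetime.fromisoformat(saved["last_export_time"])
    return default


def save_checkpoint(path: Path, when: datetime) -> None:
    """Persist the export time so a restart resumes where it left off."""
    path.write_text(json.dumps({"last_export_time": when.isoformat()}))


def export_new(
    fetch_since: Callable[[datetime], Iterable[dict]],  # hypothetical wrapper around the SDK query
    checkpoint: Path,
    seen_ids: Set[str],
    now: Optional[datetime] = None,
) -> List[dict]:
    """Fetch sessions since the checkpoint, skip already-exported ones, advance the checkpoint."""
    now = now or datetime.now(timezone.utc)
    since = load_checkpoint(checkpoint, default=now)
    fresh = []
    for record in fetch_since(since):
        if record["session_id"] not in seen_ids:
            seen_ids.add(record["session_id"])
            fresh.append(record)
    save_checkpoint(checkpoint, now)  # only reached if the fetch succeeded
    return fresh
```

Note that `seen_ids` grows without bound in a long-running process; in practice it could be trimmed to IDs seen in the last few polling intervals.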

Troubleshooting 

Export Fails with “Too Many Results”:

Use pagination:

# Bad: Trying to get everything at once
sessions = client.sessions.get_sessions(project="your-project", limit=100000)  # ❌ Too large

# Good: Request one page of 100 at a time
for offset in range(0, 1000, 100):
    sessions = client.sessions.get_sessions(
        project="your-project", offset=offset, limit=100
    )
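
The fixed `range(0, 1000, 100)` above stops after 1,000 records whether or not more exist. A sketch of a loop that runs until the data is exhausted, assuming the API returns a full page whenever more records remain; `fetch_page` is a hypothetical callable that would wrap `client.sessions.get_sessions(...)` with `offset` and `limit`:

```python
from typing import Callable, Iterator, List


def paginate(
    fetch_page: Callable[[int, int], List[dict]],  # hypothetical: (offset, limit) -> records
    page_size: int = 100,
) -> Iterator[dict]:
    """Yield records page by page until a short or empty page signals the end."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            break  # last page reached
        offset += page_size
```

Stopping on a short page saves one request compared with waiting for an empty page, at the cost of relying on the full-page assumption stated above.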

Missing Span Data:

Ensure you’re exporting both sessions and events:

# Export sessions (traces)
sessions = client.sessions.get_sessions(project="your-project")

# Also export events (spans) for each session
for session in sessions:
    events = client.events.get_events(
        project="your-project",
        filters={"session_id": session.session_id}
    )
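
Once both sessions and events are downloaded, they can be joined locally so each exported trace carries its spans. A minimal sketch, assuming the records have already been converted to dicts (for example with `model_dump()`) and that each event has a `session_id` key; `attach_events` is an illustrative name:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def attach_events(sessions: Iterable[dict], events: Iterable[dict]) -> List[dict]:
    """Return copies of the sessions with their events nested under an "events" key."""
    by_session: Dict[str, List[dict]] = defaultdict(list)
    for event in events:
        by_session[event["session_id"]].append(event)
    # Sessions without events get an empty list; the inputs are not mutated.
    return [{**s, "events": by_session.get(s["session_id"], [])} for s in sessions]
```

Grouping events in a single pass avoids rescanning the event list for every session.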

Slow Exports:

Reduce time range
Use filters to limit results
Export during off-peak hours
Use JSONL instead of JSON
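
Reducing the time range can be automated by splitting one large window into smaller ones and exporting each in turn. A minimal sketch; `split_time_range` is an illustrative helper, not part of the HoneyHive SDK:

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def split_time_range(
    start: datetime, end: datetime, step: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (window_start, window_end) pairs covering start..end."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)  # the last window may be shorter
        yield cursor, window_end
        cursor = window_end
```

Adjacent windows share their boundary timestamp, so with inclusive `gte`/`lte` bounds a session that starts exactly on a boundary may appear in two windows; deduplicate by `session_id` if that matters.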

Next Steps 

Build Custom Tracing - Advanced tracing patterns
CLI Reference - Complete CLI reference

Key Takeaway: HoneyHive supports both ad-hoc CLI exports and automated API pipelines. Pick the format and method that match your data volume and downstream tools. ✨