How to Export Traces

Problem: You need to export trace data from HoneyHive for analysis, backup, or integration with other tools.

Solution: Use the HoneyHive CLI or API to export traces in multiple formats.

Overview

HoneyHive provides multiple ways to export trace data:

  • CLI Export: Quick command-line exports for ad-hoc analysis

  • API Export: Programmatic access for automated pipelines

  • Multiple Formats: JSON, JSONL, CSV, Parquet for different use cases

  • Flexible Filtering: Time ranges, operations, status filters

When to Export Traces

Common Use Cases:

  • Data Analysis: Export for Jupyter notebooks, pandas analysis

  • Backup & Archival: Long-term storage of trace data

  • Compliance: Audit trail requirements

  • ML Training: Export traces for model training datasets

  • Debugging: Detailed offline analysis of specific issues

  • Cost Analysis: Export for billing and usage analytics

Export Methods

Export Formats

JSON

Best for:

  • Small datasets

  • Pretty printing

  • Direct API integration

honeyhive export traces traces.json --format json

Structure (each entry in "spans" is a span object):

{
  "traces": [
    {
      "session_id": "session_123",
      "start_time": "2024-01-20T10:30:00Z",
      "spans": []
    }
  ]
}
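For larger exports, the same data is often easier to process as JSONL (one trace per line). The following is a minimal sketch, assuming the exported file follows the structure above with a top-level "traces" array. The function name json_export_to_jsonl is illustrative and not part of the HoneyHive SDK.

```python
import json
from pathlib import Path


def json_export_to_jsonl(json_path: str, jsonl_path: str) -> int:
    """Rewrite a JSON export as JSONL, one trace object per line.

    Assumes the input has a top-level "traces" array (an assumption
    based on the sample structure). Returns the number of traces written.
    """
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    traces = data.get("traces", [])
    with open(jsonl_path, "w", encoding="utf-8") as out:
        for trace in traces:
            out.write(json.dumps(trace) + "\n")
    return len(traces)
```

Calling json_export_to_jsonl("traces.json", "traces.jsonl") would leave the original file untouched and write the line-oriented copy next to it.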

CSV

Best for:

  • Excel analysis

  • Spreadsheet tools

  • Business intelligence

honeyhive export traces traces.csv --format csv

Note: Complex nested data is flattened or JSON-encoded in CSV format.
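To make the flattening concrete, here is one possible scheme sketched with the Python standard library: nested objects become dotted column names and lists are JSON-encoded into a single cell. This is an illustration only. The exact column naming used by the HoneyHive exporter is not documented here, and the helpers flatten and to_csv are hypothetical.

```python
import csv
import io
import json


def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted column names; JSON-encode lists."""
    flat = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}."))
        elif isinstance(value, list):
            flat[column] = json.dumps(value)
        else:
            flat[column] = value
    return flat


def to_csv(records: list[dict]) -> str:
    """Render records as CSV text using the union of all flattened columns."""
    rows = [flatten(r) for r in records]
    columns = sorted({name for row in rows for name in row})
    buffer = io.StringIO()
    # restval fills cells for columns a given row does not have
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

JSON-encoded cells can be decoded again with json.loads when the CSV is read back, which keeps list-valued fields such as spans recoverable.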

Parquet

Best for:

  • Data lakes

  • Big data processing

  • Columnar analytics

honeyhive export traces traces.parquet --format parquet

Advantages:

  • Efficient compression

  • Fast columnar queries

  • Industry standard for analytics

Advanced Export Patterns

Filtered Export by Status

from honeyhive import HoneyHive

client = HoneyHive(api_key="your-api-key")

# Export only successful traces
sessions = client.sessions.get_sessions(
    project="your-project",
    filters={"status": "success"},
    limit=1000
)

Export with Span Details

from honeyhive import HoneyHive
import json

client = HoneyHive(api_key="your-api-key")

def export_with_events(project: str, session_id: str):
    """Export session with all events (spans)."""
    # Get session details
    session = client.sessions.get_session(session_id)

    # Get all events for this session
    events = client.events.get_events(
        project=project,
        filters={"session_id": session_id}
    )

    # Combine data
    export_data = {
        "session": session.model_dump(),
        "events": [event.model_dump() for event in events]
    }

    with open(f"session_{session_id}.json", "w") as f:
        json.dump(export_data, f, indent=2)

    return export_data

# Export specific session with all spans
export_with_events("your-project", "session_abc123")

Scheduled Exports

Daily Export Script:

#!/usr/bin/env python3
"""Daily trace export for archival."""
from honeyhive import HoneyHive
import json
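from datetime import datetime, timedelta, timezone

def daily_export():
    client = HoneyHive(api_key="your-api-key")

    # Export yesterday's data as a full UTC day.
    # Assumes the API interprets timestamps as UTC, as in the sample data.
    yesterday = datetime.now(timezone.utc) - timedelta(days=1)
    start = yesterday.replace(hour=0, minute=0, second=0, microsecond=0)
    end = yesterday.replace(hour=23, minute=59, second=59, microsecond=999999)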

    sessions = client.sessions.get_sessions(
        project="production-app",
        filters={
            "start_time": {
                "gte": start.isoformat(),
                "lte": end.isoformat()
            }
        }
    )

    # Save to dated file
    filename = f"traces_{yesterday.strftime('%Y%m%d')}.jsonl"
    with open(filename, "w") as f:
        for session in sessions:
            f.write(json.dumps(session.model_dump()) + "\n")

    print(f"✅ Exported {len(sessions)} traces to {filename}")

if __name__ == "__main__":
    daily_export()

Cron Schedule:

# Run daily at 1 AM
0 1 * * * /path/to/venv/bin/python /path/to/daily_export.py
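Because cron may start the job a few seconds or minutes late, the export window is best derived from the calendar date rather than from "now minus 24 hours". A small sketch of that calculation follows. The helper previous_day_window is hypothetical, and it returns a half-open window (start inclusive, end exclusive). Whether the API accepts an exclusive upper bound is not confirmed by this guide, so if only "lte" is available, subtract one microsecond from the end value.

```python
from datetime import datetime, timedelta, timezone


def previous_day_window(now: datetime) -> tuple[datetime, datetime]:
    """Return (start, end) covering the full UTC day before `now`.

    `now` must be timezone-aware. The window is half-open:
    start is inclusive, end is exclusive.
    """
    today = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return today - timedelta(days=1), today
```

Computing the window from midnight boundaries means two consecutive runs never overlap and never leave a gap, regardless of when the job actually starts.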

Export Performance Tips

For Large Datasets:

  1. Use Pagination: Process in chunks of 100-1000 traces

  2. Use JSONL: Records are written and read one line at a time, so a large export never has to fit in memory as a single JSON document

  3. Filter by Time: Export specific time ranges

  4. Use Compression: Gzip output files for storage

import gzip
import json

# Export with compression (`sessions` comes from an earlier get_sessions() call)
with gzip.open("traces.jsonl.gz", "wt") as f:
    for session in sessions:
        f.write(json.dumps(session.model_dump()) + "\n")
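Compressed JSONL files can be streamed back the same way they were written. The sketch below shows a write and read round trip on plain dictionaries using only the standard library. The function names are illustrative, and records are assumed to be JSON-serializable dicts (for example, the result of model_dump()).

```python
import gzip
import json
from typing import Iterable, Iterator


def write_jsonl_gz(path: str, records: Iterable[dict]) -> int:
    """Write records as gzip-compressed JSONL; return how many were written."""
    count = 0
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
            count += 1
    return count


def read_jsonl_gz(path: str) -> Iterator[dict]:
    """Yield records from a gzip-compressed JSONL file one at a time."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                yield json.loads(line)
```

Reading the file back and comparing the record count against the number written is a cheap integrity check before the archive is moved to long-term storage.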

For Real-Time Export:

import time
from datetime import datetime
from honeyhive import HoneyHive

client = HoneyHive(api_key="your-api-key")
last_export_time = datetime.now()

while True:
    # Export new traces every 5 minutes
    time.sleep(300)

    now = datetime.now()
    sessions = client.sessions.get_sessions(
        project="your-project",
        filters={
            "start_time": {"gte": last_export_time.isoformat()}
        }
    )

    # Process new sessions...
    last_export_time = now
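A polling loop that filters with "gte" can return a session twice when its start time falls exactly on the boundary between two polls. One way to guard against that is to remember which session IDs have already been processed. The class below is a hypothetical helper, not part of the SDK, and it assumes each session has been converted to a dict with a "session_id" key.

```python
from typing import Iterable, Iterator


class SessionDeduplicator:
    """Remember exported session IDs so overlapping polls yield each session once."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def new_only(self, sessions: Iterable[dict]) -> Iterator[dict]:
        """Yield only sessions whose ID has not been seen before."""
        for session in sessions:
            session_id = session["session_id"]
            if session_id not in self._seen:
                self._seen.add(session_id)
                yield session
```

The set of seen IDs grows for as long as the process runs, so a long-lived exporter would need to prune it periodically, for example by dropping IDs older than the polling overlap.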

Troubleshooting

Export Fails with “Too Many Results”:

Use pagination:

# Bad: Trying to get everything at once
sessions = client.sessions.get_sessions(limit=100000)  # ❌ Too large

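# Good: Use pagination, keep every page, and stop at the first empty one
all_sessions = []
for offset in range(0, 1000, 100):
    page = client.sessions.get_sessions(offset=offset, limit=100)
    if not page:
        break
    all_sessions.extend(page)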
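When the total number of results is not known in advance, the pagination pattern can be wrapped in a generator that keeps requesting pages until one comes back short. This is a generic sketch. fetch_page is a hypothetical callable standing in for whatever function returns one page of results for a given offset and limit.

```python
from typing import Callable, Iterator, Sequence


def paginate(
    fetch_page: Callable[[int, int], Sequence],
    page_size: int = 100,
) -> Iterator:
    """Yield items page by page until a short or empty page is returned."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            break  # last page reached
        offset += page_size
```

With the client from this guide, fetch_page could be a lambda that forwards offset and limit to client.sessions.get_sessions, assuming that method accepts both keyword arguments as the snippet above suggests.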

Missing Span Data:

Ensure you’re exporting both sessions and events:

# Export sessions (traces)
sessions = client.sessions.get_sessions(project="your-project")

# Also export events (spans) for each session, keyed by session ID
events_by_session = {}
for session in sessions:
    events_by_session[session.session_id] = client.events.get_events(
        project="your-project",
        filters={"session_id": session.session_id}
    )
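Once sessions and events have both been fetched, they can be merged into a single structure before writing. The sketch below works on plain dicts (for example, the output of model_dump()) and assumes every event carries the "session_id" of the session it belongs to. attach_events is an illustrative name, not an SDK function.

```python
from collections import defaultdict


def attach_events(sessions: list[dict], events: list[dict]) -> list[dict]:
    """Return copies of the sessions, each with a "spans" list of its events."""
    by_session = defaultdict(list)
    for event in events:
        by_session[event["session_id"]].append(event)
    return [
        {**session, "spans": by_session.get(session["session_id"], [])}
        for session in sessions
    ]
```

Sessions without any matching events receive an empty "spans" list, which makes missing span data easy to spot in the exported file.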

Slow Exports:

  1. Reduce time range

  2. Use filters to limit results

  3. Export during off-peak hours

  4. Use JSONL instead of JSON

Next Steps

Key Takeaway: HoneyHive provides flexible export options for any use case, from ad-hoc CLI exports to automated production pipelines. Choose the format and method that fit your data volume and downstream tools. ✨