How to Export Traces
====================

**Problem:** You need to export trace data from HoneyHive for analysis, backup, or integration with other tools.

**Solution:** Use the HoneyHive CLI or API to export traces in multiple formats.

.. contents:: Table of Contents
   :local:
   :depth: 2

Overview
--------

HoneyHive provides multiple ways to export trace data:

- **CLI Export**: Quick command-line exports for ad-hoc analysis
- **API Export**: Programmatic access for automated pipelines
- **Multiple Formats**: JSON, JSONL, CSV, Parquet for different use cases
- **Flexible Filtering**: Time ranges, operations, status filters

When to Export Traces
---------------------

**Common Use Cases:**

- **Data Analysis**: Export for Jupyter notebooks, pandas analysis
- **Backup & Archival**: Long-term storage of trace data
- **Compliance**: Audit trail requirements
- **ML Training**: Export traces for model training datasets
- **Debugging**: Detailed offline analysis of specific issues
- **Cost Analysis**: Export for billing and usage analytics

Export Methods
--------------

CLI Export (Recommended for Ad-Hoc)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Basic Export:**

.. code-block:: bash

   # Export all traces from last 24 hours
   honeyhive export traces traces.jsonl

   # Export as CSV
   honeyhive export traces traces.csv --format csv

   # Export with time range
   honeyhive export traces traces.jsonl \
     --since "2024-01-20T00:00:00Z" \
     --until "2024-01-21T00:00:00Z"

**Filtered Exports:**

.. code-block:: bash

   # Export only error traces
   honeyhive trace search --query "status:error" --format json > errors.json

   # Export specific operations
   honeyhive trace search \
     --query "operation:llm_call" \
     --format jsonl > llm_calls.jsonl

   # Export with metadata
   honeyhive export traces full_traces.jsonl --include all

.. note::

   **CLI Installation Required**

   Install the HoneyHive CLI: ``pip install honeyhive[cli]``

API Export (Recommended for Automation)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Using Python SDK:**

.. code-block:: python

   from honeyhive import HoneyHive
   import json
   from datetime import datetime, timedelta

   # Initialize client
   client = HoneyHive(api_key="your-api-key")

   # Query traces from last 7 days
   end_date = datetime.now()
   start_date = end_date - timedelta(days=7)

   # Get sessions (traces) with filtering
   sessions = client.sessions.get_sessions(
       project="your-project",
       filters={
           "start_time": {
               "gte": start_date.isoformat(),
               "lte": end_date.isoformat()
           },
           "source": "production"
       },
       limit=1000  # Adjust as needed
   )

   # Export to file
   with open("traces_export.jsonl", "w") as f:
       for session in sessions:
           f.write(json.dumps(session.model_dump()) + "\n")

   print(f"✅ Exported {len(sessions)} traces")
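The **Data Analysis** use case above usually ends with the export loaded into a DataFrame. The following is a minimal sketch rather than part of the HoneyHive SDK: it assumes pandas is installed and reuses the ``traces_export.jsonl`` file written by the example above; the ``source`` column is purely illustrative and may not exist in your export.

.. code-block:: python

   # Sketch: load a JSONL trace export into pandas for ad-hoc analysis.
   # Assumes pandas is installed and traces_export.jsonl was written by
   # the SDK example above (one JSON object per line).
   import pandas as pd

   # Each line of the export becomes one row; nested fields remain
   # as dict/list values in their columns.
   df = pd.read_json("traces_export.jsonl", lines=True)

   print(df.shape)
   print(df.columns.tolist())

   # Illustrative only: "source" is an assumption about the export
   # schema, so guard for its presence before grouping on it.
   if "source" in df.columns:
       print(df["source"].value_counts())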
**Paginated Export (Large Datasets):**

.. code-block:: python

   from honeyhive import HoneyHive
   import json

   client = HoneyHive(api_key="your-api-key")

   def export_all_traces(project: str, output_file: str):
       """Export all traces with pagination."""
       page = 0
       page_size = 100
       total_exported = 0

       with open(output_file, "w") as f:
           while True:
               # Get page of sessions
               sessions = client.sessions.get_sessions(
                   project=project,
                   offset=page * page_size,
                   limit=page_size
               )

               if not sessions:
                   break  # No more data

               # Write to file
               for session in sessions:
                   f.write(json.dumps(session.model_dump()) + "\n")
                   total_exported += 1

               print(f"Exported page {page + 1} ({total_exported} traces so far)")
               page += 1

       print(f"✅ Total exported: {total_exported} traces")

   # Run export
   export_all_traces("your-project", "all_traces.jsonl")

Export Formats
--------------

JSONL (Recommended)
~~~~~~~~~~~~~~~~~~~

**Best for:**

- Large datasets
- Streaming processing
- Line-by-line parsing

.. code-block:: bash

   honeyhive export traces traces.jsonl --format jsonl

**Advantages:**

- One trace per line
- Easy to stream/process incrementally
- Standard format for data pipelines

JSON
~~~~

**Best for:**

- Small datasets
- Pretty printing
- Direct API integration

.. code-block:: bash

   honeyhive export traces traces.json --format json

**Structure:**

.. code-block:: javascript

   {
     "traces": [
       {
         "session_id": "session_123",
         "start_time": "2024-01-20T10:30:00Z",
         "spans": []  // Array of span objects
       }
     ]
   }

CSV
~~~

**Best for:**

- Excel analysis
- Spreadsheet tools
- Business intelligence

.. code-block:: bash

   honeyhive export traces traces.csv --format csv

**Note**: Complex nested data is flattened or JSON-encoded in CSV format.

Parquet
~~~~~~~

**Best for:**

- Data lakes
- Big data processing
- Columnar analytics

.. code-block:: bash

   honeyhive export traces traces.parquet --format parquet

**Advantages:**

- Efficient compression
- Fast columnar queries
- Industry standard for analytics
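If you export JSONL and later need Parquet (for example, to load a data lake yourself), the conversion can also be done client-side. This is a minimal sketch rather than a HoneyHive feature: it assumes pandas and pyarrow are installed and that a ``traces.jsonl`` export already exists, e.g. from the CLI command in the JSONL section above.

.. code-block:: python

   # Sketch: convert a JSONL trace export to Parquet for columnar analytics.
   # Assumes pandas and pyarrow are installed and traces.jsonl exists
   # (e.g. produced by `honeyhive export traces traces.jsonl`).
   import json

   import pandas as pd

   # Read the line-delimited export into a DataFrame
   df = pd.read_json("traces.jsonl", lines=True)

   # Parquet needs consistent column types; JSON-encode nested dict/list
   # values so they are stored as plain strings.
   for column in df.columns:
       df[column] = df[column].apply(
           lambda value: json.dumps(value) if isinstance(value, (dict, list)) else value
       )

   # Snappy is the pyarrow default and a reasonable general-purpose choice
   df.to_parquet("traces.parquet", compression="snappy")

   print(f"Wrote {len(df)} traces to traces.parquet")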
Advanced Export Patterns
------------------------

Filtered Export by Status
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Export only successful traces
   sessions = client.sessions.get_sessions(
       project="your-project",
       filters={"status": "success"},
       limit=1000
   )

Export with Span Details
~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from honeyhive import HoneyHive
   import json

   client = HoneyHive(api_key="your-api-key")

   def export_with_events(project: str, session_id: str):
       """Export session with all events (spans)."""
       # Get session details
       session = client.sessions.get_session(session_id)

       # Get all events for this session
       events = client.events.get_events(
           project=project,
           filters={"session_id": session_id}
       )

       # Combine data
       export_data = {
           "session": session.model_dump(),
           "events": [event.model_dump() for event in events]
       }

       with open(f"session_{session_id}.json", "w") as f:
           json.dump(export_data, f, indent=2)

       return export_data

   # Export specific session with all spans
   export_with_events("your-project", "session_abc123")

Scheduled Exports
~~~~~~~~~~~~~~~~~

**Daily Export Script:**

.. code-block:: python

   #!/usr/bin/env python3
   """Daily trace export for archival."""

   from honeyhive import HoneyHive
   import json
   from datetime import datetime, timedelta

   def daily_export():
       client = HoneyHive(api_key="your-api-key")

       # Export yesterday's data
       yesterday = datetime.now() - timedelta(days=1)
       start = yesterday.replace(hour=0, minute=0, second=0)
       end = yesterday.replace(hour=23, minute=59, second=59)

       sessions = client.sessions.get_sessions(
           project="production-app",
           filters={
               "start_time": {
                   "gte": start.isoformat(),
                   "lte": end.isoformat()
               }
           }
       )

       # Save to dated file
       filename = f"traces_{yesterday.strftime('%Y%m%d')}.jsonl"
       with open(filename, "w") as f:
           for session in sessions:
               f.write(json.dumps(session.model_dump()) + "\n")

       print(f"✅ Exported {len(sessions)} traces to {filename}")

   if __name__ == "__main__":
       daily_export()

**Cron Schedule:**

.. code-block:: bash

   # Run daily at 1 AM
   0 1 * * * /path/to/venv/bin/python /path/to/daily_export.py

Export Performance Tips
-----------------------

**For Large Datasets:**

1. **Use Pagination**: Process in chunks of 100-1000 traces
2. **Use JSONL**: Faster than JSON for large datasets
3. **Filter by Time**: Export specific time ranges
4. **Use Compression**: Gzip output files for storage

.. code-block:: python

   import gzip
   import json

   # Export with compression
   with gzip.open("traces.jsonl.gz", "wt") as f:
       for session in sessions:
           f.write(json.dumps(session.model_dump()) + "\n")

**For Real-Time Export:**

.. code-block:: python

   import time
   from datetime import datetime

   from honeyhive import HoneyHive

   client = HoneyHive(api_key="your-api-key")
   last_export_time = datetime.now()

   while True:
       # Export new traces every 5 minutes
       time.sleep(300)

       now = datetime.now()
       sessions = client.sessions.get_sessions(
           project="your-project",
           filters={
               "start_time": {"gte": last_export_time.isoformat()}
           }
       )

       # Process new sessions...
       last_export_time = now

Troubleshooting
---------------

**Export Fails with "Too Many Results":**

Use pagination:

.. code-block:: python

   # Bad: Trying to get everything at once
   sessions = client.sessions.get_sessions(limit=100000)  # ❌ Too large

   # Good: Use pagination
   for page in range(0, 1000, 100):
       sessions = client.sessions.get_sessions(offset=page, limit=100)

**Missing Span Data:**

Ensure you're exporting both sessions and events:

.. code-block:: python

   # Export sessions (traces)
   sessions = client.sessions.get_sessions(project="your-project")

   # Also export events (spans) for each session
   for session in sessions:
       events = client.events.get_events(
           project="your-project",
           filters={"session_id": session.session_id}
       )

**Slow Exports:**

1. Reduce time range
2. Use filters to limit results
3. Export during off-peak hours
4. Use JSONL instead of JSON

Next Steps
----------

- :doc:`../advanced-tracing/index` - Advanced tracing patterns
- :doc:`/reference/cli/index` - Complete CLI reference

**Key Takeaway:** HoneyHive provides flexible export options for any use case, from ad-hoc CLI exports to automated production pipelines. Choose the right format and method based on your needs. ✨