HoneyHive Python SDK Documentation

LLM Observability and Evaluation Platform

The HoneyHive Python SDK provides comprehensive observability, tracing, and evaluation capabilities for LLM applications with OpenTelemetry integration and a “Bring Your Own Instrumentor” architecture.

Note

Project Configuration: The project parameter is required when initializing the tracer. This identifies which HoneyHive project your traces belong to and must match your project name in the HoneyHive dashboard.
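For example (both values below are placeholders):

from honeyhive import HoneyHiveTracer

# "project" must match a project name in your HoneyHive dashboard
tracer = HoneyHiveTracer.init(
    api_key="your-api-key",
    project="your-project",
)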

🚀 Quick Start

New to HoneyHive? Start with the Getting Started Path section below.

📚 Documentation Structure

Documentation Sections:

📖 Tutorials

Step-by-step guides that take you through building complete examples. Perfect for learning by doing.

→ Quick Start

🛠️ How-to Guides

Practical guides for solving specific problems. Jump straight to solutions for your use case.

→ Troubleshooting

📋 Reference

Comprehensive API documentation. Look up exact parameters, return values, and technical specifications.

→ API Reference

💡 Explanation

Conceptual guides explaining why HoneyHive works the way it does. Understand the design and architecture.

→ BYOI Design

📝 Changelog

Release history, version notes, and upgrade guides. Stay up to date with the latest changes.

→ Latest Release

🔧 SDK Development

For contributors and maintainers working on the SDK itself. Testing practices and development standards.

→ SDK Testing

🔄 Key Features

Bring Your Own Instrumentor (BYOI) Architecture

Avoid dependency conflicts by choosing exactly which LLM libraries to instrument. Supports multiple instrumentor providers:

  • OpenInference

  • Traceloop

  • Build your own custom instrumentors
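For the custom route, one common pattern is to subclass OpenTelemetry's BaseInstrumentor so your instrumentor accepts the same instrument(tracer_provider=...) call used in the examples below. A minimal sketch; MyClientInstrumentor and the myclient library it patches are hypothetical:

from opentelemetry import trace as otel_trace
from opentelemetry.instrumentation.instrumentor import BaseInstrumentor

class MyClientInstrumentor(BaseInstrumentor):
    """Hypothetical instrumentor for an imaginary 'myclient' library."""

    def instrumentation_dependencies(self):
        # Declare which package versions this instrumentor supports
        return ["myclient >= 1.0"]

    def _instrument(self, **kwargs):
        # instrument(tracer_provider=...) forwards the provider here
        provider = kwargs.get("tracer_provider")
        self._tracer = otel_trace.get_tracer(__name__, tracer_provider=provider)
        # ...wrap myclient's calls so each one opens a span on self._tracer...

    def _uninstrument(self, **kwargs):
        # ...restore the original, unwrapped myclient functions...
        pass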

Multi-Instance Tracer Support

Create independent tracer instances for different environments, workflows, or services within the same application.
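For example, one process can report to separate projects (the project names below are placeholders):

from honeyhive import HoneyHiveTracer

# Each init() call returns its own independent instance
staging_tracer = HoneyHiveTracer.init(
    api_key="your-api-key",
    project="my-app-staging",
)
production_tracer = HoneyHiveTracer.init(
    api_key="your-api-key",
    project="my-app-production",
)

# Pick which instance a function reports to via @trace(tracer=...)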

Zero Code Changes for LLM Tracing

Add comprehensive observability to existing LLM provider code without modifications:

  • OpenAI

  • Anthropic

  • Google AI

Production-Ready Evaluation

Built-in and custom evaluators with threading support for high-performance LLM evaluation workflows.
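Evaluated functions compose with ordinary Python threading, so a batch of inputs can be scored concurrently. A minimal sketch using the standard library, assuming the decorated handle_customer_query function from the Evaluation Example below and that evaluators are safe to call across threads:

from concurrent.futures import ThreadPoolExecutor

queries = [
    "How do I reset my password?",
    "Where can I find my invoices?",
]

# Each call is traced and evaluated exactly as in the single-call case;
# the pool simply fans the work out across threads.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(handle_customer_query, queries))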

OpenTelemetry Native

Built on industry-standard OpenTelemetry for maximum compatibility and future-proofing.
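Because the SDK hands you a standard provider (the tracer.provider used throughout the examples below), stock OpenTelemetry components plug in directly. A sketch, assuming a tracer initialized as in the Quick Example and that tracer.provider is an OpenTelemetry SDK TracerProvider:

from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Mirror every span to stdout for local debugging, alongside
# whatever exporters HoneyHive has already configured
tracer.provider.add_span_processor(
    SimpleSpanProcessor(ConsoleSpanExporter())
)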

📖 Getting Started Path

👋 New to HoneyHive?

  1. Set Up Your First Tracer - Initialize and verify a tracer in minutes

  2. Add LLM Tracing in 5 Minutes - Instrument an existing app's LLM calls

  3. Enable Span Enrichment - Enrich traces with custom metadata

  4. Configure Multi-Instance Tracers - Run independent tracers side by side

🔧 Solving Specific Problems?

→ Troubleshooting

📚 Need Technical Details?

→ API Reference

🤔 Want to Understand the Design?

→ BYOI Design

🔗 Main Documentation Sections

📦 Installation

# Core SDK only (minimal dependencies)
pip install honeyhive

# With LLM provider support (recommended)
pip install honeyhive[openinference-openai]      # OpenAI via OpenInference
pip install honeyhive[openinference-anthropic]   # Anthropic via OpenInference
pip install honeyhive[all-openinference]         # All OpenInference integrations
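
Heads up: some shells (zsh in particular) expand square brackets, so quote the extras there:

pip install 'honeyhive[openinference-openai]'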

🔧 Quick Example

from honeyhive import HoneyHiveTracer, trace
from openinference.instrumentation.openai import OpenAIInstrumentor
import openai

# Initialize with BYOI architecture
tracer = HoneyHiveTracer.init(
    api_key="your-api-key",
    project="your-project"
)

# Initialize instrumentor separately (correct pattern)
instrumentor = OpenAIInstrumentor()
instrumentor.instrument(tracer_provider=tracer.provider)

# Use @trace for custom functions
@trace(tracer=tracer)
def analyze_sentiment(text: str) -> str:
    # OpenAI calls automatically traced via instrumentor
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Analyze sentiment: {text}"}]
    )
    return response.choices[0].message.content

# Both the function and the OpenAI call are traced!
result = analyze_sentiment("I love this new feature!")

📊 Evaluation Example

from honeyhive import HoneyHiveTracer, trace, evaluate
from honeyhive.models import EventType
from honeyhive.evaluation import QualityScoreEvaluator
from openinference.instrumentation.openai import OpenAIInstrumentor
import openai

tracer = HoneyHiveTracer.init(
    api_key="your-api-key",
    project="your-project"
)

# Initialize instrumentor separately (correct pattern)
instrumentor = OpenAIInstrumentor()
instrumentor.instrument(tracer_provider=tracer.provider)

# Add automatic evaluation
quality_evaluator = QualityScoreEvaluator(criteria=["relevance", "clarity"])

@trace(tracer=tracer, event_type=EventType.model)
@evaluate(evaluator=quality_evaluator)
def handle_customer_query(query: str) -> str:
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful customer service agent."},
            {"role": "user", "content": query}
        ]
    )
    return response.choices[0].message.content

# Automatically traced AND evaluated for quality
result = handle_customer_query("How do I reset my password?")

🔀 Multi-Provider Example

from honeyhive import HoneyHiveTracer, trace
from honeyhive.models import EventType
from openinference.instrumentation.openai import OpenAIInstrumentor
from openinference.instrumentation.anthropic import AnthropicInstrumentor
import openai
import anthropic

# Multi-provider setup with BYOI
tracer = HoneyHiveTracer.init(
    api_key="your-api-key",
    project="your-project"
)

# Initialize instrumentors separately (correct pattern)
openai_instrumentor = OpenAIInstrumentor()
anthropic_instrumentor = AnthropicInstrumentor()

openai_instrumentor.instrument(tracer_provider=tracer.provider)
anthropic_instrumentor.instrument(tracer_provider=tracer.provider)

@trace(tracer=tracer, event_type=EventType.chain)
def compare_responses(prompt: str) -> dict:
    # Both calls automatically traced with provider context
    openai_client = openai.OpenAI()
    anthropic_client = anthropic.Anthropic()

    openai_response = openai_client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}]
    )

    anthropic_response = anthropic_client.messages.create(
        model="claude-3-sonnet-20240229", max_tokens=100,
        messages=[{"role": "user", "content": prompt}]
    )

    return {
        "openai": openai_response.choices[0].message.content,
        "anthropic": anthropic_response.content[0].text
    }

result = compare_responses("Explain quantum computing simply")

🆘 Need Help?

→ Troubleshooting

📈 What’s New in This Version

  • 🔄 Major Architectural Refactor: Multi-instance tracer support

  • 📦 BYOI Architecture: Bring Your Own Instrumentor for dependency freedom

  • ⚡ Enhanced Performance: Optimized for production workloads

  • 🔧 Improved Developer Experience: Simplified APIs with powerful capabilities

  • 📊 Advanced Evaluation: Threading support for high-performance evaluation

📝 Release History: See Changelog for complete version history and upgrade notes

🔗 External Links
