Testing Applications with HoneyHive
===================================

**Problem:** You need to test your LLM application with HoneyHive tracing enabled, write unit tests for traced functions, and verify that traces are captured correctly without relying on mocks.

**Solution:** Use pytest with real HoneyHive tracers in test mode, validate trace outputs programmatically, and follow testing best practices for LLM applications.

.. contents:: Quick Navigation
   :local:
   :depth: 2

Testing Philosophy
------------------

**Key Principles:**

1. **Test with Real Tracers**: Don't mock HoneyHive - test with actual tracing
2. **Validate Trace Structure**: Ensure spans contain expected attributes
3. **Separate Test Projects**: Use dedicated test projects in HoneyHive
4. **Fixture-Based Setup**: Reusable tracer fixtures for consistency

**Why Test with Real Tracing?**

- ✅ Catches integration issues early
- ✅ Validates span enrichment logic
- ✅ Ensures production-like behavior
- ❌ Mocking hides real-world failures

Setup for Testing
-----------------

Test Environment Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   # .env.test file
   HH_API_KEY=hh_test_your_test_api_key
   HH_PROJECT=test-project
   HH_SOURCE=pytest

   # Use a separate API key and project for testing
   # DO NOT use production credentials in tests

Pytest Configuration
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # conftest.py - Shared test fixtures
   import os

   import pytest
   from dotenv import load_dotenv
   from honeyhive import HoneyHiveTracer

   # Load test environment
   load_dotenv('.env.test')


   @pytest.fixture(scope="session")
   def test_tracer():
       """Provide a HoneyHive tracer for the whole test session."""
       tracer = HoneyHiveTracer.init(
           api_key=os.getenv("HH_API_KEY"),
           project=os.getenv("HH_PROJECT", "test-project"),
           source="pytest"
       )
       yield tracer
       # Cleanup after all tests:
       # HoneyHive automatically flushes on process exit


   @pytest.fixture
   def clean_tracer(request):
       """Provide a fresh tracer for each test."""
       # request.node.name is the name of the currently running test
       tracer = HoneyHiveTracer.init(
           api_key=os.getenv("HH_API_KEY"),
           project=f"test-{request.node.name}",
           source="pytest"
       )
       yield tracer
       # Test-specific cleanup if needed
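The markers applied later in this guide (``@pytest.mark.unit``, ``@pytest.mark.integration``, ``@pytest.mark.slow``) are custom marks, and pytest emits ``PytestUnknownMarkWarning`` for marks it does not know about. A minimal sketch is to register them in the same ``conftest.py``; a ``pytest.ini`` or ``pyproject.toml`` entry works just as well, and if you already define ``pytest_configure`` for environment checks (see Best Practices below), add these lines there:

.. code-block:: python

   # conftest.py - register the custom markers used in this guide
   def pytest_configure(config):
       config.addinivalue_line("markers", "unit: fast, isolated unit tests")
       config.addinivalue_line("markers", "integration: tests that exercise full traced workflows")
       config.addinivalue_line("markers", "slow: long-running tests that may be skipped")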
Unit Testing Traced Functions
-----------------------------

Basic Function Testing
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # test_traced_functions.py
   import pytest
   from honeyhive import trace, enrich_span
   from honeyhive.models import EventType


   # Function under test
   @trace(event_type=EventType.tool)
   def process_data(data: dict) -> dict:
       """Process data with tracing."""
       enrich_span({
           "input.size": len(data),
           "process.type": "transformation"
       })

       result = {"processed": True, **data}

       enrich_span({"output.size": len(result)})
       return result


   # Test the function
   def test_process_data(test_tracer):
       """Test data processing with real tracing."""
       # Arrange
       input_data = {"key": "value", "count": 10}

       # Act
       result = process_data(input_data)

       # Assert
       assert result["processed"] is True
       assert result["key"] == "value"
       assert result["count"] == 10
       # Trace is captured automatically in test project

Testing with Span Validation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   import pytest
   from opentelemetry.sdk.trace.export import SimpleSpanProcessor
   from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter


   @pytest.fixture
   def span_capture(test_tracer):
       """Capture spans for validation in tests."""
       exporter = InMemorySpanExporter()
       processor = SimpleSpanProcessor(exporter)
       test_tracer.provider.add_span_processor(processor)

       yield exporter

       exporter.clear()


   def test_span_enrichment(test_tracer, span_capture):
       """Validate that span enrichment works correctly."""
       # Act
       result = process_data({"key": "value"})

       # Assert
       spans = span_capture.get_finished_spans()
       assert len(spans) > 0

       span = spans[0]
       attributes = dict(span.attributes)

       # Validate expected attributes
       assert attributes.get("input.size") == 1
       assert attributes.get("process.type") == "transformation"
       assert attributes.get("output.size") == 2

Testing Error Handling
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   @trace(event_type=EventType.tool)
   def risky_operation(value: int) -> int:
       """Operation that may fail."""
       enrich_span({"input.value": value})

       if value < 0:
           enrich_span({"error.type": "ValueError"})
           raise ValueError("Value must be non-negative")

       result = value * 2
       enrich_span({"output.value": result})
       return result


   def test_risky_operation_success(test_tracer):
       """Test successful execution."""
       result = risky_operation(5)
       assert result == 10


   def test_risky_operation_failure(test_tracer, span_capture):
       """Test error handling with trace validation."""
       with pytest.raises(ValueError, match="Value must be non-negative"):
           risky_operation(-1)

       # Validate error was captured in span
       spans = span_capture.get_finished_spans()
       assert len(spans) > 0

       span = spans[0]
       attributes = dict(span.attributes)
       assert attributes.get("error.type") == "ValueError"

Integration Testing
-------------------

Testing LLM Workflows
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # test_llm_workflow.py
   from honeyhive import enrich_span, trace
   from honeyhive.models import EventType


   @trace(event_type=EventType.chain)
   def llm_workflow(query: str) -> str:
       """Complete LLM workflow."""
       enrich_span({"workflow.query": query, "workflow.type": "rag"})

       # Step 1: Retrieve context
       context = retrieve_context(query)

       # Step 2: Generate response
       response = generate_response(query, context)

       enrich_span({"workflow.success": True})
       return response


   @trace(event_type=EventType.tool)
   def retrieve_context(query: str) -> list:
       """Retrieve relevant context."""
       enrich_span({"retrieval.query": query})

       # Mock retrieval for testing
       context = ["doc1", "doc2"]

       enrich_span({"retrieval.found": len(context)})
       return context


   @trace(event_type=EventType.model)
   def generate_response(query: str, context: list) -> str:
       """Generate LLM response."""
       enrich_span({
           "llm.provider": "openai",
           "llm.model": "gpt-4",
           "llm.context_size": len(context)
       })

       # For testing, use a mock or test-safe LLM call
       response = f"Response to: {query} (with {len(context)} docs)"

       enrich_span({"llm.response_length": len(response)})
       return response


   def test_llm_workflow_integration(test_tracer):
       """Test complete LLM workflow with tracing."""
       query = "What is machine learning?"
       result = llm_workflow(query)

       assert "Response to:" in result
       assert "machine learning" in result
       # Trace automatically captured with 3 spans (chain + tool + model)
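The workflow above emits several spans per call, so indexing ``spans[0]`` from the ``span_capture`` fixture becomes fragile. A small, hypothetical helper that selects a span by name keeps the assertions targeted; this sketch assumes spans are named after the decorated functions, so verify the name fragments against your own captured output:

.. code-block:: python

   def find_span(spans, name_fragment: str):
       """Return the first finished span whose name contains name_fragment."""
       for span in spans:
           if name_fragment in span.name:
               return span
       raise AssertionError(
           f"No span matching '{name_fragment}' in {[s.name for s in spans]}"
       )


   def test_workflow_span_names(test_tracer, span_capture):
       """Pick out the model span from the captured workflow trace."""
       llm_workflow("What is machine learning?")

       spans = span_capture.get_finished_spans()
       # Hypothetical name fragment -- confirm how your tracer names spans
       model_span = find_span(spans, "generate_response")
       assert dict(model_span.attributes).get("llm.provider") == "openai"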
Testing Multi-Provider Scenarios
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   @trace(event_type=EventType.chain)
   def multi_provider_call(prompt: str) -> str:
       """Try multiple LLM providers with fallback."""
       providers = ["openai", "anthropic"]
       enrich_span({"providers.available": len(providers)})

       for i, provider in enumerate(providers):
           try:
               result = call_provider(provider, prompt)
               enrich_span({
                   "providers.used": provider,
                   "providers.attempts": i + 1
               })
               return result
           except Exception as e:
               enrich_span({f"providers.{provider}_failed": str(e)})
               if i == len(providers) - 1:
                   raise

       return ""


   @trace(event_type=EventType.model)
   def call_provider(provider: str, prompt: str) -> str:
       """Call a specific LLM provider."""
       enrich_span({"provider.name": provider, "provider.prompt_length": len(prompt)})

       # Mock for testing
       if provider == "openai":
           return "OpenAI response"
       elif provider == "anthropic":
           return "Anthropic response"
       else:
           raise ValueError(f"Unknown provider: {provider}")


   def test_multi_provider_fallback(test_tracer):
       """Test provider fallback logic."""
       result = multi_provider_call("Test prompt")
       # The first provider succeeds, so no fallback is needed here
       assert result == "OpenAI response"
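The test above only exercises the happy path, where the first provider succeeds. To drive the fallback branch, you can force the first provider to fail with pytest's built-in ``monkeypatch`` fixture. The module name in this sketch is an assumption, so adjust the import to wherever ``multi_provider_call`` and ``call_provider`` actually live:

.. code-block:: python

   def test_multi_provider_uses_fallback(test_tracer, monkeypatch):
       """Simulate an OpenAI outage so the Anthropic fallback path runs."""
       import test_llm_workflow as workflows  # assumed module for the functions above

       original_call = workflows.call_provider

       def flaky_call_provider(provider: str, prompt: str) -> str:
           # Fail the first provider, delegate the rest to the real (traced) function
           if provider == "openai":
               raise RuntimeError("simulated provider outage")
           return original_call(provider, prompt)

       monkeypatch.setattr(workflows, "call_provider", flaky_call_provider)

       result = workflows.multi_provider_call("Test prompt")
       assert result == "Anthropic response"

Because only the provider call is patched, the chain span and the surviving model span are still exported, which keeps the test consistent with the no-mocking-of-HoneyHive principle.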
Evaluation Testing
------------------

Testing with Evaluation Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # test_evaluation.py
   import time

   # Functions under test, defined in test_llm_workflow.py above
   from test_llm_workflow import generate_response, llm_workflow


   def test_llm_output_quality(test_tracer):
       """Test that LLM output meets quality thresholds."""
       query = "Explain Python decorators"
       response = generate_response(query, [])

       # Quality checks
       assert len(response) > 50, "Response too short"
       assert "decorator" in response.lower(), "Key term missing"
       assert not any(word in response.lower() for word in ["sorry", "cannot", "unable"]), \
           "Negative response detected"

       # Trace captured automatically for review in HoneyHive dashboard


   def test_latency_requirements(test_tracer):
       """Test that operations meet latency requirements."""
       start = time.time()
       result = llm_workflow("Simple query")
       duration = time.time() - start

       assert duration < 5.0, f"Operation took {duration:.2f}s, expected < 5s"
       assert result is not None

For comprehensive evaluation testing, see :doc:`evaluation/index`.

Best Practices
--------------

**1. Use Separate Test Projects**

.. code-block:: python

   # ✅ Good: Dedicated test project
   @pytest.fixture
   def test_tracer():
       return HoneyHiveTracer.init(
           api_key=os.getenv("HH_TEST_API_KEY"),
           project="test-project",  # Separate from production
           source="pytest"
       )

   # ❌ Bad: Using the production project
   # project="production-app"  # DON'T do this

**2. Clean Fixture Management**

.. code-block:: python

   # conftest.py
   @pytest.fixture(scope="session")
   def session_tracer():
       """One tracer for the entire test session."""
       tracer = HoneyHiveTracer.init(
           api_key=os.getenv("HH_TEST_API_KEY"),
           project="test-project",
           source="pytest-session"
       )
       yield tracer


   @pytest.fixture
   def function_tracer(request):
       """Fresh tracer for each test function."""
       # request.node.name is the name of the currently running test
       tracer = HoneyHiveTracer.init(
           api_key=os.getenv("HH_TEST_API_KEY"),
           project=f"test-{request.node.name}",
           source="pytest-function"
       )
       yield tracer

**3. Environment-Based Configuration**

.. code-block:: python

   # tests/conftest.py
   import os

   import pytest
   from dotenv import load_dotenv


   def pytest_configure(config):
       """Load the test environment before tests run."""
       load_dotenv('.env.test')

       # Verify test configuration
       if not os.getenv("HH_API_KEY"):
           pytest.exit("HH_API_KEY not set in test environment")

       if os.getenv("HH_PROJECT") == "production":
           pytest.exit("Cannot use production project in tests")

**4. Parametrized Testing**

.. code-block:: python

   @pytest.mark.parametrize("input_value,expected_output", [
       (5, 10),
       (0, 0),
       (100, 200),
   ])
   def test_risky_operation_parametrized(test_tracer, input_value, expected_output):
       """Test multiple scenarios with tracing."""
       result = risky_operation(input_value)
       assert result == expected_output

Common Testing Patterns
-----------------------

Pattern 1: Test Helper with Tracing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # test_helpers.py
   import time
   from contextlib import contextmanager

   from honeyhive import enrich_span


   @contextmanager
   def assert_trace_timing(max_duration_ms: float):
       """Context manager to validate operation timing."""
       start = time.time()
       yield
       duration_ms = (time.time() - start) * 1000

       enrich_span({"test.duration_ms": duration_ms})
       assert duration_ms < max_duration_ms, \
           f"Operation took {duration_ms:.2f}ms, expected < {max_duration_ms}ms"


   # Usage
   def test_with_timing(test_tracer):
       with assert_trace_timing(max_duration_ms=500):
           result = process_data({"key": "value"})

Pattern 2: Trace Assertion Helper
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   def assert_span_has_attributes(span, expected_attrs: dict):
       """Assert that a span contains the expected attributes."""
       actual_attrs = dict(span.attributes)

       for key, expected_value in expected_attrs.items():
           actual_value = actual_attrs.get(key)
           assert actual_value == expected_value, \
               f"Attribute {key}: expected {expected_value}, got {actual_value}"


   # Usage
   def test_span_attributes(test_tracer, span_capture):
       process_data({"key": "value"})

       spans = span_capture.get_finished_spans()
       assert_span_has_attributes(spans[0], {
           "input.size": 1,
           "process.type": "transformation"
       })

Running Tests
-------------

**Basic Test Execution:**

.. code-block:: bash

   # Run all tests (conftest.py loads .env.test automatically)
   pytest tests/

   # Run a specific test file
   pytest tests/test_traced_functions.py -v

   # Run with coverage
   pytest tests/ --cov=src --cov-report=html

**Test Selection:**

.. code-block:: bash

   # Run only integration tests
   pytest tests/ -m integration

   # Run only unit tests
   pytest tests/ -m unit

   # Skip slow tests
   pytest tests/ -m "not slow"

**Pytest Markers:**

.. code-block:: python

   import pytest


   @pytest.mark.unit
   def test_unit_function(test_tracer):
       """Unit test with tracing."""
       pass


   @pytest.mark.integration
   def test_integration_workflow(test_tracer):
       """Integration test with tracing."""
       pass


   @pytest.mark.slow
   def test_heavy_processing(test_tracer):
       """Slow test that may be skipped."""
       pass

Next Steps
----------

- :doc:`evaluation/index` - Comprehensive evaluation testing strategies
- :doc:`deployment/production` - Production testing and monitoring
- :doc:`../development/index` - SDK development testing (for contributors)

**Key Takeaway:** Test with real HoneyHive tracing enabled to catch integration issues early. Use pytest fixtures for consistent tracer setup, validate trace attributes programmatically, and maintain separate test projects to avoid polluting production data. ✨