Integration Testing Strategy for HoneyHive SDK

This document outlines our integration testing strategy, with particular focus on preventing bugs like the ProxyTracerProvider issue that slipped through our initial testing.

Overview

Our testing strategy uses a multi-layered approach:

  1. Unit Tests - Fast, isolated, heavily mocked

  2. Integration Tests - Real components, real scenarios

  3. End-to-End Tests - Full user workflows

  4. Real Environment Tests - Subprocess-based testing

The ProxyTracerProvider Bug: Lessons Learned

What Happened

A critical bug existed where HoneyHive failed to handle OpenTelemetry’s default ProxyTracerProvider, causing instrumentor integration to fail silently.
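
The precondition is easy to see in isolation: in a fresh Python process, before any SDK sets a provider, the OpenTelemetry API returns a ProxyTracerProvider placeholder. A minimal demonstration:

from opentelemetry import trace

# In a fresh process no TracerProvider has been set yet, so the API
# hands back a proxy placeholder rather than a real SDK provider
provider = trace.get_tracer_provider()
print(type(provider).__name__)  # ProxyTracerProvider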

Why It Wasn’t Caught

  1. Over-Mocking: Our test suite completely mocked OpenTelemetry components (see the illustrative snippet after this list)

  2. Missing Real Scenarios: No tests covered “fresh Python environment + instrumentor” scenarios

  3. Documentation Gap: Examples didn’t follow documented best practices

  4. Integration Test Gaps: Tests didn’t validate real TracerProvider behavior
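
For contrast, the over-mocked style that hid the bug looked roughly like this (an illustrative reconstruction, not the actual test):

from unittest.mock import MagicMock, patch

# Illustrative reconstruction: patching the provider lookup means the
# code under test never sees OpenTelemetry's real ProxyTracerProvider
with patch("opentelemetry.trace.get_tracer_provider") as get_provider:
    get_provider.return_value = MagicMock(name="TracerProvider")
    # ...exercise HoneyHive initialization against a friendly mock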

The Fix

# Fixed: Properly detect and handle ProxyTracerProvider
is_noop_provider = (
    existing_provider is None
    or str(type(existing_provider).__name__) == "NoOpTracerProvider"
    or str(type(existing_provider).__name__) == "ProxyTracerProvider"  # ← Added this
    or "NoOp" in str(type(existing_provider).__name__)
    or "Proxy" in str(type(existing_provider).__name__)  # ← Added this
)
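
Because the substring checks subsume the exact-name comparisons, the detection also tolerates wrapped or renamed no-op provider classes without further changes.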

Testing Strategy Updates

Real Environment Testing

We now use subprocess-based tests to validate real-world scenarios:

def test_fresh_environment_proxy_tracer_provider_bug(self):
    """Test ProxyTracerProvider handling in a fresh environment."""
    # Requires module-level imports: subprocess, sys, tempfile, textwrap
    test_script = textwrap.dedent('''
        from opentelemetry import trace
        from honeyhive.tracer.otel_tracer import HoneyHiveTracer

        # Verify we start with ProxyTracerProvider
        initial_provider = trace.get_tracer_provider()
        assert "Proxy" in type(initial_provider).__name__

        # Initialize HoneyHive - should handle ProxyTracerProvider
        tracer = HoneyHiveTracer(api_key="test", project="test")

        # Should now have a real TracerProvider
        final_provider = trace.get_tracer_provider()
        assert "Proxy" not in type(final_provider).__name__
    ''')

    # Write the script to a temporary file and run it in a subprocess,
    # so it executes in a fresh environment with no provider pre-set
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(test_script)
        script_path = f.name

    result = subprocess.run(
        [sys.executable, script_path], capture_output=True, text=True
    )
    assert result.returncode == 0, result.stderr

Benefits:

  • Tests real OpenTelemetry behavior

  • Catches environment-specific bugs

  • Validates actual user experience

  • No mocking interference

Instrumentor Integration Testing

New tests specifically validate instrumentor integration patterns:

@pytest.mark.real_instrumentor
def test_real_openai_instrumentor_integration(self):
    """Test with actual OpenInference instrumentor."""
    # Test both initialization patterns:
    # 1. HoneyHive first, then instrumentor (recommended)
    # 2. Instrumentor passed to HoneyHive.init() (legacy)
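
A minimal sketch of the recommended pattern (HoneyHive first, then instrumentor), assuming the OpenInference OpenAI instrumentor package is installed and reusing the HoneyHiveTracer call from the earlier example:

from opentelemetry import trace
from openinference.instrumentation.openai import OpenAIInstrumentor
from honeyhive.tracer.otel_tracer import HoneyHiveTracer

# Initialize HoneyHive first so a real TracerProvider is installed...
tracer = HoneyHiveTracer(api_key="test", project="test")

# ...then attach the instrumentor, which picks up that provider
OpenAIInstrumentor().instrument()

assert "Proxy" not in type(trace.get_tracer_provider()).__name__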

Coverage Areas:

  • Fresh environment scenarios

  • Multiple TracerProvider types

  • Real instrumentor libraries

  • Initialization order variations

  • Span processor integration

Test Categories and When to Use Them

Unit Tests (Fast, Isolated)

Use for:

  • Individual function logic

  • Error handling paths

  • Configuration validation

  • Mock-friendly scenarios

Characteristics:

  • Heavy mocking

  • Fast execution (< 1s each)

  • No external dependencies

  • Isolated components
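
As an example of this style, provider-name detection can be unit-tested with a stand-in class, with no real OpenTelemetry objects involved (a sketch; the test name is illustrative):

def test_proxy_provider_detected_by_name():
    # Stand-in type whose name matches OpenTelemetry's proxy class;
    # the unit level only exercises the name-based detection logic
    FakeProxy = type("ProxyTracerProvider", (), {})
    provider = FakeProxy()

    name = type(provider).__name__
    assert "Proxy" in name or "NoOp" in name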

Integration Tests (Real Components)

Use for:

  • Component interaction

  • Real API integration

  • TracerProvider scenarios

  • Multi-instance behavior

Characteristics:

  • Minimal mocking

  • Real OpenTelemetry components

  • Moderate execution time

  • External service integration
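
A sketch of this layer using real OpenTelemetry SDK components and an in-memory exporter, so span flow is validated without any network calls:

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import (
    InMemorySpanExporter,
)

def test_spans_reach_the_processor():
    # Real SDK provider and processor; only the export target is local
    exporter = InMemorySpanExporter()
    provider = TracerProvider()
    provider.add_span_processor(SimpleSpanProcessor(exporter))

    tracer = provider.get_tracer("integration-test")
    with tracer.start_as_current_span("real-span"):
        pass

    assert len(exporter.get_finished_spans()) == 1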

Real Environment Tests (Subprocess)

Use for:

  • Fresh environment scenarios

  • Instrumentor integration

  • Environment-specific bugs

  • User experience validation

Characteristics:

  • No mocking

  • Subprocess execution

  • Real library behavior

  • Slower but comprehensive

Test Execution Strategy

Local Development

# Fast feedback loop
tox -e unit                    # Unit tests only

# Before committing
tox -e integration            # Integration tests

# Full validation
tox -e unit,integration       # Complete test suite

CI/CD Pipeline

# GitHub Actions workflow
- name: Unit Tests
  run: tox -e unit

- name: Integration Tests
  run: tox -e integration

- name: Real Environment Tests
  run: tox -e real_env
  if: github.event_name == 'pull_request'

Test Execution Order:

  1. Unit tests (fast feedback)

  2. Integration tests (component validation)

  3. Real environment tests (comprehensive validation)

  4. End-to-end tests (user workflows)

Preventing Future Bugs

Mandatory Test Coverage

New Features Must Include:

  1. Unit Tests - Core logic validation

  2. Integration Tests - Component interaction

  3. Real Environment Tests - User scenario validation

  4. Documentation Examples - Working code samples

Quality Gates:

  • All tests must pass

  • Coverage >= 80% for new code

  • Real environment tests for instrumentor features

  • Documentation examples must be tested

Test Review Checklist

For New Tests:

  • [ ] Tests real user scenarios?

  • [ ] Covers error conditions?

  • [ ] Validates integration points?

  • [ ] Uses appropriate test category?

  • [ ] Includes cleanup/teardown? (see the fixture sketch below)
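
One way to satisfy the cleanup item is to give each test its own provider rather than mutating the global one; teardown then reduces to shutting that provider down (a sketch using real SDK calls; the fixture name is illustrative):

import pytest
from opentelemetry.sdk.trace import TracerProvider

@pytest.fixture
def isolated_provider():
    # A provider local to the test avoids mutating global state,
    # so teardown is just a flush-and-shutdown of this instance
    provider = TracerProvider()
    yield provider
    provider.shutdown()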

For Bug Fixes:

  • [ ] Reproduces the original bug?

  • [ ] Tests the fix in isolation?

  • [ ] Validates fix in real environment?

  • [ ] Prevents regression?

Monitoring and Metrics

Test Health Metrics

Track:

  • Test execution time trends

  • Flaky test identification

  • Coverage percentage changes

  • Real environment test success rates

Alerts:

  • Integration test failures

  • Coverage drops below threshold

  • Real environment test timeouts

  • Instrumentor compatibility issues

Review Schedule:

  • Weekly: Test health review

  • Monthly: Strategy effectiveness assessment

  • Quarterly: Coverage and quality analysis

Tools and Infrastructure

Testing Tools

Core Testing:

  • pytest (test framework)

  • tox (environment management)

  • coverage.py (coverage tracking)

Integration Testing:

  • Real OpenTelemetry components

  • Subprocess execution

  • Temporary file management

CI/CD Integration:

  • GitHub Actions workflows

  • Automated test execution

  • Coverage reporting

Environment Management

Test Environments:

  • Unit: Heavily mocked, fast

  • Integration: Real components, moderate

  • Real Environment: Subprocess, comprehensive

  • Staging: Full user workflows

Dependency Management:

  • Isolated test dependencies

  • Version compatibility testing

  • Optional dependency handling

Conclusion

The ProxyTracerProvider bug taught us that comprehensive testing requires:

  1. Multiple Test Layers - Unit, integration, and real environment

  2. Real Scenario Coverage - Test actual user workflows

  3. Minimal Mocking - Use real components when possible

  4. Subprocess Testing - Validate fresh environment behavior

This strategy ensures we catch integration bugs early while maintaining fast feedback loops for development.

Key Takeaway: Test the user experience, not just the code.