Integration Testing Strategy for HoneyHive SDK

This document outlines our integration testing strategy, with particular focus on preventing bugs like the ProxyTracerProvider issue that slipped through our initial testing.

Overview

Our testing strategy uses a multi-layered approach:

  1. Unit Tests - Fast, isolated, heavily mocked

  2. Integration Tests - Real components, real scenarios

  3. End-to-End Tests - Full user workflows

  4. Real Environment Tests - Subprocess-based testing

The ProxyTracerProvider Bug: Lessons Learned

What Happened

A critical bug existed where HoneyHive failed to handle OpenTelemetry’s default ProxyTracerProvider, causing instrumentor integration to fail silently.
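
The precondition is easy to see in isolation: in a fresh Python process, before any SDK sets a provider, the OpenTelemetry API returns a ProxyTracerProvider placeholder. A minimal demonstration:

from opentelemetry import trace

# In a fresh process no TracerProvider has been set yet, so the API
# hands back a proxy placeholder rather than a real SDK provider
provider = trace.get_tracer_provider()
print(type(provider).__name__)  # ProxyTracerProvider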

Why It Wasn’t Caught

  1. Over-Mocking: Our test suite completely mocked OpenTelemetry components (see the illustrative snippet after this list)

  2. Missing Real Scenarios: No tests covered “fresh Python environment + instrumentor” scenarios

  3. Documentation Gap: Examples didn’t follow documented best practices

  4. Integration Test Gaps: Tests didn’t validate real TracerProvider behavior
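
For contrast, the over-mocked style that hid the bug looked roughly like this (an illustrative reconstruction, not the actual test):

from unittest.mock import MagicMock, patch

# Illustrative reconstruction: patching the provider lookup means the
# code under test never sees OpenTelemetry's real ProxyTracerProvider
with patch("opentelemetry.trace.get_tracer_provider") as get_provider:
    get_provider.return_value = MagicMock(name="TracerProvider")
    # ...exercise HoneyHive initialization against a friendly mock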

The Fix

# Fixed: Properly detect and handle ProxyTracerProvider
is_noop_provider = (
    existing_provider is None
    or str(type(existing_provider).__name__) == "NoOpTracerProvider"
    or str(type(existing_provider).__name__) == "ProxyTracerProvider"  # ← Added this
    or "NoOp" in str(type(existing_provider).__name__)
    or "Proxy" in str(type(existing_provider).__name__)  # ← Added this
)
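
Because the substring checks subsume the exact-name comparisons, the detection also tolerates wrapped or renamed no-op provider classes without further changes.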

Testing Strategy Updates

Real Environment Testing

We now use subprocess-based tests to validate real-world scenarios:

def test_fresh_environment_proxy_tracer_provider_bug(self):
    """Test ProxyTracerProvider handling in a fresh environment."""
    # Requires module-level imports: subprocess, sys, tempfile, textwrap
    test_script = textwrap.dedent('''
        from opentelemetry import trace
        from honeyhive.tracer.otel_tracer import HoneyHiveTracer

        # Verify we start with ProxyTracerProvider
        initial_provider = trace.get_tracer_provider()
        assert "Proxy" in type(initial_provider).__name__

        # Initialize HoneyHive - should handle ProxyTracerProvider
        tracer = HoneyHiveTracer(api_key="test", project="test")

        # Should now have a real TracerProvider
        final_provider = trace.get_tracer_provider()
        assert "Proxy" not in type(final_provider).__name__
    ''')

    # Write the script to a temporary file and run it in a subprocess,
    # so it executes in a fresh environment with no provider pre-set
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(test_script)
        script_path = f.name

    result = subprocess.run(
        [sys.executable, script_path], capture_output=True, text=True
    )
    assert result.returncode == 0, result.stderr

Benefits:

  • Tests real OpenTelemetry behavior

  • Catches environment-specific bugs

  • Validates actual user experience

  • No mocking interference

Instrumentor Integration Testing

New tests specifically validate instrumentor integration patterns:

@pytest.mark.real_instrumentor
def test_real_openai_instrumentor_integration(self):
    """Test with actual OpenInference instrumentor."""
    # Test both initialization patterns:
    # 1. HoneyHive first, then instrumentor (recommended)
    # 2. Instrumentor passed to HoneyHive.init() (legacy)
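
A minimal sketch of the recommended pattern (HoneyHive first, then instrumentor), assuming the OpenInference OpenAI instrumentor package is installed and reusing the HoneyHiveTracer call from the earlier example:

from opentelemetry import trace
from openinference.instrumentation.openai import OpenAIInstrumentor
from honeyhive.tracer.otel_tracer import HoneyHiveTracer

# Initialize HoneyHive first so a real TracerProvider is installed...
tracer = HoneyHiveTracer(api_key="test", project="test")

# ...then attach the instrumentor, which picks up that provider
OpenAIInstrumentor().instrument()

assert "Proxy" not in type(trace.get_tracer_provider()).__name__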

Coverage Areas:

  • Fresh environment scenarios

  • Multiple TracerProvider types

  • Real instrumentor libraries

  • Initialization order variations

  • Span processor integration

Test Categories and When to Use Them

Unit Tests (Fast, Isolated)

Use for:

  • Individual function logic

  • Error handling paths

  • Configuration validation

  • Mock-friendly scenarios

Characteristics:

  • Heavy mocking

  • Fast execution (< 1s each)

  • No external dependencies

  • Isolated components
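
As an example of this style, provider-name detection can be unit-tested with a stand-in class, with no real OpenTelemetry objects involved (a sketch; the test name is illustrative):

def test_proxy_provider_detected_by_name():
    # Stand-in type whose name matches OpenTelemetry's proxy class;
    # the unit level only exercises the name-based detection logic
    FakeProxy = type("ProxyTracerProvider", (), {})
    provider = FakeProxy()

    name = type(provider).__name__
    assert "Proxy" in name or "NoOp" in name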

Integration Tests (Real Components)

Use for:

  • Component interaction

  • Real API integration

  • TracerProvider scenarios

  • Multi-instance behavior

Characteristics:

  • Minimal mocking

  • Real OpenTelemetry components

  • Moderate execution time

  • External service integration
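
A sketch of this layer using real OpenTelemetry SDK components and an in-memory exporter, so span flow is validated without any network calls:

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import (
    InMemorySpanExporter,
)

def test_spans_reach_the_processor():
    # Real SDK provider and processor; only the export target is local
    exporter = InMemorySpanExporter()
    provider = TracerProvider()
    provider.add_span_processor(SimpleSpanProcessor(exporter))

    tracer = provider.get_tracer("integration-test")
    with tracer.start_as_current_span("real-span"):
        pass

    assert len(exporter.get_finished_spans()) == 1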

Real Environment Tests (Subprocess)

Use for:

  • Fresh environment scenarios

  • Instrumentor integration

  • Environment-specific bugs

  • User experience validation

Characteristics:

  • No mocking

  • Subprocess execution

  • Real library behavior

  • Slower but comprehensive

Test Execution Strategy

Local Development

# Fast feedback loop
tox -e unit                    # Unit tests only

# Before committing
tox -e integration            # Integration tests

# Full validation
tox -e unit,integration       # Complete test suite

CI/CD Pipeline

# GitHub Actions workflow
- name: Unit Tests
  run: tox -e unit

- name: Integration Tests
  run: tox -e integration

- name: Real Environment Tests
  run: tox -e real_env
  if: github.event_name == 'pull_request'

Test Execution Order:

  1. Unit tests (fast feedback)

  2. Integration tests (component validation)

  3. Real environment tests (comprehensive validation)

  4. End-to-end tests (user workflows)

Preventing Future Bugs

Mandatory Test Coverage

New Features Must Include:

  1. Unit Tests - Core logic validation

  2. Integration Tests - Component interaction

  3. Real Environment Tests - User scenario validation

  4. Documentation Examples - Working code samples

Quality Gates:

  • All tests must pass

  • Coverage >= 80% for new code

  • Real environment tests for instrumentor features

  • Documentation examples must be tested

Test Review Checklist

For New Tests:

  • [ ] Tests real user scenarios?

  • [ ] Covers error conditions?

  • [ ] Validates integration points?

  • [ ] Uses appropriate test category?

  • [ ] Includes cleanup/teardown? (see the fixture sketch below)
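
One way to satisfy the cleanup item is to give each test its own provider rather than mutating the global one; teardown then reduces to shutting that provider down (a sketch using real SDK calls; the fixture name is illustrative):

import pytest
from opentelemetry.sdk.trace import TracerProvider

@pytest.fixture
def isolated_provider():
    # A provider local to the test avoids mutating global state,
    # so teardown is just a flush-and-shutdown of this instance
    provider = TracerProvider()
    yield provider
    provider.shutdown()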

For Bug Fixes:

  • [ ] Reproduces the original bug?

  • [ ] Tests the fix in isolation?

  • [ ] Validates fix in real environment?

  • [ ] Prevents regression?

Monitoring and Metrics

Test Health Metrics

Track:

  • Test execution time trends

  • Flaky test identification

  • Coverage percentage changes

  • Real environment test success rates

Alerts:

  • Integration test failures

  • Coverage drops below threshold

  • Real environment test timeouts

  • Instrumentor compatibility issues

Review Schedule:

  • Weekly: Test health review

  • Monthly: Strategy effectiveness assessment

  • Quarterly: Coverage and quality analysis

Tools and Infrastructure

Testing Tools

Core Testing:

  • pytest (test framework)

  • tox (environment management)

  • coverage.py (coverage tracking)

Integration Testing:

  • Real OpenTelemetry components

  • Subprocess execution

  • Temporary file management

CI/CD Integration:

  • GitHub Actions workflows

  • Automated test execution

  • Coverage reporting

Environment Management

Test Environments:

  • Unit: Heavily mocked, fast

  • Integration: Real components, moderate

  • Real Environment: Subprocess, comprehensive

  • Staging: Full user workflows

Dependency Management:

  • Isolated test dependencies

  • Version compatibility testing

  • Optional dependency handling

Conclusion

The ProxyTracerProvider bug taught us that comprehensive testing requires:

  1. Multiple Test Layers - Unit, integration, and real environment

  2. Real Scenario Coverage - Test actual user workflows

  3. Minimal Mocking - Use real components when possible

  4. Subprocess Testing - Validate fresh environment behavior

This strategy ensures we catch integration bugs early while maintaining fast feedback loops for development.

Key Takeaway: Test the user experience, not just the code.