Integration Testing Strategy for HoneyHive SDK
==============================================

This document outlines our integration testing strategy, focusing in particular on preventing bugs like the ProxyTracerProvider issue that slipped through our initial testing.

Overview
--------

Our testing strategy uses a multi-layered approach:

1. **Unit Tests** - Fast, isolated, heavily mocked
2. **Integration Tests** - Real components, real scenarios
3. **End-to-End Tests** - Full user workflows
4. **Real Environment Tests** - Subprocess-based testing

The ProxyTracerProvider Bug: Lessons Learned
--------------------------------------------

**What Happened**
~~~~~~~~~~~~~~~~~

A critical bug existed where HoneyHive failed to handle OpenTelemetry's default ``ProxyTracerProvider``, causing instrumentor integration to fail silently.

**Why It Wasn't Caught**
~~~~~~~~~~~~~~~~~~~~~~~~

1. **Over-Mocking**: Our test suite completely mocked OpenTelemetry components
2. **Missing Real Scenarios**: No tests covered the "fresh Python environment + instrumentor" scenario
3. **Documentation Gap**: Examples didn't follow documented best practices
4. **Integration Test Gaps**: Tests didn't validate real TracerProvider behavior

**The Fix**
~~~~~~~~~~~

.. code-block:: python

    # Fixed: Properly detect and handle ProxyTracerProvider
    is_noop_provider = (
        existing_provider is None
        or str(type(existing_provider).__name__) == "NoOpTracerProvider"
        or str(type(existing_provider).__name__) == "ProxyTracerProvider"  # ← Added this
        or "NoOp" in str(type(existing_provider).__name__)
        or "Proxy" in str(type(existing_provider).__name__)  # ← Added this
    )

Testing Strategy Updates
------------------------

Real Environment Testing
~~~~~~~~~~~~~~~~~~~~~~~~

We now use subprocess-based tests to validate real-world scenarios:

.. code-block:: python

    import subprocess
    import sys

    def test_fresh_environment_proxy_tracer_provider_bug(self):
        """Test ProxyTracerProvider handling in a fresh environment."""
        test_script = '''
    from opentelemetry import trace
    from honeyhive.tracer.otel_tracer import HoneyHiveTracer

    # Verify we start with ProxyTracerProvider
    initial_provider = trace.get_tracer_provider()
    assert "Proxy" in type(initial_provider).__name__

    # Initialize HoneyHive - should handle ProxyTracerProvider
    tracer = HoneyHiveTracer(api_key="test", project="test")

    # Should now have a real TracerProvider
    final_provider = trace.get_tracer_provider()
    assert "Proxy" not in type(final_provider).__name__
    '''
        # Write test_script to a temporary file at script_path, then
        # run it in a subprocess so it executes in a fresh environment
        result = subprocess.run([sys.executable, script_path], ...)

**Benefits:**

- Tests real OpenTelemetry behavior
- Catches environment-specific bugs
- Validates the actual user experience
- No mocking interference

Instrumentor Integration Testing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

New tests specifically validate instrumentor integration patterns:

.. code-block:: python

    @pytest.mark.real_instrumentor
    def test_real_openai_instrumentor_integration(self):
        """Test with an actual OpenInference instrumentor."""
        # Test both initialization patterns:
        # 1. HoneyHive first, then instrumentor (recommended)
        # 2. Instrumentor passed to HoneyHive.init() (legacy)
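For reference, a minimal sketch of the recommended-order case is shown below. It assumes the OpenInference ``OpenAIInstrumentor`` import path and the ``HoneyHiveTracer`` constructor from the earlier example; the temp-file plumbing is illustrative rather than the suite's actual helpers.

.. code-block:: python

    import subprocess
    import sys
    import tempfile
    import textwrap

    def test_honeyhive_then_instrumentor():
        """HoneyHive first, instrumentor second (recommended order)."""
        test_script = textwrap.dedent('''
            from opentelemetry import trace
            from honeyhive.tracer.otel_tracer import HoneyHiveTracer
            # Assumed import path for the OpenInference OpenAI instrumentor
            from openinference.instrumentation.openai import OpenAIInstrumentor

            # 1. HoneyHive first: installs a real TracerProvider
            tracer = HoneyHiveTracer(api_key="test", project="test")

            # 2. Instrumentor second: attaches to that provider, not a proxy
            OpenAIInstrumentor().instrument()

            provider = trace.get_tracer_provider()
            assert "Proxy" not in type(provider).__name__
        ''')
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(test_script)
            script_path = f.name
        # Fresh interpreter, so no state leaks in from the test runner
        result = subprocess.run(
            [sys.executable, script_path],
            capture_output=True, text=True, timeout=60,
        )
        assert result.returncode == 0, result.stderr

The legacy ``init()`` pattern would reuse the same subprocess structure with only the embedded script changed.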
**Coverage Areas:**

- Fresh environment scenarios
- Multiple TracerProvider types
- Real instrumentor libraries
- Initialization order variations
- Span processor integration

Test Categories and When to Use Them
------------------------------------

Unit Tests (Fast, Isolated)
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Use for:**

- Individual function logic
- Error handling paths
- Configuration validation
- Mock-friendly scenarios

**Characteristics:**

- Heavy mocking
- Fast execution (< 1s each)
- No external dependencies
- Isolated components

Integration Tests (Real Components)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Use for:**

- Component interaction
- Real API integration
- TracerProvider scenarios
- Multi-instance behavior

**Characteristics:**

- Minimal mocking
- Real OpenTelemetry components
- Moderate execution time
- External service integration

Real Environment Tests (Subprocess)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Use for:**

- Fresh environment scenarios
- Instrumentor integration
- Environment-specific bugs
- User experience validation

**Characteristics:**

- No mocking
- Subprocess execution
- Real library behavior
- Slower but comprehensive

Test Execution Strategy
-----------------------

Local Development
~~~~~~~~~~~~~~~~~

.. code-block:: bash

    # Fast feedback loop
    tox -e unit              # Unit tests only

    # Before committing
    tox -e integration       # Integration tests

    # Full validation
    tox -e unit,integration  # Complete test suite

CI/CD Pipeline
~~~~~~~~~~~~~~

.. code-block:: yaml

    # GitHub Actions workflow
    - name: Unit Tests
      run: tox -e unit

    - name: Integration Tests
      run: tox -e integration

    - name: Real Environment Tests
      run: tox -e real_env
      if: github.event_name == 'pull_request'

**Test Execution Order:**

1. Unit tests (fast feedback)
2. Integration tests (component validation)
3. Real environment tests (comprehensive validation)
4. End-to-end tests (user workflows)

Preventing Future Bugs
----------------------

Mandatory Test Coverage
~~~~~~~~~~~~~~~~~~~~~~~

**New Features Must Include:**

1. **Unit Tests** - Core logic validation
2. **Integration Tests** - Component interaction
3. **Real Environment Tests** - User scenario validation
4. **Documentation Examples** - Working code samples

**Quality Gates:**

- All tests must pass
- Coverage >= 80% for new code
- Real environment tests for instrumentor features
- Documentation examples must be tested

Test Review Checklist
~~~~~~~~~~~~~~~~~~~~~

**For New Tests:**

- [ ] Tests real user scenarios?
- [ ] Covers error conditions?
- [ ] Validates integration points?
- [ ] Uses the appropriate test category?
- [ ] Includes cleanup/teardown?

**For Bug Fixes:**

- [ ] Reproduces the original bug?
- [ ] Tests the fix in isolation?
- [ ] Validates the fix in a real environment?
- [ ] Prevents regression? (see the regression sketch below)
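As a concrete instance of that checklist applied to the ProxyTracerProvider fix, a regression test might look like the sketch below. ``_is_noop_provider`` is an illustrative copy of the detection logic from the fix, not the SDK's internal function name; the two provider classes are the real OpenTelemetry types.

.. code-block:: python

    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.trace import ProxyTracerProvider

    def _is_noop_provider(existing_provider):
        """Illustrative copy of the detection logic from the fix above."""
        if existing_provider is None:
            return True
        name = type(existing_provider).__name__
        return "NoOp" in name or "Proxy" in name

    def test_proxy_provider_is_flagged_for_replacement():
        # Reproduces the original bug condition: the default proxy
        # provider must be treated as replaceable
        assert _is_noop_provider(ProxyTracerProvider())

    def test_real_sdk_provider_is_respected():
        # A real SDK provider must NOT be flagged for replacement
        assert not _is_noop_provider(TracerProvider())

    def test_missing_provider_is_flagged():
        assert _is_noop_provider(None)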
Monitoring and Metrics
----------------------

Test Health Metrics
~~~~~~~~~~~~~~~~~~~

**Track** (a minimal duration-capture hook is sketched at the end of this document):

- Test execution time trends
- Flaky test identification
- Coverage percentage changes
- Real environment test success rates

**Alerts:**

- Integration test failures
- Coverage drops below threshold
- Real environment test timeouts
- Instrumentor compatibility issues

**Review Schedule:**

- Weekly: test health review
- Monthly: strategy effectiveness assessment
- Quarterly: coverage and quality analysis

Tools and Infrastructure
------------------------

Testing Tools
~~~~~~~~~~~~~

**Core Testing:**

- pytest (test framework)
- tox (environment management)
- coverage.py (coverage tracking)

**Integration Testing:**

- Real OpenTelemetry components
- Subprocess execution
- Temporary file management

**CI/CD Integration:**

- GitHub Actions workflows
- Automated test execution
- Coverage reporting

Environment Management
~~~~~~~~~~~~~~~~~~~~~~

**Test Environments:**

- Unit: heavily mocked, fast
- Integration: real components, moderate speed
- Real Environment: subprocess-based, comprehensive
- Staging: full user workflows

**Dependency Management:**

- Isolated test dependencies
- Version compatibility testing
- Optional dependency handling

Conclusion
----------

The ProxyTracerProvider bug taught us that comprehensive testing requires:

1. **Multiple Test Layers** - Unit, integration, and real environment
2. **Real Scenario Coverage** - Test actual user workflows
3. **Minimal Mocking** - Use real components when possible
4. **Subprocess Testing** - Validate fresh-environment behavior

This strategy ensures we catch integration bugs early while maintaining fast feedback loops for development.

**Key Takeaway:** *Test the user experience, not just the code.*
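For the execution-time and flakiness metrics listed under Test Health Metrics, one lightweight starting point is a ``conftest.py`` hook. The sketch below is an assumption-level example: the CSV path and environment variable are hypothetical, not part of the existing infrastructure.

.. code-block:: python

    # conftest.py -- hypothetical duration-capture hook
    import csv
    import os

    def pytest_runtest_logreport(report):
        """Append each test's call-phase duration to a CSV for trend analysis."""
        if report.when != "call":
            return
        # Assumed output location; override via TEST_METRICS_CSV
        path = os.environ.get("TEST_METRICS_CSV", "test_durations.csv")
        is_new_file = not os.path.exists(path)
        with open(path, "a", newline="") as f:
            writer = csv.writer(f)
            if is_new_file:
                writer.writerow(["nodeid", "outcome", "duration_seconds"])
            writer.writerow([report.nodeid, report.outcome, f"{report.duration:.3f}"])

Accumulated over repeated runs, this CSV surfaces slow-test trends and flaky tests (the same ``nodeid`` with alternating outcomes) without any extra tooling.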