Span Enrichment Patterns
========================

**Problem:** You need to add rich context, business metadata, and performance metrics to your traces to make them useful for debugging, analysis, and business intelligence.

**Solution:** Use these 5 proven span enrichment patterns to transform basic traces into powerful observability data.

This guide covers advanced enrichment techniques beyond the basics. For an introduction, see :doc:`/tutorials/03-enable-span-enrichment`.

Session-Level vs Span-Level Enrichment
---------------------------------------

HoneyHive provides two enrichment scopes: **session-level** and **span-level**.

**``enrich_session()`` - Apply metadata to all spans in a session:**

.. code-block:: python

   from honeyhive import HoneyHiveTracer
   
   tracer = HoneyHiveTracer.init(project="my-app")
   
   # Apply to ALL spans in this session
   tracer.enrich_session({
       "user_id": "user_456",
       "user_tier": "enterprise",
       "environment": "production",
       "deployment_region": "us-east-1"
   })
   
   # All subsequent operations inherit this metadata
   response1 = call_llm(...)
   response2 = call_llm(...)
   response3 = call_llm(...)
   # All 3 traces will have user_id, user_tier, environment, deployment_region

**Use ``enrich_session()`` for:**

- ✅ User identification (user_id, email, tier)
- ✅ Session context (session_type, workflow_name)
- ✅ Environment info (environment, region, version)
- ✅ Business context (customer_id, account_type, plan)
- ✅ Any metadata that applies to the entire user session

**``enrich_span()`` - Apply metadata to a single span:**

.. code-block:: python

   from honeyhive import enrich_span
   
   def process_query(query: str, use_cache: bool):
       # Apply to THIS specific span only
       enrich_span({
           "query_length": len(query),
           "cache_enabled": use_cache,
           "model": "gpt-4",
           "temperature": 0.7
       })
       
       return call_llm(query)

**Use ``enrich_span()`` for:**

- ✅ Per-call parameters (model, temperature, max_tokens)
- ✅ Call-specific metrics (input_length, cache_hit, latency)
- ✅ Dynamic metadata (intent_classification, confidence_score)
- ✅ Error details (error_type, retry_count)
- ✅ Any metadata that varies per LLM call

**Example combining both:**

.. code-block:: python

   from honeyhive import HoneyHiveTracer
   
   # Session-level: Set once for the entire user session
   tracer = HoneyHiveTracer.init(
       project="customer-support",
       session_name="support-session-789"
   )
   
   tracer.enrich_session({
       "user_id": "user_456",
       "support_tier": "premium",
       "issue_category": "billing"
   })
   
   # Span-level: Varies per call
   def handle_query(query: str):
       intent = classify_intent(query)
       
       tracer.enrich_span({
           "query_intent": intent,
           "query_length": len(query),
           "model": "gpt-4" if intent == "complex" else "gpt-3.5-turbo"
       })
       
       return generate_response(query)
   
   # Each call has both session + span metadata
   handle_query("How do I change my billing address?")
   handle_query("What's my current balance?")
   handle_query("Can I upgrade my plan?")

**Decision Matrix:**

+------------------------------+-------------------------+-------------------------+
| **Metadata Type**            | **Scope**               | **Method**              |
+==============================+=========================+=========================+
| User ID, email               | Session (constant)      | ``enrich_session()``    |
+------------------------------+-------------------------+-------------------------+
| Model name, temperature      | Span (varies)           | ``enrich_span()``       |
+------------------------------+-------------------------+-------------------------+
| Environment (prod/dev)       | Session (constant)      | ``enrich_session()``    |
+------------------------------+-------------------------+-------------------------+
| Cache hit/miss               | Span (per-call)         | ``enrich_span()``       |
+------------------------------+-------------------------+-------------------------+
| Customer tier                | Session (constant)      | ``enrich_session()``    |
+------------------------------+-------------------------+-------------------------+
| Prompt token count           | Span (per-call)         | ``enrich_span()``       |
+------------------------------+-------------------------+-------------------------+
| Deployment region            | Session (constant)      | ``enrich_session()``    |
+------------------------------+-------------------------+-------------------------+
| Error type/message           | Span (when it occurs)   | ``enrich_span()``       |
+------------------------------+-------------------------+-------------------------+

.. tip::
   **Rule of Thumb:**
   
   If the metadata is the same for all LLM calls in a user session, use ``enrich_session()``.
   If it changes per call, use ``enrich_span()``.

Understanding Enrichment Interfaces
-----------------------------------

``enrich_span()`` supports multiple invocation patterns. Choose the one that fits your use case:

Quick Reference Table
^^^^^^^^^^^^^^^^^^^^^

+----------------------------+----------------------------------+----------------------------------------------+
| Pattern                    | When to Use                      | Backend Namespace                            |
+============================+==================================+==============================================+
| Simple Dict                | Quick metadata                   | ``honeyhive_metadata.*``                     |
+----------------------------+----------------------------------+----------------------------------------------+
| Keyword Arguments          | Concise inline enrichment        | ``honeyhive_metadata.*``                     |
+----------------------------+----------------------------------+----------------------------------------------+
| Reserved Namespaces        | Structured organization          | ``honeyhive_<namespace>.*``                  |
+----------------------------+----------------------------------+----------------------------------------------+
| Mixed Usage                | Combine multiple patterns        | Multiple namespaces                          |
+----------------------------+----------------------------------+----------------------------------------------+

Simple Dict Pattern (New)
^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

   from honeyhive import enrich_span
   
   # Pass a dictionary - routes to metadata
   enrich_span({
       "user_id": "user_123",
       "feature": "chat",
       "session": "abc"
   })
   
   # Backend storage:
   # honeyhive_metadata.user_id = "user_123"
   # honeyhive_metadata.feature = "chat"
   # honeyhive_metadata.session = "abc"

Keyword Arguments Pattern (New)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

   from honeyhive import enrich_span
   
   # Pass keyword arguments - also routes to metadata
   enrich_span(
       user_id="user_123",
       feature="chat",
       session="abc"
   )
   
   # Same backend storage as simple dict

Reserved Namespaces Pattern (Backwards Compatible)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Use explicit namespace parameters for organized data:

.. code-block:: python

   from honeyhive import enrich_span
   
   # Explicit namespaces for structured organization
   enrich_span(
       metadata={"user_id": "user_123", "session": "abc"},
       metrics={"latency_ms": 150, "score": 0.95},
       user_properties={"user_id": "user_123", "plan": "premium"},
       feedback={"rating": 5, "helpful": True},
       inputs={"query": "What is AI?"},
       outputs={"answer": "AI is artificial intelligence..."},
       config={"model": "gpt-4", "temperature": 0.7},
       error="Optional error message",
       event_id="evt_unique_identifier"
   )
   
   # Backend storage:
   # honeyhive_metadata.user_id = "user_123"
   # honeyhive_metadata.session = "abc"
   # honeyhive_metrics.latency_ms = 150
   # honeyhive_metrics.score = 0.95
   # honeyhive_user_properties.user_id = "user_123"
   # honeyhive_user_properties.plan = "premium"
   # honeyhive_feedback.rating = 5
   # honeyhive_feedback.helpful = True
   # honeyhive_inputs.query = "What is AI?"
   # honeyhive_outputs.answer = "AI is artificial intelligence..."
   # honeyhive_config.model = "gpt-4"
   # honeyhive_config.temperature = 0.7
   # honeyhive_error = "Optional error message"
   # honeyhive_event_id = "evt_unique_identifier"

**Available Namespaces:**

- ``metadata``: Business context (user IDs, features, session info)
- ``metrics``: Numeric measurements (latencies, scores, counts)
- ``user_properties``: User-specific properties (user_id, plan, tier, etc.) - stored in dedicated namespace
- ``feedback``: User or system feedback (ratings, thumbs up/down)
- ``inputs``: Input data to the operation
- ``outputs``: Output data from the operation
- ``config``: Configuration parameters (model settings, hyperparams)
- ``error``: Error messages or exceptions (stored as direct attribute)
- ``event_id``: Unique event identifier (stored as direct attribute)

**Why use namespaces?**

- Organize different data types separately
- Easier to query specific categories in the backend
- Maintain backwards compatibility with existing code
- Clear semantic meaning for different attribute types

Mixed Usage Pattern
^^^^^^^^^^^^^^^^^^^

Combine multiple patterns - later values override earlier ones:

.. code-block:: python

   from honeyhive import enrich_span
   
   # Combine namespaces with kwargs
   enrich_span(
       metadata={"user_id": "user_123"},
       metrics={"score": 0.95, "latency_ms": 150},
       feature="chat",     # Adds to metadata
       priority="high",    # Also adds to metadata
       retries=3           # Also adds to metadata
   )
   
   # Backend storage:
   # honeyhive_metadata.user_id = "user_123"
   # honeyhive_metadata.feature = "chat"
   # honeyhive_metadata.priority = "high"
   # honeyhive_metadata.retries = 3
   # honeyhive_metrics.score = 0.95
   # honeyhive_metrics.latency_ms = 150

Using ``enrich_span_context()`` for Inline Span Creation
----------------------------------------------------------

**New in v1.0+:** When you need to create and enrich a named span without refactoring code into separate functions.

**When to use:**

- ✅ You want explicit named spans for specific code blocks
- ✅ It's hard or impractical to split code into separate functions
- ✅ You need to enrich spans with inputs/outputs immediately upon creation
- ✅ You want clear span boundaries without decorator overhead

**Problem:** Using ``@trace`` decorator requires refactoring code into separate functions:

.. code-block:: python

   # Without decorator - no span created
   def complex_workflow(data):
       # Step 1: Preprocessing
       cleaned = preprocess(data)
       
       # Step 2: Model inference
       result = model.predict(cleaned)
       
       # Step 3: Postprocessing
       final = postprocess(result)
       
       return final
   
   # With decorator - requires splitting into functions
   @trace(event_name="preprocess_step")
   def preprocess(data):
       # preprocessing logic
       pass
   
   @trace(event_name="inference_step")
   def predict(data):
       # inference logic
       pass
   
   @trace(event_name="postprocess_step")
   def postprocess(data):
       # postprocessing logic
       pass

**Solution:** Use ``enrich_span_context()`` to create named spans inline:

.. code-block:: python

   from honeyhive.tracer.processing.context import enrich_span_context
   from honeyhive import HoneyHiveTracer
   
   tracer = HoneyHiveTracer.init(project="my-app")
   
   def complex_workflow(data):
       """Workflow with inline named spans - no refactoring needed!"""
       
       # Step 1: Create span for preprocessing
       with enrich_span_context(
           event_name="preprocess_step",
           inputs={"raw_data_size": len(data)},
           metadata={"stage": "preprocessing"}
       ):
           cleaned = preprocess_data(data)
           tracer.enrich_span(outputs={"cleaned_size": len(cleaned)})
       
       # Step 2: Create span for model inference
       with enrich_span_context(
           event_name="inference_step",
           inputs={"input_shape": cleaned.shape},
           metadata={"model": "gpt-4", "temperature": 0.7}
       ):
           result = model.predict(cleaned)
           tracer.enrich_span(
               outputs={"prediction": result},
               metrics={"confidence": 0.95}
           )
       
       # Step 3: Create span for postprocessing
       with enrich_span_context(
           event_name="postprocess_step",
           inputs={"raw_result": result}
       ):
           final = postprocess(result)
           tracer.enrich_span(outputs={"final_result": final})
       
       return final

**What you get in HoneyHive:**

.. code-block:: text

   📊 complex_workflow [ROOT]
   ├── 🔧 preprocess_step
   │   └── inputs: {"raw_data_size": 1000}
   │   └── outputs: {"cleaned_size": 950}
   │   └── metadata: {"stage": "preprocessing"}
   ├── 🤖 inference_step
   │   └── inputs: {"input_shape": [950, 128]}
   │   └── outputs: {"prediction": "..."}
   │   └── metadata: {"model": "gpt-4", "temperature": 0.7}
   │   └── metrics: {"confidence": 0.95}
   └── ✨ postprocess_step
       └── inputs: {"raw_result": "..."}
       └── outputs: {"final_result": "..."}

**Advantages over decorator approach:**

+----------------------------+----------------------------------+----------------------------------+
| **Aspect**                 | **@trace decorator**             | **enrich_span_context()**        |
+============================+==================================+==================================+
| **Refactoring**            | Must split into functions        | No refactoring needed            |
+----------------------------+----------------------------------+----------------------------------+
| **Code Structure**         | Forces function boundaries       | Flexible inline usage            |
+----------------------------+----------------------------------+----------------------------------+
| **Enrichment Timing**      | After span creation              | On creation + during execution   |
+----------------------------+----------------------------------+----------------------------------+
| **Span Naming**            | Function name or explicit        | Always explicit                  |
+----------------------------+----------------------------------+----------------------------------+
| **Best for**               | Reusable functions               | Inline code blocks               |
+----------------------------+----------------------------------+----------------------------------+

**Real-world example: RAG Pipeline with inline spans**

.. code-block:: python

   from honeyhive.tracer.processing.context import enrich_span_context
   from honeyhive import HoneyHiveTracer, trace
   import openai
   
   tracer = HoneyHiveTracer.init(project="rag-app")
   
   @trace(event_type="chain", event_name="rag_query")
   def rag_query(query: str, context_docs: list) -> str:
       """RAG pipeline with explicit span boundaries."""
       
       # Span 1: Document retrieval
       with enrich_span_context(
           event_name="retrieve_documents",
           inputs={"query": query, "doc_count": len(context_docs)},
           metadata={"retrieval_method": "semantic_search"}
       ):
           relevant_docs = semantic_search(query, context_docs, top_k=5)
           tracer.enrich_span(
               outputs={"retrieved_count": len(relevant_docs)},
               metrics={"avg_relevance_score": 0.87}
           )
       
       # Span 2: Context building
       with enrich_span_context(
           event_name="build_context",
           inputs={"doc_count": len(relevant_docs)}
       ):
           context = "\n\n".join([doc.content for doc in relevant_docs])
           prompt = f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer:"
           tracer.enrich_span(
               outputs={"context_length": len(context), "prompt_length": len(prompt)}
           )
       
       # Span 3: LLM generation (instrumentor creates child spans automatically)
       with enrich_span_context(
           event_name="generate_answer",
           inputs={"prompt_length": len(prompt)},
           metadata={"model": "gpt-4", "max_tokens": 500}
       ):
           client = openai.OpenAI()
           response = client.chat.completions.create(
               model="gpt-4",
               max_tokens=500,
               messages=[{"role": "user", "content": prompt}]
           )
           answer = response.choices[0].message.content
           tracer.enrich_span(
               outputs={"answer": answer},
               metrics={"completion_tokens": response.usage.completion_tokens}
           )
       
       return answer

**Key benefits:**

- **Clear span boundaries**: Each pipeline stage has an explicit named span
- **No refactoring**: Keep your logic in one function, add spans inline
- **Rich context**: Set inputs/outputs/metadata when creating the span
- **Flexible enrichment**: Can still call ``tracer.enrich_span()`` during execution
- **Works with instrumentors**: Auto-instrumented spans (e.g., OpenAI) become children

.. note::
   **When to use each approach:**
   
   - Use ``@trace`` decorator for **reusable functions** you call multiple times
   - Use ``enrich_span_context()`` for **inline code blocks** that are hard to extract into functions
   - Use ``tracer.enrich_span()`` for **adding metadata** to existing spans (decorator or instrumentor)
   - Use ``tracer.enrich_session()`` for **session-wide metadata** that applies to all spans

Advanced Techniques
-------------------

Conditional Enrichment
^^^^^^^^^^^^^^^^^^^^^^

Only enrich based on conditions:

.. code-block:: python

   def conditional_enrichment(user_tier: str, result: str):
       # Always enrich with tier
       enrich_span({"user_tier": user_tier})
       
       # Only enrich premium users with detailed info
       if user_tier == "premium":
           enrich_span({
               "result_length": len(result),
               "result_word_count": len(result.split()),
               "premium_features_used": True
           })

Structured Enrichment
^^^^^^^^^^^^^^^^^^^^^

Organize related metadata:

.. code-block:: python

   def structured_enrichment(user_data: dict, request_data: dict):
       # User namespace
       enrich_span({
           "user.id": user_data["id"],
           "user.tier": user_data["tier"],
           "user.region": user_data["region"]
       })
       
       # Request namespace
       enrich_span({
           "request.id": request_data["id"],
           "request.priority": request_data["priority"],
           "request.source": request_data["source"]
       })

Best Practices
--------------

**DO:**

- Use dot notation for hierarchical keys (``user.id``, ``request.priority``)
- Enrich early and often throughout function execution
- Include timing information for performance analysis
- Add error context in exception handlers
- Use consistent key naming conventions

**DON'T:**

- Include sensitive data (PII, credentials, API keys)
- Add extremely large values (>10KB per field)
- Use random/dynamic key names
- Over-enrich (100+ fields per span becomes noise)
- Duplicate data already captured by instrumentors

Troubleshooting
---------------

**Enrichment not appearing:**

- Ensure you're calling ``enrich_span()`` within a traced context
- Check that instrumentor is properly initialized
- Verify tracer is sending data to HoneyHive

**Performance impact:**

- Enrichment adds <1ms overhead per call
- Serialize complex objects before enriching
- Use sampling for high-frequency enrichment

Next Steps
----------

- :doc:`custom-spans` - Create custom spans for complex workflows
- :doc:`class-decorators` - Class-level tracing patterns
- :doc:`advanced-patterns` - Session enrichment and distributed tracing
- :doc:`/how-to/llm-application-patterns` - Application architecture patterns

**Key Takeaway:** Span enrichment transforms basic traces into rich observability data that powers debugging, analysis, and business intelligence. Use these 5 patterns as building blocks for your tracing strategy. ✨