Event Data Models
=================

.. note::
   **Technical specification for HoneyHive event data structures**
   
   This document defines the exact data models and formats used for events in the HoneyHive SDK.

Events are the core observability primitives in HoneyHive, representing discrete operations or interactions in your LLM application.

Core Event Model
----------------

.. py:class:: Event

   The primary event data structure used throughout HoneyHive.

   .. py:attribute:: event_id
      :type: str

      Unique identifier for the event.

      **Format**: UUID v4 string
      **Example**: ``"01234567-89ab-4cde-8f01-23456789abcd"``
      **Required**: Auto-generated by SDK

   .. py:attribute:: session_id
      :type: str

      Session identifier that groups related events.

      **Format**: UUID v4 string  
      **Example**: ``"1b4e28ba-2fa1-4d3b-a3f5-ef19b5a7633b"``
      **Required**: Auto-generated by tracer

   .. py:attribute:: parent_id
      :type: Optional[str]

      Parent event ID for nested operations.

      **Format**: UUID v4 string
      **Example**: ``"6fa459ea-ee8a-4ca4-894e-db77e160355e"``
      **Required**: No (None for root events)

   .. py:attribute:: event_type
      :type: str

      Categorizes the type of operation.

      **Valid Values**:

      - ``"model"`` - LLM model calls and interactions
      - ``"tool"`` - Tool/function calls and external API interactions
      - ``"chain"`` - Chain/workflow operations and multi-step processes

      **Example**: ``"model"``
      **Required**: Yes

   .. py:attribute:: event_name
      :type: str

      Human-readable name for the specific operation.

      **Format**: Descriptive string, typically kebab-case
      **Example**: ``"openai-chat-completion"``
      **Required**: Yes

   .. py:attribute:: start_time
      :type: datetime

      ISO 8601 timestamp when the event started.

      **Format**: ``YYYY-MM-DDTHH:MM:SS.ffffffZ``
      **Example**: ``"2024-01-15T10:30:45.123456Z"``
      **Required**: Auto-generated by SDK

   .. py:attribute:: end_time
      :type: Optional[datetime]

      ISO 8601 timestamp when the event completed.

      **Format**: ``YYYY-MM-DDTHH:MM:SS.ffffffZ``
      **Example**: ``"2024-01-15T10:30:47.654321Z"``
      **Required**: Auto-generated by SDK

   .. py:attribute:: duration_ms
      :type: Optional[float]

      Event duration in milliseconds.

      **Calculation**: ``end_time - start_time`` in milliseconds
      **Example**: ``2530.865``
      **Required**: Auto-calculated by SDK
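
      The calculation can be reproduced with the standard library. This is a minimal sketch rather than the SDK's implementation; the helper name ``duration_ms_between`` is hypothetical.

      .. code-block:: python

         from datetime import datetime

         def duration_ms_between(start_time: str, end_time: str) -> float:
             """Return the elapsed time between two ISO 8601 timestamps in milliseconds."""
             # Replace a trailing "Z" so fromisoformat() also works before Python 3.11
             start = datetime.fromisoformat(start_time.replace("Z", "+00:00"))
             end = datetime.fromisoformat(end_time.replace("Z", "+00:00"))
             return (end - start).total_seconds() * 1000.0

         # Timestamps taken from the start_time and end_time examples
         elapsed = duration_ms_between(
             "2024-01-15T10:30:45.123456Z",
             "2024-01-15T10:30:47.654321Z",
         )
         print(round(elapsed, 3))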

   .. py:attribute:: status
      :type: str

      Event completion status.

      **Values**:

      - ``"success"`` - Completed successfully
      - ``"error"`` - Failed with error
      - ``"cancelled"`` - Cancelled before completion
      - ``"timeout"`` - Timed out

      **Example**: ``"success"``
      **Required**: Auto-determined by SDK

   .. py:attribute:: inputs
      :type: Optional[Dict[str, Any]]

      Input data for the operation.

      **Structure**: Key-value pairs of input parameters
      **Example**: 
      
      .. code-block:: json
      
         {
           "messages": [
             {"role": "user", "content": "Hello, world!"}
           ],
           "model": "gpt-3.5-turbo",
           "temperature": 0.7,
           "max_tokens": 150
         }

      **Required**: No (but recommended)

   .. py:attribute:: outputs
      :type: Optional[Dict[str, Any]]

      Output data from the operation.

      **Structure**: Key-value pairs of output data
      **Example**:
      
      .. code-block:: json
      
         {
           "choices": [
             {
               "message": {
                 "role": "assistant", 
                 "content": "Hello! How can I help you today?"
               },
               "finish_reason": "stop"
             }
           ],
           "usage": {
             "prompt_tokens": 12,
             "completion_tokens": 9,
             "total_tokens": 21
           }
         }

      **Required**: No (but recommended)

   .. py:attribute:: metadata
      :type: Optional[Dict[str, Any]]

      Additional context and metadata.

      **Structure**: Key-value pairs of contextual information
      **Example**:
      
      .. code-block:: json
      
         {
           "user_id": "user_12345",
           "environment": "production",
           "model_version": "gpt-3.5-turbo-0613",
           "request_id": "req_abc123",
           "tags": ["chat", "customer-support"]
         }

      **Required**: No

   .. py:attribute:: metrics
      :type: Optional[Dict[str, Union[int, float]]]

      Numerical metrics associated with the event.

      **Structure**: Key-value pairs of numeric measurements
      **Example**:
      
      .. code-block:: json
      
         {
           "latency_ms": 1250.5,
           "token_count": 21,
           "cost_usd": 0.0001,
           "cache_hit_rate": 0.85,
           "confidence_score": 0.92
         }

      **Required**: No

   .. py:attribute:: error
      :type: Optional[Dict[str, Any]]

      Error information if the event failed.

      **Structure**:
      
      .. code-block:: json
      
         {
           "type": "OpenAIError",
           "message": "Rate limit exceeded",
           "code": "rate_limit_exceeded",
           "traceback": "Traceback (most recent call last)...",
           "context": {
             "retry_after": 60,
             "request_id": "req_123"
           }
         }

      **Required**: No (only for failed events)

   .. py:attribute:: project
      :type: str

      Project identifier for organization.

      **Format**: String identifier
      **Example**: ``"customer-chat-bot"``
      **Required**: Yes (set by tracer)

   .. py:attribute:: source
      :type: str

      Source system or component identifier.

      **Format**: String identifier
      **Example**: ``"chat-service"``
      **Required**: Yes (set by tracer)

   .. py:attribute:: user_properties
      :type: Optional[Dict[str, Any]]

      User-defined custom properties.

      **Structure**: Flexible key-value pairs
      **Example**:
      
      .. code-block:: json
      
         {
           "experiment_id": "exp_001",
           "feature_flags": ["new_ui", "beta_model"],
           "user_tier": "premium",
           "custom_field": "custom_value"
         }

      **Required**: No
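
As an illustration of how these fields fit together, the core of the model could be sketched as a plain dataclass. ``EventSketch`` is a hypothetical stand-in, not the SDK's class; the defaults mirror the "auto-generated" notes above, and several optional fields are omitted for brevity.

.. code-block:: python

   import uuid
   from dataclasses import dataclass, field
   from datetime import datetime, timezone
   from typing import Any, Dict, Optional

   @dataclass
   class EventSketch:
       """Illustrative stand-in for the Event model (not the SDK class)."""

       # Fields the caller or tracer must supply
       event_type: str
       event_name: str
       project: str
       source: str
       # Identifiers default to freshly generated UUID v4 strings
       event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
       session_id: str = field(default_factory=lambda: str(uuid.uuid4()))
       parent_id: Optional[str] = None  # None marks a root event
       start_time: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
       end_time: Optional[datetime] = None
       inputs: Optional[Dict[str, Any]] = None
       outputs: Optional[Dict[str, Any]] = None
       metadata: Optional[Dict[str, Any]] = None

       @property
       def duration_ms(self) -> Optional[float]:
           """Elapsed milliseconds, or None while the event is still open."""
           if self.end_time is None:
               return None
           return (self.end_time - self.start_time).total_seconds() * 1000.0

   event = EventSketch(
       event_type="model",
       event_name="openai-chat-completion",
       project="customer-chat-bot",
       source="chat-service",
   )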

LLM Event Model
---------------

.. py:class:: LLMEvent

   Specialized event model for LLM operations. Extends the base :py:class:`Event` model.

   **Inherits**: All fields from :py:class:`Event`

   **LLM-Specific Fields**:

   .. py:attribute:: model
      :type: str

      LLM model identifier.

      **Examples**: ``"gpt-3.5-turbo"``, ``"claude-3-sonnet-20240229"``, ``"llama-2-70b"``
      **Required**: Yes for LLM events

   .. py:attribute:: provider
      :type: str

      LLM provider/service.

      **Values**: ``"openai"``, ``"anthropic"``, ``"google"``, ``"azure"``, ``"local"``
      **Required**: Yes for LLM events

   .. py:attribute:: prompt_template
      :type: Optional[str]

      Template used to generate the prompt.

      **Example**: ``"Answer the following question: {question}"``
      **Required**: No

   .. py:attribute:: prompt_variables
      :type: Optional[Dict[str, Any]]

      Variables used in prompt template.

      **Example**: ``{"question": "What is the capital of France?"}``
      **Required**: No

   .. py:attribute:: response_format
      :type: Optional[str]

      Expected response format.

      **Values**: ``"text"``, ``"json"``, ``"function_call"``
      **Required**: No

   .. py:attribute:: tools
      :type: Optional[List[Dict[str, Any]]]

      Available tools/functions for the LLM.

      **Structure**: OpenAI function calling format
      **Required**: No

   .. py:attribute:: tool_calls
      :type: Optional[List[Dict[str, Any]]]

      Tool calls made by the LLM.

      **Structure**: OpenAI tool call format
      **Required**: No

**Example LLM Event** (identifiers are illustrative placeholders, not real UUIDs):

.. code-block:: json

   {
     "event_id": "evt_01234567",
     "session_id": "session_abcdef",
     "event_type": "model",
     "event_name": "openai-chat-completion",
     "start_time": "2024-01-15T10:30:45.123Z",
     "end_time": "2024-01-15T10:30:47.654Z",
     "duration_ms": 2531.0,
     "status": "success",
     "model": "gpt-3.5-turbo",
     "provider": "openai",
     "inputs": {
       "messages": [
         {"role": "user", "content": "What is the capital of France?"}
       ],
       "temperature": 0.7,
       "max_tokens": 50
     },
     "outputs": {
       "choices": [
         {
           "message": {
             "role": "assistant",
             "content": "The capital of France is Paris."
           },
           "finish_reason": "stop"
         }
       ],
       "usage": {
         "prompt_tokens": 12,
         "completion_tokens": 8,
         "total_tokens": 20
       }
     },
     "metrics": {
       "latency_ms": 2531.0,
       "tokens_per_second": 3.16,
       "cost_usd": 0.00004
     }
   }
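
The ``metrics`` values in this example can be derived from the other fields. The sketch below assumes that throughput means completion tokens divided by duration, which matches the numbers shown but is not stated by the specification; ``derive_llm_metrics`` is a hypothetical helper.

.. code-block:: python

   from typing import Any, Dict

   def derive_llm_metrics(event: Dict[str, Any]) -> Dict[str, float]:
       """Compute latency and throughput metrics from an LLM event dictionary."""
       usage = event["outputs"]["usage"]
       seconds = event["duration_ms"] / 1000.0
       return {
           "latency_ms": event["duration_ms"],
           # Assumption: throughput counts generated (completion) tokens only
           "tokens_per_second": round(usage["completion_tokens"] / seconds, 2),
       }

   llm_event = {
       "duration_ms": 2531.0,
       "outputs": {
           "usage": {"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20}
       },
   }
   metrics = derive_llm_metrics(llm_event)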

Tool Event Model
----------------

.. py:class:: ToolEvent

   Event model for tool/function calls.

   **Inherits**: All fields from :py:class:`Event`

   **Tool-Specific Fields**:

   .. py:attribute:: function_name
      :type: str

      Name of the function/tool called.

      **Example**: ``"get_weather"``
      **Required**: Yes for tool events

   .. py:attribute:: function_description
      :type: Optional[str]

      Description of the function's purpose.

      **Example**: ``"Get current weather for a location"``
      **Required**: No

   .. py:attribute:: parameters
      :type: Optional[Dict[str, Any]]

      Function parameters schema.

      **Structure**: JSON Schema format
      **Required**: No

   .. py:attribute:: return_value
      :type: Optional[Any]

      Function return value.

      **Structure**: Any valid JSON value
      **Required**: No

**Example Tool Event** (identifiers are illustrative placeholders, not real UUIDs):

.. code-block:: json

   {
     "event_id": "evt_tool_001",
     "session_id": "session_abcdef", 
     "event_type": "tool",
     "event_name": "weather-api-call",
     "function_name": "get_weather",
     "inputs": {
       "location": "Paris, France",
       "units": "celsius"
     },
     "outputs": {
       "temperature": 22,
       "conditions": "sunny",
       "humidity": 65
     },
     "metrics": {
       "api_latency_ms": 150.5
     }
   }
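
A tool event like this one could be assembled by wrapping the function call itself. The following is a minimal sketch under that assumption; ``record_tool_call`` and the stubbed ``get_weather`` are hypothetical and not part of the SDK.

.. code-block:: python

   import time
   import uuid
   from typing import Any, Callable, Dict

   def record_tool_call(
       session_id: str,
       event_name: str,
       func: Callable[..., Any],
       **kwargs: Any,
   ) -> Dict[str, Any]:
       """Call ``func`` with ``kwargs`` and describe the call as a tool event dict."""
       started = time.perf_counter()
       result = func(**kwargs)
       elapsed_ms = (time.perf_counter() - started) * 1000.0
       return {
           "event_id": str(uuid.uuid4()),
           "session_id": session_id,
           "event_type": "tool",
           "event_name": event_name,
           "function_name": func.__name__,
           "inputs": kwargs,
           "outputs": result,
           "metrics": {"api_latency_ms": elapsed_ms},
       }

   def get_weather(location: str, units: str) -> Dict[str, Any]:
       # Stand-in for a real weather API call
       return {"temperature": 22, "conditions": "sunny", "humidity": 65}

   tool_event = record_tool_call(
       "session_abcdef",
       "weather-api-call",
       get_weather,
       location="Paris, France",
       units="celsius",
   )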

Evaluation Event Model
----------------------

.. py:class:: EvaluationEvent

   Event model for evaluation operations.

   **Inherits**: All fields from :py:class:`Event`

   **Evaluation-Specific Fields**:

   .. py:attribute:: evaluator_name
      :type: str

      Name of the evaluator used.

      **Example**: ``"factual_accuracy"``
      **Required**: Yes for evaluation events

   .. py:attribute:: evaluator_version
      :type: Optional[str]

      Version of the evaluator.

      **Example**: ``"v1.2.0"``
      **Required**: No

   .. py:attribute:: target_event_id
      :type: str

      ID of the event being evaluated.

      **Format**: UUID v4 string
      **Required**: Yes for evaluation events

   .. py:attribute:: score
      :type: Optional[Union[float, int, bool]]

      Evaluation score.

      **Examples**: ``0.85``, ``True``, ``7``
      **Required**: No

   .. py:attribute:: explanation
      :type: Optional[str]

      Human-readable explanation of the score.

      **Example**: ``"Response is factually accurate and well-supported"``
      **Required**: No

   .. py:attribute:: criteria
      :type: Optional[Dict[str, Any]]

      Evaluation criteria used.

      **Structure**: Evaluator-specific criteria
      **Required**: No

**Example Evaluation Event** (identifiers are illustrative placeholders, not real UUIDs):

.. code-block:: json

   {
     "event_id": "evt_eval_001",
     "session_id": "session_abcdef",
     "event_type": "tool", 
     "event_name": "factual-accuracy-check",
     "evaluator_name": "factual_accuracy",
     "target_event_id": "evt_01234567",
     "score": 0.92,
     "explanation": "Response contains accurate information with proper citations",
     "metrics": {
       "confidence": 0.95,
       "processing_time_ms": 1200
     }
   }
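
To show how ``target_event_id`` ties an evaluation to the event it scores, here is a sketch of a trivial evaluator. The ``exact_match`` evaluator and the ``exact_match_evaluation`` helper are hypothetical; a boolean ``score`` is used because the field accepts ``bool``.

.. code-block:: python

   import uuid
   from typing import Any, Dict

   def exact_match_evaluation(target_event: Dict[str, Any], expected: str) -> Dict[str, Any]:
       """Build an evaluation event that scores ``target_event`` by exact match."""
       actual = target_event["outputs"]["choices"][0]["message"]["content"]
       matched = actual.strip() == expected.strip()
       if matched:
           explanation = "Output matches the expected answer"
       else:
           explanation = "Output differs from the expected answer"
       return {
           "event_id": str(uuid.uuid4()),
           "session_id": target_event["session_id"],
           "event_type": "tool",
           "event_name": "exact-match-check",
           "evaluator_name": "exact_match",
           # Link back to the event being evaluated
           "target_event_id": target_event["event_id"],
           "score": matched,
           "explanation": explanation,
       }

   llm_event = {
       "event_id": "evt_01234567",
       "session_id": "session_abcdef",
       "outputs": {
           "choices": [
               {"message": {"role": "assistant", "content": "The capital of France is Paris."}}
           ]
       },
   }
   evaluation = exact_match_evaluation(llm_event, "The capital of France is Paris.")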

Event Serialization
-------------------

**JSON Format**:

Events are serialized to JSON for storage and transmission:

.. code-block:: python

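   import json
   from datetime import datetime, timezone
   
   # Event serialization
   event = {
       "event_id": "evt_123",
       "event_type": "model",
       # Timezone-aware UTC timestamp with a trailing "Z"
       # (datetime.utcnow() is deprecated as of Python 3.12)
       "start_time": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
       # ... other fields
   }
   
   json_data = json.dumps(event, ensure_ascii=False, indent=2)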
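
Reading an event back requires the reverse step: turning the timestamp string into a ``datetime`` again. A minimal sketch using only the standard library (``parse_timestamp`` is a hypothetical helper name):

.. code-block:: python

   import json
   from datetime import datetime

   def parse_timestamp(value: str) -> datetime:
       """Parse an ISO 8601 UTC timestamp with a trailing "Z" into an aware datetime."""
       # fromisoformat() accepts "Z" directly only on Python 3.11+
       return datetime.fromisoformat(value.replace("Z", "+00:00"))

   payload = (
       '{"event_id": "evt_123", "event_type": "model", '
       '"start_time": "2024-01-15T10:30:45.123456Z"}'
   )
   decoded = json.loads(payload)
   decoded["start_time"] = parse_timestamp(decoded["start_time"])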

**Field Validation**:

All events undergo validation before transmission:

.. code-block:: python

   from pydantic import BaseModel, Field
   from typing import Optional, Dict, Any
   from datetime import datetime, timezone
   
   def _to_utc_z(value: datetime) -> str:
       """Serialize a datetime as ISO 8601 UTC with a trailing "Z"."""
       if value.tzinfo is None:
           # Naive datetimes are assumed to already be in UTC
           value = value.replace(tzinfo=timezone.utc)
       return value.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")
   
   class EventModel(BaseModel):
       event_id: str = Field(..., description="Unique event identifier")
       event_type: str = Field(..., description="Type of event")
       event_name: str = Field(..., description="Human-readable event name")
       start_time: datetime = Field(..., description="Event start time")
       end_time: Optional[datetime] = Field(None, description="Event end time")
       inputs: Optional[Dict[str, Any]] = Field(None, description="Input data")
       outputs: Optional[Dict[str, Any]] = Field(None, description="Output data")
       metadata: Optional[Dict[str, Any]] = Field(None, description="Metadata")
       
       class Config:
           # Pydantic v1-style config: ensure datetime serialization.
           # Appending "Z" to an aware datetime would yield "+00:00Z", hence the helper.
           json_encoders = {
               datetime: _to_utc_z
           }
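
The model above checks that required fields are present and correctly typed, but it does not restrict ``event_type`` to the documented values. A dependency-free sketch of that extra check follows; ``validation_errors`` is a hypothetical helper, and the required-field list mirrors the fields declared with ``...`` above.

.. code-block:: python

   from typing import Any, Dict, List

   VALID_EVENT_TYPES = {"model", "tool", "chain"}
   REQUIRED_FIELDS = ("event_id", "event_type", "event_name", "start_time")

   def validation_errors(event: Dict[str, Any]) -> List[str]:
       """Return a list of problems found in ``event`` (empty if it looks valid)."""
       errors = [
           f"missing required field: {name}"
           for name in REQUIRED_FIELDS
           if event.get(name) is None
       ]
       event_type = event.get("event_type")
       if event_type is not None and event_type not in VALID_EVENT_TYPES:
           errors.append(f"invalid event_type: {event_type!r}")
       return errors

   good = {
       "event_id": "evt_123",
       "event_type": "model",
       "event_name": "openai-chat-completion",
       "start_time": "2024-01-15T10:30:45.123456Z",
   }
   bad = {"event_id": "evt_124", "event_type": "llm"}
   good_errors = validation_errors(good)
   bad_errors = validation_errors(bad)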

**Event Batching**:

Events can be batched for efficient transmission. Each entry in ``events`` is a complete event object; the entries below are abbreviated to their identifying fields:

.. code-block:: json

   {
     "batch_id": "batch_001",
     "project": "my-project",
     "events": [
       {
         "event_id": "evt_001",
         "event_type": "model"
       },
       {
         "event_id": "evt_002",
         "event_type": "tool"
       }
     ],
     "metadata": {
       "batch_size": 2,
       "created_at": "2024-01-15T10:30:45.123Z"
     }
   }
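
A batch envelope of this shape could be produced by chunking a list of events. This is a sketch only: ``make_batches``, its default ``batch_size``, and the use of a UUID for ``batch_id`` are assumptions rather than documented SDK behavior.

.. code-block:: python

   import uuid
   from datetime import datetime, timezone
   from typing import Any, Dict, Iterator, List

   def make_batches(
       project: str,
       events: List[Dict[str, Any]],
       batch_size: int = 100,
   ) -> Iterator[Dict[str, Any]]:
       """Yield batch envelopes containing at most ``batch_size`` events each."""
       for offset in range(0, len(events), batch_size):
           chunk = events[offset:offset + batch_size]
           created_at = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
           yield {
               "batch_id": str(uuid.uuid4()),
               "project": project,
               "events": chunk,
               "metadata": {"batch_size": len(chunk), "created_at": created_at},
           }

   pending = [{"event_id": f"evt_{i:03d}", "event_type": "tool"} for i in range(1, 6)]
   batches = list(make_batches("my-project", pending, batch_size=2))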

Common Patterns
---------------

**Nested Events**:

Events can form hierarchies using ``parent_id``:

.. code-block:: json

   {
     "event_id": "evt_parent",
     "event_type": "chain",
     "event_name": "rag-pipeline",
     "parent_id": null
   }
   
   {
     "event_id": "evt_child_1", 
     "event_type": "tool",
     "event_name": "vector-search",
     "parent_id": "evt_parent"
   }
   
   {
     "event_id": "evt_child_2",
     "event_type": "model", 
     "event_name": "answer-generation",
     "parent_id": "evt_parent"
   }
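
Given a flat list of such events, the hierarchy can be recovered by grouping on ``parent_id``. A minimal sketch (``children_by_parent`` is a hypothetical helper name):

.. code-block:: python

   from collections import defaultdict
   from typing import Any, Dict, List, Optional

   def children_by_parent(events: List[Dict[str, Any]]) -> Dict[Optional[str], List[str]]:
       """Map each parent_id to the IDs of its direct children (None holds root events)."""
       tree: Dict[Optional[str], List[str]] = defaultdict(list)
       for event in events:
           tree[event.get("parent_id")].append(event["event_id"])
       return dict(tree)

   events = [
       {"event_id": "evt_parent", "event_type": "chain", "parent_id": None},
       {"event_id": "evt_child_1", "event_type": "tool", "parent_id": "evt_parent"},
       {"event_id": "evt_child_2", "event_type": "model", "parent_id": "evt_parent"},
   ]
   tree = children_by_parent(events)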

**Event Correlation**:

Events can reference each other:

.. code-block:: json

   {
     "event_id": "evt_llm",
     "event_type": "model",
     "outputs": {"response": "Paris is the capital."}
   }
   
   {
     "event_id": "evt_eval",
     "event_type": "tool",
     "target_event_id": "evt_llm",
     "score": 0.95
   }
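
One way to use these references is to collect evaluation scores per target event. The sketch below assumes events are available as dictionaries in a single list; ``scores_by_target`` is a hypothetical helper.

.. code-block:: python

   from typing import Any, Dict, List

   def scores_by_target(events: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
       """Group evaluation scores under the ID of the event they evaluate."""
       known_ids = {event["event_id"] for event in events}
       scores: Dict[str, List[Any]] = {}
       for event in events:
           target = event.get("target_event_id")
           # Skip non-evaluation events and references to events not in this list
           if target is not None and target in known_ids:
               scores.setdefault(target, []).append(event["score"])
       return scores

   events = [
       {"event_id": "evt_llm", "event_type": "model",
        "outputs": {"response": "Paris is the capital."}},
       {"event_id": "evt_eval", "event_type": "tool",
        "target_event_id": "evt_llm", "score": 0.95},
   ]
   scores = scores_by_target(events)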

**Custom Event Types**:

Model domain-specific operations by pairing one of the standard ``event_type`` values with a descriptive ``event_name``:

.. code-block:: python

   # Custom event for document processing
   custom_event = {
       "event_type": "chain",
       "event_name": "pdf-extraction", 
       "inputs": {
           "document_url": "https://example.com/doc.pdf",
           "extract_tables": True
       },
       "outputs": {
           "text_content": "...",
           "tables": [...],
           "page_count": 10
       },
       "metadata": {
           "processing_engine": "pdfplumber",
           "file_size_mb": 2.5
       }
   }

Best Practices
--------------

**Event Design Guidelines**:

1. **Descriptive Names**: Use clear, descriptive ``event_name`` values
2. **Consistent Types**: Standardize ``event_type`` values across your application
3. **Rich Context**: Include relevant ``metadata`` for debugging and analysis
4. **Structured Data**: Keep ``inputs`` and ``outputs`` well-structured
5. **Error Details**: Capture comprehensive error information when events fail
6. **Metrics**: Include relevant performance and business metrics
7. **Privacy**: Avoid capturing sensitive data in event fields
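
For the privacy guideline, one option is to redact known-sensitive keys before an event is recorded. This is a sketch under stated assumptions: the key list and the ``redact`` helper are illustrative and would need to be adapted to your data.

.. code-block:: python

   from typing import Any

   # Assumption: example key names; extend to match your application's data
   SENSITIVE_KEYS = {"password", "api_key", "authorization", "ssn"}

   def redact(value: Any) -> Any:
       """Return a copy of ``value`` with sensitive dictionary values masked."""
       if isinstance(value, dict):
           return {
               key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else redact(item)
               for key, item in value.items()
           }
       if isinstance(value, list):
           return [redact(item) for item in value]
       return value

   raw_inputs = {
       "messages": [{"role": "user", "content": "Hello, world!"}],
       "api_key": "sk-example",
       "options": {"Authorization": "Bearer example", "temperature": 0.7},
   }
   safe_inputs = redact(raw_inputs)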

**Performance Considerations**:

1. **Field Size**: Keep individual fields reasonably sized (under 1 MB recommended)
2. **Batch Events**: Use batching for high-volume scenarios
3. **Async Logging**: Log events asynchronously to avoid blocking operations
4. **Selective Capture**: Only capture necessary data to minimize overhead
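
Asynchronous logging can be approximated with a queue drained by a background thread, so that the calling code only pays for an enqueue. ``BackgroundEventLogger`` is a hypothetical sketch, not the SDK's transport; the ``send`` callable stands in for whatever actually transmits events.

.. code-block:: python

   import queue
   import threading
   from typing import Any, Callable, Dict

   class BackgroundEventLogger:
       """Queue events and hand them to ``send`` on a background thread."""

       def __init__(self, send: Callable[[Dict[str, Any]], None]) -> None:
           self._send = send
           self._queue: "queue.Queue[Dict[str, Any]]" = queue.Queue()
           self._worker = threading.Thread(target=self._run, daemon=True)
           self._worker.start()

       def log(self, event: Dict[str, Any]) -> None:
           # Returns immediately; transmission happens on the worker thread
           self._queue.put(event)

       def flush(self) -> None:
           # Block until every queued event has been processed
           self._queue.join()

       def _run(self) -> None:
           while True:
               event = self._queue.get()
               try:
                   self._send(event)
               except Exception:
                   # A failed send must not stop the worker; real code would log or retry
                   pass
               finally:
                   self._queue.task_done()

   sent = []
   logger = BackgroundEventLogger(sent.append)
   logger.log({"event_id": "evt_001", "event_type": "model"})
   logger.log({"event_id": "evt_002", "event_type": "tool"})
   logger.flush()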

See Also
--------

- :doc:`spans` - Span data models and formats
- :doc:`evaluations` - Evaluation data structures
- :doc:`../api/tracer` - HoneyHiveTracer API for creating events
- :doc:`../api/decorators` - Decorator-based event creation