Event Data Models
=================

.. note::
   **Technical specification for HoneyHive event data structures**

   This document defines the exact data models and formats used for events in the HoneyHive SDK.

Events are the core observability primitives in HoneyHive, representing discrete operations or interactions in your LLM application.

Core Event Model
----------------

.. py:class:: Event

   The primary event data structure used throughout HoneyHive.

   .. py:attribute:: event_id
      :type: str

      Unique identifier for the event.

      **Format**: UUID v4 string

      **Example**: ``"01234567-89ab-cdef-0123-456789abcdef"``

      **Required**: Auto-generated by SDK

   .. py:attribute:: session_id
      :type: str

      Session identifier that groups related events.

      **Format**: UUID v4 string

      **Example**: ``"session-01234567-89ab-cdef-0123-456789abcdef"``

      **Required**: Auto-generated by tracer

   .. py:attribute:: parent_id
      :type: Optional[str]

      Parent event ID for nested operations.

      **Format**: UUID v4 string

      **Example**: ``"parent-01234567-89ab-cdef-0123-456789abcdef"``

      **Required**: No (None for root events)

   .. py:attribute:: event_type
      :type: str

      Categorizes the type of operation.

      **Valid Values**:

      - ``"model"`` - LLM model calls and interactions
      - ``"tool"`` - Tool/function calls and external API interactions
      - ``"chain"`` - Chain/workflow operations and multi-step processes

      **Example**: ``"model"``

      **Required**: Yes

   .. py:attribute:: event_name
      :type: str

      Human-readable name for the specific operation.

      **Format**: Descriptive string, typically kebab-case

      **Example**: ``"openai-chat-completion"``

      **Required**: Yes

   .. py:attribute:: start_time
      :type: datetime

      ISO 8601 timestamp when the event started.

      **Format**: ``YYYY-MM-DDTHH:MM:SS.ffffffZ``

      **Example**: ``"2024-01-15T10:30:45.123456Z"``

      **Required**: Auto-generated by SDK

   .. py:attribute:: end_time
      :type: Optional[datetime]

      ISO 8601 timestamp when the event completed.

      **Format**: ``YYYY-MM-DDTHH:MM:SS.ffffffZ``

      **Example**: ``"2024-01-15T10:30:47.654321Z"``

      **Required**: Auto-generated by SDK

   .. py:attribute:: duration_ms
      :type: Optional[float]

      Event duration in milliseconds.

      **Calculation**: ``end_time - start_time`` in milliseconds

      **Example**: ``2530.865`` (for the ``start_time`` and ``end_time`` examples above)

      **Required**: Auto-calculated by SDK

   .. py:attribute:: status
      :type: str

      Event completion status.

      **Values**:

      - ``"success"`` - Completed successfully
      - ``"error"`` - Failed with error
      - ``"cancelled"`` - Cancelled before completion
      - ``"timeout"`` - Timed out

      **Example**: ``"success"``

      **Required**: Auto-determined by SDK

   .. py:attribute:: inputs
      :type: Optional[Dict[str, Any]]

      Input data for the operation.

      **Structure**: Key-value pairs of input parameters

      **Example**:

      .. code-block:: json

         {
           "messages": [
             {"role": "user", "content": "Hello, world!"}
           ],
           "model": "gpt-3.5-turbo",
           "temperature": 0.7,
           "max_tokens": 150
         }

      **Required**: No (but recommended)

   .. py:attribute:: outputs
      :type: Optional[Dict[str, Any]]

      Output data from the operation.

      **Structure**: Key-value pairs of output data

      **Example**:

      .. code-block:: json

         {
           "choices": [
             {
               "message": {
                 "role": "assistant",
                 "content": "Hello! How can I help you today?"
               },
               "finish_reason": "stop"
             }
           ],
           "usage": {
             "prompt_tokens": 12,
             "completion_tokens": 9,
             "total_tokens": 21
           }
         }

      **Required**: No (but recommended)

   .. py:attribute:: metadata
      :type: Optional[Dict[str, Any]]

      Additional context and metadata.

      **Structure**: Key-value pairs of contextual information

      **Example**:

      .. code-block:: json

         {
           "user_id": "user_12345",
           "environment": "production",
           "model_version": "gpt-3.5-turbo-0613",
           "request_id": "req_abc123",
           "tags": ["chat", "customer-support"]
         }

      **Required**: No

   .. py:attribute:: metrics
      :type: Optional[Dict[str, Union[int, float]]]

      Numerical metrics associated with the event.

      **Structure**: Key-value pairs of numeric measurements

      **Example**:

      .. code-block:: json

         {
           "latency_ms": 1250.5,
           "token_count": 21,
           "cost_usd": 0.0001,
           "cache_hit_rate": 0.85,
           "confidence_score": 0.92
         }

      **Required**: No

   .. py:attribute:: error
      :type: Optional[Dict[str, Any]]

      Error information if the event failed.

      **Structure**:

      .. code-block:: json

         {
           "type": "OpenAIError",
           "message": "Rate limit exceeded",
           "code": "rate_limit_exceeded",
           "traceback": "Traceback (most recent call last)...",
           "context": {
             "retry_after": 60,
             "request_id": "req_123"
           }
         }

      **Required**: No (only for failed events)

   .. py:attribute:: project
      :type: str

      Project identifier for organization.

      **Format**: String identifier

      **Example**: ``"customer-chat-bot"``

      **Required**: Yes (set by tracer)

   .. py:attribute:: source
      :type: str

      Source system or component identifier.

      **Format**: String identifier

      **Example**: ``"chat-service"``

      **Required**: Yes (set by tracer)

   .. py:attribute:: user_properties
      :type: Optional[Dict[str, Any]]

      User-defined custom properties.

      **Structure**: Flexible key-value pairs

      **Example**:

      .. code-block:: json

         {
           "experiment_id": "exp_001",
           "feature_flags": ["new_ui", "beta_model"],
           "user_tier": "premium",
           "custom_field": "custom_value"
         }

      **Required**: No

LLM Event Model
---------------

.. py:class:: LLMEvent

   Specialized event model for LLM operations that extends the base Event model.

   **Inherits**: All fields from :py:class:`Event`

   **LLM-Specific Fields**:

   .. py:attribute:: model
      :type: str

      LLM model identifier.

      **Examples**: ``"gpt-3.5-turbo"``, ``"claude-3-sonnet-20240229"``, ``"llama-2-70b"``

      **Required**: Yes for LLM events

   .. py:attribute:: provider
      :type: str

      LLM provider/service.

      **Values**: ``"openai"``, ``"anthropic"``, ``"google"``, ``"azure"``, ``"local"``

      **Required**: Yes for LLM events

   .. py:attribute:: prompt_template
      :type: Optional[str]

      Template used to generate the prompt.

      **Example**: ``"Answer the following question: {question}"``

      **Required**: No

   .. py:attribute:: prompt_variables
      :type: Optional[Dict[str, Any]]

      Variables used in the prompt template.

      **Example**: ``{"question": "What is the capital of France?"}``

      **Required**: No

   .. py:attribute:: response_format
      :type: Optional[str]

      Expected response format.

      **Values**: ``"text"``, ``"json"``, ``"function_call"``

      **Required**: No

   .. py:attribute:: tools
      :type: Optional[List[Dict[str, Any]]]

      Available tools/functions for the LLM.

      **Structure**: OpenAI function calling format

      **Required**: No

   .. py:attribute:: tool_calls
      :type: Optional[List[Dict[str, Any]]]

      Tool calls made by the LLM.

      **Structure**: OpenAI tool call format

      **Required**: No

   **Example LLM Event**:

   .. code-block:: json

      {
        "event_id": "evt_01234567",
        "session_id": "session_abcdef",
        "event_type": "model",
        "event_name": "openai-chat-completion",
        "start_time": "2024-01-15T10:30:45.123Z",
        "end_time": "2024-01-15T10:30:47.654Z",
        "duration_ms": 2531.0,
        "status": "success",
        "model": "gpt-3.5-turbo",
        "provider": "openai",
        "inputs": {
          "messages": [
            {"role": "user", "content": "What is the capital of France?"}
          ],
          "temperature": 0.7,
          "max_tokens": 50
        },
        "outputs": {
          "choices": [
            {
              "message": {
                "role": "assistant",
                "content": "The capital of France is Paris."
              },
              "finish_reason": "stop"
            }
          ],
          "usage": {
            "prompt_tokens": 12,
            "completion_tokens": 8,
            "total_tokens": 20
          }
        },
        "metrics": {
          "latency_ms": 2531.0,
          "tokens_per_second": 3.16,
          "cost_usd": 0.00004
        }
      }

Tool Event Model
----------------

.. py:class:: ToolEvent

   Event model for tool/function calls.

   **Inherits**: All fields from :py:class:`Event`

   **Tool-Specific Fields**:

   .. py:attribute:: function_name
      :type: str

      Name of the function/tool called.

      **Example**: ``"get_weather"``

      **Required**: Yes for tool events

   .. py:attribute:: function_description
      :type: Optional[str]

      Description of the function's purpose.

      **Example**: ``"Get current weather for a location"``

      **Required**: No

   .. py:attribute:: parameters
      :type: Optional[Dict[str, Any]]

      Function parameters schema.

      **Structure**: JSON Schema format

      **Required**: No

   .. py:attribute:: return_value
      :type: Optional[Any]

      Function return value.

      **Structure**: Any valid JSON value

      **Required**: No

   **Example Tool Event**:

   .. code-block:: json

      {
        "event_id": "evt_tool_001",
        "session_id": "session_abcdef",
        "event_type": "tool",
        "event_name": "weather-api-call",
        "function_name": "get_weather",
        "inputs": {
          "location": "Paris, France",
          "units": "celsius"
        },
        "outputs": {
          "temperature": 22,
          "conditions": "sunny",
          "humidity": 65
        },
        "metrics": {
          "api_latency_ms": 150.5
        }
      }

Evaluation Event Model
----------------------

.. py:class:: EvaluationEvent

   Event model for evaluation operations.

   **Inherits**: All fields from :py:class:`Event`

   **Evaluation-Specific Fields**:

   .. py:attribute:: evaluator_name
      :type: str

      Name of the evaluator used.

      **Example**: ``"factual_accuracy"``

      **Required**: Yes for evaluation events

   .. py:attribute:: evaluator_version
      :type: Optional[str]

      Version of the evaluator.

      **Example**: ``"v1.2.0"``

      **Required**: No

   .. py:attribute:: target_event_id
      :type: str

      ID of the event being evaluated.

      **Format**: UUID v4 string

      **Required**: Yes for evaluation events

   .. py:attribute:: score
      :type: Optional[Union[float, int, bool]]

      Evaluation score.

      **Examples**: ``0.85``, ``True``, ``7``

      **Required**: No

   .. py:attribute:: explanation
      :type: Optional[str]

      Human-readable explanation of the score.

      **Example**: ``"Response is factually accurate and well-supported"``

      **Required**: No

   .. py:attribute:: criteria
      :type: Optional[Dict[str, Any]]

      Evaluation criteria used.

      **Structure**: Evaluator-specific criteria

      **Required**: No

   **Example Evaluation Event**:

   .. code-block:: json

      {
        "event_id": "evt_eval_001",
        "session_id": "session_abcdef",
        "event_type": "tool",
        "event_name": "factual-accuracy-check",
        "evaluator_name": "factual_accuracy",
        "target_event_id": "evt_01234567",
        "score": 0.92,
        "explanation": "Response contains accurate information with proper citations",
        "metrics": {
          "confidence": 0.95,
          "processing_time_ms": 1200
        }
      }

Event Serialization
-------------------

**JSON Format**:

Events are serialized to JSON for storage and transmission:

.. code-block:: python

   import json
   from datetime import datetime, timezone

   # Event serialization: timestamps are UTC ISO 8601 strings with a "Z" suffix.
   # datetime.utcnow() is deprecated since Python 3.12, so use an aware datetime.
   event = {
       "event_id": "evt_123",
       "event_type": "model",
       "start_time": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
       # ... other fields
   }

   json_data = json.dumps(event, ensure_ascii=False, indent=2)

**Field Validation**:

All events undergo validation before transmission:

.. code-block:: python

   from pydantic import BaseModel, Field
   from typing import Optional, Dict, Any
   from datetime import datetime

   class EventModel(BaseModel):
       event_id: str = Field(..., description="Unique event identifier")
       event_type: str = Field(..., description="Type of event")
       event_name: str = Field(..., description="Human-readable event name")
       start_time: datetime = Field(..., description="Event start time")
       end_time: Optional[datetime] = Field(None, description="Event end time")
       inputs: Optional[Dict[str, Any]] = Field(None, description="Input data")
       outputs: Optional[Dict[str, Any]] = Field(None, description="Output data")
       metadata: Optional[Dict[str, Any]] = Field(None, description="Metadata")

       class Config:
           # Pydantic v1-style config to ensure datetime serialization.
           # Appending "Z" assumes naive datetimes that are already in UTC.
           json_encoders = {
               datetime: lambda v: v.isoformat() + "Z"
           }

**Event Batching**:

Events can be batched for efficient transmission:

.. code-block:: json

   {
     "batch_id": "batch_001",
     "project": "my-project",
     "events": [
       {
         "event_id": "evt_001",
         "event_type": "model",
         // ... event data
       },
       {
         "event_id": "evt_002",
         "event_type": "tool",
         // ... event data
       }
     ],
     "metadata": {
       "batch_size": 2,
       "created_at": "2024-01-15T10:30:45.123Z"
     }
   }

Common Patterns
---------------

**Nested Events**:

Events can form hierarchies using ``parent_id``:

.. code-block:: json

   {
     "event_id": "evt_parent",
     "event_type": "chain",
     "event_name": "rag-pipeline",
     "parent_id": null
   }

   {
     "event_id": "evt_child_1",
     "event_type": "tool",
     "event_name": "vector-search",
     "parent_id": "evt_parent"
   }

   {
     "event_id": "evt_child_2",
     "event_type": "model",
     "event_name": "answer-generation",
     "parent_id": "evt_parent"
   }

**Event Correlation**:

Events can reference each other:

.. code-block:: json

   {
     "event_id": "evt_llm",
     "event_type": "model",
     "outputs": {"response": "Paris is the capital."}
   }

   {
     "event_id": "evt_eval",
     "event_type": "tool",
     "target_event_id": "evt_llm",
     "score": 0.95
   }

**Custom Event Types**:

Define domain-specific events by combining a standard ``event_type`` with a descriptive ``event_name``:

.. code-block:: python

   # Custom event for document processing
   custom_event = {
       "event_type": "chain",
       "event_name": "pdf-extraction",
       "inputs": {
           "document_url": "https://example.com/doc.pdf",
           "extract_tables": True
       },
       "outputs": {
           "text_content": "...",
           "tables": [...],
           "page_count": 10
       },
       "metadata": {
           "processing_engine": "pdfplumber",
           "file_size_mb": 2.5
       }
   }

Best Practices
--------------

**Event Design Guidelines**:

1. **Descriptive Names**: Use clear, descriptive ``event_name`` values
2. **Consistent Types**: Standardize ``event_type`` values across your application
3. **Rich Context**: Include relevant ``metadata`` for debugging and analysis
4. **Structured Data**: Keep ``inputs`` and ``outputs`` well-structured
5. **Error Details**: Capture comprehensive error information when events fail
6. **Metrics**: Include relevant performance and business metrics
7. **Privacy**: Avoid capturing sensitive data in event fields

**Performance Considerations**:

1. **Field Size**: Keep individual fields reasonably sized (< 1MB recommended)
2. **Batch Events**: Use batching for high-volume scenarios
3. **Async Logging**: Log events asynchronously to avoid blocking operations
4. **Selective Capture**: Only capture necessary data to minimize overhead

See Also
--------

- :doc:`spans` - Span data models and formats
- :doc:`evaluations` - Evaluation data structures
- :doc:`../api/tracer` - HoneyHiveTracer API for creating events
- :doc:`../api/decorators` - Decorator-based event creation
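
Worked Example
--------------

The timestamp format, the ``duration_ms`` calculation, and ``parent_id`` nesting described in this document can be sketched with the standard library alone. This is an illustrative sketch, not the SDK's implementation: the helpers ``to_iso_z`` and ``make_event`` are hypothetical names introduced here, and the field set is reduced to the attributes needed for the illustration.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Valid event_type values as listed in the Core Event Model section.
VALID_EVENT_TYPES = {"model", "tool", "chain"}


def to_iso_z(ts: datetime) -> str:
    """Hypothetical helper: format an aware datetime as YYYY-MM-DDTHH:MM:SS.ffffffZ."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def make_event(
    event_type: str,
    event_name: str,
    start: datetime,
    end: datetime,
    parent_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a minimal event dict with a computed duration."""
    if event_type not in VALID_EVENT_TYPES:
        raise ValueError(f"unsupported event_type: {event_type!r}")
    return {
        "event_id": str(uuid.uuid4()),
        "parent_id": parent_id,  # None marks a root event
        "event_type": event_type,
        "event_name": event_name,
        "start_time": to_iso_z(start),
        "end_time": to_iso_z(end),
        # end_time - start_time, expressed in milliseconds
        "duration_ms": round((end - start).total_seconds() * 1000.0, 3),
    }


start = datetime(2024, 1, 15, 10, 30, 45, 123456, tzinfo=timezone.utc)
end = datetime(2024, 1, 15, 10, 30, 47, 654321, tzinfo=timezone.utc)

root = make_event("chain", "rag-pipeline", start, end)
child = make_event("model", "answer-generation", start, end, parent_id=root["event_id"])

print(json.dumps([root, child], indent=2))
```

Using timezone-aware datetimes keeps the ``Z`` suffix truthful, since the value is converted to UTC before formatting rather than assumed to be UTC.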