Data Models
===========

Pydantic models for experiment runs, results, and comparisons.

ExperimentRunStatus
-------------------

.. py:class:: ExperimentRunStatus

   Enum representing the status of an experiment run.

   **Values:**

   - ``PENDING`` - Run created but not started
   - ``RUNNING`` - Currently executing
   - ``COMPLETED`` - Finished successfully
   - ``FAILED`` - Execution failed
   - ``CANCELLED`` - Manually cancelled

   **Usage:**

   .. code-block:: python

      from honeyhive.experiments import ExperimentRunStatus

      if result.status == ExperimentRunStatus.COMPLETED:
          print("Experiment finished!")

MetricDatapoints
----------------

.. py:class:: MetricDatapoints

   Model for tracking passed/failed datapoint IDs per metric.

   **Attributes:**

   - ``passed`` (List[str]) - List of datapoint IDs that passed this metric
   - ``failed`` (List[str]) - List of datapoint IDs that failed this metric

MetricDetail
------------

.. py:class:: MetricDetail

   Detailed information about a single metric result.

   **Attributes:**

   - ``metric_name`` (str) - Name of the metric
   - ``metric_type`` (Optional[str]) - Type of metric ("numeric", "boolean", etc.)
   - ``event_name`` (Optional[str]) - Name of the event that generated this metric
   - ``event_type`` (Optional[str]) - Type of event ("model", "tool", etc.)
   - ``aggregate`` (Optional[Union[float, int, bool, str]]) - Aggregated value across all datapoints
   - ``values`` (Optional[List[Any]]) - Individual values per datapoint
   - ``datapoints`` (Optional[MetricDatapoints]) - Passed/failed datapoint tracking

   **Usage:**

   .. code-block:: python

      from honeyhive.experiments import get_run_result

      result = get_run_result(client, "run-123")

      # Get a specific metric detail
      accuracy = result.metrics.get_metric("accuracy_evaluator")
      if accuracy:
          print(f"Aggregate: {accuracy.aggregate}")
          print(f"Type: {accuracy.metric_type}")

DatapointMetric
---------------

.. py:class:: DatapointMetric

   Individual metric value for a single datapoint.

   **Attributes:**

   - ``name`` (str) - Name of the metric
   - ``event_name`` (Optional[str]) - Name of the event
   - ``event_type`` (Optional[str]) - Type of event
   - ``value`` (Optional[Union[float, int, bool, str]]) - Metric value for this datapoint
   - ``passed`` (Optional[bool]) - Whether this metric passed for this datapoint

DatapointResult
---------------

.. py:class:: DatapointResult

   Result for a single datapoint in an experiment run.

   **Attributes:**

   - ``datapoint_id`` (Optional[str]) - Unique identifier for the datapoint
   - ``session_id`` (Optional[str]) - Session ID associated with this datapoint
   - ``passed`` (Optional[bool]) - Whether all metrics passed for this datapoint
   - ``metrics`` (Optional[List[DatapointMetric]]) - Individual metric results

   **Usage:**

   .. code-block:: python

      from honeyhive.experiments import get_run_result

      result = get_run_result(client, "run-123")

      for datapoint in result.datapoints:
          print(f"Datapoint: {datapoint.datapoint_id}")
          print(f"Passed: {datapoint.passed}")
          if datapoint.metrics:
              for metric in datapoint.metrics:
                  print(f"  {metric.name}: {metric.value}")

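The two per-datapoint models above can be combined to summarize failures
across a run. The following is an illustrative sketch, not an SDK helper: it
assumes a ``client`` is available (as in the examples above) and that
``result.datapoints`` holds populated ``DatapointResult`` objects.

.. code-block:: python

   from collections import defaultdict

   from honeyhive.experiments import get_run_result

   result = get_run_result(client, "run-123")

   # Group failed datapoint IDs by metric name (illustrative, not part of the SDK)
   failures_by_metric = defaultdict(list)
   for datapoint in result.datapoints:
       for metric in datapoint.metrics or []:
           if metric.passed is False:  # ``passed`` may be None if never evaluated
               failures_by_metric[metric.name].append(datapoint.datapoint_id)

   for metric_name, failed_ids in failures_by_metric.items():
       print(f"{metric_name}: {len(failed_ids)} failed datapoint(s)")
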
AggregatedMetrics
-----------------

.. py:class:: AggregatedMetrics

   Aggregated experiment metrics, supporting both the new ``details`` array
   format and the legacy ``model_extra`` format for backward compatibility.

   **Attributes:**

   - ``aggregation_function`` (Optional[str]) - Aggregation method used ("average", "sum", etc.)
   - ``details`` (List[MetricDetail]) - List of metric details from the backend (new format)

   **Methods:**

   .. py:method:: get_metric(metric_name: str) -> Optional[Union[MetricDetail, Dict[str, Any]]]

      Get the value for a specific metric. Supports both the new ``details``
      array format (returns ``MetricDetail``) and the legacy ``model_extra``
      format (returns a dict).

      :param metric_name: Name of the metric
      :returns: MetricDetail object, dict, or None if not found

   .. py:method:: list_metrics() -> List[str]

      List all available metric names.

      :returns: List of metric names from the ``details`` array or ``model_extra`` keys

   .. py:method:: get_all_metrics() -> Dict[str, Union[MetricDetail, Dict[str, Any]]]

      Get all metrics as a dictionary.

      :returns: Dictionary mapping metric names to MetricDetail objects or dicts

   **Usage:**

   .. code-block:: python

      from honeyhive.experiments import get_run_result

      result = get_run_result(client, "run-123")
      metrics = result.metrics

      # Get a specific metric (returns MetricDetail with the new format)
      accuracy = metrics.get_metric("accuracy_evaluator")
      if accuracy:
          # Access typed attributes
          print(f"Aggregate: {accuracy.aggregate}")
          print(f"Type: {accuracy.metric_type}")

      # List all metric names
      metric_names = metrics.list_metrics()

      # Get all metrics as a dictionary
      all_metrics = metrics.get_all_metrics()

ExperimentResultSummary
-----------------------

.. py:class:: ExperimentResultSummary

   Complete summary of an experiment run with aggregated results.

   **Attributes:**

   - ``run_id`` (str) - Unique run identifier
   - ``status`` (ExperimentRunStatus) - Current run status
   - ``success`` (bool) - Whether the run completed successfully
   - ``passed`` (List[str]) - List of passed datapoint IDs
   - ``failed`` (List[str]) - List of failed datapoint IDs
   - ``metrics`` (AggregatedMetrics) - Aggregated evaluation metrics
   - ``datapoints`` (List[Any]) - Individual datapoint results

   **Usage:**

   .. code-block:: python

      from honeyhive.experiments import evaluate, evaluator

      @evaluator
      def my_evaluator(outputs, inputs, ground_truth):
          return {"score": 0.9}

      result = evaluate(
          function=my_function,
          dataset=test_data,
          evaluators=[my_evaluator],
          api_key="key",
          project="project"
      )

      # Access summary fields
      print(f"Run ID: {result.run_id}")
      print(f"Status: {result.status}")
      print(f"Success: {result.success}")
      print(f"Passed: {len(result.passed)}")
      print(f"Failed: {len(result.failed)}")

      # Access an aggregated metric
      avg_score = result.metrics.get_metric("my_evaluator")
      if avg_score:
          print(f"Average score: {avg_score.aggregate}")

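The last lines above assume ``get_metric`` returned a typed ``MetricDetail``
(the new ``details`` format). Because it can also return a plain dict in the
legacy ``model_extra`` format, code that must handle both can normalize the
two shapes. This is an illustrative sketch, not an SDK helper; the
``"aggregate"`` key used for the legacy dict is an assumption about that
payload's shape, and ``result`` is the summary produced in the example above.

.. code-block:: python

   def metric_aggregate(metrics, name):
       """Return the aggregated value for ``name``, or None if unavailable."""
       metric = metrics.get_metric(name)
       if metric is None:
           return None
       if isinstance(metric, dict):
           # Legacy ``model_extra`` format; adjust the key to match your payload.
           return metric.get("aggregate")
       return metric.aggregate  # MetricDetail from the new ``details`` format

   avg_score = metric_aggregate(result.metrics, "my_evaluator")
   print(f"my_evaluator aggregate: {avg_score}")
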
RunComparisonResult
-------------------

.. py:class:: RunComparisonResult

   Result of comparing two experiment runs.

   **Attributes:**

   - ``new_run_id`` (str) - ID of the new run
   - ``old_run_id`` (str) - ID of the old run
   - ``common_datapoints`` (int) - Count of datapoints present in both runs
   - ``new_only_datapoints`` (int) - Count of datapoints only in the new run
   - ``old_only_datapoints`` (int) - Count of datapoints only in the old run
   - ``metric_deltas`` (Dict[str, Any]) - Per-metric comparison data

   **Methods:**

   .. py:method:: get_metric_delta(metric_name: str) -> Optional[Dict[str, Any]]

      Get comparison data for a specific metric.

      :param metric_name: Name of the metric
      :returns: Dict with delta information or None

      The returned dict has the following keys:

      - ``old_aggregate`` - Old run's aggregated value
      - ``new_aggregate`` - New run's aggregated value
      - ``improved_count`` - Number of improved datapoints
      - ``degraded_count`` - Number of degraded datapoints
      - ``improved`` - List of improved datapoint IDs
      - ``degraded`` - List of degraded datapoint IDs

   .. py:method:: list_improved_metrics() -> List[str]

      List metrics that improved in the new run.

      :returns: List of metric names with ``improved_count`` > 0

   .. py:method:: list_degraded_metrics() -> List[str]

      List metrics that degraded in the new run.

      :returns: List of metric names with ``degraded_count`` > 0

   **Usage:**

   .. code-block:: python

      from honeyhive.experiments import compare_runs

      comparison = compare_runs(
          client=client,
          new_run_id="run-new",
          old_run_id="run-old"
      )

      # Overview
      print(f"Common datapoints: {comparison.common_datapoints}")
      print(f"New-only datapoints: {comparison.new_only_datapoints}")
      print(f"Old-only datapoints: {comparison.old_only_datapoints}")

      # Metric analysis
      improved = comparison.list_improved_metrics()
      degraded = comparison.list_degraded_metrics()
      print(f"Improved: {improved}")
      print(f"Degraded: {degraded}")

      # Detailed metric delta
      accuracy_delta = comparison.get_metric_delta("accuracy")
      if accuracy_delta:
          print(f"Old: {accuracy_delta['old_aggregate']}")
          print(f"New: {accuracy_delta['new_aggregate']}")
          print(f"Improved datapoints: {len(accuracy_delta['improved'])}")

   These fields can also drive an automated regression gate; see the sketch at
   the end of this page.

See Also
--------

- :doc:`core-functions` - Functions that return these models
- :doc:`results` - Retrieve and compare results
- :doc:`../../../how-to/evaluation/index` - Tutorial

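Example: Regression Gate
------------------------

As a final illustration, the comparison result can back a simple pass/fail
check in CI. This sketch is not part of the SDK; it assumes a ``client`` is
available, that ``comparison`` comes from ``compare_runs`` as in the usage
above, and applies the (assumed) policy that any degraded metric fails the
build.

.. code-block:: python

   from honeyhive.experiments import compare_runs

   comparison = compare_runs(
       client=client,
       new_run_id="run-new",
       old_run_id="run-old"
   )

   # Illustrative policy: any degraded metric fails the gate.
   degraded = comparison.list_degraded_metrics()
   if degraded:
       for name in degraded:
           delta = comparison.get_metric_delta(name)
           if delta:
               print(
                   f"{name}: {delta['old_aggregate']} -> {delta['new_aggregate']} "
                   f"({delta['degraded_count']} datapoint(s) degraded)"
               )
       raise SystemExit(f"Regression detected in: {', '.join(degraded)}")

   print("No metric regressions detected.")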