Data Models
Pydantic models for experiment runs, results, and comparisons.
ExperimentRunStatus
- class ExperimentRunStatus
Enum representing the status of an experiment run.
Values:
PENDING - Run created but not started
RUNNING - Currently executing
COMPLETED - Finished successfully
FAILED - Execution failed
CANCELLED - Manually cancelled
Usage:
    from honeyhive.experiments import ExperimentRunStatus

    if result.status == ExperimentRunStatus.COMPLETED:
        print("Experiment finished!")
MetricDatapoints
- class MetricDatapoints
Model for tracking passed/failed datapoint IDs per metric.
Attributes:
passed (List[str]) - List of datapoint IDs that passed this metric
failed (List[str]) - List of datapoint IDs that failed this metric
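A MetricDatapoints instance is typically reached through a MetricDetail (described next). A minimal sketch of computing a pass rate from it, reusing the client and run ID from the other examples on this page and assuming the new details format so get_metric returns a MetricDetail:

    from honeyhive.experiments import get_run_result

    result = get_run_result(client, "run-123")
    accuracy = result.metrics.get_metric("accuracy_evaluator")

    # datapoints holds the IDs that passed or failed this metric
    if accuracy is not None and accuracy.datapoints is not None:
        passed_ids = accuracy.datapoints.passed
        failed_ids = accuracy.datapoints.failed
        total = len(passed_ids) + len(failed_ids)
        if total:
            print(f"Pass rate: {len(passed_ids) / total:.0%}")
        print(f"Failed datapoint IDs: {failed_ids}")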
MetricDetail
- class MetricDetail
Detailed information about a single metric result.
Attributes:
metric_name (str) - Name of the metric
metric_type (Optional[str]) - Type of metric (“numeric”, “boolean”, etc.)
event_name (Optional[str]) - Name of the event that generated this metric
event_type (Optional[str]) - Type of event (“model”, “tool”, etc.)
aggregate (Optional[Union[float, int, bool, str]]) - Aggregated value across all datapoints
values (Optional[List[Any]]) - Individual values per datapoint
datapoints (Optional[MetricDatapoints]) - Passed/failed datapoint tracking
Usage:
    from honeyhive.experiments import get_run_result

    result = get_run_result(client, "run-123")

    # Get a specific metric detail
    accuracy = result.metrics.get_metric("accuracy_evaluator")
    if accuracy:
        print(f"Aggregate: {accuracy.aggregate}")
        print(f"Type: {accuracy.metric_type}")
DatapointMetric
- class DatapointMetric
Individual metric value for a single datapoint.
Attributes:
name (str) - Name of the metric
event_name (Optional[str]) - Name of the event
event_type (Optional[str]) - Type of event
value (Optional[Union[float, int, bool, str]]) - Metric value for this datapoint
passed (Optional[bool]) - Whether this metric passed for this datapoint
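As an illustrative sketch, the passed flag on DatapointMetric can be used to surface only the metrics that failed for each datapoint (this assumes result.datapoints holds DatapointResult objects, as described next):

    from honeyhive.experiments import get_run_result

    result = get_run_result(client, "run-123")

    # List the metrics that failed for each datapoint
    for datapoint in result.datapoints:
        failed = [m.name for m in (datapoint.metrics or []) if m.passed is False]
        if failed:
            print(f"{datapoint.datapoint_id} failed: {', '.join(failed)}")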
DatapointResult
- class DatapointResult
Result for a single datapoint in an experiment run.
Attributes:
datapoint_id (Optional[str]) - Unique identifier for the datapoint
session_id (Optional[str]) - Session ID associated with this datapoint
passed (Optional[bool]) - Whether all metrics passed for this datapoint
metrics (Optional[List[DatapointMetric]]) - Individual metric results
Usage:
    from honeyhive.experiments import get_run_result

    result = get_run_result(client, "run-123")

    for datapoint in result.datapoints:
        print(f"Datapoint: {datapoint.datapoint_id}")
        print(f"Passed: {datapoint.passed}")
        if datapoint.metrics:
            for metric in datapoint.metrics:
                print(f"  {metric.name}: {metric.value}")
AggregatedMetrics
- class AggregatedMetrics
Aggregated experiment metrics, supporting both the new details array format and the legacy model_extra format for backward compatibility.
Attributes:
aggregation_function (Optional[str]) - Aggregation method used (“average”, “sum”, etc.)
details (List[MetricDetail]) - List of metric details from the backend (new format)
Methods:
- get_metric(metric_name: str) -> MetricDetail | Dict[str, Any] | None
Get the value for a specific metric. Supports both the new details array format (returns MetricDetail) and the legacy model_extra format (returns a dict).
- Parameters:
metric_name – Name of the metric
- Returns:
MetricDetail object, dict, or None if not found
- list_metrics() -> List[str]
List all available metric names.
- Returns:
List of metric names from the details array or model_extra keys
- get_all_metrics() -> Dict[str, MetricDetail | Dict[str, Any]]
Get all metrics as a dictionary.
- Returns:
Dictionary mapping metric names to MetricDetail objects or dicts
Usage:
    from honeyhive.experiments import get_run_result

    result = get_run_result(client, "run-123")
    metrics = result.metrics

    # Get specific metric (returns MetricDetail with new format)
    accuracy = metrics.get_metric("accuracy_evaluator")
    if accuracy:
        # Access typed attributes
        print(f"Aggregate: {accuracy.aggregate}")
        print(f"Type: {accuracy.metric_type}")

    # List all metrics
    metric_names = metrics.list_metrics()

    # Get all as dict
    all_metrics = metrics.get_all_metrics()
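Because get_metric returns a typed MetricDetail under the new details format but a plain dict under the legacy model_extra format, here is a defensive sketch that handles both shapes:

    from honeyhive.experiments import get_run_result

    result = get_run_result(client, "run-123")
    metric = result.metrics.get_metric("accuracy_evaluator")

    if metric is None:
        print("Metric not found")
    elif isinstance(metric, dict):
        # Legacy model_extra format: a plain dict, inspect it directly
        print(f"Legacy metric payload: {metric}")
    else:
        # New format: a typed MetricDetail
        print(f"Aggregate: {metric.aggregate}")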
ExperimentResultSummary
- class ExperimentResultSummary
Complete summary of an experiment run with aggregated results.
Attributes:
run_id (str) - Unique run identifier
status (ExperimentRunStatus) - Current run status
success (bool) - Whether the run completed successfully
passed (List[str]) - List of passed datapoint IDs
failed (List[str]) - List of failed datapoint IDs
metrics (AggregatedMetrics) - Aggregated evaluation metrics
datapoints (List[Any]) - Individual datapoint results
Usage:
    from honeyhive.experiments import evaluate, evaluator

    @evaluator
    def my_evaluator(outputs, inputs, ground_truth):
        return {"score": 0.9}

    result = evaluate(
        function=my_function,
        dataset=test_data,
        evaluators=[my_evaluator],
        api_key="key",
        project="project",
    )

    # Access summary fields
    print(f"Run ID: {result.run_id}")
    print(f"Status: {result.status}")
    print(f"Success: {result.success}")
    print(f"Passed: {len(result.passed)}")
    print(f"Failed: {len(result.failed)}")

    # Access metrics
    avg_score = result.metrics.get_metric("my_evaluator")
    print(f"Average score: {avg_score}")
RunComparisonResult
- class RunComparisonResult
Result of comparing two experiment runs.
Attributes:
new_run_id (str) - ID of the new run
old_run_id (str) - ID of the old run
common_datapoints (int) - Count of datapoints in both runs
new_only_datapoints (int) - Count of datapoints only in the new run
old_only_datapoints (int) - Count of datapoints only in the old run
metric_deltas (Dict[str, Any]) - Per-metric comparison data
Methods:
- get_metric_delta(metric_name: str) -> Dict[str, Any] | None
Get comparison data for a specific metric.
- Parameters:
metric_name – Name of the metric
- Returns:
Dict with delta information or None
Returns dict with keys:
old_aggregate - Old run’s aggregated value
new_aggregate - New run’s aggregated value
improved_count - Number of improved datapoints
degraded_count - Number of degraded datapoints
improved - List of improved datapoint IDs
degraded - List of degraded datapoint IDs
- list_improved_metrics() -> List[str]
List metrics that improved in the new run.
- Returns:
List of metric names with improved_count > 0
- list_degraded_metrics() -> List[str]
List metrics that degraded in the new run.
- Returns:
List of metric names with degraded_count > 0
Usage:
    from honeyhive.experiments import compare_runs

    comparison = compare_runs(
        client=client,
        new_run_id="run-new",
        old_run_id="run-old",
    )

    # Overview
    print(f"Common datapoints: {comparison.common_datapoints}")
    print(f"New datapoints: {comparison.new_only_datapoints}")
    print(f"Old datapoints: {comparison.old_only_datapoints}")

    # Metric analysis
    improved = comparison.list_improved_metrics()
    degraded = comparison.list_degraded_metrics()
    print(f"Improved: {improved}")
    print(f"Degraded: {degraded}")

    # Detailed metric delta
    accuracy_delta = comparison.get_metric_delta("accuracy")
    if accuracy_delta:
        print(f"Old: {accuracy_delta['old_aggregate']}")
        print(f"New: {accuracy_delta['new_aggregate']}")
        print(f"Improved datapoints: {len(accuracy_delta['improved'])}")
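Building on the comparison above, a short illustrative sketch that reports the per-metric aggregate change using the delta keys listed earlier (it assumes the aggregates are numeric):

    # Report the aggregate change for every metric that moved in either direction
    moved = set(comparison.list_improved_metrics()) | set(comparison.list_degraded_metrics())
    for metric_name in sorted(moved):
        delta = comparison.get_metric_delta(metric_name)
        if not delta:
            continue
        old, new = delta["old_aggregate"], delta["new_aggregate"]
        if isinstance(old, (int, float)) and isinstance(new, (int, float)):
            print(f"{metric_name}: {old} -> {new} ({new - old:+.3f})")
        print(f"  improved: {delta['improved_count']}, degraded: {delta['degraded_count']}")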
See Also
Core Functions - Functions that return these models
Results Retrieval - Retrieve and compare results
Evaluation & Analysis Guides - Tutorial