Utility Functions

Helper functions for dataset preparation and ID generation.

generate_external_dataset_id()

generate_external_dataset_id(datapoints, custom_id=None)

Generate a unique EXT- prefixed dataset ID for client-side datasets.

Parameters:
  • datapoints (List[Dict[str, Any]]) – List of datapoints

  • custom_id (Optional[str]) – Optional custom suffix for the ID

Returns:

EXT- prefixed dataset ID

Return type:

str

Usage:

from honeyhive.experiments import generate_external_dataset_id

dataset = [
    {"inputs": {"x": 1}, "ground_truth": {"y": 2}},
    {"inputs": {"x": 2}, "ground_truth": {"y": 4}},
]

dataset_id = generate_external_dataset_id(dataset)
print(dataset_id)  # e.g., "EXT-a1b2c3d4e5f6"

With Custom ID:

dataset_id = generate_external_dataset_id(dataset, custom_id="my-test")
print(dataset_id)  # e.g., "EXT-my-test-a1b2c3d4"
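The example IDs above look like truncated content hashes. The sketch below shows one way such an ID could be derived. It is an assumption, not HoneyHive's implementation. The function name `sketch_external_dataset_id`, the SHA-256 hash, and the 12/8-character lengths are illustrative choices that mirror the shape of the example outputs. Serializing with sorted keys makes the ID depend only on the datapoint contents, not on dictionary key order.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional


def sketch_external_dataset_id(
    datapoints: List[Dict[str, Any]], custom_id: Optional[str] = None
) -> str:
    """Hypothetical sketch: derive an EXT- dataset ID from a content hash.

    Not the HoneyHive implementation; the hash scheme and lengths are assumptions.
    """
    # Sorted keys give the same JSON for dicts that differ only in key order.
    payload = json.dumps(datapoints, sort_keys=True, default=str)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    if custom_id:
        # Assumed layout, mirroring the example above: custom part, then a short hash.
        return f"EXT-{custom_id}-{digest[:8]}"
    return f"EXT-{digest[:12]}"


dataset = [
    {"inputs": {"x": 1}, "ground_truth": {"y": 2}},
    {"inputs": {"x": 2}, "ground_truth": {"y": 4}},
]

print(sketch_external_dataset_id(dataset))
print(sketch_external_dataset_id(dataset, custom_id="my-test"))
```

In this sketch, the same datapoints always produce the same ID. Changing any datapoint produces a different ID.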

generate_external_datapoint_id()

generate_external_datapoint_id(datapoint, index, custom_id=None)

Generate a unique EXT- prefixed datapoint ID.

Parameters:
  • datapoint (Dict[str, Any]) – Datapoint dictionary

  • index (int) – Index of datapoint in dataset

  • custom_id (Optional[str]) – Optional custom suffix

Returns:

EXT- prefixed datapoint ID

Return type:

str

Usage:

from honeyhive.experiments import generate_external_datapoint_id

datapoint = {"inputs": {"x": 1}, "ground_truth": {"y": 2}}

dp_id = generate_external_datapoint_id(datapoint, index=0)
print(dp_id)  # e.g., "EXT-d1e2f3a4b5c6"
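The signature takes the datapoint's `index` as well as its contents. The following is a hypothetical sketch, not the library's code, of why that can matter. If the position is mixed into the hash, two identical datapoints at different positions still receive distinct IDs. The name `sketch_external_datapoint_id` and the hashing details are assumptions.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def sketch_external_datapoint_id(
    datapoint: Dict[str, Any], index: int, custom_id: Optional[str] = None
) -> str:
    """Hypothetical sketch: derive an EXT- datapoint ID from content plus position."""
    # Including the index keeps duplicate datapoints at different positions distinct.
    payload = json.dumps(
        {"index": index, "datapoint": datapoint}, sort_keys=True, default=str
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    if custom_id:
        return f"EXT-{custom_id}-{digest[:8]}"
    return f"EXT-{digest[:12]}"


datapoint = {"inputs": {"x": 1}, "ground_truth": {"y": 2}}
first = sketch_external_datapoint_id(datapoint, index=0)
second = sketch_external_datapoint_id(datapoint, index=1)
print(first, second)  # same content, different positions, different IDs
```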

prepare_external_dataset()

prepare_external_dataset(datapoints, custom_dataset_id=None)

Prepare a list of datapoints for an external dataset.

Ensures all datapoints have EXT- prefixed IDs and generates a dataset ID if not provided.

Parameters:
  • datapoints (List[Dict[str, Any]]) – List of datapoints

  • custom_dataset_id (Optional[str]) – Optional custom dataset ID

Returns:

Tuple of (dataset_id, list of datapoint_ids)

Return type:

Tuple[str, List[str]]

Usage:

from honeyhive.experiments import prepare_external_dataset

dataset = [
    {"inputs": {"query": "Q1"}, "ground_truth": {"answer": "A1"}},
    {"inputs": {"query": "Q2"}, "ground_truth": {"answer": "A2"}},
]

dataset_id, datapoint_ids = prepare_external_dataset(dataset)

print(f"Dataset ID: {dataset_id}")
print(f"Datapoint IDs: {datapoint_ids}")

# Example output (actual hash values will differ):
# Dataset ID: EXT-abc123def456
# Datapoint IDs: ['EXT-dp1hash', 'EXT-dp2hash']
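Two steps are described here: ensuring every datapoint has an EXT- ID, and generating a dataset ID when none is given. The sketch below illustrates both under stated assumptions. It is not HoneyHive's implementation. The `id` key for a caller-supplied datapoint ID, the rule of prefixing a bare custom ID with `EXT-`, and the hashing helper `_ext_hash` are all hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional, Tuple


def _ext_hash(obj: Any, length: int = 12) -> str:
    """Hypothetical helper: EXT- plus a truncated SHA-256 of the JSON form."""
    payload = json.dumps(obj, sort_keys=True, default=str)
    return "EXT-" + hashlib.sha256(payload.encode("utf-8")).hexdigest()[:length]


def _ensure_ext(value: str) -> str:
    # Assumed rule: add the EXT- prefix when it is missing.
    return value if value.startswith("EXT-") else f"EXT-{value}"


def sketch_prepare_external_dataset(
    datapoints: List[Dict[str, Any]], custom_dataset_id: Optional[str] = None
) -> Tuple[str, List[str]]:
    """Hypothetical sketch: assign EXT- IDs to each datapoint and to the dataset."""
    datapoint_ids: List[str] = []
    for index, dp in enumerate(datapoints):
        existing = dp.get("id")  # hypothetical key for a caller-supplied ID
        if isinstance(existing, str) and existing:
            datapoint_ids.append(_ensure_ext(existing))
        else:
            # No usable ID: hash the content together with its position.
            datapoint_ids.append(_ext_hash({"index": index, "datapoint": dp}))
    if custom_dataset_id:
        dataset_id = _ensure_ext(custom_dataset_id)
    else:
        dataset_id = _ext_hash(datapoints)
    return dataset_id, datapoint_ids


dataset = [
    {"inputs": {"query": "Q1"}, "ground_truth": {"answer": "A1"}},
    {"id": "EXT-custom-1", "inputs": {"query": "Q2"}, "ground_truth": {"answer": "A2"}},
]

dataset_id, datapoint_ids = sketch_prepare_external_dataset(dataset)
print(dataset_id)
print(datapoint_ids)  # the second entry keeps its caller-supplied ID
```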

prepare_run_request_data()

prepare_run_request_data(run_data, datapoint_ids=None)

Prepare experiment run request data for backend submission.

Moves an EXT- prefixed dataset_id into metadata.offline_dataset_id, as the backend requires, and sets the top-level dataset_id to None.

Parameters:
  • run_data (Dict[str, Any]) – Run data dictionary

  • datapoint_ids (Optional[List[str]]) – Optional list of datapoint IDs

Returns:

Transformed run data, ready for the backend

Return type:

Dict[str, Any]

Note

This is typically used internally by evaluate(). Most users don’t need to call this directly.

Usage:

from honeyhive.experiments import prepare_run_request_data

run_data = {
    "name": "my-experiment",
    "project": "my-project",
    "dataset_id": "EXT-abc123",
    "event_ids": []
}

prepared = prepare_run_request_data(run_data)

# EXT- dataset_id moved to metadata
print(prepared["dataset_id"])  # None
print(prepared["metadata"]["offline_dataset_id"])  # "EXT-abc123"
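The transformation shown above can be approximated as follows. This is a minimal sketch that reproduces the documented before/after behavior for `dataset_id`. It is not the library's code. The name `sketch_prepare_run_request_data` is hypothetical. This page does not say how `datapoint_ids` is attached, so storing it under a `datapoint_ids` key is an assumption. The sketch copies its input so that the caller's dictionary is left unchanged.

```python
import copy
from typing import Any, Dict, List, Optional


def sketch_prepare_run_request_data(
    run_data: Dict[str, Any], datapoint_ids: Optional[List[str]] = None
) -> Dict[str, Any]:
    """Hypothetical sketch: move an EXT- dataset_id into metadata.offline_dataset_id."""
    prepared = copy.deepcopy(run_data)  # do not mutate the caller's dict
    dataset_id = prepared.get("dataset_id")
    if isinstance(dataset_id, str) and dataset_id.startswith("EXT-"):
        metadata = prepared.get("metadata") or {}
        metadata["offline_dataset_id"] = dataset_id
        prepared["metadata"] = metadata
        prepared["dataset_id"] = None  # matches the documented output above
    if datapoint_ids is not None:
        prepared["datapoint_ids"] = list(datapoint_ids)  # hypothetical key name
    return prepared


run_data = {
    "name": "my-experiment",
    "project": "my-project",
    "dataset_id": "EXT-abc123",
    "event_ids": [],
}

prepared = sketch_prepare_run_request_data(run_data)
print(prepared["dataset_id"])
print(prepared["metadata"]["offline_dataset_id"])
```

In this sketch, a dataset_id without the EXT- prefix passes through unchanged.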

See Also