Utility Functions
Helper functions for dataset preparation and ID generation.
generate_external_dataset_id()
- generate_external_dataset_id(datapoints, custom_id=None)
Generate a unique EXT- prefixed dataset ID for client-side datasets.
- Parameters:
- Returns:
EXT- prefixed dataset ID
- Return type:
Usage:
    from honeyhive.experiments import generate_external_dataset_id

    dataset = [
        {"inputs": {"x": 1}, "ground_truth": {"y": 2}},
        {"inputs": {"x": 2}, "ground_truth": {"y": 4}},
    ]

    dataset_id = generate_external_dataset_id(dataset)
    print(dataset_id)  # e.g., "EXT-a1b2c3d4e5f6"
With Custom ID:
    dataset_id = generate_external_dataset_id(dataset, custom_id="my-test")
    print(dataset_id)  # e.g., "EXT-my-test-a1b2c3d4"
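The sample outputs above suggest that the ID is derived from the dataset contents, but this page does not specify how. The following is only a minimal sketch under that assumption: the function name `sketch_external_id`, the use of SHA-256 over canonical JSON, and the 12- and 8-character truncations are all hypothetical and are not taken from the SDK.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional


def sketch_external_id(
    datapoints: List[Dict[str, Any]], custom_id: Optional[str] = None
) -> str:
    """Hypothetical stand-in; NOT the SDK's implementation."""
    # Canonical JSON (sorted keys, fixed separators) so that equal content
    # always produces the same hash, regardless of dict key order.
    payload = json.dumps(datapoints, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    if custom_id:
        # ASSUMPTION: the custom ID is embedded, followed by a short hash.
        return f"EXT-{custom_id}-{digest[:8]}"
    return f"EXT-{digest[:12]}"


dataset = [
    {"inputs": {"x": 1}, "ground_truth": {"y": 2}},
    {"inputs": {"x": 2}, "ground_truth": {"y": 4}},
]
first = sketch_external_id(dataset)
second = sketch_external_id(dataset)
```

With a content-based scheme like this, calling the function twice on the same dataset yields the same ID, which is what makes it usable as a stable client-side identifier. Whether the real function behaves this way should be checked against the SDK.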
generate_external_datapoint_id()
- generate_external_datapoint_id(datapoint, index, custom_id=None)
Generate a unique EXT- prefixed datapoint ID.
- Parameters:
- Returns:
EXT- prefixed datapoint ID
- Return type:
Usage:
    from honeyhive.experiments import generate_external_datapoint_id

    datapoint = {"inputs": {"x": 1}, "ground_truth": {"y": 2}}

    dp_id = generate_external_datapoint_id(datapoint, index=0)
    print(dp_id)  # e.g., "EXT-d1e2f3a4b5c6"
prepare_external_dataset()
- prepare_external_dataset(datapoints, custom_dataset_id=None)
Prepare a list of datapoints for an external dataset.
Ensures all datapoints have EXT- prefixed IDs and generates a dataset ID if custom_dataset_id is not provided.
- Parameters:
- Returns:
Tuple of (dataset_id, list of datapoint_ids)
- Return type:
Usage:
    from honeyhive.experiments import prepare_external_dataset

    dataset = [
        {"inputs": {"query": "Q1"}, "ground_truth": {"answer": "A1"}},
        {"inputs": {"query": "Q2"}, "ground_truth": {"answer": "A2"}},
    ]

    dataset_id, datapoint_ids = prepare_external_dataset(dataset)
    print(f"Dataset ID: {dataset_id}")
    print(f"Datapoint IDs: {datapoint_ids}")

    # Output:
    # Dataset ID: EXT-abc123def456
    # Datapoint IDs: ['EXT-dp1hash', 'EXT-dp2hash']
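Because the function returns the datapoint IDs as a list, a common follow-up is to pair each ID with its datapoint. The sketch below does this with plain Python and no SDK calls. It assumes, without confirmation from this page, that the IDs come back in the same order as the input datapoints; the ID values are the placeholders from the sample output above.

```python
from typing import Any, Dict, List

dataset: List[Dict[str, Any]] = [
    {"inputs": {"query": "Q1"}, "ground_truth": {"answer": "A1"}},
    {"inputs": {"query": "Q2"}, "ground_truth": {"answer": "A2"}},
]
# Placeholder values copied from the sample output; real IDs will differ.
datapoint_ids: List[str] = ["EXT-dp1hash", "EXT-dp2hash"]

# ASSUMPTION: IDs are returned in the same order as the input datapoints.
if len(datapoint_ids) != len(dataset):
    raise ValueError("expected exactly one ID per datapoint")

# Build an ID -> datapoint lookup for later joins against run results.
by_id: Dict[str, Dict[str, Any]] = dict(zip(datapoint_ids, dataset))

print(by_id["EXT-dp2hash"]["inputs"]["query"])  # Q2
```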
prepare_run_request_data()
- prepare_run_request_data(run_data, datapoint_ids=None)
Prepare experiment run request data for backend submission.
Moves an EXT- prefixed dataset_id into metadata.offline_dataset_id, as required by the backend.
- Parameters:
- Returns:
Transformed run data ready for backend
- Return type:
Dict[str, Any]
Note
This is typically used internally by evaluate(). Most users don’t need to call this directly.
Usage:
    from honeyhive.experiments import prepare_run_request_data

    run_data = {
        "name": "my-experiment",
        "project": "my-project",
        "dataset_id": "EXT-abc123",
        "event_ids": [],
    }

    prepared = prepare_run_request_data(run_data)

    # EXT- dataset_id moved to metadata
    print(prepared["dataset_id"])  # None
    print(prepared["metadata"]["offline_dataset_id"])  # "EXT-abc123"
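To make the transformation concrete, here is a minimal sketch of the behavior shown in the usage example, written as a standalone function. The name `sketch_prepare_run_request` is hypothetical, and the handling of non-EXT- IDs and of existing metadata keys is an assumption; the SDK's actual implementation may differ.

```python
from typing import Any, Dict


def sketch_prepare_run_request(run_data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in; NOT the SDK's implementation."""
    prepared = dict(run_data)  # shallow copy so the caller's dict is untouched
    # ASSUMPTION: existing metadata keys are preserved.
    metadata = dict(prepared.get("metadata") or {})

    dataset_id = prepared.get("dataset_id")
    if isinstance(dataset_id, str) and dataset_id.startswith("EXT-"):
        # Move the external ID into metadata and clear the top-level field.
        metadata["offline_dataset_id"] = dataset_id
        prepared["dataset_id"] = None
    # ASSUMPTION: IDs without the EXT- prefix are passed through unchanged.

    prepared["metadata"] = metadata
    return prepared


run_data = {
    "name": "my-experiment",
    "project": "my-project",
    "dataset_id": "EXT-abc123",
    "event_ids": [],
}
prepared = sketch_prepare_run_request(run_data)
print(prepared["dataset_id"])  # None
print(prepared["metadata"]["offline_dataset_id"])  # EXT-abc123
```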
See Also
Core Functions - Use these utilities via evaluate()
Evaluation & Analysis Guides - Tutorial