Experiments

Run, retrieve, and compare evaluation runs to measure how prompt or configuration changes affect agent performance.

`list-runs`

Get a list of evaluation runs

List experiment runs with optional filtering by dataset, status, name, date range, and specific run IDs. Results are paginated and sortable.

Usage

honeyhive experiments list-runs [options]

Options

Flag	Type	Required	Description
`--dataset-id`	string	no	Filter by dataset ID
`--date-range`	json	no	Filter by date range
`--limit`	number	no	Number of results per page
`--name`	string	no	Filter by run name
`--page`	number	no	Page number for pagination
`--run-ids`	json	no	List of specific run IDs to fetch
`--sort-by`	string	no	Field to sort by. Allowed: `created_at`, `updated_at`, `name`, `status`.
`--sort-order`	string	no	Sort order. Allowed: `asc`, `desc`.
`--status`	string	no	Filter by run status. Allowed: `pending`, `completed`, `failed`, `cancelled`, `running`.
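These flags can be combined into small shell functions. The sketch below assumes `honeyhive` is on `PATH`. `hh_recent_completed_runs` and `hh_runs_by_id` are hypothetical helper names, not part of the CLI. `hh_runs_by_id` also assumes that `--run-ids` accepts a JSON array of run ID strings, which this page does not spell out.

```shell
# Hypothetical helpers (not part of the honeyhive CLI). They assume
# `honeyhive` is on PATH and print whatever the CLI prints.

# List the most recently created completed runs, newest first.
# Usage: hh_recent_completed_runs [limit] [extra list-runs flags...]
hh_recent_completed_runs() {
  limit="${1:-10}"                 # page size; 10 is an arbitrary default
  [ "$#" -gt 0 ] && shift          # drop the limit so "$@" holds only extra flags
  honeyhive experiments list-runs \
    --status completed \
    --sort-by created_at \
    --sort-order desc \
    --limit "$limit" \
    "$@"
}

# Fetch specific runs by ID.
# ASSUMPTION: --run-ids accepts a JSON array of ID strings, e.g. ["id-1","id-2"].
# IDs are not JSON-escaped, which is fine for UUIDs.
hh_runs_by_id() {
  ids_json=""
  for id in "$@"; do
    ids_json="${ids_json:+$ids_json,}\"$id\""   # comma-separate after the first ID
  done
  honeyhive experiments list-runs --run-ids "[$ids_json]"
}
```

For example, `hh_recent_completed_runs 20 --page 2` forwards `--page 2` unchanged. Building the JSON array inside `hh_runs_by_id` keeps the quoting in one place. This is safe for UUIDs but not for IDs that contain quotes or backslashes.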

`create-run`

Create a new evaluation run

Create a new experiment run to track an evaluation against a dataset.

Usage

honeyhive experiments create-run [options]

Options

Flag	Type	Required	Description
`--configuration`	json	no	Configuration associated with the run
`--datapoint-ids`	json	no	IDs of the datapoints included in the run
`--dataset-id`	string	no	ID of the dataset the run evaluates
`--description`	string	no	Run description
`--evaluators`	json	no	Evaluators applied in the run
`--event-ids`	json	no	IDs of the events associated with the run
`--metadata`	json	no	Metadata to attach to the run
`--name`	string	no	Run name
`--passing-ranges`	json	no	Passing ranges for the run
`--results`	json	no	Results of the run
`--run-id`	string	no	Run ID
`--session-ids`	json	no	IDs of the sessions associated with the run
`--status`	string	no	Run status. Allowed: `pending`, `completed`, `failed`, `cancelled`, `running`.
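A run is usually created before its results are recorded. The sketch below wraps `create-run` in a hypothetical `hh_start_run` function (not part of the CLI). The function sets the name and dataset and marks the run `pending`. Starting in `pending` is an illustrative choice, not a documented requirement. Because this page does not describe the output format of `create-run`, the function does not try to extract the new run's ID.

```shell
# Hypothetical helper (not part of the honeyhive CLI): create a run against a
# dataset, marked pending. Extra create-run flags are forwarded unchanged.
# Usage: hh_start_run <name> <dataset-id> [extra create-run flags...]
hh_start_run() {
  if [ "$#" -lt 2 ]; then
    echo "usage: hh_start_run <name> <dataset-id> [flags...]" >&2
    return 2
  fi
  name="$1"
  dataset_id="$2"
  shift 2
  honeyhive experiments create-run \
    --name "$name" \
    --dataset-id "$dataset_id" \
    --status pending \
    "$@"
}
```

For example, `hh_start_run prompt-v2 "$DATASET_ID" --metadata '{"prompt_version": "v2"}'` records which prompt variant the run used. The `prompt_version` key is a hypothetical example, and the example assumes `--metadata` accepts a JSON object. Single quotes keep the shell from interpreting the JSON.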

`get-runs-schema`

Get the events schema across all experiment runs in a project

Retrieve the aggregated events schema (fields, datasets, mappings) across all experiment runs in the project.

Usage

honeyhive experiments get-runs-schema [options]

Options

Flag	Type	Required	Description
`--date-range`	json	no	Filter by date range
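Saving the schema to a file lets you compare snapshots taken at different times. The hypothetical `hh_save_runs_schema` function below (not part of the CLI) does this. It passes any `--date-range` value through verbatim because this page does not document the expected JSON shape.

```shell
# Hypothetical helper (not part of the honeyhive CLI): write the project-wide
# runs schema to a file. The optional date range is passed through verbatim
# because its JSON shape is not documented on this page.
# Usage: hh_save_runs_schema <outfile> [date-range-json]
hh_save_runs_schema() {
  if [ -z "${1:-}" ]; then
    echo "usage: hh_save_runs_schema <outfile> [date-range-json]" >&2
    return 2
  fi
  if [ -n "${2:-}" ]; then
    honeyhive experiments get-runs-schema --date-range "$2" > "$1"
  else
    honeyhive experiments get-runs-schema > "$1"
  fi
}
```

For example, run `hh_save_runs_schema schema-before.txt` before a change and `hh_save_runs_schema schema-after.txt` after it, then compare the two files with `diff`.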

`get-run`

Get details of an evaluation run

Retrieve the full details of a single experiment run by its run ID.

Usage

honeyhive experiments get-run [options]

Options

Flag	Type	Required	Description
`--run-id`	string	yes	ID of the run to retrieve
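`get-run` takes one run ID per call. The hypothetical `hh_get_runs` function below (not part of the CLI) loops over several IDs and stops at the first failure. For a single batched request, `list-runs --run-ids` is the documented alternative.

```shell
# Hypothetical helper (not part of the honeyhive CLI): print the details of
# each run ID given, one get-run call per ID, stopping at the first failure.
# Usage: hh_get_runs <run-id>...
hh_get_runs() {
  for run_id in "$@"; do
    honeyhive experiments get-run --run-id "$run_id" || return
  done
}
```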

`update-run`

Update an evaluation run

Update fields on an existing experiment run, such as its name, status, metadata, or results.

Usage

honeyhive experiments update-run [options]

Options

Flag	Type	Required	Description
`--run-id`	string	yes	ID of the run to update
`--configuration`	json	no	Configuration associated with the run
`--datapoint-ids`	json	no	IDs of the datapoints included in the run
`--description`	string	no	Run description
`--evaluators`	json	no	Evaluators applied in the run
`--event-ids`	json	no	IDs of the events associated with the run
`--metadata`	json	no	Metadata attached to the run
`--name`	string	no	Run name
`--passing-ranges`	json	no	Passing ranges for the run
`--results`	json	no	Results of the run
`--session-ids`	json	no	IDs of the sessions associated with the run
`--status`	string	no	Run status. Allowed: `pending`, `completed`, `failed`, `cancelled`, `running`.
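A common update is closing out a run. The hypothetical `hh_set_run_status` function below (not part of the CLI) checks the status against the allowed values listed in the table before it calls `update-run`. A typo therefore fails locally instead of reaching the API.

```shell
# Hypothetical helper (not part of the honeyhive CLI): set a run's status,
# validating it locally against the documented allowed values first.
# Usage: hh_set_run_status <run-id> <status> [extra update-run flags...]
hh_set_run_status() {
  if [ "$#" -lt 2 ]; then
    echo "usage: hh_set_run_status <run-id> <status> [flags...]" >&2
    return 2
  fi
  run_id="$1"
  run_status="$2"
  shift 2
  case "$run_status" in
    pending|completed|failed|cancelled|running) ;;   # allowed values
    *) echo "invalid status: $run_status" >&2; return 2 ;;
  esac
  honeyhive experiments update-run --run-id "$run_id" --status "$run_status" "$@"
}
```

For example, `hh_set_run_status "$RUN_ID" completed --results "$RESULTS_JSON"` closes a run and attaches its results in one call. The caller supplies the results JSON, whose shape this page does not document.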

`delete-run`

Delete an evaluation run

Permanently delete an experiment run by its run ID.

Usage

honeyhive experiments delete-run [options]

Options

Flag	Type	Required	Description
`--run-id`	string	yes	ID of the run to delete
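Because deletion is permanent, a script may want an explicit confirmation step. The hypothetical `hh_delete_run_confirmed` function below (not part of the CLI) prompts on standard error and reads the answer from standard input. It calls `delete-run` only if the answer is `y` or `yes`.

```shell
# Hypothetical helper (not part of the honeyhive CLI): ask for confirmation on
# stderr, read the answer from stdin, and delete the run only on y/yes.
# Usage: hh_delete_run_confirmed <run-id>
hh_delete_run_confirmed() {
  if [ -z "${1:-}" ]; then
    echo "usage: hh_delete_run_confirmed <run-id>" >&2
    return 2
  fi
  printf 'Permanently delete run %s? [y/N] ' "$1" >&2
  answer=""
  read -r answer || :   # EOF leaves the answer empty, which counts as "no"
  case "$answer" in
    y|Y|yes|YES) honeyhive experiments delete-run --run-id "$1" ;;
    *) echo "aborted" >&2; return 1 ;;
  esac
}
```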

`get-run-schema`

Get the events schema for a single experiment run

Retrieve the events schema (fields, datasets, mappings) for a single experiment run.

Usage

honeyhive experiments get-run-schema [options]

Options

Flag	Type	Required	Description
`--run-id`	string	yes	Experiment run ID (UUIDv4)
`--date-range`	json	no	Filter by date range
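Comparing the schemas of two runs shows whether a change added or dropped event fields. The hypothetical `hh_diff_run_schemas` function below (not part of the CLI) writes each schema to a temporary file and runs `diff -u` on the two files. It assumes the CLI prints the schema as stable text, so that a line diff is meaningful. This page does not guarantee that.

```shell
# Hypothetical helper (not part of the honeyhive CLI): show a unified diff of
# two runs' events schemas. Returns 0 if identical, 1 if they differ (as diff
# does), or the CLI's exit status if a fetch fails.
# ASSUMPTION: the schema output is stable text, so a line diff is meaningful.
hh_diff_run_schemas() {
  if [ "$#" -ne 2 ]; then
    echo "usage: hh_diff_run_schemas <old-run-id> <new-run-id>" >&2
    return 2
  fi
  old_file=$(mktemp) || return
  new_file=$(mktemp) || { rm -f "$old_file"; return 1; }
  rc=0
  honeyhive experiments get-run-schema --run-id "$1" > "$old_file" &&
    honeyhive experiments get-run-schema --run-id "$2" > "$new_file" &&
    diff -u "$old_file" "$new_file" || rc=$?
  rm -f "$old_file" "$new_file"   # always clean up the temp files
  return "$rc"
}
```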

`get-run-metrics`

Get event metrics for an experiment run

Retrieve event metrics from ClickHouse for a specific experiment run.

Usage

honeyhive experiments get-run-metrics [options]

Options

Flag	Type	Required	Description
`--run-id`	string	yes	Experiment run ID (UUIDv4)
`--date-range`	string	no	Date range filter, as a JSON string
`--filters`	json	no	Optional filters to apply (JSON string or array of filter objects)
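To keep metrics for several runs side by side, the hypothetical `hh_save_run_metrics` function below (not part of the CLI) writes one file per run into a directory. It only redirects the CLI's standard output. The `.out` extension is arbitrary because this page does not specify the output format.

```shell
# Hypothetical helper (not part of the honeyhive CLI): save event metrics for
# several runs, one file per run ID, into a directory (created if missing).
# Usage: hh_save_run_metrics <outdir> <run-id>...
hh_save_run_metrics() {
  if [ "$#" -lt 2 ]; then
    echo "usage: hh_save_run_metrics <outdir> <run-id>..." >&2
    return 2
  fi
  outdir="$1"
  shift
  mkdir -p "$outdir" || return
  for run_id in "$@"; do
    honeyhive experiments get-run-metrics --run-id "$run_id" > "$outdir/$run_id.out" || return
  done
}
```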

`compare-runs`

Retrieve an experiment comparison

Compare metrics and results between two experiment runs.

Usage

honeyhive experiments compare-runs [options]

Options

Flag	Type	Required	Description
`--new-run-id`	string	yes	New experiment run ID to compare (UUIDv4)
`--old-run-id`	string	yes	Old experiment run ID to compare against (UUIDv4)
`--aggregate-function`	string	no	Aggregation function to apply to metrics. Allowed: `average`, `min`, `max`, `median`, `p95`, `p99`, `p90`, `sum`, `count`.
`--filters`	json	no	Optional filters to apply (JSON string or array of filter objects)
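A single aggregate can hide regressions in the tail, so it can help to compare runs under several aggregation functions. The hypothetical `hh_compare` function below (not part of the CLI) runs `compare-runs` once per function. By default it uses `average` and `p95`, an illustrative pick from the allowed values.

```shell
# Hypothetical helper (not part of the honeyhive CLI): compare a candidate run
# against a baseline once per aggregation function. Defaults to average and
# p95, an illustrative pick from the allowed values.
# Usage: hh_compare <old-run-id> <new-run-id> [aggregate-function...]
hh_compare() {
  if [ "$#" -lt 2 ]; then
    echo "usage: hh_compare <old-run-id> <new-run-id> [aggregate-function...]" >&2
    return 2
  fi
  old_id="$1"
  new_id="$2"
  shift 2
  [ "$#" -gt 0 ] || set -- average p95   # default aggregation functions
  for agg in "$@"; do
    echo "== $agg =="
    honeyhive experiments compare-runs \
      --old-run-id "$old_id" \
      --new-run-id "$new_id" \
      --aggregate-function "$agg" || return
  done
}
```

For example, `hh_compare "$BASELINE_RUN" "$CANDIDATE_RUN" median p99` prints one labeled comparison for each function.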

`compare-run-events`

Compare events between two experiment runs

Retrieve and compare events between two experiment runs for detailed analysis.

Usage

honeyhive experiments compare-run-events [options]

Options

Flag	Type	Required	Description
`--new-run-id`	string	yes	New experiment run ID (UUIDv4)
`--old-run-id`	string	yes	Old experiment run ID to compare against (UUIDv4)
`--event-name`	string	no	Filter by event name
`--event-type`	string	no	Filter by event type
`--filter`	json	no	Additional filter criteria (JSON string or object)
`--limit`	number	no	Maximum number of results
`--page`	number	no	Page number for pagination
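Event comparisons are paginated with `--limit` and `--page`. The hypothetical `hh_compare_events` function below (not part of the CLI) fetches the first N pages for one event name. It assumes pages are numbered from 1, and it does not detect the end of the results, so the caller chooses the page count. The default page size of 50 is arbitrary.

```shell
# Hypothetical helper (not part of the honeyhive CLI): fetch the first N pages
# of event comparisons between two runs for one event name.
# ASSUMPTIONS: pages are numbered from 1; the end of the results is not
# detected, so the caller picks the page count. Page size 50 is arbitrary.
# Usage: hh_compare_events <old-run-id> <new-run-id> <event-name> [pages] [page-size]
hh_compare_events() {
  if [ "$#" -lt 3 ]; then
    echo "usage: hh_compare_events <old-run-id> <new-run-id> <event-name> [pages] [page-size]" >&2
    return 2
  fi
  pages="${4:-1}"
  size="${5:-50}"
  page=1
  while [ "$page" -le "$pages" ]; do
    honeyhive experiments compare-run-events \
      --old-run-id "$1" \
      --new-run-id "$2" \
      --event-name "$3" \
      --limit "$size" \
      --page "$page" || return
    page=$((page + 1))
  done
}
```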
