Experiments

Run, retrieve, and compare evaluation runs to measure how prompt or configuration changes affect agent performance.

list-runs

Get a list of evaluation runs

List experiment runs with optional filtering by dataset, status, name, date range, and specific run IDs. Results are paginated and sortable.

Usage

sh
honeyhive experiments list-runs [options]

Options

| Flag | Type | Required | Description |
| --- | --- | --- | --- |
| `--dataset-id` | string | no | Filter by dataset ID |
| `--date-range` | json | no | Filter by date range |
| `--limit` | number | no | Number of results per page |
| `--name` | string | no | Filter by run name |
| `--page` | number | no | Page number for pagination |
| `--run-ids` | json | no | List of specific run IDs to fetch |
| `--sort-by` | string | no | Field to sort by. Allowed: `created_at`, `updated_at`, `name`, `status`. |
| `--sort-order` | string | no | Sort order. Allowed: `asc`, `desc`. |
| `--status` | string | no | Filter by run status. Allowed: `pending`, `completed`, `failed`, `cancelled`, `running`. |
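
As a rough illustration of combining the filter, sort, and pagination flags, the sketch below lists the 20 most recently created completed runs. The flag names come from the options above; the values are placeholders, and the script assumes the `honeyhive` CLI is already installed and authenticated. It prints the command first and only executes it when the CLI is found on `PATH`.

```sh
# Placeholder values; the flags themselves come from the options above.
set -- honeyhive experiments list-runs \
  --status completed \
  --sort-by created_at \
  --sort-order desc \
  --limit 20 \
  --page 1

# Show the exact command, then run it only if the CLI is installed.
echo "$*"
if command -v honeyhive >/dev/null 2>&1; then
  "$@"
fi
```

Raising `--page` while keeping `--limit` fixed should step through later pages of the same result set.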

create-run

Create a new evaluation run

Create a new experiment run to track an evaluation against a dataset.

Usage

sh
honeyhive experiments create-run [options]

Options

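| Flag | Type | Required | Description |
| --- | --- | --- | --- |
| `--configuration` | json | no | configuration |
| `--datapoint-ids` | json | no | datapoint_ids |
| `--dataset-id` | string | no | dataset_id |
| `--description` | string | no | description |
| `--evaluators` | json | no | evaluators |
| `--event-ids` | json | no | event_ids |
| `--metadata` | json | no | metadata |
| `--name` | string | no | name |
| `--passing-ranges` | json | no | passing_ranges |
| `--results` | json | no | results |
| `--run-id` | string | no | run_id |
| `--session-ids` | json | no | session_ids |
| `--status` | string | no | status. Allowed: `pending`, `completed`, `failed`, `cancelled`, `running`. |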
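
A minimal sketch of creating a run against a dataset might look like the following. The run name, dataset ID, and metadata keys are hypothetical, and the shape of the `--metadata` value (a JSON object) is an assumption, since the options above only state that the flag takes JSON. Keeping the JSON in a variable and quoting it ensures the shell passes it as a single argument.

```sh
# Hypothetical values; replace with IDs from your own project.
DATASET_ID="your-dataset-id"
# Assumption: --metadata accepts a JSON object.
METADATA='{"prompt_version": "v2"}'

set -- honeyhive experiments create-run \
  --name "prompt-v2-baseline" \
  --dataset-id "$DATASET_ID" \
  --status pending \
  --metadata "$METADATA"

# Show the exact command, then run it only if the CLI is installed.
echo "$*"
if command -v honeyhive >/dev/null 2>&1; then
  "$@"
fi
```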

get-runs-schema

Get events schema across all experiment runs in a project

Retrieve the aggregated events schema (fields, datasets, mappings) across all experiment runs in the project.

Usage

sh
honeyhive experiments get-runs-schema [options]

Options

| Flag | Type | Required | Description |
| --- | --- | --- | --- |
| `--date-range` | json | no | Filter by date range |

get-run

Get details of an evaluation run

Retrieve the full details of a single experiment run by its run ID.

Usage

sh
honeyhive experiments get-run [options]

Options

| Flag | Type | Required | Description |
| --- | --- | --- | --- |
| `--run-id` | string | yes | run_id |

update-run

Update an evaluation run

Update fields on an existing experiment run such as name, status, metadata, or results.

Usage

sh
honeyhive experiments update-run [options]

Options

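| Flag | Type | Required | Description |
| --- | --- | --- | --- |
| `--run-id` | string | yes | run_id |
| `--configuration` | json | no | configuration |
| `--datapoint-ids` | json | no | datapoint_ids |
| `--description` | string | no | description |
| `--evaluators` | json | no | evaluators |
| `--event-ids` | json | no | event_ids |
| `--metadata` | json | no | metadata |
| `--name` | string | no | name |
| `--passing-ranges` | json | no | passing_ranges |
| `--results` | json | no | results |
| `--session-ids` | json | no | session_ids |
| `--status` | string | no | status. Allowed: `pending`, `completed`, `failed`, `cancelled`, `running`. |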
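
One way to guard a status update is to check the value against the allowed list before calling the CLI, as sketched below. The run ID is a placeholder UUID and the description text is hypothetical; only the flag names and the allowed status values are taken from the options above.

```sh
# Placeholder run ID; substitute a real one from list-runs.
RUN_ID="11111111-1111-4111-8111-111111111111"
STATUS="completed"

# Reject anything outside the allowed status values before calling the CLI.
case "$STATUS" in
  pending|completed|failed|cancelled|running) ;;
  *) echo "unsupported status: $STATUS" >&2; exit 1 ;;
esac

set -- honeyhive experiments update-run \
  --run-id "$RUN_ID" \
  --status "$STATUS" \
  --description "Re-scored after evaluator fix"

# Show the exact command, then run it only if the CLI is installed.
echo "$*"
if command -v honeyhive >/dev/null 2>&1; then
  "$@"
fi
```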

delete-run

Delete an evaluation run

Permanently delete an experiment run by its run ID.

Usage

sh
honeyhive experiments delete-run [options]

Options

| Flag | Type | Required | Description |
| --- | --- | --- | --- |
| `--run-id` | string | yes | run_id |

get-run-schema

Get events schema for a single experiment run

Retrieve the events schema (fields, datasets, mappings) for a single experiment run.

Usage

sh
honeyhive experiments get-run-schema [options]

Options

| Flag | Type | Required | Description |
| --- | --- | --- | --- |
| `--run-id` | string | yes | Experiment run ID (UUIDv4) |
| `--date-range` | json | no | Filter by date range |

get-run-metrics

Get event metrics for an experiment run

Retrieve event metrics from ClickHouse for a specific experiment run.

Usage

sh
honeyhive experiments get-run-metrics [options]

Options

| Flag | Type | Required | Description |
| --- | --- | --- | --- |
| `--run-id` | string | yes | Experiment run ID (UUIDv4) |
| `--date-range` | string | no | Date range filter as JSON string |
| `--filters` | json | no | Optional filters to apply (JSON string or array of filter objects) |

compare-runs

Retrieve experiment comparison

Compare metrics and results between two experiment runs.

Usage

sh
honeyhive experiments compare-runs [options]

Options

| Flag | Type | Required | Description |
| --- | --- | --- | --- |
| `--new-run-id` | string | yes | New experiment run ID to compare (UUIDv4) |
| `--old-run-id` | string | yes | Old experiment run ID to compare against (UUIDv4) |
| `--aggregate-function` | string | no | Aggregation function to apply to metrics. Allowed: `average`, `min`, `max`, `median`, `p95`, `p99`, `p90`, `sum`, `count`. |
| `--filters` | json | no | Optional filters to apply (JSON string or array of filter objects) |
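
Because a single aggregate can hide tail behavior, it may be useful to compare the same pair of runs under more than one aggregation function. The sketch below loops over `average` and `p95`, both taken from the allowed values above. The two run IDs are placeholder UUIDs, and the loop itself is only an illustration, not a documented workflow.

```sh
# Placeholder UUIDs; substitute the baseline and candidate run IDs.
OLD_RUN_ID="11111111-1111-4111-8111-111111111111"
NEW_RUN_ID="22222222-2222-4222-8222-222222222222"

# Compare the same two runs once per aggregation function.
for AGG in average p95; do
  set -- honeyhive experiments compare-runs \
    --old-run-id "$OLD_RUN_ID" \
    --new-run-id "$NEW_RUN_ID" \
    --aggregate-function "$AGG"

  # Show the exact command, then run it only if the CLI is installed.
  echo "$*"
  if command -v honeyhive >/dev/null 2>&1; then
    "$@"
  fi
done
```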

compare-run-events

Compare events between two experiment runs

Retrieve and compare events between two experiment runs for detailed analysis.

Usage

sh
honeyhive experiments compare-run-events [options]

Options

| Flag | Type | Required | Description |
| --- | --- | --- | --- |
| `--new-run-id` | string | yes | New experiment run ID (UUIDv4) |
| `--old-run-id` | string | yes | Old experiment run ID to compare against (UUIDv4) |
| `--event-name` | string | no | Filter by event name |
| `--event-type` | string | no | Filter by event type |
| `--filter` | json | no | Additional filter criteria (JSON string or object) |
| `--limit` | number | no | Maximum number of results |
| `--page` | number | no | Page number for pagination |