Evaluation & Analysis Guides

Problem-solving guides for running experiments and evaluating LLM outputs in HoneyHive.

Tip

New to experiments? Start with Tutorial 5: Run Your First Experiment. It walks you through running your first experiment with evaluators in 15 minutes!

Overview

Experiments in HoneyHive help you systematically test and improve AI applications. These guides show you how to solve specific evaluation challenges.

What You Can Do:

  • Run experiments with the evaluate() function (see the sketch after this list)

  • Create custom evaluators to measure quality

  • Compare experiments to track improvements

  • Manage datasets for systematic testing

  • Evaluate multi-step pipelines and agents

  • Analyze results to identify patterns

  • Apply best practices for reliable evaluation
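
To give a sense of how the first two items fit together, here is a minimal sketch of running an experiment with a custom evaluator. It assumes the HoneyHive Python SDK exposes an evaluate() function and an evaluator decorator; the import paths and parameter names shown (function, dataset, evaluators, hh_api_key, hh_project) are assumptions for illustration, so check the SDK reference and Tutorial 5 for the exact signature.

```python
# Minimal sketch: one experiment, one inline test case, one custom evaluator.
# Assumes the HoneyHive Python SDK exposes `evaluate` and `evaluator`;
# parameter names here are illustrative -- confirm against the SDK reference.
from honeyhive import evaluate, evaluator


@evaluator()
def exact_match(outputs, inputs, ground_truths):
    # Custom evaluator: score 1 if the pipeline output matches the expected answer.
    return 1 if outputs == ground_truths.get("answer") else 0


def my_pipeline(inputs, ground_truths):
    # The function under test -- replace with your own LLM call or agent pipeline.
    return "Paris" if "capital of France" in inputs["question"] else "unknown"


if __name__ == "__main__":
    evaluate(
        function=my_pipeline,                   # application code to evaluate
        hh_api_key="<YOUR_HONEYHIVE_API_KEY>",  # placeholder credential
        hh_project="<YOUR_PROJECT>",            # placeholder project name
        name="first-experiment",                # experiment name shown in the UI
        dataset=[                               # inline dataset with one test case
            {
                "inputs": {"question": "What is the capital of France?"},
                "ground_truths": {"answer": "Paris"},
            }
        ],
        evaluators=[exact_match],               # custom evaluators run per test case
    )
```

Once the run completes, the experiment and its evaluator scores appear in the HoneyHive dashboard, where you can compare runs as described in the comparison guide.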

See the guides below for specific evaluation scenarios.