Eval Cases

Eval cases are individual test cases within an evaluation file. Each case defines input messages, expected outcomes, and optional evaluator overrides.

evalcases:
  - id: addition
    expected_outcome: Correctly calculates 15 + 27 = 42
    input: What is 15 + 27?
    expected_output: "42"

| Field | Required | Description |
| --- | --- | --- |
| id | Yes | Unique identifier for the eval case |
| expected_outcome | Yes | Description of what a correct response should contain |
| input | Yes | Input sent to the target (string, object, or message array). Alias: input_messages |
| expected_output | No | Expected response for comparison (string, object, or message array). Alias: expected_messages |
| execution | No | Per-case execution overrides (target, evaluators) |
| rubrics | No | Structured evaluation criteria |
| sidecar | No | Additional metadata passed to evaluators |
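
The required fields and aliases in the table can be checked mechanically before a run. The following is a minimal sketch, not AgentV's own loader: the function name `canonicalize_case` and the choice to reject a case that sets both a field and its alias are assumptions made for illustration.

```python
from typing import Any

# Required fields and aliases as listed in the field table above.
REQUIRED_FIELDS = ("id", "expected_outcome", "input")
ALIASES = {"input_messages": "input", "expected_messages": "expected_output"}


def canonicalize_case(case: dict[str, Any]) -> dict[str, Any]:
    """Rename aliased keys to their canonical names and check required fields.

    Hypothetical helper: rejecting a case that sets both a field and its
    alias is an assumption, not documented behavior.
    """
    result: dict[str, Any] = {}
    for key, value in case.items():
        canonical = ALIASES.get(key, key)
        if canonical in result:
            raise ValueError(f"field set twice (directly or via alias): {canonical}")
        result[canonical] = value

    missing = [name for name in REQUIRED_FIELDS if name not in result]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return result
```

A case written with `input_messages` comes back keyed by `input`, and a case without `expected_outcome` raises a `ValueError` naming the missing field.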

The simplest form of input is a string, which expands to a single user message:

input: What is 15 + 27?

For multi-turn or system messages, use a message array:

input:
  - role: system
    content: You are a helpful math tutor.
  - role: user
    content: What is 15 + 27?

The expected_output field is an optional reference response that evaluators can compare against. A string expands to a single assistant message:

expected_output: "42"

For structured or multi-message expected output, use a message array:

expected_output:
  - role: assistant
    content: "42"
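
The string shorthand for both fields can be pictured as a small normalization step. This is a sketch under assumptions: `to_messages` is a hypothetical name, and the object form mentioned in the field table is deliberately left out because its expansion is not described here.

```python
from typing import Any, Union

Message = dict[str, Any]


def to_messages(value: Union[str, list[Message]], default_role: str) -> list[Message]:
    """Expand the string shorthand into a one-element message array.

    default_role is "user" for input and "assistant" for expected_output,
    matching the expansions described above. Message arrays pass through.
    """
    if isinstance(value, str):
        return [{"role": default_role, "content": value}]
    if isinstance(value, list):
        return list(value)
    # The object form is not covered by this sketch.
    raise TypeError(f"unsupported value type: {type(value).__name__}")


input_messages = to_messages("What is 15 + 27?", default_role="user")
expected_messages = to_messages("42", default_role="assistant")
```

Under these assumptions, the string and array forms of the same content produce identical message lists, so evaluators only need to handle one shape.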

Use the execution field to override the default target or evaluators for a specific case:

evalcases:
  - id: complex-case
    expected_outcome: Provides detailed explanation
    input: Explain quicksort algorithm
    execution:
      target: gpt4_target
      evaluators:
        - name: depth_check
          type: llm_judge
          prompt: ./judges/depth.md
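
One way to reason about a per-case override is as a shallow merge over the defaults. The sketch below assumes that each key in a case's `execution` block replaces the default value wholesale (so a per-case `evaluators` list replaces the default list and is not appended to it); the actual merge rule is not stated here, and `effective_execution` is a hypothetical name.

```python
from typing import Any


def effective_execution(defaults: dict[str, Any], case: dict[str, Any]) -> dict[str, Any]:
    """Return the execution settings that apply to one case.

    Assumption: per-case keys replace default keys one for one.
    """
    overrides = case.get("execution") or {}
    merged = dict(defaults)      # copy so the defaults are not mutated
    merged.update(overrides)     # per-case values win
    return merged


defaults = {"target": "default_target", "evaluators": [{"name": "basic"}]}
case = {"id": "complex-case", "execution": {"target": "gpt4_target"}}
settings = effective_execution(defaults, case)
```

Here `settings["target"]` is `gpt4_target`, while `evaluators` still comes from the defaults because the case did not override it.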

To include external files in a message, write its content as an array of typed parts:

input:
  - role: user
    content:
      - type: text
        value: Review this code against our guidelines.
      - type: file
        value: ./guidelines.md

Supported file path formats:

| Format | Resolution |
| --- | --- |
| ./path.md | Relative to eval file directory |
| ../dir/file.md | Relative path with parent traversal |
| /docs/file.md | Absolute from repository root |
| https://github.com/... | GitHub blob URL (cloned and cached) |
| https://gitlab.com/... | GitLab blob URL (cloned and cached) |
| https://bitbucket.org/... | Bitbucket src URL (cloned and cached) |

Git URLs are cloned once and cached in ~/.agentv/cache/repos/.
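
The local resolution rules in the table can be sketched as follows. This is an illustration rather than AgentV's resolver: `resolve_file_ref` and its parameters are hypothetical, paths are normalized lexically, and remote URLs are returned unchanged because cloning and caching are out of scope for the sketch.

```python
import os
from pathlib import Path
from typing import Union
from urllib.parse import urlparse


def resolve_file_ref(value: str, eval_file: Path, repo_root: Path) -> Union[str, Path]:
    """Map a file reference to a local path, following the table above."""
    if urlparse(value).scheme in ("http", "https"):
        # Remote Git URL: fetching and caching are not shown here.
        return value
    if value.startswith("/"):
        # A leading slash means "from the repository root", not the filesystem root.
        candidate = repo_root / value.lstrip("/")
    else:
        # ./ and ../ paths are relative to the directory holding the eval file.
        candidate = eval_file.parent / value
    # Collapse "." and ".." segments without touching the filesystem.
    return Path(os.path.normpath(candidate))


eval_file = Path("/repo/evals/math.yaml")   # hypothetical locations
repo_root = Path("/repo")
local = resolve_file_ref("../dir/file.md", eval_file, repo_root)
```

With these hypothetical locations, `../dir/file.md` resolves to `/repo/dir/file.md` and `/docs/file.md` resolves to `/repo/docs/file.md`.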

Pass additional context to evaluators via the sidecar field:

evalcases:
  - id: code-gen
    expected_outcome: Generates valid Python
    sidecar:
      language: python
      difficulty: medium
    input: Write a function to sort a list
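
To show why sidecar metadata is useful, here is a sketch of a custom check that reads it. Everything in it is hypothetical: the function name, the way the sidecar dictionary reaches the check, and the decision to treat "valid Python" as "parses without a syntax error" are assumptions, not part of AgentV's evaluator interface.

```python
import ast
from typing import Any


def is_valid_code(response: str, sidecar: dict[str, Any]) -> bool:
    """Hypothetical check: parse the response when the sidecar says it is Python."""
    if sidecar.get("language") != "python":
        raise ValueError("this sketch only handles language: python")
    try:
        ast.parse(response)   # syntax check only; the code is never executed
    except SyntaxError:
        return False
    return True


sidecar = {"language": "python", "difficulty": "medium"}
ok = is_valid_code("def sort_list(xs):\n    return sorted(xs)\n", sidecar)
```

The point of the design is that the eval case, not the evaluator, carries the per-case context, so one evaluator can serve cases in different languages or difficulty levels.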