View Hawk Eval Results

When the user wants to analyze evaluation results, use these hawk CLI commands:

1. List Eval Sets

You can list all eval sets if the user do not know the eval set ID:

bash

hawk list eval-sets

Shows: eval set ID, creation date, creator.

You can increase the limit of results returned by --limit N.

bash

hawk list eval-sets --limit 50

Or you can search for a specific eval set by using --search QUERY.

bash

hawk list eval-sets --search pico

2. List Evaluations

With an eval set ID, you can list all evaluations in the eval-set:

bash

hawk list evals [EVAL_SET_ID]

Shows: task name, model, status (success/error/cancelled), and sample counts.

3. List Samples

Or you can list individual samples and their scores:

bash

hawk list samples [EVAL_SET_ID] [--eval FILE] [--limit N]

4. Download Transcript

To get the full conversation for a specific sample:

bash

hawk transcript <UUID>

The transcript includes full conversation with tool calls, scores, and metadata.

To get even more details, you can get the raw data by using --raw:

bash

hawk transcript <UUID> --raw

Batch Transcript Download

You can also download all transcripts for an entire eval set:

bash

# Fetch all samples in an eval set
hawk transcripts <EVAL_SET_ID>

# Write to individual files in a directory
hawk transcripts <EVAL_SET_ID> --output-dir ./transcripts

# Limit number of samples
hawk transcripts <EVAL_SET_ID> --limit 10

# Raw JSON output (one JSON per line to stdout, or .json files with --output-dir)
hawk transcripts <EVAL_SET_ID> --raw

Workflow

Run hawk list eval-sets to see available eval sets 2a. Run hawk list evals <EVAL_SET_ID> to see available evaluations 2b. or run hawk list samples <EVAL_SET_ID> to find samples of interest 3a. Run hawk transcript <uuid> to get full details on a single sample 3b. or run hawk transcripts <eval_set_id> --output-dir ./transcripts to download all
Read and analyze the transcript(s) to understand the agent's behavior

API Environments

Production (https://api.inspect-ai.internal.metr.org) is used by default. Set HAWK_API_URL only when targeting non-production environments:

Environment	URL
Staging	`https://api.inspect-ai.staging.metr-dev.org`
Dev1	`https://api.inspect-ai.dev1.staging.metr-dev.org`
Dev2	`https://api.inspect-ai.dev2.staging.metr-dev.org`
Dev3	`https://api.inspect-ai.dev3.staging.metr-dev.org`
Dev4	`https://api.inspect-ai.dev4.staging.metr-dev.org`

Example:

bash

HAWK_API_URL=https://api.inspect-ai.staging.metr-dev.org hawk list eval_sets

Search AI Tools

view-results

Install this agent skill to your Project

SKILL.md

View Hawk Eval Results

1. List Eval Sets

2. List Evaluations

3. List Samples

4. Download Transcript

Batch Transcript Download

Workflow

API Environments