This repository contains an evaluation pipeline for SPOT (Sophisticated Planning and Orchestration Tasks) using the Inspect framework.
- Python 3.10 or higher (the minimum version Inspect supports)
- Git
We recommend using a virtual environment to manage dependencies:
# Create virtual environment
python3 -m venv spot_env
# Activate virtual environment
source spot_env/bin/activate # On Windows: spot_env\Scripts\activate
# Install dependencies
pip install -r requirements.txt
pip install inspect_ai
pip install git+https://github.com/UKGovernmentBEIS/inspect_evals
Before running evaluations, set the API key for your model provider:
export ANTHROPIC_API_KEY="your-anthropic-api-key-here"
Note: You can add this export command to your shell profile (e.g., ~/.bashrc or ~/.zshrc) to make it persistent across sessions.
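A missing key usually surfaces only once the evaluation starts calling the model, so it can help to check for it first. The following is a minimal sketch; `require_api_key` is a hypothetical helper name and is not part of this repository or of Inspect.

```shell
# Hypothetical helper (not part of this repository):
# return non-zero and print a hint when ANTHROPIC_API_KEY is unset or empty.
require_api_key() {
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "ANTHROPIC_API_KEY is not set; export it before running an evaluation." >&2
    return 1
  fi
}
```

After defining the function in your shell, run `require_api_key` before starting an evaluation. It stays silent when the key is present.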
You can verify the installation by checking the Inspect version:
inspect --version
Execute the SPOT evaluation pipeline with your preferred model:
inspect eval spot.py --model anthropic/claude-3-5-sonnet-20241022
Models:
anthropic/claude-3-5-sonnet-20241022
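For a quick smoke test before a full run, the evaluation command above can be limited to a few samples with Inspect's `--limit` option and pointed at an explicit log directory with `--log-dir`. The wrapper below is only a sketch: `run_spot`, `LOG_DIR`, `DRY_RUN`, and the default of 5 samples are illustrative assumptions, not part of this repository.

```shell
# Hypothetical wrapper (not part of this repository).
# Usage: run_spot MODEL [LIMIT]
run_spot() {
  if [ -z "${1:-}" ]; then
    echo "usage: run_spot MODEL [LIMIT]" >&2
    return 1
  fi
  model="$1"
  limit="${2:-5}"               # assumed default: evaluate 5 samples
  log_dir="${LOG_DIR:-./logs}"  # LOG_DIR is an illustrative variable name
  # When DRY_RUN is set, print the command instead of running it.
  ${DRY_RUN:+echo} inspect eval spot.py \
    --model "$model" --limit "$limit" --log-dir "$log_dir"
}
```

For example, `DRY_RUN=1 run_spot anthropic/claude-3-5-sonnet-20241022 3` prints the command that would be executed without calling the model.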
After running the evaluation, view the results using the Inspect viewer:
inspect view start --log-dir "./logs"
Note: Update the --log-dir path to match your actual logs directory location.
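If the viewer opens with no results, the log directory is often the cause. As a rough sketch, the check below confirms that a directory exists and is not empty before the viewer is launched; `has_logs` is a hypothetical helper name, and `./logs` is assumed as the default only because it is the path used in the command above.

```shell
# Hypothetical helper (not part of this repository):
# succeed only if the directory exists and contains at least one entry.
has_logs() {
  dir="${1:-./logs}"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}
```

A possible use is `has_logs ./logs && inspect view start --log-dir ./logs`, which starts the viewer only when there is something to show.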