Quick Start
This page takes you from nothing installed to a scored benchmark built from your own git history.
Prerequisites
- Python 3.12+
- Docker (default) or a cloud sandbox provider: Harbor runs agents in isolated environments
- uv: package manager
- npm: required for Gemini CLI (@google/gemini-cli is installed automatically by Harbor)
- Agent credentials (at least one):
  - Claude Code: ANTHROPIC_API_KEY or CLAUDE_CODE_OAUTH_TOKEN
  - OpenAI Codex: CODEX_API_KEY (API key) or codex login (ChatGPT subscription OAuth)
  - Gemini CLI: GEMINI_API_KEY (API key), GOOGLE_API_KEY (Vertex AI), or gemini login (Google account OAuth)
- Evaluator CLI: the assessment evaluator spawns the claude CLI by default (or codex if [evaluation] backend = "codex"). That CLI must be installed and authenticated (OAuth subscription or API key, whichever you already use interactively).
See Authentication & Opik for how to set up each agent’s credentials.
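Before installing, it can help to confirm that the prerequisites listed above are actually present. The following is a minimal preflight sketch, not part of NASDE: the helper names check_tool and has_agent_credentials are made up for illustration. It only tests whether executables are on PATH and whether one of the credential environment variables named above is set. It does not verify the Python version, and it cannot see OAuth logins made with codex login or gemini login.

```shell
# Preflight sketch. Helper names are illustrative, not part of nasde.

# Print "ok: <tool>" or "missing: <tool>" for one executable name.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

# Succeed if at least one credential variable from the list above is non-empty.
# OAuth logins are stored by each CLI and are not visible to this check.
has_agent_credentials() {
  for name in ANTHROPIC_API_KEY CLAUDE_CODE_OAUTH_TOKEN CODEX_API_KEY \
              GEMINI_API_KEY GOOGLE_API_KEY; do
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      return 0
    fi
  done
  return 1
}

# docker may legitimately be missing if you use a cloud sandbox provider;
# npm only matters for Gemini CLI; claude is the default evaluator CLI.
for tool in python3 docker uv npm claude; do
  check_tool "$tool"
done

if has_agent_credentials; then
  echo "ok: agent credential variable found"
else
  echo "note: no credential variable set; an OAuth login may still apply"
fi
```

A "missing" line is a prompt to look closer, not necessarily a blocker, since several prerequisites have alternatives.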
Install the CLI
    uv tool install nasde-toolkit --python 3.13
    nasde --version

This installs the latest stable release from PyPI.
Prefer pipx, pip, or a from-source install? See the alternatives below.
Installation alternatives
    # pipx: analogous isolation, popular in the Python community
    pipx install nasde-toolkit --python 3.13

    # Inside an existing virtual environment (3.12 or 3.13)
    pip install nasde-toolkit

    # Latest unreleased changes from main (for testing PRs and dev builds)
    uv tool install git+https://github.com/NoesisVision/nasde-toolkit.git --python 3.13

    # Local clone (for developing NASDE itself)
    git clone git@github.com:NoesisVision/nasde-toolkit.git
    cd nasde-toolkit
    uv sync

Upgrading to the newest release:

    uv tool upgrade nasde-toolkit         # if installed via uv tool
    pipx upgrade nasde-toolkit            # if installed via pipx
    pip install --upgrade nasde-toolkit   # if installed via pip

nasde checks PyPI for newer releases on startup and prints a one-line notice on stderr when an upgrade is available (severity-tinted: patch / minor / major). Disable with NASDE_NO_UPDATE_CHECK=1 or CI=true.
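As a small sketch of how the update check might be switched off in practice, the snippet below sets NASDE_NO_UPDATE_CHECK for the current shell session and, separately, for a single invocation. The command -v guard is only there so the snippet is harmless on a machine where nasde is not installed yet. Because the notice is written to stderr, capturing stdout with command substitution should not pick it up either way.

```shell
# Turn off the startup update check for the rest of this shell session.
export NASDE_NO_UPDATE_CHECK=1

# Guarded so the snippet does nothing when nasde is not installed yet.
if command -v nasde >/dev/null 2>&1; then
  # Disable the check for one invocation only.
  NASDE_NO_UPDATE_CHECK=1 nasde --version

  # Command substitution captures stdout only, so a notice on stderr
  # would not end up in this variable.
  installed_version="$(nasde --version)"
  echo "installed: $installed_version"
fi
```

To make the setting permanent, the export line could go in your shell profile.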
After installation, only nasde appears on PATH. Harbor and Opik are bundled as core dependencies. The reviewer agent spawns your already-installed claude or codex CLI as a subprocess (not bundled), so it reuses whatever authentication you’ve set up interactively. Check the installed version with nasde --version.
Install the authoring skills
    nasde install-skills

This copies the bundled nasde-benchmark-* skills into ~/.claude/skills/ so they’re available in every Claude Code session. Use --scope project to install into the current project’s .claude/skills/ instead, or --force to overwrite after a nasde upgrade.
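To confirm the skills landed where expected, something like the following sketch could be used. The function name list_nasde_skills is hypothetical. It lists whatever entries match nasde-benchmark-* in a skills directory, defaulting to the user-scope path mentioned above. It makes no assumption about whether each skill is a file or a directory.

```shell
# List installed nasde-benchmark-* skills under a skills directory.
# Defaults to the user-scope location; pass .claude/skills to inspect
# a --scope project install instead.
list_nasde_skills() {
  skills_dir="${1:-$HOME/.claude/skills}"
  for entry in "$skills_dir"/nasde-benchmark-*; do
    # An unmatched glob stays literal, so skip anything that does not exist.
    [ -e "$entry" ] || continue
    basename "$entry"
  done
}

list_nasde_skills
```

Empty output suggests the skills were installed in a different scope or not at all.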
Build your first benchmark from git history
Open your own project in Claude Code and say something like:
“Create a NASDE benchmark with a single task, based on a recent piece of work from this repo — a commit, a range of commits, or a merged PR.”
Start with one task. Point the skill at whatever unit of work feels self-contained in your workflow: a single commit, a range, a merged MR/PR, or an issue that was closed by a set of commits. The nasde-benchmark-from-history skill proposes a good candidate and generates one task directory with instruction.md, a Dockerfile, test.sh, and a starter assessment_criteria.md. You review each file before it’s written.
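Once the skill has finished, a quick check that all four files exist can catch a skipped step. The sketch below is an illustration only: check_task_files is a hypothetical helper, and because the exact layout inside the task directory is not specified here, it searches recursively by file name and assumes nothing about subdirectories.

```shell
# Report which of the four generated files exist somewhere under a task
# directory. Returns non-zero if any of them is missing.
check_task_files() {
  task_dir="$1"
  missing=0
  for name in instruction.md Dockerfile test.sh assessment_criteria.md; do
    # Search recursively: the layout inside the task directory is an unknown.
    if [ -n "$(find "$task_dir" -type f -name "$name" 2>/dev/null | head -n 1)" ]; then
      echo "found: $name"
    else
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

Call it with the path to the generated task directory as its only argument. A non-zero exit status means at least one file was not found.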
Run it
    nasde run --all-variants -C path/to/generated-benchmark

--all-variants runs every variant the skill scaffolded, so you don’t need to know their names yet. If you’d rather burn fewer tokens on the first run, pick just one with --variant NAME; you can run the others later.
Good to know
- Start small. One task is enough to validate the loop end to end. Scale up once it works; more tasks only pay off after you’ve seen what a task looks like in practice.
- Your subscription covers it. Runs use your existing claude / codex / gemini CLI auth, so a Claude Max or ChatGPT Plus subscription is enough to get going. API keys are supported too when you have them; see Authentication & Opik for the full picture.
- More docs. See Use Cases for the end-to-end walkthrough and Benchmark Results for reference numbers.