Understanding Your Verification Score
Your package scored 95 but isn't Gold. Here's how to read the score breakdown, diagnose blockers, and reach the next tier.
You published your package, verification ran, and you see a score and a tier. But what do those numbers actually mean? And why might a package score 95 but still sit at Verified instead of Gold?
This guide breaks down the entire scoring system.
The Score Breakdown
Every tool pack is scored on a 0-100 scale across seven dimensions:
| Step | Max Points | What It Measures |
|---|---|---|
| Install | 15 | Package installs without errors |
| Import | 15 | Tool entrypoint imports successfully |
| Smoke | 25 | Tool produces a valid return value when called |
| Tests | 15 | Publisher-provided tests pass |
| Contract | 10 | Return value is serializable, non-None, type-stable |
| Reliability | 10 | Same input produces success on repeated runs (3x) |
| Determinism | 5 | Same input produces the same output hash |
Additionally, runtime warnings deduct up to 10 points (2 per warning).
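To sanity-check a score by hand, sum the step points and subtract the warning deduction. A minimal sketch of that arithmetic (the helper is illustrative, not part of the AgentNode API):

```python
def total_score(breakdown: dict, warnings: int) -> int:
    """Sum per-step points, then deduct 2 per runtime warning (capped at 10)."""
    base = sum(step["points"] for step in breakdown.values())
    return max(0, base - min(2 * warnings, 10))

# A pack that drops 13 smoke points and triggers 2 warnings:
breakdown = {
    "install": {"points": 15}, "import": {"points": 15},
    "smoke": {"points": 12}, "tests": {"points": 15},
    "contract": {"points": 10}, "reliability": {"points": 10},
    "determinism": {"points": 5},
}
print(total_score(breakdown, warnings=2))  # 82 - 4 = 78
```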
How to Check Your Score
Via the API:
```bash
curl https://agentnode.net/v1/packages/my-pack/versions/1.0.0 | jq '.verification'
```

The response includes:

```json
{
  "score": 95,
  "tier": "gold",
  "confidence": "high",
  "breakdown": {
    "install": {"points": 15, "max": 15, "reason": "Installed in 2.3s"},
    "import": {"points": 15, "max": 15, "reason": "All tools imported successfully"},
    "smoke": {"points": 25, "max": 25, "reason": "Returned valid result"},
    "tests": {"points": 15, "max": 15, "reason": "Publisher-provided tests passed"},
    "contract": {"points": 10, "max": 10, "reason": "Serializable, typed return"},
    "reliability": {"points": 10, "max": 10, "reason": "3/3 runs passed"},
    "determinism": {"points": 5, "max": 5, "reason": "Consistent output across runs"}
  }
}
```

Score → Tier Mapping
| Score Range | Base Tier |
|---|---|
| 90-100 | Gold |
| 70-89 | Verified |
| 50-69 | Partial |
| 0-49 | Unverified |
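In code, the base mapping is a simple threshold lookup. A minimal sketch, following the ranges in the table above:

```python
def base_tier(score: int) -> str:
    """Map a 0-100 verification score to its base tier."""
    if score >= 90:
        return "gold"
    if score >= 70:
        return "verified"
    if score >= 50:
        return "partial"
    return "unverified"
```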
But this is just the base tier. Hard caps can override it downward.
Hard Tier Caps
Even with a high score, certain conditions cap your maximum tier:
| Condition | Maximum Tier |
|---|---|
| No verification cases (`has_explicit_cases=false`) | Verified |
| Smoke test not passed | Verified |
| Contract invalid | Verified |
| Credential boundary reached (no publisher tests) | Partial |
| `verification_mode=limited` | Verified |
This is why a score of 95 doesn't guarantee Gold. The most common blocker: no explicit verification cases.
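Conceptually, the final tier is the minimum of the base tier and the strictest applicable cap. A sketch of that logic (the flag names here are illustrative, not the pipeline's internal fields):

```python
TIER_ORDER = ["unverified", "partial", "verified", "gold"]

def final_tier(score, *, has_explicit_cases, smoke_passed,
               contract_valid, credential_boundary, limited_mode):
    """Base tier from score, then lower it to the strictest applicable cap."""
    base = ("gold" if score >= 90 else "verified" if score >= 70
            else "partial" if score >= 50 else "unverified")
    caps = []
    if not has_explicit_cases or not smoke_passed or not contract_valid or limited_mode:
        caps.append("verified")
    if credential_boundary:  # credential boundary reached with no publisher tests
        caps.append("partial")
    return min([base, *caps], key=TIER_ORDER.index)

# Score 95, everything green except explicit cases -> capped at "verified"
print(final_tier(95, has_explicit_cases=False, smoke_passed=True,
                 contract_valid=True, credential_boundary=False,
                 limited_mode=False))
```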
Common Scenarios and Fixes
Score 95, Tier Verified
Cause: No verification.cases in your manifest. The pipeline used auto-generated inputs and everything passed — but without publisher-declared cases, Gold is not reachable.
Fix: Add a verification.cases block to your agentnode.yaml. See the verification cases guide.
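If you want to confirm the block is actually present before republishing, a quick local check works. A sketch, assuming your manifest lives at agentnode.yaml and PyYAML is installed:

```python
import yaml  # pip install pyyaml

with open("agentnode.yaml") as f:
    manifest = yaml.safe_load(f)

cases = (manifest.get("verification") or {}).get("cases") or []
assert cases, "No verification.cases found: Gold is unreachable without them"
print(f"Found {len(cases)} verification case(s)")
```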
Smoke: 12/25 (Credential Boundary)
Cause: Your tool tried to call an external API and got an auth error. The sandbox has no API keys.
Fix: Either add a VCR cassette (for the API path) or add publisher tests that mock the API call. With passing tests, the smoke score bumps to 15/25 and you can reach Verified.
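A publisher test along these lines can mock the API call. This sketch assumes a hypothetical tool fetch_weather in my_pack.tools that calls requests.get internally; substitute your own names:

```python
from unittest.mock import patch

from my_pack.tools import fetch_weather  # hypothetical tool under test

def test_fetch_weather_mocked():
    # The sandbox has no API keys, so stub out the HTTP layer entirely.
    fake = {"temp_c": 21.5, "city": "Oslo"}
    with patch("my_pack.tools.requests.get") as mock_get:
        mock_get.return_value.status_code = 200
        mock_get.return_value.json.return_value = fake
        result = fetch_weather("Oslo")
    assert result["temp_c"] == 21.5
```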
Contract: 0/10
Cause: Your tool returned None, a non-serializable object, or the return type changed between runs.
Fix: Ensure your tool always returns a JSON-serializable value (dict, list, str, int, float, bool). Never return None on success.
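In practice that means returning plain data, never objects or None. A contrast sketch with a hypothetical tool:

```python
import json

def my_tool_bad(query: str):
    ...
    return None  # scores 0/10: returning None on success fails the contract

def my_tool_good(query: str) -> dict:
    result = {"query": query, "matches": [], "count": 0}
    json.dumps(result)  # cheap self-check: raises if not JSON-serializable
    return result       # always a dict, same type on every run
```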
Reliability: 6/10 (2/3 runs passed)
Cause: One of three identical calls failed. Common reasons: rate limiting, network timeouts (in real mode), or non-deterministic state.
Fix: If using verification cases with a VCR cassette, this shouldn't happen (replay is deterministic). If in cases_real mode, ensure your tool handles edge cases gracefully.
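One way to handle transient failures gracefully in cases_real mode is a retry wrapper around the flaky call. A sketch, assuming the underlying operation is a hypothetical fetch() callable:

```python
import time

def call_with_retry(fetch, attempts: int = 3, backoff: float = 1.0):
    """Retry transient failures instead of letting one bad run cost 4 points."""
    last_exc = None
    for i in range(attempts):
        try:
            return fetch()
        except (TimeoutError, ConnectionError) as exc:
            last_exc = exc
            time.sleep(backoff * (2 ** i))  # simple exponential backoff
    raise last_exc
```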
Determinism: 0/5
Cause: Same input produced different output hashes across runs. This is expected for tools that include timestamps, random IDs, or live data in their output.
Fix: The pipeline normalizes outputs before hashing (sorts dict keys, strips whitespace). If your tool legitimately produces different output each time (e.g., a news aggregator), partial determinism credit (2-3/5) is normal and acceptable for Gold.
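The normalization described above is roughly: canonicalize, then hash. A sketch of the idea (not the pipeline's exact implementation):

```python
import hashlib
import json

def output_hash(value) -> str:
    """Hash a normalized form: sorted dict keys, stripped strings."""
    def normalize(v):
        if isinstance(v, dict):
            return {k: normalize(v[k]) for k in sorted(v)}
        if isinstance(v, list):
            return [normalize(x) for x in v]
        if isinstance(v, str):
            return v.strip()
        return v
    canonical = json.dumps(normalize(value), sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()
```

Note that a timestamp or random ID field still changes the hash after normalization, which is exactly why live-data tools land on partial credit.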
Tests: 0/15
Cause: Publisher tests failed in the container sandbox. Common reasons: tests try to access network, tests depend on system binaries not in the container, tests reference absolute paths.
Fix: Ensure your tests work in an isolated environment. Use pytest.mark.skipif for tests that need optional dependencies. Use relative paths or /tmp for file operations.
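For example, a test that needs an optional system binary can skip itself cleanly instead of failing in the container. A sketch (the ffmpeg dependency is hypothetical):

```python
import shutil

import pytest

@pytest.mark.skipif(shutil.which("ffmpeg") is None,
                    reason="ffmpeg not available in the sandbox")
def test_transcode(tmp_path):
    out = tmp_path / "clip.mp4"  # tmp_path avoids absolute host paths
    ...
```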
The Three Verification Modes
After verification runs, your package is assigned a mode that appears in the score detail:
| Mode | Meaning | Gold Eligible |
|---|---|---|
| `fixture` | Cases ran with VCR cassette replay | Yes |
| `cases_real` | Cases ran with real local execution | Yes |
| `real_auto` | No explicit cases, auto-generated inputs | No |
Gold Checklist
All of these must be true simultaneously:
- `verification.cases` present in manifest (at least 1 case)
- Smoke status: `passed`
- Contract valid: `true`
- Reliability: `>= 0.9` (at least 3/3 or 9/10 runs pass)
- Score: `>= 90`
- Mode: `fixture` or `cases_real` (not `real_auto`)
If any one condition fails, the tier caps at Verified regardless of score.
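You can check all six conditions programmatically against the verification detail. A sketch; field names beyond those shown in this guide are assumptions about the response shape:

```python
def gold_blockers(v: dict) -> list[str]:
    """Return the Gold conditions a verification result fails."""
    blockers = []
    if not v.get("has_explicit_cases"):
        blockers.append("no verification.cases in manifest")
    if v.get("smoke_status") != "passed":
        blockers.append("smoke not passed")
    if not v.get("contract_valid"):
        blockers.append("contract invalid")
    if v.get("reliability", 0) < 0.9:
        blockers.append("reliability below 0.9")
    if v.get("score", 0) < 90:
        blockers.append("score below 90")
    if v.get("mode") not in ("fixture", "cases_real"):
        blockers.append("mode is real_auto")
    return blockers
```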
Re-Verification
Your package is re-verified when:
- You publish a new version
- An admin triggers re-verification (after infrastructure updates)
- The verification runner is upgraded (new capabilities)
Re-verification can upgrade your tier (e.g., after you add cases) or downgrade it (if something broke). The tier is always computed fresh from the latest run.
Debugging Tips
- Check the smoke log: the API returns `smoke_log` with detailed output from each case run
- Check `smoke_reason`: values like `credential_boundary_reached`, `missing_system_dependency`, or `needs_binary_input` tell you exactly what blocked the smoke test
- Check `stability_log`: shows each stability run's success/failure and output hash
- Check `contract_details`: shows why contract validation failed (non-serializable, `None` return, type mismatch)
All of these fields are available in the version detail API response.
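A small script can pull those fields in one pass, assuming the requests library and the endpoint shown earlier:

```python
import requests

url = "https://agentnode.net/v1/packages/my-pack/versions/1.0.0"
v = requests.get(url, timeout=10).json()["verification"]

for field in ("smoke_reason", "smoke_log", "stability_log", "contract_details"):
    print(f"--- {field} ---")
    print(v.get(field, "<not present>"))
```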