The Nucleus Verify Benchmark

915 production repositories. 143 million lines of code. One question: how much open source software is verifiable?

915 repositories scanned
41% achieve VERIFIED
143M lines of code
0 regressions between runs

Run ID: 2026-03-12-055109. All results are cryptographically reproducible.
915 of 927 targeted repositories (98.7%) were scanned successfully.


How open source software scores

| Verdict | Repos | Share | Avg score | Score range |
|---|---|---|---|---|
| VERIFIED | 379 | 41% | 99.0 | 90–100 |
| PARTIAL | 350 | 38% | 88.0 | 75–90 |
| UNVERIFIED | 186 | 20% | 66.4 | 50–80 |

VERIFIED repositories average 0.7 critical security findings.
UNVERIFIED repositories average 16.0.
That is a 23× difference.

Score breakdown across 915 repositories

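| Score range | Repos | Share |
|---|---|---|
| 90–100 | 622 | 68% |
| 80–89 | 115 | 13% |
| 70–79 | 29 | 3% |
| 60–69 | 147 | 16% |
| 50–59 | 2 | 0% |

| Percentile | Score |
|---|---|
| P99 | 100 |
| P95 | 100 |
| P90 | 100 |
| P75 | 100 |
| P50 | 90 |
| P25 | 85 |
| P10 | 65 |
| P01 | 60 |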

What the scan found

Severity breakdown

| Severity | Findings | Repos |
|---|---|---|
| Critical | 4,745 | 254 |
| High | 26,580 | 698 |
| Medium | 10,584 | |
| Info / Low | 6,991 | |

Top 5 by critical findings

| Critical findings | Repository |
|---|---|
| 995 | microsoft/monaco-editor |
| 346 | HeyPuter/puter |
| 317 | eosphoros-ai/DB-GPT |
| 251 | infiniflow/ragflow |
| 233 | Skyvern-AI/skyvern |

All findings are static structural detections. Dynamic testing, fuzzing, and runtime behaviour are outside the scope of this benchmark.


Results by language

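| Language | Repos | Avg Score | VERIFIED | PARTIAL | UNVERIFIED |
|---|---|---|---|---|---|
| Python | 449 | 89.6 | 251 | 108 | 90 |
| JavaScript | 434 | 87.4 | 122 | 234 | 78 |
| TypeScript | 19 | 80.3 | 4 | 7 | 8 |
| Go | 4 | 86.2 | 2 | 1 | 1 |
| Unknown | 9 | 70.0 | 0 | 0 | 9 |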
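
The language breakdown can be sanity-checked with a few lines of arithmetic. The sketch below copies the rows from the table above, confirms that each row's verdict counts add up to its repository count, and derives a per-language VERIFIED rate. The function names are illustrative only and are not part of any Nucleus tooling.

```python
# Rows copied from the "Results by language" table:
# (language, repos, avg_score, verified, partial, unverified)
ROWS = [
    ("Python", 449, 89.6, 251, 108, 90),
    ("JavaScript", 434, 87.4, 122, 234, 78),
    ("TypeScript", 19, 80.3, 4, 7, 8),
    ("Go", 4, 86.2, 2, 1, 1),
    ("Unknown", 9, 70.0, 0, 0, 9),
]


def verdict_totals(rows):
    """Sum the verdict columns, checking that each row adds up to its repo count."""
    totals = {"repos": 0, "verified": 0, "partial": 0, "unverified": 0}
    for name, repos, _avg, verified, partial, unverified in rows:
        if verified + partial + unverified != repos:
            raise ValueError(f"{name}: verdict counts do not sum to {repos}")
        totals["repos"] += repos
        totals["verified"] += verified
        totals["partial"] += partial
        totals["unverified"] += unverified
    return totals


def verified_rates(rows):
    """Percentage of repositories per language that reached VERIFIED."""
    return {
        name: round(100 * verified / repos, 1)
        for name, repos, _avg, verified, _partial, _unverified in rows
    }


totals = verdict_totals(ROWS)
rates = verified_rates(ROWS)
print(totals)
print(rates)
```

The column totals match the headline figures: 915 repositories, of which 379 are VERIFIED, 350 PARTIAL, and 186 UNVERIFIED.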

Results by project type

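| Type | Count | Avg Score | VERIFIED % |
|---|---|---|---|
| CLI tool | 239 | 93.2 | 71% |
| npm package | 74 | 91.5 | 46% |
| library | 30 | 92.0 | 40% |
| API service | 218 | 86.2 | 38% |
| frontend application | 114 | 87.5 | 24% |
| web application | 156 | 83.9 | 20% |
| full-stack web application | 48 | 83.5 | 15% |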

Verification rate by codebase size

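| Size | Count | Avg Score | VERIFIED % |
|---|---|---|---|
| <5K LOC | 65 | 94.0 | 80% |
| 5–20K LOC | 146 | 93.8 | 68% |
| 20–50K LOC | 221 | 91.0 | 43% |
| 50–100K LOC | 151 | 88.2 | 37% |
| 100–300K LOC | 208 | 83.5 | 24% |
| 300K–1M LOC | 108 | 80.6 | 19% |
| >1M LOC | 16 | 82.8 | 44% |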

Total scanned: 143,776,584 lines of code across 584,803 files.
Largest repo scanned: 10.7M LOC (chinese-poetry/chinese-poetry).
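
As a cross-check, the per-size VERIFIED percentages can be weighted by bucket size and compared with the headline figure. The sketch below uses only the counts and percentages from the size table. Because the published percentages are rounded, the result is an estimate rather than an exact count, and the helper name is illustrative.

```python
# (size bucket, repo count, VERIFIED %) copied from the size table.
BUCKETS = [
    ("<5K LOC", 65, 80),
    ("5-20K LOC", 146, 68),
    ("20-50K LOC", 221, 43),
    ("50-100K LOC", 151, 37),
    ("100-300K LOC", 208, 24),
    ("300K-1M LOC", 108, 19),
    (">1M LOC", 16, 44),
]


def estimated_verified(buckets):
    """Return (total repos, estimated VERIFIED repos) from rounded bucket rates."""
    total_repos = sum(count for _label, count, _pct in buckets)
    estimate = sum(count * pct / 100 for _label, count, pct in buckets)
    return total_repos, estimate


total_repos, estimate = estimated_verified(BUCKETS)
share = 100 * estimate / total_repos
print(f"{total_repos} repos, roughly {estimate:.0f} VERIFIED ({share:.1f}%)")
```

The bucket counts sum to 915, and the weighted estimate lands close to the reported 379 VERIFIED repositories (41%). The small gap comes from rounding in the published percentages.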


Projects that achieved 100/100

379 repositories achieved VERIFIED in this benchmark run. Full results available via the API.


How the benchmark works

1. Corpus Selection

927 repositories were selected from the GitHub top-starred AI and developer tools list. There was no curation by expected score: all repositories were scanned regardless of result.

2. Verification Engine

Each repository is scanned by the Nucleus Proof Engine, which applies five critical gates: Structure, Contract, Structural Integrity, Determinism, and Build. Gates are pass/fail. The score is deterministic: the same code always produces the same result.

3. Reproducibility

Run ID: 2026-03-12-055109. All results are cryptographically anchored. The deterministic hash of any scanned repository can be independently verified via the public API.
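
To illustrate what a deterministic repository hash involves, here is a minimal sketch that digests a directory tree so that identical contents always yield the same value. It is not the Nucleus algorithm, which this page does not describe. The function name `tree_digest`, the choice of SHA-256, and the length-prefixed encoding are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def tree_digest(root: str) -> str:
    """Return a SHA-256 hex digest over relative paths and file contents.

    Illustrative only: the hashing scheme used by the benchmark is not
    documented here, so every detail below is an assumption.
    """
    base = Path(root)
    digest = hashlib.sha256()
    # Sort by POSIX-style relative path so filesystem traversal order
    # never influences the result.
    files = sorted(
        (p for p in base.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(base).as_posix(),
    )
    for path in files:
        rel = path.relative_to(base).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length-prefix each field so the boundary between a path and
        # its content is unambiguous.
        digest.update(len(rel).to_bytes(8, "big"))
        digest.update(rel)
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

Calling `tree_digest` on two checkouts of the same commit should return the same string, while changing any file's bytes or path changes the digest. File metadata such as timestamps and permissions is deliberately ignored in this sketch.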

4. Honest Scope

This benchmark measures static structural properties. It does not measure runtime behaviour, test coverage quality, or domain correctness. The Not Verified section of every certificate discloses exact scope.


Which gates determine verdicts

| Gate | Pass Rate | Role | Note |
|---|---|---|---|
| contract | 99% | Critical | near-universal |
| gate_v2 | 79% | Critical | decisive gate |
| build | 46% | Critical | splits PARTIAL/UNVERIFIED |
| gate_s / gate_d | 100% | Critical | structural baseline |
| arch | 0% | Informational | not scored |
| dependency | 0% | Informational | not scored |
| docs | 0% | Informational | not scored |
| test | 0% | Informational | not scored |

Gates marked Informational appear in reports but do not affect verdict or score.
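
The split between Critical and Informational gates can be pictured as a simple filter over gate results. The sketch below is a hypothetical data model, not the engine's actual report format: the `GateResult` class, its fields, and the sample report values are assumptions. It does not attempt to reproduce how verdicts or scores are computed, which this page does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    """Hypothetical record for one gate outcome in a report."""
    name: str
    passed: bool
    critical: bool  # False for Informational gates


def failed_critical_gates(results):
    """Names of failed gates that count; Informational gates are skipped."""
    return [r.name for r in results if r.critical and not r.passed]


# Made-up sample report using gate names from the table above.
report = [
    GateResult("contract", passed=True, critical=True),
    GateResult("build", passed=False, critical=True),
    GateResult("docs", passed=False, critical=False),
    GateResult("test", passed=False, critical=False),
]
print(failed_critical_gates(report))  # ['build']
```

In this sample, the failing `docs` and `test` gates still appear in the report but are filtered out before anything verdict-related looks at the results.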

Verify your repository

Free verification available. No account required for public repositories.


Follow us for updates: @AlterMenta on X