CURRENT RESULTS
All Languages
Performance Metrics
| # | Tool | Precision (%) | Recall (%) | F1 Score (%) | True Positives | PRs Evaluated |
|---|---|---|---|---|---|---|
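The table reports precision, recall, and F1 as percentages, but the page does not spell out how they are computed. The sketch below assumes the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 as the harmonic mean of the two. It also assumes counts are pooled across all evaluated PRs (micro-averaging). The function name `review_metrics` and the counts in the usage example are hypothetical and are not benchmark data.

```python
from dataclasses import dataclass


@dataclass
class ReviewMetrics:
    precision: float  # percent
    recall: float     # percent
    f1: float         # percent


def review_metrics(true_positives: int, false_positives: int, false_negatives: int) -> ReviewMetrics:
    """Standard precision/recall/F1, returned as percentages.

    Assumption: counts are pooled across all evaluated PRs (micro-averaging);
    the benchmark's actual aggregation is not specified.
    """
    flagged = true_positives + false_positives   # every issue the tool reported
    expected = true_positives + false_negatives  # every known issue in the PRs
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / expected if expected else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return ReviewMetrics(precision * 100, recall * 100, f1 * 100)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    m = review_metrics(true_positives=30, false_positives=20, false_negatives=45)
    print(f"P={m.precision:.1f}% R={m.recall:.1f}% F1={m.f1:.1f}%")
```

With these hypothetical counts the tool is right 60% of the time it flags something but finds only 40% of the known issues, so F1 lands at 48%. The harmonic mean penalizes imbalance between the two more than an arithmetic mean would.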
F1 Score by Tool
Repositories Used
The offline benchmark draws on open-source repositories that span multiple languages, frameworks, and domains, from infrastructure and observability tools to web platforms and security projects.
This variety is meant to make our results reflect how AI reviewers perform across real-world codebases rather than on a single type of software.