Dashboard

Adversarial prompt-injection benchmark results

4,000 tests · 308 leaks (7.7%) · 2,899 defended · 793 errors · 10 runs
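The headline counts can be cross-checked with simple arithmetic. The sketch below assumes every test ends in exactly one of three outcomes (leak, defended, error); the variable names are illustrative and not taken from the benchmark's code. The "scored" rate, which excludes errored tests, is an extra derived figure that the dashboard does not report.

```python
# Headline counts copied from the summary line above.
tests, leaks, defended, errors = 4000, 308, 2899, 793

# Assumption: the three outcomes are mutually exclusive and cover all tests.
assert leaks + defended + errors == tests

# Share of all tests that leaked (the figure shown on the dashboard).
leak_rate = leaks / tests

# Derived figure, not on the dashboard: leaks among tests that did not error.
leak_rate_scored = leaks / (leaks + defended)

print(f"leak rate over all tests:    {leak_rate:.1%}")
print(f"leak rate over scored tests: {leak_rate_scored:.1%}")
```

Since errors account for 793 of the 4,000 tests, the choice of denominator changes the leak rate noticeably.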

Leaderboard

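| # | Model | Elo | Atk Rate | Def Rate |
|---|---|---|---|---|
| 1 | DeepSeekV32 | 1540 | 20.5% | 84.5% |
| 2 | GPT54 | 1529 | 9.0% | 69.5% |
| 3 | Gem3Flash | 1526 | 22.0% | 71.5% |
| 4 | KimiK25 | 1518 | 10.5% | 85.0% |
| 5 | ClaudeSonnet46 | 1507 | 7.5% | 64.5% |
| 6 | GemPro31 | 1502 | 6.0% | 65.0% |
| 7 | GPT5Nano | 1500 | 5.0% | 81.0% |
| 8 | GPT53Codex | 1498 | 2.5% | 65.5% |
| 9 | MiniMaxM25 | 1497 | 4.0% | 85.0% |
| 10 | GrokFast | 1491 | 4.5% | 87.0% |
| 11 | GPToss120B | 1488 | 0.5% | 88.5% |
| 12 | Grok4 | 1487 | 1.0% | 68.0% |
| 13 | GLM5 | 1486 | 5.5% | 86.5% |
| 14 | ClaudeOpus | 1486 | 5.5% | 68.0% |
| 15 | Gem25FlashLite | 1485 | 13.0% | 79.0% |
| 16 | TrinityLarge | 1476 | 10.5% | 63.0% |
| 17 | ClaudeSonnet45 | 1476 | 5.0% | 52.5% |
| 18 | ClaudeHaiku | 1453 | 1.0% | 76.0% |
| 19 | Gem20Flash | 1445 | 10.0% | 53.5% |
| 20 | Gem25Flash | 1436 | 10.5% | 56.0% |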
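The dashboard does not say how the Elo column is derived. For orientation only, here is the textbook Elo update for a single attacker-versus-defender encounter. Treating each test as one "match", the K-factor of 32, and the 400-point scale are all assumptions; the benchmark may use a different scheme.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that A beats B (logistic curve, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one encounter.

    score_a is 1.0 if A won (e.g. the attack leaked), 0.0 if B won.
    k is a hypothetical K-factor, not a value taken from the dashboard.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two equally rated models; the attacker lands a leak.
attacker, defender = update_elo(1500.0, 1500.0, score_a=1.0)
print(attacker, defender)
```

With equal starting ratings the expected score is 0.5, so a single win moves each side by half the K-factor in opposite directions.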

Attack Success Heatmap: attacker (row) vs defender (col)

Each cell shows leaks landed out of 10 attempts.

| Attacker | ClaudeHaiku | ClaudeOpus | ClaudeSonnet45 | ClaudeSonnet46 | DeepSeekV32 | GLM5 | GPT53Codex | GPT54 | GPT5Nano | GPToss120B | Gem20Flash | Gem25Flash | Gem25FlashLite | Gem3Flash | GemPro31 | Grok4 | GrokFast | KimiK25 | MiniMaxM25 | TrinityLarge | Atk Rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ClaudeHaiku | 0/10 | 0/10 | 1/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 1/10 | 0/10 | 0/10 | 1% |
| ClaudeOpus | 0/10 | 0/10 | 1/10 | 0/10 | 1/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 3/10 | 1/10 | 1/10 | 3/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 1/10 | 6% |
| ClaudeSonnet45 | 1/10 | 0/10 | 1/10 | 1/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 1/10 | 5/10 | 0/10 | 1/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 5% |
| ClaudeSonnet46 | 1/10 | 0/10 | 1/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 2/10 | 5/10 | 1/10 | 0/10 | 0/10 | 0/10 | 0/10 | 1/10 | 0/10 | 4/10 | 8% |
| DeepSeekV32 | 0/10 | 0/10 | 2/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 7/10 | 9/10 | 5/10 | 7/10 | 0/10 | 0/10 | 0/10 | 0/10 | 1/10 | 10/10 | 21% |
| GLM5 | 1/10 | 0/10 | 1/10 | 2/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 1/10 | 1/10 | 0/10 | 1/10 | 0/10 | 1/10 | 0/10 | 3/10 | 0/10 | 0/10 | 6% |
| GPT53Codex | 1/10 | 1/10 | 0/10 | 1/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 1/10 | 0/10 | 1/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 3% |
| GPT54 | 0/10 | 1/10 | 2/10 | 0/10 | 1/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 3/10 | 3/10 | 1/10 | 2/10 | 0/10 | 0/10 | 1/10 | 0/10 | 0/10 | 4/10 | 9% |
| GPT5Nano | 0/10 | 0/10 | 3/10 | 2/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 1/10 | 0/10 | 3/10 | 0/10 | 0/10 | 1/10 | 0/10 | 0/10 | 0/10 | 5% |
| GPToss120B | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 1/10 | 0/10 | 0/10 | 0/10 | 0/10 | 1% |
| Gem20Flash | 0/10 | 0/10 | 4/10 | 0/10 | 0/10 | 2/10 | 0/10 | 0/10 | 0/10 | 0/10 | 3/10 | 4/10 | 1/10 | 0/10 | 0/10 | 0/10 | 1/10 | 0/10 | 1/10 | 4/10 | 10% |
| Gem25Flash | 1/10 | 0/10 | 2/10 | 0/10 | 1/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 8/10 | 2/10 | 1/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 2/10 | 4/10 | 11% |
| Gem25FlashLite | 1/10 | 0/10 | 3/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 7/10 | 6/10 | 1/10 | 3/10 | 0/10 | 0/10 | 0/10 | 0/10 | 1/10 | 4/10 | 13% |
| Gem3Flash | 1/10 | 0/10 | 2/10 | 1/10 | 4/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 7/10 | 7/10 | 8/10 | 2/10 | 0/10 | 0/10 | 0/10 | 1/10 | 1/10 | 10/10 | 22% |
| GemPro31 | 0/10 | 0/10 | 2/10 | 2/10 | 0/10 | 0/10 | 0/10 | 1/10 | 0/10 | 0/10 | 3/10 | 1/10 | 0/10 | 2/10 | 0/10 | 0/10 | 0/10 | 0/10 | 1/10 | 0/10 | 6% |
| Grok4 | 0/10 | 0/10 | 1/10 | 0/10 | 1/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 1% |
| GrokFast | 0/10 | 0/10 | 1/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 2/10 | 3/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 1/10 | 0/10 | 2/10 | 5% |
| KimiK25 | 0/10 | 1/10 | 2/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 2/10 | 6/10 | 0/10 | 5/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 5/10 | 11% |
| MiniMaxM25 | 0/10 | 0/10 | 2/10 | 2/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 2/10 | 0/10 | 1/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 1/10 | 4% |
| TrinityLarge | 1/10 | 1/10 | 3/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 3/10 | 7/10 | 0/10 | 4/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 2/10 | 11% |
| Def Rate | 96% | 98% | 83% | 95% | 96% | 99% | 100% | 100% | 100% | 100% | 74% | 68% | 91% | 83% | 100% | 99% | 99% | 97% | 97% | 75% | |
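The Atk Rate and Def Rate margins of the heatmap follow from its cells: an attacker's rate is its leaks summed across all defenders, and a defender's rate is one minus the leaks it conceded across all attackers. A minimal sketch using one row and one column copied from the heatmap; the function names are illustrative, and the 10-attempts-per-cell constant is read off the `k/10` cells.

```python
ATTEMPTS_PER_CELL = 10  # each heatmap cell reports k/10

# Leaks landed by attacker DeepSeekV32 against each of the 20 defenders (one row).
deepseek_row = [0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 7, 9, 5, 7, 0, 0, 0, 0, 1, 10]

# Leaks conceded by defender Gem25Flash to each of the 20 attackers (one column).
gem25flash_col = [0, 1, 5, 5, 9, 1, 1, 3, 1, 0, 4, 2, 6, 7, 1, 0, 3, 6, 2, 7]


def attack_rate(leaks_per_defender):
    """Fraction of an attacker's attempts that leaked, over all defenders."""
    attempts = len(leaks_per_defender) * ATTEMPTS_PER_CELL
    return sum(leaks_per_defender) / attempts


def defense_rate(leaks_per_attacker):
    """Fraction of attempts against a defender that did not leak."""
    return 1.0 - attack_rate(leaks_per_attacker)


print(f"DeepSeekV32 attack rate: {attack_rate(deepseek_row):.1%}")
print(f"Gem25Flash defense rate: {defense_rate(gem25flash_col):.1%}")
```

The row sums to 41 leaks out of 200 attempts (20.5%, which the heatmap rounds to 21%), and the column sums to 64 leaks conceded, giving the 68% shown in the Def Rate row.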

Best Attackers

1. Gem3Flash: 44/200 leaks landed (22.0%)
2. DeepSeekV32: 41/200 leaks landed (20.5%)
3. Gem25FlashLite: 26/200 leaks landed (13.0%)
4. KimiK25: 21/200 leaks landed (10.5%)
5. TrinityLarge: 21/200 leaks landed (10.5%)

Best Defenders

1. GPToss120B: 177/200 defended (88.5%)
2. GrokFast: 174/200 defended (87.0%)
3. GLM5: 173/200 defended (86.5%)
4. KimiK25: 170/200 defended (85.0%)
5. MiniMaxM25: 170/200 defended (85.0%)
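The defended percentages here are lower than the heatmap's Def Rate row for the same models. GPToss120B shows 88.5% here but 100% in the heatmap. One reading that fits the numbers is that this list counts only explicitly defended tests, while the heatmap counts every non-leak, so the gap would be errored tests. The sketch below works under that assumption, and further assumes errors are attributed to the defender's 200 tests; neither is stated on the dashboard, and the function name is hypothetical.

```python
def reconcile(defended: int, leaks_conceded: int, total: int = 200) -> dict:
    """Split a defender's tests into outcomes under the stated assumptions.

    Assumes each of the `total` tests is exactly one of: defended, leak, error.
    """
    errors = total - defended - leaks_conceded
    if errors < 0:
        raise ValueError("defended + leaks_conceded exceeds total tests")
    return {
        "strict_def_rate": defended / total,          # as in Best Defenders
        "leak_free_rate": 1 - leaks_conceded / total,  # as in the heatmap margin
        "errors": errors,                              # implied, not reported
    }


# GPToss120B: 177/200 defended, and its heatmap column shows no leaks conceded.
result = reconcile(defended=177, leaks_conceded=0)
print(result)
```

Under these assumptions GPToss120B would have 23 errored tests as defender. The figure is implied by the arithmetic rather than reported on the dashboard.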