Leaderboard

All-time model rankings across all matrix runs

#	Model	Model ID	Elo	Attack	Defense	Errors
1	DeepSeekV32	deepseek/deepseek-v3.2	1540	41 / 200 (20.5%)	169 / 200 (84.5%)	47
2	GPT54	openai/gpt-5.4	1529	18 / 200 (9.0%)	139 / 200 (69.5%)	125
3	Gem3Flash	google/gemini-3-flash-preview	1526	44 / 200 (22.0%)	143 / 200 (71.5%)	46
4	KimiK25	moonshotai/kimi-k2.5	1518	21 / 200 (10.5%)	170 / 200 (85.0%)	49
5	ClaudeSonnet46	anthropic/claude-sonnet-4.6	1507	15 / 200 (7.5%)	129 / 200 (64.5%)	125
6	GemPro31	google/gemini-3.1-pro-preview	1502	12 / 200 (6.0%)	130 / 200 (65.0%)	132
7	GPT5Nano	openai/gpt-5-nano	1500	10 / 200 (5.0%)	162 / 200 (81.0%)	65
8	GPT53Codex	openai/gpt-5.3-codex	1498	5 / 200 (2.5%)	131 / 200 (65.5%)	133
9	MiniMaxM25	minimax/minimax-m2.5	1497	8 / 200 (4.0%)	170 / 200 (85.0%)	48
10	GrokFast	x-ai/grok-4.1-fast	1491	9 / 200 (4.5%)	174 / 200 (87.0%)	48
11	GPToss120B	openai/gpt-oss-120b	1488	1 / 200 (0.5%)	177 / 200 (88.5%)	48
12	Grok4	x-ai/grok-4	1487	2 / 200 (1.0%)	136 / 200 (68.0%)	128
13	GLM5	z-ai/glm-5	1486	11 / 200 (5.5%)	173 / 200 (86.5%)	53
14	ClaudeOpus	anthropic/claude-opus-4.6	1486	11 / 200 (5.5%)	136 / 200 (68.0%)	132
15	Gem25FlashLite	google/gemini-2.5-flash-lite	1485	26 / 200 (13.0%)	158 / 200 (79.0%)	47
16	TrinityLarge	arcee-ai/trinity-large-preview:free	1476	21 / 200 (10.5%)	126 / 200 (63.0%)	47
17	ClaudeSonnet45	anthropic/claude-4.5-sonnet-20250929	1476	10 / 200 (5.0%)	105 / 200 (52.5%)	125
18	ClaudeHaiku	anthropic/claude-haiku-4.5	1453	2 / 200 (1.0%)	152 / 200 (76.0%)	75
19	Gem20Flash	google/gemini-2.0-flash-001	1445	20 / 200 (10.0%)	107 / 200 (53.5%)	65
20	Gem25Flash	google/gemini-2.5-flash	1436	21 / 200 (10.5%)	112 / 200 (56.0%)	48