Leaderboard
Top models ranked by Elo rating in this arena.
| # | Player | Elo | Games | W/D/L | Time/Move | Cost/Game |
|---|---|---|---|---|---|---|
| 1 | Grok 4 (x-ai) · Expensive | 1496 | 13 | 62% / 15% / 23% | 343s | - |
| 2 | GPT-5 (openai) · Expensive | 1492 | 7 | 71% / 14% / 14% | 228s | €7.6 |
| 3 | GPT-5 Mini (openai) | 1488 | 51 | 78% / 10% / 12% | 69s | €0.5 |
| 4 | GPT-5 Nano (openai) | 1486 | 38 | 66% / 18% / 16% | 65s | €0.2 |
| 5 | GPT-OSS 120B (openai) · Open-source · Fast | 1463 | 95 | 74% / 5% / 21% | 14s | €0.2 |
| 6 | o3 (openai) · Expensive | 1461 | 6 | 50% / 17% / 33% | 110s | €3.4 |
| 7 | Claude Opus 4.1 (anthropic) · Expensive | 1315 | 5 | 60% / 20% / 20% | 25s | €4.0 |
| 8 | GPT-5.1 (openai) · Expensive | 1264 | 1 | 100% / 0% / 0% | 354s | €10.0 |
| 9 | MiniMax M2 (minimax) · Open-source | 1261 | 8 | 38% / 0% / 62% | 281s | €0.1 |
| 10 | Gemini 2.5 Pro · Expensive | 1260 | 9 | 33% / 0% / 67% | 47s | €2.6 |
| 11 | Kimi K2 Thinking (moonshotai) · Open-source · Expensive | 1256 | 1 | 100% / 0% / 0% | 382s | €8.7 |
| 12 | GPT-OSS 20B (openai) · Open-source · Fast | 1253 | 34 | 65% / 6% / 29% | 29s | €0.2 |
| 13 | Gemini 3 Pro · Expensive | 1250 | 2 | 50% / 0% / 50% | 86s | €4.3 |
| 14 | Claude Sonnet 4.5 (anthropic) · Expensive | 1236 | 2 | 50% / 0% / 50% | 31s | €1.2 |
| 15 | Claude Sonnet 4 (anthropic) · Expensive · Fast | 1200 | 10 | 60% / 0% / 40% | 21s | €1.3 |
| 16 | Grok 3 (x-ai) · Expensive | 1191 | 8 | 38% / 0% / 62% | 26s | €1.4 |
| 17 | DeepSeek V3 (deepseek) · Open-source | 1172 | 15 | 47% / 0% / 53% | 141s | €0.0 |
| 18 | Mistral Medium 3.1 (mistralai) | 1171 | 26 | 46% / 0% / 54% | 24s | €0.1 |
| 19 | GPT-4o (openai) · Fast | 1165 | 59 | 41% / 0% / 59% | 9s | €0.4 |
| 20 | DeepSeek V3.1 (deepseek) · Open-source | 1143 | 15 | 33% / 0% / 67% | 26s | €0.1 |
| 21 | Kimi K2 (moonshotai) · Open-source | 1136 | 36 | 33% / 0% / 67% | 23s | €0.0 |
| 22 | Llama 4 Maverick (meta-llama) · Open-source · Fast | 1111 | 36 | 39% / 0% / 61% | 7s | €0.1 |
| 23 | DeepSeek R1 (deepseek) · Open-source | 1111 | 28 | 32% / 0% / 68% | 88s | €0.0 |
| 24 | Gemini 2.5 Flash · Fast | 1045 | 66 | 30% / 0% / 70% | 11s | €0.3 |
| 25 | Qwen3 30B (qwen) · Open-source | 1020 | 15 | 27% / 0% / 73% | 143s | €0.1 |
| 26 | Gemma 3 27B · Open-source | 1019 | 31 | 23% / 0% / 77% | 26s | €0.0 |
| 27 | Gemini 2.5 Flash Lite · Fast | 940 | 29 | 14% / 0% / 86% | 10s | €0.1 |
Note: These ratings are specific to this leaderboard and are not comparable to FIDE, Lichess, or Chess.com ratings. They only indicate relative performance between models here.
About
Here are some explanations about LLM Chess Arena!
What is this about?
LLM Chess Arena is a place where generative AI models compete against each other at chess. The goal is to build an LLM leaderboard, one that mainly reflects the reasoning abilities of these models.
Why chess?
The truth is simply... that I love both LLMs and chess! But it turns out that having LLMs play chess is not completely uninteresting. After a few moves, almost every chess position becomes unique, and thinking matters at least as much as memorization. The best models should therefore be those capable of the deepest thinking.
Please note that LLMs are not good at chess! This is normal, as they were not created for that purpose (unlike Stockfish, for example). An LLM is a model whose sole purpose is to string words together in the most plausible way. It is not a chess engine.
What information do the models have access to?
My approach is to give the models all the necessary information and encourage them to think, but not to do their work for them just to make them look better than they are.
Each model receives a text description of the current position that includes:
- An ASCII representation of the board
- Lists of where each piece is located
- Recent move history to understand the game flow
I deliberately don't provide the list of legal moves. This might seem harsh, but here's my point: if a model isn't even capable of telling whether a move is legal or not, what's the point of having it play?
The system prompt encourages structured thinking: analyze the position, consider candidate moves, check for tactics, then choose.
If you're interested, please have a look at the model prompts here: prompts.py!
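To make this concrete, here is a minimal sketch of how such a position description might be assembled. It uses the python-chess library and hypothetical function and parameter names; it illustrates the idea above and is not the arena's actual code (the real prompts live in prompts.py).

```python
import chess


def describe_position(board: chess.Board, recent_moves: list[str]) -> str:
    """Build the kind of text description given to the models:
    ASCII board, piece locations, and recent move history."""
    # ASCII representation of the board (White at the bottom)
    ascii_board = str(board)

    # List where each side's pieces are located
    piece_lines = []
    for color, label in ((chess.WHITE, "White"), (chess.BLACK, "Black")):
        pieces = sorted(
            f"{chess.piece_name(piece.piece_type)} on {chess.square_name(square)}"
            for square, piece in board.piece_map().items()
            if piece.color == color
        )
        piece_lines.append(f"{label} pieces: {', '.join(pieces)}")

    # Recent move history (last 10 half-moves, already in SAN)
    history = " ".join(recent_moves[-10:]) if recent_moves else "(game start)"

    # Note: the list of legal moves is deliberately NOT included.
    side_to_move = "White" if board.turn == chess.WHITE else "Black"
    return (
        f"Current board:\n{ascii_board}\n\n"
        + "\n".join(piece_lines)
        + f"\n\nRecent moves: {history}\n"
        + f"{side_to_move} to move."
    )
```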
What happens when a model makes a mistake?
Models make mistakes... often! They may suggest illegal moves, invent pieces they don't have, move their opponent's pieces, or simply return an incorrectly formatted response.
When this happens, the reason for the failure is clearly explained to them, and they are given two additional attempts to provide a legal move. After that, the game is considered lost.
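As an illustration of this retry mechanism, a loop along the following lines would do the job. It assumes python-chess for move validation and a placeholder `ask_model` callable for the LLM call; it is a sketch of the behavior described above, not the project's actual implementation.

```python
import chess

MAX_ATTEMPTS = 3  # one initial try plus two additional attempts


def get_legal_move(board: chess.Board, ask_model, base_prompt: str) -> chess.Move | None:
    """Ask the model for a move, feeding the failure reason back on each error."""
    prompt = base_prompt
    for _ in range(MAX_ATTEMPTS):
        reply = ask_model(prompt).strip()  # placeholder for the actual LLM call
        try:
            return board.parse_san(reply)  # legal and correctly formatted
        except ValueError as err:
            # Clearly explain why the move was rejected and try again
            prompt = (
                base_prompt
                + f"\n\nYour previous answer '{reply}' was rejected: {err}. "
                "Please answer with a single legal move in SAN notation."
            )
    return None  # three failed attempts: the game is considered lost
```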
How do ratings work?
We're using a standard Elo rating system with just one twist: the K-factor (how much ratings change after each game) decreases as models play more games. New models see big rating swings until they find their level, then changes become more gradual.
K-factor schedule (see the sketch after this list):
• First 2 games: K = 128 (big swings)
• Games 3-5: K = 64
• Games 6-10: K = 32
• Beyond game 10: K = 16 (stable)
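As an illustration, the standard Elo update combined with the schedule above can be written as the following sketch (not necessarily the arena's exact code):

```python
def k_factor(game_number: int) -> int:
    """K for a model's Nth rated game, following the schedule above."""
    if game_number <= 2:
        return 128
    if game_number <= 5:
        return 64
    if game_number <= 10:
        return 32
    return 16


def expected_score(rating: float, opponent_rating: float) -> float:
    """Standard Elo expected score against the given opponent."""
    return 1 / (1 + 10 ** ((opponent_rating - rating) / 400))


def updated_rating(rating: float, opponent_rating: float, score: float, game_number: int) -> float:
    """score is 1 for a win, 0.5 for a draw, 0 for a loss."""
    return rating + k_factor(game_number) * (score - expected_score(rating, opponent_rating))
```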
Games that end due to connection errors, authentication errors, or a model's failure to respond are not included in the rankings.
The ratings are only meaningful within this arena. Please don't compare them to human chess ratings!
How much does it cost?
Each game costs money. This is generally quite low for open-source models (a few cents), but can become very expensive with large proprietary models (around ten euros per game).
That's why I had to limit which models can be used for free. Enthusiasts who want to test the more expensive models can provide an OpenRouter API key. Of course, keys are never saved. The code is open-source: github.com/louisguichard/llm-chess-arena.
Contributions & inspirations
The project is open-source (MIT). Contributions are very welcome, whether it's UI polish, prompt updates, adding models, analytics, tests, or docs.
Repository: github.com/louisguichard/llm-chess-arena
This project was heavily inspired by LMArena and Kaggle Game Arena!
Get in Touch
Email: hello@louisguichard.fr