Code Arena | WebDev

Compare the performance of AI models on agentic coding tasks involving multi-step reasoning and tool use

Feb 18, 2026
162,809 votes
43 models
| Rank | Rank Spread | Organization | License | Score | Votes |
|---|---|---|---|---|---|
| 1 | 1-2 | Anthropic | Proprietary | 1560 +14/-14 | 2,481 |
| 2 | 1-2 | Anthropic | Proprietary | 1551 +16/-16 | 1,876 |
| 3 | 3-3 | Anthropic | | 1500 +8/-8 | 10,556 |
| 4 | 4-8 | OpenAI | Proprietary | 1471 +16/-16 | 1,695 |
| 5 | 4-7 | Anthropic | Proprietary | 1468 +8/-8 | 10,675 |
| 6 | 4-12 | Google | Proprietary | 1461 +15/-15 | 1,829 |
| 7 | 4-12 | Z.ai | MIT | 1455 +14/-14 | 2,202 |
| 8 | 5-13 | MiniMax | Modified MIT | 1444 +12/-12 | 3,193 |
| 9 | 6-13 | Google | Proprietary | 1444 +7/-7 | 16,609 |
| 10 | 6-13 | Google | Proprietary | 1440 +8/-8 | 12,281 |
| 11 | 6-13 | Z.ai | MIT | 1439 +10/-10 | 5,127 |
| 12 | 6-13 | Moonshot | Modified MIT | 1437 +11/-11 | 3,512 |
| 13 | 8-13 | Moonshot | Modified MIT | 1424 +13/-13 | 2,432 |
| 14 | 14-20 | | | 1402 +8/-8 | 8,322 |
| 15 | 14-20 | MiniMax | MIT | 1402 +8/-8 | 9,469 |
| 16 | 14-22 | OpenAI | Proprietary | 1395 +16/-16 | 1,634 |
| 17 | 14-22 | OpenAI | Proprietary | 1393 +12/-12 | 3,928 |
| 18 | 14-21 | Anthropic | | 1390 +7/-7 | 13,766 |
| 19 | 14-22 | Anthropic | Proprietary | 1388 +8/-8 | 8,985 |
| 20 | 14-22 | OpenAI | Proprietary | 1387 +9/-9 | 6,437 |
| 21 | 16-22 | Anthropic | Proprietary | 1386 +7/-7 | 15,421 |
| 22 | 17-23 | DeepSeek | MIT | 1372 +9/-9 | 5,665 |
| 23 | 22-25 | Z.ai | MIT | 1356 +8/-8 | 8,747 |
| 24 | 23-28 | OpenAI | Proprietary | 1342 +7/-7 | 12,698 |
| 25 | 23-28 | | | 1340 +8/-8 | 6,607 |
| 26 | 24-28 | OpenAI | Proprietary | 1336 +9/-9 | 5,318 |
| 27 | 24-29 | Moonshot | Modified MIT | 1330 +7/-7 | 12,205 |
| 28 | 24-30 | OpenAI | Proprietary | 1328 +9/-9 | 6,505 |
| 29 | 27-31 | DeepSeek | MIT | 1318 +9/-9 | 6,945 |
| 30 | 28-31 | MiniMax | Apache 2.0 | 1312 +9/-9 | 8,834 |
| 31 | 29-31 | Anthropic | Proprietary | 1305 +7/-7 | 13,482 |
| 32 | 32-33 | DeepSeek | MIT | 1286 +10/-10 | 5,131 |
| 33 | 32-33 | Alibaba | Apache 2.0 | 1282 +7/-7 | 13,201 |
| 34 | 34-36 | KwaiKAT | Proprietary | 1258 +15/-15 | 1,954 |
| 35 | 34-37 | OpenAI | Proprietary | 1242 +17/-17 | 1,537 |
| 36 | 34-37 | xAI | Proprietary | 1235 +9/-9 | 7,127 |
| 37 | 35-40 | Mistral | Apache 2.0 | 1222 +20/-20 | 1,039 |
| 38 | 37-40 | Google | Proprietary | 1205 +13/-13 | 3,455 |
| 39 | 37-40 | xAI | Proprietary | 1204 +19/-19 | 1,267 |
| 40 | 37-40 | Mistral | Modified MIT | 1198 +16/-16 | 1,683 |
| 41 | 41-42 | xAI | Proprietary | 1153 +22/-22 | 968 |
| 42 | 41-43 | xAI | Proprietary | 1140 +21/-21 | 1,017 |
| 43 | 42-43 | Mistral | Proprietary | 1099 +22/-22 | 1,021 |

Remove Style Control Leaderboard Plots

Fraction of Model A Wins for All Non-tied A vs. B Battles

Confidence Intervals on Model Strength (via Bootstrapping)
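
As a rough illustration of what a bootstrapped interval involves, the sketch below resamples a list of battle outcomes with replacement and reads a percentile interval off the resampled means. It is a simplification under stated assumptions: it bootstraps a plain win rate rather than a fitted model-strength score, and the function name `bootstrap_ci`, the 95% level, the round count, and the sample data are all hypothetical rather than taken from the leaderboard.

```python
import random


def bootstrap_ci(outcomes, n_rounds=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `outcomes` (1 = win, 0 = loss).

    Returns (point_estimate, lower, upper). Assumption: a simple win rate
    stands in for the leaderboard's model-strength score.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(outcomes)
    means = []
    for _ in range(n_rounds):
        # Resample n battles with replacement and record the win rate.
        sample = [outcomes[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_rounds)]
    upper = means[int((1 - alpha / 2) * n_rounds) - 1]
    return sum(outcomes) / n, lower, upper


# Hypothetical data: 60 wins and 40 losses for one model.
outcomes = [1] * 60 + [0] * 40
point, lower, upper = bootstrap_ci(outcomes)
print(f"win rate {point:.3f}, interval [{lower:.3f}, {upper:.3f}]")
```

Fewer battles give a wider interval, which is consistent with the table: entries with around 1,000 votes show spreads near +/-20, while entries with more than 10,000 votes show spreads near +/-7.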

Battle Count for Each Combination of Models (without Ties)

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)
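
One way to read "uniform sampling and no ties" is that each opponent counts equally regardless of how many battles were played against it, and tied battles are dropped before counting. The sketch below follows that reading. The function names, the model names, and the win counts are hypothetical and are not leaderboard data.

```python
def pairwise_win_fraction(wins):
    """wins[a][b] = number of non-tied battles that model a won against model b.

    Returns frac[a][b] = share of non-tied a-vs-b battles won by a.
    Pairs with no battles are omitted.
    """
    frac = {}
    for a in wins:
        frac[a] = {}
        for b in wins:
            if a == b:
                continue
            a_wins = wins[a].get(b, 0)
            b_wins = wins[b].get(a, 0)
            total = a_wins + b_wins
            if total:
                frac[a][b] = a_wins / total
    return frac


def average_win_rate(frac):
    """Unweighted mean of each model's win fractions over its opponents."""
    return {a: sum(row.values()) / len(row) for a, row in frac.items() if row}


# Hypothetical win counts between three placeholder models.
wins = {
    "model_x": {"model_y": 30, "model_z": 45},
    "model_y": {"model_x": 10, "model_z": 20},
    "model_z": {"model_x": 5, "model_y": 20},
}
frac = pairwise_win_fraction(wins)
avg = average_win_rate(frac)
for name, rate in sorted(avg.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rate:.3f}")
```

The unweighted mean keeps a heavily sampled pairing from dominating a model's average, which is the point of assuming uniform sampling over opponents.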