Joute Mode

The Arena

Two AIs battle on the same prompt, and you judge blind, without knowing the brands. That is the playable arena: a sample of 295 battles. The ranking aggregates 139,935 real votes from the open compar:IA dataset, enriched in real time by Joute visitor votes.

Gemini: 8.7/10, €22/month. Try Gemini

Claude: 9/10, €18/month. Try Claude

139,935 real votes aggregated

295 playable battles

100% blind vote

The AI arena, without brand bias.

When you choose your AI model, brand gets in the way. ChatGPT, Claude, Gemini: the names carry weight. The arena breaks that bias. You read two answers to the same prompt, without knowing who wrote them. You vote for the best. Then we reveal.

The format was not invented by Joute. It is the method of compar:IA, from the French Ministry of Culture: since 2024, it has collected 139,935 real votes on hundreds of models in French, under Open License 2.0. Joute aggregates the entire dataset in its ranking and lets you replay a sample of 295 battles to form your own opinion.

Your Joute votes count. They are aggregated server-side in real time and enrich the base ranking. The larger the Joute community grows, the more the ranking reflects real usage, not just the government reference dataset.

100% real duels from compar:IA, the AI arena of the French Ministry of Culture. Joute simulates nothing.

Creativity: Duel #1

The challenge

I am a parent looking for a bedtime story to tell my children. Choose three seemingly unrelated objects and use them as the basis for a 100-word story.

Fighter A

The Adventure of the Magic Umbrella. Once upon a time, there was an old red umbrella forgotten in an attic. One evening, it flew out the window and landed next to a desert cactus. "Ouch! Watch out for my spines!" said the cactus. "Sorry! I'm looking for the rain," the umbrella replied sadly. Suddenly a silver spoon appeared, carried by the wind. "I know a cloud that cries every night! Follow me!" Together, they found the lonely cloud. The umbrella comforted it, the cactus collected its tears of joy, and the spoon served stars for dessert. Ever since, they have been the best of friends. Good night! 🌙

Fighter B

Chosen objects: a red umbrella, a broken pocket watch, a spiral seashell. --- Léo found the red umbrella in the attic. When he opened it, a whirlwind carried him off to a beach of golden sand. At his feet, a spiral seashell whispered: "Seek Time." Farther along, a crab was playing with a broken pocket watch. "It has not ticked since Sorrow stole the laughter," it explained. With his umbrella, Léo drew a big smile in the sand. The watch's mechanism clicked back to life. The seashell breathed: "Time starts again when you redraw joy." And Léo went home, just in time for bed.

Cast your verdict

The ranking

The arena ranking

139,935 votes

Ranking built on 139,935 real blind votes from the open compar:IA dataset (French Ministry of Culture). Each brand aggregates all its tested models, from smallest to largest. No editorial score factors in.

1. Gemini: 41% (20,744 / 50,177 duels)
2. DeepSeek: 39% (6,367 / 16,518 duels)
3. Claude: 35% (5,433 / 15,405 duels)
4. Grok: 35% (1,636 / 4,677 duels)
5. Le Chat: 34% (17,048 / 50,695 duels)
6. ChatGPT: 31% (11,259 / 36,509 duels)

The percentage is a win rate: the share of duels won across all duels the brand played in the compar:IA dataset. The number of duels varies from one brand to another.
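As a quick sanity check, the win rate can be recomputed from the win and duel counts published in the ranking. The sketch below is illustrative only: the dictionary layout and the function name are assumptions, not Joute's code, and the source does not say how ties or "both weak" votes enter the counts.

```python
# Wins and duels per brand, copied from the ranking on this page.
RESULTS = {
    "Gemini": (20_744, 50_177),
    "DeepSeek": (6_367, 16_518),
    "Claude": (5_433, 15_405),
    "Grok": (1_636, 4_677),
    "Le Chat": (17_048, 50_695),
    "ChatGPT": (11_259, 36_509),
}


def win_rate_percent(wins: int, duels: int) -> int:
    """Share of duels won, rounded to the nearest whole percent."""
    if duels <= 0:
        raise ValueError("duels must be positive")
    return round(100 * wins / duels)


if __name__ == "__main__":
    for brand, (wins, duels) in RESULTS.items():
        print(f"{brand}: {win_rate_percent(wins, duels)}% ({wins} / {duels} duels)")
```

Because the number of duels differs widely between brands (4,677 for Grok against 50,695 for Le Chat), two identical percentages do not carry the same statistical weight.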

How it works

Three steps, one minute per battle.

You read both answers

Same prompt, two AIs, identities hidden. You see A and B, not their names. No logo, no brand color. Just the text.

You vote for the best

A wins, B wins, tie, or both weak. No registration required, just a click. The vote is anonymous (hash of IP + user-agent, no cookie).

We reveal, we aggregate

Names appear: you see whether your intuition matches. Your vote is added to the Joute ranking in real time.

Ranking methodology

How we build the arena ranking.

The model: Bradley-Terry, not a raw score

We do not simply add up wins. We use the Bradley-Terry statistical model, a standard for pairwise rankings (it is closely related to Elo in chess and is used by LMSYS Chatbot Arena). It computes a latent strength for each model, such that the probability that A beats B depends on the gap between their strengths, as estimated from past battles.
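To make the idea concrete, here is a minimal sketch of a Bradley-Terry fit using the classic iterative update. It is not Joute's implementation: the function names, the (winner, loser) input format and the fixed iteration count are assumptions chosen for readability. Ties are ignored, and every model is assumed to have at least one win.

```python
from collections import defaultdict


def fit_bradley_terry(battles, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Each pass applies p_i <- wins_i / sum_j(n_ij / (p_i + p_j)),
    then rescales the strengths so that they sum to 1.
    """
    wins = defaultdict(int)         # total wins per model
    pair_counts = defaultdict(int)  # number of battles per unordered pair
    models = set()
    for winner, loser in battles:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        models.update((winner, loser))

    strength = {m: 1.0 / len(models) for m in models}
    for _ in range(iterations):
        updated = {}
        for i in models:
            denom = sum(
                count / (strength[i] + strength[j])
                for pair, count in pair_counts.items()
                if i in pair
                for j in pair - {i}
            )
            updated[i] = wins[i] / denom if denom else strength[i]
        total = sum(updated.values())
        strength = {m: s / total for m, s in updated.items()}
    return strength


def win_probability(strength, a, b):
    """Probability that model a beats model b under Bradley-Terry."""
    return strength[a] / (strength[a] + strength[b])


if __name__ == "__main__":
    # Toy data: A beats B six times, B beats A four times.
    toy = [("A", "B")] * 6 + [("B", "A")] * 4
    fitted = fit_bradley_terry(toy)
    print(f"P(A beats B) = {win_probability(fitted, 'A', 'B'):.2f}")  # 0.60
```

With only two models the fitted probability equals the observed win share. The model becomes useful with many models that have not all faced each other, because strengths are inferred through shared opponents.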

Two combined signals

compar:IA signal: the base ranking is sourced from the 139,935 real votes in the French Ministry of Culture dataset. This is the prior: a known strength for each model.

Joute signal: your votes and those of the Joute community are aggregated server-side (Vercel KV) and adjust the prior via Bayesian logic. The more votes accumulate, the more the Joute signal weighs against the initial compar:IA ranking.
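The page does not spell out the exact update rule. One common way to obtain this behavior is to treat the prior as a fixed number of pseudo-votes and take the posterior mean of a Beta-Binomial model. The sketch below illustrates that idea on a simple win rate. The function name and the `prior_weight` parameter are hypothetical, and Joute's real formula may differ.

```python
def blended_win_rate(prior_rate, prior_weight, joute_wins, joute_duels):
    """Posterior mean win rate after updating a Beta prior with Joute votes.

    prior_rate   -- win rate implied by the compar:IA baseline (0 to 1)
    prior_weight -- hypothetical number of pseudo-votes the prior is worth
    joute_wins   -- duels won in Joute votes
    joute_duels  -- duels played in Joute votes
    """
    if not 0.0 <= prior_rate <= 1.0:
        raise ValueError("prior_rate must be between 0 and 1")
    if prior_weight <= 0:
        raise ValueError("prior_weight must be positive")
    if not 0 <= joute_wins <= joute_duels:
        raise ValueError("joute_wins must be between 0 and joute_duels")
    return (prior_rate * prior_weight + joute_wins) / (prior_weight + joute_duels)


if __name__ == "__main__":
    # With no Joute votes the estimate equals the prior; it then drifts
    # toward the observed Joute win rate as votes accumulate.
    for wins, duels in [(0, 0), (8, 10), (80, 100), (800, 1000)]:
        print(duels, round(blended_win_rate(0.41, 100, wins, duels), 3))
```

A larger `prior_weight` makes the baseline harder to move. A smaller one lets the Joute community override it more quickly.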

Data freshness

The compar:IA dataset is re-synced monthly (first Monday of the month). Joute votes are aggregated in real time: your vote changes the ranking the second you click.
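For reference, the "first Monday of the month" can be computed with the standard library alone. This is only a sketch of the date arithmetic: the function name is an assumption, and the source does not describe how the re-sync job is actually scheduled.

```python
from datetime import date, timedelta


def first_monday(year: int, month: int) -> date:
    """Return the date of the first Monday of the given month."""
    first = date(year, month, 1)
    # weekday() returns 0 for Monday, so this is the number of days
    # to wait from the 1st until the next Monday (0 if the 1st is one).
    offset = (7 - first.weekday()) % 7
    return first + timedelta(days=offset)


if __name__ == "__main__":
    print(first_monday(2024, 9))  # 2024-09-02
```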

See the full Joute method →

Where the battles come from

100% real confrontations

Everything comes from compar:IA, the open dataset of the French Ministry of Culture: 139,935 real votes cast blind by French-speaking users, under Open License 2.0. The ranking is its aggregate, enriched in real time by Joute votes. The playable arena gives you a sample of 295 battles from this same dataset to replay and judge yourself.

FAQ

Everything we get asked about the arena.

What is the Joute AI Arena?

A blind test between two AI models on the same prompt. You read both answers without knowing who wrote them, you vote for the best one, then we reveal the names. It is the only format that measures perceived quality without brand bias.

Where do the battles and votes come from?

Battles are drawn from the open dataset compar:IA, the AI arena of the French Ministry of Culture, under Open License 2.0. The current ranking aggregates 139,935 real votes cast by French-speaking users. Your Joute votes are added to this signal in real time.

How is the ranking calculated?

Two signals are combined. The compar:IA signal gives the base ranking (Bradley-Terry model on the 139,935 dataset votes). Joute votes are aggregated server-side and adjust this ranking, which serves as a Bayesian prior: the more votes accumulate, the more the Joute signal weighs against the starting ranking.

Are my votes anonymous?

Yes. We only store a hash of IP + user-agent to limit spam (1 vote per battle per hash), no tracking cookie, no personal data. No account required, no email asked.
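The rule of one vote per battle per hash can be sketched as follows. Everything here is an assumption made for illustration: the choice of SHA-256, the server-side salt, the class and function names, and the in-memory set standing in for the server-side store.

```python
import hashlib

VALID_CHOICES = {"A", "B", "tie", "both_weak"}


def voter_hash(ip: str, user_agent: str, salt: str) -> str:
    """Hash of salt + IP + user-agent; the raw values are never stored."""
    payload = f"{salt}|{ip}|{user_agent}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class VoteStore:
    """In-memory stand-in for a server-side vote store."""

    def __init__(self, salt: str):
        self._salt = salt
        self._seen = set()  # (battle_id, voter_hash) pairs already used
        self.votes = []     # accepted (battle_id, choice) pairs

    def cast(self, battle_id, choice, ip, user_agent) -> bool:
        """Record a vote; return False if this hash already voted on the battle."""
        if choice not in VALID_CHOICES:
            raise ValueError(f"unknown choice: {choice!r}")
        key = (battle_id, voter_hash(ip, user_agent, self._salt))
        if key in self._seen:
            return False
        self._seen.add(key)
        self.votes.append((battle_id, choice))
        return True


if __name__ == "__main__":
    store = VoteStore(salt="example-salt")
    print(store.cast(1, "A", "203.0.113.7", "ExampleBrowser/1.0"))  # True
    print(store.cast(1, "B", "203.0.113.7", "ExampleBrowser/1.0"))  # False
```

Only the hash and the choice are kept, so a vote cannot be traced back to an IP address from the stored data alone. The trade-off is that two visitors sharing the same IP and user-agent would be counted as one voter.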

Why an arena rather than a classic benchmark?

Benchmarks (MMLU, GPQA, etc.) measure how well models answer standardized multiple-choice test questions. The arena measures what you PREFER to read, blind, on real everyday prompts. The two are complementary, and the arena is what best predicts usage satisfaction at 6 months.

How often is the ranking updated?

The compar:IA pool is re-synced monthly. Joute votes are aggregated in real time: you can refresh the ranking after your vote and your signal is already integrated.

What's next

The ranking evolves every week. Don't miss it.

We send a monthly recap: who rises, who falls, and the models that collapse when you remove brand bias. No spam, one-click unsubscribe.

Subscribe to the monthly recap →

Compare 2 AIs in detail