Methodology
You be the judge, not us.
LLM Arena compares the visible outputs of popular models on practical, funny, and easy-to-inspect prompts. We don't pick a winner: you vote blind for the output you like best, then see which model made it and how everyone else voted.
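Here is a minimal sketch of the blind-vote flow. It assumes each showcase stores one output per model. The names `Entry`, `blindOrder`, and `castVote` are hypothetical and are not LLM Arena's actual code. The idea is that display order is shuffled and the model name stays hidden until after the vote.

```typescript
// Hypothetical shape: one stored output per model in a showcase.
interface Entry {
  model: string;   // hidden from the voter until after the vote
  output: string;
}

// What the voter sees: a slot number and the output, no model name.
interface BlindCard {
  slot: number;
  output: string;
}

// Fisher-Yates shuffle so display order carries no hint about the model.
function shuffled<T>(items: readonly T[], rand: () => number = Math.random): T[] {
  const a = items.slice();
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// Returns anonymized cards plus a private slot -> model map (kept off the client).
function blindOrder(entries: readonly Entry[], rand?: () => number) {
  const order = shuffled(entries, rand);
  const cards: BlindCard[] = order.map((e, i) => ({ slot: i + 1, output: e.output }));
  const modelBySlot = new Map<number, string>(
    order.map((e, i): [number, string] => [i + 1, e.model]),
  );
  return { cards, modelBySlot };
}

// Records one vote, then reveals the chosen model and a snapshot of all tallies.
function castVote(
  slot: number,
  modelBySlot: Map<number, string>,
  tallies: Map<string, number>,
) {
  const model = modelBySlot.get(slot);
  if (model === undefined) throw new Error(`unknown slot ${slot}`);
  tallies.set(model, (tallies.get(model) ?? 0) + 1);
  return { revealedModel: model, tallies: new Map(tallies) };
}
```

Shuffling per viewer means slot numbers say nothing about the model. Tallies are keyed by model name, so results still aggregate across viewers.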
Fairness Rules
- Every model receives the exact same prompt text.
- Within a showcase, every run uses the same temperature and token cap.
- The first response is kept. No retries, no repair prompts, no follow-up.
- Outputs are saved with model name, version string, and timestamp, plus token usage and latency when available.
- We don't rank the outputs. You vote blind for the one you like, and only then is the model behind it revealed.
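The rules above map naturally onto a single run record. This is a minimal sketch that assumes a hypothetical `ModelCall` adapter wrapping each provider's SDK. Field names such as `tokenUsage` and `latencyMs` are illustrative and do not reflect the site's actual schema.

```typescript
// Hypothetical per-showcase settings shared by every model.
interface ShowcaseConfig {
  prompt: string;       // exact same prompt text for all models
  temperature: number;
  maxTokens: number;    // token cap
}

// Hypothetical adapter signature; each provider SDK would be wrapped to fit it.
type ModelCall = (
  prompt: string,
  opts: { temperature: number; maxTokens: number },
) => Promise<{ text: string; version: string; tokenUsage?: number }>;

interface RunRecord {
  model: string;
  version: string;
  timestamp: string;    // ISO 8601, taken when the request starts
  output: string;
  tokenUsage?: number;  // only when the provider reports it
  latencyMs: number;    // measured around the single call
}

// Calls the model exactly once: no retries, no repair prompt, no follow-up.
// If the call fails, the error propagates instead of being retried.
async function recordFirstResponse(
  model: string,
  call: ModelCall,
  cfg: ShowcaseConfig,
): Promise<RunRecord> {
  const timestamp = new Date().toISOString();
  const started = Date.now();
  const res = await call(cfg.prompt, { temperature: cfg.temperature, maxTokens: cfg.maxTokens });
  return {
    model,
    version: res.version,
    timestamp,
    output: res.text,
    tokenUsage: res.tokenUsage,
    latencyMs: Date.now() - started,
  };
}

// Every model receives the identical config object, so prompt, temperature, and cap match.
async function runShowcase(
  models: Record<string, ModelCall>,
  cfg: ShowcaseConfig,
): Promise<RunRecord[]> {
  return Promise.all(
    Object.entries(models).map(([name, call]) => recordFirstResponse(name, call, cfg)),
  );
}
```

`Promise.all` rejects if any single model fails. A real runner might prefer `Promise.allSettled` so that one failure does not discard the other outputs.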
Security Rules
Model-generated HTML and JavaScript are untrusted code. Build outputs render inside sandboxed iframes that allow scripts but do not grant same-origin privileges:
<iframe sandbox="allow-scripts" referrerpolicy="no-referrer"></iframe>
Generated HTML is not injected into the main page. SVG outputs are isolated or sanitized before publication.
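Here is a minimal sketch of how an output could be wrapped before rendering. It makes two assumptions the source does not confirm: the HTML is passed in through the iframe's `srcdoc` attribute, and SVG is isolated by loading it as an image. `sandboxedFrame` and `svgAsImage` are hypothetical helper names.

```typescript
// Escapes a string for use inside a double-quoted HTML attribute value.
// "&" is replaced first so the entities added afterwards are not double-escaped.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: wraps untrusted model HTML in a sandboxed iframe.
// "allow-scripts" lets the demo run; omitting "allow-same-origin" gives the frame
// an opaque origin, so it cannot read the parent page's cookies, storage, or DOM.
function sandboxedFrame(untrustedHtml: string): string {
  return `<iframe sandbox="allow-scripts" referrerpolicy="no-referrer" srcdoc="${escapeAttr(untrustedHtml)}"></iframe>`;
}

// Hypothetical helper: isolates an SVG by loading it as an image.
// Browsers do not run scripts in SVG documents loaded through <img>.
function svgAsImage(untrustedSvg: string): string {
  const uri = "data:image/svg+xml;charset=utf-8," + encodeURIComponent(untrustedSvg);
  return `<img alt="" src="${escapeAttr(uri)}">`;
}
```

Leaving out `allow-same-origin` is the key choice here. If it were combined with `allow-scripts` for same-origin content, the framed code could remove its own sandbox.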
What It Is Not
This is not an Elo leaderboard, a scientific benchmark, or a claim that one model is globally better than another. It is a repeatable content format for seeing real model behavior side by side.