HappyHorse-1.0 is Crushing the AI Video Leaderboards. So Why Can't Anyone Actually Use It?

If you spend as much time refreshing the Artificial Analysis Video Arena as I do, you probably did a double-take last week. Out of nowhere, a completely unknown model called HappyHorse-1.0 hijacked the #1 spot for both text-to-video (T2V) and image-to-video (I2V). No press release. No recognizable lab attached. Just a barebones website with a bunch of GitHub links that currently lead nowhere.

If you evaluate these models for actual production pipelines, you've already learned to be deeply cynical about leaderboard hype. So let's strip away the noise. Here is exactly what we know about HappyHorse-1.0, what the rumor mill is making up, and why you can't actually build with it yet.

The Math Doesn't Lie (But It Might Fluctuate)

Let's talk about why its rank actually matters. The Artificial Analysis arena isn't based on those cherry-picked, self-reported lab benchmarks that companies love to tweet. It runs strictly on blind human preference. You enter a prompt, two anonymous models generate a video side-by-side, and you click the one that looks better. It uses the exact same Elo math that ranks chess grandmasters.

Right now, HappyHorse is sitting at an Elo of 1333 for T2V (without audio), comfortably beating out the previous heavyweight, Dreamina Seedance 2.0 (1273). A 60-point gap is substantial in this system: under the standard Elo formula, it implies HappyHorse should win roughly 58 to 59% of its blind head-to-head matchups against Seedance 2.0. In the I2V category, it's dominating even harder with a 1392 score.
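
To sanity-check that win-rate figure, here is a minimal sketch of the textbook Elo expected-score formula. It assumes the arena uses the standard logistic curve with a 400-point scale and ignores ties; the function name is mine, not anything from Artificial Analysis.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model.

    Assumption: 400-point logistic scale, ties ignored, so the
    expected score can be read as a head-to-head win probability.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings quoted in this post (T2V, without audio).
happyhorse_t2v = 1333
seedance_t2v = 1273

win_probability = elo_expected_score(happyhorse_t2v, seedance_t2v)
print(f"Implied HappyHorse win rate vs. Seedance 2.0: {win_probability:.1%}")
```

Plug in a 14-point or 1-point gap and the implied win rate drops to barely above a coin flip, which is why the with-audio rankings are much closer than the order suggests.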

Interestingly, once you ask for integrated audio, the rankings flip. Seedance 2.0 claws its way back to #1, edging out HappyHorse by 14 points in T2V and a razor-thin 1-point margin in I2V.

One major caveat before we crown a new king: Elo scores are incredibly volatile when a model is fresh. Seedance 2.0 has a massive cushion of over 7,500 sample votes. HappyHorse's sample size is still baking. These numbers will swing in the coming weeks.
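
To get a feel for how much a small sample can move things, here is a rough sketch that turns an observed win rate into an approximate 95% range of Elo gaps. It is a simplification: it treats every vote as an independent coin flip against a single opponent, which is not how the arena actually computes ratings. HappyHorse's real vote count isn't stated here, so the 200 and 1,000 figures are hypothetical; 7,500 is the Seedance 2.0 count quoted above.

```python
import math


def elo_gap_from_win_rate(win_rate: float) -> float:
    """Invert the standard Elo curve: win rate -> rating gap in points."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


def win_rate_interval(win_rate: float, n_votes: int, z: float = 1.96):
    """Normal-approximation confidence interval for a win rate.

    Assumption: votes are independent and all against one opponent.
    """
    std_err = math.sqrt(win_rate * (1.0 - win_rate) / n_votes)
    low = max(win_rate - z * std_err, 1e-9)
    high = min(win_rate + z * std_err, 1.0 - 1e-9)
    return low, high


observed_win_rate = 0.585  # roughly what a 60-point gap implies

for n_votes in (200, 1000, 7500):  # 200 and 1000 are hypothetical
    low, high = win_rate_interval(observed_win_rate, n_votes)
    print(
        f"{n_votes:>5} votes: Elo gap between "
        f"{elo_gap_from_win_rate(low):+.0f} and {elo_gap_from_win_rate(high):+.0f}"
    )
```

The point of the exercise: with only a few hundred votes, the plausible range is wide enough to include a much smaller lead, and it tightens considerably as votes accumulate.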

Under the Hood (Allegedly)

So what exactly is powering this thing? If you trust their marketing domains (happyhorses.io and happy-horse.art), it's an absolute beast.

They claim it's a unified, 40-layer single self-attention Transformer. According to the specs, the middle 32 layers share parameters across text, video, and audio modalities with zero cross-attention, while the first and last four layers handle modality-specific projections. One of the pages also casually drops a 15-billion parameter count. They're boasting native multilingual support for joint audio-video generation (handling Chinese, English, Japanese, Korean, German, French, and supposedly Cantonese) alongside crazy fast inference times, such as generating a 256p clip in two seconds or a 1080p clip in 38 seconds on an H100 GPU.
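
For readers who want to picture the claimed layout, here is a tiny sketch that just labels the 40 layers as described on the marketing pages. It is not based on any released code (there isn't any); the function and role names are made up for illustration, and it only checks that the stated numbers add up.

```python
from collections import Counter


def claimed_layer_roles(num_layers: int = 40, edge_layers: int = 4) -> list[str]:
    """Label each layer per the unverified description on the marketing pages.

    Hypothetical helper: the first and last `edge_layers` layers are
    modality-specific, and everything in between is shared.
    """
    roles = []
    for index in range(num_layers):
        if index < edge_layers:
            roles.append("modality-specific (input side)")
        elif index >= num_layers - edge_layers:
            roles.append("modality-specific (output side)")
        else:
            roles.append("shared across text/video/audio")
    return roles


roles = claimed_layer_roles()
for role, count in Counter(roles).items():
    print(f"{count:>2} layers: {role}")
```

The arithmetic is at least internally consistent: 4 + 32 + 4 gives the advertised 40 layers. That says nothing about whether the model is actually built this way.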

It sounds incredible. It is also 100% unverified.

Because we don't have the weights, no third party has been able to inspect the architecture, check the VRAM usage, or test those wild multilingual lip-sync claims.

The Phantom Lab and the WAN 2.7 Theory

This brings us to the elephant in the room: who actually built HappyHorse?

Artificial Analysis officially lists the team as "pseudonymous." Naturally, tech Twitter has gone into overdrive trying to unmask them, with the consensus heavily leaning toward a Chinese AI lab.

The most persistent theory right now is that HappyHorse-1.0 is actually a stealth beta test for Alibaba's unreleased WAN 2.7. It makes sense on paper. The current WAN 2.6 is lagging behind at an Elo of 1189, and anonymous pre-launch drops are becoming a standard playbook in the Asian AI ecosystem. Just look at February: a mystery model called "Pony Alpha" suddenly appeared on OpenRouter, sparked a massive guessing game, and turned out to just be Z.ai stress-testing GLM-5.

But frankly, parallel timelines don't equal proof. Until we see leaked weights, API fingerprinting, or an insider breaking their NDA, the WAN 2.7 connection is purely fan fiction.

What This Means for Builders Today

Here's the reality check for anyone actually trying to ship a product right now: HappyHorse-1.0 is a ghost.

Their website prominently claims that "everything is open": base models, distilled models, super-res, the inference code, you name it. Yet as of April 8, every single Hugging Face and GitHub link on their site just says "coming soon." There is no documented pricing. There is no API endpoint. There is no SLA.

From a practical standpoint, the real leaderboard currently starts at position #3. If you need to integrate a model today:

SkyReels V4 ($7.20/min) is arguably your best bang-for-the-buck among accessible options.
PixVerse V6 ($5.40/min) remains the top budget pick.
Kling 3.0 Pro ($13.44/min) is sitting there if you absolutely need native 1080p out of the gate.
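
To translate those per-minute prices into something you can budget against, here is a quick back-of-the-envelope sketch. The prices are the ones listed above; the 10-second clip length and 500 clips per month are hypothetical, and it assumes billing scales linearly with clip duration, which may not match each provider's actual pricing rules.

```python
# Per-minute prices quoted in this post (USD).
PRICE_PER_MINUTE_USD = {
    "SkyReels V4": 7.20,
    "PixVerse V6": 5.40,
    "Kling 3.0 Pro": 13.44,
}


def clip_cost_usd(price_per_minute: float, clip_seconds: float) -> float:
    """Cost of one clip, assuming linear per-second billing."""
    return price_per_minute * clip_seconds / 60.0


def monthly_cost_usd(
    price_per_minute: float, clip_seconds: float, clips_per_month: int
) -> float:
    """Total monthly spend for a fixed clip length and volume."""
    return clip_cost_usd(price_per_minute, clip_seconds) * clips_per_month


clip_seconds = 10       # hypothetical clip length
clips_per_month = 500   # hypothetical volume

for model, price in sorted(PRICE_PER_MINUTE_USD.items(), key=lambda kv: kv[1]):
    per_clip = clip_cost_usd(price, clip_seconds)
    per_month = monthly_cost_usd(price, clip_seconds, clips_per_month)
    print(f"{model:<14} ${per_clip:.2f}/clip  ${per_month:,.2f}/month")
```

Swap in your own clip length and volume; the gap between the cheapest and most expensive option grows quickly once you're generating hundreds of clips.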

The Bottom Line

HappyHorse-1.0 proves that video generation quality just took another massive leap forward. The blind Elo votes are a real signal. But until somebody actually pushes a commit to a public repo, it's nothing more than a highly impressive flex.

We'll be updating this post the moment weights drop or API access opens. Until then, keep building with what you can actually ship.