Can AI Clones Predict How DAOs Vote? Digital Clones for On-Chain Governance
In February, an Aave temperature check titled "Aave Will Win Framework" went to a vote and closed at 52.58% in favor, 42% against. We pointed our system at the same proposal — without letting it see the real result — and it produced 58% in favor, 42% against. Same winner, almost the same shape. We did it by spinning up AI "clones" of past Aave voters and having each one read the proposal individually.
This post is about how those clones work, what they get right, and — just as importantly — what they get wrong. You can see the live system at voteprediction.xyz.
What's a DAO, and why predict its votes?
A DAO (Decentralized Autonomous Organization) is, in plain terms, a company-like entity governed by token holders instead of a board. Imagine a shareholder vote, except the shareholders are anonymous wallets scattered across the world, the votes are public on a blockchain, and most holders never bother to show up. The protocols we study — Aave, Uniswap, and Curve — manage billions of dollars between them. When their token holders vote, real money moves: collateral types change, fees get adjusted, treasuries get spent.
Predicting these votes matters for several reasons:
- Proposers want to know whether a proposal will pass before they invest weeks of political capital writing and defending it.
- Delegates — voters who hold others' voting power — want early signal on where their community is leaning.
- Researchers want to study whether DAO governance actually works, or whether a few whales decide everything.
Most DAO proposals pass quietly with lopsided support — but the ones that don't are usually the ones that matter. A small but consequential slice of votes are decided by tight margins, and those are exactly the cases where having an early read on community sentiment is most valuable.
What is a "digital clone"?
A digital clone is a general concept in behavior modeling. The idea is to take whatever historical traces a person has left behind — comments, survey answers, purchase records, forum posts, past votes, anything that reflects how they think and decide — and use that data to build a computational stand-in that simulates how that specific person would behave in a new situation. It's not a chatbot of them, and it's not a deepfake. It's a model of their decision-making, grounded in evidence of how they've actually behaved before.
In our case, the historical traces are on-chain voting records, and the new situation is a governance proposal. So a digital clone here is an AI stand-in that reads a new proposal the way one specific voter would, based on how they voted in the past: "given this voter's history, how would they likely react to this proposal?"
It helps to contrast with the obvious alternatives:
- Naive history: "this voter said yes 80% of the time, so predict yes." Ignores what the proposal is actually about.
- Proposal-only LLM: have a language model read the proposal and guess the outcome. Ignores who's actually voting.
- Digital clone: have a language model read the proposal while conditioned on a specific voter's history. Then aggregate across all voters, weighted by their actual voting power.
The third option is the bet of this project: that a voter's recent behavior pattern is a useful signal, but it only becomes a prediction when combined with the actual content of a new proposal. The live system at voteprediction.xyz runs on exactly this loop — new governance proposals come in, clones read them, and predictions appear before the vote closes.
The headline result
To stress-test the system we ran a separate, dedicated experiment focused on close calls: 19 finished proposals where the actual vote margin between the top two choices was under 10%. These are the cases where prediction is hardest and matters most. (The proposals you'll see live on voteprediction.xyz are a different, more recent set — the close-calls evaluation is an offline benchmark we ran specifically to measure accuracy on hard cases, while the live dashboard tracks current and upcoming governance votes.)
On these close calls, the digital clones picked the winner significantly more often than the proposal-only baseline, which managed only 36.8% accuracy; the gap is statistically significant at p = 0.035 (one-sided McNemar's test). On easier proposals — landslides — both methods do well; the gap appears precisely where it should: on the contested votes.
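For readers who want to check the statistics: the one-sided exact McNemar test only looks at the discordant proposals — ones where exactly one of the two methods picked the winner — and asks how surprising the split is if each were a fair coin flip. A minimal sketch, with hypothetical counts rather than our actual data:

```python
from math import comb

def mcnemar_one_sided(b: int, c: int) -> float:
    """Exact one-sided McNemar test on discordant pairs.

    b = proposals the clones got right and the baseline got wrong,
    c = proposals the baseline got right and the clones got wrong.
    Under H0 each discordant proposal is a 50/50 coin flip, so the
    p-value is the binomial tail P(X >= b) with n = b + c, p = 0.5.
    """
    n = b + c
    if n == 0:
        return 1.0
    return sum(comb(n, k) for k in range(b, n + 1)) / 2 ** n

# Hypothetical illustration (not our experiment's counts): clones right /
# baseline wrong on 8 proposals, the reverse on 2.
print(round(mcnemar_one_sided(8, 2), 4))  # prints 0.0547
```

With only 19 proposals in the close-call set, the exact binomial form is the right choice; the chi-squared approximation to McNemar is unreliable at these sample sizes.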
How the pipeline actually works
At a high level, the system has four stages: collecting on-chain data, building a profile-backed clone for each voter, asking each clone to vote on the new proposal, and aggregating the clones' answers into a predicted outcome.
Let's walk through it using one real proposal as a thread: the "Aave Will Win Framework" temperature check from the opening.
1. Data collection
We pull proposals, votes, and delegation data from Snapshot's GraphQL API for the three DAOs we cover, with incremental sync and idempotent deduplication so we don't double-count. For each vote we record the address, the choice, the voting power at the proposal's snapshot block, and the timestamp.
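The shape of that sync loop can be sketched as follows. The endpoint and the `votes` query fields (`voter`, `choice`, `vp`, `created`) follow Snapshot's public hub API, but the pagination is simplified and the function names are ours; treat this as a sketch, not our production code. The deduplication step is what makes re-running the sync idempotent:

```python
import json
import urllib.request

SNAPSHOT_HUB = "https://hub.snapshot.org/graphql"  # Snapshot's public hub

VOTES_QUERY = """
query ($space: String!, $cursor: Int!) {
  votes(first: 1000,
        where: {space: $space, created_gt: $cursor},
        orderBy: "created", orderDirection: asc) {
    id voter choice vp created proposal { id }
  }
}
"""

def fetch_vote_page(space: str, cursor: int) -> list[dict]:
    """One page of votes created after `cursor` — incremental sync means
    we only ever ask for what we haven't seen yet."""
    body = json.dumps({"query": VOTES_QUERY,
                       "variables": {"space": space, "cursor": cursor}})
    req = urllib.request.Request(SNAPSHOT_HUB, data=body.encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]["votes"]

def dedupe(votes: list[dict]) -> list[dict]:
    """Idempotent dedup: keep the latest vote per (voter, proposal) pair,
    so overlapping sync windows never double-count."""
    latest: dict[tuple, dict] = {}
    for v in sorted(votes, key=lambda v: v["created"]):
        latest[(v["voter"], v["proposal"]["id"])] = v  # later vote wins
    return list(latest.values())
```

Keeping the latest vote per (voter, proposal) pair also handles Snapshot's vote-replacement semantics, where a voter can re-cast and only the final ballot counts.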
Across the three DAOs we currently track 1,322 proposals and roughly 3.35 million individual votes from tens of thousands of distinct addresses. Aave dominates the dataset both in proposal count and in active voter base.
This gives us, for any given voter, a timestamped sequence of past decisions — the raw material for a clone.
2. Building a clone
For each voter we compute a compact profile from their voting history — by far the most important input the clone receives. The profile captures participation rate, how often they vote for / against / abstain, average voting power, and a consistency score, but the raw history of past votes is the signal that does most of the work. We tier voters by data quality — "high" for ≥20 historical votes, "medium" for ≥5, anything below that we treat as cold-start (more on that in a moment).
Profile snapshots respect time. Whenever we predict a proposal, the clone is built using only votes that happened before the proposal's snapshot block — actually, 24 hours before, to be safe. It would be very easy to accidentally let a voter's future votes leak into the clone that's "predicting" them, and the resulting numbers would look great and mean nothing. We invested real engineering in making sure that doesn't happen.
A tempting wrong approach is to build profiles statically — one frozen snapshot per voter, computed once and reused. That doesn't work in practice: voters' behavior drifts week to week, and a profile from a month ago is already a worse predictor than one from yesterday. Instead we use dynamic profiles: every day, each voter's profile is rebuilt from the prior day's data, with caching and incremental updates so we're not recomputing the whole history every run. We also pin the voter cohort across releases so that comparisons between simulation runs aren't polluted by membership churn — the clones change because their behavior changed, not because we silently swapped the population.
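Putting the profile, the tier thresholds, and the 24-hour temporal cutoff together, the core of profile construction looks roughly like this. Field names and the history record shape (`choice`, `vp`, `ts` as a Unix timestamp) are illustrative assumptions, not our exact schema:

```python
from dataclasses import dataclass

CUTOFF_SECONDS = 24 * 3600  # only votes >= 24h before the snapshot count

@dataclass
class Profile:
    n_votes: int
    for_rate: float
    against_rate: float
    abstain_rate: float
    avg_vp: float
    tier: str  # "high" | "medium" | "cold_start"

def build_profile(history: list[dict], snapshot_ts: int) -> Profile:
    """History entries: {"choice": "for"|"against"|"abstain", "vp": float, "ts": int}."""
    cutoff = snapshot_ts - CUTOFF_SECONDS
    past = [v for v in history if v["ts"] < cutoff]  # no temporal leakage
    n = len(past)
    if n >= 20:
        tier = "high"
    elif n >= 5:
        tier = "medium"
    else:
        tier = "cold_start"   # not enough history for an honest clone
    def rate(choice: str) -> float:
        return sum(v["choice"] == choice for v in past) / n if n else 0.0
    avg_vp = sum(v["vp"] for v in past) / n if n else 0.0
    return Profile(n, rate("for"), rate("against"), rate("abstain"), avg_vp, tier)
```

Note that the cutoff is applied inside profile construction rather than upstream in a query: any code path that builds a clone goes through this filter, which is the cheapest insurance against leakage bugs.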
3. Asking the clone to vote
This is the step that turns history into prediction. For each voter, we use their past voting record as the raw material for a digital clone — a behavioral model of how that specific person tends to engage with governance proposals: which kinds they show up for, which kinds they push back on, how willing they are to break with the majority. We then put a new proposal in front of that clone and let it decide how the voter would react.
The clone produces three things for every proposal it sees: whether the voter would participate at all, which choice they would pick if they did, and a short natural-language rationale tying the decision back to the voter's history. That rationale is the part that matters for trust — it lets us go back later and understand why a clone landed where it did, instead of treating the prediction as an opaque score.
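The language-model call itself is elided here, but the part worth showing is what happens to its reply. A hedged sketch, with a hypothetical JSON schema standing in for our actual prompt contract — the key property is that any malformed or out-of-range reply degrades to non-participation instead of a fabricated vote:

```python
import json

def parse_clone_response(raw: str, valid_choices: list[str]) -> dict:
    """Validate a clone's structured reply. On any failure, fall back to
    will_vote=False — absence is a safer default than confident invention.
    Note "abstain" is a real choice here: an abstaining voter shows up."""
    fallback = {"will_vote": False, "choice": None,
                "rationale": None, "confidence": 0.0}
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return fallback                 # model emitted non-JSON
    if not isinstance(data, dict) or not data.get("will_vote"):
        return fallback                 # clone predicts no participation
    choice = data.get("choice")
    if choice not in valid_choices:
        return fallback                 # hallucinated option -> don't count it
    return {"will_vote": True,
            "choice": choice,
            "rationale": str(data.get("rationale", "")),
            "confidence": float(data.get("confidence", 0.5))}
```

Because "abstain" is passed in as one of the valid choices, the won't-vote/abstain distinction discussed later survives parsing: the two outcomes travel through different fields and never collapse into each other.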
4. Aggregating the clones
Once every clone has voted, we aggregate: each clone's predicted choice contributes proportionally to that voter's actual on-chain voting power, exactly the way Snapshot would tally a real vote. We also produce an "equal weight" version where every address counts the same, which is useful for spotting whale-vs-crowd splits.
The output is a predicted distribution across the proposal's choices, plus an aggregate confidence number, plus the full set of individual clone responses for inspection.
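The two tallies can be sketched in a few lines. This is a simplified stand-in for our aggregation step (record shapes are illustrative): participants are weighted by voting power in one tally and counted once each in the other, while predicted non-participants are excluded entirely and explicit abstains are counted as a choice:

```python
from collections import defaultdict

def aggregate(predictions: list[dict]) -> tuple[dict, dict]:
    """Tally clone predictions two ways: weighted by on-chain voting power
    (the way Snapshot tallies a real vote) and one-address-one-vote.
    will_vote=False means the voter stays home; "abstain" is a cast ballot."""
    by_power: dict = defaultdict(float)
    by_count: dict = defaultdict(float)
    for p in predictions:
        if not p["will_vote"]:
            continue                      # no ballot, no weight
        by_power[p["choice"]] += p["vp"]
        by_count[p["choice"]] += 1
    def normalize(tally: dict) -> dict:
        total = sum(tally.values())
        return {c: v / total for c, v in tally.items()} if total else {}
    return normalize(by_power), normalize(by_count)
```

Comparing the two outputs is exactly the whale-vs-crowd check: when the power-weighted and equal-weighted distributions disagree on the winner, a small number of large holders are carrying the result.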
Design choices
A handful of decisions shaped this system more than anything else. These are the ones worth knowing about if you're building something similar.
Cold-start voters
Some voters have almost no history, or none at all. There is no honest way for a clone to predict them, but it's tempting to fill in a guess anyway.
We chose the boring answer: voters with fewer than 5 historical votes don't get a clone, and when the model fails to produce a valid response, we default `will_vote` to false rather than fabricating a choice. Absence is a safer default than confident invention. It means our predictions sometimes underestimate turnout, but they don't make up votes that aren't there.
Temporal leakage
Already mentioned, but worth emphasizing: every clone is built from a snapshot of voter history frozen 24 hours before the proposal goes live. If you don't do this, your evaluation accuracy will be misleadingly high, because the clone will effectively have seen the answer.
"Won't vote" versus "abstain"
These look similar and mean very different things. "Won't vote" means the voter doesn't show up at all. "Abstain" means they show up and explicitly cast an abstain ballot, which counts toward quorum and signals presence.
Our schema separates them into two fields, the prompt explicitly distinguishes them, and the aggregation pipeline handles them on different paths. It sounds pedantic until you realize that conflating them inflates participation numbers and corrupts every downstream metric.
Class imbalance in voting history
There is a structural skew in DAO governance data: most proposals pass, and most votes that get cast are votes for something. "Against" and "abstain" votes are comparatively rare. If you train or condition a model naively on this distribution, the clone will learn that the safe bet is always "yes" — and it will be right often enough on easy proposals that the imbalance hides until you hit a contested vote, where it matters most.
We address this in two places. When constructing the history context for a clone, we deliberately surface examples of the voter's opposing and abstain behavior alongside their typical "for" votes, so the model sees that this voter is capable of disagreeing. And when evaluating the system, we report accuracy on the close-call subset specifically, where a "predict for on everything" strategy collapses to coin-flip performance. The 36.8% baseline number from the headline result is, in part, a measure of how much that naive bias costs you.
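The history-context step can be sketched as a stratified sampler. This is an illustrative version (field names and the half-and-half split are assumptions, not our exact ratios): it over-samples the voter's rarer "against"/"abstain" votes relative to their base rate, then fills the rest of the budget with recent "for" votes:

```python
def sample_history(history: list[dict], k: int = 12) -> list[dict]:
    """Pick up to k history examples for the clone's prompt context,
    deliberately surfacing the voter's minority ("against"/"abstain")
    behavior so the model sees this voter is capable of disagreeing."""
    minority = [v for v in history if v["choice"] in ("against", "abstain")]
    majority = [v for v in history if v["choice"] == "for"]
    # Newest first within each group: recent behavior is the best signal.
    minority.sort(key=lambda v: v["ts"], reverse=True)
    majority.sort(key=lambda v: v["ts"], reverse=True)
    take_minority = minority[: k // 2]          # reserve up to half the budget
    take_majority = majority[: k - len(take_minority)]
    return sorted(take_minority + take_majority,
                  key=lambda v: v["ts"], reverse=True)
```

If the voter has fewer minority votes than the reserved slots, the remainder goes back to "for" examples, so the context never pads with votes that don't exist.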
Overconfident clones
Language models love to say "100% confidence" even when they're guessing. We push back in two ways: clones for thin-history voters get their confidence dampened, and the user-facing UI carries an explicit tooltip explaining that confidence drops when historical signal is weak and that the model cannot predict newly participating voters. Confidence is presented as a property of the prediction, not a guarantee about reality.
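The dampening itself is simple; a minimal sketch, assuming a full-credit threshold of 20 historical votes (the same cutoff as the "high" tier — the exact curve we use may differ):

```python
def damp_confidence(raw_conf: float, n_votes: int,
                    full_history: int = 20) -> float:
    """Shrink a clone's self-reported confidence toward 0.5 (a shrug)
    in proportion to how thin the voter's history is. A voter with
    >= full_history past votes keeps the raw value; zero history
    always yields exactly 0.5, never a confident guess."""
    clamped = min(max(raw_conf, 0.0), 1.0)
    weight = min(1.0, n_votes / full_history)
    return 0.5 + (clamped - 0.5) * weight
```

The important design property is the fixed point: no amount of model bravado can push a thin-history prediction past what the data supports, because the shrinkage is applied after the model answers, outside its control.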
An honest miss
A few weeks later, an Aave proposal titled "[ARFC ADDENDUM] Mandatory Disclosures and Conflict-of-Interest Voting" went up. Our clones predicted it would pass comfortably — 86.76% in favor versus 13.24% against, at moderately high aggregate confidence. The actual vote went the other way: 53.13% against, 46.57% for. The proposal failed, and we were not just slightly wrong about the magnitude — we were wrong about the winner.
Why? Because the clones were doing exactly what they're designed to do: extrapolating from how those voters had historically behaved on governance-process proposals. Mandating disclosures and conflict-of-interest rules sounds, on its face, like the kind of motherhood-and-apple-pie reform that delegates routinely wave through. The clones picked up on that pattern and projected it forward. What they couldn't see was the off-chain discussion in the days leading up to the vote — concerns raised in forums and group chats about the specific definitions, enforcement mechanisms, and second-order effects on legitimate voters. By the time ballots were cast, sentiment had hardened against the proposal in a way that no historical signal could have anticipated.
This is the limitation worth being explicit about: digital clones model the past, not opinion shifts. They are good at telling you what voters' established patterns predict. They cannot tell you when a community is about to change its mind.
We think this is a feature, not a bug, as long as we're honest about it. A prediction system that quietly hedges to look smart on every case is much less useful than one that tells you "based on history, here's what we expect — and here's where history might be wrong."
What the platform actually shows
The user-facing platform is deliberately small. Two things matter:
- A dashboard of upcoming, running, and finished proposals across the DAOs we cover, each with a predicted outcome and a confidence number.
- A detail page for any single proposal that shows three distributions side by side: what the digital clones predicted, what a proposal-only baseline predicted, and — once the vote closes — what actually happened. Sample clone reasoning is shown so you can read why individual voters were predicted to vote a certain way.
You can explore all of this directly at voteprediction.xyz.
What this is not
Because this project sits at an awkward intersection of AI and on-chain governance, it's worth stating the boundaries directly:
- It's not impersonation. Clones model voting patterns, not personal identity, and they don't speak as anyone.
- It's not vote buying or manipulation. The predictions are about understanding sentiment, not influencing it.
- It's not financial advice. Knowing a vote will likely pass tells you nothing about whether the underlying decision is good.
- It's not a crystal ball. It's an extrapolation from observed behavior, and we try to be transparent about exactly when that extrapolation is on solid ground and when it isn't.
Closing
Back to the "Aave Will Win Framework" proposal. The clones predicted 58% in favor, 42% against; the actual vote closed at 52.58% in favor, 42% against.
Same winner, same general shape, with the clones running a few points hot on enthusiasm — exactly the kind of prediction that's actually useful: directionally right and within striking distance of the magnitude. And, just as importantly, contrast that with the "Mandatory Disclosures" miss earlier in the post: the clones told us it would pass overwhelmingly, and the community shifted out from under them. Both outcomes are honest. The system that produces the first kind of result has to be the same system that occasionally produces the second.
That's the honest pitch for digital clones in DAO governance. They will not always pick the winner. But they will tell you, well before voting opens, what the historical voting base would do with a given proposal, where the likely flashpoints are, and which voters carry the swing weight. For a domain where most proposals pass quietly and the contested ones decide everything, that turns out to be a useful thing to know — as long as you also remember when to distrust the answer.