AudienceLens

Overview

Content creators face a fundamental problem: existing analytics tools are retrospective. They tell you what happened, not what will happen. Demographic-based audience models also miss the cognitive diversity that drives comment sections: two viewers of the same age and location may react in completely opposite ways to the same content.

AudienceLens builds a digital clone of a channel's audience. Instead of segmenting viewers by age or geography, it constructs a roster of cognitive archetypes directly from the channel's real comment behavior, then runs those archetypes as LLM-powered agents against a draft script to surface controversy, consensus, and dead spots before publishing.

Core Innovation: Cognitive Archetype Extraction (CAE)

CAE clusters viewers along four cognitive dimensions that demographic models entirely miss:

✓ Attention patterns — which content layers a viewer is sensitive to (price action, technical analysis, regulation, on-chain data, macro, …)
✓ Reasoning style — data-driven vs. narrative-driven vs. emotional intuition
✓ Risk cognition — optimism bias, pessimism bias, hedge thinking, gambler mentality
✓ Social orientation — opinion leader, follower, contrarian, observer

Each user is encoded as a 22-dimensional feature vector across topic distribution, sentiment patterns, linguistic style, social behavior, and stance stability. HDBSCAN auto-discovers the cluster structure, and an LLM synthesizes each cluster into a labeled archetype with a 3–4 sentence cognitive profile and few-shot example comments.
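
One step between clustering and the LLM synthesis is choosing which users' comments serve as few-shot examples for each cluster. The sketch below is a minimal illustration, not the project's actual code: it assumes cluster labels are already available (HDBSCAN marks noise points with `-1`) and picks the users nearest to each cluster centroid. The function name `cluster_representatives`, the parameter `k`, and the nearest-to-centroid rule are assumptions made for illustration.

```python
from collections import defaultdict
from math import dist

NOISE = -1  # HDBSCAN assigns label -1 to points that belong to no cluster


def cluster_representatives(vectors, labels, k=3):
    """Return {label: [user indices]} with the k users closest to each centroid.

    Hypothetical helper: the selection rule is an assumption, not the
    documented AudienceLens method.
    """
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        if label != NOISE:
            members[label].append(idx)

    reps = {}
    for label, idxs in members.items():
        dims = len(vectors[idxs[0]])
        # Centroid = per-dimension mean of the cluster's feature vectors.
        centroid = [sum(vectors[i][d] for i in idxs) / len(idxs) for d in range(dims)]
        ranked = sorted(idxs, key=lambda i: dist(vectors[i], centroid))
        reps[label] = ranked[:k]
    return reps


# Toy data: 2-dimensional vectors stand in for the 22-dimensional ones.
vectors = [[0.0, 0.0], [0.2, 0.0], [1.0, 0.0], [5.0, 5.0], [5.0, 5.2], [9.0, 9.0]]
labels = [0, 0, 0, 1, 1, -1]
reps = cluster_representatives(vectors, labels, k=2)
```

The comments written by the selected users could then be passed to the LLM alongside the cluster's aggregate features when it drafts the archetype profile.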

Simulation Pipeline

The simulation runs in three phases:

1. Attention Filter — Jaccard similarity between archetype attention topics and script topics determines which archetypes engage at all (`comment_and_reply`, `comment`, `watch_only`, `skip`).
2. Comment Generation — Each engaged archetype generates a JSON-structured comment via Chain-of-Thought prompting grounded in real few-shot examples from cluster representatives. All archetype calls run in parallel via `asyncio.gather`.
3. Sensitivity Analysis — The script is segmented and each `[segment × archetype]` cell is scored from −1.0 to 1.0, producing a heatmap that classifies every segment as consensus, controversial, or cold.
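
The three phases can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the shipped implementation: the Jaccard cut-offs in `THRESHOLDS`, the `agree` and `spread` parameters of `classify_segment`, and all function names are hypothetical, and `generate_comment` is a stub standing in for the real LLM call.

```python
import asyncio

# Hypothetical cut-offs: the four engagement levels are named above, the thresholds are not.
THRESHOLDS = [(0.5, "comment_and_reply"), (0.3, "comment"), (0.1, "watch_only")]


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def engagement_level(archetype_topics, script_topics):
    """Phase 1: map topic overlap to an engagement level."""
    score = jaccard(archetype_topics, script_topics)
    for cutoff, level in THRESHOLDS:
        if score >= cutoff:
            return level
    return "skip"


async def generate_comment(name, script):
    """Phase 2 stub: a real version would await the chat-completion API."""
    await asyncio.sleep(0)
    return {"archetype": name, "comment": f"[{name}] reaction to: {script[:20]}"}


async def simulate(archetypes, script, script_topics):
    """Run every commenting archetype concurrently with asyncio.gather."""
    engaged = [
        name for name, topics in archetypes.items()
        if engagement_level(topics, script_topics) in ("comment_and_reply", "comment")
    ]
    comments = await asyncio.gather(*(generate_comment(n, script) for n in engaged))
    return list(comments)


def classify_segment(scores, agree=0.3, spread=0.8):
    """Phase 3: label one heatmap row from per-archetype scores in [-1.0, 1.0]."""
    if max(scores) - min(scores) >= spread:
        return "controversial"  # archetypes pull in opposite directions
    mean = sum(scores) / len(scores)
    return "consensus" if abs(mean) >= agree else "cold"


archetypes = {
    "chart_reader": ["price action", "technical analysis"],
    "policy_watcher": ["regulation", "macro"],
    "onchain_analyst": ["on-chain data"],
}
script_topics = ["price action", "technical analysis", "macro"]
comments = asyncio.run(simulate(archetypes, "BTC broke resistance today", script_topics))
```

With these toy inputs only `chart_reader` clears the commenting threshold; `policy_watcher` lands in `watch_only` and `onchain_analyst` in `skip`, so a single stub comment is returned.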

Fidelity Evaluation

A composite fidelity score quantifies how well the simulation matches real audience behavior across 4 independent metrics:

✓ Distribution Alignment (DA) — `1 − JSD(P_sim, P_real)` over sentiment buckets
✓ Topic Coverage (TC) — Jaccard similarity over discussed topics
✓ Controversy Prediction (CP) — F1 over controversial-segment set membership
✓ Engagement Calibration (EC) — Pearson correlation between predicted and actual likes
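
The first three metrics reduce to short formulas, sketched here in plain Python. The sketch assumes a base-2 logarithm for the Jensen-Shannon divergence (which keeps it in `[0, 1]`) and sentiment distributions that are already normalized over the same buckets. The function names are hypothetical, and EC is left out because it is a standard Pearson correlation.

```python
from math import log2


def jsd(p, q):
    """Jensen-Shannon divergence with log base 2, so the result lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0.
        return sum(ai * log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def distribution_alignment(p_sim, p_real):
    """DA = 1 - JSD(P_sim, P_real) over sentiment buckets."""
    return 1.0 - jsd(p_sim, p_real)


def topic_coverage(sim_topics, real_topics):
    """TC = Jaccard similarity between simulated and real topic sets."""
    sim, real = set(sim_topics), set(real_topics)
    union = sim | real
    return len(sim & real) / len(union) if union else 0.0


def controversy_f1(predicted, actual):
    """CP = F1 over the sets of segments flagged as controversial."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(actual)
    return 2 * precision * recall / (precision + recall)
```

For example, identical sentiment distributions give DA = 1, while distributions with no overlapping mass give DA = 0.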

The composite score `0.30·DA + 0.30·TC + 0.20·CP + 0.20·EC` is bounded in `[0, 1]`, with weight redistribution if any metric is unavailable.
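
A minimal sketch of the composite, assuming that "weight redistribution" means renormalizing the remaining weights proportionally when a metric is missing (marked here with `None`). It also assumes every input already lies in `[0, 1]`; a raw Pearson correlation can be negative, so EC would need clipping or rescaling first, which is not specified above. The name `composite_fidelity` is hypothetical.

```python
WEIGHTS = {"DA": 0.30, "TC": 0.30, "CP": 0.20, "EC": 0.20}


def composite_fidelity(metrics):
    """Weighted mean over the available metrics; None marks an unavailable one.

    Assumption: missing weight is redistributed proportionally, i.e. the
    remaining weights are divided by their sum.
    """
    available = {k: v for k, v in metrics.items() if v is not None}
    if not available:
        raise ValueError("no metrics available")
    total_weight = sum(WEIGHTS[k] for k in available)
    return sum(WEIGHTS[k] * v for k, v in available.items()) / total_weight


full = composite_fidelity({"DA": 0.8, "TC": 0.6, "CP": 0.5, "EC": 0.7})
no_ec = composite_fidelity({"DA": 0.8, "TC": 0.6, "CP": 0.5, "EC": None})
```

With all four metrics present the divisor is 1.0 and the formula matches the weighted sum above; without EC, the remaining weights 0.30, 0.30, and 0.20 are scaled by 1/0.80.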

Use Cases

✓ Web3 / Crypto KOLs previewing audience reactions before publishing high-stakes market commentary
✓ AI & Tech creators stress-testing technical scripts against real follower-base archetypes
✓ Content strategists identifying controversial segments and rewriting them before they go live
✓ Creator-economy researchers studying the cognitive structure of online audiences beyond demographic proxies

Tech Stack

Backend: FastAPI + async SQLAlchemy + PostgreSQL + Redis + DeepSeek API (OpenAI-compatible). NLP: `sentence-transformers`, `HDBSCAN`, `scikit-learn`. Frontend: React 19 + TypeScript + Vite 7 + TailwindCSS 4 + Recharts + Framer Motion, with full bilingual (ZH/EN) support.
