Overview
Content creators face a fundamental problem: existing analytics tools are retrospective — they tell you what happened, not what will happen. Demographic-based audience models also miss the cognitive diversity that drives comment sections: two viewers of the same age and location may react in completely opposite ways to the same content.
AudienceLens builds a digital clone of a channel's audience. Instead of segmenting viewers by age or geography, it constructs a roster of cognitive archetypes directly from the channel's real comment behavior, then runs those archetypes as LLM-powered agents against a draft script, surfacing controversy, consensus, and dead spots before publishing.
Core Innovation: Cognitive Archetype Extraction (CAE)
CAE clusters viewers along four cognitive dimensions that demographic models entirely miss:
- Attention patterns — which content layers a viewer is sensitive to (price action, technical analysis, regulation, on-chain data, macro, …)
- Reasoning style — data-driven vs. narrative-driven vs. emotional intuition
- Risk cognition — optimism bias, pessimism bias, hedge thinking, gambler mentality
- Social orientation — opinion leader, follower, contrarian, observer
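The four dimensions above map naturally onto a small typed record. The sketch below is illustrative only: the names `ArchetypeProfile`, `ReasoningStyle`, `RiskCognition`, and `SocialOrientation` are hypothetical, the enum members simply mirror the categories listed above, and the example values are invented rather than taken from any real channel.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class ReasoningStyle(Enum):
    DATA_DRIVEN = "data-driven"
    NARRATIVE_DRIVEN = "narrative-driven"
    EMOTIONAL_INTUITION = "emotional intuition"


class RiskCognition(Enum):
    OPTIMISM_BIAS = "optimism bias"
    PESSIMISM_BIAS = "pessimism bias"
    HEDGE_THINKING = "hedge thinking"
    GAMBLER_MENTALITY = "gambler mentality"


class SocialOrientation(Enum):
    OPINION_LEADER = "opinion leader"
    FOLLOWER = "follower"
    CONTRARIAN = "contrarian"
    OBSERVER = "observer"


@dataclass(frozen=True)
class ArchetypeProfile:
    """Hypothetical record describing one archetype along the four dimensions."""
    label: str
    attention_topics: FrozenSet[str]       # content layers the archetype reacts to
    reasoning_style: ReasoningStyle
    risk_cognition: RiskCognition
    social_orientation: SocialOrientation


# Invented example, for illustration only.
skeptic = ArchetypeProfile(
    label="On-chain skeptic",
    attention_topics=frozenset({"on-chain data", "regulation"}),
    reasoning_style=ReasoningStyle.DATA_DRIVEN,
    risk_cognition=RiskCognition.PESSIMISM_BIAS,
    social_orientation=SocialOrientation.CONTRARIAN,
)
```

Keeping attention topics as a set makes the later overlap comparison with script topics a plain set operation.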
Each user is encoded as a 22-dimensional feature vector across topic distribution, sentiment patterns, linguistic style, social behavior, and stance stability. HDBSCAN auto-discovers the cluster structure, and an LLM synthesizes each cluster into a labeled archetype with a 3–4 sentence cognitive profile and few-shot example comments.
Simulation Pipeline
The simulation runs in three phases:
1. Attention Filter — Jaccard similarity between archetype attention topics and script topics determines which archetypes engage at all (`comment_and_reply`, `comment`, `watch_only`, `skip`).
2. Comment Generation — Each engaged archetype generates a JSON-structured comment via Chain-of-Thought prompting grounded in real few-shot examples from cluster representatives. All archetype calls run in parallel via `asyncio.gather`.
3. Sensitivity Analysis — The script is segmented and each `[segment × archetype]` cell is scored from −1.0 to 1.0, producing a heatmap that classifies every segment as consensus, controversial, or cold.
Fidelity Evaluation
A composite fidelity score quantifies how well the simulation matches real audience behavior across 4 independent metrics:
- Distribution Alignment (DA) — `1 − JSD(P_sim, P_real)` over sentiment buckets
- Topic Coverage (TC) — Jaccard similarity over discussed topics
- Controversy Prediction (CP) — F1 over controversial-segment set membership
- Engagement Calibration (EC) — Pearson correlation between predicted and actual likes
The composite score `0.30·DA + 0.30·TC + 0.20·CP + 0.20·EC` is bounded in `[0, 1]`, with weight redistribution if any metric is unavailable.
Use Cases
- Web3 / Crypto KOLs previewing audience reactions before publishing high-stakes market commentary
- AI & Tech creators stress-testing technical scripts against real follower-base archetypes
- Content strategists identifying controversial segments and rewriting them before they go live
- Creator-economy researchers studying the cognitive structure of online audiences beyond demographic proxies
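Flagging controversial segments before they go live depends on how each segment is labeled from the sensitivity heatmap, where every `[segment × archetype]` cell holds a score from −1.0 to 1.0. The labeling rule is not spelled out here, so the following is only a minimal sketch under assumed cutoffs: `COLD_THRESHOLD`, `SPLIT_THRESHOLD`, and `classify_segment` are hypothetical names, and the sample scores are invented.

```python
from statistics import mean
from typing import Dict, Sequence

# Assumed cutoffs, not taken from AudienceLens itself.
COLD_THRESHOLD = 0.2    # mean |score| below this: nobody reacts strongly
SPLIT_THRESHOLD = 0.3   # a score at or beyond ±this counts as a clear reaction


def classify_segment(scores: Sequence[float]) -> str:
    """Label one heatmap row (one segment, one score per archetype)."""
    if not scores:
        raise ValueError("need at least one archetype score")
    if any(not -1.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must lie in [-1.0, 1.0]")
    # Weak reactions across the board: the segment is a dead spot.
    if mean(abs(s) for s in scores) < COLD_THRESHOLD:
        return "cold"
    # Clear reactions in both directions: archetypes disagree.
    has_positive = any(s >= SPLIT_THRESHOLD for s in scores)
    has_negative = any(s <= -SPLIT_THRESHOLD for s in scores)
    if has_positive and has_negative:
        return "controversial"
    return "consensus"


# Invented scores: one row per script segment, one column per archetype.
heatmap: Dict[str, Sequence[float]] = {
    "intro": [0.05, -0.10, 0.00],
    "price target": [0.80, -0.70, 0.40],
    "risk disclaimer": [0.60, 0.50, 0.70],
}
labels = {segment: classify_segment(row) for segment, row in heatmap.items()}
print(labels)
```

With these assumed cutoffs, a segment that draws strong reactions in both directions is labeled controversial, which is the kind of passage a strategist would revisit first.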
Tech Stack
- Backend: FastAPI + async SQLAlchemy + PostgreSQL + Redis + DeepSeek API (OpenAI-compatible)
- NLP: `sentence-transformers`, `HDBSCAN`, `scikit-learn`
- Frontend: React 19 + TypeScript + Vite 7 + TailwindCSS 4 + Recharts + Framer Motion, with full bilingual (ZH/EN) support
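To show how the async backend might serve the first two simulation phases, the sketch below applies a Jaccard-based attention filter and then fans out one call per engaged archetype with `asyncio.gather`. Everything beyond `asyncio.gather` and the four engagement labels is an assumption: the cutoffs in `ENGAGEMENT_LEVELS`, the function names, and the stubbed `generate_comment`, which stands in for a real request to the LLM API.

```python
import asyncio
import json
from typing import Dict, List, Set

# Hypothetical cutoffs mapping Jaccard similarity to an engagement level.
ENGAGEMENT_LEVELS = [
    (0.50, "comment_and_reply"),
    (0.25, "comment"),
    (0.10, "watch_only"),
]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def engagement(archetype_topics: Set[str], script_topics: Set[str]) -> str:
    """Map topic overlap to one of the four engagement labels."""
    score = jaccard(archetype_topics, script_topics)
    for cutoff, level in ENGAGEMENT_LEVELS:
        if score >= cutoff:
            return level
    return "skip"


async def generate_comment(archetype: str, script: str) -> dict:
    """Stub for the LLM call; a real version would await an HTTP request."""
    await asyncio.sleep(0)  # yield control, as a network call would
    raw = json.dumps({"archetype": archetype, "comment": f"reaction to: {script[:30]}"})
    return json.loads(raw)  # comments are JSON-structured


async def simulate(
    archetypes: Dict[str, Set[str]], script: str, script_topics: Set[str]
) -> List[dict]:
    # Phase 1: keep only archetypes that would actually comment.
    engaged = [
        name
        for name, topics in archetypes.items()
        if engagement(topics, script_topics) in ("comment", "comment_and_reply")
    ]
    # Phase 2: run all engaged archetypes concurrently.
    return await asyncio.gather(*(generate_comment(name, script) for name in engaged))


# Invented archetypes and script topics, for illustration only.
archetypes = {
    "chart_reader": {"technical analysis", "price action"},
    "policy_watcher": {"regulation", "macro"},
}
comments = asyncio.run(
    simulate(
        archetypes,
        "BTC breaks resistance on heavy volume",
        {"price action", "technical analysis", "on-chain data"},
    )
)
print([c["archetype"] for c in comments])
```

Because `asyncio.gather` preserves input order, results line up with the list of engaged archetypes even though the calls overlap in time.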