CryptoFintech Digital Clone

AudienceLens

Beta

A virtual audience simulation platform that helps Web3/AI content creators preview how their audience will react to a video — before they hit publish.

Zhenhang Shang, Kani Chen
digital clone, audience simulation, LLM agents, Web3, content creation

22 Feature Dimensions
6 Content Sectors
4 Fidelity Metrics
3 Pipeline Phases

How It Works

1. Channel Ingest

Scrape a YouTube channel's comment history via the YouTube Data API and aggregate per-user activity into a corpus.

2. Cognitive Archetypes

Extract a 22-dimensional feature vector per user, cluster the vectors with HDBSCAN, then use an LLM to synthesize each cluster into a named cognitive archetype.

3. Reaction Simulation

For each engaged archetype, generate a comment using Chain-of-Thought (CoT) prompting grounded in few-shot examples, and score every script segment for sensitivity.

4. Fidelity Scoring

Compare simulated and real comments across 4 metrics (JSD, Jaccard, F1, Pearson) to produce a composite fidelity score in [0, 1].

Live Preview

https://audiencelens.ustlabproject.live


Tech Stack

FastAPI, React 19, DeepSeek LLM, HDBSCAN, PostgreSQL

Overview

Content creators face a fundamental problem: existing analytics tools are retrospective — they tell you what happened, not what will happen. Demographic-based audience models also miss the cognitive diversity that drives comment sections: two viewers of the same age and location may react in completely opposite ways to the same content.

AudienceLens builds a digital clone of a channel's audience. Instead of segmenting viewers by age or geography, it constructs a roster of cognitive archetypes directly from the channel's real comment behavior, then runs those archetypes as LLM-powered agents against a draft script, surfacing controversy, consensus, and dead spots before publishing.

Core Innovation: Cognitive Archetype Extraction (CAE)

CAE clusters viewers along four cognitive dimensions that demographic models entirely miss:

  • Attention patterns — which content layers a viewer is sensitive to (price action, technical analysis, regulation, on-chain data, macro, …)
  • Reasoning style — data-driven vs. narrative-driven vs. emotional intuition
  • Risk cognition — optimism bias, pessimism bias, hedge thinking, gambler mentality
  • Social orientation — opinion leader, follower, contrarian, observer

Each user is encoded as a 22-dimensional feature vector across topic distribution, sentiment patterns, linguistic style, social behavior, and stance stability. HDBSCAN auto-discovers the cluster structure, and an LLM synthesizes each cluster into a labeled archetype with a 3–4 sentence cognitive profile and few-shot example comments.

Simulation Pipeline

The simulation runs in three phases:

1. Attention Filter: Jaccard similarity between archetype attention topics and script topics determines which archetypes engage at all (`comment_and_reply`, `comment`, `watch_only`, `skip`).
2. Comment Generation: Each engaged archetype generates a JSON-structured comment via Chain-of-Thought prompting grounded in real few-shot examples from cluster representatives. All archetype calls run in parallel via `asyncio.gather`.
3. Sensitivity Analysis: The script is segmented and each `[segment × archetype]` cell is scored from −1.0 to 1.0, producing a heatmap that classifies every segment as consensus, controversial, or cold.
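
The three phases can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the project's implementation: the Jaccard cut-offs, the choice to generate comments only for the `comment_and_reply` and `comment` levels, the `cold_max` and `strong_min` thresholds, and the reading of negative scores as adverse reactions are all hypothetical. `generate_comment` is a stand-in for the real LLM call.

```python
import asyncio


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def engagement_level(overlap: float) -> str:
    """Phase 1: map topic overlap to an engagement level (cut-offs are assumed)."""
    if overlap >= 0.5:
        return "comment_and_reply"
    if overlap >= 0.25:
        return "comment"
    if overlap > 0.0:
        return "watch_only"
    return "skip"


async def generate_comment(archetype: str, script: str) -> dict:
    """Phase 2 stand-in: a real version would send a CoT + few-shot prompt to the LLM."""
    await asyncio.sleep(0)  # placeholder for the network round trip
    return {"archetype": archetype, "comment": f"placeholder reaction from {archetype}"}


async def simulate(archetypes: dict[str, set[str]], script_topics: set[str], script: str) -> list[dict]:
    """Run phases 1 and 2: filter archetypes, then generate comments concurrently."""
    engaged = [
        name
        for name, topics in archetypes.items()
        if engagement_level(jaccard(topics, script_topics)) in {"comment_and_reply", "comment"}
    ]
    # gather preserves the order of its arguments in the returned list
    return await asyncio.gather(*(generate_comment(name, script) for name in engaged))


def classify_segment(scores: list[float], cold_max: float = 0.2, strong_min: float = 0.4) -> str:
    """Phase 3: label one heatmap row (one score per archetype, each in [-1.0, 1.0])."""
    if all(abs(s) < cold_max for s in scores):
        return "cold"  # nobody reacts strongly
    if max(scores) >= strong_min and min(scores) <= -strong_min:
        return "controversial"  # strong reactions in both directions
    return "consensus"  # strong reactions in one direction only
```

A caller would run `asyncio.run(simulate(archetypes, script_topics, script))` once per draft script, then apply `classify_segment` to each row of the segment-by-archetype score grid.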

Fidelity Evaluation

A composite fidelity score quantifies how well the simulation matches real audience behavior across 4 independent metrics:

  • Distribution Alignment (DA) — `1 − JSD(P_sim, P_real)` over sentiment buckets
  • Topic Coverage (TC) — Jaccard similarity over discussed topics
  • Controversy Prediction (CP) — F1 over controversial-segment set membership
  • Engagement Calibration (EC) — Pearson correlation between predicted and actual likes

The composite score `0.30·DA + 0.30·TC + 0.20·CP + 0.20·EC` is bounded in `[0, 1]`, with weight redistribution if any metric is unavailable.

Use Cases

  • Web3 / Crypto KOLs previewing audience reactions before publishing high-stakes market commentary
  • AI & Tech creators stress-testing technical scripts against real follower-base archetypes
  • Content strategists identifying controversial segments and rewriting them before they go live
  • Creator-economy researchers studying the cognitive structure of online audiences beyond demographic proxies

Tech Stack

Backend: FastAPI + async SQLAlchemy + PostgreSQL + Redis + DeepSeek API (OpenAI-compatible). NLP: `sentence-transformers`, `HDBSCAN`, `scikit-learn`. Frontend: React 19 + TypeScript + Vite 7 + TailwindCSS 4 + Recharts + Framer Motion, with full bilingual (ZH/EN) support.