The Data Flywheel — AI-Enabled Attention Training

Status: Vision + roadmap, approved direction (Ian, 2026-07-04)

Owners: Ian (product, priorities) · Harsha (signal validity, data pipeline) · Fable (implementation prep) · External neuroscience advisor (methodology validation)

Relationship to other docs: extends PRODUCT_PLAN.md §6 (the EEG differentiator ladder) into an execution roadmap; inherits all guardrails from handoff/methodology-guardrails.md.

1. Why this is the moat — stated precisely

Every competitor sells an invisible effect and asks users to trust a study. Our position is the opposite: measure it, per person, and improve it per person. That position is only defensible if we systematically learn from our own data — and we are structurally the only product collecting the complete learning triplet:

| Product | Stimulus (what audio, exactly, when) | Response (EEG during it) | Outcome (ratings, completion, retention) |
| --- | --- | --- | --- |
| Brain.fm | ✅ | ❌ no sensor | partial |
| Endel | ✅ | ❌ | partial |
| Muse | ❌ static content | ✅ | partial |
| Myndlift | ❌ clinician-set | ✅ | partial |
| Second Attention | ✅ (after PR below) | ✅ raw EEG archived | ✅ ratings + behavior |

Stimulus→response→outcome, per user, across time, is what learning systems eat. Nobody else has the triplet; each new user and session compounds it. That is the flywheel: better data → better personalization → better results → more usage → better data.

The honest-measurement brand is what makes this sellable: "tuned to your measured alpha frequency, and here's the data" is a claim no incumbent can copy without rebuilding their product around a sensor.

2. What we capture today — audit (2026-07-04, grounded in code)

| Triplet component | Today | Gap |
| --- | --- | --- |
| Response | ✅ Raw EEG archived per session (`msdf.tar.gz` in the `eeg-data` bucket: raw channels + `metadata.json` with device, timestamps, sample counts) | None material. This is the hard part and it's done |
| Outcome | ✅ Ratings (`post_session_data`: focus, audio experience, time-feel, notes) · completion/duration/abandonment on the session row | `meditation_rating` is a crude 3-rating average; store the three raw values distinctly (kit v2 rating screen already collects them) |
| Stimulus | ❌ Nothing. Only `recommended_session_id` (a static playlist pointer). No play/pause/seek timeline, no volume changes, no phase/entrainment parameter events, no background/foreground, no signal-quality timeline | This is the gap that matters. Sessions collected without it cannot be used for stimulus-response learning, ever. Retrofit is impossible |
| Alignment infra | ⚠️ The MSDF file format already has a timestamped event stream (offsets from recording start)… but it defines only 4 event types (blink, jaw clench, headband on/off) and appears to have no callers | The vehicle exists; it just carries nothing |

Conclusion: one small PR converts every future session into training data. That PR is the "first move" and is specified in §4.

3. The roadmap — five phases, each shippable alone

Phase 1 — Log the triplet (now; rides the Vayu Flow integration PRs)

Stimulus-timeline logging (§4), raw rating values, pre-session state, consent copy (§5). Zero user-visible change; converts the entire future dataset. Owner: Fable implements, Harsha reviews.

Phase 2 — Narrated insights (the LLM layer) (with V1 launch)

An LLM narrates honestly computed metrics into the personal "Session Insight" cards already designed in UI Kit v2 ("your alpha held longer in the back half; your last three Trees sessions outperformed Oxalis"). Strict rule: the model narrates measured facts; it never generates claims. Server-side (processing repo), cached per session. This delivers most of the perceived "AI-powered" value for the cost of one API call per session. Owner: Fable; copy boundary enforced by the claims validator.
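
One way to make the "narrates measured facts" rule mechanically checkable is sketched below. It is a minimal illustration, not the claims validator itself: `MeasuredFact`, `buildFactSheet`, and `citesOnlyMeasuredNumbers` are hypothetical names, and the number check is deliberately crude.

```typescript
// Hypothetical shape for a metric computed upstream by the processing pipeline (not by the LLM).
interface MeasuredFact {
  label: string;
  value: number;
  unit: string;
}

// Render the facts into the only context the model is allowed to narrate.
function buildFactSheet(facts: MeasuredFact[]): string {
  return facts.map((f) => `${f.label}: ${f.value} ${f.unit}`).join("\n");
}

// Crude post-check: reject a narration that cites any number not present in the fact sheet.
// Signs are ignored, so this assumes narrated metrics are non-negative.
function citesOnlyMeasuredNumbers(narration: string, facts: MeasuredFact[]): boolean {
  const cited: string[] = narration.match(/\d+(?:\.\d+)?/g) || [];
  return cited.every((text) => facts.some((f) => f.value === Number(text)));
}
```

A narration that fails the check would be regenerated or dropped before it is cached; vocabulary-level checks would still belong to the claims validator.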

Phase 3 — Personal calibration + per-user optimization (the scientific core)

The PRODUCT_PLAN "anti-quiz," now made concrete:

- Calibration session (~5 min): short audio probes at varied modulation depths / frequency offsets around the user's measured alpha peak; measure the brain's response (auditory steady-state response, ASSR, or an engagement index); set that user's starting parameters empirically.
- Continuous tuning: a contextual bandit / Bayesian optimizer over a deliberately tiny parameter space (offset of ±1 Hz from the individual alpha frequency, IAF; modulation depth; prominence). Small per-user data suffices because the space is tiny and the response is directly measurable.
- Not deep learning; classical, auditable, defensible. Methodology validated by the neuroscience advisor (§6), signal validity owned by Harsha.
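
To show how small and auditable the continuous-tuning loop can be, here is a minimal sketch. It swaps in a plain (non-contextual) Gaussian Thompson sampler over a discrete grid as a simplified stand-in for the contextual bandit / Bayesian optimizer named above. The `Arm` shape, the grid values, and the assumption that responses are standardized (roughly zero mean, unit variance) are illustrative, not part of the plan.

```typescript
// Hypothetical arm: one point in the small parameter grid described above.
interface Arm {
  iafOffsetHz: number;     // offset from the user's measured alpha peak, within ±1 Hz
  modulationDepth: number; // illustrative 0..1 scale
  n: number;               // number of observed responses
  sum: number;             // sum of observed (standardized) responses
}

function makeArms(offsetsHz: number[], depths: number[]): Arm[] {
  const arms: Arm[] = [];
  for (const iafOffsetHz of offsetsHz) {
    for (const modulationDepth of depths) {
      arms.push({ iafOffsetHz, modulationDepth, n: 0, sum: 0 });
    }
  }
  return arms;
}

// Standard normal draw via Box-Muller. rng is injectable so runs can be reproduced.
function sampleNormal(rng: () => number): number {
  const u1 = 1 - rng(); // rng() is in [0, 1), so u1 is in (0, 1] and log(u1) is finite
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Thompson sampling with a N(0, 1) prior and unit observation noise per arm:
// posterior mean = sum / (n + 1), posterior std = 1 / sqrt(n + 1).
function chooseArm(arms: Arm[], rng: () => number = Math.random): number {
  let best = 0;
  let bestDraw = -Infinity;
  arms.forEach((arm, i) => {
    const draw = arm.sum / (arm.n + 1) + sampleNormal(rng) / Math.sqrt(arm.n + 1);
    if (draw > bestDraw) {
      bestDraw = draw;
      best = i;
    }
  });
  return best;
}

// Record the measured response for the arm that was played.
function recordResponse(arm: Arm, response: number): void {
  arm.n += 1;
  arm.sum += response;
}
```

Each session would call `chooseArm`, play the chosen parameters, then call `recordResponse` with the measured metric. Because the state is just a count and a sum per arm, every adjustment can be shown to the user and audited.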

Phase 4 — Cross-user priors (where the data compounds)

With calibrations from enough users: hierarchical/Bayesian mapping from cheap first-session features → good starting parameters. New users get a warm start no competitor can produce without our dataset. Also unlocks honest population insights ("users with your baseline profile respond best to…"). Batch, in second-attention-processing.
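
The warm-start idea can be illustrated with the simplest form of partial pooling. This sketch assumes a single population mean per parameter and a hypothetical `priorStrength` (how many user observations the population prior is worth); the mapping from first-session features described above would replace the single mean with a feature-conditioned prediction.

```typescript
// Partial pooling for one parameter: with no user data the estimate equals the
// population mean (the warm start); as userN grows it moves toward the user's own mean.
function warmStartEstimate(
  userMean: number,
  userN: number,
  populationMean: number,
  priorStrength: number
): number {
  if (userN < 0 || priorStrength <= 0) {
    throw new RangeError("userN must be >= 0 and priorStrength must be > 0");
  }
  const weight = userN / (userN + priorStrength);
  return weight * userMean + (1 - weight) * populationMean;
}
```

For example, with `priorStrength = 5`, a brand-new user starts at the population value, and after five calibrated observations the estimate sits halfway between the population value and the user's own mean.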

Phase 5 — Research bets (optional, later)

Two bets. First, fine-tune a pretrained self-supervised EEG foundation model (LaBraM-class, 2024–25 generation) for state detection (focused/drifting) on our 4-channel data, potentially beating band-ratio heuristics. Second, formalize n-of-1 self-experiments ("two weeks entrainment-on vs off: see your own data"), the cancellation-proof retention loop from PRODUCT_PLAN Rung 4.

Sequencing logic: each phase ships on its own and depends only on the previous one; every phase after Phase 1 delivers user-visible value; Phase 1 adds none directly but is the prerequisite for everything and costs days, not weeks.

4. The first PR — stimulus-timeline logging (spec for review)

Principle: reuse the existing MSDF event stream (same clock as the EEG samples, as offsets from recording start, so alignment is free) and extend `metadata.json`. No new infrastructure.
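
Why shared offsets make alignment cheap: mapping an event to an EEG sample is one multiplication. The sketch below assumes offsets are stored in milliseconds and the sample rate is constant; both helper names are hypothetical.

```typescript
// Hypothetical helper: maps an event offset (ms from recording start)
// to the index of the nearest EEG sample. Assumes a constant sample rate.
function offsetMsToSampleIndex(offsetMs: number, sampleRateHz: number): number {
  if (offsetMs < 0 || sampleRateHz <= 0) {
    throw new RangeError("offsetMs must be >= 0 and sampleRateHz must be > 0");
  }
  return Math.round((offsetMs / 1000) * sampleRateHz);
}

// Slice the samples of one channel recorded between two events,
// for example between an AUDIO_PLAY and the following AUDIO_PAUSE.
function samplesBetween(
  samples: readonly number[],
  startMs: number,
  endMs: number,
  sampleRateHz: number
): number[] {
  const start = offsetMsToSampleIndex(startMs, sampleRateHz);
  const end = offsetMsToSampleIndex(endMs, sampleRateHz);
  return samples.slice(start, end);
}
```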

New event types (extend the existing `EVENT_*` constants in `eegFileWriter.ts`):

| Event | Payload | When |
| --- | --- | --- |
| `AUDIO_PLAY` / `AUDIO_PAUSE` / `AUDIO_SEEK` | position ms | player state changes |
| `TRACK_START` | song id | each track begins |
| `VOLUME_SET` | 0–1 | user volume change |
| `PHASE_CHANGE` | phase id | DUEP phase boundaries (Vayu Flow sessions) |
| `ENTRAINMENT_PARAMS` | carrier Hz, beat Hz, modulation depth, prominence | at start + every change |
| `APP_BACKGROUND` / `APP_FOREGROUND` | none | lifecycle (also relevant to the lock-screen bug domain; coordinate with Harsha's timing fixes) |
| `SIGNAL_QUALITY` | per-channel quality | sampled ~every 10s |
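
For review purposes, the table above could be expressed as a typed event union along these lines. This is a sketch under assumptions: the real `EVENT_*` constants in `eegFileWriter.ts` may be numeric codes rather than strings, and the payload field names, the string ids, and `createStimulusTimeline` are all hypothetical.

```typescript
// Hypothetical payload shapes, one per row of the event table.
type StimulusEvent =
  | { type: "AUDIO_PLAY" | "AUDIO_PAUSE" | "AUDIO_SEEK"; positionMs: number }
  | { type: "TRACK_START"; songId: string }
  | { type: "VOLUME_SET"; volume: number } // 0..1
  | { type: "PHASE_CHANGE"; phaseId: string }
  | {
      type: "ENTRAINMENT_PARAMS";
      carrierHz: number;
      beatHz: number;
      modulationDepth: number;
      prominence: number;
    }
  | { type: "APP_BACKGROUND" | "APP_FOREGROUND" }
  | { type: "SIGNAL_QUALITY"; perChannel: number[] };

// Every logged event carries its offset from recording start, like the existing MSDF events.
type TimedEvent = StimulusEvent & { offsetMs: number };

// In-memory timeline; a real writer would append to the MSDF event stream instead.
function createStimulusTimeline(recordingStartMs: number) {
  const logged: TimedEvent[] = [];
  return {
    // nowMs is passed in so the same clock as the EEG recording can be used.
    log(event: StimulusEvent, nowMs: number): TimedEvent {
      if (event.type === "VOLUME_SET" && (event.volume < 0 || event.volume > 1)) {
        throw new RangeError("volume must be within 0..1");
      }
      const timed: TimedEvent = { ...event, offsetMs: nowMs - recordingStartMs };
      logged.push(timed);
      return timed;
    },
    all(): readonly TimedEvent[] {
      return logged;
    },
  };
}
```

A discriminated union keeps each call site in the player honest about its payload: the compiler rejects a `TRACK_START` without a song id.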

Also:

- `metadata.json` gains a stimulus block: full session config, ordered song ids, entrainment metadata from the session row.
- Sessions without EEG log the same timeline to a lightweight `post_session_data.stimulusSummary` (events without the alignment requirement) so no-headband sessions still teach us outcome patterns.
- `pre_session_data` written at session start (intent, config, device state, local time-of-day); the field exists and is currently unused.
- Rating write stores the three raw values (focus, audio, time-feel) alongside the existing fields.
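
The rating change in the last bullet could look roughly like this. Only `meditation_rating` is named in the current schema; the `rating_*` field names are placeholders, and the sketch assumes the existing value is the unweighted mean of the three ratings.

```typescript
// The three raw values collected by the rating screen.
interface RawRatings {
  focus: number;
  audio: number;
  timeFeel: number;
}

// Builds the JSONB payload: keeps the existing averaged field and adds the raw values beside it.
function buildRatingWrite(raw: RawRatings) {
  const meditation_rating = (raw.focus + raw.audio + raw.timeFeel) / 3; // assumed: unweighted mean
  return {
    meditation_rating,
    rating_focus: raw.focus, // hypothetical field name
    rating_audio: raw.audio, // hypothetical field name
    rating_time_feel: raw.timeFeel, // hypothetical field name
  };
}
```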

Non-goals: no on-device learning, no new servers, no schema migrations beyond JSONB contents, no changes to any signal-processing code.

Size estimate: small — the writer, the event constants, and ~8 call sites in the player. Reviewable in one sitting.

5. Guardrails — what keeps this "us"

1. Claims stay measured. "Tuned to your measured alpha frequency" — yes. "AI optimizes your brain" — never. The claims validator vocabulary extends to all AI-layer copy.
2. Every learned adjustment is visible and auditable (PRODUCT_PLAN Rung 2). Personalization that users can inspect is a retention feature; invisible personalization is a trust liability.
3. Batch first. All learning runs offline in second-attention-processing against archived sessions; the app receives parameters, not models. Iterate in notebooks; ship configs.
4. Lanes. Harsha owns signal validity and the data pipeline. The advisor validates methodology. The LLM narrates measured facts and never originates claims. Product decisions stay with Ian.
5. Privacy is a feature, not a checkbox. Neural data is now explicitly regulated as sensitive personal data in some US states (Colorado HB 24-1058, amending the Colorado Privacy Act, 2024; California SB 1223, amending the CCPA, 2024). Requirements before Phase 1 ships:
- Explicit consent at first EEG session, plain language. Draft:

> "Your brain activity recordings are used to measure your sessions and, over time, personalize your audio to your brain's responses. They're stored securely, never sold, and you can delete them any time in Settings."

- Delete-my-data already exists (account deletion RPC); extend to EEG-only deletion.
- Aggregate/population learning uses de-identified data only.

6. Questions for the neuroscience advisor (additions to the existing call agenda)

1. From Muse S Athena's montage (AF7/AF8/TP9/TP10), which response metric is trustworthy enough to optimize against in a short calibration: ASSR amplitude at the modulation frequency, an engagement index, or alpha suppression? What probe duration per parameter setting gives a usable signal-to-noise ratio?
2. Is within-session parameter adjustment (the bandit) defensible, or should adaptation happen only between sessions until we have more data?
3. What's the minimum evidence bar before telling a user "your optimal frequency is X" (vs. silently using it)?
4. What are the known failure modes of IAF estimation from temporal channels that our calibration design should control for?

7. Success metrics (honest ones)

- Phase 1: % of sessions with complete triplet logging (target: 100% of post-PR sessions).
- Phase 2: insight-card engagement; rating-completion rate holds or rises.
- Phase 3: within-user improvement on the calibrated metric vs. their own pre-calibration baseline (n-of-1, not population claims); self-rated focus trend.
- Phase 4: cold-start quality — first-session response of warm-started users vs. default-parameter users.
- North star: week-4 retention of users with ≥1 calibration vs. without. If personalization works, this is where it shows.
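
As a concrete reading of the Phase 1 and north-star metrics, the sketch below computes both from flat records. The record shapes are hypothetical. It also assumes that a "complete triplet" requires EEG, so no-headband sessions count as incomplete; that choice should be confirmed before the metric is reported.

```typescript
// Hypothetical per-session flags, derived from the archive after the logging PR ships.
interface SessionRecord {
  hasStimulusTimeline: boolean;
  hasEeg: boolean;
  hasRatings: boolean;
}

// Phase 1 metric: percentage of sessions with all three triplet components.
function tripletCompleteness(sessions: SessionRecord[]): number | null {
  if (sessions.length === 0) return null;
  const complete = sessions.filter(
    (s) => s.hasStimulusTimeline && s.hasEeg && s.hasRatings
  ).length;
  return (100 * complete) / sessions.length;
}

// Hypothetical per-user record for the retention comparison.
interface UserRecord {
  calibrations: number;
  activeInWeek4: boolean;
}

function retentionRate(users: UserRecord[]): number | null {
  if (users.length === 0) return null;
  return users.filter((u) => u.activeInWeek4).length / users.length;
}

// North star: week-4 retention of users with at least one calibration vs. users with none.
function week4RetentionByCalibration(users: UserRecord[]) {
  return {
    calibrated: retentionRate(users.filter((u) => u.calibrations >= 1)),
    uncalibrated: retentionRate(users.filter((u) => u.calibrations === 0)),
  };
}
```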