MindReader

Mind Reading 101

Note: MindReader models the cortex, not the whole brain. The cortex is the outer surface, where attention, memory encoding, and related functions are represented in TRIBE v2.

SECTION 01

Defining our 'brain'

The cortex is the outer sheet of the brain — two to four millimeters thick, folded densely enough that, unfurled, it covers about the area of a dinner napkin. It has internal structure: groupings of regions that fire together, studied by neurologists for decades.

How the cortex is mapped

For research purposes, the cortex is mapped onto a standardized surface called fsaverage5: a reference template with 20,484 vertices, each one a point on the cortical sheet that can be tracked across subjects. When neuroscientists talk about “a region of the brain doing X,” they usually mean the average activity at some collection of these vertices, across many people, doing X.
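To make "the average activity at some collection of these vertices, across many people" concrete, here is a minimal sketch in Python with NumPy. The data is synthetic, and the subject count, the vertex indices chosen for the region, and the helper name `region_average` are all assumptions made for illustration; only the 20,484 vertex count comes from the text.

```python
import numpy as np

N_VERTICES = 20_484  # fsaverage5 vertex count quoted in the text


def region_average(activity, vertex_ids):
    """Mean activity over a set of vertices, then averaged across subjects.

    activity:   array of shape (n_subjects, N_VERTICES)
    vertex_ids: integer indices of the vertices that make up the region
    """
    activity = np.asarray(activity, dtype=float)
    if activity.shape[1] != N_VERTICES:
        raise ValueError("expected one value per fsaverage5 vertex")
    per_subject = activity[:, vertex_ids].mean(axis=1)  # one value per subject
    return float(per_subject.mean())                    # average across people


# Synthetic example: 12 hypothetical subjects with random activity.
rng = np.random.default_rng(0)
activity = rng.normal(size=(12, N_VERTICES))
region = np.arange(100, 200)   # hypothetical region: 100 arbitrary vertices
activity[:, region] += 1.0     # pretend this region is more active while doing X

print(round(region_average(activity, region), 2))
```

Because every subject's data sits on the same template, the same vertex indices pick out the same patch of cortex in each person, which is what makes the cross-subject average meaningful.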

In 2011, Yeo and colleagues at Harvard published a reproducible parcellation of this surface into seven large-scale networks.[1] They didn't decide what the cortex should look like; they let the cortex's resting-state activity reveal its own organization. That map is now one of the most reproduced findings in cognitive neuroscience.

You don't have to memorize the network names. The point: the cortex has internal structure that is consistent across people, and that structure has been studied for decades. Everything later on this page builds on it.
SECTION 02 · BUT

We don't have one brain. We have billions, all different.

Emotional intelligence has been an academic field for thirty-five years. Three traditions try to measure it (ability tests, self-report questionnaires, informant ratings) and they disagree with each other. The deeper problem isn't methodology — it's that the interesting questions don't have correct answers.

The three traditions, the Marcus thought experiment, and the 2023 EQ-Bench / MMLU collapse

Ability tests like the MSCEIT score people on emotion problems where the “correct” answer is whatever the majority of experts say.[2] Consensus-based, which is to say circular. Self-report questionnaires (EQ-i, TEIQue) ask people what they think they're good at — measuring reputation, not capacity, and overlapping heavily with personality and IQ.[3] Informant ratings (ESCI) ask the people around someone — measuring social legibility, which is real but not the same thing.

After thirty-five years and dozens of meta-analyses, the numbers don't agree, and the field cannot say what its central object is.

Marcus's father just died. What should you say to him?

American workplace: something direct — I'm so sorry, let me know if there's anything I can do. Japanese workplace: silence and a formal condolence card; direct emotional language would be intrusive. A stoic colleague: ask Marcus about a work project — distraction as kindness. A therapist: ask Marcus what was coming up for him.

All four are defensible. Any “correct” answer a test assigns smuggles in a cultural norm, a professional norm, or an individual preference.

In 2023, EQ-Bench began scoring large language models on emotional reasoning. The benchmark correlated 0.97 with MMLU, the standard general-knowledge benchmark.[4] Whatever EQ-Bench was measuring was almost identical to general reasoning. The field had been measuring general cognitive ability with emotional wallpaper.

“No brain-based model of emotional intelligence has yet been proposed.”
Smith, Killgore & Lane · Biological Psychology · 2018

That sentence has stood for eight years. It still stands.

The trouble with measuring emotional intelligence isn't that we lack the right test. It's that the construct itself fragments the moment you try to operationalize it across cultures, professions, or theories. Three self-report items follow. Try them and see what your scores end up correlating with.
Try three items adapted from real EQ self-report scales.
Pick one option per item. After the third answer, you'll see what these answers actually correlate with, including a couple of things you'd never expect.
ITEM 01 · TEIQue, adapted
"I usually find it difficult to regulate my emotions."
ITEM 02 · SSEIT, adapted
"On the whole, I'm able to deal with stress."
ITEM 03 · WLEIS, adapted
"I often pause and think about my feelings."

What just happened

You answered three items. The same three items produce different mean scores in different cultures, and correlate strongly with personality traits unrelated to "emotional intelligence."

US sample mean: 3.4
East Asia mean: 2.9
Correlation with neuroticism: r = 0.61
Correlation with openness: r = 0.43

This is what thirty-five years of measurement looks like. Items that vary by culture, correlate with personality traits, and never converge on a stable construct. The field has never agreed on what it is measuring.

SECTION 03 · THEREFORE

So we use a population-average cortex.

Individual brains vary; self-report can't be trusted. The move is to step back from individuals and look for what's stable across many of them. fMRI researchers have been doing this for decades — aggregating across subjects to find the response patterns that are population-wide.

How MindReader inherits the population-average move

Individual brains differ enormously in their fine-grained activity, yet the response patterns across the cortex turn out to be surprisingly stable across people. MindReader doesn't measure your brain. It predicts the response of the average human cortex across the 720+ subjects in TRIBEv2's training set.[5] When we say "the cortex did X," we mean: averaged across those people, the cortex did X.
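A small simulation shows why the population average is a stable quantity even when individuals are noisy. This is a sketch on synthetic data, not TRIBEv2 output: the shared signal, the noise level of 3.0, and the 200-second duration are invented for illustration, and 720 subjects is used only to echo the figure in the text.

```python
import numpy as np

rng = np.random.default_rng(2)
n_subjects, n_seconds = 720, 200

# One response pattern shared by everyone, plus large subject-specific noise.
shared = rng.normal(size=n_seconds)
individual = shared + 3.0 * rng.normal(size=(n_subjects, n_seconds))

# "The cortex did X" in the population-average sense.
population_average = individual.mean(axis=0)


def corr(a, b):
    """Pearson correlation between two time series."""
    return float(np.corrcoef(a, b)[0, 1])


# How well does one person, versus the average, track the shared pattern?
single_subject_r = float(np.mean([corr(row, shared) for row in individual]))
average_r = corr(population_average, shared)

print(f"typical single subject: r = {single_subject_r:.2f}")
print(f"population average:     r = {average_r:.2f}")
```

Under these assumptions the idiosyncratic noise largely cancels in the average, so the averaged time series tracks the shared pattern far more closely than any typical individual does.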

For content evaluation this is exactly the right unit of analysis. What someone making a video, ad, or script actually wants to know isn't "how will this one person respond?" but "how will humans on average respond?" A focus group is one (broken) way to ask. The cortex, simulated at the population level, is another.

MindReader is built on a population-average unit, not individual variance — that's the right scale for content questions, and it's also a real limitation we own. You won't get a personal readout. You'll get a stable estimate of what humans, broadly, would do.
SECTION 04 · BUT

The next problem: lab tasks are not real content.

Putting a brain in a scanner costs about $600/hour. A single naturalistic study costs $150K–$300K and takes 6–12 months. That cost is why almost everything we know about cognition was learned with flashcards, not movies. Then, in 2004, Uri Hasson (now at Princeton) and colleagues showed that when many people watched the same movie, large swaths of cortex synchronized across them.[6] The cortex was responding to real content, and the response was measurable.
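Synchronization across viewers is commonly quantified as inter-subject correlation (ISC). The sketch below uses a leave-one-out variant on synthetic data for a single cortical location. It is one common way to compute ISC, not necessarily the exact analysis in the cited study, and the subject count, duration, and noise levels are assumptions.

```python
import numpy as np


def leave_one_out_isc(data):
    """Mean correlation between each subject and the average of all others.

    data: array of shape (n_subjects, n_timepoints) for one cortical location.
    """
    data = np.asarray(data, dtype=float)
    n_subjects = data.shape[0]
    total = data.sum(axis=0)
    correlations = []
    for s in range(n_subjects):
        others_mean = (total - data[s]) / (n_subjects - 1)
        correlations.append(np.corrcoef(data[s], others_mean)[0, 1])
    return float(np.mean(correlations))


rng = np.random.default_rng(1)
n_subjects, n_seconds = 10, 300

# A location driven by the movie: shared signal plus private noise.
movie_signal = rng.normal(size=n_seconds)
driven = movie_signal + rng.normal(size=(n_subjects, n_seconds))

# A location with no shared component: private noise only.
undriven = rng.normal(size=(n_subjects, n_seconds))

isc_sync = leave_one_out_isc(driven)
isc_unsync = leave_one_out_isc(undriven)
print(f"movie-driven location: ISC = {isc_sync:.2f}")
print(f"undriven location:     ISC = {isc_unsync:.2f}")
```

A location that follows the stimulus shows a clearly positive ISC, while a location dominated by private activity hovers near zero. That contrast is what "synchronized across them" means operationally.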

Why the lab-task tradition stuck for so long, and what Hasson changed

For most of the history of cognitive neuroscience, the way you studied a brain was by giving it a tightly controlled task. Identify the face. Remember the word. Solve the puzzle. Beautiful science — reproducible, falsifiable. It also bore almost no resemblance to how brains actually spend their time.

Hasson's paradigm — “naturalistic stimuli” — is the methodological bridge between the lab task and the real world. Without it, the seven cortical systems MindReader exposes would only be defined by how they respond to flashcards. With it, we can see them respond to content as it is actually consumed. The entire training-data regime MindReader inherits is downstream of this shift.

Hasson made it legitimate to study the cortex's response to real content — movies, narrative audio, full videos — instead of stripped-down lab probes. Every model trained on naturalistic content, including TRIBEv2, depends on this paradigm shift.
SECTION 05 · THEREFORE

We'd need to predict the cortex's response
without putting anyone in a scanner.

fMRI is too expensive and too slow to be a tool. A real comparison tool needs a model that predicts brain response from input — a simulation, not a measurement. Between 2021 and 2024, a wave of papers from MIT, Princeton, Paris, and UT Austin made that possible, one capability at a time.

The five-year arc — Schrimpf, Goldstein, Caucheteux, Tang

In 2021, Schrimpf and colleagues at MIT showed that the internal representations of large language models — GPT-2, BERT, others — predicted human brain activity in language regions, with surprising accuracy.[7] The same statistical structure that made language models work showed up in cortex.

In 2022, Goldstein and colleagues at Princeton went further: LLMs and human brains share computational principles for language — predicting the next word, integrating context, building meaning across time.[8] Not just similar outputs. Similar processing.

That same year, Caucheteux and King in Paris formalized “brain score”: a metric for how brain-like a model's representations are.[9] A way to benchmark not what models can do, but how closely they resemble what cortex does on the same input.
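A score of this kind is usually computed with a linear encoding model: fit a regularized regression from model features to brain responses on one part of the data, then correlate predictions with held-out responses. The sketch below is a generic version on synthetic data, not the exact procedure of any paper cited here. The function names, the ridge penalty, and the 80/20 split are assumptions.

```python
import numpy as np


def ridge_fit(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def brain_score(features, responses, alpha=1.0, train_fraction=0.8):
    """Mean held-out Pearson r between predicted and actual responses.

    features:  (n_samples, n_features) model activations per stimulus
    responses: (n_samples, n_targets) brain responses to the same stimuli
    """
    n_train = int(len(features) * train_fraction)
    W = ridge_fit(features[:n_train], responses[:n_train], alpha)
    predicted = features[n_train:] @ W
    actual = responses[n_train:]
    # Pearson r for each target column, computed on centered data.
    p = predicted - predicted.mean(axis=0)
    a = actual - actual.mean(axis=0)
    r = (p * a).sum(axis=0) / np.sqrt((p ** 2).sum(axis=0) * (a ** 2).sum(axis=0))
    return float(r.mean())


rng = np.random.default_rng(3)
features = rng.normal(size=(500, 16))                  # synthetic "model" features
true_map = rng.normal(size=(16, 50))
responses = features @ true_map + 2.0 * rng.normal(size=(500, 50))
unrelated = rng.normal(size=(500, 16))                 # features with no link to the responses

score_related = brain_score(features, responses)
score_unrelated = brain_score(unrelated, responses)
print(f"related features:   {score_related:.2f}")
print(f"unrelated features: {score_unrelated:.2f}")
```

Scoring on held-out data matters: a model with enough features can fit the training responses by chance, but only representations that actually carry the relevant structure predict responses it has not seen.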

In 2023, Tang and the Huth lab at UT Austin demonstrated semantic decoding from fMRI: continuous reconstruction of language from brain recordings, using a language model as the decoding bridge.[10] Not telepathy — but proof that the cortex-to-model mapping was real, bidirectional, and useful.

Each paper added one move. By 2024 the open question wasn't whether encoding models could predict brain response — it was whether the technique could extend beyond text, to images, audio, and video.

Five years, four labs, one direction: from "LLMs predict cortex on text" to a measurable, bidirectional bridge between models and brains. Without this arc, no one would have a model worth handing a video to.
Chart · The encoding-model arc, 2021–2026
Five years of brain encoding models, leading somewhere. Each datapoint adds one capability. The trendline is what made TRIBE inevitable.
SECTION 06 · THEREFORE

TRIBEv2 — the model that ties it together.

On March 26, 2026, Meta FAIR officially released TRIBEv2 (Trimodal Brain Encoder), the integration the previous five years had been pointing at. It handles audio, video, and language together (the “hard case”), was trained on 720+ subjects watching naturalistic content, and was released under a CC BY-NC license. MindReader is built on it.

What TRIBEv2 actually does — architecture, output, and the honest disclosure

Video is the hard case because audio, visuals, and language arrive simultaneously and interact: change the narration over the same image and the cortex responds differently. TRIBEv2 handles all three streams together. Three frozen feature extractors, one per stream (LLaMA 3.2-3B for language, V-JEPA2 for video, Wav2Vec-BERT for audio), process the input; a temporal transformer fuses their outputs; and a subject-specific projection maps the fused signal onto the cortex.

Output, per input video: 20,484 cortical points × 1 Hz of predicted BOLD (blood-oxygen-level-dependent) response. One number per cortical point per second.
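The data flow and output shape described above can be sketched in a few lines of NumPy. This is a toy stand-in, not the TRIBEv2 implementation: random arrays replace the three pretrained extractors, a 3-second moving average replaces the temporal transformer, and a random linear map replaces the learned projection. The function name and the feature width of 8 per stream are assumptions; only the 20,484 points and the 1 Hz rate come from the text.

```python
import numpy as np

N_VERTICES = 20_484  # cortical points, from the text
RATE_HZ = 1          # one prediction per second, from the text


def toy_encoder(text_feats, video_feats, audio_feats, rng):
    """Toy data flow only: (seconds, features) per stream -> (seconds, N_VERTICES)."""
    # 1. Put the three per-second feature streams side by side.
    fused = np.concatenate([text_feats, video_feats, audio_feats], axis=1)
    # 2. Stand-in for temporal fusion: 3-second moving average along time.
    kernel = np.ones(3) / 3
    mixed = np.apply_along_axis(
        lambda column: np.convolve(column, kernel, mode="same"), 0, fused
    )
    # 3. Stand-in for the projection onto cortex: a random linear map.
    n_features = fused.shape[1]
    projection = rng.normal(size=(n_features, N_VERTICES)) / np.sqrt(n_features)
    return mixed @ projection


rng = np.random.default_rng(4)
duration_s = 60
n_steps = duration_s * RATE_HZ
# Random placeholders for what the frozen extractors would produce.
text_feats = rng.normal(size=(n_steps, 8))
video_feats = rng.normal(size=(n_steps, 8))
audio_feats = rng.normal(size=(n_steps, 8))

prediction = toy_encoder(text_feats, video_feats, audio_feats, rng)
print(prediction.shape)  # (seconds, cortical points)
print(prediction.size)   # total predicted values for this clip
```

For a 60-second clip the result is a 60 × 20,484 array, that is, 1,229,040 predicted values. Everything MindReader reports is a summary of an array of this shape.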

Honest disclosure: this is predicted response, not measured. Out-of-distribution stimuli — pure music, abstract animation, content unlike the training set — produce noisier predictions.


TRIBEv2 is the model under the hood. It produces a second-by-second cortical response prediction for any video you can hand it. The rest of this page is about what that prediction lets you see inside one piece of content, or between two versions when you run a comparison.
SECTION 07 · SO NOW

What does the cortex say?

We have the cortex. We have a way to simulate its response. So we can ask any video, audio clip, or script: what did the cortex do, second by second? Run one piece of content through TRIBE, surface the seven cortical systems' response, and inspect the moments that moved.
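As a rough sketch of what "surface the seven systems' response and inspect the moments that moved" could involve, the code below collapses a per-vertex prediction into one time course per system and finds the seconds with the largest change. The vertex-to-system labels here are arbitrary placeholders, not MindReader's actual mapping, and the function names and synthetic prediction are assumptions.

```python
import numpy as np

N_VERTICES = 20_484
N_SYSTEMS = 7


def system_timecourses(prediction, labels):
    """Average predicted response within each system.

    prediction: (seconds, N_VERTICES); labels: (N_VERTICES,) ints in 0..6.
    Returns an array of shape (seconds, N_SYSTEMS).
    """
    return np.stack(
        [prediction[:, labels == k].mean(axis=1) for k in range(N_SYSTEMS)],
        axis=1,
    )


def biggest_moves(timecourses, top=3):
    """For each system, the seconds with the largest second-to-second change.

    Returns an array of shape (top, N_SYSTEMS), largest change first.
    """
    change = np.abs(np.diff(timecourses, axis=0))  # row i: change from second i to i+1
    order = np.argsort(-change, axis=0)[:top]
    return order + 1                               # second at which the new level arrives


rng = np.random.default_rng(5)
labels = np.arange(N_VERTICES) % N_SYSTEMS         # placeholder assignment, not a real atlas
prediction = 0.1 * rng.normal(size=(60, N_VERTICES))
prediction[20:, labels == 2] += 1.0                # synthetic jump in system 2 at second 20

timecourses = system_timecourses(prediction, labels)
moves = biggest_moves(timecourses)
print(moves[0])  # top moment per system
```

The same two functions can be run on two versions of a piece of content, and the resulting time courses subtracted, to see where the versions diverge.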

For the technical version of how comparison works, see the methodology.

SECTION 08 · THE CENTERPIECE

Thirty years of cognitive neuroscience,
mapped onto one brain.

MindReader's dimensions are grounded in decades of cognitive neuroscience: specific labs, specific questions, and specific cortical regions.

Timeline · when each system entered the literature
MEMORY · 1998
EFFORT · 2001
SOCIAL · 2003
GUT · 2005
LANGUAGE · 2011
YEO 7-NETWORK PARCELLATION · 2011
PERSONAL RESONANCE · 2012
SEVEN SYSTEMS · 1995–2012
By 2012, all seven systems MindReader exposes had been mapped onto the cortex by separate labs over thirty years of cognitive neuroscience.
r = 0.87: brain activity in the medial prefrontal cortex, measured while subjects watched anti-smoking ads, predicted national-scale call volume to the quit-smoking hotline. Stimulus-side measurement worked.
SECTION 09

So we built MindReader.

MindReader takes your content and simulates, region by region, how a brain responds to it.

It does not declare a winner. It gives you the response map, so the next edit is based on what changed in the cortex.