News
May 1, 2026
How Brains and AI Learn to Reason: A Mechanistic View
This article was written as part of IVADO’s 2026 thematic semester, “The Computational Foundations of Reasoning.” The workshop took place at IVADO in Montreal, Canada, from February 24 to 27, 2026.
By Raymond Chua
PhD candidate at McGill University and Mila
Building on the first workshop’s exploration of how cognitive mechanisms can guide and enhance reasoning capabilities in AI systems, this second workshop turned to the mechanistic foundations of reasoning across animals, humans, and AI systems. Through a series of talks and panel discussions, participants explored how mechanistic approaches can provide deeper insights into model behaviour, reasoning processes, and the representations that emerge during learning. The discussions also examined how techniques traditionally used to study human and animal behaviour can be adapted to better understand AI systems.
Day 1
Irina Rish (Université de Montréal, Mila): The Effects of Data Pruning, Representation Capacity, and Latent Trajectories
Irina Rish, a professor at Université de Montréal and Mila, opened the workshop with a talk spanning a broad range of topics, including scaling laws, reasoning, and reinforcement learning, with applications in neuroimaging and large language models (LLMs). She revisited a key question from the earlier thematic workshop on the cognitive basis of reasoning: if reasoning models benefit more from the quality than from the quantity of chain-of-thought (CoT), what actually determines high-quality reasoning?
Her work suggests that careful data curation plays a crucial role. By ranking and pruning unhelpful or harmful training data, her team is able to improve model robustness and reasoning performance. Beyond data quality, they are also beginning to investigate the internal dynamics of reasoning models, examining how representations evolve over time and across layers.
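As a rough illustration of the ranking-and-pruning idea, the sketch below scores each training example and keeps only the top fraction; the scoring function and the fraction kept are assumptions for illustration, not her team’s actual pipeline.

```python
from typing import Callable, List, Sequence

def prune_training_data(
    examples: Sequence[str],
    score_fn: Callable[[str], float],  # assumed: higher score = more helpful example
    keep_fraction: float = 0.7,
) -> List[str]:
    """Rank training examples by an estimated quality score and drop the bottom tail."""
    ranked = sorted(examples, key=score_fn, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]

# Hypothetical usage: pruned = prune_training_data(corpus, quality_model.score, keep_fraction=0.7)
```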
This line of inquiry raises deeper scientific questions: Are reasoning models effectively performing multi-level energy optimization? And at what stage of training do System 2-like reasoning capabilities begin to emerge?
Siva Reddy (McGill University, Mila): The Next Frontier: Generative and Decodable Embeddings from Frozen LLMs
Siva Reddy, a professor at McGill University and Mila, presented recent advances in learning embeddings from LLMs. While unsupervised methods such as LLM2Vec have made significant progress, a key challenge remains: diverse inputs that should map to similar meanings often result in inconsistent representations, highlighting a gap between inputs and outputs. To address this, Siva and his collaborators proposed a novel approach: using LLM-generated responses as inputs for learning embeddings, rather than relying directly on human-written text. This led to their model, LLM2Vec-Gen. By grounding embeddings in the model’s own responses, this method enables the transfer of reasoning and knowledge across semantically different inputs, while also producing more interpretable and decodable representations. Finally, Siva highlighted conceptual parallels between LLM2Vec-Gen and Joint Embedding Predictive Architectures (JEPA), suggesting a broader connection between generative representations and predictive embedding frameworks.
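In spirit, grounding embeddings in the model’s own responses could look like the sketch below; the prompt, function signatures, and the absence of any training loop are simplifying assumptions, not the paper’s exact method.

```python
from typing import Callable, List

def generative_embedding(
    text: str,
    generate: Callable[[str], str],       # one LLM call: prompt -> generated response
    embed: Callable[[str], List[float]],  # any sentence-embedding function
) -> List[float]:
    """Embed the model's own response to the input rather than the raw input text,
    so that differently phrased inputs with similar intent land closer together."""
    response = generate(f"Respond to the following input:\n{text}")
    return embed(response)
```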
Paper: LLM2Vec-Gen: Generative Embeddings from Large Language Models https://arxiv.org/abs/2603.10913v2
Claire Stevenson (University of Amsterdam): Fluid Reasoning Development: Comparing Mechanisms in Minds and Machines
After lunch, Claire Stevenson, a professor at the University of Amsterdam, kicked off the afternoon session with a compelling perspective on fluid reasoning across humans and AI systems. She opened with a slinky analogy to illustrate how systems can both generalize to novel experiences and rapidly adapt to complex situations, highlighting key properties of flexible intelligence. Bridging psychological and neurocognitive theories, she discussed emerging evidence that LLMs exhibit alignment with human cognition in abstract reasoning tasks. She then introduced Function Vectors (FVs) and Concept Vectors (CVs), representations derived from attention heads in neural networks. While FVs capture functional behaviors, CVs focus on invariant attention heads, enabling them to represent more abstract and generalizable structures. Interestingly, both types of vectors tend to emerge in similar layers of the network, suggesting shared underlying mechanisms. Finally, she presented studies showing that both children and AI models undergo a developmental transition from associative to relational reasoning. This phase shift appears to be driven by changes in attention, pointing to a common computational principle underlying the emergence of higher-level reasoning in both biological and artificial systems.
Siddarth Venkatraman & Sarthak Mittal (Université de Montréal, Mila): Deep Test-Time Thinking with Recursive Self-Aggregation
The final talk of the day was presented by Sarthak Mittal, a PhD student at Université de Montréal and Mila, who shared joint work with fellow PhD student Siddarth Venkatraman on inference-time computation in LLMs. Unlike classical machine learning models, LLMs can perform substantial computation at inference time, enabling behaviors reminiscent of meta-learning. To leverage this, they introduced Recursive Self-Aggregation (RSA), a framework that maintains parallel chains of candidate responses across a sequential reasoning process.
Inspired by evolutionary algorithms, RSA generates a population of candidate responses and iteratively refines them through mutation, producing a diverse set of potential solutions. These candidates are then aggregated, allowing the model to progressively improve its reasoning.
Across challenging benchmarks such as ARC and FrontierScience-Olympiad, models augmented with RSA consistently achieved stronger performance. Additionally, the authors showed that encouraging models to aggregate reference solutions during reinforcement learning further improves robustness.
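A minimal sketch of the recursive self-aggregation loop described above, assuming a single text-in, text-out generate call; the population size, subset size, and prompt wording are illustrative choices, not the authors’ implementation.

```python
import random
from typing import Callable, List

def recursive_self_aggregation(
    problem: str,
    generate: Callable[[str], str],  # one LLM call: prompt -> response
    population_size: int = 8,
    subset_size: int = 3,
    n_rounds: int = 4,
) -> str:
    """Maintain a population of candidate solutions and repeatedly ask the model
    to aggregate small subsets of them into improved candidates."""
    # Initial population: independent candidate solutions.
    population: List[str] = [generate(problem) for _ in range(population_size)]

    for _ in range(n_rounds):
        new_population = []
        for _ in range(population_size):
            # Sample a few current candidates and combine their strengths.
            subset = random.sample(population, k=subset_size)
            prompt = (
                f"Problem:\n{problem}\n\nCandidate solutions:\n"
                + "\n---\n".join(subset)
                + "\n\nAggregate these into one improved solution."
            )
            new_population.append(generate(prompt))
        population = new_population

    # Final step: aggregate the last population into a single answer.
    final_prompt = (
        f"Problem:\n{problem}\n\nCandidate solutions:\n"
        + "\n---\n".join(population)
        + "\n\nProduce the single best final solution."
    )
    return generate(final_prompt)
```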
Paper: Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models https://arxiv.org/abs/2509.26626
Day 2
Dhanya Sridhar (IVADO, Université de Montréal, Mila): Causal Representation Learning: A Natural Fit for Mechanistic Interpretability
Dhanya Sridhar, a professor at Université de Montréal and Mila, as well as a co-leader of the IVADO R3AI working group on safe and aligned AI, kick-started the second day of the workshop with a talk on her lab’s work on causal representation learning. The goal of her research is to develop methods for interpreting a complex system through human-understandable concepts. Using a transformer architecture performing simple arithmetic tasks as a motivating example, Dhanya showed why naively searching for alignments between model components and such concepts is infeasible.
To overcome these challenges, her collaborators developed methods that use unsupervised learning (learning without labels) and inductive biases. These methods showed better disentanglement when applied to different backbones, such as Qwen2 and Llama 3. Furthermore, visualizing the latent representations in 2D revealed clearly clustered representations, suggesting that highly harmful representations lead to refusal behaviour in LLMs.
Benno Krojer (McGill University, Mila): Interpreting the Relation Between Visual and Linguistic Representations in a VLM
Benno Krojer, a PhD student at McGill University and Mila, presented his recent work on LatentLens, an approach for interpreting visual tokens in large language models (LLMs). He began by highlighting key limitations of existing methods, which often require additional training, rely on predefined probe classes, and offer limited insight into how visual tokens relate to language model embeddings. LatentLens addresses these challenges by leveraging contextual embeddings pretrained on text corpora and comparing them to visual token representations using a top-k nearest neighbor approach. The resulting matches are then used to generate meaningful descriptions of the visual tokens. Importantly, unlike prior methods, LatentLens enables interpretation across different layers and generalizes across multiple models, offering a more flexible and scalable framework for understanding multimodal representations.
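In spirit, the nearest-neighbour step can be sketched as follows; the choice of embeddings, the cosine similarity metric, and the variable names are assumptions rather than the paper’s exact pipeline.

```python
from typing import List

import torch

def describe_visual_tokens(
    visual_tokens: torch.Tensor,    # (n_visual, d) visual token states at some layer
    text_embeddings: torch.Tensor,  # (n_text, d) contextual embeddings of a text corpus
    text_strings: List[str],        # the text snippet each embedding came from
    k: int = 5,
) -> List[List[str]]:
    """For each visual token, return the k nearest text snippets in embedding space."""
    v = torch.nn.functional.normalize(visual_tokens, dim=-1)
    t = torch.nn.functional.normalize(text_embeddings, dim=-1)
    sims = v @ t.T                       # cosine similarity, (n_visual, n_text)
    topk = sims.topk(k, dim=-1).indices  # (n_visual, k)
    return [[text_strings[j] for j in row] for row in topk.tolist()]
```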
Paper: LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs https://arxiv.org/abs/2602.00462
David Klindt (Cold Spring Harbor Lab): Identifying the Atoms of Meaning in Biological and Artificial Visual Processing
The afternoon session at Mila resumed with David Klindt, an Assistant Professor at Cold Spring Harbor Laboratory in New York, USA, who presented his work on the Linear Representation Hypothesis. He explored why neural networks tend to represent concepts linearly, beginning with the notion of identifiability. He then introduced a framework for achieving identifiability by learning mappings from environmental states to observations, and from observations to underlying state dynamics. Importantly, his results showed that analyzing population-level activity, rather than single neurons, reveals more interpretable and diverse features.
Ida Momennejad (Microsoft Research NYC): The Compositional Geometry of Open-Ended Reasoning
Ida Momennejad, a research scientist at Microsoft Research NYC, concluded the day with a talk on how open-ended intelligence relies on compositional reasoning and reusable building blocks. She highlighted her earlier work on Successor Representations, which align closely with how humans learn and plan. She then introduced CogEval, a cognitive science–inspired framework for evaluating LLMs, showing that LLMs struggle on graph-based planning tasks, particularly those with bottleneck states where errors are less forgiving. She concluded by emphasizing the need for architectures that can recombine knowledge across diverse settings, drawing inspiration from how humans explore and learn in open-ended environments.
Papers: Evaluating Cognitive Maps and Planning in Large Language Models with CogEval; A brain-inspired agentic architecture to improve planning with LLMs https://doi.org/10.1038/s41467-025-63804-5
Expert Panel Discussion
Modern AI systems increasingly rely on chain-of-thought reasoning, but how does this compare to reasoning in humans? How do these computational processes differ from those in the brain? The panel also raised interesting questions about the validity of latent reasoning, which, unlike chain-of-thought, occurs within the internal hidden states of LLMs rather than through explicit, text-based outputs. Given that LLMs are trained through diverse regimes, from unsupervised pre-training to fine-tuning, an important open question is how to learn invariant representations that remain adaptable and generalizable across contexts. The panel acknowledged that, while the question is compelling, there is no straightforward solution to this challenge.
More broadly, as reasoning becomes central to the success of LLMs, how should we evaluate its quality? What are the right metrics? To what extent does formal logic underpin effective reasoning? Should systems be allowed to make mistakes and recover as part of the reasoning process? Finally, how do we balance the trade-off between data efficiency and strong reasoning capabilities?
Day 3 – February 26, 2026
Ila Fiete (MIT McGovern Institute): Associative and Episodic Memory from Pre-Structured Spatial Scaffolds in the Hippocampus
Ila Fiete, a professor at the Massachusetts Institute of Technology, opened the third day by presenting her lab’s recent work on modular structures in biological intelligence. She began by highlighting a key limitation of traditional content-addressable memory models: their performance degrades significantly as the number of stored patterns increases. To address this, her team drew inspiration from the entorhinal cortex and hippocampus to design a more structured memory system. This biologically inspired approach demonstrates reduced performance drop-off at scale, enables zero-shot familiarity detection, and shows increased robustness to catastrophic forgetting.
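The capacity limitation she referred to is easy to reproduce in a classical Hopfield-style associative memory, where recall collapses once the number of stored patterns exceeds a small fraction of the number of neurons. The sketch below illustrates only that baseline failure mode, not her lab’s scaffold-based model.

```python
import numpy as np

def hopfield_recall_accuracy(
    n_neurons: int = 200,
    n_patterns: int = 40,
    flip_frac: float = 0.1,
    steps: int = 5,
    seed: int = 0,
) -> float:
    """Store random binary patterns with a Hebbian rule, then try to recover each
    one from a corrupted cue; returns the fraction of bits recalled correctly."""
    rng = np.random.default_rng(seed)
    patterns = rng.choice([-1.0, 1.0], size=(n_patterns, n_neurons))
    W = patterns.T @ patterns / n_neurons  # Hebbian weight matrix
    np.fill_diagonal(W, 0.0)

    correct = 0
    for p in patterns:
        state = p.copy()
        flip = rng.random(n_neurons) < flip_frac
        state[flip] *= -1                  # corrupt the cue
        for _ in range(steps):             # synchronous recall dynamics
            state = np.sign(W @ state)
            state[state == 0] = 1.0
        correct += int(np.sum(state == p))
    return correct / (n_patterns * n_neurons)

# Recall stays near 1.0 for a handful of patterns but degrades sharply as
# n_patterns approaches roughly 0.14 * n_neurons, the classical capacity limit.
```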
Paper: Episodic and associative memory from spatial scaffolds in the hippocampus https://doi.org/10.1038/s41586-024-08392-y
Uri Hasson (Princeton University): Deep Language as a Cognitive Model for the Development and Processing of Natural Language
Uri Hasson, a professor in the Department of Psychology at Princeton University, began by exploring a fundamental question: what is the appropriate level of abstraction for modeling cognition in the brain? He contrasted two major theoretical perspectives: one grounded in rule-based structures such as universal grammar, and the other in statistical learning through next-word prediction. Rather than favoring one perspective over the other, he argued that existing scientific evidence suggests the brain likely integrates both approaches. However, mechanistic models that unify these perspectives remain limited. Drawing on large-scale datasets of human speech and language, Hasson and his collaborators showed that neural networks trained on such data exhibit striking parallels with brain activity, with simple linear mappings revealing strong correlations between neural network embeddings and neural representations in the brain.
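The “simple linear mappings” mentioned above are typically regularized linear regressions fit from model embeddings to recorded neural activity; the sketch below is a generic encoding-model recipe under that assumption, with hypothetical data shapes and names.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

def fit_encoding_model(X: np.ndarray, Y: np.ndarray, train_frac: float = 0.8):
    """Fit a ridge regression from LLM embeddings to brain activity and report
    held-out correlation per recording channel.

    X: (n_samples, d_embedding) embeddings aligned to the speech stimulus.
    Y: (n_samples, n_channels) recorded neural activity (e.g., electrodes or voxels).
    """
    n_train = int(len(X) * train_frac)
    model = RidgeCV(alphas=np.logspace(-2, 4, 13))
    model.fit(X[:n_train], Y[:n_train])
    pred = model.predict(X[n_train:])
    Y_test = Y[n_train:]
    # Pearson correlation between predicted and measured activity, per channel.
    corr = np.array([
        np.corrcoef(pred[:, i], Y_test[:, i])[0, 1] for i in range(Y.shape[1])
    ])
    return model, corr
```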
Stephanie C.Y. Chan (Google DeepMind): Connecting Perspectives on Dynamical Phenomena in Transformers
After lunch, Stephanie Chan, a Research Scientist at Google DeepMind, resumed the afternoon session with a talk on dynamical phenomena in transformer architectures. She began by highlighting a striking observation: certain behaviors in these models appear to emerge suddenly during training. However, these improvements can be transient, often diminishing or disappearing shortly after they arise. To better understand this phenomenon, she presented three complementary perspectives. The first focused on regularization, showing that L2 regularization can reduce the transience of in-context learning behaviors. The second examined the problem through the lens of circuits, illustrating how phase transitions in the loss landscape are associated with the emergence of “induction head” circuits. The third perspective emphasized the role of data, suggesting that emergent behaviors can be predicted based on properties such as in-context repetition and cross-sample repetition.
Some of the papers presented in this talk: The Transient Nature of Emergent In-Context Learning in Transformers; Strategy Coopetition Explains the Emergence and Transience of In-Context Learning; An evolutionary perspective on modes of learning in Transformers https://arxiv.org/abs/2311.08360
Nino Scherrer (Google): Uncovering Mesa-Optimization Algorithms in Transformers & Building a Layer for Optimal Test-Time-Training
Nino Scherrer, a research scientist at Google, presented recent work on understanding mesa-optimization in Transformer models. He began by highlighting a growing connection between in-context learning and gradient-based optimization, suggesting that Transformers can implicitly implement learning algorithms at inference time.
Using a toy model as a motivating example, he introduced MesaNet, a variant of the Transformer architecture in which a base optimizer gives rise to an internal “mesa-optimizer.” This setup allows the model to effectively learn how to learn within its forward pass. Several intriguing properties emerged from this framework: model predictions improve with longer input sequences, certain subsets of weights exhibit sparse and structured patterns, and information about past inputs can be decoded from current representations.
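One concrete way to see how a forward pass can itself implement a learning algorithm is the known equivalence between linear self-attention and a single gradient-descent step on an in-context least-squares problem. The toy sketch below illustrates that general mesa-optimization idea; it is not the MesaNet layer itself.

```python
import numpy as np

def in_context_gradient_step(X, y, x_query, lr=0.1):
    """One explicit gradient step on the in-context loss 0.5 * ||Xw - y||^2,
    starting from w = 0, followed by a prediction for the query."""
    w = lr * X.T @ y  # w <- 0 + lr * X^T (y - X*0)
    return float(x_query @ w)

def linear_attention_prediction(X, y, x_query, lr=0.1):
    """A linear self-attention layer with suitably chosen weights computes the same
    quantity: the query attends to context tokens (x_i, y_i) and outputs
    lr * sum_i (x_query . x_i) * y_i."""
    scores = X @ x_query       # unnormalized attention scores
    return float(lr * scores @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 4))
y = X @ rng.normal(size=4)
x_q = rng.normal(size=4)
assert np.isclose(in_context_gradient_step(X, y, x_q), linear_attention_prediction(X, y, x_q))
```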
In the second part of the talk, Scherrer discussed how MesaNet can be scaled to larger settings. In particular, he showed that combining linear self-attention with preconditioning techniques enables more efficient and stable training, making the approach more practical for real-world Transformer models.
Paper: MesaNet: Sequence Modeling by Locally Optimal Test-Time Training https://arxiv.org/abs/2506.05233
Guillaume Lajoie (IVADO, Mila, Université de Montréal, Google): In-Context Processing and Occam’s Razor: Learning the Best Models by Thinking on the Fly
Guillaume Lajoie, an associate professor at Université de Montréal, Mila, and IVADO, concluded the day with a talk on in-context learning, focusing on how systems can perform credit assignment without relying on synaptic plasticity. Drawing parallels between neuroscience and AI, he highlighted how the brain may support learning through activity alone, without weight updates, an idea mirrored in techniques such as chain-of-thought prompting and prompt engineering. This distinction can be framed as in-weights learning (parameter updates during training) versus in-activations learning (adaptation through internal activations at inference time), where the former effectively builds an optimizer for the latter. The talk centered on how plasticity and activity-driven mechanisms interact to support higher-order learning, emphasizing two key ingredients: circuit-level mechanisms that enable input-driven learning, and meta-learning processes that allow models to learn how to learn in context.
Discussion
The discussion between the participants, Guillaume, and Nino focused on how chain-of-thought reasoning can be viewed as a way of constructing a dataset on the fly for the model, enabling more effective in-context learning. Guillaume also highlighted how different forms of memory, such as episodic, semantic, and working memory, can interact and unify to support more efficient and adaptive learning.
Day 4 – February 27, 2026
Naomi Saphra (Harvard University): When Can We Predict Model Behavior?
Naomi Saphra, a research fellow at the Kempner Institute at Harvard University and incoming Assistant Professor at Boston University, opened the final day of the workshop by presenting her recent work on understanding when high-level performance emerges in AI models. It is well known that as AI models scale to a large number of parameters, they can exhibit sudden and dramatic improvements in performance, a phenomenon often associated with grokking. However, a less frequently discussed aspect is the role of random variations during training. Across different random seeds, model performance does not vary smoothly; instead, it often follows a bimodal distribution, where models tend to either succeed or fail distinctly.
Interestingly, this bimodal pattern also appears in hierarchical generalization capabilities, suggesting that the emergence of structured reasoning may depend sensitively on training dynamics rather than solely on model scale.
Paper: Random Scaling of Emergent Capabilities https://doi.org/10.48550/arXiv.2502.17356
Hidenori Tanaka (Harvard University): Mechanistic Principles of Concept Learning and Compositional Reasoning
Hidenori Tanaka, a professor at Harvard University, concluded the workshop with a thought-provoking talk on in-context learning and the role of social dynamics in large language models (LLMs). He began by discussing how in-context learning enables LLMs to infer underlying data-generating processes directly from examples. Using random walks on structured graphs as an illustrative case, he showed that providing richer context can lead to the emergence of internal representations that capture the structure of the graph itself. This highlights how LLMs can move beyond surface-level pattern matching toward more structured forms of reasoning.
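As a concrete picture of that setup, the sketch below generates random-walk sequences over a small graph that could then be used as in-context examples; the graph, walk length, and prompt format are illustrative assumptions.

```python
import random
from typing import Dict, List

def random_walk_prompt(adjacency: Dict[str, List[str]], length: int = 20, seed: int = 0) -> str:
    """Generate one random-walk sequence over a structured graph; feeding many such
    sequences as context lets a model infer the graph's connectivity."""
    rng = random.Random(seed)
    node = rng.choice(list(adjacency))
    walk = [node]
    for _ in range(length - 1):
        node = rng.choice(adjacency[node])
        walk.append(node)
    return " ".join(walk)

# A six-node ring: with enough walks in context, a model can recover which nodes are adjacent.
ring = {"A": ["B", "F"], "B": ["A", "C"], "C": ["B", "D"],
        "D": ["C", "E"], "E": ["D", "F"], "F": ["E", "A"]}
print(random_walk_prompt(ring))
```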
Shifting to the topic of alignment, he introduced a compelling framework for thinking about social dynamics in AI systems, organized into three paradigms: single-body (control), two-body (interaction/relationships), and many-body (societal dynamics). This perspective emphasizes that alignment is not solely an individual property of a model, but can also arise from interactions between agents and within larger systems.
In the latter part of the talk, he explored what it might mean to foster healthy social dynamics among AI agents, raising important questions about cooperation, stability, and long-term behavior in multi-agent settings.
He also briefly introduced his recent book, A Psychiatrist’s Guide to AI for Mental Health, which has become a number one bestseller in the nursing, psychiatry, and mental health category on Amazon Japan. The book is currently available only in Japanese.