News

February 24, 2026

Rethinking AI Reasoning through the Lens of Human Cognition

By Raymond Chua, PhD candidate at McGill University and Mila

This blog post series launches the 2026 IVADO Thematic Semester – Computational Ingredients of Reasoning, an initiative that brings together researchers from Artificial Intelligence, Cognitive Science, Neuroscience, Philosophy, and the Social Sciences to explore how we might better understand and ultimately bridge the gap between human and machine reasoning.

Several foundational questions guided the discussions: Do machines reason in ways comparable to humans? Should AI systems be modeled after child cognition or developmental processes? Why does AI reasoning appear to work — and does it truly qualify as reasoning?

Through a series of talks and panel discussions, participants engaged critically with these questions, leaving with a deeper understanding of the challenges ahead and renewed inspiration for their own research agendas.

Day 1

Introduction

Aaron Courville, the scientific director of IVADO, opened Day 1 of the Cognitive Basis of Reasoning (in Minds and AI) workshop with an introduction to IVADO, Canada’s main interdisciplinary AI research, training, and translation consortium, anchored at Université de Montréal. Guillaume Lajoie, Chair of this thematic semester, then set the stage by outlining the central theme: understanding the cognitive foundations of reasoning across both biological and artificial systems. He emphasized that this event is the first installment in a three-part workshop series. While this session focuses on the cognitive aspects of reasoning, the upcoming workshops will turn to its mechanistic and social perspectives, respectively.

Alison Gopnik (UC Berkeley): Causal Learning and Empowerment in Cognitive Science and AI

Alison Gopnik, a psychology professor at UC Berkeley, and a McGill University alumna, opened the workshop with a talk on Causal Learning and Empowerment in Cognitive Science and AI. Her work has long examined how children develop causal reasoning abilities, showing that even four-year-olds can form surprisingly sophisticated causal models of the world. 

In contrast, today’s advanced large language models (LLMs) still struggle with genuine causal learning. This gap motivates her lab’s research into how intrinsic motivation might help artificial systems better uncover the structure of their environments. In particular, they explore the concept of empowerment – maximizing the mutual information between an agent’s actions and their outcomes, while also encouraging diversity in action selection. Together, these objectives implicitly drive structured exploration.

Since humans, especially young children, appear to naturally maximize empowerment during exploration, equipping AI systems with similar principles may enable them to acquire deeper and more meaningful causal representations of the world.
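The empowerment objective described above can be sketched in a few lines. The toy example below is my own illustration, not code from Gopnik’s lab; the function name and the two hand-built transition tables are invented for this post. It computes the mutual information I(A; S′) between actions and outcomes for two two-state environments: one where each action reliably produces a distinct outcome, and one where outcomes are independent of the action. An empowerment-maximizing agent would prefer the first.

```python
import numpy as np

def mutual_information(p_a, p_s_given_a):
    # I(A; S') = sum_a p(a) * sum_s' p(s'|a) * log[ p(s'|a) / p(s') ]
    p_s = p_a @ p_s_given_a                    # marginal over outcomes
    mi = 0.0
    for a, pa in enumerate(p_a):
        for s, ps_a in enumerate(p_s_given_a[a]):
            if ps_a > 0:
                mi += pa * ps_a * np.log(ps_a / p_s[s])
    return mi

p_a = np.array([0.5, 0.5])                     # uniform over two actions
controllable = np.array([[1.0, 0.0],           # each action reliably
                         [0.0, 1.0]])          #   produces one outcome
noisy = np.array([[0.5, 0.5],                  # outcomes ignore the action
                  [0.5, 0.5]])
mi_controllable = mutual_information(p_a, controllable)
mi_noisy = mutual_information(p_a, noisy)
print(mi_controllable, mi_noisy)  # ~0.693 (log 2) and 0.0
```

Note that full empowerment also maximizes over the action distribution p(a); here a uniform p(a) is held fixed to keep the sketch short.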

Andrew Granville (Université de Montréal): Are the Reasoning Skills Being Developed by AI Producers Actually Reasoning Skills? 

Andrew Granville, from Université de Montréal, next took a more philosophical approach to reasoning, drawing inspiration from mathematical proofs. He first questioned whether mathematics and Artificial General Intelligence (AGI) are more connected than most people think. Fundamentally, can machines really think?

If so, can AI strategies work with mathematical proofs? Finally, and perhaps more importantly, how can we measure intelligence? Reasoning? Problem solving? Learning? Or adaptability? Watch Andrew’s talk if you have been pondering what sort of intelligence can emerge from a machine.

Q&A with Alison Gopnik and Andrew Granville

The Q&A featured Alison Gopnik and Andrew Granville; the discussion with the audience centered on empowerment and its influence on rewards, and on how mathematics can play an important role in developing general intelligence.


Subbarao Kambhampati (Arizona State University): Anthropomorphization Sins in Modern AI (Or The Perils of Premature Application of the Lens of Cognition to LLMs)

After the exciting talks on exploration and measuring intelligence in the morning, Subbarao Kambhampati from Arizona State University shifted the focus to a more fundamental question: how well do modern AI systems truly plan? Humans excel at long-horizon planning, but do today’s LLMs possess comparable capabilities?

Despite their impressive performance, even the most advanced LLMs struggle to plan autonomously, particularly as tasks require longer planning horizons. Large Reasoning Models (LRMs), which rely on additional techniques such as Chain-of-Thought prompting and fine-tuning, show improvements but still face significant generalization challenges.

One of the most thought-provoking aspects of the talk questioned whether intermediate reasoning tokens genuinely reflect faithful planning steps, or whether they merely act as computational scaffolding that improves accuracy without representing true reasoning. This raises deep questions about what it truly means for AI systems to “plan.”

Taylor Webb (IVADO, Mila, Université de Montréal): Emergent Symbol Processing in Transformer Language Models

Continuing the theme of reasoning, Taylor Webb from Université de Montréal explored how his lab studies the conceptual foundations of reasoning in LLMs. While LLMs often appear to exhibit retrieval-like behaviour and structured reasoning similar to humans, a central question remains: what mechanisms inside the neural network give rise to these abilities? 

His team proposed the idea of an emergent symbolic architecture within transformer models. By analyzing the internal circuits, they identified forms of symbolic processing, such as abstraction, induction, and retrieval, distributed across different layers of the network. Remarkably, these capabilities emerge without any explicit symbolic inductive biases built into the architecture.

Their findings suggest a functional distinction between different attention heads: abstraction and induction heads appear to capture symbolic similarity, whereas retrieval heads primarily capture token-level similarity. Similar patterns were observed not only in advanced models like Llama 3, but also in Vision Language Models (VLMs), thus providing converging evidence that structured, symbolic-like processing may naturally arise in modern AI systems.

Steven Piantadosi (UC Berkeley): Cognition, Neuroscience, and What’s In-Between 

Steven Piantadosi from UC Berkeley presented a compelling framework for linking neuroscience, cognitive science, and behaviour. His central idea is that while a system may occupy many possible neural states, only some distinctions between those states actually matter for predicting future behaviour. By carefully analyzing behaviour, we can infer which underlying neural states must have been present to support it.

To formalize this, he revisited the concept of cognitive diagrams, first proposed in the 1950s. These diagrams represent finite or infinite sets of states that generate observable behaviour. Crucially, for any behavioural pattern, there exists a unique minimal cognitive diagram, which is the smallest set of states that can reproduce the behaviour without losing explanatory power. In this view, the best theory is one that maps complex neural activity onto this minimal structure, preserving only distinctions that are behaviorally relevant. 
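One way to make the idea of a unique minimal cognitive diagram concrete, assuming behaviour can be cast as a finite-state (Moore) machine, is classic partition refinement: states that produce identical behaviour are merged until only behaviourally distinct states remain. The sketch below is an illustrative analogy of my own, not the formalism from the talk; the toy machine and all names are invented.

```python
def minimize(states, inputs, delta, output):
    # Partition refinement (Moore's algorithm): start by grouping states
    # that emit the same output, then repeatedly split any group whose
    # members transition into different groups. The stable partition is
    # the unique minimal set of behaviourally distinct states.
    groups = {}
    for s in states:
        groups.setdefault(output[s], []).append(s)
    blocks = list(groups.values())
    while True:
        index = {s: i for i, b in enumerate(blocks) for s in b}
        new_blocks = []
        for b in blocks:
            split = {}
            for s in b:
                key = tuple(index[delta[s, x]] for x in inputs)
                split.setdefault(key, []).append(s)
            new_blocks.extend(split.values())
        if len(new_blocks) == len(blocks):   # no block was split: done
            return new_blocks
        blocks = new_blocks

# Toy machine: states 'b' and 'c' emit the same output and behave
# identically, so the minimal machine needs only two states.
states, inputs = ['a', 'b', 'c'], [0, 1]
delta = {('a', 0): 'b', ('a', 1): 'c',
         ('b', 0): 'a', ('b', 1): 'b',
         ('c', 0): 'a', ('c', 1): 'c'}
output = {'a': 0, 'b': 1, 'c': 1}
blocks = minimize(states, inputs, delta, output)
print(len(blocks))  # 2
```

In this analogy, the returned blocks play the role of the minimal diagram: any finer distinction among states would add no predictive power over behaviour.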

Overall, the framework offers a principled, deductive approach to reasoning from observed behaviour to the minimal cognitive structure required to produce it, which brings clarity into how we connect neural mechanisms and cognitive theory. 


Day 2

Noah D. Goodman (Stanford University): Learning to Reason

Noah D. Goodman, a professor at Stanford University, opened the second day of the workshop and presented his lab’s work on understanding reasoning by training transformer models to approximate distributions defined by Bayesian networks. This approach provides a controlled setting to analyze what models are truly learning when they appear to “reason.”

One key finding was that Chain-of-Thought prompting improves performance because it acts as a better sequence density estimator, especially when combined with local observations. However, the structure of this reasoning differs from human cognition. While humans tend to reason in a directed and structured manner, often using search and backtracking in tasks like arithmetic, Chain-of-Thought reasoning appears more undirected. 

He concluded with an intriguing parallel to human learning: just as practice improves human performance, LLMs can also refine their abilities by fine-tuning on traces of correct solutions, hinting at similarities between how learning unfolds in humans and machines.


Karim Jerbi (IVADO, Université de Montréal): Human Creativity vs. Language Models: New Insights from a Large-Scale Benchmarking Study in 100,000 Individuals

Karim Jerbi, a professor at Université de Montréal, presented his lab’s recent work investigating a fundamental question: Are LLMs truly creative? To explore this, his team compared LLMs and humans using standardized creativity assessment tasks. Interestingly, results suggest that more recent language models can outperform the average human participant on certain creativity metrics. As expected, increasing the sampling temperature of the models further improved their creative output. 
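As a concrete aside on the temperature point, the minimal sketch below (my own illustration, not part of the study) shows how dividing logits by a temperature T before the softmax flattens the output distribution when T > 1, which is what makes sampled text more varied.

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng):
    # Dividing logits by T before the softmax controls diversity:
    # T > 1 flattens the distribution, T < 1 sharpens it.
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()                     # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs), probs

rng = np.random.default_rng(0)
logits = [2.0, 1.0, 0.1]
_, p_low = sample_with_temperature(logits, 0.5, rng)   # sharp
_, p_high = sample_with_temperature(logits, 2.0, rng)  # flat
print(p_low.max(), p_high.max())  # the flat distribution has a smaller peak
```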

However, a more nuanced question remains: how do LLMs compare against individuals who score highly on creativity tests? Watch on to find out!

Q&A with Karim Jerbi, Noah Goodman and Ben Prystawski 

The Q&A featured Karim Jerbi, Noah Goodman, and Ben Prystawski, Noah’s student, who attended the workshop in person. The discussion with the audience broadly centered on whether AI can have intent alongside creativity, and on the missing ingredients that may lead to better reasoning capabilities in LLMs.

Sari Kisilevsky (CUNY Queens College): Reason and Freedom

Sari Kisilevsky, a philosopher from CUNY Queens College, took a conceptual approach to mapping what she called the space of Reasons: the structure that underlies what it means to reason at all. Unlike the other talks in the workshop, which mainly focused on algorithms and cognitive mechanisms, her talk examined the normative and philosophical foundations of reasoning itself.

According to Sari’s talk, this space encompasses themes such as unity, beliefs, agency, freedom, and the “Myth of the Given.” She carefully unpacked each of these components, showing how they jointly shape our understanding of rational thought and what it truly means to reason.

Eva Portelance (IVADO, HEC Montréal): What If AI Models Learned More Like Kids Do?

Eva Portelance, a professor at HEC Montréal, explored parallels between how children acquire language and how AI systems learn. She highlighted that children rely on inductive biases, such as object and shape biases, when learning words. This raises a deeper question: are such biases innate, or do they emerge through experience? Her work suggests that factors like social interactions and communication context play a crucial role in shaping these biases. 

She then extended this discussion to AI systems, particularly Visual Question Answering (VQA) models, asking if they exhibit similar learning dynamics. Interestingly, her findings suggest that these models develop human-like learning patterns. Moreover, she showed that joint learning over structured representations, especially when combined with bootstrapping, leads to higher learning efficiency.

Laura Ruis (MIT): Hidden Computations: Planning and Reasoning in the Forward Pass

Laura Ruis, a postdoctoral researcher at MIT, examined whether reasoning traces in large language models are faithful to the computations they actually perform. In particular, she asked whether LLMs can reason latently, when structures are not explicitly represented. 

Her findings suggest that transformer models trained from scratch do not spontaneously discover latent planning strategies. However, they can learn to make use of latent planning when exposed to it during training. She further explored whether LLMs can convert declarative knowledge into procedural competence. Interestingly, prompting with Chain-of-Thought appears to facilitate this declarative-to-procedural generalization, enabling models to better translate factual information into step-by-step reasoning.

Day 3

Jieyu Zhao (USC): Evaluating the Social Intelligence of LLMs through Social Interactions

Jieyu Zhao, a Computer Science professor at the University of Southern California, opened the third day of the workshop by emphasizing that a major challenge for LLMs in the coming years will be enabling agents to collaborate effectively with human users. She noted that current models often reflect Western-centric norms and values, likely due to the distribution of their training data, and that they tend to struggle to align with cultural contexts outside North America, Europe, Australia, and New Zealand. As LLMs are increasingly deployed worldwide, improving cross-cultural alignment will be critical.

To address this, Zhao’s lab draws on social science theories such as intention modeling and Theory of Mind to better understand and improve human–LLM alignment. Using multiplayer game settings, including negotiation and mediation scenarios, they study how models interact with humans in dynamic contexts. Their findings reveal an important trade-off between short-term and long-term alignment: strategies that optimize immediate agreement or cooperation do not necessarily lead to sustained alignment over time.

Najoung Kim (Boston University): Classical Computation in Connectionist Models

Najoung Kim, a professor of Linguistics at Boston University, began by highlighting that even modern LLMs equipped with reasoning capabilities continue to struggle with robust generalization. She argued that true generalization requires three key properties: compositionality, systematicity, and productivity, which are core principles long studied in cognitive science.

Drawing inspiration from classical computation and symbolic processing, she suggested that identifying and instilling these structural signatures in models could lead to more reliable and systematic generalization. Overall, her talk points toward incorporating deeper structural principles as a path toward more flexible and human-like reasoning.


Claire Stevenson (University of Amsterdam): Learning to Solve Analogies: The Paths Children and LLMs Take 

Claire Stevenson, an assistant professor of Psychology at the University of Amsterdam, argued that instead of using AI merely to model a child’s mind, we should focus on modeling the process of cognitive development itself. She emphasized that children’s learning unfolds gradually, shaped by mechanisms such as analogical reasoning, which plays a central role in how they generalize and acquire new concepts.

Building on scientific evidence of children’s analogical abilities, she asked how we might design AI systems with more robust and developmentally grounded analogical reasoning. Interestingly, her research shows that both children and LLMs tend to rely heavily on copying strategies when performing analogical tasks. This raises important questions about whether current models truly abstract relational structure or simply mimic surface patterns.


Ivan Titov (University of Edinburgh, University of Amsterdam): Post-Training for Reasoning in Large Language Models: Learning vs Reshaping, Generalization and Failure Mode

Ivan Titov, a professor at the University of Edinburgh, discussed the interpretability of LLMs, with a particular focus on what happens during post-training. He framed this analysis through three complementary perspectives: data, parameters, and inference. He presented studies that examine how techniques such as supervised fine-tuning and reinforcement-based fine-tuning reshape models after pretraining. These studies show that different post-training strategies can lead to substantial and sometimes unexpected differences in model behavior.

Ivan also highlighted evidence suggesting that Chain-of-Thought (CoT) reasoning traces are not always faithful indicators of how a model arrives at a correct answer. In some cases, the generated reasoning may not reflect the true underlying computation. One proposed mitigation is to allow reward models access to the model’s reasoning traces during training, which could help better align outputs with the actual reasoning process and improve transparency.

Andrew Lampinen (Google DeepMind): How Do Language Models Reason About Information From Parameters and Context? Lessons for Complementary Learning Systems

Andrew Lampinen, a research scientist at Google DeepMind, presented LLMs through the lens of the Complementary Learning Systems (CLS) framework. In this view, LLMs rely on two interacting memory systems: a short-term memory corresponding to the in-context information provided at inference time, and a long-term memory encoded in the model’s parameters through training. This perspective offers a useful way to analyze how models learn and generalize across tasks.

In his talk, Lampinen highlighted how these two memory systems contribute differently to generalization. He discussed ways to bridge the resulting generalization gap through both offline strategies, such as augmenting or diversifying training data, and online strategies, including retrieving relevant past experiences at inference time. Together, these approaches suggest a more memory-aware path toward improving LLM robustness and adaptability.

Panel with Alison Gopnik, Andrew Granville, Taylor Webb, Laura Ruis, Andrew Lampinen, and Guillaume Lajoie

Following Andrew Lampinen’s talk, a panel hosted by Guillaume Lajoie brought together Alison Gopnik, Andrew Granville, Taylor Webb, Laura Ruis, and Andrew Lampinen for a wide-ranging discussion. The conversation explored the relationship between symbolic and neural computation, debating their respective strengths and how hybrid approaches might combine the structure of symbolic systems with the flexibility of neural networks.

The panel also examined the advantages and limitations of training models via next-token prediction on human-curated data. While such data provides rich structure and knowledge, it may also constrain models within existing human patterns of reasoning, raising questions about generalization, creativity, and the long-term trajectory of AI development.