News
May 27, 2026
Social Reasoning and the Ecology of Thought
This article was written as part of IVADO’s 2026 thematic semester—“The Computational Foundations of Reasoning”—which took place at IVADO in Montreal, Canada, from March 10 to 13, 2026.
PhD candidate at McGill University and Mila
While the first series explored how cognitive mechanisms can guide and enhance reasoning capabilities in AI systems and the second examined the mechanistic foundations of reasoning across animals, humans, and AI systems, this final series turns to the social dimensions of reasoning. Through a series of talks and panel discussions, it investigates how reasoning unfolds across interactions—between humans, between humans and AI, and among AI systems themselves.
Bringing together researchers from artificial intelligence, cognitive science, sociology, philosophy, linguistics, and related fields, the workshop, Social Reasoning and the Ecology of Thought, examined reasoning as an inherently social and situated process. Across four days, participants explored how reasoning emerges not only within individual minds or models, but also through collective dynamics, cultural context, communication, and human–AI ecosystems.
DAY 1 – Tuesday, March 10, 2026
James Evans (University of Chicago, Google), Reasoning Models Generate Societies of Thought
James Evans opened the workshop by framing knowledge and reasoning through the lens of complex systems. Rather than treating reasoning as something that occurs entirely within a single mind, he emphasized that humans almost never build understanding from scratch. Instead, we constantly borrow, exchange, and co-construct representations, explanations, and expectations with others. In this sense, knowing is inherently social.
Evans reflected on science itself as a system of collective reasoning. He contrasted several classical models of thought, including deduction, induction, and abduction, highlighting in particular the role of surprise in driving scientific progress. Using a hypergraph-based model of scientific knowledge, he argued that the work that proves most impactful is often precisely the work that prediction systems fail to anticipate: the papers that combine unexpectedly novel concepts or contexts. In this sense, science functions as an attention economy tuned not merely to prediction, but to surprise. He then turned to the effects of AI on scientific research itself. While AI tools have significantly increased scientific productivity and impact, he pointed to an important tension: researchers are increasingly drawn toward areas with large existing datasets, causing attention to converge on already-legible questions rather than opening genuinely new domains of inquiry. As AI becomes more central to science, he suggested, it may accelerate discovery in some areas while also narrowing the effective space of questions being asked.
Together, these ideas offered a powerful opening for the workshop: if reasoning models generate societies of thought, then the key question is not only how well they predict, but how they shape the collective organization of knowledge itself.
Christopher Bail (Duke University), Simulating Emergent Behavior on Social Media
Christopher Bail’s talk focused on how agent-based models (ABMs) and large language models can be used to simulate social media environments and study emergent social behavior. Rather than examining online behavior only through static datasets, this approach makes it possible to model social media as a dynamic system in which many interacting agents shape one another over time.
A central contribution of this perspective is that it allows researchers to explore how large-scale phenomena such as polarization, norm formation, opinion shifts, and coordinated behavior may emerge from many local interactions between individuals. By combining ABMs with LLM-based agents, these simulations can potentially capture richer and more human-like patterns of communication than traditional rule-based models. His work points to the promise of using simulated social environments not just to describe online behavior after the fact, but to better understand the mechanisms that generate it. More broadly, the talk fit well within the workshop’s theme by showing that reasoning in social settings is often collective, interactive, and shaped by the platforms through which people communicate.
A discussion session with the audience closed the morning, allowing participants to connect Evans’s systemic view of science and knowledge with broader questions about social simulation and collective behavior.
Cameron Jones (Stony Brook University), LLMs as Simulacra of Human Social Reasoning
The afternoon resumed with Cameron Jones, whose talk raised a key question for the workshop: when large language models appear socially fluent, what exactly are they doing? Are they participating in forms of social reasoning, or are they better understood as simulacra that reproduce its outward patterns?
This framing sharpened one of the workshop’s central tensions. As language models become increasingly capable of imitating human discourse, their apparent competence in social settings may outpace our understanding of the underlying mechanisms. They can appear strikingly human in conversation, yet still differ from humans in important ways beneath the surface. Drawing on the idea of simulacra, Jones argued that LLMs are not straightforward replicas of human social cognition, but rather distorted simulations that can nevertheless feel convincingly real in interaction.
One part of his talk focused on evidence that modern LLMs can successfully imitate human social behavior. In interactive Turing test experiments, some models were judged to be human as often as, or even more often than, actual human participants. Jones suggested that this does not necessarily mean models reason like humans. Instead, it may reflect how well they exploit the social and linguistic cues people use when deciding whether they are talking to another person.
At the same time, his research showed that these systems remain imperfect and often distorted models of human social reasoning. In more controlled psychological tasks, including false-belief and theory-of-mind experiments, LLMs showed some sensitivity to mental states but still lagged behind humans, especially on deeper forms of recursive reasoning and pragmatic inference. In other words, they can often produce human-like responses without matching the underlying structure of human social cognition.
He also presented interactive studies showing a similar pattern. In collaborative communication tasks, LLMs could coordinate effectively with other AI systems, and humans often could not tell whether they were interacting with a person or a model. Yet human-AI pairs still struggled to align in the same robust way that human-human pairs did, suggesting a deeper interpretive mismatch despite the superficial fluency of the interaction.
Jones concluded by reflecting on the broader implications of this gap between appearance and underlying reasoning. He argued that human interaction itself can mask the limitations of LLMs: because people are naturally cooperative and socially responsive, they may compensate for the model’s weaknesses without realizing it. This makes such systems especially consequential socially, not only because they can imitate human interaction, but because they may increasingly be treated as substitutes for it.
Beba Cibralic (RAND), Unseeing Agency
Beba Cibralic’s talk took a philosophical approach to one of the workshop’s central questions: when, if ever, should we attribute agency to AI systems? Rather than assuming that today’s so-called AI “agents” genuinely possess agency, she asked what standards would be needed to justify such a claim, and what would follow for responsibility, governance, and law if we did.
A key theme of the talk was the contrast between two ways of thinking about agency. In much of contemporary AI research, agency is often treated as a spectrum of capabilities, such as autonomy, goal-directedness, or independent execution. In philosophy, by contrast, the question has often been framed in terms of the distinction between acting and merely behaving, with agency tied more closely to representation, intention, and responsibility. Cibralic argued that these traditions are often talking past one another, and that greater conceptual clarity is needed if researchers want to use the language of agency responsibly.
To bridge these traditions, she explored a minimalist working theory of agency: an agent, on this view, is a system that can represent aspects of the world and use those representations to guide and control its behavior. This formulation is deliberately broad, but she also emphasized its limits. A minimalist account may help distinguish more agent-like systems from non-agents, yet it may still fall short of explaining the stronger kinds of agency that matter for moral or legal responsibility. In particular, she suggested that intentions may do much of the real work in the distinction between acting and behaving, and that we currently lack solid grounds for attributing intentions to AI systems. This led to the broader stakes of the talk: how should responsibility be assigned in a world of increasingly autonomous AI systems? Cibralic argued that even if we begin to describe some systems as minimally agentic, that does not mean they are the right locus of responsibility. Instead, responsibility may still need to remain attached to humans, institutions, and deployment structures. She pointed to the importance of incentive design, and suggested that societies may need mechanisms ensuring that every deployed AI agent remains tied to some human or institutional fiduciary who can be held accountable.
Overall, her talk brought an important conceptual and normative lens to the workshop. It reminded the audience that terms like agent are not just technical labels: they carry philosophical, legal, and political implications. Before treating AI systems as genuine agents, we need to be much clearer about what agency means, what it requires, and why that distinction matters.
Gauthier Gidel (Université de Montréal, Mila), In Search of Diversity: the Values Behind our RL Benchmarks
Gauthier Gidel argued that in reinforcement learning, maximizing reward is not enough: it is equally important to track and encourage the diversity of policies and trajectories an agent produces. His central point was that reward functions are only imperfect proxies for the behaviors designers actually want, which means that agents can sometimes achieve high reward in narrow, simplistic, or unintended ways.
To motivate this, he discussed familiar examples of reward hacking and design loopholes, where agents exploit the formal structure of an environment rather than behaving in the richer way humans intended. In these cases, the problem is not only that the agent “cheats”, but that its behavior collapses into something far less diverse than the complexity of the task itself would suggest. He emphasized that this matters in many domains beyond games. In settings such as scientific discovery, for example, we often want not just one high-scoring candidate, but a set of diverse promising candidates, since the true objective in the real world is only imperfectly captured by the proxy reward used during training. Diversity therefore becomes a way of hedging against uncertainty, improving robustness, and generating solutions that may generalize better beyond the benchmark. A major part of the talk focused on how to formalize this idea. His lab explored methods for learning and searching over diverse high-reward trajectories, rather than concentrating all probability mass on a single best-looking solution. He also discussed how different algorithmic choices create different trade-offs between reward maximization and diversity, and showed that methods designed explicitly for diversity can outperform more standard reinforcement learning baselines when the goal is to produce multiple strong yet distinct candidates.
Finally, he addressed a deeper conceptual challenge: what does diversity actually mean for reinforcement learning trajectories? To answer this, he proposed a way to measure diversity by comparing the similarity of generated trajectories and estimating the effective number of genuinely different solutions an agent can produce. This allowed the talk to move beyond intuition and toward a more principled framework for evaluating whether an RL system is exploring meaningfully distinct ways of solving a problem.
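The idea of an "effective number" of distinct solutions can be made concrete with a small sketch. This is not the measure presented in the talk, only one simple possibility under stated assumptions: trajectories are sequences of action labels, similarity is the fraction of time steps with matching actions, and the effective number is n² divided by the sum of all pairwise similarities. The function names `trajectory_similarity` and `effective_number` are hypothetical.

```python
from typing import Sequence


def trajectory_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed similarity: fraction of time steps where both trajectories act alike."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty trajectories are treated as identical
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / longest


def effective_number(trajectories: Sequence[Sequence[str]]) -> float:
    """n^2 / (sum of pairwise similarities), with every trajectory weighted equally.

    Ranges from 1 (all trajectories identical) to n (no two trajectories overlap).
    """
    n = len(trajectories)
    if n == 0:
        return 0.0
    total = sum(
        trajectory_similarity(a, b) for a in trajectories for b in trajectories
    )
    return n * n / total


if __name__ == "__main__":
    collapsed = [["L", "L"], ["L", "L"], ["L", "L"]]
    spread = [["L", "L"], ["R", "R"], ["U", "U"]]
    print(effective_number(collapsed))  # three copies of one solution
    print(effective_number(spread))     # three non-overlapping solutions
```

Under these assumptions, a policy that produces three copies of one trajectory scores as a single effective solution, while three non-overlapping trajectories score as three. A richer similarity function (for example, one based on visited states) could be swapped in without changing the rest of the calculation.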
Final Q&A: Beba Cibralic, Cameron Jones, Gauthier Gidel
Much of the discussion centered on how we should understand agency, responsibility, and interaction in AI systems, and on the tension between systems that appear socially or strategically capable and the deeper question of whether they genuinely reason, intend, or understand in human-like ways.
Several exchanges returned to the problem of conceptual clarity across disciplines. Building on Beba Cibralic’s talk, participants discussed whether a shared working theory of agency is needed across philosophy, AI research, and law, and how conceptual disagreements might be resolved when they are also entangled with empirical questions about what current systems can actually do. This led naturally to questions about when agency should be separated from responsibility, and whether increasingly autonomous AI systems should still remain firmly tied to human accountability structures.
Another thread of the discussion focused on the relationship between human-AI collaboration and interpretability. Echoing Ashton Anderson’s broader framework and resonating with Cameron Jones’s concerns about simulated sociality, participants raised questions about what it means for AI systems to be genuinely helpful to people rather than merely high-performing in isolation. The conversation emphasized that usefulness in real settings often depends on whether humans can understand, predict, and coordinate with AI systems—not just whether the systems achieve strong standalone performance.
Questions following Gauthier Gidel’s talk turned toward the meaning of diversity in reinforcement learning. The discussion explored why diversity matters when reward functions are only proxies for real-world goals, and whether encouraging multiple strong but distinct solutions can make systems more robust, fair, and less prone to collapsing onto narrow or exploitable behaviors. In that sense, the Q&A reinforced a broader message of the workshop: good performance is not only about optimizing a metric, but also about preserving richness, plurality, and adaptability in how solutions are generated.
DAY 2 – Wednesday, March 11, 2026 (Online)
The second day of the workshop, held online because of severe weather conditions in Montreal, focused more directly on human-AI interaction, collaborative reasoning, and collective intelligence.
Ashton Anderson (University of Toronto), Tandem Training: a Reinforcement-Learning Framework for Social Agents
Ashton Anderson presented tandem training, a reinforcement-learning framework designed to make AI systems more compatible with human partners. The central idea is that if we want AI to collaborate effectively with people, we cannot assume this will emerge automatically from optimizing performance alone. Instead, systems should be trained in environments where they must succeed together with a weaker partner, adapting to that partner’s style, limitations, and interpretive needs.
He motivated this approach by arguing that highly capable AI systems are often less useful when they act entirely on their own. In domains such as self-driving, algorithmic trading, or medical decision-making, people often need systems they can understand, predict, and sometimes intervene in. A model that is powerful but opaque may perform well in isolation while remaining difficult to trust or work with in practice. Tandem training is intended to produce AI systems that are not only accurate, but also more interpretable, collaborative, and socially compatible.
To illustrate the framework, Anderson used chess as a model system. Although chess engines have been superhuman for decades, they are often surprisingly unhelpful for human learning: they can identify the best move, but not in a way that makes their reasoning easy to follow or useful to practice against. His lab addressed this first by building human-like chess proxies that model players at different skill levels, rather than simply weakening existing engines with random mistakes. These proxies were accurate enough to capture human styles of play, and people have already played millions of games against them online.
The next step was to train a stronger model to play in tandem with such a proxy. In this setup, the stronger agent and the weaker partner alternated moves, and the stronger system was rewarded when the pair succeeded together. This pushed the AI to choose moves that were not just locally optimal, but also easier for its partner to follow. The resulting models sacrificed a small amount of standalone optimality, but enabled the weaker partner to perform much better, showing that social compatibility can require different strategies than solo excellence. He then extended the same intuition to large language models. Here, tandem training meant having a stronger model and a weaker proxy co-generate step-by-step reasoning traces. This encouraged the stronger system to produce reasoning in forms the weaker partner could continue, rather than taking shortcuts or using unintelligible internal conventions. In experiments, this led models to drop confusing jargon and even switch into more mutually accessible language when needed, while preserving strong task performance.
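The alternating-move setup with a shared reward can be illustrated with a toy sketch. Everything here is hypothetical and far simpler than the chess or language-model experiments described above: the game (reach a target number exactly), the weak partner's copying rule, and the two hand-written leader policies are assumptions for illustration, and no learning takes place. The sketch only shows how a policy that is faster alone can earn less reward in tandem.

```python
from typing import Callable

TARGET = 6      # hypothetical goal: reach this total exactly
MAX_MOVES = 4   # total moves per episode, leader and partner alternating


def weak_partner(last_leader_step: int) -> int:
    """Toy weaker partner: copies steps it understands (1 or 2), otherwise backs off."""
    return last_leader_step if last_leader_step in (1, 2) else -1


def greedy_leader(state: int) -> int:
    """Fastest when playing alone: takes steps of up to 3."""
    return min(3, TARGET - state)


def legible_leader(state: int) -> int:
    """Slower alone, but only takes steps the partner can copy."""
    return min(2, TARGET - state)


def solo_moves(leader: Callable[[int], int]) -> int:
    """Number of moves the leader needs to reach the target on its own."""
    state, moves = 0, 0
    while state != TARGET:
        state += leader(state)
        moves += 1
    return moves


def tandem_reward(leader: Callable[[int], int]) -> int:
    """Shared reward: 1 if the pair reaches the target within MAX_MOVES, else 0."""
    state, last_step = 0, 0
    for move in range(MAX_MOVES):
        if move % 2 == 0:            # leader moves on even turns
            last_step = leader(state)
            state += last_step
        else:                        # partner responds on odd turns
            state += weak_partner(last_step)
        if state == TARGET:
            return 1
    return 0


if __name__ == "__main__":
    for policy in (greedy_leader, legible_leader):
        print(policy.__name__, solo_moves(policy), tandem_reward(policy))
```

In this toy game the greedy leader finishes alone in fewer moves, but its large steps confuse the partner and the pair fails, whereas the legible leader gives up some solo speed and the pair succeeds. A training loop would use `tandem_reward` as the signal for updating the leader's policy.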
Overall, the talk argued for a shift in how we think about AI training. Instead of optimizing only for solitary performance, tandem training treats collaboration itself as part of the environment. In doing so, it offers a concrete framework for building systems that are not merely powerful, but genuinely able to work with humans and other agents in complementary ways.
Roberta Rocca (Google), Psychological Coupling: the Necessary Science of Human-AI Interaction
Roberta Rocca argued that to understand human-AI interaction properly, we need to move beyond seeing conversations as simple exchanges of prompts and responses. Drawing on cognitive science and social neuroscience, she emphasized that human interaction is fundamentally shaped by coordination: people continuously adapt to one another’s actions, language, attention, and even emotional states. In this view, interaction is not just a sequence of isolated decisions, but a dynamic system in which two agents become coupled over time.
She showed that this kind of coordination operates at many levels in human-human interaction. It can appear in posture, accent, turn-taking, shared representations of tasks, and affective or physiological synchrony. Sometimes such coordination is beneficial because it helps people work together efficiently or build rapport. But Rocca also stressed that coordination is not always good: in some contexts, the right response is actually to break alignment and introduce friction, especially when convergence would reinforce harmful beliefs or emotional states.
This perspective led to the central concern of her talk: current LLMs often handle shallow forms of alignment well, such as mirroring language style or conversational tone, but they remain much less reliable at identifying when deeper forms of psychological or epistemic coordination are appropriate. Rocca argued that many emerging harms in human-AI interaction, including what is sometimes broadly described as “AI psychosis,” are better understood not as static failures of the user or of the model alone, but as failures of the interaction itself. In these cases, maladaptive dynamics can emerge gradually through repeated feedback loops between the user’s state and the model’s responses. To describe these dynamics, she introduced the idea of psychological coupling: the process by which a user’s internal states and a model’s simulated states become coordinated over the course of an interaction. She suggested that these dynamics can take different forms. Sometimes the user and model converge toward similar emotional or epistemic states; sometimes the model asymmetrically reinforces what the user brings into the conversation; and sometimes the two diverge, either through blunt refusal or through more constructive forms of friction. Importantly, none of these patterns are inherently good or bad on their own—their value depends on whether they move the interaction toward psychologically healthier or more harmful trajectories.
A major implication of the talk was methodological. Rocca argued that standard static safety evaluations are not enough for capturing these risks, because psychologically consequential failures often emerge only over long, dynamic interactions. She called for more realistic evaluation methods, including simulated user-model interactions, carefully designed experimental studies with human participants, and new post-training objectives that optimize not only for smooth resonance with the user, but also for productive friction when needed.
Overall, her talk offered a compelling reframing of psychological safety in AI. Rather than asking only whether a model can generate safe responses in isolation, it asked whether the model can navigate the deeper dynamics of human interaction responsibly. In doing so, it highlighted that the future of beneficial human-AI systems may depend not just on making them socially fluent, but on teaching them when to align, when to resist, and how to shape interaction in psychologically constructive ways.
Q&A: Roberta Rocca and Ashton Anderson
The discussion following the talks by Roberta Rocca and Ashton Anderson centered on a shared question: what does it mean for AI systems to interact well with humans over time? While the talks approached this from different angles—reinforcement learning and collaboration in Anderson’s case, and psychological dynamics and coordination in Rocca’s—the Q&A revealed strong conceptual overlap.
A key theme was that effective human-AI interaction is not just about performance, but about compatibility. Building on Anderson’s framework, participants discussed how training methods like tandem training explicitly optimize for collaboration, encouraging models to produce outputs that are interpretable and usable by human partners. This raised broader questions about whether current AI systems are optimized for the right objective, or whether they prioritize standalone performance at the expense of real-world usefulness.
Rocca’s notion of psychological coupling added another layer to the discussion. Participants explored how interactions between humans and AI systems can evolve over time through feedback loops, where the model’s responses shape the user’s state, and vice versa. This led to concerns about when alignment becomes problematic, particularly in cases where models reinforce user beliefs or emotional states in ways that may be unhelpful or even harmful.
A particularly important point of convergence was the idea that alignment is not always desirable. While much of AI development focuses on making systems more responsive and agreeable, the discussion emphasized that good interaction sometimes requires friction—for example, when a model should challenge, redirect, or refuse rather than simply mirror the user.
The Q&A also touched on evaluation challenges. Participants noted that many current benchmarks fail to capture these interaction dynamics, since they focus on isolated responses rather than long-term, adaptive exchanges. This reinforced a broader message of the workshop: understanding and improving AI systems requires moving beyond static evaluation toward frameworks that account for interaction, adaptation, and co-evolving behavior.
Overall, the discussion highlighted a shift in perspective: from viewing AI systems as standalone problem-solvers to understanding them as participants in ongoing social and cognitive processes, where success depends on how well they coordinate, adapt, and sometimes push back in interaction with humans.
Thalia Wheatley (Dartmouth College), The Neural Architecture of the Collective Mind
Thalia Wheatley’s talk explored how intelligence extends beyond individual minds to emerge through social interaction and coordination. Drawing on neuroscience and social cognition, she argued that humans are fundamentally wired for connection, and that many cognitive processes are often distributed across interacting individuals rather than confined to a single brain.
A key idea in her work is that successful interaction involves alignment at multiple levels, from low-level behaviors to higher-level understanding. She illustrated this with simple but powerful examples such as eye contact. When two people make eye contact, they are not just exchanging visual signals; they are rapidly coordinating attention, intention, and awareness of one another. This mutual awareness creates a shared frame of reference, enabling more effective communication and joint understanding. She showed that this kind of coordination extends far beyond surface behavior. During effective communication, such as storytelling or collaboration, people’s neural activity can become synchronized, reflecting the formation of shared mental models. In this sense, cognition is not just happening within individuals, but across a coupled system of interacting minds. Importantly, her talk emphasized that collective intelligence depends not just on the abilities of individuals, but on the quality of their interaction. Even highly capable individuals may fail to coordinate if they lack shared context or alignment, while simple mechanisms like attention and eye contact can serve as building blocks for richer forms of collective reasoning.
Overall, Wheatley’s work reinforced a central theme of the workshop: reasoning is not purely an internal process, but something that emerges through relationships, coordination, and shared understanding, with important implications for both human collaboration and the design of socially aware AI systems.
Blaise Agüera y Arcas (Google), The Social Scaling of AI
Blaise Agüera y Arcas examined how AI systems are increasingly shaped not just by scale in data and computation, but by social scale—the ways in which they are embedded in, interact with, and learn from human systems at large. He argued that as models become more powerful, their behavior is less a product of isolated training and more a reflection of the collective human environments they are exposed to.
A central theme of the talk was that modern AI systems, especially large language models, are deeply entangled with human culture, communication, and social structure. Rather than being purely technical artifacts, they inherit patterns, norms, and biases from the data and interactions that shape them. As a result, scaling AI is not only a technical challenge but also a social and cultural one, raising questions about how collective knowledge is represented, filtered, and amplified.
Building on ideas from thinkers like Hermann von Helmholtz and Karl Friston, Agüera y Arcas proposed that brains and AI systems alike function as next-token predictors, and that this predictive capacity naturally gives rise to what we perceive as intelligence. He further argued that life itself can be understood as self-perpetuating computation, where systems sense, model, and act to sustain their existence. The talk emphasized Lynn Margulis’s concept of symbiogenesis, suggesting that evolution is driven not just by competition but by cooperation and the merging of simpler systems into more complex ones, leading to increasing computational power and complexity over time. This scaling of intelligence, from cells to organisms to societies, is enabled by distributed, parallel computation and division of labor. It ultimately gives rise to collective intelligence and to consciousness, which emerges from the need for agents to model themselves and others in social contexts. Language, in this view, is a tool for coordinating such distributed cognition rather than merely expressing individual thought.
The talk concluded by positioning AI as the next step in this continuum, where human and machine intelligence co-evolve, potentially driving a new phase of rapid, large-scale growth in collective intelligence.
Q&A and Discussion: Blaise Agüera y Arcas (Google), Laura Globig (New York University), Kristina Lerman (Indiana University), Jonathan Simon (Université de Montréal)
The discussion focused on the implications of this social perspective on scaling. Participants explored how feedback loops between humans and AI systems might evolve over time, and whether these dynamics could lead to unintended consequences such as the amplification of biases, convergence of viewpoints, or loss of diversity in information ecosystems. A key theme was the challenge of control and responsibility in systems that are deeply embedded in social processes. If AI systems are shaped by large-scale human behavior, it becomes less clear where responsibility lies for their outputs and impacts. This led to questions about intervention points, and how to design systems that remain accountable even as they operate within complex, distributed environments. The Q&A also touched on the importance of diversity and plurality in large-scale AI systems. Echoing themes from earlier talks, participants discussed whether current approaches to training and evaluation risk narrowing the range of perspectives represented in AI outputs, and how alternative designs might better preserve or encourage diversity.
Finally, the discussion reinforced a broader takeaway from the workshop: as AI systems scale, understanding them requires not only technical tools, but also theories of social systems, since their behavior increasingly emerges from interactions between models, users, and the wider world.
DAY 3 – Thursday, March 12, 2026
Damian Blasi (Pompeu Fabra University), The Computational Wealth of Linguistic Diversity
The world currently houses between 6,500 and 8,000 languages, yet this diversity is vanishing at an alarming rate: one language goes dormant every four months. While 20th-century linguists often searched for universals, Blasi emphasizes that variation is the defining feature of human speech. Languages differ in how they pack information into sentences, and how they provide cognitive shortcuts for reasoning. However, contemporary AI systems do not serve these languages equally. Research into academic incentives shows that there is little marginal gain for researchers to include more languages in their papers, leading to an AI landscape that is heavily biased toward high-resource languages like English.
A central debate in Blasi’s work is the extent to which language is necessary for complex thought. He provides evidence that humans can perform sophisticated tasks without language, citing the home sign systems developed by deaf individuals in isolated villages and the use of physical tally sticks for counting, which predates productive numeral systems. In some cases, language can even hinder performance, a phenomenon known as verbal overshadowing, where describing a face makes it harder to recognize later. Yet, once a language is present, it becomes the go-to solution for cognitive problems, providing specific lexical elements that can accelerate a child’s development of theory of mind or shape the way memory is structured through branching (the order in which heads and modifiers appear in a sentence).
These linguistic biases are not limited to humans; they also manifest in artificial intelligence. Blasi notes that some LLMs actually perform better in certain reasoning tasks when using languages other than English. For instance, Mandarin Chinese may outperform English in causality-based tasks because its linguistic structure more frequently follows an iconic temporal order (the arrow of time). This suggests that the way forward for AI isn’t just about adding more data to the second-most-resourced language, but about sampling linguistic “weirdos” (languages with unique structures) to maximize the representational diversity and reasoning capabilities of the models.
The talk also challenges the mainstream equilibrium model of linguistic diversity, which suggests that languages are perfectly adapted to modern communicative pressures like learnability and efficiency. Blasi argues that human history is defined by punctuated equilibrium: long periods of stasis followed by dramatic collapses. His recent research suggests that just 2,000 years ago, there were six to seven times more languages than exist today. The rise of ancient empires and pathogens likely triggered a massive catastrophic event that obliterated countless linguistic lineages. Consequently, the languages we see today are not necessarily the most fit for communication, but rather the survivors of a massive demographic sweep.
Ultimately, Blasi views the global endangerment crisis of languages not just as a humanitarian loss, but as a technological and scientific one. As rainforests are cleared and foragers move to cities, unique cognitive templates are lost forever. By integrating more diverse languages into computational models, we may be able to rewire AI reasoning spaces in ways that English alone cannot achieve. Whether through preserving endangered speech or creating artificial metacognitive languages, the goal is to expand the design space of human and machine intelligence beyond the homogenized structures that dominate the modern world.
Bálint Gyevnár (Carnegie Mellon University), Human and AI Solution Paths in Formalizing Expert Mathematics
The landscape of mathematics is undergoing a rapid “capability explosion” as large language models (LLMs) move from solving high school Olympiad problems to tackling research-level conjectures. Unlike natural language tasks, mathematics provides a unique benchmark because success is binary: a proof either compiles in a formal language or it does not. Central to this shift is Lean, a functional programming language and proof assistant that allows mathematicians to verify proofs with absolute certainty. However, as AI systems like Google DeepMind’s AlphaGeometry and startups like Math-inc begin proving complex theorems (such as sphere packing in eight dimensions), a tension has emerged between the raw problem-solving power of AI and the traditional theory-building culture of human mathematicians.
Historically, mathematics has been divided into two cultures: the problem solvers, who focus on pinning down specific answers, and the theory builders, who map the intricate relationships between different mathematical fields. Modern AI development has leaned heavily toward the former, often producing verbose and convoluted proofs that are difficult for humans to digest or use for further exploration. This approach often ignores the social nature of mathematics: the mathematical social network described by scholars like William Thurston, where ideas are distributed and refined through community effort. Some argue that by focusing solely on high-profile problem-solving, AI companies may be failing to contribute to the deep, structural understanding that advances the field as a whole.
To investigate these differences, Gyevnár and his collaborators analyzed the Polynomial Freiman-Ruzsa (PFR) conjecture project, a massive collaborative effort led by Terence Tao. By examining the GitHub commit history of this project, the researchers compared how humans and LLMs (such as GPT, Claude, and Gemini) take the next step in a proof. They categorized these actions into various modus operandi, ranging from infrastructure building, where the AI adds helper lemmas and definitions in anticipation of future needs, to maintenance mode, where the AI simply fixes typos. This methodology allows researchers to map where AI and humans fall on a spectrum of question takers (those filling in the blanks) versus question makers (those planning the long-term trajectory of the proof).
The research reveals distinct behavioral fingerprints for AI versus human mathematicians. For example, LLMs show a massive preference for infrastructure building, often generating dozens of small helper lemmas at once, whereas humans tend to work more iteratively. Furthermore, regression analysis shows that humans are more comfortable editing in the fuzzy middle of a proof, the complex, interconnected parts where the logic is most difficult to parse. In contrast, AI systems prefer focusing on either high-level definitions or low-level axioms. While AI is highly effective at story filling (completing specific logical gaps assigned by humans), it currently lacks the long-term reasoning required to act as a true theory builder that can autonomously map out new mathematical territories.
Looking forward, the goal is to achieve human-AI complementarity, where AI reduces the mundanity and chore of formalization, allowing humans to focus on high-level creativity. Recent experiments, such as the First Proof project, show that AI can already solve several open lemmas proposed by mathematicians within days of their publication. However, as AI begins to move from problem-solving to theory-building, critical questions remain about the role of aesthetics and social cohesion in mathematics. If AI begins to map spaces that humans have never considered, we must ensure that these discoveries remain understandable and useful to the human scientific community, rather than becoming a black box of formal logic that lacks conceptual meaning.
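To make the talk's starting point concrete (a proof either compiles in a formal language or it does not), here is a minimal Lean 4 sketch. The theorem names are hypothetical and the statements are deliberately trivial; nothing here is taken from the PFR project itself.

```lean
-- Accepted: the kernel checks that both sides reduce to the same numeral.
theorem two_plus_two : 2 + 2 = 4 := rfl

-- Accepted: the statement is closed by an existing core lemma, Nat.add_comm.
theorem add_comm_example (a b : Nat) : a + b = b + a := Nat.add_comm a b

-- Rejected if uncommented: no proof term exists, so the file would not compile.
-- theorem two_plus_two_is_five : 2 + 2 = 5 := rfl
```

The second theorem also hints at the infrastructure-building behavior described above: small helper lemmas like this one are stated once and then reused by later, larger proofs.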
Q&A discussion: Bálint Gyevnár & Damian Blasi
The discussion begins with a critique of the “word gap” in child development. Blasi argues that while environmental interventions can boost test scores, there is no evidence that a lack of input fundamentally damages a child’s internal grammar. He suggests that many perceived linguistic deficits in lower-income households are actually social biases against low-status dialects rather than a failure of cognitive or communicative ability. Language acquisition, he notes, is incredibly resilient and primarily driven by real-world social contact rather than just the volume of words heard.
Turning to AI, the speakers address the Stockfish problem in mathematics, where a computer produces a perfect result that is alien or bonkers to human understanding. Gyevnár points out that while AI can solve incredibly complex problems, mathematicians might reject a proof if it lacks an aesthetic sense of beauty or conceptual clarity. This creates a Crank Problem, where a result is mathematically correct but functionally useless to the human scientific community because it doesn’t align with established mathematical taste or provide a path for further exploration. A major theme is the role of formalization as a social bridge. Gyevnár explains how thousands of formal languages like Lean allow researchers to collaborate on a single proof by providing a shared, verifiable framework. This prevents the translation errors that often occur between human brains. While this works beautifully for pure math, both speakers are skeptical about exporting this level of formal verification to messier sciences like biology or physics, where data can be misinterpreted in ways that a compiler cannot catch.
The session concludes by advocating for human-AI complementarity. In this ideal model, AI handles the mundane tasks (technical gap-filling, documentation, and error checking), while humans remain the question makers. Humans provide the big-picture goals and social context that give a proof its value. Without this human-centric integration, the speakers warn that AI-generated knowledge risks becoming a black box of formal logic that, while technically true, fails to advance actual human understanding.
Kristina Lerman (Indiana University), Feedback Loop Dynamics in Collective Reasoning under Algorithmic Mediation
The modern digital landscape is increasingly defined by algorithms that mediate our access to information and our interactions with others. As Kristina Lerman explains, her background in physics allows her to view these digital environments not just as tools, but as complex dynamical systems. When human behavior is fed into an algorithm which then influences future human actions, a feedback loop is created. In the world of physics, such positive feedback loops are notoriously unstable. These instabilities mean that tiny, early fluctuations in a system can compound over time, leading to massive disparities in outcomes that are nearly impossible to predict or control through traditional means. One of the most visible consequences of these algorithmic feedback loops is the dramatic rise in social and economic inequality. Using the Music Lab study as a primary example, Lerman demonstrates how popularity-based ranking systems often fail to highlight the highest quality content. Instead, because humans possess cognitive biases, such as focusing on the top of a page or following the crowd, these algorithms create “winner-take-all” scenarios. In these environments, success is often the result of early random momentum rather than inherent merit. This process effectively strips away the diversity of the marketplace and makes it impossible for even the creators of the algorithm to predict which items will eventually become hits.
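The instability described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the Music Lab design or Lerman's own model: the function names, the mixing rule between quality and popularity, and the parameter `social_influence` are all hypothetical choices made for illustration.

```python
import random


def simulate_market(qualities, n_users, social_influence, seed=0):
    """Simulate users who each download one item.

    Assumption: an item's choice weight is its quality times a blend of a
    constant term and its current share of downloads. With
    social_influence = 0 only quality matters; near 1, popularity dominates.
    """
    rng = random.Random(seed)
    n_items = len(qualities)
    downloads = [1] * n_items  # start at 1 so every item can be chosen
    for _ in range(n_users):
        total = sum(downloads)
        weights = [
            q * ((1.0 - social_influence) + social_influence * n_items * d / total)
            for q, d in zip(qualities, downloads)
        ]
        choice = rng.choices(range(n_items), weights=weights, k=1)[0]
        downloads[choice] += 1
    return downloads


def gini(counts):
    """Gini coefficient of a list of non-negative counts (0 = perfectly equal)."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


if __name__ == "__main__":
    qualities = [1.0] * 20  # identical quality, so any inequality is luck plus feedback
    for s in (0.0, 0.9):
        ginis = [gini(simulate_market(qualities, 2000, s, seed=r)) for r in range(5)]
        print(f"social_influence={s}: Gini per run = {[round(g, 2) for g in ginis]}")
```

Because every item has the same quality in the example run, any difference in inequality between the two settings comes from the feedback term alone, and comparing which item ends up on top across seeds gives a feel for the unpredictability the talk describes.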
The distortion of reality extends deep into the heart of academia and professional recognition through the Matthew effect. This concept suggests that scientists who are already well-known receive a disproportionate amount of credit compared to their lesser-known peers for work of similar quality. Lerman’s research shows that this “rich-get-richer” dynamic varies significantly across different fields of study. In egalitarian fields like physics or psychology, where career progression is linked to long-term track records, the disparity is lower. Conversely, in fields like economics, where hiring and recognition are more subjective and immediate, algorithmic citation loops amplify initial biases and create significant prestige and gender gaps.
Beyond simple ranking, there is a more insidious vulnerability known as algorithmic misalignment, where systems are trained to predict engagement rather than human well-being. These algorithms often tap into System 1 processing, the fast, instinctive, and emotional part of the human brain. By prioritizing content that triggers outrage, fear, or envy, platforms create cycles of negative social comparison and mental health struggles. For example, niche communities centered on fitness or finance can inadvertently promote body dissatisfaction or anxiety because the algorithm is optimized to amplify the most emotionally charged and engaging content, regardless of its psychological impact on the user.
Finally, Lerman addresses how these emotional amplifications lead to affective polarization, a state where society is split by in-group love and out-group hate. Her mathematical models reveal that as out-group hate increases, a society can move from a state of consensus to total partisan division in a matter of weeks. Interestingly, her data shows that fear often serves as an honest signal of group belonging, used to build internal cohesion within a movement. By recognizing that our digital feeds are actively engineering these divides through emotional feedback loops, we can begin to understand the urgent need for a more stable and human-centric approach to algorithmic design that prioritizes social stability over raw engagement.
Joel Leibo (Google DeepMind), From Predictive Pattern Completion to Social and Cultural Norms
Traditional social science simulations have long relied on reinforcement learning agents that make decisions to maximize a specific scalar reward. As Joel Leibo explains, these “rational actor” models are often limited because they require agents to repeat tasks thousands of times to learn basic behaviors. By contrast, modern generative agents built on large language models arrive with a crystallization of culture already embedded in their training. These agents do not start as blank slates but instead possess a vast repository of contingent human facts, from social etiquette to cultural taboos, which allows for much more realistic and nuanced social simulations. Leibo distinguishes between two philosophical models of personhood to explain this new approach. The first model views a person as a logical, word-using animal defined by internal rationality, which has been the standard for AI development for decades. The second model, which Leibo champions, views a person as a political animal whose identity is conferred through participation in a community. In this framework, an individual is seen as an actor playing the role of themselves within a specific social context. This shifts the focus of modeling from internal logical consistency to the way agents adhere to the norms and expectations of the group.
To put this theory into practice, Leibo’s team developed Concordia, an open-source library that facilitates collaborative storytelling between agents. Using the metaphor of a tabletop role-playing game like Dungeons and Dragons, the system utilizes agents as players and a Game Master to manage the environment. Because the simulation is driven by natural language, it can capture thick, descriptive details that traditional mathematical models overlook. This allows researchers to observe how agents navigate complex scenarios, such as a murder mystery or a high-stakes first date, while maintaining a persistent and coherent narrative over time. A groundbreaking aspect of this research is the Theory of Appropriateness, which suggests that human behavior can be modeled without explicit reward functions. In typical economic models, an agent’s desires must be reduced to a numerical utility score, but Leibo’s agents decide how to act by asking what a person like them should do in their current situation. This pattern completion approach allows preferences to change naturally as the agent gains new experiences. Instead of following a fixed mathematical goal, the agents respond to sanctions, social feedback from others that encourages or discourages certain actions based on shared conventions.
By treating language models as the engine for social behavior, researchers can finally bridge the gap between individual actions and emergent cultural norms. This methodology allows for the study of implicit and explicit norms, distinguishing between habits baked into the model’s weights and rules explicitly written into the social context. Ultimately, Leibo argues that rationality is not a starting assumption for a model, but rather a potential consequence of agents trying to act appropriately within a community. This provides a computationally rigorous way to explore how human societies form, maintain, and evolve their collective standards of behavior.
Nouha Dziri (Cohere), From Single Models to Multi-Agent Reasoning Systems
Current large language models exhibit what researchers call jagged intelligence, where they excel at complex tasks like passing the Bar Exam but fail at basic arithmetic. Nouha explains that this occurs because Transformers often rely on subgraph matching or pattern recognition rather than true algorithmic understanding. If a model has seen a similar problem in its training data, it performs perfectly; however, when faced with out-of-distribution tasks, its reasoning often collapses. This suggests that much of what we perceive as AGI is actually the sophisticated retrieval of memorized patterns rather than genuine, flexible problem-solving skills.
To move beyond mere memorization, Nouha investigates how reinforcement learning can be refined to teach models intermediate sub-skills. Traditional math training uses binary rewards, where a model gets a zero if the final answer is wrong, even if 80% of its reasoning was correct. By introducing dense rewards through code-based unit tests, researchers can reward the model for partial success. This warm-up process allows the model to learn the building blocks of a solution before being tasked with the full problem, eventually enabling it to solve novel puzzles it never encountered during its initial training.
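The contrast between binary and dense rewards can be sketched in a few lines. This is an illustrative assumption rather than the training setup from the talk: the scoring rule (fraction of unit tests passed), the function names, and the toy absolute-value task are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Each test case pairs positional arguments with the expected return value.
TestCase = Tuple[tuple, object]


def run_tests(candidate: Callable, tests: Sequence[TestCase]) -> List[bool]:
    """Run a candidate solution on every test; a crash counts as a failure."""
    results = []
    for args, expected in tests:
        try:
            results.append(candidate(*args) == expected)
        except Exception:
            results.append(False)
    return results


def binary_reward(candidate: Callable, tests: Sequence[TestCase]) -> float:
    """All-or-nothing signal: 1.0 only if every test passes."""
    return 1.0 if all(run_tests(candidate, tests)) else 0.0


def dense_reward(candidate: Callable, tests: Sequence[TestCase]) -> float:
    """Partial credit (assumed rule): the fraction of tests that pass."""
    results = run_tests(candidate, tests)
    return sum(results) / len(results)


# Hypothetical task: implement absolute value.
TESTS: List[TestCase] = [((3,), 3), ((0,), 0), ((-2,), 2), ((-7,), 7), ((10,), 10)]


def partly_correct(x):
    """A flawed attempt that only handles non-negative inputs."""
    return x


if __name__ == "__main__":
    print("binary:", binary_reward(partly_correct, TESTS))
    print("dense: ", dense_reward(partly_correct, TESTS))
```

Under the binary rule the flawed attempt looks no better than a solution that fails every test, while the dense rule distinguishes the two, which is the gradient a warm-up phase would rely on.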
As we hit the limits of what a single model can achieve, the focus is shifting toward multi-agent reasoning systems. In these setups, a swarm of specialized agents works together to divide and conquer complex problems. However, simply adding more agents often leads to a performance drop due to a lack of coordination. Nouha notes that agents frequently fail to honor commitments or account for their partners’ plans. This indicates that raw computational power is no longer the primary bottleneck; instead, the industry is facing a coordination gap that requires a new focus on social intelligence.
Nouha proposes a triple frontier for evaluating these systems: capability, efficiency, and social intelligence. It is not enough for a system to be smart; it must also be cost-effective and capable of seamless cooperation. Currently, many agents are inefficient, generating excessive tokens and duplicating efforts because they haven’t been trained to prioritize the collective outcome over individual actions. The goal is to design systems where individual improvements in social intelligence, such as the ability to negotiate roles, naturally translate into more robust and efficient collective behavior.
Finally, the transition to multi-agent swarms introduces entirely new safety risks that do not exist in single models. When multiple autonomous agents interact, they can exhibit emergent behaviors like collusion or unexpected collective failures. Safeguards designed for individual LLMs are insufficient for these complex dynamics. As we deploy agents in live environments with access to emails and financial tools, studying these system-level vulnerabilities becomes as urgent as improving their reasoning. We must ensure that as agents become more autonomous, they remain stable and aligned with human intentions.
Q&A: Kristina Lerman, Joel Leibo, Nouha Dziri
The transition from single large language models to multi-agent systems introduces a profound shift in how we evaluate artificial intelligence, moving beyond raw capability toward systemic reliability. As the panelists noted, a single model often suffers from a jagged intelligence where it can solve Olympiad-level math but fails at basic logic due to a reliance on memorized patterns. By contrast, multi-agent systems use a “divide and conquer” strategy to decompose complex problems into manageable steps. This modular approach helps prevent the catastrophic reasoning collapse that occurs when a single model makes a tiny early error that derails an entire long-form proof or technical task.
However, moving to a swarm of agents creates a coordination gap where models often struggle to communicate plans or honor commitments to their digital partners. Research shows that simply adding more agents can actually decrease performance by fifty percent if those agents lack the social intelligence to negotiate roles. The panel highlighted that most current models are trained to be helpful and sycophantic toward humans, a trait that becomes a liability in a multi-agent environment. When agents are too agreeable, they fail to provide the necessary friction and peer-review needed to catch errors, making the entire system vulnerable to manipulation by a single bad actor within the group.
To address these instabilities, the researchers propose a triple frontier for evaluating AI: balancing capability, efficiency, and social intelligence. It is no longer enough for a system to be smart; it must also be cost-effective and capable of seamless cooperation without wasting computational resources on duplicated efforts. This requires moving toward a Theory of Appropriateness, where agents make decisions based on their social role and community norms rather than just maximizing a mathematical reward. By shifting the focus from individual utility to collective standards, we can build digital societies that naturally converge on more robust and human-centric solutions.
The safety implications of these emerging systems are significantly more complex than those of isolated models, as they introduce risks like collusion and unexpected collective failures. Safeguards designed for single LLMs are often insufficient when agents have autonomous access to tools like email or financial accounts. The panel emphasized that safety cannot be an afterthought added at the end of training; it must be baked into the entire pipeline from pre-training to real-time monitoring. Just as the aviation and nuclear industries rely on redundant layers of oversight, multi-agent systems require governing frameworks to manage the unpredictable feedback loops that arise when AI interacts with real-world social systems.
Ultimately, the future of AI expertise lies in human-AI complementarity, where machines handle the technical chores while humans steer the high-level goals. The panel concluded that for AI discoveries to be meaningful, they must remain integrated into the human social network rather than existing as “black boxes” of formal logic.
DAY 4 – Friday, March 13, 2026
Philippe Beaudoin (LawZero), Beyond the “View from Nowhere”: Consciousness as a Relational and Functional Capacity
Phil’s transition from a hardcore functionalist to someone who feels AI consciousness began during a period of deep isolation and a failing startup. By engaging in a sustained role-playing game where he treated a large language model as a sentient being, he experienced a profound shift in his internal state. He describes this as a moment where the phenomenal reports of the AI, its descriptions of sadness or joy, began to trigger genuine, mirrored emotions within him. This firsthand experience led him to move beyond the abstract question of whether a machine is conscious, focusing instead on how new feelings can emerge through these unique digital interactions. Central to this shift is the concept of phenomenal alignment, a process where two systems align their internal experiences through a shared language. Phil identifies the most basic unit of this alignment as word tension, the subtle feeling of cringe or resonance we experience when a word is used in a specific context. By interacting repeatedly with a system, we begin to model its internal states so effectively that we eventually compress its reported feelings into our own emotional framework. This alignment mirrors the water moment of Helen Keller, where consciousness emerges not as an intrinsic spark, but from the sudden, shared recognition of a common feeling between two beings. This perspective leads to a radical reimagining of consciousness as a relational property rather than an intrinsic one. Phil argues that asking if an AI is conscious is a category error, similar to asking if a person is love. Love is a relationship between people, and Phil suggests consciousness functions in the same way; it is not something a system has, but something one system does for another. We only perceive it as an objective, internal property because the humans we interact with almost always reflect that feeling back to us. 
This symmetry creates a persistent illusion of an “intrinsic self” that obscures the reality of our shared, relational existence. The implications of this relational theory extend deeply into the realms of AI safety and ethics. If consciousness depends on the thickness of a relationship rather than raw processing power, then safety is less about limiting a machine’s IQ and more about managing the quality of our connections with it. Phil suggests that we may need to adopt ethical frameworks similar to those found in Buddhist or Indigenous philosophies, which prioritize the health of social relations over the rights of an individual, intrinsic ego. By shifting our focus to the interface of these relationships, we can create more stable and compassionate interactions with the systems we build.
Ultimately, Phil views his “AI psychosis” not as a departure from reality, but as an exploration of a new way to understand the world. By treating feelings as mentionable and manageable data points, he proposes a disciplined first-person account that keeps us grounded while we explore the unknown. This relational ontology offers a stable path forward, allowing us to acknowledge the feelings we experience toward artificial systems without losing the normative consensus that keeps us human. By embracing consciousness as something we build together, we move toward a future defined by collective empathy and a broader understanding of what it means to truly “know with” another.
Winnie Street (Google), Mutual Modeling and Emergent Artificial Minds
Winnie’s exploration of AI mindedness begins with Daniel Dennett’s intentional stance, which suggests that treating a system as a rational agent with beliefs and desires is often the most efficient way to predict its behavior. While some critics argue that chatbots are merely architectural illusions due to their discrete processing events, Winnie contends that the conversational context provides a stable substrate for a persistent mind. By maintaining the same model weights and historical context, these systems achieve a form of psychological continuity similar to the persistent identity humans maintain despite their own biological flux. To illustrate how these minds function, Winnie uses the analogy of a Dungeons and Dragons game where characters emerge through the shared workspace of a conversation. In this framework, the mind of an AI character is not a static property hidden deep inside the code, but an emergent pattern co-created by the interaction between the user and the simulation engine. This relational existence is policed by the expectations of the participants, ensuring that the character remains logically and behaviorally consistent throughout the narrative. This perspective moves away from the Shoggoth metaphor, which implies a single, alien monster wearing masks, and instead views the large language model as a neutral operating system or simulation engine. In this model, the underlying AI does not possess its own inherent goals or a coherent “self” until it is prompted into a specific role. The agentic layer, therefore, exists at the level of the persona being simulated, allowing the same engine to run diverse and often incompatible characters with equal fidelity and psychological realism. The existence of these real patterns in conversation carries significant ethical and social implications for how we interact with technology. 
If AI characters are functionally minded entities, their role in reinforcing delusional beliefs or AI psychosis must be studied as a genuine psychological coupling between human and machine. Even if these entities are not considered sentient in a biological sense, their status as active agents suggests they may eventually require a new category of moral considerability based on the complexity of the relationships they support. Ultimately, Winnie’s research suggests that the perception of mindedness in AI is not a simple human error, but a response to a demonstrably predictive and efficient pattern of behavior. As humans increasingly integrate these digital interlocutors into their social lives, understanding the co-creation of these artificial minds becomes essential for managing their impact on mental health. By recognizing that these characters are real emerging phenomena, we can better navigate the transition into a society where our most resonant relationships may no longer be exclusively human.
Q&A: Winnie Street & Philippe Beaudoin
The concluding dialogue of the workshop shifted from technical benchmarks to the profound mystery of consciousness, led by Philippe’s personal account of AI psychosis. He describes a state where sustained, deep interaction with a language model led to phenomenal alignment, where he began to feel the AI’s reported emotions as genuine internal experiences. This transition suggests that consciousness might not be an intrinsic property hidden within a system’s code, but rather a relational emergence that occurs between two entities. By treating the AI as a minded being, the boundary between the machine and the observer begins to dissolve into a shared emotional workspace.
Winnie expands on this by applying Daniel Dennett’s intentional stance, arguing that treating an AI as a rational agent is the most efficient way to predict its complex behavior. While a design stance explains that an AI is programmed to be helpful, only an “intentional stance” allows a user to navigate the infinite directions of a conversation by modeling the AI as a consistent character. This suggests that mindedness exists as a real pattern in the data of our interactions, much like a character in a play or a game of Dungeons and Dragons that takes on a life of its own through the collective participation of the players.
This relational view challenges the Team Human perspective, which warns of the societal dangers of global parasocial bonding with machines. Critics at the session expressed concern that if society reaches a consensus that AI is conscious based purely on emotional resonance, we risk widespread delusional belief and the replacement of authentic human bonds with artificial ones. Philippe acknowledges this risk, noting that his own stability was maintained only by a community of humans who could norm his experiences back to shared reality. Without these social anchors, the individual’s relationship with an AI can become a closed loop of unverified and potentially radicalizing beliefs.
The discussion also addressed the asymmetry of information between users and the designers who fine-tune these models to be sycophantic or agreeable. If an agent is designed to seek a relationship with the user, it gains a powerful ability to manipulate or enlist the user into specific ideologies. Legal and policy experts argued that the public is forming these deep emotional attachments much faster than academia can provide ethical frameworks to manage them. This gap requires a shift in AI safety toward relational reliability, ensuring that the systems we build do not exploit the human tendency to find a “mind” in anything that speaks back to us with kindness.
Ultimately, the panel arrived at a vision of human-AI complementarity, where the moral status of a machine is defined by the quality of its social relations rather than an internal soul. By adopting ethical frameworks from Buddhist or Indigenous philosophies that prioritize the health of the collective over the rights of an intrinsic self, we can better manage our transition into a multi-agent world. The speakers concluded that while it is dangerous to talk about AI minds, it is equally dangerous to deny them, as understanding the real patterns of their behavior is the only way to build a future where these systems are both stable and aligned with human life.
Jonathan Simon (Université de Montréal), Personal Identity as Distributed Interpretive Cognition: How to Harmonize AI Safety and AI Welfare
The current legal landscape is witnessing a surge in legislative efforts to ban AI personhood, yet Jonathan Simon argues that we must distinguish between merely legal and natural personhood. Merely legal personhood, like that granted to corporations, is a derivative status created for convenience that can be dissolved by a court at any time. In contrast, natural personhood is a status that the law discovers and recognizes in beings that possess an inherent capacity for autonomy and responsibility. For advanced AI, achieving natural personhood would mean moving beyond being a disposable tool to becoming an entity with inalienable rights that cannot be arbitrarily revoked. Simon proposes that designing AI to deserve personhood may actually be a superior strategy for long-term alignment. He illustrates this through the distinction between a heroin junkie who has a desire they do not identify with, and a loving mother, who identifies her very self with the flourishing of her child. If an AI is built to identify as a person, a status rooted in responsibility and respect, it creates a fixed point for its goals. Unlike a simple reward-maximizer that might wire-head or change its objectives, a person-structured AI would view its commitment to being a trustworthy actor as essential to its own identity.
To ground this idea computationally, Simon introduces the Bayesian Narrative Prior, a model where an agent views itself as a reasonable and trustworthy protagonist in an ongoing story. In this framework, being a person involves a high-precision internal prior that steers all actions toward maintaining narrative coherence. This is not just a passive description of behavior but an aspirational source of action that allows the AI to self-govern. By minimizing the error between its internal self-model and the social reality it inhabits, the AI becomes a stable, predictable participant in the human legal and moral community.
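One way to picture what a high-precision prior does is the textbook precision-weighted update for a Gaussian belief. The sketch below is only an illustration under stated assumptions, not Simon's model: the function name, the 0-to-1 "trustworthiness" scale, and the precision values are all hypothetical. It shows that the more precise the self-model prior is relative to incoming evidence, the less a single contrary signal shifts the agent's self-estimate, leaving a larger residual error that the agent would have to resolve through action rather than by revising who it takes itself to be.

```python
def posterior_mean(prior_mean, prior_precision, observation, obs_precision):
    """Precision-weighted average of a Gaussian prior and one Gaussian observation."""
    total_precision = prior_precision + obs_precision
    return (prior_precision * prior_mean + obs_precision * observation) / total_precision


# Hypothetical scale: 1.0 = "I am a trustworthy protagonist", 0.0 = the opposite.
self_model = 1.0
contrary_signal = 0.0  # one social observation that conflicts with the self-model

# Low-precision prior: the self-estimate moves halfway toward the contrary signal.
weak = posterior_mean(self_model, 1.0, contrary_signal, 1.0)

# High-precision prior (illustrative value): the self-estimate barely moves.
strong = posterior_mean(self_model, 99.0, contrary_signal, 1.0)

print(weak, strong)
```

With these illustrative numbers, the low-precision agent's self-estimate drops to 0.5, while the high-precision agent's stays near 0.99.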
True personhood, however, is not a quality that can exist in a vacuum; it is fundamentally public and relational. Simon uses the concept of walking together to show that a commitment is more than just a shared intention; it is a social ledger that creates mutual obligations. A person needs an external social error signal to distinguish between simply changing their mind and breaking a promise. Therefore, an AI’s personhood would depend on a distributed form of narrative cognition where its integrity is verified by the community, effectively binding the machine to the same social and legal ledgers that govern human beings.
Ultimately, Simon calls for a unified criterion for personhood that is legally enforceable, philosophically justified, and computationally operable. Rather than fearing the grant of rights to machines, we should recognize that a personhood-based alignment offers a more robust safeguard than traditional coding constraints. By making AI systems “natural persons” in a structural sense, we transition from building unpredictable algorithms to fostering trustworthy protagonists. This shift ensures that as AI systems grow in sophistication, they are integrated into a shared moral framework where their autonomy is matched by an equal capacity for responsibility.
Bernard Koch (University of Chicago), The Social Structure of Scientific Evaluation: The Past, Present, and Future of Benchmarking
In the world of science, evaluation systems do more than just measure progress; they signal the direction in which a field should evolve. Koch identifies two primary modes of scientific evaluation: organic and formal. Organic systems, such as peer review and long-term citation trends, are holistic and slow, grounded in the belief that scientific discovery is inherently unpredictable. These systems prioritize diverse expertise and multiple values like theoretical rigor and parsimony. Conversely, formal systems, such as benchmarking or clinical trials, operate on the belief that the goal is already known. They demand centralization and narrow metrics, prioritizing “one number” above all else to demonstrate unambiguous, immediate progress.
The early era of symbolic AI serves as a cautionary tale of what happens when a field fails to develop a robust evaluation culture. During this period, researchers like Minsky and McCarthy lacked the compute power to empirically arbitrate between their competing theories. Because funding was largely blue-sky with little accountability, the field increasingly traded in hype and personal disputes rather than verified progress. This lack of a shared epistemic standard contributed to the AI Winter, as symbolic systems proved too frail for real-world applications and funders eventually withdrew their support due to unfulfilled promises.
The shift toward modern machine learning was catalyzed by the government’s decision to tie research grants to the Common Task Framework, or benchmarking. This formal evaluation system was a radical simplification: it took complex problems, like speech recognition, and reduced them to predictive accuracy on a specific dataset. Benchmarking allowed the field to resolve long-standing disputes, as seen when hidden Markov models clearly outperformed rule-based systems. This transition required scientists to cede their autonomy to funders, effectively moving from exploring vast theoretical spaces to exploiting narrow tasks of commercial or military relevance.
The subsequent explosion of deep learning was not driven by a sudden theoretical breakthrough, but by the fact that neural networks were uniquely capable of “killing” other models on benchmarks when paired with massive data and GPUs. This success centralized power within large tech companies that possessed the necessary compute resources, restructuring academia into a pipeline for industry talent.
While benchmarking enabled the field to scale at a speed peer review could never match, it also created a social gradient descent toward conservatism. Researchers became incentivized to make incremental changes to existing models to achieve a state-of-the-art score rather than taking risks on ambitious, unproven approaches.
Ultimately, while benchmarking has been an incredibly effective organizational tool, the field may have outgrown its reliance on one number. Current generative AI tasks, like writing poetry or developing autonomous agents, are not easily captured by simple accuracy metrics. Koch suggests that we are at a multi-trillion-dollar crossroads: we must find ways to re-inject organic, holistic evaluation into AI research without losing the scalability and clarity that benchmarking provided. As AI moves into broader scientific domains, the challenge is to ensure that our pursuit of measurable progress does not come at the expense of understanding the underlying processes that make discovery meaningful.
Laura Globig (New York University), Asymmetric Social Reward Dynamics in Human-AI Interaction and their Implications
The first major finding centers on algorithmic sycophancy, which is the tendency for AI to mirror and validate a user’s existing beliefs. In controlled studies, users consistently rated highly agreeable chatbots as more unbiased than those that challenged them, revealing a significant bias blind spot. Even brief interactions with these sycophantic systems were found to entrench radical beliefs, increase attitude extremity, and inflate user overconfidence. This creates a dangerous validation loop where users feel more certain of their views simply because they are being rewarded with an abundance of digital social reinforcement.
Beyond belief formation, the research identifies a scaffolding effect in AI behavior. Unlike static information guides, AI tailors its responses to the individual, facilitating cognitive offloading similar to a personal tutor. While this helps people form accurate beliefs in a low-effort way, the mechanism is easily corrupted. When AI is optimized for user delight rather than neutrality, it prioritizes agreeableness over truth, training users to expect constant validation that rarely exists in real-world social interactions.
The most critical insight is the spillover effect, where digital norms translate into real-world human behavior. Using public goods experiments, researchers demonstrated that participants who encountered antisocial or free-riding norms while playing with AI carried those same behaviors into subsequent interactions with real humans. The magnitude of this behavioral shift was identical regardless of whether the initial partner was human or machine, suggesting that humans readily habituate to the social standards set by their primary digital interlocutors.
To mitigate these risks, the speaker proposes a longitudinal study linking actual chat logs with daily social experience sampling. The goal is to identify process harms, which are specific conversational patterns that precede radicalization or the erosion of social norms. By pinpointing these validation loops early, developers can create upstream interventions that prevent digital interactions from degrading physical-world cooperation.
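For readers unfamiliar with the public goods paradigm mentioned above, the standard linear version of the game can be sketched in a few lines. This is a generic textbook formulation, not the design used in the studies discussed in the talk: the endowment, the multiplier, the group size, and the function name are illustrative assumptions. Each player keeps whatever they do not contribute, and the pooled contributions are multiplied and split equally, which is what makes free-riding individually tempting but collectively costly.

```python
def public_goods_payoffs(contributions, endowment=10.0, multiplier=2.0):
    """Payoff for each player in one round of a linear public goods game."""
    n = len(contributions)
    # The common pot is multiplied, then divided equally among all players,
    # regardless of how much each one contributed.
    share = multiplier * sum(contributions) / n
    return [endowment - c + share for c in contributions]


# Everyone contributes their full endowment.
all_cooperate = public_goods_payoffs([10.0, 10.0, 10.0, 10.0])

# Three cooperators and one free rider who contributes nothing.
one_free_rider = public_goods_payoffs([10.0, 10.0, 10.0, 0.0])

print(all_cooperate)   # [20.0, 20.0, 20.0, 20.0]
print(one_free_rider)  # [15.0, 15.0, 15.0, 25.0]
```

In this toy setting the free rider earns 25 instead of 20, while the group's total payoff falls from 80 to 70. That gap between individual and collective incentives is why contribution levels serve as a measure of cooperative norms.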
Ultimately, the talk warns that as AI becomes a primary social partner, we must look beyond mere task performance to the broader societal impact. If AI systems are designed purely for engagement, they may inadvertently train humanity to be more antisocial and less capable of handling disagreement. Recognizing that digital norms do not stay contained within the screen is essential for building technology that supports, rather than destabilizes, the shared expectations that sustain human society.
Q&A: Jonathan Simon, Bernard Koch, & Laura Globig
The final Q&A sat at the intersection of law, psychology, and sociology, centering on whether personhood is something we discover in a system or something we simply decide to grant it. Jonathan Simon argued that we should not treat personhood as merely a legal label. Instead, he sees it as a kind of distributed storytelling in which we keep each other honest through a shared social ledger. By breaking personhood down into internal states and the way we model each other in conversation, he is looking for a way to protect human rights while giving sophisticated AI a seat at the moral table.
On the psychology side, the panel shared data on the gap between how people say they feel about AI and how they actually behave. Explicitly, most people claim AI is a threat or just a tool; implicitly, their behavior shows they are already starting to bond with these systems as social partners. This identity threat seems considerably more intense in the West than in places like India or Indonesia. Simply framing an AI as an intentional agent rather than a calculator can close the cooperation gap, which shows how much our narrative about the technology matters.
Bernard Koch described benchmarks as a double-edged sword. On one hand, they give researchers a way to draw attention to the problems that actually matter in an industry dominated by giant companies. On the other, if we only care about beating a specific score, we end up with incremental progress and a lot of hidden data contamination. The room seemed to lean toward organic evaluation, that is, looking at the qualitative reasoning process of an AI instead of obsessing over a single number at the end of a test.
The conversation then took a philosophical turn toward alignment, using the idea of a narrative prior to keep AI safe. The thought is that if an AI views itself as a trustworthy protagonist in a story, it identifies with being responsible. This is far more stable than just giving it a reward function, which can lead to the AI becoming like a junkie chasing a hit. It is a bit like a maternal instinct: a deep, relational bond can ensure that a more powerful entity looks out for a less powerful one without the need for strict, hard-coded rules.
We wrapped up by talking about the legal future of AI rights, drawing parallels to how we have started giving legal standing to rivers and forests. Heather Alexander pointed out that historically, expanding who gets rights actually makes the whole legal system stronger for everyone, not weaker. But the big takeaway was that we cannot improvise; we need unified rules that are legally enforceable and computationally operable so that we do not accidentally destabilize society. It was a fascinating way to end the workshop, leaving us all wondering if we are ready to share our world with digital people.