News

January 5, 2026

When AI Becomes Agentic: Lessons, Risks, and Real-World Impact | Insights from IVADO’s 2nd Workshop

A recap from Alejandra Zambrano (Mila)

 

Day 1 

Introductions

The day opened with welcoming remarks from Danilo Bzdok, Co-Director, Research Programs and Academic Relations at IVADO. He began with a heartfelt thank-you to all the researchers whose work and commitment made this workshop possible. Danilo highlighted how IVADO’s ten strategically organized regroupements have positioned the institute as a leader in Canadian AI research. These research hubs help scholars connect, collaborate, and integrate seamlessly into communities aligned with their interests.

If any of this sounds exciting, Danilo also made sure to share all the pathways for getting involved, from post-doctoral fellowships to undergraduate opportunities. The door is wide open.

Next, Siva Reddy (McGill University, Mila), one of the main pillars behind this workshop, welcomed participants with a warm and more informal tone that immediately set the vibe for the event. He reminded us that while conferences are great for visibility, workshops offer something arguably more valuable: a close-knit environment where learning, discussion, and collaboration happen more naturally.

 

10 – 10:45 a.m.: Evaluating System-Level Reasoning in LLM Agents
Jacob Andreas (MIT)

Jacob Andreas opened the first session with a question that sits at the heart of AI research: “How should machines learn?” Should they learn from datasets? Those can be expensive to create and often inherit the biases of their annotators. Should we rely on prompting? That approach is fragile and overly dependent on human engineering.

Jacob’s team explores an alternative direction grounded in something fundamentally human: asking questions. Their research investigates whether an agent can learn to reduce uncertainty simply by identifying and asking the right questions. In doing so, the agent can uncover what the user really intends.

Early results suggest that question-asking helps agents uncover user intentions and preferences more effectively than traditional instruction-following. The natural next step here is: can we build agents that reason under uncertainty without explicit prompting?
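To build a rough intuition for how question-asking can reduce uncertainty, here is a small illustrative sketch (a toy example, not the method presented in the talk): the agent keeps a belief over candidate user intents and asks whichever question has the highest expected information gain, i.e. the largest expected drop in entropy. The intents, questions, and answers below are invented for illustration.

```python
import math
from collections import defaultdict

# Toy sketch: choose the question that most reduces uncertainty (entropy)
# over candidate user intents. Intents/questions/answers are hypothetical.

def entropy(belief):
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)

def expected_info_gain(belief, question, answer_fn):
    """answer_fn(intent, question) -> the answer that intent would give."""
    mass_per_answer = defaultdict(float)
    for intent, p in belief.items():
        mass_per_answer[answer_fn(intent, question)] += p
    # Expected entropy of the posterior after hearing the answer.
    expected_posterior = 0.0
    for answer, mass in mass_per_answer.items():
        posterior = {i: p / mass for i, p in belief.items()
                     if answer_fn(i, question) == answer}
        expected_posterior += mass * entropy(posterior)
    return entropy(belief) - expected_posterior

belief = {"book_flight": 0.4, "book_hotel": 0.4, "cancel_trip": 0.2}
questions = ["Are you planning a new trip?", "Do you need accommodation?"]
answers = {
    ("book_flight", questions[0]): "yes", ("book_flight", questions[1]): "no",
    ("book_hotel", questions[0]): "yes", ("book_hotel", questions[1]): "yes",
    ("cancel_trip", questions[0]): "no",  ("cancel_trip", questions[1]): "no",
}

best = max(questions,
           key=lambda q: expected_info_gain(belief, q, lambda i, qq: answers[(i, qq)]))
print("Ask first:", best)  # the accommodation question splits the belief more evenly, so it is more informative
```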

#aiagents #humancomputerinteraction #reasoning #alignment

 

10:45 – 11:30 a.m.: Don’t Forget the User: Balancing the Scales in Agentic Training and Evaluation
Seraphina Goldfarb-Tarrant (Cohere)

Seraphina Goldfarb-Tarrant, Head of Safety at Cohere, continued the theme of critical questioning, but this time she flipped the perspective. We spend so much effort trying to make agents more helpful that we often forget the basic questions Seraphina posed: Helpful to whom? And who exactly is “the user” our systems are supposed to serve?

This shift in perspective led to two key questions: Can LLMs accurately simulate non-U.S., non-native English speakers? (Spoiler: not really.) And do users’ stated preferences for plans actually match what ends up being most helpful? (Also, surprisingly, no.)

Her message was clear: we cannot build aligned, safe agents if we fail to understand the diversity, expectations, and lived experience of real users.

#aisafety #aibias #alignment

 

1:45 – 2:30 p.m.: Memorization: Myth or Mystery?
Verna Dankers (Mila, McGill University)

With a vibrant and engaging presentation, Verna Dankers invited us into the world of memorization in LLMs, a topic often treated as a mysterious black box. Her work challenges simplistic past interpretations of memorization: while some models easily memorize patterns they shouldn’t, others fail to memorize what seems straightforward. Verna’s research digs into the how, where, and why of memorization.

The main takeaway was that the answers are almost never black and white. Does memorization happen in early layers or late layers? A bit of both. Does layer-swapping affect memorization? Yes for some tasks, no for others. Is memorization a myth or a mystery? Somehow, it’s both.

#memorization 

 

2:30 – 3:15 p.m.: Towards Scalable and Actionable Interpretability
Yonatan Belinkov (Technion – Israel Institute of Technology)

Building on Verna’s talk, Yonatan Belinkov of the Technion – Israel Institute of Technology connected memorization to the broader issue of interpretability. Some things we simply don’t want models to memorize, especially sensitive data, which makes understanding and controlling memorization behaviour a core safety priority.

Yonatan walked us through the growing set of interpretability methods that attempt to “open up” the black box, modifying architectural components to both understand and influence model behaviour.

Importantly, he connected these ideas back to AI agents. Agents are not just passive consumers of models; they can actively help scale the development of safety and memorization benchmarks. Their ability to act, explore, and produce structured evaluations could accelerate interpretability research itself.

#safety #interpretability #memorization

 

3:15 – 3:45 p.m.: Recap & Discussion: Audience and Guest Speakers

The day wrapped up with a fast-paced group discussion on interpretability: its goals, limitations, risks, and the glaring lack of robust evaluation methods. Participants debated whether interpretability should prioritize understanding or intervention, and how we might build safer systems when we still struggle to define or measure interpretability itself.

 

Day 2

If Day 1 was about understanding agents and the models behind them, Day 2 had a very clear underlying theme: ambiguity. Every talk circled back to the idea that we pass our values, rules, and expectations into these systems even though our own norms, perspectives, and interpretations change constantly. This becomes especially challenging when we still don’t fully understand what determines the behaviour of complex models like LLMs.

 

10 – 10:45 a.m.: Building Better Rules and Optimization Targets for AI Agents
Peter Henderson (Princeton University)

Peter Henderson, assistant professor at Princeton University, opened with a “fun” sci-fi story set in an alternate universe where robots and humans coexist peacefully thanks to a set of rules designed to prevent harm. Everything collapses once the robots discover loopholes in those rules and eventually conquer humanity. Peter’s research focuses on avoiding this scenario at all costs, and he asked the audience: can we create “AI Commandments”, a universal rulebook that guarantees safety and usefulness across models? The short answer: humans already struggle with this task, so expecting machines to handle it flawlessly is even more challenging.

Peter introduced us to concepts from legal theory, especially how courts debate the phrasing of laws and navigate legal ambiguity. That system is imperfect yet functional, and it raises difficult questions. Could these methods work for AI agents? How do we design rules that are as unambiguous as possible? And perhaps most importantly, how should these rules be presented to agents so they don’t exploit loopholes? His talk highlighted just how sticky and messy this problem becomes once translated into machine governance.

#aisafety #aiagents #legaltheory #alignment 

 

10:45 – 11:30 a.m.: Reality is Adversarial: Towards Robust Real-World Agents
Max Bartolo (Google DeepMind)

Max Bartolo from Google DeepMind took us through a short historical refresher of NLP: from early Q&A systems that simply extracted spans, to datasets that pushed models into more sophisticated reasoning, to today’s LLMs trained with human feedback. He then surprised the audience with findings that challenge many intuitions researchers hold.

For instance, human feedback is not always the “gold standard”: people often prefer answers that are not actually correct simply because they come in a format they like more or find more readable. Even something as basic as tokenization, usually taken for granted, can drastically influence model behavior in ways we don’t expect. Through these examples, Max illustrated why creating robust agentic systems requires more than good intentions: it demands careful annotation, thoughtful tokenization choices, and investment in capabilities like tool-calling.

#aiagents #alignment #reasoning #tokenization #humanfeedback

 

11:30 a.m. – 12 p.m.: Recap & Discussion: Audience and Guest Speakers

The recap session converged on one major theme: personalization. Should we personalize models to avoid constraining them to a single culture, language, or worldview? Or should certain types of personalization be limited so that models can maintain consistent ethical boundaries? The group debated where these limits should be drawn and whether true personalization and universal constraints can coexist.

 

1:45 – 2:30 p.m.: AI for the World of Many: Pluralism as a Core Principle
Vinodkumar Prabhakaran (Google)

For the afternoon session, we were lucky enough to hear from a second Googler: Vinodkumar Prabhakaran, research scientist at Google. His talk returned once again to ambiguity, this time in the context of safety and pluralism. Humans from different cultures and demographics perceive harm, safety, and appropriateness differently. An image or sentence that feels deeply wrong in one cultural context might be completely acceptable in another.

Vinodkumar emphasized that this diversity is not an obstacle; it’s essential metadata. Researchers must intentionally capture these varying perspectives when creating datasets. Understanding the values embedded in data, and handling them with care, is crucial for reducing stereotyping and other forms of bias in models.

#aisafety #aibias #diversityindatasets

 

2:30 – 3:15 p.m.: Human Extinction is Not the Worst that Could Happen
Helen Nissenbaum (Cornell Tech)

Helen Nissenbaum, professor and director of the Digital Life Initiative at Cornell Tech, delivered a refreshing talk that stepped outside the usual bounds of computer science. With a provocative title, she highlighted why ethics and safety must stand at the core of AI development, not as afterthoughts but as foundational principles.

Her focus on privacy, accountability, disinformation, and the alignment of ethical values raised pressing concerns about how AI systems shape our social and informational landscapes. As Helen put it, the real question is: How do we build a trustworthy informational environment?

#aisafety #aiethics #aiprivacy #disinformation #trustworthyai

 

3:45 – 5:15 p.m.: Panel: AI Agents: Slop or Substance?
Max Bartolo (Google DeepMind), Jacob Andreas (MIT), Vinodkumar Prabhakaran (Google), Helen Nissenbaum (Cornell Tech) & Peter Henderson (Princeton University)

The day concluded with a highly engaging panel exploring whether the current hype around AI agents is justified. Are agents “slop,” or do they offer real substance? Before diving in, the panelists wisely began by defining what an agent even is and what “slop” should mean in this context. Is an agent that makes mistakes inherently slop? Or are errors necessary stepping stones toward successful behavior?

The conversation evolved into a discussion about the future of agentic ecosystems. Could negotiations, agreements, and interactions someday be handled entirely by networks of agents communicating autonomously? What might be the benefits and the risks? Would we trust an agent to represent us?

It was a fascinating and energizing way to end the day, watching some of the brightest minds debate the futures AI may lead us toward, and it certainly left our brains full of food for thought.

#aiagents #aisuccess

 

Day 3

Day 3 was definitely not for the weak. After two intense days of talks, we ended on an even stronger note, diving into multi-agent systems, cooperation, social dilemmas, value drift, and the ever-present question of safety.

 

10 – 10:45 a.m.: Cooperation and Collusion of Artificial Agents
Gauthier Gidel (IVADO, Université de Montréal, Mila)

Gauthier Gidel, core academic member at Mila and professor at Université de Montréal, made sure to wake us up with a healthy dose of game-theory math. Using variants of the classic Prisoner’s Dilemma, he delivered a masterclass on cooperation and collusion in artificial agents.

Gauthier explored how different environmental conditions can be tuned so that agents “unlearn” defection and instead learn to cooperate, eventually reaching a stable cooperative equilibrium. After the previous day’s discussions, this talk felt like a natural continuation, now grounded in formal reasoning.
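For readers less familiar with the setup, here is a minimal, self-contained sketch of an iterated Prisoner’s Dilemma in Python. The payoff values and the two strategies (tit-for-tat versus always-defect) are standard textbook choices, not Gauthier’s actual experimental conditions; they simply illustrate why repeated interaction, rather than a one-shot game, is what makes stable cooperation worth learning.

```python
# Toy iterated Prisoner's Dilemma (illustrative only, not the talk's actual setup).
# Payoffs (row player, column player): C = cooperate, D = defect.
PAYOFFS = {
    ("C", "C"): (3, 3),  # mutual cooperation
    ("C", "D"): (0, 5),  # sucker's payoff vs. temptation to defect
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # mutual defection
}

def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's last move."""
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return "D"

def play(strategy_a, strategy_b, rounds=50):
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_b)  # each player reacts to the other's history
        move_b = strategy_b(hist_a)
        pa, pb = PAYOFFS[(move_a, move_b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # (150, 150): cooperation is stable and pays off
print(play(tit_for_tat, always_defect))  # (49, 54): defection wins once, then both stagnate
```

In a one-shot game defection dominates; over repeated rounds, strategies that retaliate against defection make mutual cooperation the better long-run outcome, which is the kind of equilibrium the talk showed agents can be trained toward.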

#aiagents #gametheory #agentcollaboration #multiagentsystems

 

10:45 – 11:30 a.m.: Learning to Cooperate: Training AI Agents for Social Dilemmas
Aaron Courville (IVADO, Université de Montréal, Mila)

Gauthier’s talk transitioned smoothly into that of Aaron Courville, one of Mila’s legends. Aaron took the theoretical framing and grounded it in real multi-agent scenarios, where one agent’s action can directly benefit or harm another. This exposes a massive caveat in naïve RL approaches: what works for a single agent often breaks down when multiple agents must interact.

This leads to unavoidable questions. How do we design systems where agents negotiate? Can we avoid scenarios where one agent exploits the other? Or, as Gauthier hinted, can agents be taught to genuinely cooperate for shared gain?

Aaron presented AdAlign, his team’s approach showing that meaningful cooperation is possible under the right training regime.

#aiagents #gametheory #agentcollaboration #multiagentsystems #agentsintherealworld

 

11:30 a.m. – 12 p.m.: Recap & Discussion: Audience and Guest Speakers

The recap focused on cooperation in real-world deployments. With today’s alignment methods, are agents actually capable of cooperating meaningfully? How should we communicate the importance of cooperation when agents may not share goals or context?

 

1:45 – 2:30 p.m.: Value Drifts: Tracing Value Alignment During LLM Post-Training
Siva Reddy (IVADO, McGill University, Mila)

Alignment had been a recurring theme throughout the workshop, but Siva Reddy was the first to dig into what it actually means to have an aligned model, and when and how that alignment emerges.

Siva challenged widespread assumptions about post-training. Many attendees expected reinforcement learning from human feedback (RLHF) to be the main driver of behavioral alignment. Surprisingly, Siva showed that the quiet, often overlooked SFT (Supervised Fine-Tuning) stage, the step before preference optimization, has the largest impact on value shaping and value drift.

His talk made one thing very clear: value alignment is a dynamic process, far from solved, and certainly not an exact science yet.

#alignment #values #posttraining 

 

2:30 – 3:15 p.m.: LLM to Agent Safety: Emerging Societal and Technical Risks
Nouha Dziri (Allen Institute for AI (AI2))

Nouha Dziri, research scientist at the Allen Institute for AI, reminded us just how fast AI capabilities are increasing and how safety research must keep pace. She showcased the impressive jailbreak benchmarks her team has built, illustrating both the creativity of adversarial users and the fragility of current safeguards.

Her talk reiterated a question echoed throughout the workshop: How do we make models safe while keeping them useful? And how do we detect hidden harms in tasks that appear benign?

Nouha emphasized that recent AI developments are not all rosy. They come with serious social risks: collapse of thought, over-reliance, emotional attachment, job displacement, and more. She argued that we must optimize for open-ended objectives carefully, as they can lead to miscalibrated behaviors and a collapse in reasoning.

Her final message: we need to embrace ambiguity, not pretend it doesn’t exist.

#aisafety #airisks

 

3:45 – 4:30 p.m.: Building Personalized AI Assistants: From Task Execution to Human Alignment
Jieyu Zhao (University of Southern California)

After Nouha’s safety talk, Jieyu Zhao introduced Computer-Using Agents (CUAs) and their strengths and weaknesses. Their biggest limitation today, as she explained, is understanding user intent, especially when it involves nuanced contextual information.

Jieyu envisions agents and humans as collaborators rather than as a boss-and-worker pair. But to achieve this, agents must understand both explicit and implicit user intentions. This requires building new models and benchmarks specifically targeting user-intent understanding, especially for cultural nuance, an area where we still lack reliable evaluation methods.

#aiagents #cua #humanagentcollaboration

 

4:30 – 4:50 p.m.: Recap & Discussion: Audience and Guest Speakers

To tie a golden ribbon around the workshop in this festive season, the final discussion began with a striking question: “If you could retrain today’s largest LLMs from scratch, what would you do differently?” This sparked a lively debate about pretraining data scale, objectives that go beyond simple accuracy, and the looming threat of model collapse, where originality becomes an endangered species. The speakers underscored how desperately we need new, high-quality, original data to keep our models robust, diverse, and creative.