Safety-Guaranteed LLMs | IVADO

As the artificial intelligence landscape evolves, ensuring the safety and alignment of superintelligent large language models (LLMs) is paramount. This workshop will examine the theoretical foundations of LLM safety. Possible topics include the Bayesian perspective on LLM safety versus the reinforcement learning (RL) perspective, as well as other theories.

The workshop's subject is forward-looking: it focuses on how to ensure that a superintelligent LLM/AI remains safe and aligned with humans. The workshop is a joint effort of the Simons Institute and IVADO.

Main themes:

Bayesian approaches to LLM safety
Reinforcement learning perspectives on safety
Theoretical frameworks for ensuring AI alignment
Case studies and practical implications
Future directions for LLM safety research

This workshop will be held exclusively in English.

This activity is part of the thematic semester “Large Language Models and Transformers”, organized in collaboration with the Simons Institute for the Theory of Computing.

All travel grants for attending the event in California have been awarded.

The workshops will also be streamed live online (registration required).

Scientific co-organizers

Yoshua Bengio (IVADO - Mila - Université de Montréal)

Siva Reddy (IVADO - Mila - McGill University)

Sasha Rush (Cornell University)

Umesh Vazirani (Simons Institute, UC Berkeley)

Event schedule

Monday, April 14, 2025

9:00 – 9:15: Welcome and coffee
9:15 – 9:30: Opening remarks
9:30 – 10:30: Simulating Counterfactual Training
    Roger Grosse (University of Toronto)
10:30 – 11:00: Break
11:00 – 12:00: AI Safety via Inference-Time Compute
    Boaz Barak (Harvard University and OpenAI)
12:00 – 14:00: Lunch (not provided)
14:00 – 15:00: Controlling Untrusted AIs with Monitors
    Ethan Perez (Anthropic)
15:00 – 15:30: Break
15:30 – 16:30: Scalable AI Safety via Efficient Debate Games
    Georgios Piliouras (Singapore University of Technology and Design)
16:30 – 18:00: Reception

Tuesday, April 15, 2025

9:30 – 10:00: Welcome and coffee
10:00 – 11:00: Full-Stack Alignment
    Ryan Lowe (Meaning Alignment Institute)
11:00 – 11:30: Break
11:30 – 12:30: Can We Get Asymptotic Safety Guarantees Based On Scalable Oversight?
    Geoffrey Irving (UK AI Security Institute)
12:30 – 14:30: Lunch (not provided)
14:30 – 15:30: Amortised Inference Meets LLMs: Algorithms and Implications for Faithful Knowledge Extraction
    Nikolay Malkin (Mila + The University of Edinburgh)
15:30 – 16:10: Break
16:10 – 17:00: Special lecture – Richard M. Karp Distinguished Lecture
    Yoshua Bengio (IVADO + Université de Montréal + Mila)
17:00 – 18:00: Panel discussion
    Yoshua Bengio (IVADO – Mila – Université de Montréal), Dawn Song (UC Berkeley), Roger Grosse, Geoffrey Irving, Siva Reddy (IVADO – Mila – McGill University)

Wednesday, April 16, 2025

8:30 – 9:00: Welcome and coffee
9:00 – 10:00: Robustness of Jailbreaking across Aligned Models, Reasoning Models and Agents
    Siva Reddy (IVADO + McGill University + Mila)
10:00 – 10:15: Break
10:15 – 11:15: Adversarial Training for LLMs’ Safety Robustness
    Gauthier Gidel (IVADO + Université de Montréal + Mila)
11:15 – 11:30: Break
11:30 – 12:30: Talk by
    Zico Kolter (Carnegie Mellon University)
12:30 – 14:00: Lunch (not provided)
14:00 – 15:00: Causal Representation Learning: A Natural Fit for Mechanistic Interpretability
    Dhanya Sridhar (IVADO + Université de Montréal + Mila)
15:00 – 15:15: Break
15:15 – 16:15: Out of Distribution, Out of Control? Understanding Safety Challenges in AI
    Aditi Raghunathan (Carnegie Mellon University)

Thursday, April 17, 2025

9:00 – 9:30: Welcome and coffee
9:30 – 10:30: LLM Negotiations and Social Dilemmas
    Aaron Courville (IVADO + Université de Montréal + Mila)
10:30 – 11:00: Break
11:00 – 12:00: Scalably Understanding AI With AI
    Jacob Steinhardt (UC Berkeley)
12:00 – 13:45: Lunch (not provided)
13:45 – 14:45: Future Directions in AI Safety Research
    Dawn Song (UC Berkeley)
14:45 – 15:00: Break
15:00 – 16:00: What Can Theory of Cryptography Tell us About AI Safety
    Shafi Goldwasser (UC Berkeley)
16:00 – 17:00: Assessing the Risk of Advanced Reinforcement Learning Agents Causing Human Extinction
    Michael Cohen (UC Berkeley)

Friday, April 18, 2025

8:30 – 9:00: Welcome and coffee
9:00 – 10:00: Safeguarded AI Workflows
    David Dalrymple (Advanced Research + Invention Agency)
10:00 – 10:15: Break
10:15 – 11:15: AI Safety: LLMs, Facts, Lies, and Agents in the Real World
    Chris Pal (IVADO + Polytechnique + Mila + UdeM DIRO + CIFAR + ServiceNow)
11:15 – 11:30: Break
11:30 – 12:30: Measurements for Capabilities and Hazards
    Dan Hendrycks (Center for AI Safety)
12:30 – 14:00: Lunch (not provided)
14:00 – 15:00: Theoretical and Empirical aspects of Singular Learning Theory for AI Alignment
    Daniel Murfet (Timaeus)
15:00 – 15:30: Break
15:30 – 16:30: Probabilistic Safety Guarantees Using Model Internals
    Jacob Hilton (Alignment Research Center)
16:30 – 16:45: Closing remarks
