Accelerating Safe AI: Building trust in tomorrow’s technology.

Vision

The successes of large language models (LLMs), computer vision and generative models of text, images and audio have drawn considerable attention for the transformative effects they could have on society; consider applications such as autonomous cars, AI tutoring systems deployed in under-resourced regions, and personal AI assistants. However, LLM releases far outpace careful research into their safety. To this end, the goal of Regroupement R10 is to accelerate this much-needed research towards developing AI agents that are safe and reliable.

Objectives

The objectives are:

  • Identify harmful behaviors that arise when AI systems increasingly interact with other agents (human users or other LLM agents).
  • Understand why AI systems make the decisions they do.
  • Develop algorithmic approaches for steering AI agents towards safe and ideally prosocial behaviors.

Research Axes

R10 proposes a multi-year research program on AI safety that divides this ambitious and broad topic into three concrete research axes:

Axis 1: Evaluating and mitigating LLM biases and unsafe behaviors

This axis focuses on two types of near-term harms posed by modern AI systems:

  • Harmful data biases amplified by LLMs, and vulnerabilities of LLM web agents that can be exploited by adversaries, including other LLM web agents that discover exploitative policies.
  • Undesirable behaviors from agents that interact with the physical world in unseen scenarios.

Axis 2: Improving LLM interpretability

  • Explain the predictions of LLMs by improving LLM self-explanations, learning abstractions of LLMs, and visualizing neural network adaptations.
  • Develop models that are inherently interpretable, such as symbolic models.

Axis 3: Promoting safe and robust behaviors in multi-agent settings with LLMs

  • Develop novel evaluations of multi-agent dynamics with LLM agents.
  • Develop algorithms for multi-agent learning.
  • Incentivize social good using causal models.

Challenges

We are in an era in which companies release ever larger LLMs that surpass one another on benchmarks and even act on the web. As these AI systems grow more agent-like, researchers have highlighted a growing risk of catastrophic harms. At this pace, we may develop agents that deceive and manipulate humans, and signs of this troubling behavior are already emerging. LLMs routinely generate fabricated facts (often referred to as “hallucinations”), amplify harmful stereotypes present in their training data, and show no convincing signs of the coherent model of the world that other intelligent agents possess.

Anticipated Impact

These efforts will drive concrete outcomes, including publications in leading venues, the development of open-source software libraries, and community-building opportunities. R10 will offer educational opportunities for experts through workshops and summer schools, and will reach the general public through efforts to educate about the safe and responsible use of AI. R10 expects its research team to play a key role in advising the government on AI safety issues, providing valuable insight for future regulations.

Research Team

Co-leaders

Yoshua Bengio
Université de Montréal
Chris Pal
Polytechnique Montréal
Dhanya Sridhar
Université de Montréal

Researchers

Research Advisor

Dana F. Simon: dana.simon@ivado.ca