As the landscape of artificial intelligence evolves, ensuring the safety and alignment of superintelligent large language models (LLMs) is paramount. This workshop will delve into the theoretical foundations of LLM safety. Topics may include the Bayesian view of LLM safety versus the reinforcement learning (RL) view, among other theories.
The workshop is forward-looking, focusing on how to ensure that a superintelligent LLM or AI system remains safe and aligned with humans. It is a joint effort of the Simons Institute and IVADO.
Key Topics:
- Bayesian Approaches to LLM Safety
- Reinforcement Learning Perspectives on Safety
- Theoretical Frameworks for Ensuring AI Alignment
- Case Studies and Practical Implications
- Future Directions in LLM Safety Research
This workshop is part of the programming for the thematic semester on Large Language Models and Transformers, organized in collaboration with the Simons Institute for the Theory of Computing.
Travel grants are available to attend the event in California.
Workshops will also be streamed live online (registration required).