The Future of Language Models and Transformers


Transformers have now been scaled to vast amounts of static data, an approach so successful that it has forced the research community to ask, “What’s next?” This workshop will bring together researchers thinking about questions related to the future of language models beyond the current standard model. The workshop is meant to be exploratory and welcomes novel directions from which new setups may arise, e.g., data efficiency, training paradigms, and architectures.

This workshop is part of the programming for the thematic semester on Large Language Models and Transformers, organized in collaboration with the Simons Institute for the Theory of Computing.

Travel grants are available to attend the event in California.

The workshop will also be streamed live online (registration required).

Organizers

Sasha Rush (Cornell University; chair)
Swabha Swayamdipta (University of Southern California)

Invited Participants

Sanjeev Arora (Princeton University), Kianté Brantley (Harvard University), Danqi Chen (Princeton University), Grigorios Chrysos (University of Wisconsin-Madison), Gintare Karolina Dziugaite (Google DeepMind), Zaid Harchaoui (University of Washington), Elad Hazan (Princeton University), He He (New York University), Andrew Ilyas (Stanford University), Yoon Kim (Massachusetts Institute of Technology), Aviral Kumar (Carnegie Mellon University), Jason Lee (Princeton University), Sewon Min (UC Berkeley), Azalia Mirhoseini (Stanford / DeepMind), Nanyun (Violet) Peng (UCLA), Daniela Rus (MIT), Sasha Rush (Cornell University), Kilian Weinberger (Cornell University), Luke Zettlemoyer (University of Washington), Denny Zhou (Google DeepMind)

Agenda

Monday, Mar. 31st, 2025

9 – 9:30 a.m.: Coffee and Check-In
9:30 – 10:30 a.m.: LLM Reasoning
    Denny Zhou (Google DeepMind)
10:30 – 11 a.m.: Break
11 a.m. – 12 p.m.: The Key Ingredients of Optimizing Test-Time Compute and What’s Still Missing
    Aviral Kumar (Carnegie Mellon University)
12 – 1:30 p.m.: Lunch (on your own)
1:30 – 2:30 p.m.: OpenThinker: Curating a Reasoning Post-Training Dataset and Training Open Data Reasoning Models
    Alex Dimakis (UC Berkeley)
2:30 – 3 p.m.: Break
3 – 4 p.m.: LLM Skills and Meta-Cognition: Scaffolding for New Forms of Learning?
    Sanjeev Arora (Princeton University)
4 – 5 p.m.: Reception

Tuesday, Apr. 1st, 2025

9 – 9:30 a.m.: Coffee and Check-In
9:30 – 10:30 a.m.: What Will Transformers Look Like In 2027?
    Yoon Kim (Massachusetts Institute of Technology)
10:30 – 11 a.m.: Break
11 a.m. – 12 p.m.: Reducing the Dimension of Language: A Spectral Perspective on Transformers
    Elad Hazan (Princeton University)
12 – 1:30 p.m.: Lunch (on your own)
1:30 – 2:30 p.m.: Mixed-modal Language Modeling: Chameleon, Transfusion, and Mixture of Transformers
    Luke Zettlemoyer (University of Washington)
2:30 – 3 p.m.: Break
3 – 4 p.m.: Talk by
    Danqi Chen (Princeton University)
4 – 5 p.m.: Attention to Detail: Fine-Grained Vision-Language Alignment
    Kai-Wei Chang (UCLA)

Wednesday, Apr. 2nd, 2025

9 – 9:30 a.m.: Coffee and Check-In
9:30 – 10:30 a.m.: Inference Scaling: A New Frontier for AI Capability
    Azalia Mirhoseini (Stanford / DeepMind)
10:30 – 11 a.m.: Break
11 a.m. – 12 p.m.: Talk by
    Zaid Harchaoui (University of Washington)
12 – 1:30 p.m.: Lunch (on your own)
1:30 – 2:30 p.m.: Talk by
    Dileep George (Google DeepMind)
2:30 – 3 p.m.: Break
3 – 4 p.m.: Talk by
    Siva Reddy (IVADO – Mila – McGill University)

Thursday, Apr. 3rd, 2025

9 – 9:30 a.m.: Coffee and Check-In
9:30 – 10:30 a.m.: Advancing Diffusion Models for Text Generation
    Kilian Weinberger (Cornell University)
10:30 – 11 a.m.: Break
11 a.m. – 12 p.m.: Controllable and Creative Natural Language Generation
    Nanyun (Violet) Peng (UCLA)
12 – 1:30 p.m.: Lunch (on your own)
1:30 – 2:30 p.m.: Transformers Can Learn Compositional Functions
    Jason Lee (Princeton University)
2:30 – 3 p.m.: Break
3 – 4 p.m.: Predicting and Optimizing the Behavior of Large ML Models
    Andrew Ilyas (Stanford University)
4 – 5 p.m.: Panel Discussion

Friday, Apr. 4th, 2025

9 – 9:30 a.m.: Coffee and Check-In
9:30 – 10:30 a.m.: Towards Sequence-to-Sequence Models Without Activation Functions
    Grigorios Chrysos (University of Wisconsin-Madison)
10:30 – 11 a.m.: Break
11 a.m. – 12 p.m.: Efficient Policy Optimization Techniques for LLMs
    Kianté Brantley (Harvard University)
12 – 1:30 p.m.: Lunch (on your own)
1:30 – 2:30 p.m.: Talk by
    Sewon Min (UC Berkeley)
2:30 – 3 p.m.: Break
3 – 4 p.m.: The Future of Language Models: A Perspective on Evaluation
    Swabha Swayamdipta (University of Southern California)