Price
$70-$280
The goal of this bootcamp is to equip statisticians with a working command of advanced AI tools, especially large language models (LLMs), deep learning, reinforcement learning, agents, and AI alignment. Specific topics to be covered include:
- AI alignment, statistically framed: turning safety, helpfulness, and non-toxicity into measurable quantities (hypothesis tests, risk bounds, selective prediction/abstention); practicing red-teaming and auditing with error control.
- Deep learning through a statistical lens: generalization under shift, robustness, calibration, and when bounds meaningfully predict deployment behavior.
- Large language models (LLMs)
- AI agents: decision loops with uncertainty and feedback; risk-aware policies, auditable logs, and monitoring.
- Causal reasoning with LLMs: distinguishing predictive vs. causal targets; sensitivity analyses; documenting assumptions for defensible decisions.
- Reinforcement learning: statistically principled exploration, off-policy evaluation, confidence sets, and safety/fairness implications.
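The selective prediction/abstention idea mentioned above can be sketched in a few lines: a model abstains when its confidence falls below a threshold, and we measure the resulting selective risk (error rate on answered items) and coverage. This is an illustrative sketch with made-up numbers, not course material.

```python
# Selective prediction sketch: abstain below a confidence threshold,
# then report (selective risk, coverage). Data below is illustrative.

def selective_risk(confidences, correct, threshold):
    """Return (risk, coverage) when abstaining below `threshold`."""
    answered = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    coverage = len(answered) / len(confidences)
    risk = 0.0 if not answered else 1 - sum(answered) / len(answered)
    return risk, coverage

confs  = [0.95, 0.80, 0.55, 0.99, 0.40, 0.70]
labels = [True, True, False, True, False, True]  # was each prediction correct?

risk, cov = selective_risk(confs, labels, threshold=0.6)
```

Raising the threshold typically lowers risk at the cost of coverage; the statistical question is how to certify that trade-off with error control.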
Instructors
Bang Liu (Université de Montréal), Jiancong Xiao (University of Pennsylvania), Marouane Il Idrissi (UQAM), Yanxun Xu (Johns Hopkins University) and Amirhossein Kazemnejad (Mila).
Agenda
9:00 AM – 9:20 AM – Welcome & Registration
9:20 AM – 9:30 AM – Opening Remarks
9:30 AM – 11:00 AM – Theory Sessions
11:00 AM – 11:30 AM – Coffee Break
11:30 AM – 12:15 PM – Theory Session
12:15 PM – 2:00 PM – Lunch Break (not included)
2:00 PM – 3:30 PM – Practical Workshops
3:30 PM – 4:00 PM – Coffee Break
4:00 PM – 4:45 PM – Practical Workshop
4:45 PM – 6:00 PM – Networking Cocktail (Monday only)
Each day of the bootcamp is built around a specific theme. Discover the detailed description below!
MONDAY, MAY 4, 2026 – 9:00 AM to 4:45 PM, followed by a networking activity
Enhancing Statistical Research with Task-Specific LLMs and Structured Agents
Instructor – Yanxun Xu, Johns Hopkins University
This short course focuses on how modern generative AI systems can be used as practical, statistically grounded tools in biomedical research workflows. The lectures introduce core AI methods — including retrieval-augmented generation (RAG), task-specific adaptation of large language models (LLMs), and structured-output LLM agents — and demonstrate how they apply to biomedical literature synthesis, clinical trial data extraction, cohort characterization from electronic health records, and downstream statistical analyses such as survival analysis and evidence synthesis.
The course includes guided, end-to-end examples of AI-assisted biomedical data workflows.
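One way to picture the "structured-output LLM agent" idea is schema validation: the model is constrained to emit JSON matching a fixed schema, and the output is type-checked before it enters downstream statistical analysis. The schema fields and the sample response below are illustrative assumptions, not the course's actual pipeline.

```python
# Sketch: validate an LLM's JSON extraction against a fixed schema before
# using it in downstream analysis. Fields and values are hypothetical.
import json

SCHEMA = {"trial_id": str, "n_enrolled": int, "hazard_ratio": float}

def parse_extraction(raw: str) -> dict:
    """Parse and type-check an LLM's JSON output against SCHEMA."""
    record = json.loads(raw)
    for field, ftype in SCHEMA.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], ftype):
            raise TypeError(f"{field} should be {ftype.__name__}")
    return record

# Hypothetical model output for one trial report:
raw = '{"trial_id": "NCT0000000", "n_enrolled": 120, "hazard_ratio": 0.72}'
record = parse_extraction(raw)
```

Validation like this is what makes LLM extractions auditable: malformed or incomplete outputs fail loudly instead of silently corrupting the analysis dataset.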
Software
- Python (primary)
- Google Colab notebooks (Google account required)
- Optional use of R for downstream statistical modeling and visualization
Compute
- No GPUs required
- All examples run on standard laptops using API-based models
Prerequisites
- Basic proficiency in Python (or R) for data analysis
TUESDAY, MAY 5, 2026 – 9:00 AM to 4:45 PM
Foundations and Applications of Agentic Intelligence
Instructor – Bang Liu, Université de Montréal
This course provides a comprehensive overview of intelligent agents built upon large language models (LLMs), framed within a modular, brain-inspired architecture that integrates insights from cognitive science, neuroscience, and computational research.
The course is structured in four parts:
- Modular foundations of intelligent agents, including memory, world modeling, reward systems, and emotion-like mechanisms.
- Self-enhancement and adaptive evolution, covering continual learning, automated optimization, and LLM-driven improvement strategies.
- Collaborative and multi-agent systems, exploring collective intelligence and social dynamics among interacting agents.
- Safety, robustness, and alignment, addressing security threats, ethical considerations, and trustworthy deployment.
The course identifies key research challenges and opportunities in building adaptive and socially aligned intelligent agents.
Software
- Python
Compute
- Standard laptop sufficient
- No GPU required (unless optional experimentation is pursued)
Prerequisites
- Basic familiarity with Python
- General understanding of machine learning concepts
WEDNESDAY, MAY 6, 2026 – 9:00 AM to 4:45 PM
Teaching Language Models to Reason with Reinforcement Learning from Scratch
Instructor – Amirhossein Kazemnejad, Mila
This course explores how reinforcement learning (RL) can unlock emergent reasoning abilities in large language models without human-annotated reasoning traces. The course covers the full training pipeline, including:
- Group Relative Policy Optimization (GRPO) and its connection to policy gradient methods
- Reward design using rule-based signals for verifiable tasks
- KL-regularized optimization to control policy drift
- Emergent reasoning behaviors observed during RL training
Participants will implement a complete RL training loop from scratch in a guided Jupyter notebook, covering prompt construction, reward design, advantage computation, and policy gradient updates.
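The core of GRPO's advantage computation can be sketched in a few lines: for each prompt we sample a group of completions, score them with a rule-based reward, and normalize each reward against the group's mean and standard deviation. This is a toy illustration with made-up rewards, not the notebook's implementation.

```python
# Toy sketch of GRPO's group-relative advantage: normalize each sampled
# completion's reward against its group's mean and standard deviation.
import statistics

def group_advantages(rewards):
    """Advantage of each completion relative to its sampled group."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)
    if sigma == 0:  # all completions scored equally: no learning signal
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]

# e.g. 4 sampled answers to one math prompt, reward 1.0 if verifiably correct
advs = group_advantages([1.0, 0.0, 0.0, 1.0])
```

Because the baseline is the group mean rather than a learned value function, no critic network is needed; the advantages then weight the policy gradient update.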
Software
- Python
- PyTorch
- Hugging Face Transformers
- DeepSpeed
- vLLM
Compute
- GPU required (Google Colab A100 or T4 recommended)
- Pre-computed checkpoints provided as fallback
Prerequisites
- Proficiency in Python and PyTorch
- Familiarity with gradient-based optimization
- No prior reinforcement learning experience required
THURSDAY, MAY 7, 2026 – 9:00 AM to 4:45 PM
From Shapley-Type Attributions to Concepts: Interpreting Tabular, Image, and Text Models
Instructor – Marouane Il Idrissi, UQAM
This course introduces practical approaches for interpreting black-box predictive models after training, with an emphasis on quantifying feature influence. We begin with attribution methods for tabular data grounded in cooperative game theory (e.g., Shapley values) and show how these tools extend beyond simple prediction decompositions.
The course then expands to modern vision and language models, where the notion of “feature” is less direct. We introduce concept-based explainability methods that interpret predictions through human-meaningful concepts (e.g., visual patterns, semantic attributes, topics) rather than raw pixels or tokens. We discuss how concept-based methods complement attribution techniques and support model assessment in high-dimensional settings.
Hands-on sessions include guided case studies across tabular, image, and text data.
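The game-theoretic starting point can be made concrete with a tiny example: with three features we can compute exact Shapley values by averaging each feature's marginal contribution over all orderings, where the value function v(S) stands for the model's prediction using only the features in S. The additive value function below is an illustrative assumption chosen so the answer is easy to check.

```python
# Exact Shapley values for a 3-player "game" by enumerating all orderings.
# v(S) plays the role of a restricted model prediction; here it is additive.
from itertools import permutations

def shapley(players, v):
    """Exact Shapley value of each player under value function v(frozenset)."""
    phi = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            phi[p] += v(coalition | {p}) - v(coalition)  # marginal contribution
            coalition = coalition | {p}
    return {p: total / len(orderings) for p, total in phi.items()}

# Additive toy game: features a, b, c contribute 3, 2, 1 independently
weights = {"a": 3.0, "b": 2.0, "c": 1.0}
v = lambda S: sum(weights[p] for p in S)
phi = shapley(["a", "b", "c"], v)
```

For an additive game the Shapley values recover the individual contributions exactly; the interesting cases, covered in the course, arise when features interact and the attribution must split shared credit.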
Software
- Python
- Google Colab notebooks (Google account required)
Compute
- Standard laptop sufficient
- No GPU required
Prerequisites
- Basic proficiency in Python (ability to read and understand code)
FRIDAY, MAY 8, 2026 – 9:00 AM to 4:45 PM
Statistical and Mathematical Foundations for Human Preference Alignment in LLMs
Instructor – Jiancong Xiao, University of Pennsylvania
This course examines the statistical and mathematical foundations of aligning large language models with diverse human preferences.
The course is organized into four parts:
- Overview of human preference alignment and algorithms such as RLHF, DPO, NLHF, and related variants.
- Statistical assumptions underlying preference modeling, focusing on the Bradley–Terry (BT) model and its algorithmic biases, including risks of minority preference collapse.
- Fundamental statistical limits of preference alignment, including impossibility results related to preserving minority preferences.
- Connections to social choice theory, presenting a unified framework for alignment satisfying social choice axioms.
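The Bradley–Terry model at the heart of the second part has a one-line core: the probability that response a is preferred to response b is a logistic function of the reward difference. The reward values below are illustrative.

```python
# Bradley-Terry preference probability: P(a preferred over b) is a logistic
# function of the reward gap, as used in RLHF reward-model fitting.
import math

def bt_prob(r_a, r_b):
    """P(a preferred over b) under the Bradley-Terry model."""
    return 1.0 / (1.0 + math.exp(-(r_a - r_b)))

p = bt_prob(2.0, 0.0)  # higher-reward response is preferred most of the time
```

Fitting rewards by maximum likelihood under this model is what makes minority preference collapse possible: a single scalar reward per response cannot represent two groups with opposed preferences, which motivates the impossibility results in part three.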
Software
- Python
- PyTorch
- OpenRLHF
Compute
- GPU recommended for practical demonstrations
- Colab GPU or equivalent environment
Prerequisites
- Proficiency in Python
- Basic knowledge of probability and statistics
- Familiarity with machine learning concepts
Your attendance throughout the week is required.
You will need to bring your personal laptop computer to participate in the training.
