Lifelong Learning Systems: Next Generation Machine Learning Systems that can continually learn after deployment

This dedicated page presents one of the strategic themes / areas currently under discussion as part of our Strategic Research Funding Program. The full set of themes / areas under discussion is listed on the program page. Each dedicated page (including this one) may be at a varying level of detail and maturity. To take part in the discussion on this theme or to propose new ones, please use this form. If you would like to be kept informed of developments around this theme / area, sign up below.

Description and justification of the area

Significance of the theme: Designing systems that can learn online (after deployment) and handle continual distribution shifts is currently one of the biggest challenges in machine learning. Lifelong learning is the most promising learning paradigm to achieve this goal. This is a fundamental research theme which has an immediate and significant impact in every application of machine learning.

Local scientific expertise: Mila, the Quebec AI Institute, is one of the leading research institutes in deep learning and hosts several world-class experts in ML. Sarath Chandar is the lead organizer of the Lifelong Learning Workshop that has been happening regularly for the last 5 years. This proposal brings together researchers from different research disciplines including lifelong learning, deep learning, reinforcement learning, software engineering, and robotics. Such a multi-pronged approach is necessary to make progress in this challenging research area.

Economic Impact: Lifelong learning will directly impact all existing machine learning applications, uplifting the technology and in turn the economy that depends on ML products. Lifelong learning will also help realize some of the previously unachievable applications like self-driving cars, automated personal assistants, personalized medicine, and real-world robotics.

(Added 22/07) This theme is strategic for preserving the investments and gains made through the initial effort of training intelligent models. It aims to study how to reduce retraining effort following a drift in the distribution of production data during operation. The studies will target the triggers and procedures for partial retraining.
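As an illustration of what a retraining trigger could look like, here is a minimal sketch that monitors the error rate over a sliding window of production outcomes and signals retraining once it exceeds a reference rate by a fixed margin. The class name `DriftTrigger`, the helper `first_trigger`, the window size, the tolerance, and the synthetic stream are all assumptions made for this example, and the sketch assumes that ground-truth feedback is available in production. It is one simple heuristic, not the trigger mechanism this theme proposes.

```python
from collections import deque


class DriftTrigger:
    """Signal retraining when the recent error rate rises above a reference.

    Hypothetical helper for illustration; thresholds are arbitrary choices.
    """

    def __init__(self, reference_error, window=50, tolerance=0.10):
        self.reference_error = reference_error  # error rate measured before deployment
        self.tolerance = tolerance              # allowed degradation before triggering
        self.recent = deque(maxlen=window)      # sliding window of 0/1 error flags

    def update(self, was_error):
        """Record one production outcome; return True if retraining should start."""
        self.recent.append(1 if was_error else 0)
        if len(self.recent) < self.recent.maxlen:
            return False  # window not full yet: not enough evidence
        recent_error = sum(self.recent) / len(self.recent)
        return recent_error > self.reference_error + self.tolerance


def first_trigger(outcomes, reference_error=0.05):
    """Return the index at which retraining is first triggered, or None."""
    trigger = DriftTrigger(reference_error)
    for index, was_error in enumerate(outcomes):
        if trigger.update(was_error):
            return index
    return None


# Synthetic outcomes (made up): 5% errors before the shift, 50% errors after it.
stable = [i % 20 == 0 for i in range(100)]
shifted = [i % 2 == 0 for i in range(100)]
```

Calling `first_trigger(stable)` never fires, while `first_trigger(stable + shifted)` fires shortly after the shift at index 100. A partial retraining procedure would then be started from that point rather than on a fixed calendar schedule.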

(Added 22/07) Québec has invested substantial amounts of money to bring AI to the point where products can be released based on ML models. However, once a product is in the hands of users, they will provide a continuous flow of feedback, and they will expect (indeed demand) that this feedback lead to improved versions of the product that continuously resolve bugs and annoyances.

This is a known problem in the world of software engineering, where, until a decade ago, software providers could only respond slowly, in an ad hoc manner to their users’ gripes. The advent of continuous delivery in 2010 led to a widespread understanding, supported by tools and processes, that software should be releasable at any point in time, and that releases should be made according to a predictable, fast release cycle in order to obtain and implement user feedback faster.

Hence, one of the strategic benefits of this project is precisely to shorten the release cycle of AI/ML products by continuously retraining models in production based on user feedback, eliminating the irregular, manual intervention involved in training discrete versions of models. Apart from improving user satisfaction and shortening turnaround time, this should also raise the quality (and safety) of the resulting products.

Keywords:

Artificial Intelligence, Machine Learning, Deep Learning, Reinforcement Learning, Lifelong Learning, Multi-agent Systems, Software Engineering, Robotics, Self-driving Cars, Learning on Device, Federated Learning, Learning after Deployment, Continual Learning, Transfer Learning, Meta-learning, Multi-task Learning, Online Learning, Never-ending Learning, Out-of-distribution Generalization.
(Added 22/07) Evolution of ML, model versioning and release processes, continuous improvement, incremental approaches, time-series-aware approaches, ML with data sensitive to time and to cyclical and temporal phenomena.
(Added 22/07) continuous learning, continuous delivery

Relevant organizations:

Mila, UbiSoft, IBM, Thales, RydeSafely, Huawei, Ericsson, FlyingWhales
(Added 22/07) All industrial, public, or government partners with ML trained on data sensitive to temporal variations.

Most members of the team are faculty members at Mila, the Quebec AI Institute. Collectively, the team members already have ongoing collaborations involving applications of lifelong learning with several industry partners, including UbiSoft, IBM, Thales, RydeSafely, Huawei, Ericsson, and FlyingWhales.

Relevant people suggested during the consultation:

The following names were proposed by the community, and the people listed below have agreed to have their names displayed publicly. Note, however, that the names of all professors (whether or not they are displayed publicly on our website) will be forwarded to the advisory committee for the step of identifying and selecting strategic themes. Note also that people identified during the consultation step are not guaranteed to receive a share of the funding. This step serves above all to present an overview of the area, including the relevant people, and not to assemble teams for the framework programs.

Sarath Chandar Anbil Parthipan
Bram Adams
Ettore Merlo
Laurent Charlin

(no potential framework programs at this time)

Potential framework programs

Deep Learning (DL) has revolutionized the fields of Artificial Intelligence (AI) and Machine Learning (ML). It has had a significant impact in several application domains including speech recognition, natural language processing, and computer vision. The dominant paradigm in DL is to learn one new model per task, which, given enough data and computational resources, can achieve superhuman performance. However, this setup is far from ideal since the model requires a large number of samples to learn each task from scratch. Humans, on the other hand, excel at learning multiple tasks and at transferring knowledge from previously learned tasks to learn new tasks faster. Moreover, almost all existing ML algorithms assume that the data is independent and identically distributed (i.i.d.) and that the test data will follow the same distribution as the training data. Such systems cannot handle the distribution shifts that often arise in environments where we want systems to learn after deployment. This has limited the use of ML/DL-based systems in applications such as self-driving cars, drones, and robotics, where learning after deployment in changing environments is essential.

In this research program, we intend to explore “lifelong learning” as a way to address these limitations and design ML systems that can continually learn even after deployment and get better at solving new tasks. This research program will focus both on fundamental research in lifelong learning and on its impactful applications in the real world. We list below some of the major fundamental research questions and the application areas that we plan to explore.

The fundamental research questions that this program will address include, but are not limited to, the following:

How to address the problem of catastrophic forgetting of previously learned tasks while learning new tasks?
How to effectively transfer knowledge from the previously learned tasks to learning and performing new tasks (forward transfer)?
How to effectively transfer knowledge from a newly learned task to improve performance on previously learned tasks (backward transfer)?
How to design optimization algorithms that can handle the non-stationarity of the non-i.i.d. lifelong learning setting?
How to design neural network architectures that can accommodate new input/output features that arise over time during deployment?
How to design lifelong reinforcement learning agents that can learn and accumulate skills over time and use them effectively to learn new tasks?
How to design agents that can learn, act and coordinate in environments with other (changing) agents?
How to design agents that acquire knowledge useful not just to solve the current task but also to learn future tasks (knowledge acquisition)?
How can we design agents that can explore safely? For example, how should a self-driving car explore? Most of the exploration methods that we use while learning in simulation are not applicable while learning in the real world.
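To make the notions of forgetting, forward transfer, and backward transfer in the questions above more concrete, the sketch below computes them from an accuracy matrix, following one formalization that is commonly used in the continual learning literature. The matrix layout (`R[i][j]` is the accuracy on task `j` after training on tasks `0..i`), the function names, and all numbers are assumptions made for this example. The values are invented for illustration and are not experimental results.

```python
def backward_transfer(R):
    """Average change in accuracy on earlier tasks after learning all tasks.

    Negative values indicate forgetting; positive values indicate that
    later learning improved performance on earlier tasks.
    """
    T = len(R)
    return sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)


def forward_transfer(R, baseline):
    """Average gain on a task *before* training on it, relative to a baseline.

    baseline[j] is the accuracy of an untrained (randomly initialized)
    model on task j.
    """
    T = len(R)
    return sum(R[j - 1][j] - baseline[j] for j in range(1, T)) / (T - 1)


def final_average_accuracy(R):
    """Mean accuracy over all tasks after the last task has been learned."""
    return sum(R[-1]) / len(R[-1])


# Illustrative (made-up) accuracies for three tasks learned in sequence.
R = [
    [0.90, 0.55, 0.50],  # after training on task 0
    [0.80, 0.88, 0.60],  # after training on tasks 0 and 1
    [0.70, 0.85, 0.92],  # after training on tasks 0, 1, and 2
]
baseline = [0.50, 0.50, 0.50]

bwt = backward_transfer(R)
fwt = forward_transfer(R, baseline)
```

With these toy numbers, `bwt` is roughly -0.115 (accuracy on earlier tasks dropped, i.e. forgetting) and `fwt` is roughly 0.075 (earlier tasks gave a small head start on later ones).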

Lifelong learning will have a substantial impact on almost every industrial ML system. Based on the experience of our team, we will focus on the following non-exhaustive list of applications:

Software engineering pipelines contain many ML prediction systems, including systems that predict bugs, build failures, commit failures, and test failures. Machine learning is also used in software development for code completion and code suggestions (see the GitHub Copilot project, for example). The data in this setting is not i.i.d., and the input distribution shifts over time depending on the stage of software development. The typical solution in such settings is to retrain the system every few weeks and then use it only for inference until the next update. In collaboration with our industrial partner UbiSoft, we will apply lifelong learning techniques to design prediction systems that can continually learn in an online fashion and handle these distribution shifts.
Self-driving cars are a classic example where most standard ML algorithms fail to generalize to out-of-distribution scenarios. It is not possible to enumerate every scenario that a car could face as road, traffic, and weather conditions change, and hence generalizing to new scenarios is essential. We expect lifelong reinforcement learning to be the key tool in addressing this problem.
Industrial robots and mobile robots have to work in environments that change constantly and hence generalizing to new scenarios is again essential. Most of these systems work with planning instead of learning since the standard learning setups cannot handle such distribution shifts. Applying lifelong learning should improve the performance of these systems.
In safety-critical sectors such as aerospace, it is very difficult to use RL algorithms because they are too fragile and not robust. A lifelong learning agent would be more robust in such real-life situations and could hence make it feasible to design self-driving planes.
With the emergence of the Internet of Things (IoT) and federated learning, there has been a growing interest in learning on devices (edge intelligence). This is also important when privacy is a concern. When learning on device, the data distribution is not i.i.d., and the resulting online learning problem can be handled efficiently with lifelong learning.
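The contrast drawn in the software engineering item above, between a model that is only used for inference between scheduled retrainings and one that keeps learning online, can be illustrated with a toy stream whose input-output relation shifts halfway through. The sketch below compares a frozen one-parameter linear model with the same model updated by plain online stochastic gradient descent, scoring each prediction before the update is applied. Plain SGD is only a stand-in here, not a lifelong learning algorithm from this program, and the function names, learning rate, and synthetic stream are assumptions made for this example.

```python
def make_stream(steps=200, shift_at=100):
    """Synthetic stream y = w * x whose true weight jumps from 1.0 to 3.0."""
    inputs = [0.5, 1.0, 1.5]
    stream = []
    for t in range(steps):
        x = inputs[t % 3]
        true_w = 1.0 if t < shift_at else 3.0  # distribution shift at shift_at
        stream.append((x, true_w * x))
    return stream


def run_stream(stream, learning_rate=0.1, initial_weight=1.0):
    """Return cumulative squared error of a frozen and an online model."""
    frozen_w = initial_weight  # deployed once, never updated
    online_w = initial_weight  # updated after every observation
    frozen_loss = 0.0
    online_loss = 0.0
    for x, y in stream:
        # Predict first, then learn: each sample is scored before it is used.
        frozen_loss += (y - frozen_w * x) ** 2
        error = y - online_w * x
        online_loss += error ** 2
        online_w += learning_rate * error * x  # one SGD step on squared error
    return frozen_loss, online_loss, online_w


frozen_loss, online_loss, final_weight = run_stream(make_stream())
```

On this stream both models are perfect before the shift. Afterwards the frozen model keeps making the same error on every sample, while the online model's weight moves toward the new value of 3.0 and its cumulative error stays much lower. This toy setting has a single task and so says nothing about catastrophic forgetting, which is where lifelong learning methods go beyond plain online updates.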