AI and Neuroscience

Foundation models for neuroscience.

Vision

In neuroscience, there is a growing recognition that we need to develop foundation models for analyzing neural and behavioural data. Indeed, neuroscience data are an ideal use case for foundation models. First, neural activity and behaviour are highly complex. Second, neural and behavioural data are diverse and span many different modalities. A foundation model trained in a self-supervised manner on large collections of diverse multimodal neural and behavioural datasets could then be fine-tuned for specific applications, such as detection of sleep disorders, treatment outcome prediction for epilepsy, or diagnosis of mental health disorders.

Objectives

Build the world’s first multimodal foundation model for neuroscience.
Harness the potential of big-data approaches to transform neuroscience.
Lay the groundwork that is necessary to build these foundation models.

Research Axes

Axis 1: Create tools for building multimodal foundation models on neuroscience data

The goal is to provide the neuroscience community with the code base and toolsets required for building and evaluating multimodal foundation models on neuroscience data according to best practices. To ensure that the tools get adopted, they will be released openly with thorough documentation. Regroupement R1 envisions a library of tools that will become the de facto standard for creating foundation models of neuroscience data worldwide.

Objective 1. Build tokenization modules for ingesting diverse neuroscience data.
Objective 2. Create modules for conditioning on stimuli and behaviours.
Objective 3. Develop simulated data for verification of pre-trained models.
Objective 4. Estimate scaling laws for neuroscience data.
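
As an illustration of what scaling law estimation can involve, the sketch below fits a simple power law, loss = a * N^(-b), to pairs of dataset size and validation loss using a least-squares fit in log-log space. It is a minimal sketch under stated assumptions, not the method this program will adopt: the function name fit_power_law is hypothetical, the data are synthetic, and the functional form omits the irreducible-loss term that many scaling law analyses include.

```python
import numpy as np


def fit_power_law(sizes, losses):
    """Fit loss = a * size**(-b) by linear least squares in log-log space.

    Hypothetical helper for illustration only. Returns (a, b).
    """
    log_n = np.log(np.asarray(sizes, dtype=float))
    log_l = np.log(np.asarray(losses, dtype=float))
    # In log space the power law is a line: log(loss) = log(a) - b * log(size)
    slope, intercept = np.polyfit(log_n, log_l, deg=1)
    return float(np.exp(intercept)), float(-slope)


# Synthetic example (assumed values, not real neuroscience results):
# losses generated exactly from a = 5.0, b = 0.25.
sizes = np.array([1e3, 1e4, 1e5, 1e6])
losses = 5.0 * sizes ** -0.25

a, b = fit_power_law(sizes, losses)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With real training runs, the fitted exponent b would indicate how quickly loss falls as more data are added, which is the kind of estimate needed to choose a model and dataset size.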

Axis 2: Identify inductive biases that improve learning on diverse data

The goal is to identify inductive biases, i.e., model designs that incorporate appropriate domain knowledge, to reduce the amount of data required for effective inference. These inductive biases can, and should, be informed by our existing knowledge of the brain. Identifying appropriate inductive biases could not only reduce the amount of data required to train the models but also tell us when we have captured something real about the brain, thus supporting the creation of more brain-like models.

Objective 1. Identify the architectures and loss functions best suited for neuroscience data.
Objective 2. Develop data augmentations for improving data efficiency.
Objective 3. Determine if domain knowledge can alter scaling laws for neural data.

Axis 3: Build and release a multimodal foundation model on neural and behavioural data

The goal is to provide the first fully open, multimodal foundation model for neuroscience research. We will engage in full transparency in the data ingested by the model and the model building process. We will release the model parameters and code for the wider community to use.

Objective 1. Organize data and determine the appropriate model size.
Objective 2. Obtain computational resources to build the model.
Objective 3. Build and release a multimodal foundation model for neuroscience.

Challenges

One of the challenges for the neuroscience community in building foundation models is that much of the technical groundwork lies outside the expertise of neuroscience labs. Though neuroscientists have rich expertise in analyzing their data, most have limited experience building large neural network models pretrained in a self-supervised manner. There are numerous considerations and “tricks-of-the-trade” for self-supervised learning that few neuroscience labs are familiar with. Also, individual neuroscience labs have little incentive to build and maintain the expansive, well-tested code bases that such models require.

Although more and more data are becoming available for training models, each individual dataset is small and contains unique data modalities and labels. Accessing varied, high-quality datasets can be time-consuming. If we could train a foundation model in a self-supervised manner on large collections of diverse, multimodal, neural and behavioural datasets, independent of the primary purpose for which these data were originally acquired, then we should subsequently be able to fine-tune it for specific applications.

Anticipated Impact

The code base and tools will lower the technical barrier for neuroscience labs to build and evaluate foundation models.
The final foundation model will have a potentially huge impact on neuroscience research.
The central technical deliverables will be four specific tools for the neuroscience community to use, a collated multimodal dataset, and a multimodal foundation model.
This work will help to establish and reinforce Quebec’s position as a world leader in cutting-edge neuroscience.