In the previous article, we spoke at length about AETHER’s forays into cognitive monitoring and quantifying cognitive states, as well as our experiments with identifying optimal electrode placements for varying tasks.
However, what we’ve realised in that quest is that neuroscience is a deeply technical field, and even modern research doesn’t fully understand the brain, let alone newcomers like us with zero domain expertise. Believe us: scientific papers are hard enough to read, and we question life after going through them.
So it is better that we reach out to experts who (we hope) know what they are doing.
Connecting Across the Ecosystem
Throughout the course of the past year, AETHER has reached out to researchers across various Singaporean institutes, including:
- SUTD’s Professors Yow Wei Quin and Tan U-Xuan
- NTU’s Centre for Brain Computing Research (CBCR)
- Singapore Poly’s (SP) Centre of Excellence in Maritime Safety (CEMS)
These researchers possess prior experience and knowledge in this domain, having previously worked on projects involving this sort of brain research.
Key Problem: Experimentation Fatigue
To develop technology or AI models that can actually monitor and understand the behavioural patterns of operators, we need to collect a lot of domain-specific data, largely via trials.
However, endless cycles of trials with new sensors are not a pragmatic way to go: we need a solution that can adapt to quick-moving sensor technology, wherein we will keep finding cheaper, faster and better sensors.
We’ve already had to collect (and throw away) datasets that were rendered irrelevant when sensors were replaced. It’s an important exploration: could we find a way to end this never-ending nightmare of collecting, curating and throwing away data?
When Aloysius first raised the problem to his SUTD collaborator, the latter felt that a conceptual “brain model” (something that can model brain activity accurately and efficiently) was a research domain that was several years away, and quickly dismissed the idea.
However, we felt that the idea had legs.
When we actually sat down and researched the key enabling technologies being developed in this domain, we found something rather interesting. In the past few years, computational neuroscience and the cognitive sciences have been catching up to the major developments in AI over the past decade, slowly adopting technologies such as variational autoencoders, diffusion models, and most recently, foundation models. This work has led to the development of what we now call Brain Foundation Models.
Brain Foundation Models
The premise behind brain foundation models, or BFMs for short, is simple: generically modelling the relationships between various input channels requires a large amount of raw data. Be it drivers on a simulator, gamers playing Atari, athletes playing memory games, or console-based operators carrying out their daily operations, we can collect data from all across this domain, from sensors far and wide, and feed it all into a foundation model in a process known as pre-training. This foundation model then learns to find generic relationships in the data that normal models, trained to accept only specific inputs, simply cannot.
Pre-training pipeline: we use raw EEG data recorded from various experiments and feed it into a Foundation Model, which is usually an Encoder-Decoder system. The middle value, B, is an intermediate representation which we consider highly important for our studies, as it represents the EEG signal generically.
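To make this pipeline a little more concrete, here is a minimal sketch in PyTorch of what one such encoder-decoder pre-training step could look like. The shapes, layer sizes and masking ratio are our own illustrative assumptions, not the architecture of any published BFM.

```python
# A minimal sketch of the pre-training idea: an encoder maps raw EEG windows to an
# intermediate representation "B", and a decoder tries to reconstruct the (partially
# masked) input from it. All sizes here are illustrative assumptions.
import torch
import torch.nn as nn

class TinyEEGAutoencoder(nn.Module):
    def __init__(self, n_channels=32, window_len=256, latent_dim=128):
        super().__init__()
        in_dim = n_channels * window_len
        self.encoder = nn.Sequential(
            nn.Flatten(),                     # (batch, channels, time) -> (batch, channels*time)
            nn.Linear(in_dim, 512), nn.GELU(),
            nn.Linear(512, latent_dim),       # this output plays the role of "B"
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.GELU(),
            nn.Linear(512, in_dim),
        )
        self.n_channels, self.window_len = n_channels, window_len

    def forward(self, x):
        b = self.encoder(x)                                   # intermediate representation B
        recon = self.decoder(b).view(-1, self.n_channels, self.window_len)
        return recon, b

# One pre-training step: mask part of the signal and learn to reconstruct it.
model = TinyEEGAutoencoder()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
eeg = torch.randn(8, 32, 256)                 # a batch of raw EEG windows (stand-in data)
mask = (torch.rand_like(eeg) > 0.25).float()  # hide roughly 25% of the samples
recon, b = model(eeg * mask)
loss = ((recon - eeg) ** 2 * (1 - mask)).mean()  # penalise error only on the masked samples
loss.backward()
optimizer.step()
```

In real BFMs the encoder is a far larger transformer and the objective is more sophisticated, but the idea is the same: B is learnt purely from reconstructing raw signals, with no task labels involved.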
Foundation models are able to generalise to various EEG waveforms, and they are largely able to find relationships across both the spatial and temporal dimensions. What this means is that, for an array of input channels from various EEG electrodes, the model identifies spatial features by analysing the data across channels at the same timestamp. The model can similarly identify temporal features by analysing individual channels across timestamps.

Spatiotemporal feature extraction: this is an example of a block from CBraMod, a recently introduced BFM that strives to capture both spatial- and temporal-level information using its S-Attention and T-Attention mechanisms. It uses the attention method popularised by Vaswani et al., applied at both the temporal (horizontal) and spatial (vertical) levels.
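To illustrate the spatial-vs-temporal split, here is a rough sketch of an axial-attention block over a (channels × time-patches) grid, using plain torch.nn.MultiheadAttention. This is our own simplification for illustration; it is not CBraMod’s actual S-Attention/T-Attention implementation.

```python
# A simplified spatial-then-temporal attention block over embedded EEG patches.
import torch
import torch.nn as nn

class AxialEEGBlock(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (batch, channels, time_patches, dim) -- each patch already embedded
        b, c, t, d = x.shape

        # Spatial attention: for each timestamp, attend across the electrode channels.
        xs = x.permute(0, 2, 1, 3).reshape(b * t, c, d)
        xs, _ = self.spatial_attn(xs, xs, xs)
        x = x + xs.reshape(b, t, c, d).permute(0, 2, 1, 3)

        # Temporal attention: for each channel, attend across its own timestamps.
        xt = x.reshape(b * c, t, d)
        xt, _ = self.temporal_attn(xt, xt, xt)
        return x + xt.reshape(b, c, t, d)

block = AxialEEGBlock()
patches = torch.randn(2, 32, 16, 64)   # 2 windows, 32 electrodes, 16 time patches
out = block(patches)                   # same shape, now spatially and temporally mixed
```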
For downstream tasks like cognitive load monitoring or imagined speech decoding, a foundation model is incredibly useful: it converts input signals into a “vector” representation that captures these generic relationships. You can think of this vector as describing the brain activity, and we can then train downstream models on top of it using a method known as supervised fine-tuning (SFT), where we train models to learn the key relationships for specific tasks.
SFT pipeline: we assemble clean Question-Answer pairs specific to our use case and feed them to the foundation model, training it to generate one of the following: (1) the next value in an EEG sequence for forecasting; (2) a label for EEG-based classification; (3) a cluster for EEG-based clustering; (4) missing values in an EEG array for EEG imputation.
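Here is a minimal sketch of what SFT could look like for one of these tasks, EEG-based classification (say, low vs high cognitive load). It reuses the toy encoder from the pre-training sketch above; the labels, head size and hyperparameters are hypothetical.

```python
# Supervised fine-tuning sketch: freeze the pre-trained encoder, train a small
# task-specific head on labelled EEG windows.
import torch
import torch.nn as nn

encoder = TinyEEGAutoencoder().encoder        # pre-trained weights would be loaded here
for p in encoder.parameters():
    p.requires_grad = False                   # freeze the foundation model's encoder

head = nn.Linear(128, 2)                      # small task head: 2 cognitive-load levels
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Our "Question-Answer" pairs for this task: labelled EEG windows.
eeg = torch.randn(8, 32, 256)                 # stand-in EEG windows
labels = torch.randint(0, 2, (8,))            # stand-in cognitive-load labels

embeddings = encoder(eeg)                     # the generic "vector" describing brain activity
loss = criterion(head(embeddings), labels)
loss.backward()
optimizer.step()
```

The key design point is that only the small head is trained per task, while the shared encoder keeps producing the same kind of vector for every downstream use case.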
This approach could solve that same problem of experimentation fatigue, as we can generically identify relations across different sets of sensors, teaching the model to generalise not on raw input value ranges but on the overall waveform. Hence, the baseline foundation model stays standardised: even as newer sensors introduce new knowledge, we can update the foundation model using a method known as continual pre-training (CPT), which allows us to expand its knowledge without starting over. At the same time, we can expect fine-tuned models to remain consistent, since the final vector representation from the model should stay standardised even after CPT is performed.
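Here is a sketch of how we imagine CPT slotting in: when a new sensor arrives, we resume the same reconstruction objective on its data at a gentler learning rate, keeping the encoder’s output dimension fixed so previously fine-tuned heads remain compatible. Again, the specifics are assumptions for illustration.

```python
# Continual pre-training sketch: keep the same model and objective, just continue
# training on data from the replacement sensor at a lower learning rate.
import torch

model = TinyEEGAutoencoder()                    # in practice: load the existing checkpoint
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # gentler LR than pre-training

new_sensor_eeg = torch.randn(8, 32, 256)        # stand-in data from the new sensor
mask = (torch.rand_like(new_sensor_eeg) > 0.25).float()
recon, b = model(new_sensor_eeg * mask)
loss = ((recon - new_sensor_eeg) ** 2 * (1 - mask)).mean()
loss.backward()
optimizer.step()

# Heads fine-tuned before CPT can still consume `b`, because its dimensionality
# (and, we hope, its meaning) is kept stable across sensor generations.
```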
It was fortuitous that, while we were in the midst of exploring BFMs, a parallel development turned out to be remarkably timely. Microsoft had invited RAiD, and through it AETHER, to their office in Singapore to attend a course on the development and fine-tuning of LLMs. This was a unique opportunity, as no other cloud provider offered this sort of workshop. (Shoutout to Microsoft!) Many of the terms we have used in the past few paragraphs, such as pre-training, SFT and CPT, are terms we picked up from this course.

RAiD's delegation to Microsoft's Office for the Build-your-LLM Course. We cannot include pictures of the slides as they are under NDA and we want to avoid litigation 😳
A Summary
From this whole experience, here are a few tidbits of information worth remembering.
Firstly, as we move towards building such technologies, getting perspectives from knowledgeable individuals is crucial to pushing the frontier of research. Even then, we learnt that experts don’t hold all the answers, especially with the technology landscape evolving so quickly. Consult, but also independently verify: we must do our own homework and research to make sure we don’t fall behind. Not all research is spelled out for us, and it is ultimately our job to understand how these systems work.
Secondly, we are aware of experimentation fatigue, and we are divinely discontent with getting locked into a cycle of collecting, curating and throwing away data over the years. We are digging deep into the sciences to figure out how to better address this.
Thirdly, finding and seizing opportunities is incredibly important. Microsoft’s timely course on building LLMs from scratch helped us further understand BFMs at an algorithm-construction level.
Next, catch us trying to explain our brain foundation models.
References
1. Ref 1
2. Ref 2
3. Ref 3
4. Ref 4