Power-efficient VO2max prediction with a TCN-SNN model

Think of this as a simplified version of the VO2max engine behind a sports watch: a Garmin- or Apple Watch-style device reads exercise signals such as speed, heart rate, breathing rate, and ventilation, then estimates aerobic fitness. This project keeps that idea small enough for low-power devices by using sparse spike signals and pruning redundant model weights.

Read signals: exercise data becomes 60-second windows

Spike only when needed: SNN neurons fire compact 0/1 events

Shrink the model: pruning and 8-bit weights reduce cost

What the project does

Estimate aerobic fitness without a bulky lab setup

VO2max is a strong marker of aerobic capacity, but direct lab measurement is expensive and inconvenient. This project uses treadmill test signals and subject metadata to train a compact temporal model that can run closer to a wearable device.

Input signals

Speed, HR, RR, VE, age, weight, height, and sex are sampled into 60-second windows.

Temporal model

Dilated TCN blocks extract local and longer-range exercise dynamics from each window.

Spiking layer

Two LIF-style SNN layers integrate current, leak membrane potential, and emit sparse spikes.

Deployment path

Magnitude pruning and 8-bit quantization reduce model cost while lowering R2 by less than 0.02 on both outputs.

Data

Merge and clean

Join subject information with treadmill measurements by ID_test, drop rows with missing VO2, VCO2, VE, or HR values, and encode sex as a numeric feature.
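The merge-and-clean step can be sketched in dependency-free Python. This is a minimal illustration, not the project's code: only ID_test and the four required signals come from the description above, while the other column names, the SEX_CODES mapping, and the sample rows are assumptions made up for the example.

```python
import math

# Hypothetical encoding: the source only says sex becomes a numeric feature.
SEX_CODES = {"male": 0.0, "female": 1.0}
# Rows missing any of these signals are dropped.
REQUIRED = ("VO2", "VCO2", "VE", "HR")


def is_missing(value):
    """True for None or NaN, the two 'missing' markers this sketch handles."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def merge_and_clean(subjects, measurements):
    """Inner-join measurements with subject metadata on ID_test, then drop incomplete rows."""
    by_test = {s["ID_test"]: s for s in subjects}
    cleaned = []
    for row in measurements:
        subject = by_test.get(row["ID_test"])
        if subject is None:
            continue  # measurement without matching metadata
        if any(is_missing(row.get(key)) for key in REQUIRED):
            continue  # a required physiological value is missing
        merged = {**subject, **row}
        merged["Sex"] = SEX_CODES[subject["Sex"]]
        cleaned.append(merged)
    return cleaned


# Toy rows, not taken from the dataset.
subjects = [{"ID_test": "1_1", "Age": 30.0, "Weight": 70.0, "Height": 175.0, "Sex": "male"}]
measurements = [
    {"ID_test": "1_1", "time": 0, "Speed": 5.0, "HR": 90.0, "RR": 18.0,
     "VO2": 500.0, "VCO2": 450.0, "VE": 15.0},
    {"ID_test": "1_1", "time": 1, "Speed": 5.0, "HR": float("nan"), "RR": 18.0,
     "VO2": 510.0, "VCO2": 455.0, "VE": 15.2},
    {"ID_test": "9_9", "time": 0, "Speed": 5.0, "HR": 88.0, "RR": 17.0,
     "VO2": 480.0, "VCO2": 430.0, "VE": 14.0},
]
rows = merge_and_clean(subjects, measurements)
```

Here only the first measurement survives: the second has a missing HR value and the third has no matching subject record.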

60s

Create windows

Interpolate each test to 1 Hz, slice 60-second sequences, and use 50% overlap to preserve temporal continuity.
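The windowing step can be sketched as follows. The function names are hypothetical, linear interpolation is an assumption (the source does not say which interpolation is used), and dropping a trailing partial window is also an assumption.

```python
import math


def resample_1hz(times, values):
    """Linearly interpolate irregular samples onto whole seconds.

    Assumes `times` is strictly increasing and holds at least two samples.
    """
    start, end = math.ceil(times[0]), math.floor(times[-1])
    out = []
    j = 0  # index of the left neighbour of the current grid point
    for t in range(start, end + 1):
        while j + 1 < len(times) - 1 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        frac = (t - t0) / (t1 - t0)
        out.append(values[j] + frac * (values[j + 1] - values[j]))
    return out


def sliding_windows(samples, length=60, overlap=0.5):
    """Slice a 1 Hz series into fixed-length windows; 50% overlap gives a 30-sample hop."""
    step = max(1, int(length * (1 - overlap)))
    last_start = len(samples) - length
    # Trailing samples that do not fill a whole window are dropped (assumption).
    return [samples[i:i + length] for i in range(0, last_start + 1, step)]


resampled = resample_1hz([0.0, 2.5, 5.0], [0.0, 25.0, 50.0])
signal = list(range(150))           # stand-in for 150 s of one 1 Hz feature
windows = sliding_windows(signal)   # windows start at 0, 30, 60, and 90 s
```

With a 60-second window and 50% overlap, each sample after the first 30 seconds appears in two windows, which is what preserves continuity between neighbouring windows.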

TCN

Read time context

Four Conv1d blocks use dilation 1, 2, 4, and 8 to widen the receptive field while keeping the sequence length.
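To see how the four dilations widen the receptive field, here is a small pure-Python sketch. The kernel size of 3, one convolution per block, and symmetric zero padding are assumptions: the source gives only the dilations and says the sequence length is preserved.

```python
KERNEL_SIZE = 3          # assumption: not stated in the source
DILATIONS = (1, 2, 4, 8)


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked dilated convolutions, one convolution per block."""
    return 1 + (kernel_size - 1) * sum(dilations)


def same_padding(kernel_size, dilation):
    """Zero padding per side that keeps the output as long as the input (odd kernels)."""
    return dilation * (kernel_size - 1) // 2


def dilated_conv1d(x, kernel, dilation):
    """Single-channel dilated cross-correlation with length-preserving zero padding."""
    pad = same_padding(len(kernel), dilation)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t + i * dilation - pad
            if 0 <= idx < len(x):  # positions outside the window count as zeros
                acc += w * x[idx]
        out.append(acc)
    return out


# Push a single impulse through the four layers and see how far it spreads.
x = [0.0] * 60
x[30] = 1.0
for d in DILATIONS:
    x = dilated_conv1d(x, [1.0] * KERNEL_SIZE, d)
touched = [i for i, v in enumerate(x) if v != 0.0]
rf = receptive_field(KERNEL_SIZE, DILATIONS)
```

Under these assumptions the receptive field is 1 + 2 × (1 + 2 + 4 + 8) = 31 time steps, so each output position sees about half of a 60-second window, and the output still has 60 positions.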

SNN

Spike and read out

Spiking layers convert the TCN output into a sparse time code, average it, then predict VO2 and VO2max with two heads.
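A single leaky integrate-and-fire neuron and a rate-based readout can be sketched like this. The leak factor, the hard reset to zero, and the head weights are all assumptions chosen for illustration; the source specifies only that the layers integrate, leak, spike, and are averaged before two output heads.

```python
def lif_neuron(currents, v_th=1.0, leak=0.9):
    """Simulate one LIF neuron over time and return its 0/1 spike train."""
    v = 0.0
    spikes = []
    for i_t in currents:
        v_pre = leak * v + i_t          # leak the old potential, then integrate input
        s = 1 if v_pre > v_th else 0    # fire only above the threshold
        spikes.append(s)
        v = 0.0 if s else v_pre         # hard reset after a spike (assumption)
    return spikes


def readout(spike_trains, weights, bias):
    """Average each spike train over time, then apply one linear head."""
    rates = [sum(train) / len(train) for train in spike_trains]
    return sum(w * r for w, r in zip(weights, rates)) + bias


# Two toy neurons: one driven strongly, one driven too weakly to fire.
trains = [lif_neuron([0.6] * 6), lif_neuron([0.05] * 6)]

# Two heads share the same averaged spike code (hypothetical weights).
vo2_pred = readout(trains, weights=[2.0, 4.0], bias=1.0)
vo2max_pred = readout(trains, weights=[3.0, 1.0], bias=0.5)
```

The strongly driven neuron fires on every second step and the weakly driven one stays silent, so most entries in the spike code are zero. That sparsity is what makes the representation cheap on low-power hardware.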

Surrogate gradient implementation

Why the spiking layer can still be trained

A spiking neuron outputs only 0 or 1. That is efficient for a wearable device, but it creates a training problem: the gradient of a hard 0/1 step is zero everywhere except at the threshold, where it is undefined, so ordinary backpropagation receives no learning signal. The workaround is to keep the hard spike in the forward pass, then use a smooth surrogate slope during backpropagation so the model can learn.

Formula source: the project report section "Surrogate Gradient Implementation". The report cites Li et al. (2021), "Wearable-based Human Activity Recognition with Spatio-Temporal Spiking Neural Networks", for the surrogate-gradient idea, and Liu and Wang (2001) for the LIF spiking-neuron background.

1. Spike decision

s^{(t+1)} = \begin{cases} 1, & \text{if } v^{(t+1, pre)} > V_th \\ 0, & \text{otherwise} \end{cases}

The neuron fires only when membrane potential passes the threshold.

2. Surrogate slope

∂s^(t) / ∂v^{(t, pre)} = max(0, 1 - |v^{(t, pre)} / V_th - 1|)

Training uses this triangular slope near the threshold instead of a zero gradient.

3. Implementation form

∂s^(t) / ∂v^{(t, pre)} = 1 - |v^{(t, pre)} / V_th - 1|

The report uses this simplified linear form inside the active surrogate region, 0 < v^{(t, pre)} < 2 V_th, where the max(0, ·) clamp has no effect.
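The three formulas above can be checked with a few lines of Python. This is a sketch of the idea only: the function names and the squared-error toy loss are assumptions, and in an autograd framework the same pair would normally be wired into a custom backward rule.

```python
def spike(v_pre, v_th=1.0):
    """Forward pass: hard threshold, exactly as in the spike decision."""
    return 1.0 if v_pre > v_th else 0.0


def surrogate_grad(v_pre, v_th=1.0):
    """Backward pass: triangular slope that peaks at the threshold."""
    return max(0.0, 1.0 - abs(v_pre / v_th - 1.0))


def grad_wrt_potential(v_pre, target, v_th=1.0):
    """Chain rule for the toy loss L = (s - target)^2, using the surrogate for ds/dv."""
    s = spike(v_pre, v_th)
    dloss_ds = 2.0 * (s - target)
    return dloss_ds * surrogate_grad(v_pre, v_th)


# The neuron is just below threshold but should have fired (target = 1).
g = grad_wrt_potential(v_pre=0.8, target=1.0)
```

In this toy case the true derivative of the step would be zero, but the surrogate yields a negative gradient of about -1.6. A gradient-descent update therefore raises the membrane potential toward the threshold.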

Interactive demo

Follow one physiological window through the model

The canvas turns the project workflow into a readable animation: raw treadmill signals become overlapping windows, the TCN scans the sequence, the SNN emits sparse spikes, then compression removes redundant weights.

Ramp test

The workout gets harder step by step, like a treadmill test where speed rises until the athlete is near max effort.

Steady effort

The athlete stays at a mostly stable pace. Signals change slowly, so the model sees a smoother pattern.

Interval effort

Hard and easy segments alternate. The model must react to quick jumps in heart-rate and breathing signals.

Current phase

Merge and clean signals

Subject metadata and treadmill measurements are joined by ID_test, cleaned, sorted, and converted into numeric features.

window: 60 s
features: 8
model shape: B x 60 x 8

Data: Merge, clean, sort, and normalize physiological measurements.
Window: Interpolate to 1 Hz and create 60-second windows with 50% overlap.
TCN + SNN: Extract temporal patterns, integrate membrane potential, and emit spikes.
Compress: Prune low-magnitude weights, fine-tune, then quantize to 8-bit levels.
Output: Predict VO2 and VO2max, then compare against true lab measurements.
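The final comparison against lab measurements relies on RMSE and R2, the two metrics reported in the results. The sketch below uses their standard definitions; the numbers are toy values for illustration, not project data.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 minus residual variance over total variance."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy targets and predictions (made up for this example).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 3.0, 3.5]
error = rmse(y_true, y_pred)
fit = r2_score(y_true, y_pred)
```

RMSE stays in the units of the prediction, while R2 is unitless, which is why the results can report both a change in ml/min and a change in R2.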

Results

Compression keeps most of the predictive signal

The model remains stable after 70% pruning and 8-bit quantization. The largest change is a modest VO2 RMSE increase, while VO2max RMSE changes by only 10 ml/min.

VO2 R2 after compression: 0.878

Baseline was 0.893, an absolute drop of 0.015.

VO2max R2 after compression: 0.579

Baseline was 0.596, an absolute drop of 0.017.

Pruning target: 70%

Global magnitude pruning over Conv1d and Linear weights.
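Global magnitude pruning followed by 8-bit quantization can be sketched in plain Python. Several details are assumptions: the layer names and weights are toy values, the quantizer is symmetric with one scale per layer, and the fine-tuning pass between the two steps is omitted. The source does not specify the quantization scheme.

```python
def global_magnitude_prune(layers, sparsity=0.7):
    """Zero the smallest-magnitude weights across ALL layers at once.

    `layers` maps a layer name to a flat list of weights. Ties at the
    threshold are pruned too, so the result can slightly exceed the target.
    """
    magnitudes = sorted(abs(w) for weights in layers.values() for w in weights)
    n_prune = round(len(magnitudes) * sparsity)
    if n_prune == 0:
        return {name: list(weights) for name, weights in layers.items()}
    threshold = magnitudes[n_prune - 1]
    return {
        name: [0.0 if abs(w) <= threshold else w for w in weights]
        for name, weights in layers.items()
    }


def quantize_int8(weights):
    """Symmetric 8-bit quantization: integers in [-127, 127] plus one float scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


# Toy weights standing in for Conv1d and Linear parameters.
layers = {
    "conv": [0.9, -0.05, 0.3, 0.01, -0.7],
    "linear": [0.02, -0.4, 0.08, 0.6, -0.1],
}
pruned = global_magnitude_prune(layers, sparsity=0.7)
zeros = sum(w == 0.0 for weights in pruned.values() for w in weights)
q_conv, conv_scale = quantize_int8(pruned["conv"])
restored = [q * conv_scale for q in q_conv]
```

Because the threshold is computed over all layers together, a layer with mostly small weights can lose more than 70% of them while another loses less. Pruned weights stay exactly zero after quantization, so the sparsity survives the second step.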

Data & methodology

Sources and references

This demo is built on a published research dataset and an established backpropagation method for spiking neural networks.

Source dataset. The treadmill maximal-exercise time series come from the Exercise Physiology and Human Performance Lab of the University of Málaga and are published on PhysioNet: Mongin, García Romero, and Álvero Cruz (2021), “Treadmill maximal exercise tests from the Exercise Physiology and Human Performance Lab of the University of Malaga”, version 1.0.1. DOI 10.13026/7ezk-j442 (RRID: SCR_007345).

Surrogate-gradient training for SNNs. Backpropagation through the non-differentiable spike emission follows the surrogate-gradient approach described in Li et al. (2021), “Wearable-based human activity recognition with spatio-temporal spiking neural networks”, arXiv:2108.05047. arxiv.org/abs/2108.05047