Predicting vertical ground reaction force (vGRF) from a single lower-back accelerometer during countermovement jumps using a dual Functional Principal Component Analysis (FPCA) representation and a simple MLP.
A simple MLP (~12K parameters) with FPC representation outperforms a transformer (~750K parameters). Signal representation, not model architecture, is the primary determinant of prediction quality.
| Dataset | Signal R² | JH RMSE (rRMSE) | PP RMSE (rRMSE) |
|---|---|---|---|
| No arm swing (72 val. trials) | 0.976 ± 0.002 | 4.2 ± 0.4 cm (10.3%) | 4.2 ± 0.4 W·kg⁻¹ (9.1%) |
| Arm swing (64 val. trials) | 0.964 ± 0.001 | 5.3 ± 0.5 cm (10.6%) | 5.6 ± 0.4 W·kg⁻¹ (10.3%) |
| Both conditions (144 val. trials) | 0.970 ± 0.001 | 4.3 ± 0.2 cm (9.8%) | 4.3 ± 0.2 W·kg⁻¹ (8.8%) |
JH = jump height; PP = peak power; rRMSE = relative RMSE (% of ground truth mean). Validated on 663 jumps from 67 participants with participant-level train/validation splits.
This project maps accelerometer signals to GRF curves, enabling force plate-quality biomechanical metrics (jump height, peak power) from a single wearable sensor. The approach uses:
- Functional Principal Component Analysis (FPCA) to represent both input (ACC) and output (GRF) signals as low-dimensional score vectors
- A simple MLP (single hidden layer) to learn the mapping between FPC scores
- Resultant acceleration input, which is invariant to sensor orientation and outperforms triaxial input for the FPC representation
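The dual-FPCA idea (representing both ACC and GRF curves as low-dimensional score vectors) can be sketched with a plain SVD. This is an illustrative stand-in for the scikit-fda transforms in `src/transformations.py`, not the project's implementation; all names below are hypothetical:

```python
import numpy as np

def fit_fpca(signals, n_components=15):
    """Fit FPCA via SVD of mean-centred signals (each row is one trial)."""
    mean = signals.mean(axis=0)
    centred = signals - mean
    # Right singular vectors are the (discretised) eigenfunctions,
    # ordered by explained variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return mean, vt[:n_components]          # (n_components, n_timepoints)

def to_scores(signals, mean, components):
    return (signals - mean) @ components.T  # (n_trials, n_components)

def from_scores(scores, mean, components):
    return mean + scores @ components       # reconstructed signals

# Toy demo: 40 noisy sinusoid "trials", 500 samples each
t = np.linspace(0, 1, 500)
trials = np.sin(2 * np.pi * t) + 0.1 * np.random.default_rng(0).normal(size=(40, 500))
mean, comps = fit_fpca(trials, n_components=15)
scores = to_scores(trials, mean, comps)     # (40, 15) low-dimensional codes
recon = from_scores(scores, mean, comps)    # near-lossless reconstruction
```

The model only ever sees the 15-dimensional score vectors; the inverse transform maps its predictions back to full-length curves.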
```
acc2grf_prediction/
├── data/
│   ├── cmj_dataset_arms.npz          # Arms condition dataset (generated, not tracked)
│   ├── cmj_dataset_noarms.npz        # No-arms condition dataset (generated, not tracked)
│   └── cmj_dataset_both.npz          # Both conditions dataset (generated, not tracked)
├── src/
│   ├── __init__.py                   # Package initialization
│   ├── attention.py                  # Multi-head self-attention (legacy)
│   ├── transformer.py                # Transformer model (legacy)
│   ├── mlp.py                        # MLP model (recommended)
│   ├── transformations.py            # FPCA and B-spline transforms
│   ├── data_loader.py                # Data loading and preprocessing
│   ├── visualize_data.py             # Data inspection and debugging plots
│   ├── biomechanics.py               # Jump height and peak power calculations
│   ├── losses.py                     # Custom loss functions
│   ├── evaluate.py                   # Model evaluation metrics
│   └── train.py                      # Training script with CLI
├── scripts/
│   ├── prepare_dataset.py            # Preprocessing: MATLAB → .npz
│   ├── inspect_data_quality.py       # Accelerometer quality checks
│   ├── run_all_experiments.sh        # Full experiment suite
│   ├── run_resultant_experiments.sh  # Resultant-only experiments
│   ├── visualize_projection.py       # FPC projection matrix visualization
│   └── visualize_random_samples.py   # ACC/GRF signal grid visualization
├── results_noarms/                   # Reported results: no arm swing condition
├── results_arms/                     # Reported results: arm swing condition
├── results_both/                     # Reported results: both conditions
├── notebooks/
│   └── visualise_predictions.ipynb
├── requirements.txt
├── EXPERIMENTS.md                    # Detailed experiment log
├── EXCLUSIONS.md                     # Data quality analysis and exclusions
└── README.md
```
```bash
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

The dataset must be generated from the original MATLAB source files before training:
```bash
python scripts/prepare_dataset.py --data-dir /path/to/matlab/files
```

This extracts the specified condition, merges duplicate participants, applies quality filters (including exclusion of trials with sensor miscalibration and ADC clipping; see EXCLUSIONS.md), and saves a portable `.npz` file. Generate all three condition datasets with:
```bash
python scripts/prepare_dataset.py --conditions arms --output data/cmj_dataset_arms.npz
python scripts/prepare_dataset.py --conditions noarms --output data/cmj_dataset_noarms.npz
python scripts/prepare_dataset.py --conditions both --output data/cmj_dataset_both.npz
```

The source MATLAB files (`AccelerometerSignals.mat`, `GRFFeatures.mat`, `processedjumpdata.mat`) are not included in this repository.
To inspect accelerometer data quality before generating the dataset:
```bash
python scripts/inspect_data_quality.py --data-path data/cmj_dataset_both.npz
```

To skip quality exclusions and use all available trials:

```bash
python scripts/prepare_dataset.py --no-exclude-quality
```

| Contents | Shape | Units |
|---|---|---|
| ACC signals | Variable length × 3 | g (triaxial, 250 Hz) |
| GRF signals | Variable length | BW (body weight, 1000 Hz) |
| Subject IDs | (N,) | 0-indexed, 67 unique |
| Jump height | (N,) | metres |
| Peak power | (N,) | W/kg |
```bash
python -m src.train \
    --data-path data/cmj_dataset_noarms.npz \
    --model-type mlp --mlp-hidden 128 \
    --input-transform fpc --output-transform fpc \
    --loss reconstruction \
    --simple-normalization \
    --n-trials 5 --seed 42 \
    --epochs 200
```

This uses the resultant acceleration (default) and achieves signal R² = 0.976, JH RMSE = 4.2 cm, and PP RMSE = 4.2 W·kg⁻¹.
Before training, verify your data loading is working correctly:
```bash
python -m src.visualize_data
```

This generates diagnostic plots in `outputs/figures/` and runs sanity checks.
Visualize how ACC functional principal components map to GRF components:
```bash
python scripts/visualize_projection.py --data-path data/cmj_dataset_noarms.npz --n-display 3 --top-k 3 --output-dir outputs/projection_viz
```

This generates three complementary figures:
| Figure | Description |
|---|---|
| `projection_combined.png` | Overview showing ACC eigenfunctions, projection matrix heatmaps, and GRF eigenfunctions |
| `projection_contributions.png` | Per-GRF-FPC breakdown showing top contributing ACC components with weights |
| `biomechanics_fpc.png` | Traditional biomechanics-style visualization with mean ± 2SD bands |
The biomechanics figure shows GRF FPCs in the top row with their top-k ACC contributors below, using the traditional mean ± standard deviation representation. This reveals how acceleration patterns during different movement phases (quiet standing, unweighting, braking, propulsion) contribute to force production.
Arguments:
| Argument | Default | Description |
|---|---|---|
| `--n-display` | 3 | Number of FPC components to display |
| `--top-k` | 3 | Number of top ACC contributors per GRF FPC |
| `--n-components` | 15 | Total FPC components for the transformation |
| `--output-dir` | `outputs/projection_visualization` | Output directory |
| `--dpi` | 150 | Figure resolution |
Training arguments (`python -m src.train`):

| Argument | Default | Description |
|---|---|---|
| `--data-path` | required | Path to dataset `.npz` file (e.g. `data/cmj_dataset_both.npz`) |
| `--model-type` | `transformer` | Model type: `mlp` (recommended) or `transformer` |
| `--mlp-hidden` | 128 | MLP hidden layer size |
| `--use-triaxial` | `False` | Use 3D acceleration (the default resultant is recommended for FPC) |
| `--input-transform` | `raw` | Input transform: `raw`, `bspline`, or `fpc` |
| `--output-transform` | `raw` | Output transform: `raw`, `bspline`, or `fpc` |
| `--loss` | `mse` | Loss function: `mse`, `reconstruction`, or `signal_space` |
| `--simple-normalization` | `False` | Use global z-score (recommended: `True`) |
| `--n-trials` | 1 | Number of trials for statistical validation |
| `--epochs` | 100 | Maximum training epochs |
| `--batch-size` | 32 | Training batch size |
| `--learning-rate` | 1e-4 | Adam learning rate |
| `--patience` | 15 | Early stopping patience |
| `--output-dir` | `outputs` | Output directory |
| `--run-name` | timestamp | Experiment name |
To run all 12 experiments (4 representations × {Transformer, MLP} × {triaxial, resultant}) for a given dataset condition:
```bash
bash scripts/run_all_experiments.sh arms
```

Valid conditions are `arms`, `noarms`, or `both` (default). To run all three conditions unattended (e.g. overnight), use `MPLBACKEND=Agg` to prevent matplotlib from opening figure windows:
```bash
nohup env MPLBACKEND=Agg bash -c \
  'bash scripts/run_all_experiments.sh arms && \
   bash scripts/run_all_experiments.sh noarms && \
   bash scripts/run_all_experiments.sh both' \
  > overnight_run.log 2>&1 &
```

Monitor progress with `tail -f overnight_run.log`. Results are saved to `results_<condition>/`.
The best-performing architecture is surprisingly simple:
```
ACC signal (500×1) → FPCA → FPC scores (15) → MLP → FPC scores (15) → Inverse FPCA → GRF signal (500×1)
```
MLP Architecture:
- Input: 15 features (15 FPCs for resultant acceleration)
- Hidden: 128 neurons with ReLU activation
- Output: 15 features (15 FPCs for GRF)
- Parameters: ~12K
Why MLP beats Transformer:
- FPC representation does the heavy lifting — the mean function captures the typical CMJ shape, so the model only learns deviations
- Attention adds no value — the mapping from ACC FPCs to GRF FPCs doesn't benefit from temporal attention
- Simpler models generalize better with limited data (~277 training samples)
The transformer architecture (~750K parameters) is still available but not recommended:
- Input Projection: Linear layer mapping input dimension to d_model
- Positional Encoding: Learnable position embeddings for 500 timesteps
- Encoder Stack: N transformer encoder blocks with multi-head self-attention
- Output Projection: Linear layer mapping d_model to output dimension
- Source: Vertical countermovement jumps (arm swing and no arm swing conditions)
- Participants: 67 unique (after exclusion of clipped and miscalibrated trials)
- Jumps: 663 total (split across three dataset conditions: no arm swing, arm swing, both)
- Accelerometer: Lower back sensor, triaxial (x, y, z) in g units at 250 Hz
- Signal Mode: Resultant acceleration √(x² + y² + z²) or raw triaxial
- Preprocessing: Padded/truncated to 500 points (2000 ms pre-takeoff), z-score normalised
- GRF: Vertical ground reaction force in body weight (BW) units at 1000 Hz (downsampled to 250 Hz)
- Preprocessing: Already BW-normalised in source data, z-score normalised
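The input preprocessing above can be sketched in a few lines of NumPy. This is a minimal illustration of the resultant + pad/truncate + z-score pipeline; the function name is hypothetical and the real logic in `src/data_loader.py` may differ (e.g. in how windows are aligned to takeoff):

```python
import numpy as np

def preprocess_acc(acc_xyz, target_len=500):
    """Resultant acceleration, fixed length, z-score normalised (sketch)."""
    # Orientation-invariant magnitude: sqrt(x^2 + y^2 + z^2)
    resultant = np.linalg.norm(acc_xyz, axis=-1)
    if len(resultant) >= target_len:
        # Keep the last target_len samples (2000 ms pre-takeoff at 250 Hz);
        # alignment choice is an assumption for illustration.
        resultant = resultant[-target_len:]
    else:
        # Left-pad short trials to the fixed length
        resultant = np.pad(resultant, (target_len - len(resultant), 0))
    # Z-score normalisation
    return (resultant - resultant.mean()) / resultant.std()

# Toy triaxial trial: 700 samples of 3-axis acceleration
sig = preprocess_acc(np.random.default_rng(1).normal(size=(700, 3)))
```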
- Participant-level train/validation split (no data leakage)
- Default: 80% train, 20% validation
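A participant-level split can be expressed with scikit-learn's `GroupShuffleSplit`, where the group labels are subject IDs. This is a sketch of the idea, not necessarily how the project's loader implements it:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy stand-ins: one subject ID per jump, placeholder signals
rng = np.random.default_rng(0)
subject_ids = rng.integers(0, 67, size=663)
X = np.zeros((663, 500))

# Split whole participants, not individual jumps, so no subject
# contributes trials to both train and validation (no leakage).
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, val_idx = next(splitter.split(X, groups=subject_ids))
```

Note that `test_size=0.2` here holds out ~20% of *participants*, so the trial-level split ratio varies slightly with how many jumps each held-out subject performed.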
- RMSE (Root Mean Square Error)
- MAE (Mean Absolute Error)
- R² (Coefficient of Determination)
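For reference, the three signal-level metrics reduce to a few lines of NumPy. This helper is illustrative and not the exact implementation in `src/evaluate.py`:

```python
import numpy as np

def signal_metrics(y_true, y_pred):
    """RMSE, MAE and R^2 over flattened signal arrays (sketch)."""
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return rmse, mae, 1.0 - ss_res / ss_tot

# Perfect prediction: rmse=0, mae=0, r2=1
rmse, mae, r2 = signal_metrics(np.array([1.0, 2.0]), np.array([1.0, 2.0]))
```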
Derived from the predicted GRF using the impulse-momentum method:
- Jump Height: Computed via double integration of net force
- Peak Power: Maximum instantaneous power (F × v)
Both metrics compared against ground truth with RMSE, MAE, R², and Bland-Altman analysis.
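The impulse-momentum calculation can be sketched as below, assuming the GRF is in body-weight units and the window ends at takeoff. The project's version lives in `src/biomechanics.py` and may differ (e.g. trapezoidal integration, takeoff detection), so treat this as an illustrative sketch:

```python
import numpy as np

G = 9.81    # gravitational acceleration, m/s^2
FS = 250.0  # Hz (GRF downsampled to 250 Hz)

def jump_metrics(grf_bw):
    """Jump height (m) and peak power (W/kg) from vGRF in BW units (sketch)."""
    dt = 1.0 / FS
    accel = (grf_bw - 1.0) * G        # net CoM acceleration, m/s^2
    vel = np.cumsum(accel) * dt       # CoM velocity by integration from rest
    power = grf_bw * G * vel          # instantaneous power per kg: (F/m) * v
    v_takeoff = vel[-1]               # assumed: signal ends at takeoff
    height = v_takeoff ** 2 / (2 * G) # flight height from takeoff velocity
    return height, power.max()

# Toy GRF: quiet standing, then a half-sine propulsion impulse
grf = np.concatenate([np.ones(200), 1.0 + 1.5 * np.sin(np.linspace(0, np.pi, 100))])
h, pp = jump_metrics(grf)
```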
After training, the following files are generated:
```
outputs/<run_name>/
├── config.json                 # Training configuration
├── data_info.json              # Dataset statistics
├── evaluation_results.csv      # All metrics in CSV format
├── checkpoints/
│   ├── best_model.keras        # Best validation model
│   ├── final_model.keras       # Final epoch model
│   └── training_log.csv        # Epoch-by-epoch metrics
└── figures/
    ├── prediction_curves.png   # Predicted vs actual GRF (5 samples)
    ├── prediction_grid.png     # Compact 5x6 grid of predictions
    ├── scatter_metrics.png     # Jump height/power scatter
    ├── bland_altman.png        # Agreement analysis
    └── training_history.png    # Loss curves
```
```python
from src.data_loader import CMJDataLoader
from src.mlp import build_mlp_model
from src.evaluate import evaluate_model, print_evaluation_summary

# Load data with FPC transforms (recommended)
loader = CMJDataLoader(
    data_path='data/cmj_dataset_noarms.npz',
    input_transform='fpc',
    output_transform='fpc',
    simple_normalization=True
)
train_ds, val_ds, info = loader.create_datasets()

# Build and train MLP model
model = build_mlp_model(
    input_dim=info['input_dim'],    # 15 for resultant FPC
    output_dim=info['output_dim'],  # 15 for GRF FPC
    hidden_dim=128
)
model.fit(train_ds, validation_data=val_ds, epochs=200)

# Evaluate (X_val, y_val are the validation arrays from the loader's split)
results = evaluate_model(model, X_val, y_val, loader)
print_evaluation_summary(results)
```

From extensive experimentation (see EXPERIMENTS.md):
- Representation matters more than architecture: A simple MLP with 12K parameters outperforms a 750K-parameter transformer
- FPC representation is the key: Functional Principal Components capture biomechanically relevant features that raw signals and B-splines miss
- Resultant acceleration is sufficient: The acceleration magnitude, invariant to sensor orientation, outperforms triaxial input for the FPC representation
- Simple normalization works best: Global z-score outperforms sophisticated robust normalization
- The mapping is approximately linear and sparse: Each GRF FPC is driven by one or two ACC FPCs, explaining why a simple MLP suffices
- Mean function captures the template — the typical CMJ shape is encoded; the model only learns deviations
- Variance-ordered components naturally weight importance — the leading FPCs capture the aspects most relevant to jump performance
- Massive dimensionality reduction: 15 FPCs vs 500 raw samples
- Implicitly addresses MSE limitations — errors concentrate on biomechanically relevant variation rather than the quiet-standing phase
- Python 3.9+
- TensorFlow 2.10+
- NumPy
- SciPy (for MATLAB file loading during data preparation)
- scikit-fda (for FPCA transforms)
- Matplotlib
- scikit-learn