Predicting vertical ground reaction force (vGRF) from a single lower-back accelerometer during countermovement jumps using a dual Functional Principal Component Analysis (FPCA) representation and a simple MLP.
A simple MLP (~12K parameters) with FPC representation outperforms a transformer (~750K parameters). Signal representation, not model architecture, is the primary determinant of prediction quality.
| Dataset | Signal R² | JH RMSE (rRMSE) | PP RMSE (rRMSE) |
|---|---|---|---|
| No arm swing (72 val. trials) | 0.976 ± 0.002 | 4.2 ± 0.4 cm (10.3%) | 4.2 ± 0.4 W·kg⁻¹ (9.1%) |
| Arm swing (64 val. trials) | 0.964 ± 0.001 | 5.3 ± 0.5 cm (10.6%) | 5.6 ± 0.4 W·kg⁻¹ (10.3%) |
| Both conditions (144 val. trials) | 0.970 ± 0.001 | 4.3 ± 0.2 cm (9.8%) | 4.3 ± 0.2 W·kg⁻¹ (8.8%) |
JH = jump height; PP = peak power; rRMSE = relative RMSE (% of ground truth mean). Validated on 663 jumps from 67 participants with participant-level train/validation splits.
This project maps accelerometer signals to GRF curves, enabling force plate-quality biomechanical metrics (jump height, peak power) from a single wearable sensor. The approach uses:
- Functional Principal Component Analysis (FPCA) to represent both input (ACC) and output (GRF) signals as low-dimensional score vectors
- A simple MLP (single hidden layer) to learn the mapping between FPC scores
- Resultant acceleration input, which is invariant to sensor orientation and outperforms triaxial input for the FPC representation
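The dual-FPCA idea (representing both ACC and GRF curves as low-dimensional score vectors) can be sketched with a plain SVD. This is an illustrative stand-in for the scikit-fda transforms in `src/transformations.py`, not the project's implementation; all names below are hypothetical:

```python
import numpy as np

def fit_fpca(signals, n_components=15):
    """Fit FPCA via SVD of mean-centred signals (each row is one trial)."""
    mean = signals.mean(axis=0)
    centred = signals - mean
    # Right singular vectors are the (discretised) eigenfunctions,
    # ordered by explained variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return mean, vt[:n_components]          # (n_components, n_timepoints)

def to_scores(signals, mean, components):
    return (signals - mean) @ components.T  # (n_trials, n_components)

def from_scores(scores, mean, components):
    return mean + scores @ components       # reconstructed signals

# Toy demo: 40 noisy sinusoid "trials", 500 samples each
t = np.linspace(0, 1, 500)
trials = np.sin(2 * np.pi * t) + 0.1 * np.random.default_rng(0).normal(size=(40, 500))
mean, comps = fit_fpca(trials, n_components=15)
scores = to_scores(trials, mean, comps)     # (40, 15) low-dimensional codes
recon = from_scores(scores, mean, comps)    # near-lossless reconstruction
```

The model only ever sees the 15-dimensional score vectors; the inverse transform maps its predictions back to full-length curves.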
```
acc2grf_prediction/
├── data/
│   ├── cmj_dataset_arms.npz          # Arms condition dataset (generated, not tracked)
│   ├── cmj_dataset_noarms.npz        # No-arms condition dataset (generated, not tracked)
│   └── cmj_dataset_both.npz          # Both conditions dataset (generated, not tracked)
├── src/
│   ├── __init__.py                   # Package initialization
│   ├── attention.py                  # Multi-head self-attention (legacy)
│   ├── transformer.py                # Transformer model (legacy)
│   ├── mlp.py                        # MLP model (recommended)
│   ├── transformations.py            # FPCA and B-spline transforms
│   ├── data_loader.py                # Data loading and preprocessing
│   ├── visualize_data.py             # Data inspection and debugging plots
│   ├── biomechanics.py               # Jump height and peak power calculations
│   ├── losses.py                     # Custom loss functions
│   ├── evaluate.py                   # Model evaluation metrics
│   └── train.py                      # Training script with CLI
├── scripts/
│   ├── prepare_dataset.py            # Preprocessing: MATLAB → .npz
│   ├── inspect_data_quality.py       # Accelerometer quality checks
│   ├── run_all_experiments.sh        # Full experiment suite
│   ├── run_resultant_experiments.sh  # Resultant-only experiments
│   ├── visualize_projection.py       # FPC projection matrix visualization
│   └── visualize_random_samples.py   # ACC/GRF signal grid visualization
├── results_noarms/                   # Reported results: no arm swing condition
├── results_arms/                     # Reported results: arm swing condition
├── results_both/                     # Reported results: both conditions
├── notebooks/
│   └── visualise_predictions.ipynb
├── requirements.txt
├── EXPERIMENTS.md                    # Detailed experiment log
├── EXCLUSIONS.md                     # Data quality analysis and exclusions
└── README.md
```
```bash
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

The dataset must be generated from the original MATLAB source files before training:
```bash
python scripts/prepare_dataset.py --data-dir /path/to/matlab/files
```

This extracts the specified condition, merges duplicate participants, applies quality filters (including exclusion of trials with sensor miscalibration and ADC clipping; see EXCLUSIONS.md), and saves a portable `.npz` file. Generate all three condition datasets with:
```bash
python scripts/prepare_dataset.py --conditions arms --output data/cmj_dataset_arms.npz
python scripts/prepare_dataset.py --conditions noarms --output data/cmj_dataset_noarms.npz
python scripts/prepare_dataset.py --conditions both --output data/cmj_dataset_both.npz
```

The source MATLAB files (`AccelerometerSignals.mat`, `GRFFeatures.mat`, `processedjumpdata.mat`) are not included in this repository.
To inspect accelerometer data quality before generating the dataset:
```bash
python scripts/inspect_data_quality.py --data-path data/cmj_dataset_both.npz
```

To skip quality exclusions and use all available trials:

```bash
python scripts/prepare_dataset.py --no-exclude-quality
```

| Contents | Shape | Units |
|---|---|---|
| ACC signals | Variable length × 3 | g (triaxial, 250 Hz) |
| GRF signals | Variable length | BW (body weight, 1000 Hz) |
| Subject IDs | (N,) | 0-indexed, 67 unique |
| Jump height | (N,) | metres |
| Peak power | (N,) | W/kg |
```bash
python -m src.train \
    --data-path data/cmj_dataset_noarms.npz \
    --model-type mlp --mlp-hidden 128 \
    --input-transform fpc --output-transform fpc \
    --loss reconstruction \
    --simple-normalization \
    --n-trials 5 --seed 42 \
    --epochs 200
```

This uses the resultant acceleration (default) and achieves signal R² = 0.976, JH RMSE = 4.2 cm, and PP RMSE = 4.2 W·kg⁻¹.
Before training, verify your data loading is working correctly:
```bash
python -m src.visualize_data
```

This generates diagnostic plots in `outputs/figures/` and runs sanity checks.
Visualize how ACC functional principal components map to GRF components:
```bash
python scripts/visualize_projection.py --data-path data/cmj_dataset_noarms.npz --n-display 3 --top-k 3 --output-dir outputs/projection_viz
```

This generates three complementary figures:
| Figure | Description |
|---|---|
| `projection_combined.png` | Overview showing ACC eigenfunctions, projection matrix heatmaps, and GRF eigenfunctions |
| `projection_contributions.png` | Per-GRF-FPC breakdown showing top contributing ACC components with weights |
| `biomechanics_fpc.png` | Traditional biomechanics-style visualization with mean ± 2SD bands |
The biomechanics figure shows GRF FPCs in the top row with their top-k ACC contributors below, using the traditional mean ± standard deviation representation. This reveals how acceleration patterns during different movement phases (quiet standing, unweighting, braking, propulsion) contribute to force production.
Arguments:
| Argument | Default | Description |
|---|---|---|
| `--n-display` | 3 | Number of FPC components to display |
| `--top-k` | 3 | Number of top ACC contributors per GRF FPC |
| `--n-components` | 15 | Total FPC components for the transformation |
| `--output-dir` | `outputs/projection_visualization` | Output directory |
| `--dpi` | 150 | Figure resolution |
Training arguments (`python -m src.train`):

| Argument | Default | Description |
|---|---|---|
| `--data-path` | required | Path to dataset `.npz` file (e.g. `data/cmj_dataset_both.npz`) |
| `--model-type` | `transformer` | Model type: `mlp` (recommended) or `transformer` |
| `--mlp-hidden` | 128 | MLP hidden layer size |
| `--use-triaxial` | `False` | Use 3D acceleration (the default resultant is recommended for FPC) |
| `--input-transform` | `raw` | Input transform: `raw`, `bspline`, or `fpc` |
| `--output-transform` | `raw` | Output transform: `raw`, `bspline`, or `fpc` |
| `--loss` | `mse` | Loss function: `mse`, `reconstruction`, or `signal_space` |
| `--simple-normalization` | `False` | Use global z-score (recommended: `True`) |
| `--n-trials` | 1 | Number of trials for statistical validation |
| `--epochs` | 100 | Maximum training epochs |
| `--batch-size` | 32 | Training batch size |
| `--learning-rate` | 1e-4 | Adam learning rate |
| `--patience` | 15 | Early stopping patience |
| `--output-dir` | `outputs` | Output directory |
| `--run-name` | timestamp | Experiment name |
To run all 12 experiments (4 representations × {Transformer, MLP} × {triaxial, resultant}) for a given dataset condition:
```bash
bash scripts/run_all_experiments.sh arms
```

Valid conditions are `arms`, `noarms`, or `both` (default). To run all three conditions unattended (e.g. overnight), use `MPLBACKEND=Agg` to prevent matplotlib from opening figure windows:
```bash
nohup env MPLBACKEND=Agg bash -c \
  'bash scripts/run_all_experiments.sh arms && \
   bash scripts/run_all_experiments.sh noarms && \
   bash scripts/run_all_experiments.sh both' \
  > overnight_run.log 2>&1 &
```

Monitor progress with `tail -f overnight_run.log`. Results are saved to `results_<condition>/`.
The best-performing architecture is surprisingly simple:
```
ACC signal (500×1) → FPCA → FPC scores (15) → MLP → FPC scores (15) → Inverse FPCA → GRF signal (500×1)
```
MLP Architecture:
- Input: 15 features (15 FPCs for resultant acceleration)
- Hidden: 128 neurons with ReLU activation
- Output: 15 features (15 FPCs for GRF)
- Parameters: ~12K
Why MLP beats Transformer:
- FPC representation does the heavy lifting — the mean function captures the typical CMJ shape, so the model only learns deviations
- Attention adds no value — the mapping from ACC FPCs to GRF FPCs doesn't benefit from temporal attention
- Simpler models generalize better with limited data (~277 training samples)
The transformer architecture (~750K parameters) is still available but not recommended:
- Input Projection: Linear layer mapping input dimension to d_model
- Positional Encoding: Learnable position embeddings for 500 timesteps
- Encoder Stack: N transformer encoder blocks with multi-head self-attention
- Output Projection: Linear layer mapping d_model to output dimension
- Source: Vertical countermovement jumps (arm swing and no arm swing conditions)
- Participants: 67 unique (after exclusion of clipped and miscalibrated trials)
- Jumps: 663 total (split across three dataset conditions: no arm swing, arm swing, both)
- Accelerometer: Lower back sensor, triaxial (x, y, z) in g units at 250 Hz
- Signal Mode: Resultant acceleration √(x² + y² + z²) or raw triaxial
- Preprocessing: Padded/truncated to 500 points (2000 ms pre-takeoff), z-score normalised
- GRF: Vertical ground reaction force in body weight (BW) units at 1000 Hz (downsampled to 250 Hz)
- Preprocessing: Already BW-normalised in source data, z-score normalised
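The input preprocessing above can be sketched in a few lines of NumPy. This is a minimal illustration of the resultant + pad/truncate + z-score pipeline; the function name is hypothetical and the real logic in `src/data_loader.py` may differ (e.g. in how windows are aligned to takeoff):

```python
import numpy as np

def preprocess_acc(acc_xyz, target_len=500):
    """Resultant acceleration, fixed length, z-score normalised (sketch)."""
    # Orientation-invariant magnitude: sqrt(x^2 + y^2 + z^2)
    resultant = np.linalg.norm(acc_xyz, axis=-1)
    if len(resultant) >= target_len:
        # Keep the last target_len samples (2000 ms pre-takeoff at 250 Hz);
        # alignment choice is an assumption for illustration.
        resultant = resultant[-target_len:]
    else:
        # Left-pad short trials to the fixed length
        resultant = np.pad(resultant, (target_len - len(resultant), 0))
    # Z-score normalisation
    return (resultant - resultant.mean()) / resultant.std()

# Toy triaxial trial: 700 samples of 3-axis acceleration
sig = preprocess_acc(np.random.default_rng(1).normal(size=(700, 3)))
```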
- Participant-level train/validation split (no data leakage)
- Default: 80% train, 20% validation
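A participant-level split can be expressed with scikit-learn's `GroupShuffleSplit`, where the group labels are subject IDs. This is a sketch of the idea, not necessarily how the project's loader implements it:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy stand-ins: one subject ID per jump, placeholder signals
rng = np.random.default_rng(0)
subject_ids = rng.integers(0, 67, size=663)
X = np.zeros((663, 500))

# Split whole participants, not individual jumps, so no subject
# contributes trials to both train and validation (no leakage).
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, val_idx = next(splitter.split(X, groups=subject_ids))
```

Note that `test_size=0.2` here holds out ~20% of *participants*, so the trial-level split ratio varies slightly with how many jumps each held-out subject performed.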
- RMSE (Root Mean Square Error)
- MAE (Mean Absolute Error)
- R² (Coefficient of Determination)
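For reference, the three signal-level metrics reduce to a few lines of NumPy. This helper is illustrative and not the exact implementation in `src/evaluate.py`:

```python
import numpy as np

def signal_metrics(y_true, y_pred):
    """RMSE, MAE and R^2 over flattened signal arrays (sketch)."""
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return rmse, mae, 1.0 - ss_res / ss_tot

# Perfect prediction: rmse=0, mae=0, r2=1
rmse, mae, r2 = signal_metrics(np.array([1.0, 2.0]), np.array([1.0, 2.0]))
```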
Derived from the predicted GRF using the impulse-momentum method:
- Jump Height: Computed via double integration of net force
- Peak Power: Maximum instantaneous power (F × v)
Both metrics compared against ground truth with RMSE, MAE, R², and Bland-Altman analysis.
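The impulse-momentum calculation can be sketched as below, assuming the GRF is in body-weight units and the window ends at takeoff. The project's version lives in `src/biomechanics.py` and may differ (e.g. trapezoidal integration, takeoff detection), so treat this as an illustrative sketch:

```python
import numpy as np

G = 9.81    # gravitational acceleration, m/s^2
FS = 250.0  # Hz (GRF downsampled to 250 Hz)

def jump_metrics(grf_bw):
    """Jump height (m) and peak power (W/kg) from vGRF in BW units (sketch)."""
    dt = 1.0 / FS
    accel = (grf_bw - 1.0) * G        # net CoM acceleration, m/s^2
    vel = np.cumsum(accel) * dt       # CoM velocity by integration from rest
    power = grf_bw * G * vel          # instantaneous power per kg: (F/m) * v
    v_takeoff = vel[-1]               # assumed: signal ends at takeoff
    height = v_takeoff ** 2 / (2 * G) # flight height from takeoff velocity
    return height, power.max()

# Toy GRF: quiet standing, then a half-sine propulsion impulse
grf = np.concatenate([np.ones(200), 1.0 + 1.5 * np.sin(np.linspace(0, np.pi, 100))])
h, pp = jump_metrics(grf)
```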
After training, the following files are generated:
```
outputs/<run_name>/
├── config.json                 # Training configuration
├── data_info.json              # Dataset statistics
├── evaluation_results.csv      # All metrics in CSV format
├── checkpoints/
│   ├── best_model.keras        # Best validation model
│   ├── final_model.keras       # Final epoch model
│   └── training_log.csv        # Epoch-by-epoch metrics
└── figures/
    ├── prediction_curves.png   # Predicted vs actual GRF (5 samples)
    ├── prediction_grid.png     # Compact 5x6 grid of predictions
    ├── scatter_metrics.png     # Jump height/power scatter
    ├── bland_altman.png        # Agreement analysis
    └── training_history.png    # Loss curves
```
```python
from src.data_loader import CMJDataLoader
from src.mlp import build_mlp_model
from src.evaluate import evaluate_model, print_evaluation_summary

# Load data with FPC transforms (recommended)
loader = CMJDataLoader(
    data_path='data/cmj_dataset_noarms.npz',
    input_transform='fpc',
    output_transform='fpc',
    simple_normalization=True
)
train_ds, val_ds, info = loader.create_datasets()

# Build and train MLP model
model = build_mlp_model(
    input_dim=info['input_dim'],    # 15 for resultant FPC
    output_dim=info['output_dim'],  # 15 for GRF FPC
    hidden_dim=128
)
model.fit(train_ds, validation_data=val_ds, epochs=200)

# Evaluate (X_val, y_val are the validation arrays from the loader's split)
results = evaluate_model(model, X_val, y_val, loader)
print_evaluation_summary(results)
```

From extensive experimentation (see EXPERIMENTS.md):
- Representation matters more than architecture: A simple MLP with 12K parameters outperforms a 750K-parameter transformer
- FPC representation is the key: Functional Principal Components capture biomechanically relevant features that raw signals and B-splines miss
- Resultant acceleration is sufficient: The acceleration magnitude, invariant to sensor orientation, outperforms triaxial input for the FPC representation
- Simple normalization works best: Global z-score outperforms sophisticated robust normalization
- The mapping is approximately linear and sparse: Each GRF FPC is driven by one or two ACC FPCs, explaining why a simple MLP suffices
- Mean function captures the template — the typical CMJ shape is encoded; the model only learns deviations
- Variance-ordered components naturally weight importance — the leading FPCs capture the aspects most relevant to jump performance
- Massive dimensionality reduction: 15 FPCs vs 500 raw samples
- Implicitly addresses MSE limitations — errors concentrate on biomechanically relevant variation rather than the quiet-standing phase
- Python 3.9+
- TensorFlow 2.10+
- NumPy
- SciPy (for MATLAB file loading during data preparation)
- scikit-fda (for FPCA transforms)
- Matplotlib
- scikit-learn