markgewhite/acc2grf_prediction

Accelerometer to GRF Prediction

Predicting vertical ground reaction force (vGRF) from a single lower-back accelerometer during countermovement jumps using a dual Functional Principal Component Analysis (FPCA) representation and a simple MLP.

Key Finding

A simple MLP (~12K parameters) with FPC representation outperforms a transformer (~750K parameters). Signal representation, not model architecture, is the primary determinant of prediction quality.

Results (5-run validation, FPC-MLP with resultant input)

| Dataset | Signal R² | JH RMSE (rRMSE) | PP RMSE (rRMSE) |
|---|---|---|---|
| No arm swing (72 val. trials) | 0.976 ± 0.002 | 4.2 ± 0.4 cm (10.3%) | 4.2 ± 0.4 W·kg⁻¹ (9.1%) |
| Arm swing (64 val. trials) | 0.964 ± 0.001 | 5.3 ± 0.5 cm (10.6%) | 5.6 ± 0.4 W·kg⁻¹ (10.3%) |
| Both conditions (144 val. trials) | 0.970 ± 0.001 | 4.3 ± 0.2 cm (9.8%) | 4.3 ± 0.2 W·kg⁻¹ (8.8%) |

JH = jump height; PP = peak power; rRMSE = relative RMSE (% of ground truth mean). Validated on 663 jumps from 67 participants with participant-level train/validation splits.
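
The relative RMSE in the table can be read as the RMSE divided by the mean of the ground-truth values, times 100. A minimal NumPy sketch, assuming that definition and using made-up jump heights rather than dataset values (`rrmse_percent` is a hypothetical helper name):

```python
import numpy as np


def rrmse_percent(y_true, y_pred):
    """Relative RMSE: RMSE expressed as a percentage of the ground-truth mean."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    return 100.0 * rmse / np.mean(y_true)


# Hypothetical jump heights in cm (not from the dataset)
true_jh = [40.0, 42.0, 38.0]
pred_jh = [44.0, 38.0, 42.0]
print(round(rrmse_percent(true_jh, pred_jh), 1))  # 10.0
```

Every prediction here is off by 4 cm against a 40 cm mean, so the rRMSE is 10%.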

Overview

This project maps accelerometer signals to GRF curves, enabling force-plate-quality biomechanical metrics (jump height, peak power) from a single wearable sensor. The approach uses:

  1. Functional Principal Component Analysis (FPCA) to represent both input (ACC) and output (GRF) signals as low-dimensional score vectors
  2. A simple MLP (single hidden layer) to learn the mapping between FPC scores
  3. Resultant acceleration input, which is invariant to sensor orientation and outperforms triaxial input for the FPC representation
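
The orientation invariance of the resultant follows from the fact that a rotation of the sensor frame preserves vector length. The sketch below illustrates this on synthetic data, modelling a different sensor mounting as a hypothetical 30° rotation about the z-axis:

```python
import numpy as np


def resultant(acc_xyz):
    """Resultant acceleration sqrt(x^2 + y^2 + z^2) for an (n_samples, 3) array."""
    return np.linalg.norm(acc_xyz, axis=1)


rng = np.random.default_rng(0)
acc = rng.normal(size=(500, 3))  # synthetic triaxial signal in g

# Simulate a different sensor mounting: rotate by 30 degrees about the z-axis
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta), np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
acc_rotated = acc @ R.T  # rotate every sample vector

# The individual axes change, but the resultant does not
print(np.allclose(acc, acc_rotated))                        # False
print(np.allclose(resultant(acc), resultant(acc_rotated)))  # True
```

Because only vector length survives, the resultant needs no orientation correction. The trade-off is that directional information carried by the individual axes is discarded.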

Project Structure

acc2grf_prediction/
├── data/
│   ├── cmj_dataset_arms.npz    # Arms condition dataset (generated, not tracked)
│   ├── cmj_dataset_noarms.npz  # No-arms condition dataset (generated, not tracked)
│   └── cmj_dataset_both.npz    # Both conditions dataset (generated, not tracked)
├── src/
│   ├── __init__.py           # Package initialization
│   ├── attention.py          # Multi-head self-attention (legacy)
│   ├── transformer.py        # Transformer model (legacy)
│   ├── mlp.py                # MLP model (recommended)
│   ├── transformations.py    # FPCA and B-spline transforms
│   ├── data_loader.py        # Data loading and preprocessing
│   ├── visualize_data.py     # Data inspection and debugging plots
│   ├── biomechanics.py       # Jump height and peak power calculations
│   ├── losses.py             # Custom loss functions
│   ├── evaluate.py           # Model evaluation metrics
│   └── train.py              # Training script with CLI
├── scripts/
│   ├── prepare_dataset.py     # Preprocessing: MATLAB → .npz
│   ├── inspect_data_quality.py  # Accelerometer quality checks
│   ├── run_all_experiments.sh   # Full experiment suite
│   ├── run_resultant_experiments.sh  # Resultant-only experiments
│   ├── visualize_projection.py  # FPC projection matrix visualization
│   └── visualize_random_samples.py  # ACC/GRF signal grid visualization
├── results_noarms/            # Reported results: no arm swing condition
├── results_arms/              # Reported results: arm swing condition
├── results_both/              # Reported results: both conditions
├── notebooks/
│   └── visualise_predictions.ipynb
├── requirements.txt
├── EXPERIMENTS.md            # Detailed experiment log
├── EXCLUSIONS.md             # Data quality analysis and exclusions
└── README.md

Installation

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Data Preparation

The dataset must be generated from the original MATLAB source files before training:

python scripts/prepare_dataset.py --data-dir /path/to/matlab/files

This extracts the specified condition, merges duplicate participants, applies quality filters (including exclusion of trials with sensor miscalibration and ADC clipping — see EXCLUSIONS.md), and saves a portable .npz file. Generate all three condition datasets with:

python scripts/prepare_dataset.py --conditions arms --output data/cmj_dataset_arms.npz
python scripts/prepare_dataset.py --conditions noarms --output data/cmj_dataset_noarms.npz
python scripts/prepare_dataset.py --conditions both --output data/cmj_dataset_both.npz

The source MATLAB files (AccelerometerSignals.mat, GRFFeatures.mat, processedjumpdata.mat) are not included in this repository.

To inspect accelerometer data quality before generating the dataset:

python scripts/inspect_data_quality.py --data-path data/cmj_dataset_both.npz

To skip quality exclusions and use all available trials:

python scripts/prepare_dataset.py --no-exclude-quality
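
Each generated .npz file contains:

| Contents | Shape | Units / notes |
|---|---|---|
| ACC signals | variable length × 3 | g (triaxial, 250 Hz) |
| GRF signals | variable length | BW (body weight, 1000 Hz) |
| Subject IDs | (N,) | 0-indexed, 67 unique |
| Jump height | (N,) | metres |
| Peak power | (N,) | W·kg⁻¹ |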

Usage

Recommended Configuration (Best Results)

python -m src.train \
    --data-path data/cmj_dataset_noarms.npz \
    --model-type mlp --mlp-hidden 128 \
    --input-transform fpc --output-transform fpc \
    --loss reconstruction \
    --simple-normalization \
    --n-trials 5 --seed 42 \
    --epochs 200

This uses the resultant acceleration (the default) and, on the no-arm-swing dataset, achieves signal R² = 0.976, JH RMSE = 4.2 cm and PP RMSE = 4.2 W·kg⁻¹.

Data Visualization (Recommended First Step)

Before training, verify your data loading is working correctly:

python -m src.visualize_data

This generates diagnostic plots in outputs/figures/ and runs sanity checks.

FPC Projection Visualization

Visualize how ACC functional principal components map to GRF components:

python scripts/visualize_projection.py --data-path data/cmj_dataset_noarms.npz --n-display 3 --top-k 3 --output-dir outputs/projection_viz

This generates three complementary figures:

| Figure | Description |
|---|---|
| projection_combined.png | Overview showing ACC eigenfunctions, projection matrix heatmaps, and GRF eigenfunctions |
| projection_contributions.png | Per-GRF-FPC breakdown showing top contributing ACC components with weights |
| biomechanics_fpc.png | Traditional biomechanics-style visualization with mean ± 2SD bands |

The biomechanics figure shows GRF FPCs in the top row with their top-k ACC contributors below, using the traditional mean ± standard deviation representation. This reveals how acceleration patterns during different movement phases (quiet standing, unweighting, braking, propulsion) contribute to force production.
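
Selecting the top-k contributors amounts to ranking each row of a (GRF FPC × ACC FPC) weight matrix by absolute weight. How visualize_projection.py derives that matrix from the trained model is not described here, so the sketch below simply assumes one is given; the 3 × 4 matrix and the function name `top_k_contributors` are illustrative only:

```python
import numpy as np


def top_k_contributors(projection, k=3):
    """For each GRF FPC (row), return the column indices and weights of the
    k ACC FPCs with the largest absolute weight, strongest first."""
    projection = np.asarray(projection, dtype=float)
    order = np.argsort(-np.abs(projection), axis=1)[:, :k]
    weights = np.take_along_axis(projection, order, axis=1)
    return order, weights


# Hypothetical projection matrix: rows = GRF FPCs, columns = ACC FPCs
P = np.array([[0.9, -0.1, 0.05, 0.0],
              [0.1, -0.7, 0.4, 0.02],
              [0.0, 0.1, -0.2, 0.8]])
idx, w = top_k_contributors(P, k=2)
print(idx.tolist())  # [[0, 1], [1, 2], [3, 2]]
```

Ranking by absolute value keeps strongly negative weights, which matter as much as positive ones when reading the mapping.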

Arguments:

| Argument | Default | Description |
|---|---|---|
| --n-display | 3 | Number of FPC components to display |
| --top-k | 3 | Number of top ACC contributors per GRF FPC |
| --n-components | 15 | Total FPC components for transformation |
| --output-dir | outputs/projection_visualization | Output directory |
| --dpi | 150 | Figure resolution |

Training Arguments

| Argument | Default | Description |
|---|---|---|
| --data-path | required | Path to dataset .npz file (e.g. data/cmj_dataset_both.npz) |
| --model-type | transformer | Model type: mlp (recommended) or transformer |
| --mlp-hidden | 128 | MLP hidden layer size |
| --use-triaxial | False | Use 3D acceleration (default resultant is recommended for FPC) |
| --input-transform | raw | Input transform: raw, bspline, or fpc |
| --output-transform | raw | Output transform: raw, bspline, or fpc |
| --loss | mse | Loss function: mse, reconstruction, signal_space |
| --simple-normalization | False | Use global z-score (recommended: True) |
| --n-trials | 1 | Number of trials for statistical validation |
| --epochs | 100 | Maximum training epochs |
| --batch-size | 32 | Training batch size |
| --learning-rate | 1e-4 | Adam learning rate |
| --patience | 15 | Early stopping patience |
| --output-dir | outputs | Output directory |
| --run-name | timestamp | Experiment name |

Running the Full Experiment Suite

To run the full experiment suite (4 representations × {Transformer, MLP} × {triaxial, resultant}) for a given dataset condition:

bash scripts/run_all_experiments.sh arms

Valid conditions are arms, noarms, or both (default). To run all three conditions unattended (e.g. overnight), use MPLBACKEND=Agg to prevent matplotlib from opening figure windows:

nohup env MPLBACKEND=Agg bash -c \
  'bash scripts/run_all_experiments.sh arms && \
   bash scripts/run_all_experiments.sh noarms && \
   bash scripts/run_all_experiments.sh both' \
  > overnight_run.log 2>&1 &

Monitor progress with tail -f overnight_run.log. Results are saved to results_<condition>/.

Model Architecture

Recommended: MLP with FPC Transforms

The best-performing architecture is surprisingly simple:

ACC signal (500×1) → FPCA → FPC scores (15) → MLP → FPC scores (15) → Inverse FPCA → GRF signal (500×1)

MLP Architecture:

  • Input: 15 features (15 FPCs for resultant acceleration)
  • Hidden: 128 neurons with ReLU activation
  • Output: 15 features (15 FPCs for GRF)
  • Parameters: ~12K
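
For reference, a forward pass through the layer sizes listed above can be sketched in NumPy. This assumes plain dense layers with biases and a linear output layer (the output activation is not stated above), and the weights are random, not trained. Under those assumptions the tally comes to 3,983 weights, so the ~12K figure presumably reflects details of src/mlp.py not listed here; calling `model.summary()` on the built Keras model would confirm the actual count.

```python
import numpy as np

N_IN, N_HIDDEN, N_OUT = 15, 128, 15  # layer sizes listed above


def dense_param_count(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def mlp_forward(x, W1, b1, W2, b2):
    """One ReLU hidden layer, then an (assumed) linear output layer,
    since FPC scores are unbounded real numbers."""
    hidden = np.maximum(0.0, x @ W1 + b1)
    return hidden @ W2 + b2


rng = np.random.default_rng(42)
W1 = rng.normal(scale=0.1, size=(N_IN, N_HIDDEN))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(scale=0.1, size=(N_HIDDEN, N_OUT))
b2 = np.zeros(N_OUT)

acc_scores = rng.normal(size=(8, N_IN))  # batch of 8 ACC FPC score vectors
grf_scores = mlp_forward(acc_scores, W1, b1, W2, b2)
print(grf_scores.shape)  # (8, 15)

n_params = dense_param_count(N_IN, N_HIDDEN) + dense_param_count(N_HIDDEN, N_OUT)
print(n_params)  # 3983
```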

Why MLP beats Transformer:

  1. FPC representation does the heavy lifting — the mean function captures the typical CMJ shape, so the model only learns deviations
  2. Attention adds no value — the mapping from ACC FPCs to GRF FPCs doesn't benefit from temporal attention
  3. Simpler models generalize better with limited data (~277 training samples)

Legacy: Transformer Architecture

The transformer architecture (~750K parameters) is still available but not recommended:

  1. Input Projection: Linear layer mapping input dimension to d_model
  2. Positional Encoding: Learnable position embeddings for 500 timesteps
  3. Encoder Stack: N transformer encoder blocks with multi-head self-attention
  4. Output Projection: Linear layer mapping d_model to output dimension

Data

Dataset

  • Source: Vertical countermovement jumps (arm swing and no arm swing conditions)
  • Participants: 67 unique (after exclusion of clipped and miscalibrated trials)
  • Jumps: 663 total (split across three dataset conditions: no arm swing, arm swing, both)

Input Format

  • Accelerometer: Lower back sensor, triaxial (x, y, z) in g units at 250 Hz
  • Signal Mode: Resultant acceleration √(x² + y² + z²) or raw triaxial
  • Preprocessing: Padded/truncated to 500 points (2000 ms pre-takeoff), z-score normalised
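
The input preprocessing steps can be sketched as follows. This is not the repository's data_loader.py: it assumes each trial ends at takeoff, pads short trials by repeating their first (quiet-standing) sample, and applies a single mean and SD computed on the training set, in line with --simple-normalization. The helper names are hypothetical and the trials are synthetic:

```python
import numpy as np

SEQ_LEN = 500  # 2000 ms at 250 Hz


def resultant(acc_xyz):
    """Resultant acceleration sqrt(x^2 + y^2 + z^2) for an (n_samples, 3) array."""
    return np.linalg.norm(acc_xyz, axis=1)


def fix_length(signal, seq_len=SEQ_LEN):
    """Keep the last `seq_len` samples (the signal is assumed to end at takeoff).
    Shorter signals are padded at the start by repeating their first sample."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) >= seq_len:
        return signal[-seq_len:]
    pad = np.full(seq_len - len(signal), signal[0])
    return np.concatenate([pad, signal])


def global_zscore(train, other):
    """Global z-score: one mean and SD, computed from the training set only."""
    mu, sigma = train.mean(), train.std()
    return (train - mu) / sigma, (other - mu) / sigma


rng = np.random.default_rng(1)
# Synthetic variable-length triaxial trials in g (not real data)
trials = [1.0 + 0.1 * rng.normal(size=(n, 3)) for n in (430, 520, 610)]
X = np.stack([fix_length(resultant(t)) for t in trials])
X_train, X_val = global_zscore(X[:2], X[2:])
print(X.shape)  # (3, 500)
```

Fitting the normalisation statistics on the training split only keeps validation participants from influencing the preprocessing.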

Output Format

  • GRF: Vertical ground reaction force in body weight (BW) units, recorded at 1000 Hz and downsampled to 250 Hz
  • Preprocessing: Already BW-normalised in the source data; z-score normalised
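
Downsampling the GRF from 1000 Hz to 250 Hz is a factor of 4. The method used in the repository is not stated. One simple option, sketched below, averages non-overlapping blocks of four samples, which also acts as a crude low-pass filter; `scipy.signal.decimate` is a more rigorous alternative. The function name is hypothetical:

```python
import numpy as np


def downsample_by_block_mean(signal, factor=4):
    """Downsample by averaging non-overlapping blocks of `factor` samples
    (factor=4 takes 1000 Hz to 250 Hz). Leading samples that do not fill a
    complete block are dropped so the final sample (assumed takeoff) is kept."""
    signal = np.asarray(signal, dtype=float)
    n_blocks = len(signal) // factor
    trimmed = signal[len(signal) - n_blocks * factor:]
    return trimmed.reshape(n_blocks, factor).mean(axis=1)


grf_1000hz = np.array([0.5, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0])  # BW units
print(downsample_by_block_mean(grf_1000hz).tolist())  # [1.0, 2.0]
```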

Data Splits

  • Participant-level train/validation split (no data leakage)
  • Default: 80% train, 20% validation
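
A participant-level split assigns whole participants, not individual trials, to each side, so no participant's jumps appear in both sets. A minimal sketch, assuming the 20% applies to participants (so the trial fraction varies slightly) and using hypothetical subject IDs; `participant_split` is an illustrative name:

```python
import numpy as np


def participant_split(subject_ids, val_fraction=0.2, seed=42):
    """Split trial indices so that each participant's trials fall entirely
    in train or entirely in validation."""
    subject_ids = np.asarray(subject_ids)
    rng = np.random.default_rng(seed)
    subjects = rng.permutation(np.unique(subject_ids))
    n_val = max(1, int(round(val_fraction * len(subjects))))
    is_val = np.isin(subject_ids, subjects[:n_val])
    return np.flatnonzero(~is_val), np.flatnonzero(is_val)


# Hypothetical subject IDs: 10 participants with 3 to 5 trials each
rng = np.random.default_rng(0)
ids = np.repeat(np.arange(10), rng.integers(3, 6, size=10))
train_idx, val_idx = participant_split(ids)
print(set(ids[train_idx]) & set(ids[val_idx]))  # set()
```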

Evaluation Metrics

Signal-Level

  • RMSE (Root Mean Square Error)
  • MAE (Mean Absolute Error)
  • R² (Coefficient of Determination)
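
These metrics can be computed as below. Whether the repository pools R² over all curves and time points or averages per-curve values is not stated; this sketch assumes pooling, and the arrays are toy values:

```python
import numpy as np


def signal_metrics(y_true, y_pred):
    """RMSE, MAE and R^2 pooled over all curves and time points."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "rmse": np.sqrt(np.mean(err ** 2)),
        "mae": np.mean(np.abs(err)),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Two toy "curves" of three samples each (BW units)
y_true = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
y_pred = [[1.0, 2.0, 4.0], [1.0, 2.0, 2.0]]
m = signal_metrics(y_true, y_pred)
print(round(float(m["r2"]), 3))  # 0.5
```

Here the residual sum of squares is 2 and the total sum of squares is 4, giving R² = 0.5.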

Biomechanical Metrics

Derived from the predicted GRF using the impulse-momentum method:

  • Jump Height: Computed via double integration of net force
  • Peak Power: Maximum instantaneous power (F × v)
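
One common impulse-momentum formulation integrates the net acceleration, (F/BW − 1)·g, to obtain velocity, takes jump height as v²/(2g) at takeoff, and takes peak power per kilogram as the maximum of (F/m)·v. The sketch below assumes the GRF curve starts from rest in quiet standing and ends at takeoff, and uses simple rectangle-rule integration; src/biomechanics.py may differ in detail (for instance, computing jump height by double integration as listed above). The function names are hypothetical:

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2
FS = 250  # Hz, GRF sampling rate after downsampling


def velocity_and_power(grf_bw, fs=FS, g=G):
    """Integrate net force from GRF in BW units, assuming the curve starts
    at rest (quiet standing) and ends at takeoff."""
    grf_bw = np.asarray(grf_bw, dtype=float)
    acc = (grf_bw - 1.0) * g   # net acceleration, m/s^2
    vel = np.cumsum(acc) / fs  # velocity, rectangle-rule integration
    power = grf_bw * g * vel   # power per unit body mass, W/kg
    return vel, power


def jump_height(grf_bw, fs=FS, g=G):
    """Flight height from takeoff velocity: h = v^2 / (2g)."""
    vel, _ = velocity_and_power(grf_bw, fs, g)
    return vel[-1] ** 2 / (2.0 * g)


def peak_power(grf_bw, fs=FS, g=G):
    _, power = velocity_and_power(grf_bw, fs, g)
    return power.max()


# Synthetic push-off: 0.2 s (50 samples) at 2 BW, i.e. net acceleration = g
grf = np.full(50, 2.0)
print(round(jump_height(grf), 4))  # 0.1962
print(round(peak_power(grf), 2))   # 38.49
```

The synthetic push-off reaches a takeoff velocity of 1.962 m/s, which gives a flight height of 0.1962 m.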

Both metrics are compared against ground truth using RMSE, MAE, R², and Bland-Altman analysis.
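
Bland-Altman analysis summarises agreement as the mean difference (bias) and the 95% limits of agreement, bias ± 1.96 SD of the differences. A minimal sketch with hypothetical jump heights; `bland_altman` is an illustrative name:

```python
import numpy as np


def bland_altman(measured, reference):
    """Bias (mean difference) and 95% limits of agreement (bias +/- 1.96 SD)."""
    diff = np.asarray(measured, dtype=float) - np.asarray(reference, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


pred_jh = [0.31, 0.28, 0.35, 0.40]  # hypothetical predictions, m
true_jh = [0.30, 0.30, 0.33, 0.38]  # hypothetical force-plate values, m
bias, lower, upper = bland_altman(pred_jh, true_jh)
```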

Output Files

After training, the following files are generated:

outputs/<run_name>/
├── config.json              # Training configuration
├── data_info.json           # Dataset statistics
├── evaluation_results.csv   # All metrics in CSV format
├── checkpoints/
│   ├── best_model.keras     # Best validation model
│   ├── final_model.keras    # Final epoch model
│   └── training_log.csv     # Epoch-by-epoch metrics
└── figures/
    ├── prediction_curves.png  # Predicted vs actual GRF (5 samples)
    ├── prediction_grid.png    # Compact 5x6 grid of predictions
    ├── scatter_metrics.png    # Jump height/power scatter
    ├── bland_altman.png       # Agreement analysis
    └── training_history.png   # Loss curves

Python API

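import numpy as np

from src.data_loader import CMJDataLoader
from src.mlp import build_mlp_model
from src.evaluate import evaluate_model, print_evaluation_summary

# Load data with FPC transforms (recommended)
loader = CMJDataLoader(
    data_path='data/cmj_dataset_noarms.npz',
    input_transform='fpc',
    output_transform='fpc',
    simple_normalization=True
)
train_ds, val_ds, info = loader.create_datasets()

# Build and train MLP model
model = build_mlp_model(
    input_dim=info['input_dim'],    # 15 for resultant FPC
    output_dim=info['output_dim'],  # 15 for GRF FPC
    hidden_dim=128
)
model.fit(train_ds, validation_data=val_ds, epochs=200)

# Collect validation arrays in a single pass so inputs and targets stay
# aligned (assumes val_ds yields (x, y) batches, as used by model.fit above)
x_batches, y_batches = [], []
for x, y in val_ds:
    x_batches.append(x.numpy())
    y_batches.append(y.numpy())
X_val = np.concatenate(x_batches, axis=0)
y_val = np.concatenate(y_batches, axis=0)

# Evaluate
results = evaluate_model(model, X_val, y_val, loader)
print_evaluation_summary(results)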

Key Insights

From extensive experimentation (see EXPERIMENTS.md):

  1. Representation matters more than architecture: A simple MLP with 12K parameters outperforms a 750K-parameter transformer
  2. FPC representation is the key: Functional Principal Components capture biomechanically relevant features that raw signals and B-splines miss
  3. Resultant acceleration is sufficient: The acceleration magnitude, invariant to sensor orientation, outperforms triaxial input for the FPC representation
  4. Simple normalization works best: Global z-score outperforms sophisticated robust normalization
  5. The mapping is approximately linear and sparse: Each GRF FPC is driven by one or two ACC FPCs, explaining why a simple MLP suffices

Why FPC Works

  1. Mean function captures the template — the typical CMJ shape is encoded; the model only learns deviations
  2. Variance-ordered components naturally weight importance — the leading FPCs capture the aspects most relevant to jump performance
  3. Massive dimensionality reduction: 15 FPCs vs 500 raw samples
  4. Implicitly addresses MSE limitations — errors concentrate on biomechanically relevant variation rather than the quiet-standing phase
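
The points above can be made concrete with a discretised FPCA, i.e. ordinary PCA of curves sampled on a common grid. This is a NumPy stand-in for the scikit-fda transforms the repository uses, and it ignores quadrature weights, which on a uniform grid mostly amounts to a rescaling. The curves are synthetic (a shared template plus random combinations of three shapes), and the function names are hypothetical:

```python
import numpy as np


def fit_fpca(curves, n_components=15):
    """Discretised FPCA: PCA of curves sampled on a common uniform grid.
    Returns the mean function, the leading eigenfunctions (one per row)
    and the fraction of total variance each explains."""
    curves = np.asarray(curves, dtype=float)
    mean = curves.mean(axis=0)
    _, s, vt = np.linalg.svd(curves - mean, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return mean, vt[:n_components], explained[:n_components]


def to_scores(curves, mean, components):
    """Project centred curves onto the eigenfunctions to get FPC scores."""
    return (np.asarray(curves, dtype=float) - mean) @ components.T


def from_scores(scores, mean, components):
    """Inverse transform: mean function plus weighted eigenfunctions."""
    return mean + scores @ components


# Synthetic curves on 500 points: shared template + 3 random modes of variation
t = np.linspace(0.0, 1.0, 500)
template = 1.0 + np.exp(-((t - 0.8) / 0.05) ** 2)
shapes = np.stack([np.sin(k * np.pi * t) for k in (1, 2, 3)])
rng = np.random.default_rng(0)
curves = template + rng.normal(size=(100, 3)) @ shapes

mean, components, explained = fit_fpca(curves, n_components=15)
scores = to_scores(curves, mean, components)
print(scores.shape)                                                # (100, 15)
print(np.allclose(from_scores(scores, mean, components), curves))  # True
```

The mean function absorbs the shared template, so the scores describe only deviations from it. Because the synthetic variation spans just three shapes, the first three components carry essentially all of the variance, and 500 samples per curve reduce to a handful of informative scores.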

Requirements

  • Python 3.9+
  • TensorFlow 2.10+
  • NumPy
  • SciPy (for MATLAB file loading during data preparation)
  • scikit-fda (for FPCA transforms)
  • Matplotlib
  • scikit-learn

About

Accelerometer to Ground Reaction Force Transformer
