DMontgomery40/acoustic-momentum
Acoustic Momentum Detector

Weakly-supervised, real-time crowd audio classification as a complementary signal channel for in-play sports pricing — 12ms inference on Apple Silicon, no manual annotation required.

A microphone at the stadium captures crowd audio. This project turns that raw noise into an Attack Pressure Index — a continuous [0,1] signal that escalates during dangerous attacks, near-misses, 1v1s, and other high-entropy game moments, trained entirely from match audio + goal timestamps with no frame-by-frame labeling.

What It Is (and Isn't)

What it is: A game-state volatility detector. When the crowd erupts, the model fires. That eruption happens on dangerous attacks, near-misses, 1v1s, keeper rushes — high-entropy moments where the match outcome is temporarily uncertain.

What it isn't: A goal predictor. Soccer generates ~10–15 crowd spikes per match but only ~2.5 goals. Most spikes are not goals. That's the point — and also the honest limitation.

What makes it interesting technically:

  1. Weakly supervised training — goal timestamps from StatsBomb open data generate labels automatically. Audio within 60s of a goal = positive. No manual annotation. This scales to thousands of hours of archive footage.

  2. 12ms end-to-end inference — mel spectrogram extraction (~2ms) + ANE-optimized CNN encoder (~8ms) + BiLSTM temporal head (~2ms) on Apple Silicon. Faster than any human data entry pipeline.

  3. Complementary to tracking data — computer vision tells you WHERE the ball is. Crowd noise tells you what 40,000 humans collectively THINK is about to happen. These are different signals. The crowd integrates player body language, tactical context, set piece positioning, and historical pattern recognition simultaneously.
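The weak-labeling rule from point 1 can be sketched as a small function (hypothetical helper, not the project's exact code; the 60s positive horizon and the gate on window END times follow the description above, and the boundary convention at exactly the goal instant is an assumption):

```python
def weak_label(window_end: float, goal_times: list[float],
               horizon: float = 60.0) -> int:
    """Label a window positive if it ends within `horizon` seconds
    BEFORE some goal. Gating on the window END time means the model
    never trains on audio from after the goal (no lookahead)."""
    for goal in goal_times:
        lead = goal - window_end          # seconds until the goal
        if 0.0 <= lead <= horizon:        # past-only, within 60s
            return 1
    return 0

goal_times = [1830.0]                     # goal at 30:30, from StatsBomb
window_ends = [1760.0, 1775.0, 1800.0, 1830.0, 1832.0]
labels = [weak_label(e, goal_times) for e in window_ends]
# windows ending in the 60s before the goal are positive; later ones are not
```

Because the rule needs only goal timestamps, any archive match with event data becomes training data for free.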

Architecture

At-venue microphone  (stadium, pitch-side or press box)
        │
        ▼  sounddevice / ffmpeg  →  22050 Hz mono PCM
┌─────────────────────────────────────────┐
│  Mel Spectrogram Pipeline               │
│  10s windows · 2s hop · 64 mel bins     │
│  → (n_windows, 64, 216) float32         │
└────────────────┬────────────────────────┘
                 │  sliding window (15 windows = 38s audio history)
                 ▼
┌─────────────────────────────────────────┐
│  SpectrogramCNN  [ANE-optimized]        │
│  stem → DSConv×3 (ReLU6)                │
│  ch: 1→32→64→128→128                    │
│  global avg pool → FC(128)              │
└────────────────┬────────────────────────┘
                 │  (batch, 15, 128)
                 ▼
┌─────────────────────────────────────────┐
│  BiLSTM × 2  (hidden=64, → 128 out)     │
│  last timestep → FC(64) → FC(2)         │
└────────────────┬────────────────────────┘
                 ▼
       Attack Pressure Index  [0, 1]
       ~12 ms total · Apple Silicon
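The shapes in the diagram check out arithmetically. A quick sanity check — the STFT hop of 1024 samples is an assumption inferred from the stated 216-frame width (it is not given in the diagram; librosa-style framing yields 1 + n_samples // hop frames):

```python
SR = 22050             # sample rate (Hz)
WIN_S, HOP_S = 10, 2   # window length and hop between windows (s)
N_MELS = 64
SEQ_LEN = 15           # windows per LSTM sequence

# STFT hop inside one window: assumed 1024 to reproduce 216 frames
STFT_HOP = 1024
frames = 1 + (SR * WIN_S) // STFT_HOP
print((N_MELS, frames))           # matches the (64, 216) in the diagram

# 15 stacked windows at a 2s hop cover 10 + 14*2 = 38s of audio history
history_s = WIN_S + (SEQ_LEN - 1) * HOP_S
print(history_s)
```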

CNN is ANE-exportable (DSConv, ReLU6, channel multiples of 64). Training uses pre-computed CNN embeddings so the LSTM head can be fine-tuned on new match audio in minutes.

Time alignment (no lookahead): Each prediction is timestamped at the END of the last processed window. Labels gate on window END time to exclude windows whose audio spans the goal event. The model sees only audio from the past.
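The no-lookahead timestamping can be sketched with a rolling buffer (hypothetical structure, not the project's exact code): a deque holds the 15 most recent window embeddings, and every prediction is stamped with the end time of the newest window.

```python
from collections import deque

SEQ_LEN, WIN_S, HOP_S = 15, 10.0, 2.0

buffer = deque(maxlen=SEQ_LEN)    # rolling (embedding, end_time) pairs

def push_window(embedding, window_start: float):
    """Add one 10s window; return (prediction_time, sequence) once the
    buffer holds a full 15-window (38s) history, else None."""
    end_time = window_start + WIN_S
    buffer.append((embedding, end_time))
    if len(buffer) < SEQ_LEN:
        return None
    seq = [e for e, _ in buffer]
    # The prediction is timestamped at the END of the newest window,
    # so it only ever summarizes audio that has already happened.
    return buffer[-1][1], seq

# feed windows starting at t = 0, 2, 4, ... (2s hop)
out = None
for i in range(20):
    out = push_window(f"emb{i}", i * HOP_S)
```

The first non-None result appears at the 15th window, once the full 38s history exists.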

Quick Start

cd ~/acoustic-momentum

# install deps (prefer uv or pip3.10)
pip install librosa statsbombpy yt-dlp click rich tqdm sounddevice scikit-learn

# run end-to-end demo (synthetic data, no audio files needed)
python scripts/demo.py

# launch dashboard
python -m streamlit run dashboard/app.py

Real Match Data

1. Ingest match audio (via dashboard or CLI)

Use the Ingest tab in the dashboard. Paste a YouTube URL (highlight clips work great) or upload a WAV/MP4. Enter goal timestamps manually (seconds from kickoff) or fetch from StatsBomb open data.

# CLI alternative
yt-dlp -x --audio-format wav --ffmpeg-location /opt/homebrew/bin \
  -o "data/raw/%(title)s.%(ext)s" "YOUTUBE_URL"

2. Train on real match audio

After ingesting, use the Train section in the Ingest tab. This fits the LSTM head on real crowd audio — replacing the synthetic-trained checkpoint with one that responds to actual stadium acoustics.

# CLI
python scripts/train_cli.py --data-dir data/processed/ --epochs 30

3. Run live

streamlit run dashboard/app.py
# → Live tab → At-venue microphone mode → type "mic" → Go Live

StatsBomb Open Data

StatsBomb provides free event-level data for ~50 competitions, including La Liga, the Champions League, and the Women's World Cup.

from statsbombpy import sb
matches = sb.matches(competition_id=11, season_id=90)  # La Liga 2020/21
events  = sb.events(match_id=3788741)
goals   = events[(events.type == "Shot") & (events.shot_outcome == "Goal")]
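The labeler needs kickoff-relative seconds. StatsBomb events carry cumulative `minute` and `second` fields on the match clock, so the conversion is simple — a sketch with local sample data (stoppage-time drift is ignored here; that is a simplification, not the project's exact method):

```python
def goal_seconds(events: list[dict]) -> list[int]:
    """Convert StatsBomb goal events to seconds from kickoff using the
    cumulative match clock. Note: the second-half clock restarts at 45:00
    regardless of first-half stoppage, so this drifts slightly in half two."""
    return [e["minute"] * 60 + e["second"]
            for e in events
            if e.get("type") == "Shot" and e.get("shot_outcome") == "Goal"]

sample = [
    {"type": "Shot", "shot_outcome": "Goal", "minute": 30, "second": 30},
    {"type": "Shot", "shot_outcome": "Off T", "minute": 41, "second": 2},
    {"type": "Pass", "minute": 12, "second": 0},
]
print(goal_seconds(sample))   # → [1830]
```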

Where This Signal Is Useful

Tier-1 markets (Premier League, Bundesliga) are heavily automated. Major data providers use computer vision tracking at top grounds. The 5–15s human data entry window exists but is shrinking at the elite level.

Lower leagues are a different story. The Championship, Bundesliga 2, Serie B, and most non-European leagues still rely on manual data entry. Those markets also have thinner bookmaker pricing teams, making them more exploitable. The acoustic edge is more realistic there.

The fade signal may be more durable than the front-run. Academic research (Otting, 2025, Bundesliga data) found betting against crowd momentum after a spike returned +4.1% per match — the crowd is a contrarian indicator. Emotional bettors overreact to dangerous attacks, pushing odds irrationally. A 30–60s delay to let that overreaction set in before fading it is harder for automated systems to neutralize.
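The fade logic described above reduces to a rising threshold crossing plus a fixed delay. A hypothetical sketch — the threshold and delay values are illustrative, not tuned, and the real strategy would add risk controls:

```python
SPIKE_THRESHOLD = 0.8   # Attack Pressure Index level treated as a spike
FADE_DELAY_S = 45.0     # let the overreaction set in (30-60s window)

def fade_signals(stream):
    """Yield (fade_time, spike_time) for each rising crossing of the
    threshold. `stream` is an iterable of (t_seconds, api) pairs."""
    prev = 0.0
    for t, api in stream:
        if prev < SPIKE_THRESHOLD <= api:       # rising edge only
            # bet AGAINST the momentum FADE_DELAY_S after the spike
            yield t + FADE_DELAY_S, t
        prev = api

stream = [(0, 0.1), (2, 0.3), (4, 0.85), (6, 0.9), (8, 0.4), (10, 0.82)]
print(list(fade_signals(stream)))
# → [(49.0, 4), (55.0, 10)]
```

Triggering only on the rising edge avoids firing repeatedly while the index stays above the threshold during one sustained attack.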

What This Demonstrates (Portfolio Context)

This project demonstrates:

  1. Domain problem framing — understanding in-play pricing mechanics, data provider pipelines (provider → bookmaker → odds), and where the signal gaps are
  2. Real-time ML under hard latency constraints — 12ms end-to-end, ANE-optimized, on-device Apple Silicon deployment
  3. Weak supervision at scale — generating training labels from timestamps, not annotation; the same approach scales to thousands of archive matches automatically
  4. Novel signal identification — crowd audio as a complementary channel to tracking data; what 40,000 humans collectively perceive vs. what sensors measure
  5. Honest edge analysis — correctly identifying which market tiers and which strategy (front-run vs. fade) are realistically exploitable

Extension Ideas

  • Tactical shift detection — classify team formation changes from tracking data in real time; a 4-4-2 → 3-5-2 shift is a statistically meaningful in-play pricing event
  • Expected Possession Value chains — predict the full action chain (pass → movement → recovery), not just next shot probability, at sub-second latency
  • Crowd + player arousal fusion — combine acoustic signal with player positional density clustering (spatial cohesion under pressure) for a richer momentum index
  • Commentary transcription layer — Whisper → event classifier, fused with acoustic score; two independent noisy channels, one cleaner signal
  • CoreML export — iOS deployment for iPhone/iPad at-venue device
  • Multi-sport — basketball (fast breaks), tennis (break points), American football (red zone)

Roadmap

  • Fade strategy implementation — auto-bet against momentum N seconds post-spike
  • Calibration study — annotate which spike types (near-miss vs. corner vs. VAR check) correlate with market overreaction
  • Attention map visualization — which frequency bands and time regions drive the model
  • Tactical shift detection module — real-time formation classifier from tracking data
  • CoreML export for iOS deployment
