v1.4.2: Merge pull request #171 from UCL-CORU/inference-column-handling

@zmek released this 04 May 16:47
· 2 commits to main since this release
5aacf84

Summary

This release fixes inference-time alignment between snapshot dataframes and trained classifier pipelines. ED and inpatient classifiers built with the modern train_classifier layout (FeatureColumnTransformer / feature_columns step) expect elapsed_los as timedelta64 through to predict_proba; conversion to seconds stays inside the fitted pipeline. Older artefacts trained without feature_columns and with numeric LOS still receive numeric seconds at the classifier step, as before.

Changes

  • dataframe_for_classifier_predict_proba (predict.emergency_demand): inspects the pipeline to choose the representation of elapsed_los. It keeps timedelta for modern pipelines and converts timedelta to float seconds only for legacy pipelines.
  • create_predictions: uses that helper instead of always converting elapsed_los to seconds before admission probabilities.
  • build_service_data / _prepare_base_probabilities (predict.service):
    • add_missing_columns runs only for legacy pipelines (no feature_columns step), matching create_predictions and avoiding spurious missing-column fills for modern bundles.
    • ED and inpatient snapshot frames passed to classifiers use dataframe_for_classifier_predict_proba so modern bundles keep timedelta LOS.
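
The dispatch described in the changes above can be sketched roughly as follows. This is an illustration only, not the released implementation: frame_for_predict_proba and has_feature_columns_step are hypothetical names, the real helper's signature is not shown in these notes, and the pipelines are mimicked with simple objects exposing a named_steps mapping (as scikit-learn pipelines do).

```python
from types import SimpleNamespace

import pandas as pd


def has_feature_columns_step(pipeline) -> bool:
    """Hypothetical check: True when the pipeline has a 'feature_columns' step."""
    return "feature_columns" in getattr(pipeline, "named_steps", {})


def frame_for_predict_proba(df: pd.DataFrame, pipeline) -> pd.DataFrame:
    """Hypothetical stand-in for dataframe_for_classifier_predict_proba.

    Modern pipelines get elapsed_los unchanged (timedelta); legacy pipelines
    get it converted to float seconds. The input frame is never mutated.
    """
    out = df.copy()
    if has_feature_columns_step(pipeline):
        return out  # conversion to seconds happens inside the fitted pipeline
    if "elapsed_los" in out.columns and pd.api.types.is_timedelta64_dtype(
        out["elapsed_los"]
    ):
        out["elapsed_los"] = out["elapsed_los"].dt.total_seconds()
    return out


# Stand-ins for fitted pipelines; only the step names matter for this sketch.
modern = SimpleNamespace(named_steps={"feature_columns": object(), "classifier": object()})
legacy = SimpleNamespace(named_steps={"classifier": object()})

snapshots = pd.DataFrame(
    {"elapsed_los": pd.to_timedelta([90, 3600], unit="s"), "age": [70, 34]}
)

modern_frame = frame_for_predict_proba(snapshots, modern)  # elapsed_los stays timedelta
legacy_frame = frame_for_predict_proba(snapshots, legacy)  # elapsed_los becomes float seconds
```

Keying the decision on the pipeline rather than on the dataframe means callers can pass the same snapshot frame to either kind of bundle.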

Upgrade / deployment notes

  • Modern model bundles (pipelines with feature_columns): deploy this release with bundles trained on dataframes where elapsed_los is timedelta, consistent with training-time feature typing.
  • Legacy bundles (no feature_columns): behaviour remains compatible; timedelta snapshots are still converted to seconds where required.
  • Downstream services should continue supplying elapsed_los as timedelta on snapshot frames where the upstream contract already does so; avoid pre-converting to float seconds for modern bundles outside the pipeline.
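
For downstream services, a minimal sketch of supplying elapsed_los as timedelta might look like the following. The column names arrival_datetime and snapshot_datetime and both function names are assumptions made for illustration; the actual upstream contract may differ.

```python
import pandas as pd


def add_elapsed_los(
    snapshots: pd.DataFrame,
    arrival_col: str = "arrival_datetime",    # assumed column name
    snapshot_col: str = "snapshot_datetime",  # assumed column name
) -> pd.DataFrame:
    """Return a copy with elapsed_los as a timedelta column (not float seconds)."""
    out = snapshots.copy()
    out["elapsed_los"] = pd.to_datetime(out[snapshot_col]) - pd.to_datetime(
        out[arrival_col]
    )
    return out


def check_elapsed_los_is_timedelta(df: pd.DataFrame) -> None:
    """Hypothetical guard: fail early if elapsed_los was pre-converted."""
    if not pd.api.types.is_timedelta64_dtype(df["elapsed_los"]):
        raise TypeError(
            f"elapsed_los should be timedelta, got {df['elapsed_los'].dtype}"
        )


raw = pd.DataFrame(
    {
        "arrival_datetime": ["2024-01-01 08:00", "2024-01-01 09:30"],
        "snapshot_datetime": ["2024-01-01 12:00", "2024-01-01 12:00"],
    }
)
frame = add_elapsed_los(raw)
check_elapsed_los_is_timedelta(frame)  # passes: the column is timedelta
```

A guard like this is only appropriate on paths that feed modern bundles; legacy bundles continue to accept frames that are converted to seconds where required.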