v1.4.2: Merge pull request #171 from UCL-CORU/inference-column-handling
Summary
This release fixes inference-time alignment between snapshot dataframes and trained classifier pipelines. ED and inpatient classifiers built with the modern train_classifier layout (FeatureColumnTransformer / feature_columns step) expect elapsed_los as timedelta64 through to predict_proba; conversion to seconds stays inside the fitted pipeline. Older artefacts trained without feature_columns and with numeric LOS still receive numeric seconds at the classifier step, as before.
Changes
- dataframe_for_classifier_predict_proba (predict.emergency_demand): chooses the representation of elapsed_los that the pipeline expects. Timedelta is kept for modern pipelines and converted to float seconds only for legacy pipelines.
- create_predictions: uses that helper instead of always converting elapsed_los to seconds before computing admission probabilities.
- build_service_data / _prepare_base_probabilities (predict.service): add_missing_columns runs only for legacy pipelines (no feature_columns step), matching create_predictions and avoiding spurious missing-column fills for modern bundles.
- ED and inpatient snapshot frames passed to classifiers use dataframe_for_classifier_predict_proba so modern bundles keep timedelta LOS.
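
The dispatch described in the changes above can be sketched roughly as follows. This is an illustrative approximation, not the library's implementation: the function names has_feature_columns_step and frame_for_predict_proba are hypothetical, and the sketch assumes that a modern pipeline can be recognised solely by a step named feature_columns in its named_steps mapping (as exposed by scikit-learn pipelines). SimpleNamespace objects stand in for fitted pipelines.

```python
from types import SimpleNamespace

import pandas as pd
from pandas.api.types import is_timedelta64_dtype


def has_feature_columns_step(pipeline) -> bool:
    """Return True when the pipeline exposes a step named 'feature_columns'."""
    # Assumption: modern bundles are identified by this step name alone.
    return "feature_columns" in getattr(pipeline, "named_steps", {})


def frame_for_predict_proba(df: pd.DataFrame, pipeline) -> pd.DataFrame:
    """Hypothetical stand-in for dataframe_for_classifier_predict_proba."""
    out = df.copy()  # never mutate the caller's snapshot frame
    if has_feature_columns_step(pipeline):
        # Modern pipeline: leave elapsed_los as timedelta; the fitted
        # pipeline is expected to convert it to seconds internally.
        return out
    if "elapsed_los" in out.columns and is_timedelta64_dtype(out["elapsed_los"]):
        # Legacy pipeline: the classifier step expects numeric seconds.
        out["elapsed_los"] = out["elapsed_los"].dt.total_seconds()
    return out


# Stand-ins for fitted pipelines; only the step names matter in this sketch.
modern = SimpleNamespace(named_steps={"feature_columns": object()})
legacy = SimpleNamespace(named_steps={"classifier": object()})

snapshots = pd.DataFrame({"elapsed_los": pd.to_timedelta([90, 3600], unit="s")})

modern_frame = frame_for_predict_proba(snapshots, modern)
legacy_frame = frame_for_predict_proba(snapshots, legacy)
```

In this sketch the same snapshot frame can be passed to either kind of bundle: the modern frame keeps a timedelta column, while the legacy frame carries float seconds.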
Upgrade / deployment notes
- Modern model bundles (pipelines with feature_columns): deploy this release with bundles trained on dataframes where elapsed_los is timedelta, consistent with training-time feature typing.
- Legacy bundles (no feature_columns): behaviour remains compatible; timedelta snapshots are still converted to seconds where required.
- Downstream services should continue supplying elapsed_los as timedelta on snapshot frames where the upstream contract already does so; avoid pre-converting to float seconds for modern bundles outside the pipeline.
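
As a rough illustration of the last note, a downstream service might derive elapsed_los as a timedelta and guard against accidental pre-conversion before handing the frame on. The column names arrival_datetime and snapshot_datetime and the helper check_elapsed_los_is_timedelta are assumptions made for this sketch; they are not taken from the release.

```python
import pandas as pd
from pandas.api.types import is_timedelta64_dtype


def check_elapsed_los_is_timedelta(df: pd.DataFrame) -> None:
    """Raise TypeError if elapsed_los has already been converted to numbers."""
    # Hypothetical guard; a real service may prefer logging to raising.
    if not is_timedelta64_dtype(df["elapsed_los"]):
        raise TypeError(
            f"elapsed_los should be timedelta, got {df['elapsed_los'].dtype}"
        )


# Hypothetical snapshot frame with assumed timestamp column names.
snapshots = pd.DataFrame(
    {
        "arrival_datetime": pd.to_datetime(["2024-01-01 08:00", "2024-01-01 09:30"]),
        "snapshot_datetime": pd.to_datetime(["2024-01-01 12:00", "2024-01-01 12:00"]),
    }
)

# Subtracting two datetime columns yields a timedelta64 column directly,
# so no manual conversion to seconds is needed before prediction.
snapshots["elapsed_los"] = snapshots["snapshot_datetime"] - snapshots["arrival_datetime"]

check_elapsed_los_is_timedelta(snapshots)  # passes: column is timedelta
```

Keeping the column as timedelta up to this point leaves the choice of representation to the prediction code, which is where this release makes that decision.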