Commit d81206b

strict typing in viz plots
1 parent: ce57c2a

8 files changed

Lines changed: 57 additions & 59 deletions

README.md

Lines changed: 27 additions & 39 deletions
@@ -31,58 +31,46 @@
 
 ## Summary
 
-patientflow, a Python package, converts patient-level predictions into output that is useful for bed managers in hospitals. If you have a predictive model of some outcome for a patient, like admission or discharge from hospital, you can use patientflow to create bed count distributions for a cohort of patients.
+patientflow, a Python package, converts patient-level predictions into output that is useful for bed managers in hospitals.
 
-The package was developed for University College London Hospitals (UCLH) NHS Trust to predict the number of emergency admissions within the next eight hours. The methods generalise to any problem where it is useful to convert patient-level predictions into outcomes for a whole cohort of patients at a point in time. The repository includes a synthetic dataset and a series of notebooks demonstrating the use of the package.
+We developed this code originally for University College London Hospitals (UCLH) NHS Trust to predict the number of emergency admissions within the next eight hours. The methods generalise to other aspects of patient flow in hospitals, including predictions of the number of discharges from a group of patients. They can be applied to any problem where it is useful to convert patient-level predictions into outcomes for a whole cohort of patients at a point in time.
 
-## Background
+If you have a predictive model of some outcome for a patient, like admission or discharge from hospital, you can use patientflow to create bed count distributions for a cohort of patients. We show how to prepare your data and train models for these kinds of problems. The repository includes a synthetic dataset and a series of notebooks demonstrating the use of the package.
 
-I'm [Zella King](https://github.com/zmek/), a health data scientist in the Clinical Operational Research Unit (CORU) at University College London. Since 2020, I have worked with University College London Hospital (UCLH) on practical tools to improve patient flow through the hospital.
-
-Hospital bed managers constantly monitor whether they have sufficient beds to meet demand. At specific points during the day they count numbers of inpatients likely to leave, and numbers of new admissions. Their projections about short-term changes are vital because if they anticipate a shortage of beds, bed managers must take swift action to mitigate the situation.
-
-With a team from UCLH, I developed a predictive tool that is now in daily use by bed managers at the hospital.
-
-The tool we built for UCLH takes a 'snapshot' of patients in the hospital at a point in time, and using data from the hospital's electronic record system, predicts the number of emergency admissions in the next 8 or 12 hours. We are working on predicting discharges in the same way.
-
-The key principle is that we take data on hospital visits that are unfinished, and predict whether some outcome (admission from A&E, discharge from hospital, or transfer to another clinical specialty) will happen to each of those patients in a window of time. What the outcome is doesn't really matter; the same methods can be used.
-
-The utility of our approach - and the thing that makes it very generalisable - is that we build up from the patient-level predictions into a predictions for a whole cohort of patients at a point in time. That step is what creates useful information for bed managers. They are less interested in whether any individual will need a bed and more interested in the overall number of beds needed, and in which parts of the hospital. They trade in cohort-level data - numbers of beds needed for patients in A&E, number of transfers out of the acute medical unit to other wards, number of patients leaving a certain ward. And they are always only looking a few hours ahead.
+## What patientflow is for:
 
-The methods that we developed for UCLH can be used in any hospitals setting where point-in-time predictions about cohorts of patients are useful. We are sharing these methods because we want to make it easier for researchers and analysts in healthcare to create information products that are useful for site and operations managers in hospitals.
+- Managing patient flow in hospitals: The package can be used to predict numbers of emergency admissions, discharges or transfers between units.
+- Short-term operational planning: The predictions produced by this package are designed for bed managers who need to make decisions within a 4-16 hour timeframe.
+- Working with real-time data: The design assumes that data from an electronic health record (EHR) is available in real time, or close to real time.
+- Point-in-time analysis: The package works by taking "snapshots" of groups of patients at a particular moment, and making projections from those specific moments.
 
-We provide a Python package to make this convenient. The repository includes a set of notebooks with code written in Python and commentary on how to use the package.
+## What patientflow is NOT for:
 
-We also show a fully worked example of how to predict emergency demand for beds, and demonstrate how we tailored the approach, using the package, to the specific demands of bed managers at UCLH.
+- Long-term capacity planning: The package focuses on immediate operational needs (hours ahead), not strategic planning over weeks or months.
+- Making decisions about individual patients: The package is not designed for clinical decision-making about specific patients. It relies on data entered into the EHR by clinical staff looking after patients, but cannot and should not be used to influence their decision-making.
+- General hospital analytics: It is specifically focused on short-term bed management, not broader hospital analytics like long-term demand and capacity planning.
+- Finished/historical patient analysis: While historical data might train underlying models, the package itself focuses on patients currently in the hospital or soon to arrive.
+- Replacing human judgment: It augments the information available to bed managers, but isn't meant to automate bed management decisions completely.
 
-## What patientflow is for:
+## This package will help you if you want to:
 
-* Converting individual patient predictions to cohort-level insights: The core purpose is transforming patient-level predictions into aggregate bed count distributions for groups of patients.
-* Short-term operational planning: The package is designed for bed managers who need to make decisions within an 4-16 hour timeframe.
-* Use with real-time data: The modelling is intended to be used with data streamed from an electronic health record in near to real-time
-* Point-in-time analyses: It works by taking "snapshots" of hospital populations and making projections from those specific moments.
-* Various patient flow outcomes: While developed for emergency admissions, it generalises to other outcomes like discharges or transfers between units.
-* Hospital resource management: It helps operational staff anticipate bed needs across different hospital areas.
-* Working with unfinished patient journeys: It is designed for making predictions when outcomes are still pending as as yet unknown.
-* Demonstrating predictive model development: The package includes examples that show how to create the predictive models for patient outcomes.
+- Convert individual patient predictions to cohort-level insights: Its core purpose is the creation of aggregate bed count distributions, because bed numbers are the currency used by bed managers.
+- Make predictions for unfinished patient visits: It is designed for making predictions when outcomes at the end of the visit are as yet unknown.
+- Develop your own predictive models of emergency demand: The package includes a fully worked example of how to convert data from A&E visits into the right structure, and use that data to train models that predict numbers of emergency beds.
 
-## What patientflow is NOT for:
+## This package will not help you if:
 
-* Long-term capacity planning: The package focuses on immediate operational needs (hours ahead), not strategic planning over weeks or months.
-* Individual patient management: It's not designed for clinical decision-making about specific patients.
-* Detailed clinical pathway analysis: It doesn't model complex clinical pathways or detailed patient journeys.
-* General hospital analytics: It's specifically focused on bed management, not broader hospital analytics like financial planning or clinical quality metrics.
-* Finished/historical patient analysis: While historical data might train underlying models, the package itself focuses on active cases and future projections.
-* Replacing human judgment: It provides decision support but isn't meant to automate bed management decisions completely.
+- You work with time series data: patientflow works with snapshots of a hospital visit summarising what is in the patient record up to that point in time.
+- Your focus is on predicting clinical outcomes: the approach is designed
 
 ## Mathematical assumptions underlying the conversion from individual to cohort predictions:
 
-* Independence of patient outcomes: The package assumes that individual patient outcomes are conditionally independent given the features used in prediction.
-* Symbolic probability generation: The conversion uses symbolic mathematics (via SymPy) to construct a probability generating function that represents the exact distribution of possible cohort outcomes.
-* Bernoulli outcome model: Each patient outcome is modeled as a Bernoulli trial with its own probability, and the package computes the exact probability distribution for the sum of these independent trials.
-* Coefficient extraction approach: The method works by expanding a symbolic expression and extracting coefficients corresponding to each possible cohort outcome count.
-* Optional weighted aggregation: When converting individual probabilities to cohort-level predictions, the package allows for weighted importance of individual predictions, modifying the contribution of each patient to the overall distribution in specific contexts (eg admissions to different specialties).
-* Discrete outcome space: The package assumes outcomes can be represented as discrete counts (e.g., number of admissions) rather than continuous values.
+- Independence of patient outcomes: The package assumes that individual patient outcomes are conditionally independent given the features used in prediction.
+- Symbolic probability generation: The conversion uses symbolic mathematics (via SymPy) to construct a probability generating function that represents the exact distribution of possible cohort outcomes.
+- Bernoulli outcome model: Each patient outcome is modeled as a Bernoulli trial with its own probability, and the package computes the exact probability distribution for the sum of these independent trials.
+- Coefficient extraction approach: The method works by expanding a symbolic expression and extracting coefficients corresponding to each possible cohort outcome count.
+- Optional weighted aggregation: When converting individual probabilities to cohort-level predictions, the package allows for weighted importance of individual predictions, modifying the contribution of each patient to the overall distribution in specific contexts (e.g., admissions to different specialties).
+- Discrete outcome space: The package assumes outcomes can be represented as discrete counts (e.g., number of admissions) rather than continuous values.
 
 ## Getting started
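The README's mathematical assumptions describe the sum of independent Bernoulli trials, whose distribution is read off the coefficients of the probability generating function G(s) = ∏ᵢ (1 - pᵢ + pᵢ s). The following is a minimal sketch of that idea, not the package's implementation: where the README says patientflow expands the product symbolically with SymPy, this sketch multiplies the polynomial factors numerically, which yields the same coefficients. The function name `cohort_count_distribution` and the `weights` argument are hypothetical; the weighting assumes each patient's probability is simply scaled by a per-patient weight (for example, the chance that an admitted patient goes to a given specialty).

```python
from typing import List, Optional, Sequence


def cohort_count_distribution(
    probs: Sequence[float], weights: Optional[Sequence[float]] = None
) -> List[float]:
    """Return P(total = k) for k = 0..n, assuming independent Bernoulli outcomes.

    Each patient contributes a factor (1 - q + q*s) to the generating function;
    after multiplying all factors, the coefficient of s**k is P(total = k).
    `weights` is a hypothetical extension: each probability is multiplied by
    its weight before entering the product.
    """
    if weights is None:
        weights = [1.0] * len(probs)
    if len(weights) != len(probs):
        raise ValueError("probs and weights must have the same length")

    coeffs = [1.0]  # generating function of an empty cohort: G(s) = 1
    for p, w in zip(probs, weights):
        q = p * w  # effective probability that this patient counts towards the total
        if not 0.0 <= q <= 1.0:
            raise ValueError("each weighted probability must lie in [0, 1]")
        # Multiply the current polynomial by (1 - q) + q*s.
        new = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k] += c * (1.0 - q)  # this patient does not count
            new[k + 1] += c * q  # this patient counts
        coeffs = new
    return coeffs


# Three patients with admission probabilities 0.2, 0.7 and 0.9.
distribution = cohort_count_distribution([0.2, 0.7, 0.9])
for beds, prob in enumerate(distribution):
    print(f"P({beds} admissions) = {prob:.3f}")
```

For a cohort of n patients the result has n + 1 entries that sum to 1, and its mean equals the sum of the (weighted) probabilities. The repeated multiplication costs O(n²), which is modest for the cohort sizes a bed manager looks at.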

notebooks/0_Background.ipynb

Whitespace-only changes.

src/patientflow/train/classifiers.py

Lines changed: 1 addition & 1 deletion
@@ -256,7 +256,7 @@ def train_classifier(
     use_balanced_training: bool = True,
     majority_to_minority_ratio: float = 1.0,
     calibrate_probabilities: bool = True,
-    calibration_method: str = "isotonic",
+    calibration_method: str = "sigmoid",
 ) -> TrainedClassifier:
     """
     Train a single model including data preparation and balancing.
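This hunk changes the default `calibration_method` from `"isotonic"` to `"sigmoid"`. In scikit-learn's `CalibratedClassifierCV`, `"sigmoid"` means Platt scaling: a logistic curve is fitted that maps the classifier's raw scores to probabilities. `"isotonic"` fits a non-parametric monotone step function instead, which is more flexible but prone to overfitting when the calibration set is small. The diff does not show whether `train_classifier` passes this string to `CalibratedClassifierCV`; that is an assumption here. The dependency-free sketch below fits a simplified Platt-style mapping (plain gradient descent on log loss, without Platt's smoothed target labels) to show what sigmoid calibration computes. The function names and the toy scores are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_sigmoid_calibration(
    scores: Sequence[float],
    labels: Sequence[int],
    learning_rate: float = 0.5,
    n_iter: int = 5000,
) -> Tuple[float, float]:
    """Fit a, b so that P(y = 1 | score) = sigmoid(a * score + b).

    Simplified Platt scaling: batch gradient descent on the mean log loss.
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            error = _sigmoid(a * s + b) - y  # derivative of log loss w.r.t. a*s + b
            grad_a += error * s
            grad_b += error
        a -= learning_rate * grad_a / n
        b -= learning_rate * grad_b / n
    return a, b


def calibrate(scores: Sequence[float], a: float, b: float) -> List[float]:
    """Map raw scores to calibrated probabilities with the fitted curve."""
    return [_sigmoid(a * s + b) for s in scores]


# Toy, made-up classifier scores and observed outcomes.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = fit_sigmoid_calibration(scores, labels)
probs = calibrate(scores, a, b)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Because the fitted curve has only two parameters, sigmoid calibration is always monotone and smooth, which is the usual argument for preferring it over isotonic regression when calibration data are limited.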

src/patientflow/viz/calibration_plot.py

Lines changed: 2 additions & 1 deletion
@@ -31,7 +31,8 @@ def plot_calibration(
     # Sort trained_models by prediction time
     trained_models_sorted = sorted(
         trained_models,
-        key=lambda x: x.training_results.prediction_time[0] * 60 + x.training_results.prediction_time[1],
+        key=lambda x: x.training_results.prediction_time[0] * 60
+        + x.training_results.prediction_time[1],
     )
     num_plots = len(trained_models_sorted)
     fig, axs = plt.subplots(1, num_plots, figsize=(num_plots * 5, 4))
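Every viz module in this commit sorts `trained_models` with the same key, `prediction_time[0] * 60 + prediction_time[1]`. Assuming `prediction_time` is an `(hour, minute)` tuple, the key is minutes since midnight, so models are ordered chronologically through the day. The sketch below uses hypothetical stand-in dataclasses (`TrainingResults`, `Model`) rather than the package's `TrainedClassifier`, and also shows that sorting on the tuple itself gives the same order, because Python compares tuples element by element.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TrainingResults:  # hypothetical stand-in for the package's training results
    prediction_time: Tuple[int, int]  # assumed to be (hour, minute)


@dataclass
class Model:  # hypothetical stand-in for TrainedClassifier
    name: str
    training_results: TrainingResults


def minutes_since_midnight(model: Model) -> int:
    """Same arithmetic as the lambda in the diff: hours * 60 + minutes."""
    hour, minute = model.training_results.prediction_time
    return hour * 60 + minute


models: List[Model] = [
    Model("evening", TrainingResults((15, 30))),
    Model("early", TrainingResults((6, 0))),
    Model("morning", TrainingResults((9, 30))),
]

by_minutes = sorted(models, key=minutes_since_midnight)
by_tuple = sorted(models, key=lambda m: m.training_results.prediction_time)

print([m.name for m in by_minutes])  # ['early', 'morning', 'evening']
```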

src/patientflow/viz/distribution_plots.py

Lines changed: 5 additions & 4 deletions
@@ -1,8 +1,9 @@
 import matplotlib.pyplot as plt
 from patientflow.predict.emergency_demand import add_missing_columns
 from patientflow.prepare import get_snapshots_at_prediction_time
-from patientflow.load import get_model_key, load_saved_model
 from patientflow.model_artifacts import TrainedClassifier
+from typing import Optional
+from pathlib import Path
 
 # Define the color scheme
 primary_color = "#1f77b4"
@@ -14,8 +15,7 @@ def plot_prediction_distributions(
     test_visits,
     exclude_from_training_data,
     bins=30,
-    media_file_path: str= None
-
+    media_file_path: Optional[Path] = None,
 ):
     """
     Plot prediction distributions for multiple models.
@@ -33,7 +33,8 @@ def plot_prediction_distributions(
     # Sort trained_models by prediction time
     trained_models_sorted = sorted(
         trained_models,
-        key=lambda x: x.training_results.prediction_time[0] * 60 + x.training_results.prediction_time[1],
+        key=lambda x: x.training_results.prediction_time[0] * 60
+        + x.training_results.prediction_time[1],
     )
     num_plots = len(trained_models_sorted)
     fig, axs = plt.subplots(1, num_plots, figsize=(num_plots * 5, 4))
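The signature change from `media_file_path: str= None` to `media_file_path: Optional[Path] = None` is what strict type checking asks for: recent versions of mypy reject a `None` default on a parameter that is not explicitly `Optional`. Code that uses the argument still has to handle `None` before doing path arithmetic with it. A minimal sketch of such a guard, using a hypothetical helper `resolve_plot_path` that also accepts `str`, in the spirit of the `Union[str, Path, None]` parameter and `Path(media_file_path)` call in `generate_madcap_plots`:

```python
from pathlib import Path
from typing import Optional, Union


def resolve_plot_path(
    media_file_path: Optional[Union[str, Path]], filename: str
) -> Optional[Path]:
    """Return the file path to save a plot to, or None if saving is disabled.

    Hypothetical helper: checks for None explicitly, then normalises
    str or Path input to a Path before joining the file name.
    """
    if media_file_path is None:
        return None  # caller should skip plt.savefig
    return Path(media_file_path) / filename


print(resolve_plot_path(None, "prediction_distributions.png"))  # None
print(resolve_plot_path("media", "prediction_distributions.png"))
```

Returning `Optional[Path]` pushes the "save or not" decision to one `if` at the call site, so a type checker can confirm that `plt.savefig` is never handed `None`.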

src/patientflow/viz/feature_plot.py

Lines changed: 8 additions & 5 deletions
@@ -1,7 +1,7 @@
 import numpy as np
 import matplotlib.pyplot as plt
 from patientflow.model_artifacts import TrainedClassifier
-from patientflow.load import get_model_key, load_saved_model
+from sklearn.pipeline import Pipeline
 
 
 def plot_features(
@@ -22,7 +22,8 @@ def plot_features(
     # Sort trained_models by prediction time
     trained_models_sorted = sorted(
         trained_models,
-        key=lambda x: x.training_results.prediction_time[0] * 60 + x.training_results.prediction_time[1],
+        key=lambda x: x.training_results.prediction_time[0] * 60
+        + x.training_results.prediction_time[1],
     )
 
     num_plots = len(trained_models_sorted)
@@ -34,7 +35,7 @@ def plot_features(
 
     for i, trained_model in enumerate(trained_models_sorted):
         # Always use regular pipeline
-        pipeline = trained_model.pipeline
+        pipeline: Pipeline = trained_model.pipeline
         prediction_time = trained_model.training_results.prediction_time
 
         # Get feature names from the pipeline
@@ -46,7 +47,9 @@ def plot_features(
 
         # Get feature importances
         feature_importances = pipeline.named_steps["classifier"].feature_importances_
-        indices = np.argsort(feature_importances)[-top_n:] # Get indices of the top N features
+        indices = np.argsort(feature_importances)[
+            -top_n:
+        ] # Get indices of the top N features
 
         # Plot for this prediction time
         ax = axs[i]
@@ -66,6 +69,6 @@ def plot_features(
 
     # Save and display plot
     feature_plot_path = media_file_path / "feature_importance_plots.png"
-    plt.savefig(feature_plot_path, bbox_inches='tight')
+    plt.savefig(feature_plot_path, bbox_inches="tight")
     plt.show()
     plt.close(fig)
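In `plot_features` the change to `np.argsort(feature_importances)[-top_n:]` only re-wraps the line; behaviour is unchanged. `np.argsort` returns indices in ascending order of importance, so the slice keeps the `top_n` most important features with the least important of them first. The sketch below uses made-up importances and feature names to show this, and reverses the slice when a most-important-first list is wanted.

```python
import numpy as np

# Made-up importances and feature names, for illustration only.
feature_names = np.array(["age", "arrival_method", "triage_score", "num_obs"])
feature_importances = np.array([0.10, 0.50, 0.20, 0.90])
top_n = 2

# argsort sorts ascending, so the last top_n indices are the most important features.
indices = np.argsort(feature_importances)[-top_n:]
print(feature_names[indices])  # ['arrival_method' 'num_obs']

# Reverse the slice to list the most important feature first.
print(feature_names[indices[::-1]])  # ['num_obs' 'arrival_method']
```

The ascending order is convenient if the importances are drawn as a horizontal bar chart, where the first bar sits at the bottom; the diff does not show how the bars are drawn, so that is an assumption.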

src/patientflow/viz/madcap_plot.py

Lines changed: 10 additions & 7 deletions
@@ -26,7 +26,7 @@
 """
 
 from pathlib import Path
-from typing import List, Tuple, Union
+from typing import List, Union, Optional
 
 import matplotlib.pyplot as plt
 import math
@@ -90,7 +90,7 @@ def generate_madcap_plots(
     media_file_path: Union[str, Path, None],
     test_visits: pd.DataFrame,
     exclude_from_training_data: List[str],
-    suptitle: str = None,
+    suptitle: Optional[str] = None,
 ) -> None:
     """
     Generates MADCAP plots for a list of trained models, comparing predicted probabilities
@@ -112,7 +112,8 @@ def generate_madcap_plots(
     # Sort trained_models by prediction time
     trained_models_sorted = sorted(
         trained_models,
-        key=lambda x: x.training_results.prediction_time[0] * 60 + x.training_results.prediction_time[1],
+        key=lambda x: x.training_results.prediction_time[0] * 60
+        + x.training_results.prediction_time[1],
     )
     num_plots = len(trained_models_sorted)
 
@@ -126,7 +127,7 @@ def generate_madcap_plots(
     if num_plots == 1:
         # When there's only one plot, axes is a single Axes object, not an array
         trained_model = trained_models_sorted[0]
-
+
         # Use calibrated pipeline if available, otherwise use regular pipeline
         if (
             hasattr(trained_model, "calibrated_pipeline")
@@ -148,7 +149,7 @@ def generate_madcap_plots(
 
         X_test = add_missing_columns(pipeline, X_test)
         predict_proba = pipeline.predict_proba(X_test)[:, 1]
-
+
         # Plot directly on the single axes
         plot_madcap_subplot(predict_proba, y_test, prediction_time, axes)
     else:
@@ -200,11 +201,12 @@ def generate_madcap_plots(
     if media_file_path:
         plot_name = "madcap_plot"
         madcap_plot_path = Path(media_file_path) / plot_name
-        plt.savefig(madcap_plot_path, bbox_inches='tight')
+        plt.savefig(madcap_plot_path, bbox_inches="tight")
 
     plt.show()
     plt.close(fig)
 
+
 def plot_madcap_subplot(predict_proba, label, _prediction_time, ax):
     """
     Plots a single MADCAP subplot showing cumulative predicted and observed admissions.
@@ -388,7 +390,8 @@ def generate_madcap_plots_by_group(
     # Sort trained_models by prediction time
     trained_models_sorted = sorted(
         trained_models,
-        key=lambda x: x.training_results.prediction_time[0] * 60 + x.training_results.prediction_time[1],
+        key=lambda x: x.training_results.prediction_time[0] * 60
+        + x.training_results.prediction_time[1],
     )
 
     for trained_model in trained_models_sorted:
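`plot_madcap_subplot` is described as plotting cumulative predicted and observed admissions. Assuming the construction is to order patients by predicted probability and compare the running sum of predicted probabilities with the running count of observed admissions, the data behind one subplot can be sketched without matplotlib as below. The function name `madcap_curves` and the ascending sort order are assumptions, not taken from the package.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def madcap_curves(
    predict_proba: Sequence[float], labels: Sequence[int]
) -> Tuple[List[float], List[int]]:
    """Return (cumulative predicted, cumulative observed) admissions.

    Patients are ordered by predicted probability (ascending, an assumption).
    For a well-calibrated model the two curves track each other and end
    close together, since the last values are expected vs actual totals.
    """
    if len(predict_proba) != len(labels):
        raise ValueError("predict_proba and labels must have the same length")
    order = sorted(range(len(predict_proba)), key=lambda i: predict_proba[i])
    cum_predicted = list(accumulate(predict_proba[i] for i in order))
    cum_observed = list(accumulate(labels[i] for i in order))
    return cum_predicted, cum_observed


predicted, observed = madcap_curves([0.9, 0.1, 0.5], [1, 0, 1])
print(observed)  # [0, 1, 2]
```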

src/patientflow/viz/shap_plot.py

Lines changed: 4 additions & 2 deletions
@@ -5,6 +5,7 @@
 import shap
 import scipy.sparse
 import numpy as np
+from sklearn.pipeline import Pipeline
 
 
 def plot_shap(
@@ -30,13 +31,14 @@ def plot_shap(
     # Sort trained_models by prediction time
     trained_models_sorted = sorted(
         trained_models,
-        key=lambda x: x.training_results.prediction_time[0] * 60 + x.training_results.prediction_time[1],
+        key=lambda x: x.training_results.prediction_time[0] * 60
+        + x.training_results.prediction_time[1],
     )
 
     for trained_model in trained_models_sorted:
         fig, ax = plt.subplots(figsize=(8, 12))
 
-        pipeline = trained_model.pipeline
+        pipeline: Pipeline = trained_model.pipeline
         prediction_time = trained_model.training_results.prediction_time
 
         # Get test data for this prediction time
