Fix error when generating a single MADCAP plot

zmek · zmek · commit ce57c2a8b7fa · 2025-03-25T10:37:24.000Z
diff --git a/README.md b/README.md
@@ -31,28 +31,59 @@
 
 ## Summary
 
-patientflow, a Python package, converts patient-level predictions into output that is useful for bed managers in hospitals. If you have a predictive model of some outcome for a patient, like admission or discharge from hospital, you can use patientflow to create bed count distributions for a cohort of patients. The package was developed for University College London Hospitals (UCLH) NHS Trust to predict the number of emergency admissions within the next eight hours. The methods generalise to any problem where it is useful to convert patient-level predictions into outcomes for a whole cohort of patients at a point in time. The repository includes a synthetic dataset and a series of notebooks demonstrating the use of the package.
+patientflow, a Python package, converts patient-level predictions into output that is useful for bed managers in hospitals. If you have a predictive model of some outcome for a patient, like admission or discharge from hospital, you can use patientflow to create bed count distributions for a cohort of patients. 
+
+The package was developed for University College London Hospitals (UCLH) NHS Trust to predict the number of emergency admissions within the next eight hours. The methods generalise to any problem where it is useful to convert patient-level predictions into outcomes for a whole cohort of patients at a point in time. The repository includes a synthetic dataset and a series of notebooks demonstrating the use of the package.
 
 ## Background
 
 I'm [Zella King](https://github.com/zmek/), a health data scientist in the Clinical Operational Research Unit (CORU) at University College London. Since 2020, I have worked with University College London Hospital (UCLH) on practical tools to improve patient flow through the hospital.
 
-Hospital bed managers constantly monitor whether they have sufficient beds to meet demand. At specific points during the day they count numbers of inpatients likely to leave, and numbers of new admissions. Their projections about short-term changes are vital because if they anticipate a shortage of bedsd, bed managers must take swift action to mitigate the situation. 
+Hospital bed managers constantly monitor whether they have sufficient beds to meet demand. At specific points during the day they count numbers of inpatients likely to leave, and numbers of new admissions. Their projections about short-term changes are vital because if they anticipate a shortage of beds, bed managers must take swift action to mitigate the situation. 
 
 With a team from UCLH, I developed a predictive tool that is now in daily use by bed managers at the hospital. 
 
 The tool we built for UCLH takes a 'snapshot' of patients in the hospital at a point in time, and using data from the hospital's electronic record system, predicts the number of emergency admissions in the next 8 or 12 hours. We are working on predicting discharges in the same way. 
 
 The key principle is that we take data on hospital visits that are unfinished, and predict whether some outcome (admission from A&E, discharge from hospital, or transfer to another clinical specialty) will happen to each of those patients in a window of time. What the outcome is doesn't really matter; the same methods can be used. 
 
-But the true utility of our approach - and the thing that makes it very generalisable - is that we build up from the patient-level predictions into a predictions for a whole cohort of patients at a point in time. That step is what creates useful information for bed managers. They are less interested in whether any individual will need a bed and more interested in the overall number of beds needed, and in which parts of the hospital. They trade in cohort-level data - numbers of beds needed for patients in A&E, number of transfers out of the acute medical unit to other wards, number of patients leaving a certain ward. And they are always only looking a few hours ahead. 
+The utility of our approach - and the thing that makes it very generalisable - is that we build up from the patient-level predictions into predictions for a whole cohort of patients at a point in time. That step is what creates useful information for bed managers. They are less interested in whether any individual will need a bed and more interested in the overall number of beds needed, and in which parts of the hospital. They trade in cohort-level data - numbers of beds needed for patients in A&E, number of transfers out of the acute medical unit to other wards, number of patients leaving a certain ward. And they are always only looking a few hours ahead.
 
 The methods that we developed for UCLH can be used in any hospitals setting where point-in-time predictions about cohorts of patients are useful. We are sharing these methods because we want to make it easier for researchers and analysts in healthcare to create information products that are useful for site and operations managers in hospitals. 
 
 We provide a Python package to make this convenient. The repository includes a set of notebooks with code written in Python and commentary on how to use the package.
 
 We also show a fully worked example of how to predict emergency demand for beds, and demonstrate how we tailored the approach, using the package, to the specific demands of bed managers at UCLH. 
 
+## What patientflow is for:
+
+* Converting individual patient predictions to cohort-level insights: The core purpose is transforming patient-level predictions into aggregate bed count distributions for groups of patients.
+* Short-term operational planning: The package is designed for bed managers who need to make decisions within a 4-16 hour timeframe.
+* Use with real-time data: The modelling is intended to be used with data streamed from an electronic health record in near real time.
+* Point-in-time analyses: It works by taking "snapshots" of hospital populations and making projections from those specific moments.
+* Various patient flow outcomes: While developed for emergency admissions, it generalises to other outcomes like discharges or transfers between units.
+* Hospital resource management: It helps operational staff anticipate bed needs across different hospital areas.
+* Working with unfinished patient journeys: It is designed for making predictions when outcomes are still pending and as yet unknown.
+* Demonstrating predictive model development: The package includes examples that show how to create the predictive models for patient outcomes.
+
+## What patientflow is NOT for:
+
+* Long-term capacity planning: The package focuses on immediate operational needs (hours ahead), not strategic planning over weeks or months.
+* Individual patient management: It's not designed for clinical decision-making about specific patients.
+* Detailed clinical pathway analysis: It doesn't model complex clinical pathways or detailed patient journeys.
+* General hospital analytics: It's specifically focused on bed management, not broader hospital analytics like financial planning or clinical quality metrics.
+* Finished/historical patient analysis: While historical data might train underlying models, the package itself focuses on active cases and future projections.
+* Replacing human judgment: It provides decision support but isn't meant to automate bed management decisions completely.
+
+## Mathematical assumptions underlying the conversion from individual to cohort predictions:
+
+* Independence of patient outcomes: The package assumes that individual patient outcomes are conditionally independent given the features used in prediction.
+* Symbolic probability generation: The conversion uses symbolic mathematics (via SymPy) to construct a probability generating function that represents the exact distribution of possible cohort outcomes.
+* Bernoulli outcome model: Each patient outcome is modelled as a Bernoulli trial with its own probability, and the package computes the exact probability distribution for the sum of these independent trials.
+* Coefficient extraction approach: The method works by expanding a symbolic expression and extracting coefficients corresponding to each possible cohort outcome count.
+* Optional weighted aggregation: When converting individual probabilities to cohort-level predictions, the package allows for weighted importance of individual predictions, modifying the contribution of each patient to the overall distribution in specific contexts (e.g., admissions to different specialties).
+* Discrete outcome space: The package assumes outcomes can be represented as discrete counts (e.g., number of admissions) rather than continuous values.
+
 ## Getting started
 
 - Exploration: Start with the [notebooks README](notebooks/README.md) to get an outline of what is included in the notebooks, and read the [patientflow README](src/patientflow/README.md) for an overview of the Python package
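
Note on the "Mathematical assumptions" list added to the README in the hunk above: the Bernoulli and coefficient-extraction bullets can be illustrated with a small sketch. It multiplies out the generating function (1 - p_1 + p_1 s)(1 - p_2 + p_2 s)... and reads off the coefficient of s^k as the probability that exactly k patients have the outcome. This is not the package's implementation: the README says patientflow uses SymPy, whereas this sketch swaps in plain polynomial multiplication, and the function name `cohort_count_distribution` and the example probabilities are assumptions made for illustration only.

```python
def cohort_count_distribution(probabilities):
    """Return [P(N = 0), ..., P(N = n)] for n independent Bernoulli outcomes.

    Hypothetical helper, not part of patientflow. Each entry of
    `probabilities` is one patient's predicted probability of the outcome.
    """
    # Coefficients of the generating function in powers of s; start with 1.
    coefficients = [1.0]
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        # Multiply the current polynomial by ((1 - p) + p * s).
        updated = [0.0] * (len(coefficients) + 1)
        for k, c in enumerate(coefficients):
            updated[k] += c * (1.0 - p)  # outcome does not occur for this patient
            updated[k + 1] += c * p      # outcome occurs, so the count rises by one
        coefficients = updated
    return coefficients


# Made-up predictions for three patients in one snapshot.
patient_probabilities = [0.9, 0.5, 0.2]
distribution = cohort_count_distribution(patient_probabilities)
for count, probability in enumerate(distribution):
    print(f"P({count} admissions) = {probability:.2f}")
```

The coefficients always sum to one, and the weighted aggregation mentioned in the README is not covered by this sketch.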
diff --git a/src/patientflow/viz/madcap_plot.py b/src/patientflow/viz/madcap_plot.py
@@ -122,11 +122,11 @@ def generate_madcap_plots(
 
     fig, axes = plt.subplots(num_rows, num_cols, figsize=(num_plots * 5, 4))
 
-    # Ensure axes is always a 2D array
-    if num_rows == 1:
-        axes = axes.reshape(1, -1)
-
-    for i, trained_model in enumerate(trained_models_sorted):
+    # Handle the case of a single plot differently
+    if num_plots == 1:
+        # When there's only one plot, axes is a single Axes object, not an array
+        trained_model = trained_models_sorted[0]
+
         # Use calibrated pipeline if available, otherwise use regular pipeline
         if (
             hasattr(trained_model, "calibrated_pipeline")
@@ -148,16 +148,46 @@ def generate_madcap_plots(
 
         X_test = add_missing_columns(pipeline, X_test)
         predict_proba = pipeline.predict_proba(X_test)[:, 1]
+
+        # Plot directly on the single axes
+        plot_madcap_subplot(predict_proba, y_test, prediction_time, axes)
+    else:
+        # For multiple plots, ensure axes is always a 2D array
+        if num_rows == 1:
+            axes = axes.reshape(1, -1)
+
+        for i, trained_model in enumerate(trained_models_sorted):
+            # Use calibrated pipeline if available, otherwise use regular pipeline
+            if (
+                hasattr(trained_model, "calibrated_pipeline")
+                and trained_model.calibrated_pipeline is not None
+            ):
+                pipeline = trained_model.calibrated_pipeline
+            else:
+                pipeline = trained_model.pipeline
+
+            prediction_time = trained_model.training_results.prediction_time
+
+            # Get test data for this prediction time
+            X_test, y_test = get_snapshots_at_prediction_time(
+                df=test_visits,
+                prediction_time=prediction_time,
+                exclude_columns=exclude_from_training_data,
+                single_snapshot_per_visit=False,
+            )
 
-        row = i // num_cols
-        col = i % num_cols
-        plot_madcap_subplot(predict_proba, y_test, prediction_time, axes[row, col])
+            X_test = add_missing_columns(pipeline, X_test)
+            predict_proba = pipeline.predict_proba(X_test)[:, 1]
 
-    # Hide any unused subplots
-    for j in range(i + 1, num_rows * num_cols):
-        row = j // num_cols
-        col = j % num_cols
-        axes[row, col].axis("off")
+            row = i // num_cols
+            col = i % num_cols
+            plot_madcap_subplot(predict_proba, y_test, prediction_time, axes[row, col])
+
+        # Hide any unused subplots
+        for j in range(i + 1, num_rows * num_cols):
+            row = j // num_cols
+            col = j % num_cols
+            axes[row, col].axis("off")
 
     plt.tight_layout()
 
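
Note on the hunk above: the single-plot branch is needed because `plt.subplots` returns a bare `Axes` object for a 1x1 grid, and that object has no `reshape` method. A possible alternative, sketched here and not what this commit does, is to pass `squeeze=False`, which makes `plt.subplots` return a 2D array for every grid shape so that one loop can serve both cases. The helper name `make_axes_grid` and the grid-layout arithmetic are assumptions for illustration; how `num_rows` and `num_cols` are actually computed in `generate_madcap_plots` is not shown in this diff.

```python
import math

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def make_axes_grid(num_plots, max_cols=5):
    """Hypothetical helper: create a grid whose axes are always a 2D array."""
    if num_plots < 1:
        raise ValueError("num_plots must be at least 1")
    num_cols = min(max_cols, num_plots)
    num_rows = math.ceil(num_plots / num_cols)
    fig, axes = plt.subplots(
        num_rows,
        num_cols,
        figsize=(num_cols * 5, num_rows * 4),
        squeeze=False,  # keep axes 2D even when the grid is 1x1 or a single row
    )
    # Hide any unused subplots in the last row.
    for j in range(num_plots, num_rows * num_cols):
        row, col = divmod(j, num_cols)
        axes[row, col].axis("off")
    return fig, axes


# A single plot is indexed exactly like a multi-plot grid.
fig, axes = make_axes_grid(1)
axes[0, 0].plot([0, 1], [0, 1])
plt.close(fig)
```

With this approach the per-model plotting code would not need to be duplicated across the two branches.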
@@ -175,7 +205,6 @@ def generate_madcap_plots(
     plt.show()
     plt.close(fig)
 
-
 def plot_madcap_subplot(predict_proba, label, _prediction_time, ax):
     """
     Plots a single MADCAP subplot showing cumulative predicted and observed admissions.