Merged
32 commits
04bab69
Notebook for minimal dataset
zmek Jan 30, 2025
bc1d739
Experiment with minimal dataset
zmek Feb 4, 2025
227586a
Minimal dataset plots
zmek Feb 4, 2025
4952e61
Notebook for minimal dataset
zmek Jan 30, 2025
ac73c6e
Experiment with minimal dataset
zmek Feb 4, 2025
f2c1c0d
Minimal dataset plots
zmek Feb 4, 2025
c50e52c
Update notebook text and show minimal model
zmek Mar 6, 2025
e3363aa
Keep local version of admission prediction notebook
zmek Mar 6, 2025
f9110e7
Ruff reformat
zmek Mar 6, 2025
4a963f3
Support prototype emergency-bed-predictor app
zmek Mar 6, 2025
e541f13
Apply probability calibration to the best model
zmek Mar 10, 2025
c0ffa5e
experiment with undersampling
zmek Mar 10, 2025
79e8bea
test undersampling
zmek Mar 11, 2025
efee126
madcap plots for admssion predictor
zmek Mar 13, 2025
9be8616
Return calibrated pipeline to models dict
zmek Mar 13, 2025
49cd698
Refine minimal demand modelling
zmek Mar 17, 2025
10bc03f
Update evaluation notebook
zmek Mar 17, 2025
76dfcd5
save special params as object in spec model
zmek Mar 19, 2025
e7f80a8
save special category objects in pickle-able form
zmek Mar 19, 2025
7e80ba9
update tests for new special params
zmek Mar 20, 2025
13a8449
move all model preprocessing to train_single_model()
zmek Mar 20, 2025
b4b6e24
use balanced samples by default
zmek Mar 20, 2025
8386cb5
save metadata within model objects
zmek Mar 20, 2025
0878ffe
work in progress
zmek Mar 20, 2025
ece0d93
Enforce model types in create_predictions
zmek Mar 21, 2025
c51faa2
Make model names generic
zmek Mar 21, 2025
80e3931
Refactor train.emergency_demand
zmek Mar 21, 2025
13d1d97
make epsilon optional argument
zmek Mar 21, 2025
c30ebec
restructure patientflow modules
zmek Mar 24, 2025
36cf61d
Improve training results data classes
zmek Mar 24, 2025
56f6d8d
Fix conflict with main
zmek Mar 24, 2025
dbf9573
ruff reformat
zmek Mar 24, 2025
README.md: 29 changes (13 additions, 16 deletions)
@@ -1,4 +1,4 @@
-# PatientFlow: Code and explanatory notebooks for predicting short-term hospital bed capacity using real-time data
+# PatientFlow: Predicting demand for hospital beds using real-time data

 [![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
 [![Tests status][tests-badge]][tests-link]
@@ -29,28 +29,26 @@
 [pypi-version]: https://img.shields.io/pypi/v/patientflow -->
 <!-- prettier-ignore-end -->

-Welcome to the PatientFlow repository, which is designed to support hospital bed management through predictive modelling. I'm [Zella King](https://github.com/zmek/), a health data scientist in the Clinical Operational Research Unit (CORU) at University College London. Since 2020, I have worked with University College London Hospital (UCLH) NHS Trust on practical tools to improve patient flow through the hospital.
+Welcome to the PatientFlow repository, which provides predictive modelling for hospital bed management. I'm [Zella King](https://github.com/zmek/), a health data scientist in the Clinical Operational Research Unit (CORU) at University College London. Since 2020, I have worked with University College London Hospital (UCLH) on practical tools to improve patient flow through the hospital.

-My most important contribution is a software application that [Jon Gillham](https://github.com/jongillham) and I developed, which is now in daily use by bed managers at the hospital. That application generates predictions of emergency demand for beds, using real-time data from the hospital's patient record system, and sends the predictions to the bed managers. I created the predictive models that are used in the application. Jon created the software that runs my modelling code five times a day, and sends the predictions by email to the bed managers.
+With a team from UCLH, I developed a predictive tool that is now in daily use by bed managers at the hospital. The tool generates predictions of emergency demand for beds, using real-time data from the hospital's patient record system.

-I developed the code I wrote for UCLH into a reusable resource following the principles of [Reproducible Analytical Pipelines](https://analysisfunction.civilservice.gov.uk/support/reproducible-analytical-pipelines/). I did this because I want to:
+I am sharing the code I wrote for UCLH as a reusable resource because I want to make it easier for researchers to convert patient-level predictions into output that is useful for bed managers in hospitals. This repository includes a Python package, called patientflow, that does this. If you have a predictive model of some outcome for a patient, such as admission to or discharge from hospital, you can use patientflow to create bed count distributions for a cohort of patients.

-1. Share the code with researchers and NHS analysts who are work on similar models
-2. Make it easier for others to make use of the mathemetics involved in making these predictions
-3. Inform and educate anyone who wishes to adopt a similar approach
+The methods generalise to any problem where it is useful to convert patient-level predictions into outcomes for a whole cohort of patients at a point in time. The repository includes a synthetic dataset and a series of notebooks demonstrating the use of the package.

 ## Main features of my modelling approach

 - **Led by what users need:** My work is the result of close collaboration with operations directors and bed managers in the Coordination Centre, University College London Hospital (UCLH), since 2020. What is modelled directly reflects how they work and what is most useful to them.
-- **Focused on short-term predictions:** I am expert in predicting demand within a short time horizon eg 8 or 12 hours. Here I show models that predict how many beds will be needed emergency patients. (Later I plan to add modules that also predict elective demand, discharge and transfers between specialties.)
-- **Assumes real-time data is available:** Hospital bed managers have to deal with rapidly changing situations. My focus is on the use of real-time data (or near to real-time) to help them make informed decisions. The modelling shown here assumes that a hospital has some capacity to make use of real-time data in its electronic health record, even if this data is minimal.
+- **Focused on short-term predictions:** The modelling is designed for predicting demand within a short time horizon, e.g. 8 or 12 hours. I show how to use my code to predict how many beds will be needed for emergency patients. (Later I plan to add modules for elective demand, discharges and transfers between specialties.)
+- **Assumes real-time data is available:** Hospital bed managers have to deal with rapidly changing situations. My focus is on the use of real-time (or near real-time) data to help them make informed decisions.

 ## Main features of this repository

 - **Reproducible:** I follow the principles of [Reproducible Analytical Pipelines](https://analysisfunction.civilservice.gov.uk/support/reproducible-analytical-pipelines/). The repository can be installed as a Python package and imported into your own code.
 - **Accessible:** All the elements are based on simple techniques and methods in Health Data Science and Operational Research. I intend that anyone with some knowledge of Python can understand and adapt the code for their own use.
-- **Practical:** - I believe that it is easier to follow the steps I took if you have access to the same data I have. UCLH have released an anomymised version of real patient data, which you can request access to on [Zenodo](https://zenodo.org/records/14866057), or you can use the synthetic dataset, derived from real patient data, in the `data-synthetic` folder. (Note that, if you use the synthetic dataset, the integrity of relationships between variables is not maintained and you will obtain articifically inflated model performance.)
-- **Interactive:** The repository includes a set of notebooks with code written on Python, with commentary. If you clone the repo into your own workspace and have an environment for running Jupyter notebooks, you will be able to interact with the code and see it running.
+- **Practical:** I believe that it is easier to follow the steps I took if you have access to the same data I have. UCLH have released an anonymised version of real patient data, which you can request access to on [Zenodo](https://zenodo.org/records/14866057), or you can use the synthetic dataset, derived from real patient data, in the `data-synthetic` folder. (Note that, if you use the synthetic dataset, you will observe artificially inflated model performance.)
+- **Interactive:** The repository includes a set of notebooks with code written in Python, with commentary. If you clone the repo into your own workspace and have an environment for running Jupyter notebooks, you will be able to interact with the code and see it running.

 ## Getting started

@@ -85,7 +83,7 @@ If you get errors running the pytest command, there may be other installations n

 The data provided (which is synthetic) can be used to demonstrate training the models. To run training, you have two options:

-- step through the notebooks (for this to work you'll either need copy the two csv files from `data-synthetic`into your `data-public` folder or contact us for real patient data)
+- step through the notebooks (for this to work you'll need either to copy the two CSV files from `data-synthetic` into your `data-public` folder or to request access to real patient data on [Zenodo](https://zenodo.org/records/14866057))
 - run a Python script using the following commands (by default this will run with the synthetic data in its current location; you can change the `data_folder_name` parameter if you have the real data in `data-public`)

 ```sh
@@ -104,13 +102,12 @@ The data_folder_name specifies the name of the folder containing data. The funct

 ## About

-This project was inspired by the [py-pi template](https://github.com/health-data-science-OR/pypi-template) developed by Tom Monks, and is developed in collaboration with the
-[Centre for Advanced Research Computing](https://ucl.ac.uk/arc), University
-College London.
+This project was inspired by the [py-pi template](https://github.com/health-data-science-OR/pypi-template) developed by Tom Monks, and is based on a template developed by the
+[Centre for Advanced Research Computing](https://ucl.ac.uk/arc), University College London.

 ### Project Team

-Dr Zella King, Clinical Operational Research Unit (CORU), UCL ([zella.king@ucl.ac.uk](mailto:zella.king@ucl.ac.uk))
+Dr Zella King, Clinical Operational Research Unit (CORU), University College London ([zella.king@ucl.ac.uk](mailto:zella.king@ucl.ac.uk))
 Jon Gillham, Institute of Health Informatics, UCL
 Professor Sonya Crowe, CORU
 Professor Martin Utley, CORU
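The new README text describes patientflow as converting patient-level predictions, such as each patient's probability of admission, into bed count distributions for a cohort. The sketch below illustrates that idea under one stated assumption: each patient's outcome is an independent yes/no event with a known probability, so the total number of beds needed follows a Poisson binomial distribution, built here by adding one patient at a time with a convolution. The helper `bed_count_distribution` is hypothetical and is not the patientflow API.

```python
import numpy as np


def bed_count_distribution(probs):
    """Return P(N = k) for k = 0..len(probs), where N is the number of beds needed.

    Hypothetical helper, not part of the patientflow package. Assumes each
    patient is admitted independently with the given probability.
    """
    dist = np.array([1.0])  # with no patients, P(N = 0) = 1
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability {p} is outside [0, 1]")
        # Adding one patient: with probability 1 - p the count is unchanged,
        # with probability p it increases by one.
        dist = np.convolve(dist, [1.0 - p, p])
    return dist


# Three patients with illustrative predicted admission probabilities
probs = [0.2, 0.5, 0.9]
dist = bed_count_distribution(probs)
for k, pk in enumerate(dist):
    print(f"P(N = {k}) = {pk:.3f}")
# P(N = 0) = 0.040
# P(N = 1) = 0.410
# P(N = 2) = 0.460
# P(N = 3) = 0.090
```

Summing `dist[k:]` gives the probability that at least `k` beds are needed (0.55 for `k = 2` in this example). If patients' outcomes are not independent, this simple convolution no longer gives the correct distribution.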