29 changes: 13 additions & 16 deletions README.md
@@ -75,7 +75,7 @@ This snapshot-based approach to predicting demand generalises to other aspects o
## Mathematical assumptions underlying the conversion from individual to group predictions:

- Independence of patient requirements: The package assumes that individual patient requirements (eg for admission) are conditionally independent.
- Bernoulli outcome model: Each patient outcome is modeled as a Bernoulli trial with its own probability, and the package computes a probability distribution for the sum of these independent trials.
- Bernoulli outcome model: Each patient outcome is modelled as a Bernoulli trial with its own probability, and the package computes a probability distribution for the sum of these independent trials.
- Different levels of aggregation: The package can calculate probability distributions for compound scenarios (such as the probability of a patient being admitted, assigned to a specific specialty if admitted, and being admitted within the prediction window) and for patient subgroups (like distributions by age or gender). In all cases, the independence assumption between patients is maintained.
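
The assumptions above amount to computing the distribution of a sum of independent Bernoulli trials with unequal probabilities (a Poisson binomial distribution). The following is a minimal sketch of that calculation, not the package's own implementation; the function name `sum_of_bernoullis_pmf` and the example probabilities are hypothetical and chosen only for illustration.

```python
def sum_of_bernoullis_pmf(probabilities):
    """Return [P(N = 0), ..., P(N = n)] where N is the number of successes
    among n independent Bernoulli trials with the given probabilities.

    Hypothetical helper for illustration; not part of patientflow.
    """
    pmf = [1.0]  # with no patients, the count is 0 with certainty
    for p in probabilities:
        # Adding one patient: the count either stays at k (prob 1 - p)
        # or moves to k + 1 (prob p), independently of the other patients.
        new_pmf = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new_pmf[k] += mass * (1.0 - p)
            new_pmf[k + 1] += mass * p
        pmf = new_pmf
    return pmf


# Illustrative admission probabilities for three patients (made-up values)
admission_probs = [0.1, 0.5, 0.9]
pmf = sum_of_bernoullis_pmf(admission_probs)

# The expected number of admissions equals the sum of the probabilities
expected_admissions = sum(k * mass for k, mass in enumerate(pmf))
```

Building the distribution one patient at a time keeps the cost quadratic in the number of patients, and the same routine could be applied to any subgroup by passing only that subgroup's probabilities.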

## Getting started
@@ -107,19 +107,25 @@ pytest

If you get errors running the pytest command, there may be other installations needed on your local machine.

### Training models with data provided
### Using the notebooks in this repository

The data provided (which is synthetic) can be used to demonstrate training the models. To run training you have two options
The notebooks in this repository demonstrate the use of some of the functions provided in `patientflow`. The cell output shows the results of running the notebooks. If you want to run them yourself, you have two options:

- step through the notebooks (for this to work you'll either need copy the two csv files from `data-synthetic`into your `data-public` folder or request access on [Zenodo](https://zenodo.org/records/14866057) to real patient data
- run a Python script using following commands (by default this will run with the synthetic data in its current location; you can change the `data_folder_name` parameter if you have the real data in `data-public`)
- step through the notebooks using the real patient datasets that were used to prepare them. For this, you need to request access to the real patient data on [Zenodo](https://zenodo.org/records/14866057).
- step through the notebooks using synthetic data. You will need to copy the two csv files from `data-synthetic` into your `data-public` folder, or change the data source in each notebook. If you use synthetic data, you will not see the same cell output.
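
For the synthetic-data option, the copy step can be scripted. This is a minimal sketch that assumes it is called with the path to the repository root and that `data-synthetic` sits directly below it; the helper name `copy_synthetic_csvs` is hypothetical and is not provided by `patientflow`.

```python
import shutil
from pathlib import Path


def copy_synthetic_csvs(repo_root, source="data-synthetic", target="data-public"):
    """Copy every csv file from the source folder into the target folder.

    Hypothetical helper for illustration. Returns the names of the copied files.
    """
    source_dir = Path(repo_root) / source
    target_dir = Path(repo_root) / target
    target_dir.mkdir(parents=True, exist_ok=True)  # create data-public if missing

    copied = []
    for csv_path in sorted(source_dir.glob("*.csv")):
        shutil.copy2(csv_path, target_dir / csv_path.name)
        copied.append(csv_path.name)
    return copied


# Example call, run from the root of the repository:
# copy_synthetic_csvs(".")
```

Files that already exist in the target folder with the same name are overwritten, so check the contents of `data-public` first if you have also downloaded the real data.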

## About the UCLH implementation

This repository includes a set of notebooks (prefixed with 4) that show a fully worked example of the implementation of the patientflow package at University College London Hospitals (UCLH). As noted above, please request access to the UCLH dataset via Zenodo.

There is also a Python script that illustrates the training of the models that predict emergency demand at UCLH and saves them in your local environment. Run it using the following commands (by default this will run with the synthetic data in its current location; change the `data_folder_name` parameter if you have downloaded the Zenodo dataset into `data-public`):

```sh
cd src
python -m patientflow.train.emergency_demand --data_folder_name=data-synthetic
```

The data_folder_name specifies the name of the folder containing data. The function expects this folder to be directly below the root of the repository
The `data_folder_name` argument specifies the name of the folder containing data. The function expects this folder to be directly below the root of the repository.

## Roadmap

@@ -128,10 +134,6 @@ The data_folder_name specifies the name of the folder containing data. The funct
- [ ] Alpha Release
- [ ] Feature-Complete Release

## About

This idea to create a Python package was inspired by , and

### Project Team

- [Dr Zella King](https://github.com/zmek), Clinical Operational Research Unit (CORU), University College London ([zella.king@ucl.ac.uk](mailto:zella.king@ucl.ac.uk))
@@ -143,9 +145,4 @@ This idea to create a Python package was inspired by , and

The [py-pi template](https://github.com/health-data-science-OR/pypi-template) developed by [Tom Monks](https://github.com/TomMonks) inspired us to create a Python package. This repository is based on a template developed by the [Centre for Advanced Research Computing](https://ucl.ac.uk/arc), University College London. We are grateful to [Lawrence Lai](https://github.com/lawrencelai) for creation of the synthetic dataset. MAPS QR Policy Funding from University College London contributed to the construction of the repository.

The underlying academic work was funded by grants from

- the Wellcome Institutional Strategic Support Fund (ISSF) UCL and Partner Hospitals: AI in Healthcare Funding Call 2019 (award number BRC717/HI/RW/101440),
- the National Institute for Health Research UCLH Biomedical Research Centre HIGODS Theme (award number BRC824/HG/ZK/110420)
- the National Institute for Health Research (Artificial Intelligence, Digitally adapted, hyper-local realtime bed forecasting to manage flow for NHS wards, AI_AWARD01786) and NHSX
- University College London Hospitals NHS Trust (Zetetic Benefits-Enhancing Data Science)
The development of this repository/package was funded by UCL's QR Policy Support Fund, which is funded by [Research England](https://www.ukri.org/councils/research-england/).
2 changes: 0 additions & 2 deletions data-dictionaries/ed_visits_data_dictionary.csv
@@ -2,8 +2,6 @@ Variable type,Column Name,Data Type,Description,Whole dataset,Admitted,Not admit
not used in training,snapshot_date,object,"Date of visit, shifted by a random number of days",Date Range: 2031-03-01 - 2031-12-31,Date Range: 2031-03-01 - 2031-12-31,Date Range: 2031-03-01 - 2031-12-31
not used in training,prediction_time,object,The time of day at which the visit was observed,"Frequencies: {(12, 0): 19,085, (15, 30): 22,327, (22, 0): 18,827, (6, 0): 8,152, (9, 30): 11,411}","Frequencies: {(12, 0): 2,494, (15, 30): 3,731, (22, 0): 3,385, (6, 0): 1,526, (9, 30): 1,578}","Frequencies: {(12, 0): 16,591, (15, 30): 18,596, (22, 0): 15,442, (6, 0): 6,626, (9, 30): 9,833}"
not used in training,visit_number,float64,"Hospital visit number (replaced with a fictional number, but consistency across visit snapshots is retained)",,,
not used in training,training_validation_test,object,"Whether visit snapshot is assigned to training, validation or test set","Frequencies: {test: 19,674, train: 53,651, valid: 6,477}","Frequencies: {test: 3,654, train: 7,959, valid: 1,101}","Frequencies: {test: 16,020, train: 45,692, valid: 5,376}"
not used in training,random_number,int64,A random number that will be used during model training to sample one visit snapshot per visit,"Range: 1 - 79801, Mean: 39708.82, Std Dev: 23067.15, NA: 0","Range: 1 - 79801, Mean: 39965.89, Std Dev: 23136.06, NA: 0","Range: 2 - 79801, Mean: 39660.10, Std Dev: 23053.91, NA: 0"
arrival and demographic,elapsed_los,float64,Elapsed time since patient arrived in ED (seconds),"Range: 0.00 - 230162.00, Mean: 13092.83, Std Dev: 15174.84, NA: 0","Range: 3.00 - 225135.00, Mean: 16434.85, Std Dev: 15723.98, NA: 0","Range: 0.00 - 230162.00, Mean: 12459.48, Std Dev: 14984.86, NA: 0"
arrival and demographic,sex,object,Sex of patient,"Frequencies: {F: 42,088, M: 37,714}","Frequencies: {F: 6,487, M: 6,227}","Frequencies: {F: 35,601, M: 31,487}"
arrival and demographic,age_group,category,Age group,"Frequencies: {0-17: 9,208, 18-24: 10,619, 25-34: 14,907, 35-44: 11,266, 45-54: 9,750, 55-64: 9,622, 65-74: 7,059, 75-102: 7,328, nan: 43}","Frequencies: {0-17: 1,048, 18-24: 883, 25-34: 1,506, 35-44: 1,383, 45-54: 1,396, 55-64: 1,894, 65-74: 1,835, 75-102: 2,762, nan: 7}","Frequencies: {0-17: 8,160, 18-24: 9,736, 25-34: 13,401, 35-44: 9,883, 45-54: 8,354, 55-64: 7,728, 65-74: 5,224, 75-102: 4,566, nan: 36}"
2 changes: 0 additions & 2 deletions data-dictionaries/ed_visits_data_dictionary.md
@@ -3,8 +3,6 @@
| not used in training | snapshot_date | object | Date of visit, shifted by a random number of days | Date Range: 2031-03-01 - 2031-12-31 | Date Range: 2031-03-01 - 2031-12-31 | Date Range: 2031-03-01 - 2031-12-31 |
| not used in training | prediction_time | object | The time of day at which the visit was observed | Frequencies: {(12, 0): 19,085, (15, 30): 22,327, (22, 0): 18,827, (6, 0): 8,152, (9, 30): 11,411} | Frequencies: {(12, 0): 2,494, (15, 30): 3,731, (22, 0): 3,385, (6, 0): 1,526, (9, 30): 1,578} | Frequencies: {(12, 0): 16,591, (15, 30): 18,596, (22, 0): 15,442, (6, 0): 6,626, (9, 30): 9,833} |
| not used in training | visit_number | float64 | Hospital visit number (replaced with a fictional number, but consistency across visit snapshots is retained) | | | |
| not used in training | training_validation_test | object | Whether visit snapshot is assigned to training, validation or test set | Frequencies: {test: 19,674, train: 53,651, valid: 6,477} | Frequencies: {test: 3,654, train: 7,959, valid: 1,101} | Frequencies: {test: 16,020, train: 45,692, valid: 5,376} |
| not used in training | random_number | int64 | A random number that will be used during model training to sample one visit snapshot per visit | Range: 1 - 79801, Mean: 39708.82, Std Dev: 23067.15, NA: 0 | Range: 1 - 79801, Mean: 39965.89, Std Dev: 23136.06, NA: 0 | Range: 2 - 79801, Mean: 39660.10, Std Dev: 23053.91, NA: 0 |
| arrival and demographic | elapsed_los | float64 | Elapsed time since patient arrived in ED (seconds) | Range: 0.00 - 230162.00, Mean: 13092.83, Std Dev: 15174.84, NA: 0 | Range: 3.00 - 225135.00, Mean: 16434.85, Std Dev: 15723.98, NA: 0 | Range: 0.00 - 230162.00, Mean: 12459.48, Std Dev: 14984.86, NA: 0 |
| arrival and demographic | sex | object | Sex of patient | Frequencies: {F: 42,088, M: 37,714} | Frequencies: {F: 6,487, M: 6,227} | Frequencies: {F: 35,601, M: 31,487} |
| arrival and demographic | age_group | category | Age group | Frequencies: {0-17: 9,208, 18-24: 10,619, 25-34: 14,907, 35-44: 11,266, 45-54: 9,750, 55-64: 9,622, 65-74: 7,059, 75-102: 7,328, nan: 43} | Frequencies: {0-17: 1,048, 18-24: 883, 25-34: 1,506, 35-44: 1,383, 45-54: 1,396, 55-64: 1,894, 65-74: 1,835, 75-102: 2,762, nan: 7} | Frequencies: {0-17: 8,160, 18-24: 9,736, 25-34: 13,401, 35-44: 9,883, 45-54: 8,354, 55-64: 7,728, 65-74: 5,224, 75-102: 4,566, nan: 36} |
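
The `random_number` row above is described as being used to sample one visit snapshot per visit. A minimal sketch of one way such sampling could work is shown here, assuming the snapshot with the smallest random number is kept for each `visit_number`; the selection rule, the helper name `sample_one_snapshot_per_visit`, and the example rows are assumptions for illustration, not the package's confirmed behaviour.

```python
def sample_one_snapshot_per_visit(snapshots):
    """Keep one snapshot per visit: the one with the smallest random_number.

    Hypothetical helper for illustration. `snapshots` is a list of dicts
    with at least the keys 'visit_number' and 'random_number'.
    """
    chosen = {}
    for row in snapshots:
        visit = row["visit_number"]
        # Replace the stored snapshot only if this one has a smaller number
        if visit not in chosen or row["random_number"] < chosen[visit]["random_number"]:
            chosen[visit] = row
    # Return the kept snapshots ordered by visit number
    return [chosen[visit] for visit in sorted(chosen)]


# Made-up snapshots: three visits observed at several prediction times
snapshots = [
    {"visit_number": 1, "prediction_time": (6, 0), "random_number": 5},
    {"visit_number": 1, "prediction_time": (9, 30), "random_number": 2},
    {"visit_number": 2, "prediction_time": (9, 30), "random_number": 9},
    {"visit_number": 2, "prediction_time": (12, 0), "random_number": 4},
    {"visit_number": 2, "prediction_time": (15, 30), "random_number": 7},
    {"visit_number": 3, "prediction_time": (22, 0), "random_number": 1},
]
one_per_visit = sample_one_snapshot_per_visit(snapshots)
```

Because the random numbers are fixed in the dataset, a rule like this gives the same sample on every run, which keeps training reproducible without reseeding a random generator.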