UCL-CORU · zmek · Mar 31, 2025 · Mar 27, 2025 · Mar 28, 2025 · Mar 29, 2025
@@ -75,7 +75,7 @@ This snapshot-based approach to predicting demand generalises to other aspects o
 ## Mathematical assumptions underlying the conversion from individual to group predictions:
 
 - Independence of patient requirements: The package assumes that individual patient requirements (eg for admission) are conditionally independent.
-- Bernoulli outcome model: Each patient outcome is modeled as a Bernoulli trial with its own probability, and the package computes a probability distribution for the sum of these independent trials.
+- Bernoulli outcome model: Each patient outcome is modelled as a Bernoulli trial with its own probability, and the package computes a probability distribution for the sum of these independent trials.
 - Different levels of aggregation: The package can calculate probability distributions for compound scenarios (such as the probability of a patient being admitted, assigned to a specific specialty if admitted, and being admitted within the prediction window) and for patient subgroups (like distributions by age or gender). In all cases, the independence assumption between patients is maintained.
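
As a rough illustration of the Bernoulli outcome model described above, the distribution of the total number of admissions can be built up by convolving one patient at a time. This is a minimal sketch assuming only NumPy; the function name `aggregate_bernoulli` and the example probabilities are hypothetical and are not part of the `patientflow` API.

```python
import numpy as np


def aggregate_bernoulli(probs):
    """Return P(N = k) for k = 0..n, where N is the number of positive
    outcomes among n independent Bernoulli trials with probabilities `probs`.

    Hypothetical helper for illustration only; not the package's own function.
    """
    dist = np.array([1.0])  # with no patients, P(N = 0) = 1
    for p in probs:
        # Adding one patient: either no admission (1 - p) or an admission (p).
        dist = np.convolve(dist, [1.0 - p, p])
    return dist


# Made-up admission probabilities for three patients in one snapshot.
probs = [0.9, 0.5, 0.2]
dist = aggregate_bernoulli(probs)

for k, prob in enumerate(dist):
    print(f"P({k} admissions) = {prob:.2f}")
```

The result relies on the independence assumption stated above: if patient outcomes were correlated, the convolution would no longer give the correct distribution, although the mean would still equal the sum of the individual probabilities.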
 
 ## Getting started
@@ -107,19 +107,25 @@ pytest
 
 If you get errors running the pytest command, there may be other installations needed on your local machine.
 
-### Training models with data provided
+### Using the notebooks in this repository
 
-The data provided (which is synthetic) can be used to demonstrate training the models. To run training you have two options
+The notebooks in this repository demonstrate the use of some of the functions provided in `patientflow`. The cell output shows the results of running the notebooks. If you want to run them yourself, you have two options:
 
-- step through the notebooks (for this to work you'll either need copy the two csv files from `data-synthetic`into your `data-public` folder or request access on [Zenodo](https://zenodo.org/records/14866057) to real patient data
-- run a Python script using following commands (by default this will run with the synthetic data in its current location; you can change the `data_folder_name` parameter if you have the real data in `data-public`)
+- step through the notebooks using the real patient datasets that were used to prepare them. For this you need to request access to the data on [Zenodo](https://zenodo.org/records/14866057)
+- step through the notebooks using synthetic data. You will need to copy the two CSV files from `data-synthetic` into your `data-public` folder or change the source in each notebook. If you use synthetic data, you will not see the same cell output.
+
+## About the UCLH implementation
+
+This repository includes a set of notebooks (prefixed with 4) that show a fully worked example of the implementation of the `patientflow` package at University College London Hospitals (UCLH). As noted above, please request access to the UCLH dataset via Zenodo.
+
+There is also a Python script that illustrates the training of the models that predict emergency demand at UCLH and saves them in your local environment. Run it with the following commands. By default this will run with the synthetic data in its current location; change the `data_folder_name` parameter if you have downloaded the Zenodo dataset into `data-public`:
 
 ```sh
 cd src
 python -m patientflow.train.emergency_demand --data_folder_name=data-synthetic
 ```
 
-The data_folder_name specifies the name of the folder containing data. The function expects this folder to be directly below the root of the repository
+The `data_folder_name` argument specifies the name of the folder containing data. The function expects this folder to be directly below the root of the repository.
 
 ## Roadmap
 
@@ -128,10 +134,6 @@ The data_folder_name specifies the name of the folder containing data. The funct
 - [ ] Alpha Release
 - [ ] Feature-Complete Release
 
-## About
-
-This idea to create a Python package was inspired by , and
-
 ### Project Team
 
 - [Dr Zella King](https://github.com/zmek), Clinical Operational Research Unit (CORU), University College London ([zella.king@ucl.ac.uk](mailto:zella.king@ucl.ac.uk))
@@ -143,9 +145,4 @@ This idea to create a Python package was inspired by , and
 
 The [py-pi template](https://github.com/health-data-science-OR/pypi-template) developed by [Tom Monks](https://github.com/TomMonks) inspired us to create a Python package. This repository is based on a template developed by the [Centre for Advanced Research Computing](https://ucl.ac.uk/arc), University College London. We are grateful to [Lawrence Lai](https://github.com/lawrencelai) for creation of the synthetic dataset. MAPS QR Policy Funding from University College London contributed to the construction of the repository.
 
-The underlying academic work was funded by grants from
-
-- the Wellcome Institutional Strategic Support Fund (ISSF) UCL and Partner Hospitals: AI in Healthcare Funding Call 2019 (award number BRC717/HI/RW/101440),
-- the National Institute for Health Research UCLH Biomedical Research Centre HIGODS Theme (award number BRC824/HG/ZK/110420)
-- the National Institute for Health Research (Artificial Intelligence, Digitally adapted, hyper-local realtime bed forecasting to manage flow for NHS wards, AI_AWARD01786) and NHSX
-- University College London Hospitals NHS Trust (Zetetic Benefits-Enhancing Data Science)
+The development of this repository/package was funded by UCL's QR Policy Support Fund, which is funded by [Research England](https://www.ukri.org/councils/research-england/).
@@ -2,8 +2,6 @@ Variable type,Column Name,Data Type,Description,Whole dataset,Admitted,Not admit
 not used in training,snapshot_date,object,"Date of visit, shifted by a random number of days",Date Range: 2031-03-01 - 2031-12-31,Date Range: 2031-03-01 - 2031-12-31,Date Range: 2031-03-01 - 2031-12-31
 not used in training,prediction_time,object,The time of day at which the visit was observed,"Frequencies: {(12, 0): 19,085, (15, 30): 22,327, (22, 0): 18,827, (6, 0): 8,152, (9, 30): 11,411}","Frequencies: {(12, 0): 2,494, (15, 30): 3,731, (22, 0): 3,385, (6, 0): 1,526, (9, 30): 1,578}","Frequencies: {(12, 0): 16,591, (15, 30): 18,596, (22, 0): 15,442, (6, 0): 6,626, (9, 30): 9,833}"
 not used in training,visit_number,float64,"Hospital visit number (replaced with fictional number, but consistent across visit snapshots is retained)",,,
-not used in training,training_validation_test,object,"Whether visit snapshot is assigned to training, validation or test set","Frequencies: {test: 19,674, train: 53,651, valid: 6,477}","Frequencies: {test: 3,654, train: 7,959, valid: 1,101}","Frequencies: {test: 16,020, train: 45,692, valid: 5,376}"
-not used in training,random_number,int64,A random number that will be used during model training to sample one visit snapshot per visit,"Range: 1 - 79801, Mean: 39708.82, Std Dev: 23067.15, NA: 0","Range: 1 - 79801, Mean: 39965.89, Std Dev: 23136.06, NA: 0","Range: 2 - 79801, Mean: 39660.10, Std Dev: 23053.91, NA: 0"
 arrival and demographic,elapsed_los,float64,Elapsed time since patient arrived in ED (seconds),"Range: 0.00 - 230162.00,  Mean: 13092.83, Std Dev: 15174.84, NA: 0","Range: 3.00 - 225135.00,  Mean: 16434.85, Std Dev: 15723.98, NA: 0","Range: 0.00 - 230162.00,  Mean: 12459.48, Std Dev: 14984.86, NA: 0"
 arrival and demographic,sex,object,Sex of patient,"Frequencies: {F: 42,088, M: 37,714}","Frequencies: {F: 6,487, M: 6,227}","Frequencies: {F: 35,601, M: 31,487}"
 arrival and demographic,age_group,category,Age group,"Frequencies: {0-17: 9,208, 18-24: 10,619, 25-34: 14,907, 35-44: 11,266, 45-54: 9,750, 55-64: 9,622, 65-74: 7,059, 75-102: 7,328, nan: 43}","Frequencies: {0-17: 1,048, 18-24: 883, 25-34: 1,506, 35-44: 1,383, 45-54: 1,396, 55-64: 1,894, 65-74: 1,835, 75-102: 2,762, nan: 7}","Frequencies: {0-17: 8,160, 18-24: 9,736, 25-34: 13,401, 35-44: 9,883, 45-54: 8,354, 55-64: 7,728, 65-74: 5,224, 75-102: 4,566, nan: 36}"

@@ -3,8 +3,6 @@
 | not used in training    | snapshot_date                                  | object    | Date of visit, shifted by a random number of days                                                         | Date Range: 2031-03-01 - 2031-12-31                                                                                                                                                                                                                                                                                                                    | Date Range: 2031-03-01 - 2031-12-31                                                                                                                                                                                                                                                                                                        | Date Range: 2031-03-01 - 2031-12-31                                                                                                                                                                                                                                                                                                                  |
 | not used in training    | prediction_time                                | object    | The time of day at which the visit was observed                                                           | Frequencies: {(12, 0): 19,085, (15, 30): 22,327, (22, 0): 18,827, (6, 0): 8,152, (9, 30): 11,411}                                                                                                                                                                                                                                                      | Frequencies: {(12, 0): 2,494, (15, 30): 3,731, (22, 0): 3,385, (6, 0): 1,526, (9, 30): 1,578}                                                                                                                                                                                                                                              | Frequencies: {(12, 0): 16,591, (15, 30): 18,596, (22, 0): 15,442, (6, 0): 6,626, (9, 30): 9,833}                                                                                                                                                                                                                                                     |
 | not used in training    | visit_number                                   | float64   | Hospital visit number (replaced with fictional number, but consistent across visit snapshots is retained) |                                                                                                                                                                                                                                                                                                                                                        |                                                                                                                                                                                                                                                                                                                                            |                                                                                                                                                                                                                                                                                                                                                      |
-| not used in training    | training_validation_test                       | object    | Whether visit snapshot is assigned to training, validation or test set                                    | Frequencies: {test: 19,674, train: 53,651, valid: 6,477}                                                                                                                                                                                                                                                                                               | Frequencies: {test: 3,654, train: 7,959, valid: 1,101}                                                                                                                                                                                                                                                                                     | Frequencies: {test: 16,020, train: 45,692, valid: 5,376}                                                                                                                                                                                                                                                                                             |
-| not used in training    | random_number                                  | int64     | A random number that will be used during model training to sample one visit snapshot per visit            | Range: 1 - 79801, Mean: 39708.82, Std Dev: 23067.15, NA: 0                                                                                                                                                                                                                                                                                             | Range: 1 - 79801, Mean: 39965.89, Std Dev: 23136.06, NA: 0                                                                                                                                                                                                                                                                                 | Range: 2 - 79801, Mean: 39660.10, Std Dev: 23053.91, NA: 0                                                                                                                                                                                                                                                                                           |
 | arrival and demographic | elapsed_los                                    | float64   | Elapsed time since patient arrived in ED (seconds)                                                        | Range: 0.00 - 230162.00, Mean: 13092.83, Std Dev: 15174.84, NA: 0                                                                                                                                                                                                                                                                                      | Range: 3.00 - 225135.00, Mean: 16434.85, Std Dev: 15723.98, NA: 0                                                                                                                                                                                                                                                                          | Range: 0.00 - 230162.00, Mean: 12459.48, Std Dev: 14984.86, NA: 0                                                                                                                                                                                                                                                                                    |
 | arrival and demographic | sex                                            | object    | Sex of patient                                                                                            | Frequencies: {F: 42,088, M: 37,714}                                                                                                                                                                                                                                                                                                                    | Frequencies: {F: 6,487, M: 6,227}                                                                                                                                                                                                                                                                                                          | Frequencies: {F: 35,601, M: 31,487}                                                                                                                                                                                                                                                                                                                  |
 | arrival and demographic | age_group                                      | category  | Age group                                                                                                 | Frequencies: {0-17: 9,208, 18-24: 10,619, 25-34: 14,907, 35-44: 11,266, 45-54: 9,750, 55-64: 9,622, 65-74: 7,059, 75-102: 7,328, nan: 43}                                                                                                                                                                                                              | Frequencies: {0-17: 1,048, 18-24: 883, 25-34: 1,506, 35-44: 1,383, 45-54: 1,396, 55-64: 1,894, 65-74: 1,835, 75-102: 2,762, nan: 7}                                                                                                                                                                                                        | Frequencies: {0-17: 8,160, 18-24: 9,736, 25-34: 13,401, 35-44: 9,883, 45-54: 8,354, 55-64: 7,728, 65-74: 5,224, 75-102: 4,566, nan: 36}                                                                                                                                                                                                              |