This module contains benchmarks used to test the performance of the RunInference transform
running inference with common models and frameworks. Each benchmark is explained in detail
below. Beam's performance over time can be viewed at https://beam.apache.org/performance/.

All the performance tests are defined at [beam_Inference_Python_Benchmarks_Dataflow.yml](https://github.com/apache/beam/blob/master/.github/workflows/beam_Inference_Python_Benchmarks_Dataflow.yml).

## Pytorch RunInference Image Classification 50K

The Pytorch RunInference Image Classification 50K benchmark runs an
[example image classification pipeline](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_image_classification.py)
using various ResNet image classification models (the benchmarks on
[Beam's dashboard](https://metrics.beam.apache.org/d/ZpS8Uf44z/python-ml-runinference-benchmarks?orgId=1)
display [resnet101](https://pytorch.org/vision/main/models/generated/torchvision.models.resnet101.html) and [resnet152](https://pytorch.org/vision/stable/models/generated/torchvision.models.resnet152.html))
against 50,000 example images from the OpenImage dataset. The benchmarks produce
the following metrics:

...

Approximate size of the models used in the tests

* bert-base-uncased: 417.7 MB
* bert-large-uncased: 1.2 GB

## PyTorch Sentiment Analysis DistilBERT base

**Model**: PyTorch Sentiment Analysis - DistilBERT (base-uncased)
**Accelerator**: CPU only
**Host**: 20 × n1-standard-2 (2 vCPUs, 7.5 GB RAM)

Full pipeline implementation is available [here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_sentiment_streaming.py).

## VLLM Gemma 2b Batch Performance on Tesla T4

**Model**: google/gemma-2b-it
**Accelerator**: NVIDIA Tesla T4 GPU
**Host**: 3 × n1-standard-8 (8 vCPUs, 30 GB RAM)

Full pipeline implementation is available [here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/vllm_gemma_batch.py).

## How to add a new ML benchmark pipeline

1. Create the pipeline implementation

- Location: sdks/python/apache_beam/examples/inference (e.g., pytorch_sentiment.py)
- Define the CLI arguments and the pipeline logic (see the sketch below).
- Keep parameter names consistent (e.g., --bq_project, --bq_dataset, --metrics_table).

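The sketch below shows one possible shape for such a module. It is not the actual benchmark pipeline: the model handler choice (`HuggingFacePipelineModelHandler`), the non-metrics flag names, and the I/O are illustrative assumptions; use the linked sentiment and Gemma pipelines above as the real references.

```python
# pytorch_sentiment.py -- illustrative sketch only, not the real benchmark pipeline.
import argparse
import logging

import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.huggingface_inference import HuggingFacePipelineModelHandler
from apache_beam.options.pipeline_options import PipelineOptions


def parse_known_args(argv):
  parser = argparse.ArgumentParser()
  parser.add_argument('--input_file', required=True, help='Text file with one example per line.')
  parser.add_argument('--output', required=True, help='Path prefix for the predictions output.')
  # Keep the metrics flag names consistent with the other benchmarks.
  parser.add_argument('--bq_project', default=None)
  parser.add_argument('--bq_dataset', default=None)
  parser.add_argument('--metrics_table', default=None)
  return parser.parse_known_args(argv)


def run(argv=None, test_pipeline=None):
  known_args, pipeline_args = parse_known_args(argv)

  # Assumed model handler; pick the handler that matches your framework and model.
  model_handler = HuggingFacePipelineModelHandler(task='sentiment-analysis')

  pipeline = test_pipeline or beam.Pipeline(options=PipelineOptions(pipeline_args))
  _ = (
      pipeline
      | 'ReadExamples' >> beam.io.ReadFromText(known_args.input_file)
      | 'RunInference' >> RunInference(model_handler)
      | 'ExtractPrediction' >> beam.Map(lambda result: result.inference)
      | 'WritePredictions' >> beam.io.WriteToText(known_args.output))
  result = pipeline.run()
  result.wait_until_finish()
  return result


if __name__ == '__main__':
  logging.basicConfig(level=logging.INFO)
  run()
```
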
2. Create the benchmark implementation

- Location: sdks/python/apache_beam/testing/benchmarks/inference (e.g., pytorch_sentiment_benchmarks.py)
- Inherit from the `DataflowCostBenchmark` class (see the sketch below).
- Ensure the `pcollection` parameter is passed to the `DataflowCostBenchmark` constructor. This is the name of the PCollection whose throughput is measured; you can find this name in the Dataflow UI job graph.
- Keep naming consistent with other benchmarks.

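A rough sketch of the benchmark wrapper is shown below. The constructor keyword arguments (`metrics_namespace`, `pcollection`), the module paths, and the transform name are assumptions; copy the exact pattern from an existing benchmark in this directory, such as the PyTorch image classification benchmark.

```python
# pytorch_sentiment_benchmarks.py -- illustrative sketch; mirror an existing benchmark for the exact pattern.
import logging

from apache_beam.examples.inference import pytorch_sentiment  # hypothetical pipeline module from step 1
from apache_beam.testing.load_tests.dataflow_cost_benchmark import DataflowCostBenchmark


class PytorchSentimentBenchmarkTest(DataflowCostBenchmark):
  def __init__(self):
    # 'pcollection' is the name of the PCollection whose throughput is measured;
    # copy the name shown in the Dataflow UI job graph (the value below is an assumption).
    super().__init__(
        metrics_namespace='pytorch_sentiment_benchmark',
        pcollection='RunInference.out0')

  def test(self):
    # Run the example pipeline with the options supplied via the options txt file.
    self.result = pytorch_sentiment.run(
        self.pipeline.get_full_options_as_args(), test_pipeline=self.pipeline)


if __name__ == '__main__':
  logging.basicConfig(level=logging.INFO)
  PytorchSentimentBenchmarkTest().run()
```
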
3. Add an options txt file

- Location: .github/workflows/load-tests-pipeline-options/<pipeline_name>.txt
- Include Dataflow and pipeline flags. Example:

```
--region=us-central1
--machine_type=n1-standard-2
--num_workers=75
--disk_size_gb=50
--autoscaling_algorithm=NONE
--staging_location=gs://temp-storage-for-perf-tests/loadtests
--temp_location=gs://temp-storage-for-perf-tests/loadtests
--requirements_file=apache_beam/ml/inference/your-requirements-file.txt
--publish_to_big_query=true
--metrics_dataset=beam_run_inference
--metrics_table=your_table
--influx_measurement=your-measurement
--device=CPU
--runner=DataflowRunner
```

4. Wire it into the GitHub Action

- Workflow: .github/workflows/beam_Inference_Python_Benchmarks_Dataflow.yml
- Add the path to your argument file to the matrix.
- Add a step that runs your <pipeline_name>_benchmarks.py with -PloadTest.args=$YOUR_ARGUMENTS, where $YOUR_ARGUMENTS are the arguments defined in the options txt file from the previous step.

5. Test on your fork

- Trigger the workflow manually.
- Confirm the Dataflow job completes successfully.

6. Verify metrics in BigQuery

- Dataset: beam_run_inference. Table: your_table.
- Confirm new rows for your pipeline with recent timestamps (see the sketch below).

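One quick way to check is a small query against the metrics table. The snippet below is a hypothetical sanity check: the GCP project name and the `timestamp` column are assumptions, so adjust them to the schema your pipeline actually writes.

```python
# Hypothetical sanity check; adjust the project, table, and column names to your setup.
from google.cloud import bigquery

client = bigquery.Client(project='apache-beam-testing')  # assumed GCP project
query = """
    SELECT *
    FROM `apache-beam-testing.beam_run_inference.your_table`
    ORDER BY timestamp DESC
    LIMIT 10
"""
for row in client.query(query).result():
  print(dict(row))
```
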
7. Update the website

- Create: website/www/site/content/en/performance/<pipeline_name>/_index.md (short title/description).
- Update: website/www/site/data/performance.yaml, adding your pipeline and five chart entries with:
  - looker_folder_id
  - public_slug_id (from Looker, see below)

8. Create Looker content (5 charts)

- In Looker → Shared folders → run_inference: create a subfolder for your pipeline.
- From an existing chart: Development mode → Explore from here → Go to LookML.
- Point to your table/view and create 5 standard charts (latency/throughput/cost/etc.).
- Save changes → Publish to production.
- From Explore, open each chart, set the fields/filters for your pipeline, Run, then Save as Look (in your folder).
- Open each Look:
  - Copy the Look ID.
  - Add the Look IDs to .test-infra/tools/refresh_looker_metrics.py.
  - Exit Development mode → Edit Settings → Allow public access.
  - Copy the public_slug_id and paste it into website/www/site/data/performance.yaml.
  - Run the .test-infra/tools/refresh_looker_metrics.py script, or manually download the chart as a PNG via the public slug and upload it to GCS: gs://public_looker_explores_us_a3853f40/FOLDER_ID/<look_slug>.png

9. Open a PR

- Example: https://github.com/apache/beam/pull/34577