
Commit 702d73e

Merge pull request #36437 from apache/inference-benchmark-readme
Add readme How to add a new ML benchmark pipeline
2 parents e410e34 + 1b25848 commit 702d73e

1 file changed

Lines changed: 97 additions & 3 deletions

File tree

  • sdks/python/apache_beam/testing/benchmarks/inference

sdks/python/apache_beam/testing/benchmarks/inference/README.md

@@ -21,14 +21,16 @@

This module contains benchmarks used to test the performance of the RunInference transform
running inference with common models and frameworks. Each benchmark is explained in detail
below. Beam's performance over time can be viewed at https://beam.apache.org/performance/.

All the performance tests are defined at [beam_Inference_Python_Benchmarks_Dataflow.yml](https://github.com/apache/beam/blob/master/.github/workflows/beam_Inference_Python_Benchmarks_Dataflow.yml).

## Pytorch RunInference Image Classification 50K

The Pytorch RunInference Image Classification 50K benchmark runs an
[example image classification pipeline](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_image_classification.py)
using several resnet image classification models (the benchmarks on
[Beam's dashboard](https://metrics.beam.apache.org/d/ZpS8Uf44z/python-ml-runinference-benchmarks?orgId=1)
display [resnet101](https://pytorch.org/vision/main/models/generated/torchvision.models.resnet101.html) and [resnet152](https://pytorch.org/vision/stable/models/generated/torchvision.models.resnet152.html))
against 50,000 example images from the OpenImage dataset. The benchmarks produce
the following metrics:

@@ -100,4 +102,96 @@ Approximate size of the models used in the tests

* bert-base-uncased: 417.7 MB
* bert-large-uncased: 1.2 GB

## PyTorch Sentiment Analysis DistilBERT base

**Model**: PyTorch Sentiment Analysis - DistilBERT (base-uncased)
**Accelerator**: CPU only
**Host**: 20 × n1-standard-2 (2 vCPUs, 7.5 GB RAM)

Full pipeline implementation is available [here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_sentiment_streaming.py).

## VLLM Gemma 2b Batch Performance on Tesla T4

**Model**: google/gemma-2b-it
**Accelerator**: NVIDIA Tesla T4 GPU
**Host**: 3 × n1-standard-8 (8 vCPUs, 30 GB RAM)

Full pipeline implementation is available [here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/vllm_gemma_batch.py).

## How to add a new ML benchmark pipeline

1. Create the pipeline implementation

   - Location: sdks/python/apache_beam/examples/inference (e.g., pytorch_sentiment.py)
   - Define the CLI args and the pipeline logic.
   - Keep parameter names consistent (e.g., --bq_project, --bq_dataset, --metrics_table).
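
   A minimal sketch of what the pipeline's CLI surface could look like is shown below. Only the --bq_project/--bq_dataset/--metrics_table/--device flag names come from this README; --input_file, the module layout, and the pipeline body are illustrative placeholders, not the real example pipeline:

   ```python
   # Hypothetical sketch of a new benchmark pipeline's CLI surface.
   import argparse

   import apache_beam as beam
   from apache_beam.options.pipeline_options import PipelineOptions


   def parse_known_args(argv):
     parser = argparse.ArgumentParser()
     parser.add_argument('--input_file', required=True, help='Input examples, one per line.')
     parser.add_argument('--bq_project', help='GCP project that owns the metrics dataset.')
     parser.add_argument('--bq_dataset', help='BigQuery dataset used to publish metrics.')
     parser.add_argument('--metrics_table', help='BigQuery table used to publish metrics.')
     parser.add_argument('--device', default='CPU', help='Device to run inference on.')
     return parser.parse_known_args(argv)


   def run(argv=None):
     known_args, pipeline_args = parse_known_args(argv)
     with beam.Pipeline(options=PipelineOptions(pipeline_args)) as p:
       _ = (
           p
           | 'ReadExamples' >> beam.io.ReadFromText(known_args.input_file)
           # Preprocess the examples and apply RunInference with your model
           # handler here.
       )


   if __name__ == '__main__':
     run()
   ```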

2. Create the benchmark implementation

   - Location: sdks/python/apache_beam/testing/benchmarks/inference (e.g., pytorch_sentiment_benchmarks.py)
   - Inherit from the DataflowCostBenchmark class.
   - Pass the `pcollection` parameter to the `DataflowCostBenchmark` constructor. This is the name of the PCollection whose throughput is measured; you can find the name in the Dataflow UI job graph.
   - Keep naming consistent with other benchmarks.
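
   A hypothetical skeleton of such a benchmark is sketched below; the import path of DataflowCostBenchmark, the LoadTest-style test()/run() entry points, and the pcollection string are assumptions, so mirror an existing file such as pytorch_image_classification_benchmarks.py for the authoritative shape:

   ```python
   # Hypothetical skeleton: only the requirement to pass 'pcollection' to the
   # DataflowCostBenchmark constructor comes from this README; the rest is
   # assumed from existing benchmarks.
   import logging

   from apache_beam.examples.inference import pytorch_sentiment  # your pipeline module
   from apache_beam.testing.load_tests.dataflow_cost_benchmark import DataflowCostBenchmark


   class PytorchSentimentBenchmarkTest(DataflowCostBenchmark):
     def __init__(self):
       # 'pcollection' names the PCollection whose throughput is measured;
       # copy the name (placeholder below) from the Dataflow UI job graph.
       super().__init__(pcollection='RunInference/BeamML_RunInference.out0')

     def test(self):
       # Run the example pipeline with the options supplied by the load-test
       # framework, i.e. the flags from the options txt file in step 3.
       pytorch_sentiment.run(self.pipeline.get_full_options_as_args())


   if __name__ == '__main__':
     logging.basicConfig(level=logging.INFO)
     PytorchSentimentBenchmarkTest().run()
   ```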

3. Add an options txt file

   - Location: .github/workflows/load-tests-pipeline-options/<pipeline_name>.txt
   - Include Dataflow and pipeline flags. Example:

   ```
   --region=us-central1
   --machine_type=n1-standard-2
   --num_workers=75
   --disk_size_gb=50
   --autoscaling_algorithm=NONE
   --staging_location=gs://temp-storage-for-perf-tests/loadtests
   --temp_location=gs://temp-storage-for-perf-tests/loadtests
   --requirements_file=apache_beam/ml/inference/your-requirements-file.txt
   --publish_to_big_query=true
   --metrics_dataset=beam_run_inference
   --metrics_table=your_table
   --influx_measurement=your-measurement
   --device=CPU
   --runner=DataflowRunner
   ```

4. Wire it into the GitHub Action

   - Workflow: .github/workflows/beam_Inference_Python_Benchmarks_Dataflow.yml
   - Add your argument file path to the matrix.
   - Add a step that runs your <pipeline_name>_benchmarks.py with -PloadTest.args=$YOUR_ARGUMENTS, where the arguments are the ones defined in the options file from the previous step.

5. Test on your fork

   - Trigger the workflow manually.
   - Confirm the Dataflow job completes successfully.

6. Verify metrics in BigQuery

   - Dataset: beam_run_inference. Table: your_table.
   - Confirm new rows for your pipeline_name with recent timestamps.
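
   A quick, hypothetical spot check with the BigQuery client library; the project name and the ORDER BY column are assumptions, so inspect the table schema first:

   ```python
   # Hypothetical check that recent benchmark rows landed in BigQuery.
   # 'apache-beam-testing' and the 'timestamp' column are assumptions.
   from google.cloud import bigquery

   client = bigquery.Client(project='apache-beam-testing')
   query = """
       SELECT *
       FROM `apache-beam-testing.beam_run_inference.your_table`
       ORDER BY timestamp DESC
       LIMIT 10
   """
   for row in client.query(query).result():
     print(dict(row))
   ```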

7. Update the website

   - Create: website/www/site/content/en/performance/<pipeline_name>/_index.md (short title/description).
   - Update: website/www/site/data/performance.yaml, adding your pipeline and five chart entries with:
     - looker_folder_id
     - public_slug_id (from Looker, see below)

8. Create Looker content (5 charts)

   - In Looker → Shared folders → run_inference: create a subfolder for your pipeline.
   - From an existing chart: Development mode → Explore from here → Go to LookML.
   - Point to your table/view and create 5 standard charts (latency/throughput/cost/etc.).
   - Save changes → Publish to production.
   - From Explore, open each chart, set the fields/filters for your pipeline, Run, then Save as Look (in your folder).
   - For each Look:
     - Copy the Look ID.
     - Add the Look IDs to .test-infra/tools/refresh_looker_metrics.py.
     - Exit Development mode → Edit Settings → Allow public access.
     - Copy the public_slug_id and paste it into website/www/site/data/performance.yaml.
   - Run the .test-infra/tools/refresh_looker_metrics.py script, or manually download each chart as a PNG via the public slug and upload it to GCS: gs://public_looker_explores_us_a3853f40/FOLDER_ID/<look_slug>.png

9. Open a PR

   - Example: https://github.com/apache/beam/pull/34577
