[xegpu] Add matmul cost model and tile size selector by tkarna · Pull Request #156 · llvm/lighthouse

tkarna · 2026-05-20T13:53:00Z

Extends the XeGPU MLP/matmul schedule with a cost model that can be used to generate valid tile sizes for a given (M, N, K) matmul shape.

- Adds XeGPUSpecs: an object that contains the GPU specifications required by the cost model.
- Adds XeGPUParameterSelector: the parameter selector is now a class that uses XeGPUSpecs and can generate valid tile size configurations if the (M, N, K) case is not found in the existing parameter JSON file.
- mlp_schedule still takes a list of param dicts, one for each layer. However, only the "m", "n", and "k" entries are required; if any parameter is missing, XeGPUParameterSelector is called to populate the tile sizes.
- Adds matmul cost model routines:
  - Given a matmul shape (M, N, K), generate_configs generates valid workgroup, subgroup, and K tile size configurations and estimates their performance with a simple roofline model. It returns the configs sorted by estimated performance.
  - generate_prefetch_tiles generates all valid thread-cooperative prefetch strategies, sorted by the number of cooperative threads. No performance estimate is provided.
  - A simple heuristic generates tile sizes if they are not given: take the best WG, SG, K configuration according to the cost model estimate, take one of the prefetch configurations, and use the DPAS instruction shape for the A and B load tiles.
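
As a rough illustration of the roofline ranking described above, here is a minimal sketch that enumerates workgroup and K tile sizes that divide the problem and sorts them by estimated run time. It is not the code from this PR: `GpuSpecsSketch`, `estimate_time`, `generate_configs_sketch`, the candidate tile sizes, and the peak FLOP/s and bandwidth numbers are all hypothetical. Subgroup tiles are left out, and the traffic model assumes no cache reuse between workgroups.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class GpuSpecsSketch:
    """Placeholder device numbers (assumed, not taken from XeGPUSpecs)."""

    peak_flops: float = 100e12    # FLOP/s, assumed
    mem_bandwidth: float = 500e9  # bytes/s, assumed


def estimate_time(m, n, k, wg_m, wg_n, specs, ab_bytes=2, c_bytes=4):
    """Roofline estimate in seconds: the slower of compute and memory traffic.

    Assumes float16 A/B and float32 C, and that every workgroup re-reads its
    A and B panels from memory (no cache reuse).
    """
    flops = 2 * m * n * k
    num_wg = (m // wg_m) * (n // wg_n)
    # Each workgroup reads a (wg_m x k) panel of A and a (k x wg_n) panel of B.
    ab_traffic = num_wg * (wg_m + wg_n) * k * ab_bytes
    c_traffic = m * n * c_bytes
    compute_time = flops / specs.peak_flops
    memory_time = (ab_traffic + c_traffic) / specs.mem_bandwidth
    return max(compute_time, memory_time)


def generate_configs_sketch(m, n, k, specs,
                            wg_sizes=(64, 128, 256), k_tiles=(16, 32, 64)):
    """Return tile configs that evenly divide (m, n, k), fastest estimate first."""
    configs = []
    for wg_m, wg_n, k_tile in product(wg_sizes, wg_sizes, k_tiles):
        if m % wg_m or n % wg_n or k % k_tile:
            continue  # skip tiles that do not evenly divide the problem
        configs.append({
            "wg_m": wg_m,
            "wg_n": wg_n,
            "k_tile": k_tile,
            "est_time": estimate_time(m, n, k, wg_m, wg_n, specs),
        })
    return sorted(configs, key=lambda c: c["est_time"])


configs = generate_configs_sketch(512, 8192, 128, GpuSpecsSketch())
best = configs[0]
print(best["wg_m"], best["wg_n"], best["k_tile"])
```

In this toy model the K tile only has to divide K and does not change the estimate; a fuller model would also account for subgroup tiles and hardware limits.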

Currently data types are assumed to be float16 and float32 for A/B and C, respectively. To be generalized later.

We can now execute any nicely-shaped matrix multiplication without the need to define tile sizes. If the matmul is compute-bound, performance should be decent.

python matmul.py --sizes 512 8192 128 --check-result -v
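
The parameter completion described above (only "m", "n", "k" required per layer, with a database lookup and a cost-model fallback) could look roughly like the sketch below. The class and function names, the tile keys `wg_m`, `wg_n`, `k_tile`, and the rule that user-supplied entries take precedence are assumptions made for illustration; they are not the actual XeGPUParameterSelector or mlp_schedule interfaces.

```python
REQUIRED_KEYS = ("m", "n", "k")
TILE_KEYS = ("wg_m", "wg_n", "k_tile")  # hypothetical tile parameter names


class ParameterSelectorSketch:
    """Look up pre-tuned tiles by shape, else call a fallback generator."""

    def __init__(self, tuned_db, fallback):
        self.tuned_db = tuned_db  # dict keyed by (m, n, k)
        self.fallback = fallback  # callable (m, n, k) -> dict of tile sizes

    def select(self, m, n, k):
        shape = (m, n, k)
        if shape in self.tuned_db:
            return dict(self.tuned_db[shape])
        return self.fallback(m, n, k)


def complete_layer_params(layers, selector):
    """Fill in missing tile parameters for each layer's param dict."""
    completed = []
    for params in layers:
        missing = [key for key in REQUIRED_KEYS if key not in params]
        if missing:
            raise ValueError(f"missing required entries: {missing}")
        if all(key in params for key in TILE_KEYS):
            completed.append(dict(params))  # nothing to fill in
            continue
        selected = selector.select(params["m"], params["n"], params["k"])
        # Assumption: entries given by the user override the selected ones.
        completed.append({**selected, **params})
    return completed


tuned_db = {(4096, 4096, 4096): {"wg_m": 256, "wg_n": 256, "k_tile": 32}}
selector = ParameterSelectorSketch(
    tuned_db,
    fallback=lambda m, n, k: {"wg_m": 128, "wg_n": 128, "k_tile": 16},
)
layers = [
    {"m": 4096, "n": 4096, "k": 4096},            # found in the tuned DB
    {"m": 512, "n": 8192, "k": 128, "wg_m": 64},  # partially specified
]
completed = complete_layer_params(layers, selector)
print(completed)
```

Keeping the lookup and the fallback behind one `select` call means the schedule does not need to know whether a shape was pre-tuned or generated.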

Copilot

Pull request overview

Adds a XeGPU matmul tile-parameter cost model plus device-spec plumbing, and updates the XeGPU matmul/MLP scheduling path (and examples) to auto-populate missing tiling parameters when a shape is not present in the JSON parameter DB.

Changes:

- Introduces XeGPUSpecs (device specs DB) and a matmul_costmodel grid-search/roofline estimator to rank valid tiling configs.
- Replaces the old function-based parameter selector with XeGPUParameterSelector, and wires mlp_schedule to generate/fill missing tiling parameters per layer.
- Refactors examples to pass only required (m,n,k) (plus optional --target) and rely on the schedule to complete parameters; reuses centralized constraint checks.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| lighthouse/schedule/xegpu/xegpu_specs.py | Adds device-spec DB and `XeGPUSpecs` used by the cost model/selector. |
| lighthouse/schedule/xegpu/xegpu_parameter_selector.py | Implements class-based param selection with JSON DB lookup + cost-model fallback. |
| lighthouse/schedule/xegpu/mlp_schedule.py | Adds pre-processing to auto-fill missing layer tile parameters via selector; moves constants to shared constraints module. |
| lighthouse/schedule/xegpu/matmul_costmodel.py | Adds config generation + simple roofline-based performance estimation. |
| lighthouse/schedule/xegpu/matmul_constraints.py | Centralizes tiling/prefetch validity checks and shared constants. |
| lighthouse/schedule/xegpu/__init__.py | Exposes new selector/specs/constraint helper via package exports. |
| examples/xegpu/tune_matmul_gridsearch.py | Switches to shared `check_constraints` and adds GPU target selection. |
| examples/xegpu/torch_matmul.py | Simplifies parameter init to (m,n,k) + optional target; removes legacy selector usage. |
| examples/xegpu/mlp.py | Passes per-layer (m,n,k) (and optional target) and relies on schedule for completion. |
| examples/xegpu/matmul.py | Passes (m,n,k) (and optional target) and relies on schedule for completion. |

rengolin

Are you matching the perf from the GA auto-tuner?

side comment: would be nice to merge the matmul and the mlp python files. Just make matmul an mlp without element-wise.

@adam-smnk

tkarna · 2026-05-22T10:04:26Z

> Are you matching the perf from the GA auto-tuner?

The method used in XeGPUParameterSelector is within ~5% of the best known config for compute-bound cases. For memory-bound cases, the gap can be up to ~50%. We cannot really do better without tuning the prefetch and load tile configs.

The next PR will introduce a tuning method that uses this cost model to generate the search candidates. It's always better than or equal to what the GA tuning produces.

tkarna · 2026-05-22T10:06:39Z

> side comment: would be nice to merge the matmul and the mlp python files. Just make matmul an mlp without element-wise.

You mean the matmul and mlp examples? They're separate mainly because the convenient CLI is a little different in these two cases. We could refactor the common bits for sure. I'd leave this for another PR to keep the diff clearer to review.

rengolin · 2026-05-22T10:30:06Z

> > side comment: would be nice to merge the matmul and the mlp python files. Just make matmul an mlp without element-wise.
>
> You mean the matmul and mlp examples? They're separate mainly because the convenient CLI is a little different in these two cases. We could refactor the common bits for sure. I'd leave this for another PR to keep the diff clearer to review.

Another PR for sure. Even moving this to Kernel Bench would be preferable to refactoring.

Add xegpu matmul cost model

bfe39b8

tkarna requested review from adam-smnk, Copilot and rengolin May 20, 2026 13:53

Copilot started reviewing on behalf of tkarna May 20, 2026 13:53

Copilot AI reviewed May 20, 2026

tkarna force-pushed the xegpu-costmodel branch from e09cd97 to 62898c1 Compare May 20, 2026 17:24

tkarna added 2 commits May 20, 2026 20:31

copilot comments

b23c7ab

cost model: simplify generate_configs

7fd5c21

tkarna force-pushed the xegpu-costmodel branch from 62898c1 to 7fd5c21 Compare May 20, 2026 17:32

matmul example: add test for custom not pre-tuned shape

f91d352

rengolin reviewed May 22, 2026

rengolin approved these changes May 22, 2026

tkarna wants to merge 4 commits into llvm:main from tkarna:xegpu-costmodel

3 participants
