[runners-spark] Add Spark 4 runner #38255
base: master
Changes from 39 commits
1800a04
435b395
335c340
7b2745b
b5de876
792801e
2c4c9ca
1d38a7f
3709a9c
32947d3
ce7701b
e2e0135
8762e66
87e90dc
864114b
83b3c8e
a665cf3
1e2c8d7
d0960eb
1e14c83
3b6dae2
5636e59
2bbd22b
5bfcdd9
4f6f47f
a08517b
b8d49a1
3a29b47
3631077
ac00cc9
16639c8
8ebed02
08eff30
aa4e68e
f2d27e4
17aae9b
8a55bb2
467897d
2bf3607
58bd932
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| { | ||
| "comment": "Modify this file in a trivial way to cause this test suite to run" | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| { | ||
| "comment": "Modify this file in a trivial way to cause this test suite to run" | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,97 @@ | ||
| # Licensed to the Apache Software Foundation (ASF) under one or more | ||
| # contributor license agreements. See the NOTICE file distributed with | ||
| # this work for additional information regarding copyright ownership. | ||
| # The ASF licenses this file to You under the Apache License, Version 2.0 | ||
| # (the "License"); you may not use this file except in compliance with | ||
| # the License. You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| name: PostCommit Java ValidatesRunner Spark4 StructuredStreaming | ||
|
|
||
| on: | ||
| schedule: | ||
| - cron: '45 4/6 * * *' | ||
| pull_request_target: | ||
| paths: ['release/trigger_all_tests.json', '.github/trigger_files/beam_PostCommit_Java_ValidatesRunner_Spark4StructuredStreaming.json'] | ||
| workflow_dispatch: | ||
|
|
||
| #Setting explicit permissions for the action to avoid the default permissions which are `write-all` in case of pull_request_target event | ||
| permissions: | ||
| actions: write | ||
| pull-requests: write | ||
| checks: write | ||
| contents: read | ||
| deployments: read | ||
| id-token: none | ||
| issues: write | ||
| discussions: read | ||
| packages: read | ||
| pages: read | ||
| repository-projects: read | ||
| security-events: read | ||
| statuses: read | ||
|
|
||
| # This allows a subsequently queued workflow run to interrupt previous runs | ||
| concurrency: | ||
| group: '${{ github.workflow }} @ ${{ github.event.pull_request.number || github.sha || github.head_ref || github.ref }}-${{ github.event.schedule || github.event.comment.id || github.event.sender.login }}' | ||
| cancel-in-progress: true | ||
|
|
||
| env: | ||
| DEVELOCITY_ACCESS_KEY: ${{ secrets.DEVELOCITY_ACCESS_KEY }} | ||
| GRADLE_ENTERPRISE_CACHE_USERNAME: ${{ secrets.GE_CACHE_USERNAME }} | ||
| GRADLE_ENTERPRISE_CACHE_PASSWORD: ${{ secrets.GE_CACHE_PASSWORD }} | ||
|
|
||
| jobs: | ||
| beam_PostCommit_Java_ValidatesRunner_Spark4StructuredStreaming: | ||
| name: ${{ matrix.job_name }} (${{ matrix.job_phrase }}) | ||
| runs-on: [self-hosted, ubuntu-24.04, main] | ||
| timeout-minutes: 120 | ||
| strategy: | ||
| matrix: | ||
| job_name: [beam_PostCommit_Java_ValidatesRunner_Spark4StructuredStreaming] | ||
| job_phrase: [Run Spark4 StructuredStreaming ValidatesRunner] | ||
| if: | | ||
| github.event_name == 'workflow_dispatch' || | ||
| github.event_name == 'pull_request_target' || | ||
| (github.event_name == 'schedule' && github.repository == 'apache/beam') || | ||
| github.event.comment.body == 'Run Spark4 StructuredStreaming ValidatesRunner' | ||
| steps: | ||
| - uses: actions/checkout@v4 | ||
| - name: Setup repository | ||
| uses: ./.github/actions/setup-action | ||
| with: | ||
| comment_phrase: ${{ matrix.job_phrase }} | ||
| github_token: ${{ secrets.GITHUB_TOKEN }} | ||
| github_job: ${{ matrix.job_name }} (${{ matrix.job_phrase }}) | ||
| - name: Setup environment | ||
| uses: ./.github/actions/setup-environment-action | ||
| with: | ||
| java-version: '17' | ||
| - name: run validatesStructuredStreamingRunnerBatch script | ||
| uses: ./.github/actions/gradle-command-self-hosted-action | ||
| with: | ||
| gradle-command: :runners:spark:4:validatesStructuredStreamingRunnerBatch | ||
| arguments: | | ||
| -PtestJavaVersion=17 \ | ||
| -PdisableSpotlessCheck=true \ | ||
| - name: Archive JUnit Test Results | ||
| uses: actions/upload-artifact@v4 | ||
| if: ${{ !success() }} | ||
| with: | ||
| name: JUnit Test Results | ||
| path: "**/build/reports/tests/" | ||
| - name: Publish JUnit Test Results | ||
| uses: EnricoMi/publish-unit-test-result-action@v2 | ||
| if: always() | ||
| with: | ||
| commit: '${{ env.prsha || env.GITHUB_SHA }}' | ||
| comment_mode: ${{ github.event_name == 'issue_comment' && 'always' || 'off' }} | ||
| files: '**/build/test-results/**/*.xml' | ||
| large_files: true |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,112 @@ | ||
| # Licensed to the Apache Software Foundation (ASF) under one | ||
| # or more contributor license agreements. See the NOTICE file | ||
| # distributed with this work for additional information | ||
| # regarding copyright ownership. The ASF licenses this file | ||
| # to you under the Apache License, Version 2.0 (the | ||
| # "License"); you may not use this file except in compliance | ||
| # with the License. You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, | ||
| # software distributed under the License is distributed on an | ||
| # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| # KIND, either express or implied. See the License for the | ||
| # specific language governing permissions and limitations | ||
| # under the License. | ||
|
|
||
| name: PreCommit Java Spark4 Versions | ||
|
|
||
| on: | ||
| push: | ||
| tags: ['v*'] | ||
| branches: ['master', 'release-*'] | ||
| paths: | ||
| - 'runners/spark/**' | ||
| - '.github/workflows/beam_PreCommit_Java_Spark4_Versions.yml' | ||
| pull_request_target: | ||
| branches: ['master', 'release-*'] | ||
| paths: | ||
| - 'runners/spark/**' | ||
| - 'release/trigger_all_tests.json' | ||
| - '.github/trigger_files/beam_PreCommit_Java_Spark4_Versions.json' | ||
| issue_comment: | ||
| types: [created] | ||
| schedule: | ||
| - cron: '30 2/6 * * *' | ||
| workflow_dispatch: | ||
|
|
||
| # This allows a subsequently queued workflow run to interrupt previous runs | ||
| concurrency: | ||
| group: '${{ github.workflow }} @ ${{ github.event.pull_request.number || github.event.pull_request.head.label || github.sha || github.head_ref || github.ref }}-${{ github.event.schedule || github.event.comment.id || github.event.sender.login }}' | ||
| cancel-in-progress: true | ||
|
|
||
| #Setting explicit permissions for the action to avoid the default permissions which are `write-all` in case of pull_request_target event | ||
| permissions: | ||
| actions: write | ||
| pull-requests: write | ||
| checks: write | ||
| contents: read | ||
| deployments: read | ||
| id-token: none | ||
| issues: write | ||
| discussions: read | ||
| packages: read | ||
| pages: read | ||
| repository-projects: read | ||
| security-events: read | ||
| statuses: read | ||
|
|
||
| env: | ||
| DEVELOCITY_ACCESS_KEY: ${{ secrets.DEVELOCITY_ACCESS_KEY }} | ||
| GRADLE_ENTERPRISE_CACHE_USERNAME: ${{ secrets.GE_CACHE_USERNAME }} | ||
| GRADLE_ENTERPRISE_CACHE_PASSWORD: ${{ secrets.GE_CACHE_PASSWORD }} | ||
|
|
||
| jobs: | ||
| beam_PreCommit_Java_Spark4_Versions: | ||
| name: ${{ matrix.job_name }} (${{ matrix.job_phrase }}) | ||
| runs-on: [self-hosted, ubuntu-24.04, main] | ||
| strategy: | ||
| matrix: | ||
| job_name: [beam_PreCommit_Java_Spark4_Versions] | ||
| job_phrase: [Run Java_Spark4_Versions PreCommit] | ||
| timeout-minutes: 120 | ||
| if: | | ||
| github.event_name == 'push' || | ||
| github.event_name == 'pull_request_target' || | ||
| (github.event_name == 'schedule' && github.repository == 'apache/beam') || | ||
| github.event_name == 'workflow_dispatch' || | ||
| github.event.comment.body == 'Run Java_Spark4_Versions PreCommit' | ||
| steps: | ||
| - uses: actions/checkout@v4 | ||
| - name: Setup repository | ||
| uses: ./.github/actions/setup-action | ||
| with: | ||
| comment_phrase: ${{ matrix.job_phrase }} | ||
| github_token: ${{ secrets.GITHUB_TOKEN }} | ||
| github_job: ${{ matrix.job_name }} (${{ matrix.job_phrase }}) | ||
| - name: Setup environment | ||
| uses: ./.github/actions/setup-environment-action | ||
| with: | ||
| java-version: '17' | ||
|
Contributor
We may need to do something like this then gradle command has
||
| - name: run sparkVersionsTest script | ||
| uses: ./.github/actions/gradle-command-self-hosted-action | ||
| with: | ||
| gradle-command: :runners:spark:4:sparkVersionsTest | ||
| arguments: | | ||
| -PtestJavaVersion=17 \ | ||
| -PdisableSpotlessCheck=true \ | ||
| - name: Archive JUnit Test Results | ||
| uses: actions/upload-artifact@v4 | ||
| if: ${{ !success() }} | ||
| with: | ||
| name: JUnit Test Results | ||
| path: "**/build/reports/tests/" | ||
| - name: Publish JUnit Test Results | ||
| uses: EnricoMi/publish-unit-test-result-action@v2 | ||
| if: always() | ||
| with: | ||
| commit: '${{ env.prsha || env.GITHUB_SHA }}' | ||
| comment_mode: ${{ github.event_name == 'issue_comment' && 'always' || 'off' }} | ||
| files: '**/build/test-results/**/*.xml' | ||
| large_files: true | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -60,7 +60,7 @@ | |
| ## Highlights | ||
|
|
||
| * New highly anticipated feature X added to Python SDK ([#X](https://github.com/apache/beam/issues/X)). | ||
| * New highly anticipated feature Y added to Java SDK ([#Y](https://github.com/apache/beam/issues/Y)). | ||
|
Contributor
Possibly a rebase error. Just revert the change on this branch.
||
| * Experimental Spark 4 runner added (Java), built against Spark 4.0.2 / Scala 2.13 and requiring Java 17. Currently supports batch only; streaming is not yet supported ([#36841](https://github.com/apache/beam/issues/36841)). | ||
|
Contributor
We can add this to CHANGES.md later, once the ValidatesRunner tests are all set up and confirmed working.
||
|
|
||
| ## I/Os | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
| <!-- | ||
| Licensed to the Apache Software Foundation (ASF) under one | ||
| or more contributor license agreements. See the NOTICE file | ||
| distributed with this work for additional information | ||
| regarding copyright ownership. The ASF licenses this file | ||
| to you under the Apache License, Version 2.0 (the | ||
| "License"); you may not use this file except in compliance | ||
| with the License. You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, | ||
| software distributed under the License is distributed on an | ||
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| KIND, either express or implied. See the License for the | ||
| specific language governing permissions and limitations | ||
| under the License. | ||
| --> | ||
| # Apache Beam Spark 4 Runner | ||
|
|
||
| Experimental Beam runner for Apache Spark 4 (batch-only). Built on the shared | ||
| `runners/spark` source base via `spark_runner.gradle`'s per-version | ||
| source-overrides mechanism: this module contributes the small set of files | ||
| under `src/main/java/.../structuredstreaming/` that diverge from the Spark 3 | ||
| implementation. See the parent `runners/spark/` module for the bulk of the | ||
| runner code. | ||
|
|
||
| ## Requirements | ||
|
|
||
| * **Spark 4.0.2** (and other Spark 4.0.x patch releases) | ||
| * **Scala 2.13** | ||
| * **Java 17** — Spark 4 does not run on earlier JDKs | ||
|
|
||
| ## Status | ||
|
|
||
| Batch only. Streaming is tracked in | ||
| [#36841](https://github.com/apache/beam/issues/36841). | ||
|
|
||
| ## Known issues | ||
|
|
||
| ### `StackOverflowError` from `slf4j-jdk14` on the runtime classpath | ||
|
|
||
| Spark 4 ships `org.slf4j:jul-to-slf4j` to route `java.util.logging` records | ||
| into SLF4J. If `org.slf4j:slf4j-jdk14` is also resolved at runtime — it routes | ||
| the other direction (SLF4J → JUL) — the first log line creates an infinite | ||
| loop: | ||
|
|
||
| ``` | ||
| java.lang.StackOverflowError | ||
| at org.slf4j.bridge.SLF4JBridgeHandler.publish(...) | ||
| at java.util.logging.Logger.log(...) | ||
| at org.slf4j.impl.JDK14LoggerAdapter.log(...) | ||
| at org.slf4j.bridge.SLF4JBridgeHandler.publish(...) | ||
| ... | ||
| ``` | ||
|
|
||
| This is the same condition that broke the Spark 3 runner in | ||
| [#26985](https://github.com/apache/beam/issues/26985), fixed in | ||
| [#27001](https://github.com/apache/beam/pull/27001). | ||
|
|
||
| The shared `spark_runner.gradle` already excludes `slf4j-jdk14` from the | ||
| runner module's own `configurations.all`, so in-tree builds are unaffected. | ||
| Downstream Gradle consumers that assemble a runtime classpath against | ||
| `beam-runners-spark-4` should mirror that exclude: | ||
|
|
||
| ```groovy | ||
| configurations.all { | ||
| exclude group: "org.slf4j", module: "slf4j-jdk14" | ||
| } | ||
| ``` | ||
|
|
||
| For Maven, exclude `org.slf4j:slf4j-jdk14` from any dependency that pulls it | ||
| transitively (commonly the Beam SDK harness and several IO connectors). |
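The `slf4j-jdk14` conflict described in the README above can also be checked at startup. The following is a minimal sketch, not part of this PR or of Beam: `Slf4jBridgeCheck` and its methods are hypothetical names, and the two class names are taken from the stack trace quoted in the README (other SLF4J releases may place the adapter in a different package). Finding both classes is only a heuristic: it shows that both bridges are on the classpath, not that the loop has actually been triggered.

```java
/** Hypothetical helper (not part of Beam): detects both SLF4J/JUL bridges on the classpath. */
public final class Slf4jBridgeCheck {
  // Class names copied from the stack trace in the README; treat them as assumptions
  // for any SLF4J version other than the one that produced that trace.
  public static final String JUL_TO_SLF4J = "org.slf4j.bridge.SLF4JBridgeHandler";
  public static final String SLF4J_TO_JUL = "org.slf4j.impl.JDK14LoggerAdapter";

  private Slf4jBridgeCheck() {}

  /** Returns true if the class can be located by the loader, without initializing it. */
  public static boolean isLoadable(String className, ClassLoader loader) {
    try {
      Class.forName(className, false, loader);
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  /** Returns true if both bridge directions are present, which is the precondition for the loop. */
  public static boolean hasBridgeLoop(ClassLoader loader) {
    return isLoadable(JUL_TO_SLF4J, loader) && isLoadable(SLF4J_TO_JUL, loader);
  }

  public static void main(String[] args) {
    ClassLoader loader = Slf4jBridgeCheck.class.getClassLoader();
    if (hasBridgeLoop(loader)) {
      System.err.println(
          "Both jul-to-slf4j and slf4j-jdk14 are on the classpath; exclude slf4j-jdk14.");
      System.exit(1);
    }
    System.out.println("No SLF4J <-> JUL bridge loop detected.");
  }
}
```

Running this with the same runtime classpath as the pipeline gives an early, readable failure instead of a `StackOverflowError` on the first log line. It does not replace the dependency exclusion itself.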
Neither trigger file is needed (one is not effective yet; the other is a leftover from a deleted workflow).