[fix](be) Fix json contains duplicate array candidates#63301
Open
mrhhsg wants to merge 1 commit into
Open
Conversation
### What problem does this PR solve?
Issue Number: DORIS-25573
Related PR: None
Problem Summary: Fix json_contains array containment semantics so candidate elements are checked independently without consuming target array elements.
### Release note
Fix json_contains returning false when the candidate array contains duplicate elements already present in the target array.
### Check List (For Author)
- Test: Unit Test
- ./run-be-ut.sh --run --filter=JsonbContainsTest.* -j 32
- build-support/check-format.sh
- build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN (fails because clang-tidy cannot analyze existing headers in this environment: be/src/core/types.h has an unmatched NOLINTEND and stddef.h/intrinsic toolchain headers fail to resolve)
- Behavior changed: Yes
- json_contains now follows non-consuming array containment semantics for duplicate candidate elements.
- Does this need documentation: No
Contributor
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
Member
Author
|
/review |
Contributor
There was a problem hiding this comment.
Code review completed. I did not find blocking issues in this PR.
Critical checkpoint conclusions:
- Goal and proof: The PR fixes JSONB array containment so duplicate candidate elements are checked independently rather than consuming/counting target elements. The implementation matches the intended non-consuming semantics and adds focused BE unit coverage for duplicates, unordered array candidates, negative cases, and recursive containment.
- Scope/minimality: The change is small and localized to
JsonbValue::containsarray handling plus a focused unit test. - Concurrency/lifecycle: Not applicable. The modified path is read-only value traversal and introduces no shared mutable state, locks, threads, static initialization dependencies, or ownership lifecycle changes.
- Configuration/compatibility/protocol: Not applicable. No config, storage format, persisted metadata, FE-BE protocol, or function signature changes are introduced.
- Parallel code paths: The changed helper is used by the existing JSONB
json_containsexecution path; I did not find another changed path that needs the same fix. - Conditional checks/error handling: No new speculative error swallowing is introduced. Existing document validation and function error propagation remain unchanged.
- Test coverage: The new BE unit test covers the regression and recursive behavior. I attempted
./run-be-ut.sh --run --filter=JsonbContainsTest.* -j 32, but this runner failed during BE UT configuration becausethirdparty/installed/bin/protocis missing, before running the test. - Observability: Not applicable for this pure comparison helper; no new logging/metrics needed.
- Transactions/persistence/data writes: Not applicable.
- Performance: Complexity remains consistent with the previous array containment approach and the code avoids new allocations; no blocking performance regression found.
User focus points: No additional user-provided review focus was present.
Member
Author
|
run buildall |
Contributor
TPC-H: Total hot run time: 31150 ms |
Contributor
TPC-DS: Total hot run time: 168753 ms |
Contributor
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What problem does this PR solve?
Issue Number: DORIS-25573
Related PR: None
Problem Summary:
json_contains('[1,1,1]', '[1,1]')should return true, but the previous JSONB array containment logic counted matching target elements and compared that count with the candidate array length. Duplicate candidate elements therefore produced incorrect false results. This change checks each candidate array element independently against the target array, matching non-consuming containment semantics.Release note
Fix
json_containsreturning false when the candidate array contains duplicate elements already present in the target array.Check List (For Author)
./run-be-ut.sh --run --filter=JsonbContainsTest.* -j 32build-support/check-format.shgit diff --check 66dbb85fe3deaf9069bc58f78446fe998ba5810b..HEADbuild-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN(failed because clang-tidy could not analyze existing headers in this environment:be/src/core/types.hhas an unmatchedNOLINTEND, and the toolchain could not resolvestddef.h/intrinsic headers)json_containsnow follows non-consuming array containment semantics for duplicate candidate elements.