[feature](iceberg) Support reading Iceberg variant from Parquet#63192

Draft
eldenmoon wants to merge 1 commit into
apache:masterfrom
eldenmoon:codex/iceberg-v3-variant

@eldenmoon
Member

@eldenmoon eldenmoon commented May 12, 2026

[feature](iceberg) Support reading Iceberg variant from Parquet

What problem does this PR solve?

Issue Number: N/A

Related PR: #63192

Problem Summary: Doris could not read Iceberg v3 VARIANT columns from Parquet files. This change:

  • maps Iceberg VARIANT to Doris VARIANT;
  • validates the Parquet VariantShredding wrapper shape;
  • decodes metadata/value residual data;
  • reads shredded typed_value columns;
  • prunes shredded Parquet leaves for accessed variant paths.

The VARIANT reader and planner changes stay scoped to the Iceberg/Parquet VARIANT path instead of coupling generic nested-column code to Iceberg-only behavior. Typed-only shredded projections stay on native Parquet typed columns when residual value columns are not selected, with counter coverage to catch row-wise performance regressions. Selected residual or complex layouts still fall back to row-wise reconstruction.

This also:

  • preserves VARIANT subpaths through casts;
  • validates the actual Iceberg data-file format for VARIANT reads;
  • rejects duplicate VariantShredding structural children;
  • preserves null temporal typed leaves without reading their physical value;
  • keeps delete-only Iceberg MERGE projections from reading unused visible target data columns.
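The typed-only versus row-wise rule in the summary can be sketched roughly as follows. This is an illustrative Python model, not the Doris C++ reader: `VariantProjection`, `choose_read_path`, and `record_rows` are hypothetical names, and only the two counter names are taken from the regression-test description in this PR.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantProjection:
    """Hypothetical summary of what a query needs from one VARIANT column."""
    reads_residual_value: bool  # a residual `value` leaf is selected
    has_complex_layout: bool    # layout needs row-by-row reassembly


def choose_read_path(p: VariantProjection) -> str:
    """Return 'direct_typed' only for typed-only, simple projections."""
    if p.reads_residual_value or p.has_complex_layout:
        return "row_wise"
    return "direct_typed"


def record_rows(counters: Counter, p: VariantProjection, rows: int) -> None:
    """Bump the profile counter that corresponds to the chosen path."""
    if choose_read_path(p) == "direct_typed":
        counters["VariantDirectTypedValueReadRows"] += rows
    else:
        counters["VariantRowWiseReadRows"] += rows


counters = Counter()
record_rows(counters, VariantProjection(False, False), 100)
```

After the call above, only `VariantDirectTypedValueReadRows` is non-zero, which is the shape of assertion the regression test is described as making for typed-only projections.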

Release note

Support reading Iceberg v3 VARIANT Parquet columns, including shredded typed_value column pruning and binary/UUID/primitive residual VARIANT values. Writing Iceberg VARIANT columns is rejected with an explicit unsupported error.

Check List (For Author)

  • Test: Regression test / Unit Test / Manual test

    • Unit Test: ./run-be-ut.sh --run --filter='ParquetVariantReaderTest.DirectTypedOnlyReaderCountersUseNativePath:ParquetVariantReaderTest.VariantReaderCountersUseRowWiseWhenResidualValueSelected:ParquetVariantReaderTest.RowWisePreservesExplicitVariantNullShreddedArrayElement:ParquetVariantReaderTest.RowWiseRejectsMissingShreddedArrayElement' (4 tests passed)

    • Unit Test: ./run-be-ut.sh --run -f 'ParquetVariantReaderTest.RowWisePreservesNullComplexTypedArrayElement:ParquetVariantReaderTest.RowWiseRejectsMissingShreddedArrayElement' (2 tests passed)

    • Unit Test: ./run-be-ut.sh --run --filter='ParquetVariantReaderTest.*' (85 tests passed on rerun; the first attempt failed before tests in OpenBLAS CMake getarch bootstrap)

    • Unit Test: ./run-be-ut.sh --run --filter='ParquetVariantReaderTest.*:NestedColumnAccessHelperTest.*' (127 tests passed)

    • Unit Test: ./run-be-ut.sh --run --filter='IcebergReaderCreateColumnIdsTest.*' (9 tests passed)

    • Unit Test: ./run-be-ut.sh --run --filter=ParquetVariantReaderTest.RejectVariantSchemaWithDuplicateStructuralChild:ParquetVariantReaderTest.DirectTypedOnlyPreservesTemporalLeafNull (2 tests passed; rerun after clang-format also passed)

    • Unit Test: ./run-be-ut.sh --run --filter=ParquetVariantReaderTest.DirectTypedOnlyReaderCountersUseNativePath (1 test passed after latest changes)

    • Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest#testVariantComparisonPredicateCollectsWholeVariantOperand (1 test passed; Maven reactor succeeded)

    • Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest#testVariantCastProjectionKeepsSubPathWithSiblingPredicate (1 test passed; Maven reactor succeeded)

    • Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest (70 tests passed; Maven reactor succeeded)

    • Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.VariantPruningLogicTest#testExplodeSubqueryJoinAggAccessPaths (1 test passed; Maven reactor succeeded)

    • Unit Test: ./run-fe-ut.sh --run org.apache.doris.datasource.iceberg.source.IcebergScanNodeTest#testValidateVariantDataFileFormatRejectsOrcSplit (1 test passed; Maven reactor succeeded)

    • Unit Test: ./run-fe-ut.sh --run org.apache.doris.datasource.iceberg.source.IcebergScanNodeTest (6 tests passed; Maven reactor succeeded)

    • Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.trees.plans.commands.IcebergMergeCommandTest#testDeleteProjectionDoesNotReadVisibleTargetColumns (1 test passed; Maven reactor succeeded)

    • Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.VariantPruningLogicTest (11 tests passed; Maven reactor succeeded)

    • Unit Test: ./run-fe-ut.sh --run org.apache.doris.datasource.iceberg.IcebergUtilsTest (passed)

    • Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.SlotTypeReplacerTest (5 tests passed)

    • Regression test: performance regression coverage is included in regression-test/suites/external_table_p0/tvf/test_local_tvf_iceberg_variant.groovy, including profile assertions that typed-only projections increment VariantDirectTypedValueReadRows and keep VariantRowWiseReadRows at 0. Not run locally in this worktree because no local Doris cluster/output BE+FE runtime is available.

    • Regression test: Added regression-test/suites/external_table_p0/iceberg/test_iceberg_variant_table_path.groovy to exercise the Iceberg REST catalog table path with nested VARIANT access and profile read-column assertions. Not run locally because Docker access to spark-iceberg is unavailable in this worktree.

    • Manual test: PATH=/mnt/disk6/common/ldb_toolchain_toucan/bin:$PATH build-support/clang-format.sh

    • Manual test: PATH=/mnt/disk6/common/ldb_toolchain_toucan/bin:$PATH build-support/check-format.sh

    • Manual test: git diff --check

    • Manual test: cd fe && mvn -pl fe-core checkstyle:check -DskipTests

    • Static analysis: CLANG_TIDY_BINARY=/tmp/clang-tidy-resource-filter build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN (passed for changed lines after adding the clang-tidy resource-dir and filtering a pre-existing be/src/core/types.h clang-tidy-nolint diagnostic; the unwrapped script was blocked by that existing header diagnostic)

  • Behavior changed: Yes. Doris can read Iceberg v3 VARIANT Parquet columns, supports typed-only shredded projection pruning on native typed columns, reconstructs selected residual or complex layouts row-wise, rejects malformed VariantShredding schemas and missing present shredded array payloads, preserves null complex/temporal typed values and explicit Variant null array elements, forces root access for whole-VARIANT scalar/comparison consumers while preserving literal subpath pruning for typed reads, recursively rejects Iceberg VARIANT reads from non-Parquet data files during scan planning, avoids reading unused target data columns for delete-only Iceberg MERGE, and rejects Iceberg VARIANT data-file writes explicitly.

  • Does this need documentation: No
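For readers unfamiliar with the colon-separated filters used in the BE unit-test commands above, here is a small sketch of how such a filter selects tests. It assumes `--filter` is forwarded to GoogleTest's filter syntax (positive patterns, an optional `-` followed by negative patterns, `*` and `?` wildcards). `matches_filter` is a hypothetical helper, and Python's `fnmatchcase` additionally interprets `[...]`, which GoogleTest does not.

```python
from fnmatch import fnmatchcase


def matches_filter(test_name: str, filter_expr: str) -> bool:
    """Approximate GoogleTest filter matching for one full test name."""
    positive, _, negative = filter_expr.partition("-")
    pos = [p for p in positive.split(":") if p] or ["*"]  # empty means "all"
    neg = [n for n in negative.split(":") if n]
    if any(fnmatchcase(test_name, n) for n in neg):
        return False  # negative patterns always win
    return any(fnmatchcase(test_name, p) for p in pos)


both_suites = "ParquetVariantReaderTest.*:NestedColumnAccessHelperTest.*"
selected = matches_filter("ParquetVariantReaderTest.Foo", both_suites)
```

A pattern without a wildcard, such as `ParquetVariantReaderTest.`, matches only a test with exactly that name, so it selects nothing from the suite.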

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@eldenmoon
Member Author

run buildall

@eldenmoon
Member Author

/review

@eldenmoon eldenmoon force-pushed the codex/iceberg-v3-variant branch from cab85b8 to e9e3bfd Compare May 12, 2026 19:31
@eldenmoon
Member Author

run buildall

@eldenmoon
Member Author

/review

@github-actions
Contributor

OpenCode automated review failed and did not complete.

Error: Review step was skipped (possibly timeout or cancelled)
Workflow run: https://github.com/apache/doris/actions/runs/25757447037

Please inspect the workflow logs and rerun the review after the underlying issue is resolved.

@eldenmoon
Member Author

/review

Contributor

@github-actions github-actions Bot left a comment

Summary:

I found one blocking issue in the added regression test: the local single-BE copy path does not match the file_path used by the local TVF, so the test can fail in the exact environment that branch is intended to support.

Critical checkpoint conclusions:

  • Goal: add Iceberg/Parquet VARIANT read support, including shredded projection. The implementation and regression coverage mostly target that goal, but the new regression test has a path setup bug.
  • Scope: the production changes are focused on Parquet schema parsing, variant reconstruction, column pruning, and Iceberg type mapping.
  • Concurrency/lifecycle: no new shared mutable concurrent state or non-obvious lifecycle ownership issue found in the reviewed PR diff.
  • Configuration/compatibility: no new config items or persisted storage-format changes found; FE/BE type mapping paths for Iceberg VARIANT are updated.
  • Parallel paths: Hive, Iceberg, and local Parquet pruning paths were considered; the test issue is distinct from production pruning logic.
  • Tests: regression coverage was added, but the local-file staging logic can make the new test fail before validating the feature.
  • Observability/performance: added ParquetReadColumnPaths profile string is useful for validating pruning; no blocking observability or hot-path issue found beyond the test blocker.

User focus: no additional user-provided review focus was present.

Contributor

@github-actions github-actions Bot left a comment

I found a correctness blocker in the new shredded VARIANT pruning logic. The implementation prunes the unshredded value leaf whenever a matching typed_value path exists, but Iceberg shredded VARIANT can still carry residual/unrepresentable values for that field in value, so queries can silently return NULL or partial objects for those rows.

Critical checkpoint conclusions:

  • Goal/test: the PR adds Iceberg v3 VARIANT reading and pruning tests, but the tests only cover fully typed shredded fields and do not prove residual fallback correctness.
  • Scope/focus: the change is mostly focused on Parquet/Iceberg VARIANT support.
  • Concurrency/lifecycle/config: no new concurrency, non-trivial lifecycle, or config behavior found in the reviewed PR diff.
  • Compatibility: adds new type mapping; no storage-format persistence changes found.
  • Parallel paths: the same pruning issue exists in both standalone/Hive Parquet and Iceberg Parquet helper paths.
  • Tests: missing mixed shredded/residual cases where a selected typed path is absent or has an incompatible type in typed_value but exists in value.
  • Observability/performance: profile string helps inspect selected leaves; no additional blocking observability issue found.
  • Data correctness: blocking issue below can cause incorrect query results after column pruning.

No additional user-provided focus points were present.
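The pruning concern in this review can be illustrated with a small leaf-selection sketch: when a shredded `typed_value` leaf is selected, its sibling `value` leaf (and the `metadata` needed to decode it) is kept rather than pruned. The dotted leaf paths and the `add_residual_siblings` helper are assumptions made for illustration and are not taken from the PR's code.

```python
def add_residual_siblings(selected_typed, schema_leaves, column):
    """Keep the `value` sibling of every selected `typed_value` leaf.

    Paths are hypothetical dotted leaf names such as
    'v.typed_value.metric.typed_value'.
    """
    schema = set(schema_leaves)
    keep = set(selected_typed)
    for leaf in selected_typed:
        parent, _, name = leaf.rpartition(".")
        sibling = f"{parent}.value"
        if name == "typed_value" and sibling in schema:
            keep.add(sibling)                # residual bytes for the same field
            keep.add(f"{column}.metadata")   # needed to decode residual bytes
    return sorted(keep)


schema = [
    "v.metadata",
    "v.value",
    "v.typed_value.metric.value",
    "v.typed_value.metric.typed_value",
    "v.typed_value.other.value",
    "v.typed_value.other.typed_value",
]
leaves = add_residual_siblings(["v.typed_value.metric.typed_value"], schema, "v")
```

Here `leaves` contains the `metric` typed leaf, its residual `value` sibling, and `v.metadata`, while the unrelated `other` field stays pruned.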

Comment thread be/src/format/table/hive/hive_parquet_nested_column_utils.cpp Outdated
Comment thread be/src/format/table/iceberg/iceberg_parquet_nested_column_utils.cpp Outdated
@eldenmoon eldenmoon force-pushed the codex/iceberg-v3-variant branch from e9e3bfd to 5fe9ca5 Compare May 12, 2026 20:03
@eldenmoon
Member Author

run buildall

@eldenmoon
Member Author

/review

@eldenmoon
Member Author

/review

Contributor

@github-actions github-actions Bot left a comment

Reviewed the full PR with the Doris code-review checklist. I found a blocking correctness issue in nested VARIANT pruning for deeper accesses under shredded fields.

  • Goal/test: the PR adds Iceberg/Parquet VARIANT read support and includes a regression for top-level shredded field reads, but it does not cover deeper access through a shredded field residual.
  • Scope: the change is focused, but the duplicated Hive/Iceberg pruning helper needs the same fix.
  • Concurrency/lifecycle/config/transaction/persistence: no new concurrency, lifecycle, config, transaction, storage-format write, or persistence concerns found.
  • Parallel paths: the same issue exists in both Hive/local Parquet and Iceberg Parquet helpers.
  • Tests: existing tests cover unshredded reads and top-level shredded field pruning, but should add a mixed residual case such as v['metric']['x'] where metric is shredded and some rows store an object in typed_value.metric.value.
  • Observability: the new profile string is useful for validating selected leaves.
  • User focus: no additional user-provided review focus was supplied.
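The mixed residual case suggested here, v['metric']['x'], can be modeled per row as below. This is a simplified sketch under stated assumptions: each shredded field is a dict with `typed_value` and `value` entries, the residual is shown already decoded into Python objects instead of Variant bytes, and `resolve` and `MISSING` are hypothetical names.

```python
MISSING = object()  # sentinel: the path does not exist in this row


def resolve(field_group: dict, sub_path: list):
    """Resolve a sub-path under one shredded field for a single row."""
    typed = field_group.get("typed_value")
    residual = field_group.get("value")
    current = typed if typed is not None else residual
    if current is None:
        return MISSING
    for key in sub_path:
        # Only objects can be descended into; scalars end the walk.
        if not isinstance(current, dict) or key not in current:
            return MISSING
        current = current[key]
    return current


typed_row = {"typed_value": 7, "value": None}            # metric stored as a number
residual_row = {"typed_value": None, "value": {"x": 1}}  # metric stored as an object

typed_result = resolve(typed_row, ["x"])
residual_result = resolve(residual_row, ["x"])
```

If pruning had dropped the `value` leaf, `residual_row` would look like `{"typed_value": None}` to the reader and the lookup would report the path as missing even though the file holds `x = 1`.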

Comment thread be/src/format/table/hive/hive_parquet_nested_column_utils.cpp Outdated
Comment thread be/src/format/table/iceberg/iceberg_parquet_nested_column_utils.cpp Outdated
Contributor

@github-actions github-actions Bot left a comment

Review result: request changes.

Critical checkpoint conclusions:

  • Goal/test coverage: the PR adds Iceberg VARIANT Parquet reading, type mapping, pruning observability, and regression/unit coverage. The main scenario is covered, but a case-sensitive key path is not covered and currently regresses correctness.
  • Scope/focus: the change is focused on Iceberg/Parquet VARIANT support, though duplicated Hive/Iceberg pruning helpers carry the same issue.
  • Concurrency/lifecycle: no new shared mutable state, threads, locks, or static initialization hazards found in the reviewed paths.
  • Configuration/compatibility: no new configs or storage-format writes; this is a reader/type-mapping change. Mixed files with non-VARIANT types continue through existing paths.
  • Parallel paths: the Hive/local and Iceberg Parquet pruning paths were both reviewed; both have the same case-sensitivity bug and are commented separately.
  • Error handling/memory: Status returns in the new reader path are generally propagated; no ignored Status or untracked large persistent allocation issue found beyond the correctness issue raised.
  • Data correctness: blocking issue found: shredded VARIANT field lookup lowercases user path components, so distinct keys such as a and A can be pruned/read as the same field.
  • Observability/performance: the profile leaf-path observable is useful; no additional blocker found.

User focus: no additional user-provided review focus was specified.
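The case-sensitivity issue can be reproduced in miniature. In this sketch the mapping from shredded keys to leaf paths and both lookup helpers are hypothetical; the point is only that lowercasing the user's key makes `A` resolve to the leaf shredded for `a`.

```python
def find_shredded_field(shredded_fields: dict, key: str):
    """Exact, case-sensitive lookup of a VARIANT object key."""
    return shredded_fields.get(key)


def find_shredded_field_lowercased(shredded_fields: dict, key: str):
    """The behaviour the review flags: the user key is lowercased first."""
    return shredded_fields.get(key.lower())


# Only the key "a" is shredded in this hypothetical file.
shredded = {"a": "v.typed_value.a.typed_value"}

exact = find_shredded_field(shredded, "A")
lowered = find_shredded_field_lowercased(shredded, "A")
```

With the exact lookup, `A` is not found among the shredded fields and the reader has to consult the residual data; with the lowercased lookup, `A` silently reads the column that belongs to `a`.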

Comment thread be/src/format/table/hive/hive_parquet_nested_column_utils.cpp Outdated
Comment thread be/src/format/table/iceberg/iceberg_parquet_nested_column_utils.cpp Outdated
@eldenmoon eldenmoon force-pushed the codex/iceberg-v3-variant branch from 5fe9ca5 to fa098c0 Compare May 12, 2026 20:43
@eldenmoon
Member Author

run buildall

@eldenmoon
Member Author

/review

@hello-stephen
Contributor

TPC-H: Total hot run time: 29804 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e9e3bfd819ba9d8ccea3f9b57abb35147175a7e8, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17678	3922	3919	3919
q2	q3	10718	872	608	608
q4	4655	466	351	351
q5	7465	1357	1156	1156
q6	207	173	140	140
q7	940	940	750	750
q8	9516	1396	1309	1309
q9	6069	5362	5345	5345
q10	6307	2102	1891	1891
q11	486	266	259	259
q12	687	430	299	299
q13	18191	3334	2785	2785
q14	286	284	268	268
q15	q16	904	845	793	793
q17	958	995	742	742
q18	6501	5679	5525	5525
q19	1169	1192	1132	1132
q20	512	406	259	259
q21	4752	2425	1946	1946
q22	459	402	327	327
Total cold run time: 98460 ms
Total hot run time: 29804 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4662	4580	4561	4561
q2	q3	4703	4784	4197	4197
q4	2175	2189	1420	1420
q5	5212	4969	5227	4969
q6	202	176	143	143
q7	2063	1832	1618	1618
q8	3350	3117	3095	3095
q9	8461	8840	8379	8379
q10	4501	4555	4267	4267
q11	633	423	411	411
q12	738	748	524	524
q13	3188	3594	2917	2917
q14	302	300	290	290
q15	q16	755	778	723	723
q17	1379	1276	1282	1276
q18	8031	7128	7115	7115
q19	1158	1175	1154	1154
q20	2293	2287	1992	1992
q21	6276	5486	4800	4800
q22	531	473	400	400
Total cold run time: 60613 ms
Total hot run time: 54251 ms
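The benchmark rows carry no column headers. The sketch below checks one reading of them against the Round 1 numbers above: the first timing column is the cold run, the next two are hot runs, and the last is the smaller of the two hot runs. That interpretation is inferred from the arithmetic, not stated by the bot, and `parse_rows` is a hypothetical helper. Labels such as `q2 q3` are kept exactly as they appear.

```python
REPORT = """\
q1 17678 3922 3919 3919
q2 q3 10718 872 608 608
q4 4655 466 351 351
q5 7465 1357 1156 1156
q6 207 173 140 140
q7 940 940 750 750
q8 9516 1396 1309 1309
q9 6069 5362 5345 5345
q10 6307 2102 1891 1891
q11 486 266 259 259
q12 687 430 299 299
q13 18191 3334 2785 2785
q14 286 284 268 268
q15 q16 904 845 793 793
q17 958 995 742 742
q18 6501 5679 5525 5525
q19 1169 1192 1132 1132
q20 512 406 259 259
q21 4752 2425 1946 1946
q22 459 402 327 327
"""


def parse_rows(report: str) -> dict:
    """Map each row label to its four timings in milliseconds."""
    rows = {}
    for line in report.splitlines():
        tokens = line.split()
        times = [int(t) for t in tokens if t.isdigit()]
        label = " ".join(t for t in tokens if not t.isdigit())
        if len(times) == 4:
            rows[label] = times
    return rows


rows = parse_rows(REPORT)
cold_total = sum(t[0] for t in rows.values())
hot_total = sum(min(t[1], t[2]) for t in rows.values())
last_is_best_hot = all(t[3] == min(t[1], t[2]) for t in rows.values())
```

Under this reading, `cold_total` and `hot_total` reproduce the reported Round 1 totals of 98460 ms and 29804 ms.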

@hello-stephen
Contributor

TPC-H: Total hot run time: 29785 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5fe9ca52dd2edab6a76b7083da4f51d88076c8c5, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17624	3891	3801	3801
q2	q3	10699	879	587	587
q4	4666	467	347	347
q5	7460	1325	1139	1139
q6	201	169	142	142
q7	918	948	769	769
q8	9690	1438	1262	1262
q9	6853	5515	5462	5462
q10	6327	2070	1808	1808
q11	481	273	258	258
q12	688	422	301	301
q13	18192	3299	2725	2725
q14	298	282	264	264
q15	q16	898	868	787	787
q17	1197	1046	735	735
q18	6458	5747	5601	5601
q19	1610	1245	1108	1108
q20	521	414	416	414
q21	4731	2362	1928	1928
q22	464	395	347	347
Total cold run time: 99976 ms
Total hot run time: 29785 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4759	4521	4635	4521
q2	q3	4621	4757	4188	4188
q4	2159	2234	1421	1421
q5	4977	4949	5181	4949
q6	207	174	135	135
q7	2087	1822	1621	1621
q8	3333	3077	3141	3077
q9	8595	8393	8429	8393
q10	4534	4533	4220	4220
q11	588	417	419	417
q12	718	809	511	511
q13	3281	3632	2864	2864
q14	310	312	281	281
q15	q16	773	798	713	713
q17	1309	1304	1293	1293
q18	7986	7053	7094	7053
q19	1161	1199	1145	1145
q20	2214	2201	1934	1934
q21	6042	5384	4784	4784
q22	530	475	405	405
Total cold run time: 60184 ms
Total hot run time: 53925 ms

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 25.00% (1/4) 🎉
Increment coverage report
Complete coverage report

Contributor

@github-actions github-actions Bot left a comment

I found two additional correctness issues in the Parquet VARIANT reconstruction path.

Critical checkpoint conclusions:

  • Goal/test: The PR adds Iceberg v3 Parquet VARIANT reading and shredded column pruning, with regression and FE unit coverage for the common unshredded/shredded object paths. However, valid shredded array layouts and non-finite shredded floats are not covered and can decode incorrectly or fail.
  • Scope/focus: The implementation is mostly focused on the new reader/pruning behavior, though the duplicated Hive/Iceberg pruning helper logic remains a maintainability risk rather than a blocker.
  • Concurrency/lifecycle: The reviewed changes are per-reader/per-query state and do not introduce new shared mutable state, locks, background threads, or special static lifecycle dependencies.
  • Config/compatibility: No new config items or storage-format writes are introduced. The change reads a standard Parquet/Iceberg format; mixed-version compatibility concerns are limited to reader capability.
  • Parallel paths: Hive/local and Iceberg pruning paths are both updated. Standalone Parquet uses the Hive-style name pruning helper.
  • Conditional checks: The main conditional logic around shredded value/typed_value follows the Parquet Variant Shredding spec for object fields, but the array element check currently misses a spec-valid layout.
  • Tests/results: Existing tests cover top-level object shredding, deeper residual paths, case-sensitive keys, and profile observability. Missing coverage remains for arrays whose element group omits value, and for NaN/Inf in shredded float/double typed values.
  • Observability: The added ParquetReadColumnPaths profile string is useful for pruning verification and appears lightweight.
  • Transactions/persistence/data writes: Not applicable; this is read-path only.
  • FE/BE variable passing: Iceberg type mapping and access-path rewriting are updated for VARIANT; no additional thrift variable propagation issue found.
  • Performance: The JSON reconstruction path is inherently allocation-heavy but limited to VARIANT decoding. No additional performance blocker found beyond the correctness issues below.

No additional user-provided review focus was specified.
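The non-finite float concern comes from the fact that strict JSON has no number form for NaN or infinity, so a reconstruction step that goes through JSON text needs an explicit branch for them. The sketch below shows one possible policy, which is an assumption and not necessarily what the PR does: finite doubles become JSON text, and non-finite doubles are passed through as native values.

```python
import json
import math


def reconstruct_double(x: float):
    """Return ('json', text) for finite doubles, ('native', x) otherwise.

    Hypothetical policy: non-finite values bypass JSON text entirely
    instead of being serialized as invalid JSON.
    """
    if math.isfinite(x):
        return ("json", json.dumps(x))
    return ("native", x)


def strict_json_rejects(x: float) -> bool:
    """True when strict JSON serialization refuses the value."""
    try:
        json.dumps(x, allow_nan=False)
    except ValueError:
        return True
    return False


finite = reconstruct_double(1.5)
kind, value = reconstruct_double(float("inf"))
```

`strict_json_rejects(float("nan"))` returns True, which is the failure mode a JSON-based path would hit without the extra branch.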

Comment thread be/src/format/parquet/vparquet_column_reader.cpp Outdated
Comment thread be/src/format/parquet/vparquet_column_reader.cpp Outdated
@hello-stephen
Contributor

TPC-DS: Total hot run time: 170353 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e9e3bfd819ba9d8ccea3f9b57abb35147175a7e8, data reload: false

query5	4327	649	519	519
query6	333	223	206	206
query7	4254	565	306	306
query8	323	252	224	224
query9	8875	4094	4024	4024
query10	445	351	314	314
query11	5803	2358	2258	2258
query12	186	132	123	123
query13	1273	625	441	441
query14	6085	5342	5029	5029
query14_1	4355	4361	4357	4357
query15	216	201	181	181
query16	1000	450	394	394
query17	1112	764	607	607
query18	2510	480	345	345
query19	209	203	163	163
query20	138	132	131	131
query21	209	137	118	118
query22	13639	13556	13372	13372
query23	17199	16360	16093	16093
query23_1	16079	16237	16181	16181
query24	7431	1793	1365	1365
query24_1	1355	1390	1388	1388
query25	603	540	489	489
query26	1352	324	181	181
query27	2678	608	359	359
query28	4478	2023	2012	2012
query29	994	657	554	554
query30	313	246	202	202
query31	1118	1087	924	924
query32	87	75	79	75
query33	552	376	306	306
query34	1183	1132	666	666
query35	776	778	686	686
query36	1328	1361	1199	1199
query37	157	108	95	95
query38	3190	3171	3051	3051
query39	962	931	904	904
query39_1	894	867	877	867
query40	258	167	145	145
query41	71	67	68	67
query42	117	115	113	113
query43	326	331	302	302
query44	
query45	217	208	202	202
query46	1077	1200	725	725
query47	2281	2334	2185	2185
query48	406	416	292	292
query49	652	557	470	470
query50	722	300	223	223
query51	4324	4278	4321	4278
query52	109	105	98	98
query53	253	282	209	209
query54	327	293	268	268
query55	96	92	87	87
query56	311	334	329	329
query57	1413	1401	1298	1298
query58	319	281	269	269
query59	1526	1640	1377	1377
query60	351	352	340	340
query61	203	152	156	152
query62	665	607	563	563
query63	254	199	205	199
query64	2406	813	677	677
query65	
query66	1716	512	389	389
query67	30267	29983	29905	29905
query68	
query69	461	336	301	301
query70	994	930	966	930
query71	299	273	271	271
query72	2936	2803	2490	2490
query73	858	765	404	404
query74	5055	4967	4727	4727
query75	2765	2670	2323	2323
query76	2291	1120	735	735
query77	421	435	348	348
query78	12873	13050	12248	12248
query79	1506	942	760	760
query80	1364	579	501	501
query81	532	280	244	244
query82	1020	159	128	128
query83	318	284	247	247
query84	266	139	109	109
query85	894	550	432	432
query86	449	340	305	305
query87	3422	3352	3232	3232
query88	3518	2667	2652	2652
query89	442	388	338	338
query90	1906	186	184	184
query91	178	168	137	137
query92	80	80	73	73
query93	1102	940	560	560
query94	710	348	294	294
query95	655	466	350	350
query96	1061	731	346	346
query97	2717	2699	2591	2591
query98	240	238	228	228
query99	1131	1084	979	979
Total cold run time: 253877 ms
Total hot run time: 170353 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 169901 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5fe9ca52dd2edab6a76b7083da4f51d88076c8c5, data reload: false

query5	4329	655	519	519
query6	344	229	198	198
query7	4251	540	302	302
query8	334	234	206	206
query9	8828	4061	4005	4005
query10	447	353	316	316
query11	5828	2369	2201	2201
query12	178	128	128	128
query13	1284	654	423	423
query14	6612	5345	5024	5024
query14_1	4337	4351	4298	4298
query15	210	202	185	185
query16	1019	447	427	427
query17	1118	760	632	632
query18	2495	500	359	359
query19	218	208	159	159
query20	137	131	129	129
query21	217	138	115	115
query22	13679	13558	13470	13470
query23	17261	16299	15983	15983
query23_1	16164	16128	16226	16128
query24	7449	1767	1367	1367
query24_1	1373	1361	1361	1361
query25	593	546	492	492
query26	1294	333	175	175
query27	2716	614	341	341
query28	4469	1985	1985	1985
query29	1058	666	555	555
query30	311	250	203	203
query31	1131	1068	947	947
query32	93	81	78	78
query33	555	356	312	312
query34	1173	1145	626	626
query35	767	791	675	675
query36	1312	1361	1221	1221
query37	157	110	99	99
query38	3181	3151	3040	3040
query39	920	916	898	898
query39_1	885	881	874	874
query40	254	167	144	144
query41	76	69	69	69
query42	114	116	112	112
query43	320	329	287	287
query44	
query45	216	205	199	199
query46	1082	1209	710	710
query47	2332	2244	2129	2129
query48	408	421	324	324
query49	648	543	455	455
query50	694	284	219	219
query51	4281	4278	4175	4175
query52	104	108	97	97
query53	257	279	218	218
query54	334	285	280	280
query55	94	94	86	86
query56	321	317	340	317
query57	1405	1389	1338	1338
query58	314	284	280	280
query59	1540	1586	1379	1379
query60	362	358	345	345
query61	209	163	156	156
query62	675	623	559	559
query63	245	197	204	197
query64	2406	831	680	680
query65	
query66	1728	542	401	401
query67	30018	29869	29896	29869
query68	
query69	446	342	311	311
query70	1014	1021	971	971
query71	310	280	271	271
query72	3032	2707	2389	2389
query73	870	732	443	443
query74	5029	4890	4753	4753
query75	2763	2674	2344	2344
query76	2304	1109	742	742
query77	415	441	353	353
query78	12866	12893	12172	12172
query79	1554	932	750	750
query80	1362	566	529	529
query81	519	279	239	239
query82	963	162	119	119
query83	372	286	250	250
query84	268	138	113	113
query85	939	512	440	440
query86	491	337	346	337
query87	3477	3365	3223	3223
query88	3527	2653	2608	2608
query89	446	382	338	338
query90	1917	176	186	176
query91	181	170	138	138
query92	80	77	72	72
query93	1202	971	548	548
query94	715	345	298	298
query95	685	460	354	354
query96	1034	732	357	357
query97	2707	2674	2534	2534
query98	236	229	226	226
query99	1113	1129	1008	1008
Total cold run time: 254910 ms
Total hot run time: 169901 ms

@eldenmoon eldenmoon force-pushed the codex/iceberg-v3-variant branch from fa098c0 to bf0c548 Compare May 12, 2026 21:06
@eldenmoon
Member Author

/review

@eldenmoon
Member Author

run buildall

Contributor

@github-actions github-actions Bot left a comment

I found additional blocking compatibility issues for valid Parquet VARIANT shredding layouts that omit value when data is fully represented by typed_value. Existing review threads already cover non-finite typed floats, typed array elements, case-sensitive keys, and value/residual pruning, so I did not duplicate those.

Critical checkpoints:

  • Goal/test: the PR aims to read Iceberg/Parquet VARIANT, including shredded layouts, and adds local TVF regression coverage, but coverage does not include typed-value-only top-level or nested shredded field groups.
  • Scope/focus: the change is focused, but the schema/pruning logic is stricter than the Parquet shredding layout it is trying to support.
  • Concurrency/lifecycle/config/transactions/persistence: no new concurrency, lifecycle, config, transaction, or persistence concerns found in the reviewed paths.
  • Parallel paths: Hive/local and Iceberg pruning have duplicated logic; both need the typed-value-only fix.
  • Compatibility/data correctness: current code rejects or prunes away valid typed-value-only shredded data, causing scan failure or null/missing results.
  • Tests: existing tests cover unshredded/shredded happy paths and several pruning observables, but miss the typed-value-only layouts described in the inline comments.
  • Observability/performance: no additional observability or performance blocker found beyond the added profile string being used by tests.
  • User focus: no additional user-provided review focus was present.
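A schema check that accepts typed-value-only groups while still rejecting malformed ones might look like the sketch below. It is an illustrative model only: `validate_shredded_group` is a hypothetical helper, the duplicate-child rule is taken from the PR description, and children other than the three structural names are ignored here.

```python
STRUCTURAL = ("metadata", "value", "typed_value")


def validate_shredded_group(children: list, is_top_level: bool) -> None:
    """Raise ValueError for a malformed VariantShredding group.

    `children` holds the child field names of one Parquet group.
    Groups that omit `value` are accepted as long as `typed_value` exists.
    """
    seen = set()
    for name in children:
        if name in STRUCTURAL:
            if name in seen:
                raise ValueError(f"duplicate structural child: {name}")
            seen.add(name)
    if is_top_level and "metadata" not in seen:
        raise ValueError("top-level VARIANT group is missing metadata")
    if "value" not in seen and "typed_value" not in seen:
        raise ValueError("group has neither value nor typed_value")


# Typed-value-only layouts pass at both levels.
validate_shredded_group(["metadata", "typed_value"], is_top_level=True)
validate_shredded_group(["typed_value"], is_top_level=False)
```

A group such as `["metadata", "value", "value"]` raises because of the repeated structural child, and a group with neither `value` nor `typed_value` raises because it carries no data.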

Comment thread be/src/format/parquet/schema_desc.cpp Outdated
Comment thread be/src/format/table/hive/hive_parquet_nested_column_utils.cpp Outdated
Comment thread be/src/format/table/iceberg/iceberg_parquet_nested_column_utils.cpp Outdated
@hello-stephen
Contributor

TPC-H: Total hot run time: 29314 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit fa098c01643af773e82d99b9046d891338e1145f, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17784	4016	3937	3937
q2	q3	10704	888	609	609
q4	4677	454	346	346
q5	7441	1317	1142	1142
q6	204	174	138	138
q7	917	948	757	757
q8	9628	1397	1270	1270
q9	6188	5397	5328	5328
q10	6323	2084	1806	1806
q11	477	262	256	256
q12	687	405	294	294
q13	18197	3296	2757	2757
q14	292	282	263	263
q15	q16	911	870	787	787
q17	1011	1038	727	727
q18	6402	5668	5519	5519
q19	1470	1164	958	958
q20	505	385	258	258
q21	4852	2276	1856	1856
q22	420	353	306	306
Total cold run time: 99090 ms
Total hot run time: 29314 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4269	4176	4162	4162
q2	q3	4627	4770	4167	4167
q4	2093	2152	1380	1380
q5	4961	4992	5266	4992
q6	193	163	133	133
q7	2028	1762	2018	1762
q8	3411	3196	3191	3191
q9	8519	8460	8332	8332
q10	4522	4465	4240	4240
q11	625	431	395	395
q12	732	743	535	535
q13	3224	3547	2922	2922
q14	306	314	290	290
q15	q16	763	781	720	720
q17	1356	1296	1272	1272
q18	8062	7007	7128	7007
q19	1173	1167	1127	1127
q20	2315	2241	1938	1938
q21	6202	5384	4915	4915
q22	567	533	438	438
Total cold run time: 59948 ms
Total hot run time: 53918 ms

@hello-stephen
Contributor

TPC-H: Total hot run time: 31087 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c5b83d021b49b29d05e1d4ff7f2a30464f9d95dc, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17816	3935	3842	3842
q2	q3	10818	1354	804	804
q4	4678	470	348	348
q5	7566	2223	2094	2094
q6	297	172	136	136
q7	943	762	629	629
q8	9360	1771	1647	1647
q9	6564	4931	4884	4884
q10	6429	2118	1772	1772
q11	432	268	249	249
q12	691	429	291	291
q13	18219	3354	2785	2785
q14	265	255	239	239
q15	q16	817	767	705	705
q17	925	905	943	905
q18	7096	5669	5495	5495
q19	1189	1318	1175	1175
q20	543	428	288	288
q21	5778	2733	2486	2486
q22	460	368	313	313
Total cold run time: 100886 ms
Total hot run time: 31087 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4530	4475	4619	4475
q2	q3	4806	5196	4639	4639
q4	2154	2190	1388	1388
q5	4795	4686	4650	4650
q6	228	176	135	135
q7	1798	1768	1546	1546
q8	2291	1854	1885	1854
q9	7222	7182	7175	7175
q10	4432	4390	3957	3957
q11	517	371	345	345
q12	704	714	505	505
q13	3066	3345	2775	2775
q14	265	284	255	255
q15	q16	676	693	596	596
q17	1252	1220	1216	1216
q18	7225	6790	6711	6711
q19	1152	1090	1112	1090
q20	2198	2199	1944	1944
q21	5230	4603	4393	4393
q22	514	452	401	401
Total cold run time: 55055 ms
Total hot run time: 50050 ms

@eldenmoon
Member Author

/review

@eldenmoon
Member Author

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 75.64% (208/275) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

TPC-DS: Total hot run time: 169611 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c5b83d021b49b29d05e1d4ff7f2a30464f9d95dc, data reload: false

query5	4314	646	540	540
query6	329	224	205	205
query7	4224	567	298	298
query8	334	227	218	218
query9	8850	3965	3951	3951
query10	470	347	302	302
query11	5805	2338	2177	2177
query12	182	131	124	124
query13	1305	621	432	432
query14	5903	5368	5019	5019
query14_1	4382	4364	4380	4364
query15	213	203	185	185
query16	1025	463	443	443
query17	1153	737	611	611
query18	2595	488	373	373
query19	219	210	168	168
query20	139	135	130	130
query21	219	138	122	122
query22	13571	13543	13350	13350
query23	17229	16356	16085	16085
query23_1	16127	16236	16133	16133
query24	7513	1742	1278	1278
query24_1	1295	1300	1305	1300
query25	540	473	420	420
query26	1318	325	171	171
query27	2710	582	334	334
query28	4430	1928	1961	1928
query29	953	611	504	504
query30	297	237	202	202
query31	1116	1057	947	947
query32	88	75	72	72
query33	535	346	313	313
query34	1182	1146	621	621
query35	765	782	661	661
query36	1365	1393	1239	1239
query37	148	102	93	93
query38	3219	3159	3071	3071
query39	929	933	911	911
query39_1	872	879	875	875
query40	225	144	125	125
query41	67	62	61	61
query42	108	108	109	108
query43	324	328	279	279
query44	
query45	203	202	189	189
query46	1099	1172	719	719
query47	2333	2398	2210	2210
query48	430	399	288	288
query49	628	486	378	378
query50	995	355	243	243
query51	4340	4318	4245	4245
query52	111	106	96	96
query53	260	280	205	205
query54	308	268	257	257
query55	91	91	89	89
query56	300	308	297	297
query57	1425	1443	1343	1343
query58	299	280	277	277
query59	1621	1701	1516	1516
query60	330	353	335	335
query61	191	180	178	178
query62	682	665	577	577
query63	248	214	215	214
query64	2513	864	719	719
query65	
query66	2143	496	387	387
query67	30481	30122	29855	29855
query68	
query69	466	347	306	306
query70	1043	950	1015	950
query71	318	274	278	274
query72	3003	2645	2343	2343
query73	838	801	421	421
query74	5044	4923	4736	4736
query75	2650	2600	2247	2247
query76	2275	1132	765	765
query77	389	411	338	338
query78	12084	12145	11625	11625
query79	1520	1017	722	722
query80	1303	535	456	456
query81	526	278	236	236
query82	957	164	120	120
query83	335	279	245	245
query84	263	138	112	112
query85	1011	541	453	453
query86	450	325	345	325
query87	3450	3334	3234	3234
query88	3542	2668	2655	2655
query89	445	380	332	332
query90	1866	178	179	178
query91	179	178	138	138
query92	78	79	73	73
query93	1623	1463	896	896
query94	719	355	307	307
query95	678	373	342	342
query96	1037	766	333	333
query97	2714	2695	2568	2568
query98	240	229	239	229
query99	1144	1095	991	991
Total cold run time: 254859 ms
Total hot run time: 169611 ms

Contributor

@github-actions github-actions Bot left a comment

Automated review summary for PR 63192.

I did not find a distinct new inline issue beyond the already-open review threads, so I am not duplicating those comments. However, the existing threads still describe critical correctness gaps in VARIANT nested-column pruning and Parquet/Iceberg VARIANT reconstruction/read validation, including cases that can silently return truncated or wrong VARIANT data, crash/fail valid reads, or miss required validation for unsupported nested VARIANT paths. These are blocking until resolved or convincingly addressed in-thread.

Critical checkpoint conclusions:

  • Goal/test proof: the PR aims to support Iceberg/Parquet VARIANT read/pruning and reject unsupported writes. It adds unit/regression coverage, but existing review threads identify important legal query/file shapes not covered or not handled correctly.
  • Scope/focus: the change is broad across FE pruning, BE Parquet readers, Iceberg/Hive readers, and tests; several known issues indicate cross-path behavior is not yet consistently handled.
  • Concurrency/lifecycle/config: I did not identify a new concurrency, lifecycle, or config-specific issue in this pass.
  • Compatibility: read/write support intersects Iceberg format/version and mixed external paths; existing validation comments around ORC/nested VARIANT and write/delete-only behavior remain important compatibility/correctness checkpoints.
  • Parallel code paths: existing comments cover multiple parallel paths that need consistent handling, including local/Hive/Iceberg Parquet, OLAP/file scans, generator/project/CTE-style access propagation, MAP/ARRAY offset/null paths, and ORC rejection.
  • Error handling/data correctness: existing issues include silent wrong pruning, valid-data reconstruction failures, and invalid physical-column/stat access risks; these are data correctness blockers.
  • Test coverage: new tests exist, but the known threads call out missing coverage for several edge cases, including Iceberg table path behavior, nested VARIANT in complex ORC columns, dynamic/whole-VARIANT access combinations, typed/null complex elements, and MAP/ARRAY offset/null reads.
  • Performance/observability: profile assertions were added for some pruning behavior; existing offset-only MAP/ARRAY comments indicate intended performance optimizations are incomplete in some paths.
  • User focus: no additional user-provided review focus was specified.

Please address the existing inline threads before this can be approved.

@hello-stephen
Contributor

TPC-H: Total hot run time: 30841 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 05227f64565f223e5d985a01530bda0cc6ec263b, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17742	3907	3853	3853
q2	q3	10761	1424	788	788
q4	4686	468	360	360
q5	7576	2296	2164	2164
q6	306	178	136	136
q7	977	781	626	626
q8	9348	1791	1595	1595
q9	6882	4915	4952	4915
q10	6459	2121	1784	1784
q11	437	273	245	245
q12	690	418	289	289
q13	18202	3381	2785	2785
q14	266	253	234	234
q15	q16	822	786	702	702
q17	977	948	990	948
q18	6994	5828	5514	5514
q19	1238	1217	1025	1025
q20	487	408	259	259
q21	5737	2648	2311	2311
q22	424	365	308	308
Total cold run time: 101011 ms
Total hot run time: 30841 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4266	4212	4230	4212
q2	q3	4517	4896	4295	4295
q4	2070	2223	1359	1359
q5	4429	4329	4340	4329
q6	306	333	150	150
q7	2025	1874	1604	1604
q8	2470	2264	2202	2202
q9	7874	7829	7718	7718
q10	4486	4468	4026	4026
q11	581	454	542	454
q12	722	725	510	510
q13	3362	3638	3033	3033
q14	305	310	282	282
q15	q16	732	727	673	673
q17	1382	1305	1342	1305
q18	8052	7404	7036	7036
q19	1107	1100	1091	1091
q20	2215	2205	1923	1923
q21	5308	4769	4532	4532
q22	518	472	432	432
Total cold run time: 56727 ms
Total hot run time: 51166 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 170044 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 05227f64565f223e5d985a01530bda0cc6ec263b, data reload: false

query5	4314	662	499	499
query6	347	214	196	196
query7	4345	556	291	291
query8	324	233	248	233
query9	8838	4040	4016	4016
query10	452	352	298	298
query11	5757	2357	2237	2237
query12	190	128	124	124
query13	1302	610	441	441
query14	6030	5348	5061	5061
query14_1	4346	4345	4301	4301
query15	212	205	182	182
query16	1004	462	444	444
query17	1170	743	607	607
query18	2486	500	356	356
query19	224	207	172	172
query20	140	135	130	130
query21	213	136	123	123
query22	13561	13527	13319	13319
query23	17219	16391	15987	15987
query23_1	16188	16206	16129	16129
query24	7549	1768	1326	1326
query24_1	1312	1334	1292	1292
query25	584	502	428	428
query26	1303	324	179	179
query27	2693	580	343	343
query28	4460	1947	1967	1947
query29	991	621	511	511
query30	305	246	200	200
query31	1128	1069	955	955
query32	86	86	77	77
query33	553	371	306	306
query34	1215	1187	636	636
query35	762	787	730	730
query36	1366	1340	1209	1209
query37	153	105	92	92
query38	3204	3127	3032	3032
query39	929	923	894	894
query39_1	893	870	880	870
query40	221	143	129	129
query41	65	61	60	60
query42	106	109	108	108
query43	320	336	295	295
query44	
query45	208	204	191	191
query46	1058	1193	712	712
query47	2339	2274	2274	2274
query48	399	411	292	292
query49	635	489	403	403
query50	1033	352	257	257
query51	4333	4428	4260	4260
query52	104	106	95	95
query53	251	301	215	215
query54	307	267	279	267
query55	93	92	84	84
query56	292	300	305	300
query57	1418	1395	1289	1289
query58	289	264	264	264
query59	1581	1722	1481	1481
query60	321	326	323	323
query61	156	182	154	154
query62	668	629	566	566
query63	244	204	212	204
query64	2386	791	631	631
query65	
query66	1711	500	377	377
query67	29971	29892	29878	29878
query68	
query69	462	336	314	314
query70	1048	995	997	995
query71	303	272	273	272
query72	2965	2700	2456	2456
query73	816	764	415	415
query74	5066	4934	4689	4689
query75	2633	2588	2249	2249
query76	2276	1128	752	752
query77	400	408	333	333
query78	12111	12175	11661	11661
query79	1474	1005	728	728
query80	1205	533	466	466
query81	507	283	237	237
query82	1388	157	121	121
query83	369	273	260	260
query84	272	142	113	113
query85	918	543	440	440
query86	432	324	329	324
query87	3484	3354	3169	3169
query88	3580	2658	2654	2654
query89	446	384	337	337
query90	1861	186	212	186
query91	179	165	141	141
query92	80	79	75	75
query93	1541	1471	887	887
query94	631	349	319	319
query95	657	400	350	350
query96	1036	757	325	325
query97	2737	2710	2557	2557
query98	236	228	229	228
query99	1103	1102	998	998
Total cold run time: 253879 ms
Total hot run time: 170044 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 75.58% (2655/3513) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.68% (20829/38804)
Line Coverage 37.39% (197576/528361)
Region Coverage 33.77% (154747/458212)
Branch Coverage 34.87% (67837/194535)

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 74.67% (227/304) 🎉
Increment coverage report
Complete coverage report

@eldenmoon
Member Author

run buildall

@eldenmoon
Member Author

/review

Contributor

@github-actions github-actions Bot left a comment

I found two blocking correctness issues. The VARIANT access-path collector still drops subpath requirements across casts whose child is VARIANT, so sibling VARIANT predicates can cause scans to prune away data required by projections. The delete-only Iceberg MERGE change avoids write-side VARIANT validation, but the plan still projects all target columns and can therefore hit read-side VARIANT rejection or unnecessary VARIANT reads.

Critical checkpoint conclusions:

  • Goal/test proof: the goal is partially met, but not for casted VARIANT subpaths or delete-only MERGE on tables with VARIANT columns; tests do not cover these residual cases.
  • Scope/focus: the changes are mostly focused, but the MERGE fix is incomplete because it gates only sink validation and not the read/projection path.
  • Concurrency/lifecycle: no new risks were identified.
  • Configuration/compatibility: no new configuration or storage-format compatibility changes were found.
  • Parallel code paths: checked across FE pruning and Iceberg read/write validation; the read-side MERGE path remains uncovered.
  • Test coverage: insufficient for the two scenarios below.
  • Observability: not the primary concern for these issues.
  • Transactions/persistence: no changes are involved.
  • User focus: no additional user-provided review focus was specified.

    return continueCollectAccessPath(child, variantRootContext);
}
if (!context.accessPathBuilder.isEmpty()
        && cast.getDataType() instanceof NestedColumnPrunable
Contributor

This cast handling still drops a non-empty VARIANT access suffix whenever the cast child is VariantType, because VariantType is not NestedColumnPrunable and the method falls through to visiting the child with a fresh empty context. For example, SELECT cast(v AS variant)['a'] FROM variant_tbl WHERE v['k'] IS NOT NULL reaches visitCast() with suffix a; the suffix is discarded here, the sibling predicate can leave only [v, k] at the scan, and the projection then evaluates ['a'] against a pruned VARIANT. The same pattern applies to casts from a VARIANT subpath into complex types, e.g. selecting a field from cast(v['obj'] AS struct<...>). Please propagate the access builder through VARIANT-preserving / VARIANT-to-complex casts where the suffix still refers to the child value, and add coverage combining such a casted subpath projection with a sibling VARIANT predicate.
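The suffix propagation requested above can be illustrated with a small sketch. It is a toy model only: `AccessPathSketch`, `Slot`, `ElementAt`, `Cast`, and the `preservesVariant` flag are hypothetical names invented for this example and are not the Nereids classes touched by the PR. It shows how carrying the pending key suffix through a VARIANT-preserving cast keeps the path `[v, a]`, whereas restarting with an empty suffix records only the root `[v]`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy expression model; not the actual Doris planner classes.
class AccessPathSketch {
    interface Expr {}

    static final class Slot implements Expr {
        final String name;
        Slot(String name) { this.name = name; }
    }

    static final class ElementAt implements Expr {
        final Expr child;
        final String key;
        ElementAt(Expr child, String key) { this.child = child; this.key = key; }
    }

    static final class Cast implements Expr {
        final Expr child;
        // Assumption: true when the suffix still refers to the child value,
        // e.g. a VARIANT-to-VARIANT cast in this toy model.
        final boolean preservesVariant;
        Cast(Expr child, boolean preservesVariant) {
            this.child = child;
            this.preservesVariant = preservesVariant;
        }
    }

    // Walks from the outermost expression toward the slot, carrying the keys
    // that enclosing expressions have already requested.
    static void collect(Expr expr, Deque<String> suffix, List<List<String>> out) {
        if (expr instanceof Slot) {
            List<String> path = new ArrayList<>();
            path.add(((Slot) expr).name);
            path.addAll(suffix);
            out.add(path);
        } else if (expr instanceof ElementAt) {
            ElementAt e = (ElementAt) expr;
            suffix.addFirst(e.key);
            collect(e.child, suffix, out);
            suffix.removeFirst();
        } else if (expr instanceof Cast) {
            Cast c = (Cast) expr;
            // Keep the suffix across a preserving cast; otherwise start over
            // with an empty suffix, which requests the whole child value.
            collect(c.child, c.preservesVariant ? suffix : new ArrayDeque<String>(), out);
        }
    }

    static List<List<String>> paths(Expr... exprs) {
        List<List<String>> out = new ArrayList<>();
        for (Expr e : exprs) {
            collect(e, new ArrayDeque<String>(), out);
        }
        return out;
    }
}
```

In this model, `paths(new ElementAt(new Cast(v, true), "a"), new ElementAt(v, "k"))` collects both `[v, a]` and `[v, k]`, so a scan pruned to the union of the paths still contains the projected key.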

outputExprs,
deleteCtx,
writesDataFiles(matchedClauses, notMatchedClauses),
Optional.empty(),
Contributor

Passing writesDataFiles=false only disables the sink-side VARIANT write validation; it does not stop the delete-only plan from reading every visible target column. buildDeleteProjection() still appends all visible columns, so a delete-only MERGE whose join/predicate uses only non-VARIANT columns will still materialize a VARIANT column. On an ORC Iceberg table this reaches IcebergScanNode.validateVariantReadSupported() and fails with the Parquet-only VARIANT read error even though the delete only needs row-id/delete metadata, and on Parquet it performs unnecessary VARIANT reads. This is distinct from the earlier write-validation thread: the remaining failure is introduced by the read/projection side after the new flag is passed. Please make delete-only MERGE project only the operation, row id, lineage/delete columns, and columns required by the join/predicate, or otherwise avoid scanning unused VARIANT columns; add coverage for delete-only MERGE on a table with an unused VARIANT column, including a non-Parquet table if possible.
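One possible shape of the narrower projection is sketched below. Everything here is hypothetical: `DeleteProjectionSketch`, its parameters, and the column names in the usage note are illustrative and do not mirror the real `buildDeleteProjection()` signature. The idea is simply that a delete-only plan keeps the operation/row-id style columns plus whatever the join or predicate references, and adds the remaining visible columns only when data files are written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names and signature are assumptions for illustration.
class DeleteProjectionSketch {
    static List<String> buildProjection(List<String> metadataColumns,
                                        List<String> visibleColumns,
                                        Set<String> referencedColumns,
                                        boolean writesDataFiles) {
        // Operation, row-id, and delete-metadata columns are always needed.
        List<String> projection = new ArrayList<>(metadataColumns);
        for (String column : visibleColumns) {
            // A visible column is read only if rows are rewritten or the
            // join/predicate actually references it.
            if (writesDataFiles || referencedColumns.contains(column)) {
                projection.add(column);
            }
        }
        return projection;
    }
}
```

With metadata columns `op` and `row_id`, visible columns `id`, `name`, `v`, and a predicate that references only `id`, the delete-only projection contains `op`, `row_id`, and `id`, so the unused VARIANT column `v` is never scanned.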

@hello-stephen
Contributor

TPC-H: Total hot run time: 32090 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b88e809c8610f4487350cf3137ca8639091fde09, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17718	4206	4177	4177
q2	q3	10787	1409	829	829
q4	4685	494	340	340
q5	7567	2402	2218	2218
q6	263	189	144	144
q7	962	766	621	621
q8	9404	1753	1685	1685
q9	5195	4977	5009	4977
q10	6409	2135	1774	1774
q11	444	285	253	253
q12	630	476	297	297
q13	18076	3484	2773	2773
q14	272	256	238	238
q15	q16	825	759	705	705
q17	997	962	913	913
q18	6993	5883	5474	5474
q19	1311	1340	1291	1291
q20	561	495	283	283
q21	6182	2926	2774	2774
q22	457	377	324	324
Total cold run time: 99738 ms
Total hot run time: 32090 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4992	4965	4836	4836
q2	q3	4974	5368	4635	4635
q4	2330	2339	1410	1410
q5	5064	4889	4854	4854
q6	265	200	142	142
q7	1978	1734	1517	1517
q8	2650	2274	2249	2249
q9	7786	7332	7401	7332
q10	4556	4456	4018	4018
q11	578	390	366	366
q12	740	756	510	510
q13	3135	3522	2824	2824
q14	285	278	251	251
q15	q16	691	701	619	619
q17	1293	1265	1264	1264
q18	7546	6860	6804	6804
q19	1113	1069	1077	1069
q20	2236	2225	1948	1948
q21	5631	4864	4689	4689
q22	555	472	394	394
Total cold run time: 58398 ms
Total hot run time: 51731 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 168881 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b88e809c8610f4487350cf3137ca8639091fde09, data reload: false

query5	4338	643	509	509
query6	328	228	208	208
query7	4242	568	299	299
query8	327	230	222	222
query9	8835	4042	4008	4008
query10	456	345	296	296
query11	5757	2370	2196	2196
query12	191	132	126	126
query13	1296	591	430	430
query14	5931	5399	5084	5084
query14_1	4329	4333	4372	4333
query15	211	205	180	180
query16	985	450	422	422
query17	934	702	583	583
query18	2446	476	338	338
query19	216	198	158	158
query20	134	129	127	127
query21	214	134	117	117
query22	13663	13535	13280	13280
query23	17288	16362	16084	16084
query23_1	16165	16165	16188	16165
query24	7418	1772	1306	1306
query24_1	1328	1289	1324	1289
query25	536	464	420	420
query26	1320	337	171	171
query27	2687	569	353	353
query28	4464	1952	1974	1952
query29	1006	657	520	520
query30	322	245	202	202
query31	1128	1075	958	958
query32	98	78	73	73
query33	550	366	309	309
query34	1185	1130	654	654
query35	775	776	692	692
query36	1299	1336	1179	1179
query37	153	108	94	94
query38	3222	3117	3041	3041
query39	931	937	904	904
query39_1	866	886	885	885
query40	241	151	131	131
query41	74	70	68	68
query42	115	113	110	110
query43	330	331	285	285
query44	
query45	215	205	196	196
query46	1065	1175	703	703
query47	2303	2348	2115	2115
query48	409	434	297	297
query49	654	507	402	402
query50	981	351	267	267
query51	4270	4354	4240	4240
query52	109	110	97	97
query53	258	283	213	213
query54	335	291	277	277
query55	97	91	94	91
query56	323	307	311	307
query57	1406	1400	1324	1324
query58	318	286	283	283
query59	1567	1632	1414	1414
query60	321	334	329	329
query61	182	179	179	179
query62	665	640	562	562
query63	251	244	208	208
query64	2420	814	662	662
query65	
query66	1714	479	359	359
query67	30037	29945	29858	29858
query68	
query69	458	330	304	304
query70	998	971	1029	971
query71	305	275	271	271
query72	2983	2697	2433	2433
query73	822	740	443	443
query74	5055	4930	4724	4724
query75	2653	2590	2267	2267
query76	2296	1123	761	761
query77	383	415	375	375
query78	12174	12140	11642	11642
query79	1426	987	722	722
query80	837	539	458	458
query81	492	279	243	243
query82	1363	155	126	126
query83	349	270	241	241
query84	306	137	112	112
query85	906	550	449	449
query86	439	340	310	310
query87	3418	3396	3200	3200
query88	3537	2682	2657	2657
query89	444	386	334	334
query90	1792	181	183	181
query91	172	174	139	139
query92	80	78	73	73
query93	1439	1516	850	850
query94	617	340	308	308
query95	670	386	416	386
query96	1031	782	343	343
query97	2698	2685	2546	2546
query98	234	225	225	225
query99	1128	1096	979	979
Total cold run time: 252742 ms
Total hot run time: 168881 ms

@eldenmoon
Member Author

/review

@eldenmoon
Member Author

run buildall

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 75.58% (2655/3513) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.68% (20829/38804)
Line Coverage 37.39% (197573/528361)
Region Coverage 33.77% (154744/458212)
Branch Coverage 34.87% (67839/194535)

### What problem does this PR solve?

Issue Number: N/A

Related PR: apache#63192

Problem Summary: Doris could not read Iceberg v3 VARIANT columns from Parquet files. This change maps Iceberg VARIANT to Doris VARIANT, validates the Parquet VariantShredding wrapper shape, decodes metadata/value residual data, reads shredded typed_value columns, and prunes shredded Parquet leaves for accessed variant paths. The VARIANT reader and planner changes stay scoped to the Iceberg/Parquet VARIANT path instead of coupling generic nested-column code to Iceberg-only behavior. Typed-only shredded projections stay on native Parquet typed columns when residual value columns are not selected, with counter coverage to catch row-wise performance regressions. Selected residual or complex layouts still fall back to row-wise reconstruction. This also preserves VARIANT subpaths through casts, validates the actual Iceberg data-file format for VARIANT reads, rejects duplicate VariantShredding structural children, preserves null temporal typed leaves without reading their physical value, and keeps delete-only Iceberg MERGE projections from reading unused visible target data columns.

### Release note

Support reading Iceberg v3 VARIANT Parquet columns, including shredded typed_value column pruning and binary/UUID/primitive residual VARIANT values. Writing Iceberg VARIANT columns is rejected with an explicit unsupported error.

### Check List (For Author)

- Test: Regression test / Unit Test / Manual test

    - Unit Test: ./run-be-ut.sh --run --filter='ParquetVariantReaderTest.DirectTypedOnlyReaderCountersUseNativePath:ParquetVariantReaderTest.VariantReaderCountersUseRowWiseWhenResidualValueSelected:ParquetVariantReaderTest.RowWisePreservesExplicitVariantNullShreddedArrayElement:ParquetVariantReaderTest.RowWiseRejectsMissingShreddedArrayElement' (4 tests passed)

    - Unit Test: ./run-be-ut.sh --run -f 'ParquetVariantReaderTest.RowWisePreservesNullComplexTypedArrayElement:ParquetVariantReaderTest.RowWiseRejectsMissingShreddedArrayElement' (2 tests passed)

    - Unit Test: ./run-be-ut.sh --run --filter='ParquetVariantReaderTest.*' (85 tests passed on rerun; the first attempt failed before tests in OpenBLAS CMake getarch bootstrap)

    - Unit Test: ./run-be-ut.sh --run --filter='ParquetVariantReaderTest.*:NestedColumnAccessHelperTest.*' (127 tests passed)

    - Unit Test: ./run-be-ut.sh --run --filter='IcebergReaderCreateColumnIdsTest.*' (9 tests passed)

    - Unit Test: ./run-be-ut.sh --run --filter=ParquetVariantReaderTest.RejectVariantSchemaWithDuplicateStructuralChild:ParquetVariantReaderTest.DirectTypedOnlyPreservesTemporalLeafNull (2 tests passed; rerun after clang-format also passed)

    - Unit Test: ./run-be-ut.sh --run --filter=ParquetVariantReaderTest.DirectTypedOnlyReaderCountersUseNativePath (1 test passed after latest changes)

    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest#testVariantComparisonPredicateCollectsWholeVariantOperand (1 test passed; Maven reactor succeeded)

    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest#testVariantCastProjectionKeepsSubPathWithSiblingPredicate (1 test passed; Maven reactor succeeded)

    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest (70 tests passed; Maven reactor succeeded)

    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.VariantPruningLogicTest#testExplodeSubqueryJoinAggAccessPaths (1 test passed; Maven reactor succeeded)

    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.datasource.iceberg.source.IcebergScanNodeTest#testValidateVariantDataFileFormatRejectsOrcSplit (1 test passed; Maven reactor succeeded)

    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.datasource.iceberg.source.IcebergScanNodeTest (6 tests passed; Maven reactor succeeded)

    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.trees.plans.commands.IcebergMergeCommandTest#testDeleteProjectionDoesNotReadVisibleTargetColumns (1 test passed; Maven reactor succeeded)

    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.VariantPruningLogicTest (11 tests passed; Maven reactor succeeded)

    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.datasource.iceberg.IcebergUtilsTest (passed)

    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.SlotTypeReplacerTest (5 tests passed)

    - Regression test: performance regression coverage is included in regression-test/suites/external_table_p0/tvf/test_local_tvf_iceberg_variant.groovy, including profile assertions that typed-only projections increment VariantDirectTypedValueReadRows and keep VariantRowWiseReadRows at 0. Not run locally in this worktree because no local Doris cluster/output BE+FE runtime is available.

    - Regression test: Added regression-test/suites/external_table_p0/iceberg/test_iceberg_variant_table_path.groovy to exercise the Iceberg REST catalog table path with nested VARIANT access and profile read-column assertions. Not run locally because Docker access to spark-iceberg is unavailable in this worktree.

    - Manual test: PATH=/mnt/disk6/common/ldb_toolchain_toucan/bin:$PATH build-support/clang-format.sh

    - Manual test: PATH=/mnt/disk6/common/ldb_toolchain_toucan/bin:$PATH build-support/check-format.sh

    - Manual test: git diff --check

    - Manual test: cd fe && mvn -pl fe-core checkstyle:check -DskipTests

    - Static analysis: CLANG_TIDY_BINARY=/tmp/clang-tidy-resource-filter build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN (passed for changed lines after adding the clang-tidy resource-dir and filtering a pre-existing be/src/core/types.h clang-tidy-nolint diagnostic; the unwrapped script was blocked by that existing header diagnostic)

- Behavior changed: Yes. Doris can read Iceberg v3 VARIANT Parquet columns, supports typed-only shredded projection pruning on native typed columns, reconstructs selected residual or complex layouts row-wise, rejects malformed VariantShredding schemas and missing present shredded array payloads, preserves null complex/temporal typed values and explicit Variant null array elements, forces root access for whole-VARIANT scalar/comparison consumers while preserving literal subpath pruning for typed reads, recursively rejects Iceberg VARIANT reads from non-Parquet data files during scan planning, avoids reading unused target data columns for delete-only Iceberg MERGE, and rejects Iceberg VARIANT data-file writes explicitly.

- Does this need documentation: No
@eldenmoon
Member Author

/review

@eldenmoon
Member Author

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 74.44% (233/313) 🎉
Increment coverage report
Complete coverage report

Contributor

@github-actions github-actions Bot left a comment

I found one blocking correctness issue in the current PR.

Critical checkpoint conclusions:

  • Goal/test proof: The PR aims to add Iceberg/Parquet VARIANT read support and nested pruning coverage. There is broad test coverage, but the non-Parquet Iceberg validation is not proven for pruned complex columns that contain an unselected VARIANT sibling.
  • Scope/focus: The change is mostly focused on VARIANT read/pruning support, but the new read validation is applied at the original column level rather than the materialized/pruned slot level.
  • Concurrency/lifecycle/config/compatibility: No new concurrency, lifecycle, or config issue found in the reviewed paths. The issue is a behavioral regression for valid ORC Iceberg reads of non-VARIANT subfields.
  • Parallel paths: This affects both table-format validation and per-data-file validation because both call findVariantReadColumnName().
  • Tests: Existing tests cover nested VARIANT rejection, but not the inverse case where nested pruning removes the VARIANT child before reading an ORC complex column.
  • Observability/transactions/data writes: No additional concern found beyond the inline issue.
  • Performance: No separate blocking performance issue found.
  • User focus: No additional user-provided review focus was specified.

for (SlotDescriptor slot : desc.getSlots()) {
    Column column = slot.getColumn();
    if (containsVariantType(column.getType())) {
        return column.getName();
Contributor

This validation uses slot.getColumn().getType(), which is the original Iceberg table column type, not the materialized slot type after nested-column pruning. That over-rejects legal non-Parquet reads when only non-VARIANT subfields are selected. For example, an ORC Iceberg column s STRUCT<a:int, v:variant> queried as SELECT s.a FROM tbl can be pruned/materialized as STRUCT<a:int> and does not read any VARIANT data, but this still sees the original s.v and throws the Parquet-only VARIANT error. The same false positive applies to validateVariantDataFileFormat() because it calls this helper too. Please validate the pruned/materialized slot type or selected access paths instead, and add coverage for reading a non-VARIANT subfield from a complex ORC Iceberg column that has an unselected VARIANT sibling.
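The difference between checking the catalog type and the materialized type can be shown with a minimal sketch. The `Type` class and the `containsVariant` and `mustReject` helpers below are hypothetical stand-ins written for this example; they are not the Doris `Type` hierarchy or the real `containsVariantType()` helper.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy type model; not the Doris catalog types.
class VariantReadCheckSketch {
    static final class Type {
        final String name;
        final List<Type> children;
        Type(String name, Type... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // Recursively looks for a VARIANT anywhere inside the given type.
    static boolean containsVariant(Type type) {
        if ("variant".equals(type.name)) {
            return true;
        }
        for (Type child : type.children) {
            if (containsVariant(child)) {
                return true;
            }
        }
        return false;
    }

    // Assumption: only the type that is actually materialized after pruning
    // decides whether a non-Parquet file must be rejected.
    static boolean mustReject(Type materializedType, boolean isParquetFile) {
        return !isParquetFile && containsVariant(materializedType);
    }
}
```

For `s STRUCT<a:int, v:variant>` on an ORC file, passing the original type rejects the read, while passing the pruned `STRUCT<a:int>` that `SELECT s.a` materializes does not.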

Contributor

@github-actions github-actions Bot left a comment

I found several correctness issues that should be addressed before merge.

Critical checkpoint conclusions:

  • Goal/test proof: the PR goal is to add Iceberg/Parquet VARIANT read support (with VARIANT writes rejected) and nested pruning; the implementation and tests cover many paths, but the current code still fails on valid VARIANT array values and over-rejects non-Parquet Iceberg reads that do not actually materialize VARIANT data.
  • Scope/focus: the change is broad and not fully isolated because it touches shared FE pruning and BE reader paths.
  • Concurrency/lifecycle/config/persistence/transactions: no new concerns found in the reviewed paths.
  • Compatibility: risk is limited to the new VARIANT support, but mixed file-format validation needs to distinguish selected data from the full catalog schema.
  • Test coverage: substantial, but missing for heterogeneous/complex VARIANT array materialization and for ORC Iceberg complex columns where only non-VARIANT subfields are selected.
  • User focus: no additional user-provided review focus was specified.

}

private String findVariantReadColumnName() {
for (SlotDescriptor slot : desc.getSlots()) {

This now checks the full catalog column type, so it rejects non-Parquet Iceberg scans even when pruning/materialization only reads non-VARIANT subfields. For example, an ORC table with s STRUCT<a INT, v VARIANT> and a query SELECT s.a FROM t can materialize only s.a, but slot.getColumn().getType() still contains s.v and findVariantReadColumnName() throws the Parquet-only VARIANT error. The validation needs to inspect the selected/pruned access paths (or the effective materialized type) and reject only when a VARIANT subfield is actually read. This is distinct from the earlier recursive-validation threads: the recursive check was needed for selected nested VARIANTs, but applying it to the whole catalog column over-rejects legal non-VARIANT reads.
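The access-path alternative could look roughly like the following sketch. `FieldType` and `AccessPathCheck` are hypothetical names introduced for illustration, and a path is assumed to be a list of field names relative to the column's root type; the real planner representation of access paths may differ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical schema node with named children; not a Doris FE class. */
final class FieldType {
    final String kind;
    final Map<String, FieldType> fields = new LinkedHashMap<>();

    FieldType(String kind) {
        this.kind = kind;
    }

    FieldType with(String name, FieldType type) {
        fields.put(name, type);
        return this;
    }

    boolean containsVariant() {
        if ("VARIANT".equals(kind)) {
            return true;
        }
        for (FieldType f : fields.values()) {
            if (f.containsVariant()) {
                return true;
            }
        }
        return false;
    }
}

final class AccessPathCheck {
    /** True if reading the given path below root touches VARIANT data. */
    static boolean readsVariant(FieldType root, List<String> path) {
        FieldType current = root;
        for (String name : path) {
            if ("VARIANT".equals(current.kind)) {
                return true; // the path descends into a VARIANT subpath
            }
            FieldType next = current.fields.get(name);
            if (next == null) {
                throw new IllegalArgumentException("unknown field: " + name);
            }
            current = next;
        }
        // The whole subtree under the last field is read.
        return current.containsVariant();
    }
}
```

For `s STRUCT<a INT, v VARIANT>`, the path `[a]` does not read VARIANT data, while `[v]`, `[v, x]`, and the empty path (the whole column) do.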

const size_t elements = array.size();
value.field = Field::create_field<TYPE_ARRAY>(std::move(array));
fill_field_type_info(&value);
if (value.base_scalar_type_id == INVALID_TYPE) {

When a residual VARIANT array cannot be represented with one scalar element type, this fallback replaces every decoded element with null. Valid VARIANT values such as [1, "x"], [{}, 1], or arrays containing objects reach fill_field_type_info() with INVALID_TYPE, then are stored as JSONB [null, ...], silently losing the actual element values. The fallback should serialize the decoded array contents (or otherwise preserve the VARIANT structure), not synthesize a null array except for genuinely null elements; please add coverage for heterogeneous and complex residual arrays.
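To illustrate the expected behavior (not the BE implementation, which is C++ and writes JSONB), here is a small Java sketch that serializes a decoded array element by element instead of substituting nulls. `VariantArrayJson` is a hypothetical name, decoded values are assumed to arrive as plain `Number`, `Boolean`, `String`, `List`, and `Map` objects, and the output is JSON text rather than JSONB.

```java
import java.util.List;
import java.util.Map;

final class VariantArrayJson {
    /** Serializes a decoded value to JSON text, recursing into lists and maps. */
    static String toJson(Object value) {
        if (value == null) {
            return "null"; // only genuinely null elements become null
        }
        if (value instanceof String) {
            return quote((String) value);
        }
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        if (value instanceof List) {
            StringBuilder sb = new StringBuilder("[");
            boolean first = true;
            for (Object element : (List<?>) value) {
                if (!first) {
                    sb.append(',');
                }
                sb.append(toJson(element));
                first = false;
            }
            return sb.append(']').toString();
        }
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                if (!first) {
                    sb.append(',');
                }
                sb.append(quote(String.valueOf(entry.getKey())))
                  .append(':')
                  .append(toJson(entry.getValue()));
                first = false;
            }
            return sb.append('}').toString();
        }
        throw new IllegalArgumentException("unsupported value: " + value.getClass());
    }

    /** Sketch-level escaping: handles backslash and double quote only. */
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

Under these assumptions, a heterogeneous array such as `[1, "x"]` or `[{}, 1]` keeps its element values, and only a real null element is emitted as `null`.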

FieldWithDataType value;
const size_t elements_count = array.size();
value.field = Field::create_field<TYPE_ARRAY>(std::move(array));
fill_variant_field_info(&value);

The typed/shredded array path has the same data-loss fallback: after each element is reconstructed into array, any array whose inferred base_scalar_type_id is INVALID_TYPE is converted to [null, ...]. A valid typed VARIANT array with struct/list/map elements or mixed element kinds will therefore return the correct length but all element values as null. Please preserve the reconstructed element values when materializing JSONB and add typed-array coverage for non-null complex/mixed elements.
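For the requested coverage, one possible assertion helper is sketched below. `ArrayLossCheck` is a hypothetical name, and both lists are assumed to be already decoded into plain Java objects; a real test would compare against the query result instead. It counts positions where a non-null source element came back as null, which is the symptom described above (correct length, all elements null).

```java
import java.util.List;

final class ArrayLossCheck {
    /** Counts positions where a non-null source element was materialized as null. */
    static int lostElements(List<?> source, List<?> materialized) {
        if (source.size() != materialized.size()) {
            throw new IllegalArgumentException("length mismatch");
        }
        int lost = 0;
        for (int i = 0; i < source.size(); i++) {
            if (source.get(i) != null && materialized.get(i) == null) {
                lost++;
            }
        }
        return lost;
    }
}
```

A typed-array test for complex or mixed elements would then expect this count to be zero.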

@hello-stephen

TPC-H: Total hot run time: 31048 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b7402399a565dc0a3e621ecb4d61cfd48a4eb2fc, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17767	3923	3848	3848
q2
q3	10827	1353	820	820
q4	4683	471	347	347
q5	7578	2184	2110	2110
q6	229	177	135	135
q7	910	757	631	631
q8	9354	1766	1611	1611
q9	6120	4875	4867	4867
q10	6444	2102	1808	1808
q11	438	274	246	246
q12	695	413	293	293
q13	18225	3337	2777	2777
q14	260	255	238	238
q15
q16	821	798	715	715
q17	989	973	998	973
q18	6794	5710	5591	5591
q19	1211	1290	1101	1101
q20	508	406	272	272
q21	5809	2602	2358	2358
q22	431	352	307	307
Total cold run time: 100093 ms
Total hot run time: 31048 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4317	4074	4109	4074
q2
q3	4488	4913	4304	4304
q4	2083	2200	1378	1378
q5	4372	4252	4256	4252
q6	226	178	129	129
q7	2032	1904	1703	1703
q8	2450	2102	2013	2013
q9	7850	7781	7769	7769
q10	4523	4463	4120	4120
q11	563	419	377	377
q12	706	849	591	591
q13	3299	3636	3009	3009
q14	304	320	280	280
q15
q16	734	732	640	640
q17	1350	1281	1310	1281
q18	7934	7264	7040	7040
q19	1116	1084	1101	1084
q20	2209	2202	1922	1922
q21	5303	4672	4542	4542
q22	515	470	411	411
Total cold run time: 56374 ms
Total hot run time: 50919 ms

@hello-stephen

TPC-DS: Total hot run time: 168382 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b7402399a565dc0a3e621ecb4d61cfd48a4eb2fc, data reload: false

query5	4321	657	505	505
query6	332	219	203	203
query7	4214	575	294	294
query8	328	235	232	232
query9	8818	4000	4017	4000
query10	441	351	300	300
query11	5776	2426	2195	2195
query12	183	130	126	126
query13	1271	603	435	435
query14	5983	5378	5036	5036
query14_1	4360	4374	4317	4317
query15	207	203	181	181
query16	1009	442	424	424
query17	1144	710	599	599
query18	2731	475	352	352
query19	213	204	163	163
query20	138	134	128	128
query21	218	135	116	116
query22	13609	13533	13287	13287
query23	17065	16409	16068	16068
query23_1	16203	16105	16209	16105
query24	7543	1787	1265	1265
query24_1	1314	1292	1320	1292
query25	562	486	421	421
query26	1294	308	169	169
query27	2735	537	345	345
query28	4422	1956	1950	1950
query29	1032	630	499	499
query30	335	244	201	201
query31	1100	1046	928	928
query32	86	75	71	71
query33	534	370	297	297
query34	1183	1162	654	654
query35	755	769	667	667
query36	1296	1293	1160	1160
query37	153	108	104	104
query38	3214	3154	3074	3074
query39	918	920	897	897
query39_1	878	892	872	872
query40	249	145	128	128
query41	68	64	64	64
query42	112	114	110	110
query43	322	323	284	284
query44	
query45	279	202	200	200
query46	1067	1237	745	745
query47	2298	2335	2248	2248
query48	381	409	290	290
query49	637	491	400	400
query50	983	338	258	258
query51	4332	4280	4166	4166
query52	110	106	101	101
query53	259	282	207	207
query54	315	283	273	273
query55	96	92	88	88
query56	309	320	323	320
query57	1409	1401	1266	1266
query58	303	269	273	269
query59	1526	1671	1396	1396
query60	319	321	312	312
query61	186	178	186	178
query62	681	623	553	553
query63	251	207	216	207
query64	2457	864	713	713
query65	
query66	1663	499	393	393
query67	30057	29894	29684	29684
query68	
query69	459	342	301	301
query70	1046	995	941	941
query71	315	291	278	278
query72	2967	2703	1984	1984
query73	840	728	423	423
query74	5051	4905	4705	4705
query75	2675	2587	2258	2258
query76	2305	1180	801	801
query77	401	430	339	339
query78	12058	11971	11548	11548
query79	1242	1033	754	754
query80	584	547	468	468
query81	459	281	236	236
query82	240	163	122	122
query83	357	272	244	244
query84	257	141	109	109
query85	863	530	461	461
query86	396	373	315	315
query87	3384	3456	3256	3256
query88	3547	2689	2652	2652
query89	426	388	337	337
query90	2179	188	189	188
query91	186	168	142	142
query92	78	80	74	74
query93	1449	1528	873	873
query94	553	353	314	314
query95	696	392	348	348
query96	1040	783	329	329
query97	2704	2688	2569	2569
query98	256	243	244	243
query99	1112	1098	975	975
Total cold run time: 251753 ms
Total hot run time: 168382 ms

@hello-stephen

BE UT Coverage Report

Increment line coverage 75.50% (2668/3534) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.68% (20829/38804)
Line Coverage 37.40% (197595/528382)
Region Coverage 33.77% (154752/458224)
Branch Coverage 34.88% (67857/194543)

@hello-stephen

BE UT Coverage Report

Increment line coverage 75.50% (2668/3534) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.68% (20829/38804)
Line Coverage 37.40% (197591/528382)
Region Coverage 33.75% (154643/458224)
Branch Coverage 34.87% (67844/194543)

@hello-stephen

BE Regression && UT Coverage Report

Increment line coverage 77.54% (2503/3228) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.96% (27330/37982)
Line Coverage 55.38% (291682/526699)
Region Coverage 52.33% (241929/462341)
Branch Coverage 53.69% (104760/195104)

@hello-stephen

FE Regression Coverage Report

Increment line coverage 43.32% (146/337) 🎉
Increment coverage report
Complete coverage report
