
[feature](iceberg) Support reading Iceberg variant from Parquet#63192

Draft
eldenmoon wants to merge 1 commit into apache:master from eldenmoon:codex/iceberg-v3-variant

Conversation

@eldenmoon
Member

@eldenmoon eldenmoon commented May 12, 2026

What problem does this PR solve?

Issue Number: N/A

Related PR: #63192

Problem Summary: Doris could not read Iceberg v3 VARIANT columns from Parquet files. This change maps Iceberg VARIANT to Doris VARIANT, validates the Parquet VARIANT wrapper shape defined by the VariantShredding spec, decodes the unshredded metadata/value encoding, reads shredded typed_value columns, and prunes shredded Parquet leaf columns for accessed variant paths, with profile observability. Selected typed-only shredded projections stay on native Parquet typed columns when residual value columns are not selected; this covers top-level, nested, root-array, nested-array, typed-only, value-only residual, typed-map key lookup, binary, temporal, UUID, optional top-level, and nullable ARRAY/MAP layouts, as well as Iceberg field-id pruning. Row-wise reconstruction is used only as a fallback for complex or selected-residual layouts, preserving binary, binary-array, complex binary-array, residual-null, UUID, decimal-array, typed-array null-element, empty-object, root-null, and out-of-order residual-object semantics, plus user fields named value. The change also keeps full VARIANT projection separate from predicate subpath pruning, supports element paths produced by variant array explode, forces a root read for dynamic VARIANT element access, and explicitly rejects Iceberg VARIANT writes because this PR implements read support only.
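
The decode precedence described above (shredded typed_value first, residual value as fallback) can be sketched minimally. This is an illustrative model, not Doris code; the dict-free per-field signature and names follow the VariantShredding spec, while the in-memory shape is an assumption:

```python
# Illustrative sketch of per-field reconstruction for a shredded Parquet
# VARIANT (field names value/typed_value follow the VariantShredding spec;
# the in-memory model here is hypothetical, not Doris's).

def reconstruct_field(typed_value, value):
    """Rebuild one variant field from its shredded columns.

    For a present field, at most one of typed_value / value is non-None:
    typed_value carries the strongly-typed shredded data, value carries
    the residual variant-encoded fallback.
    """
    if typed_value is not None:
        return typed_value  # fully shredded: read the native typed column
    if value is not None:
        return value        # residual: decode the variant-encoded bytes
    return None             # field absent for this row
```

Reading only the typed column is safe exactly when the residual value column is known to be unselected, which is the typed-only fast path this PR keeps on native Parquet columns.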

Release note

Support reading Iceberg v3 VARIANT Parquet columns, including shredded typed_value column pruning and binary/UUID VARIANT values. Non-finite floating-point VARIANT values remain unsupported and return explicit errors. Writing Iceberg VARIANT columns is rejected with an explicit unsupported error.

Check List (For Author)

  • Test: Regression test / Unit Test / Manual test / Static analysis

    • Unit Test: ./run-be-ut.sh --run --filter='ParquetVariantReaderTest.*:IcebergReaderCreateColumnIdsTest.*:NestedColumnAccessHelperTest.*' (97 tests passed: 68 ParquetVariantReaderTest, 7 IcebergReaderCreateColumnIdsTest, 22 NestedColumnAccessHelperTest)

    • Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest,org.apache.doris.planner.IcebergTableSinkTest,org.apache.doris.planner.IcebergMergeSinkTest (55 tests passed: 51 PruneNestedColumnTest, 1 IcebergTableSinkTest, 3 IcebergMergeSinkTest; Maven reactor/checkstyle succeeded)

    • Regression test: performance regression coverage is included in regression-test/suites/external_table_p0/tvf/test_local_tvf_iceberg_variant.groovy; not run locally in this worktree because no local Doris cluster/output BE+FE runtime is available.

    • Manual test: env PATH=/tmp/codex-clang-format:$PATH build-support/clang-format.sh

    • Manual test: env PATH=/tmp/codex-clang-format:$PATH build-support/check-format.sh

    • Manual test: git diff --cached --check

    • Static analysis: CLANG_TIDY_BINARY=/mnt/disk1/claude-max/ldb_toolchain16/bin/clang-tidy build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN (attempted earlier in this PR loop; local clang-tidy could not analyze the changed files because existing be/src/util/jni-util.h static_assert(false) diagnostics are promoted to errors before producing actionable changed-line diagnostics.)

  • Behavior changed: Yes. Doris can now read Iceberg v3 VARIANT Parquet columns and supports spec-compliant binary and UUID VARIANT values. Wrapper fields are validated more strictly, and row-wise reconstruction is avoided for supported typed-only shredded projections. The reader preserves residual binary/null values, typed-array null elements, nullable complex pruning, user value-only fields, and complex array elements with binary leaves, and it preserves full VARIANT projection when predicates read typed subpaths. It also handles dynamic root element access, optional top-level VARIANT, root/nested variant-array element paths, and typed-map key lookup paths, and rejects Iceberg VARIANT writes explicitly.

  • Does this need documentation: No

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message) and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was modified, and what impacts it may have.
  3. What features were added and why.
  4. Which code was refactored and why.
  5. Which functions were optimized and what the difference is before and after the optimization.

@eldenmoon
Member Author

run buildall

@eldenmoon
Member Author

/review

@eldenmoon eldenmoon force-pushed the codex/iceberg-v3-variant branch from cab85b8 to e9e3bfd on May 12, 2026 19:31
@eldenmoon
Member Author

run buildall

@eldenmoon
Member Author

/review

@github-actions
Contributor

OpenCode automated review failed and did not complete.

Error: Review step was skipped (possibly timeout or cancelled)
Workflow run: https://github.com/apache/doris/actions/runs/25757447037

Please inspect the workflow logs and rerun the review after the underlying issue is resolved.

@eldenmoon
Member Author

/review

Contributor

@github-actions Bot left a comment

Summary:

I found one blocking issue in the added regression test: the local single-BE copy path does not match the file_path used by the local TVF, so the test can fail in the exact environment that code path is intended to support.

Critical checkpoint conclusions:

  • Goal: add Iceberg/Parquet VARIANT read support, including shredded projection. The implementation and regression coverage mostly target that goal, but the new regression test has a path setup bug.
  • Scope: the production changes are focused on Parquet schema parsing, variant reconstruction, column pruning, and Iceberg type mapping.
  • Concurrency/lifecycle: no new shared mutable concurrent state or non-obvious lifecycle ownership issue found in the reviewed PR diff.
  • Configuration/compatibility: no new config items or persisted storage-format changes found; FE/BE type mapping paths for Iceberg VARIANT are updated.
  • Parallel paths: Hive, Iceberg, and local Parquet pruning paths were considered; the test issue is distinct from production pruning logic.
  • Tests: regression coverage was added, but the local-file staging logic can make the new test fail before validating the feature.
  • Observability/performance: added ParquetReadColumnPaths profile string is useful for validating pruning; no blocking observability or hot-path issue found beyond the test blocker.

User focus: no additional user-provided review focus was present.

Contributor

@github-actions Bot left a comment

I found a correctness blocker in the new shredded VARIANT pruning logic. The implementation prunes the unshredded value leaf whenever a matching typed_value path exists, but Iceberg shredded VARIANT can still carry residual/unrepresentable values for that field in value, so queries can silently return NULL or partial objects for those rows.

Critical checkpoint conclusions:

  • Goal/test: the PR adds Iceberg v3 VARIANT reading and pruning tests, but the tests only cover fully typed shredded fields and do not prove residual fallback correctness.
  • Scope/focus: the change is mostly focused on Parquet/Iceberg VARIANT support.
  • Concurrency/lifecycle/config: no new concurrency, non-trivial lifecycle, or config behavior found in the reviewed PR diff.
  • Compatibility: adds new type mapping; no storage-format persistence changes found.
  • Parallel paths: the same pruning issue exists in both standalone/Hive Parquet and Iceberg Parquet helper paths.
  • Tests: missing mixed shredded/residual cases where a selected typed path is absent or has an incompatible type in typed_value but exists in value.
  • Observability/performance: profile string helps inspect selected leaves; no additional blocking observability issue found.
  • Data correctness: blocking issue below can cause incorrect query results after column pruning.

No additional user-provided focus points were present.
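
The residual hazard called out above can be shown with a toy row model (Python, illustrative only; the row layout is an assumption, not Doris's representation):

```python
# Hypothetical rows for a shredded field "metric": each row stores its data in
# either typed_value (shredded int) or value (residual, e.g. an object).
# Pruning the value leaf whenever a typed_value column exists silently drops
# the residual rows — the correctness issue described above.
rows = [
    {"typed_value": 1, "value": None},          # shredded as int64
    {"typed_value": None, "value": {"x": 9}},   # residual object in value
]

def read_pruned(rows):
    # buggy projection: keeps only the typed_value leaf
    return [r["typed_value"] for r in rows]

def read_correct(rows):
    # correct projection: falls back to the residual value column
    return [r["typed_value"] if r["typed_value"] is not None else r["value"]
            for r in rows]
```

With the buggy projection the second row reads as NULL even though it holds an object, which is exactly the silent-NULL symptom the review describes.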

Comment thread be/src/format/table/hive/hive_parquet_nested_column_utils.cpp Outdated
Comment thread be/src/format/table/iceberg/iceberg_parquet_nested_column_utils.cpp Outdated
@eldenmoon eldenmoon force-pushed the codex/iceberg-v3-variant branch from e9e3bfd to 5fe9ca5 on May 12, 2026 20:03
@eldenmoon
Member Author

run buildall

@eldenmoon
Member Author

/review

1 similar comment
@eldenmoon
Member Author

/review

Contributor

@github-actions Bot left a comment

Reviewed the full PR with the Doris code-review checklist. I found a blocking correctness issue in nested VARIANT pruning for deeper accesses under shredded fields.

Critical checkpoint conclusions:

  • Goal/test: the PR adds Iceberg/Parquet VARIANT read support and includes a regression for top-level shredded field reads, but it does not cover deeper access through a shredded field residual.
  • Scope: the change is focused, but the duplicated Hive/Iceberg pruning helper needs the same fix.
  • Concurrency/lifecycle/config/transaction/persistence: no new concurrency, lifecycle, config, transaction, storage-format write, or persistence concerns found.
  • Parallel paths: the same issue exists in both Hive/local Parquet and Iceberg Parquet helpers.
  • Tests: existing tests cover unshredded reads and top-level shredded field pruning, but should add a mixed residual case such as v['metric']['x'] where metric is shredded and some rows store an object in typed_value.metric.value.
  • Observability: the new profile string is useful for validating selected leaves.
  • User focus: no additional user-provided review focus was supplied.

} else {
std::vector<std::vector<std::string>> child_paths {
std::vector<std::string>(path.begin() + 1, path.end())};
found_typed_path = extract_nested_column_ids_by_name(*typed_child, child_paths,
Contributor

This handles v['metric'], but deeper accesses under a shredded field still prune the wrong residual. For a schema like typed_value.metric { value, typed_value int64 }, the access path for v['metric']['x'] reaches this branch with typed_child == metric and path.size() > 1. metric is parsed as a regular struct, not TYPE_VARIANT, so extract_nested_column_ids_by_name(*typed_child, {['x']}, ...) finds no child named x and returns false. The code then falls back to top-level v.value, even though the Parquet VARIANT shredding spec stores non-int/object residuals for metric in v.typed_value.metric.value and forbids duplicating shredded field keys in top-level v.value. That makes deeper projections/filters silently return missing/null for rows where metric is an object stored in the field-level residual. Please treat these unannotated shredded field groups (value/typed_value) as variant-like during recursive pruning, and add a regression that projects a subpath from a shredded field whose residual is stored in typed_value.<field>.value.
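
A minimal sketch of the recursive pruning this comment asks for, treating unannotated value/typed_value groups as variant-like (Python, illustrative only; the dict schema model and helper names are assumptions, not Doris code):

```python
# Schema modeled as nested dicts keyed by child name; names value/typed_value
# follow the VariantShredding spec, everything else is hypothetical.

def is_variant_like(group):
    # An unannotated shredded field group has typed_value, optionally value,
    # and nothing else.
    names = set(group)
    return "typed_value" in names and names <= {"value", "typed_value"}

def prune(schema, path, prefix=()):
    """Return the leaf column path to read for an access path into a variant."""
    if not path:
        return prefix + ("value",) if "value" in schema else prefix
    typed = schema.get("typed_value", {})
    key = path[0]
    if key in typed:
        child = typed[key]
        if is_variant_like(child):
            # recurse: deeper residuals live under typed_value.<field>.value
            return prune(child, path[1:], prefix + ("typed_value", key))
        return prefix + ("typed_value", key)
    return prefix + ("value",)   # key not shredded: read this level's residual

schema = {"value": {}, "typed_value": {
    "metric": {"value": {}, "typed_value": {"x": {}}}}}
```

With this shape, v['metric']['x'] resolves under typed_value.metric rather than falling back to the top-level value, and an unshredded subkey under metric selects the field-level residual typed_value.metric.value.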

Member Author

Fixed in latest push. VARIANT user keys now use exact matching while structural Parquet fields still use structural lookup. The pruning helper now treats unannotated value/typed_value shredded field groups as variant-like, so v['metric']['x'] selects v.typed_value.metric.value instead of top-level v.value. Added profile assertions for the deeper residual path.

} else {
std::vector<std::vector<std::string>> child_paths {
std::vector<std::string>(path.begin() + 1, path.end())};
found_typed_path = extract_nested_column_ids_by_name(*typed_child, child_paths,
Contributor

Same distinct deeper-path issue as the Hive/local helper, separate from the already-discussed top-level residual question. When Iceberg pruning sees an access like v['metric']['x'] and metric is a shredded field group with value plus typed_value, the recursive call treats metric as a plain struct and returns false because there is no child named x. The fallback then selects top-level v.value, but Iceberg/Parquet shredded field residuals for metric live in v.typed_value.metric.value; top-level v.value must not contain that shredded key. Iceberg scans can therefore lose residual object values for deeper projections or filters under a shredded field. Please make the recursive pruning recognize the unannotated value/typed_value shredded-field layout and cover this with an Iceberg/local regression case.

Member Author

Fixed in latest push. VARIANT user keys now use exact matching while structural Parquet fields still use structural lookup. The pruning helper now treats unannotated value/typed_value shredded field groups as variant-like, so v['metric']['x'] selects v.typed_value.metric.value instead of top-level v.value. Added profile assertions for the deeper residual path.

Contributor

@github-actions Bot left a comment

Review result: request changes.

Critical checkpoint conclusions:

  • Goal/test coverage: the PR adds Iceberg VARIANT Parquet reading, type mapping, pruning observability, and regression/unit coverage. The main scenario is covered, but a case-sensitive key path is not covered and currently regresses correctness.
  • Scope/focus: the change is focused on Iceberg/Parquet VARIANT support, though duplicated Hive/Iceberg pruning helpers carry the same issue.
  • Concurrency/lifecycle: no new shared mutable state, threads, locks, or static initialization hazards found in the reviewed paths.
  • Configuration/compatibility: no new configs or storage-format writes; this is a reader/type-mapping change. Mixed files with non-VARIANT types continue through existing paths.
  • Parallel paths: the Hive/local and Iceberg Parquet pruning paths were both reviewed; both have the same case-sensitivity bug and are commented separately.
  • Error handling/memory: Status returns in the new reader path are generally propagated; no ignored Status or untracked large persistent allocation issue found beyond the correctness issue raised.
  • Data correctness: blocking issue found: shredded VARIANT field lookup lowercases user path components, so distinct keys such as a and A can be pruned/read as the same field.
  • Observability/performance: the profile leaf-path observable is useful; no additional blocker found.

User focus: no additional user-provided review focus was specified.
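
The case-sensitivity blocker reduces to a one-line lookup difference; a toy illustration (Python, with a hypothetical shredded-field map — not the actual Doris lookup code):

```python
# Variant user keys are case-sensitive, so lowercasing path components during
# shredded-field lookup conflates distinct keys such as "a" and "A".
shredded_fields = {"a": "typed_value.a", "A": "typed_value.A"}

def lookup_buggy(key):
    # lowercasing maps both "a" and "A" to the same leaf
    return shredded_fields.get(key.lower())

def lookup_fixed(key):
    # exact match preserves distinct variant keys
    return shredded_fields.get(key)
```

Structural Parquet wrapper fields (metadata, value, typed_value) can still use a relaxed structural lookup; only user-supplied variant keys need the exact match.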

Comment thread be/src/format/table/hive/hive_parquet_nested_column_utils.cpp Outdated
Comment thread be/src/format/table/iceberg/iceberg_parquet_nested_column_utils.cpp Outdated
@eldenmoon eldenmoon force-pushed the codex/iceberg-v3-variant branch from 5fe9ca5 to fa098c0 on May 12, 2026 20:43
@eldenmoon
Member Author

run buildall

@eldenmoon
Member Author

/review

@hello-stephen
Contributor

TPC-H: Total hot run time: 29804 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e9e3bfd819ba9d8ccea3f9b57abb35147175a7e8, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17678	3922	3919	3919
q2	q3	10718	872	608	608
q4	4655	466	351	351
q5	7465	1357	1156	1156
q6	207	173	140	140
q7	940	940	750	750
q8	9516	1396	1309	1309
q9	6069	5362	5345	5345
q10	6307	2102	1891	1891
q11	486	266	259	259
q12	687	430	299	299
q13	18191	3334	2785	2785
q14	286	284	268	268
q15
q16	904	845	793	793
q17	958	995	742	742
q18	6501	5679	5525	5525
q19	1169	1192	1132	1132
q20	512	406	259	259
q21	4752	2425	1946	1946
q22	459	402	327	327
Total cold run time: 98460 ms
Total hot run time: 29804 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4662	4580	4561	4561
q2
q3	4703	4784	4197	4197
q4	2175	2189	1420	1420
q5	5212	4969	5227	4969
q6	202	176	143	143
q7	2063	1832	1618	1618
q8	3350	3117	3095	3095
q9	8461	8840	8379	8379
q10	4501	4555	4267	4267
q11	633	423	411	411
q12	738	748	524	524
q13	3188	3594	2917	2917
q14	302	300	290	290
q15
q16	755	778	723	723
q17	1379	1276	1282	1276
q18	8031	7128	7115	7115
q19	1158	1175	1154	1154
q20	2293	2287	1992	1992
q21	6276	5486	4800	4800
q22	531	473	400	400
Total cold run time: 60613 ms
Total hot run time: 54251 ms

@hello-stephen
Contributor

TPC-H: Total hot run time: 29785 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5fe9ca52dd2edab6a76b7083da4f51d88076c8c5, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17624	3891	3801	3801
q2
q3	10699	879	587	587
q4	4666	467	347	347
q5	7460	1325	1139	1139
q6	201	169	142	142
q7	918	948	769	769
q8	9690	1438	1262	1262
q9	6853	5515	5462	5462
q10	6327	2070	1808	1808
q11	481	273	258	258
q12	688	422	301	301
q13	18192	3299	2725	2725
q14	298	282	264	264
q15
q16	898	868	787	787
q17	1197	1046	735	735
q18	6458	5747	5601	5601
q19	1610	1245	1108	1108
q20	521	414	416	414
q21	4731	2362	1928	1928
q22	464	395	347	347
Total cold run time: 99976 ms
Total hot run time: 29785 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4759	4521	4635	4521
q2	q3	4621	4757	4188	4188
q4	2159	2234	1421	1421
q5	4977	4949	5181	4949
q6	207	174	135	135
q7	2087	1822	1621	1621
q8	3333	3077	3141	3077
q9	8595	8393	8429	8393
q10	4534	4533	4220	4220
q11	588	417	419	417
q12	718	809	511	511
q13	3281	3632	2864	2864
q14	310	312	281	281
q15
q16	773	798	713	713
q17	1309	1304	1293	1293
q18	7986	7053	7094	7053
q19	1161	1199	1145	1145
q20	2214	2201	1934	1934
q21	6042	5384	4784	4784
q22	530	475	405	405
Total cold run time: 60184 ms
Total hot run time: 53925 ms

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 25.00% (1/4) 🎉
Increment coverage report
Complete coverage report

Contributor

@github-actions Bot left a comment

I found two additional correctness issues in the Parquet VARIANT reconstruction path.

Critical checkpoint conclusions:

  • Goal/test: The PR adds Iceberg v3 Parquet VARIANT reading and shredded column pruning, with regression and FE unit coverage for the common unshredded/shredded object paths. However, valid shredded array layouts and non-finite shredded floats are not covered and can decode incorrectly or fail.
  • Scope/focus: The implementation is mostly focused on the new reader/pruning behavior, though the duplicated Hive/Iceberg pruning helper logic remains a maintainability risk rather than a blocker.
  • Concurrency/lifecycle: The reviewed changes are per-reader/per-query state and do not introduce new shared mutable state, locks, background threads, or special static lifecycle dependencies.
  • Config/compatibility: No new config items or storage-format writes are introduced. The change reads a standard Parquet/Iceberg format; mixed-version compatibility concerns are limited to reader capability.
  • Parallel paths: Hive/local and Iceberg pruning paths are both updated. Standalone Parquet uses the Hive-style name pruning helper.
  • Conditional checks: The main conditional logic around shredded value/typed_value follows the Parquet Variant Shredding spec for object fields, but the array element check currently misses a spec-valid layout.
  • Tests/results: Existing tests cover top-level object shredding, deeper residual paths, case-sensitive keys, and profile observability. Missing coverage remains for arrays whose element group omits value, and for NaN/Inf in shredded float/double typed values.
  • Observability: The added ParquetReadColumnPaths profile string is useful for pruning verification and appears lightweight.
  • Transactions/persistence/data writes: Not applicable; this is read-path only.
  • FE/BE variable passing: Iceberg type mapping and access-path rewriting are updated for VARIANT; no additional thrift variable propagation issue found.
  • Performance: The JSON reconstruction path is inherently allocation-heavy but limited to VARIANT decoding. No additional performance blocker found beyond the correctness issues below.

No additional user-provided review focus was specified.

std::string element_json;
bool element_present = false;
if (element_schema != nullptr && find_child_idx(*element_schema, "value") >= 0) {
RETURN_IF_ERROR(_variant_to_json(*element_schema, values[i], &metadata, &element_json,
Contributor

This only recognizes shredded array elements when the element group contains a value child, but the Parquet Variant Shredding spec allows array element to omit value when elements are fully shredded as a specific typed_value type. For a valid schema like typed_value (LIST) -> element { optional binary typed_value (STRING) }, this branch falls through to _field_to_json(*element_schema, ...) and reconstructs each element as an object such as {"typed_value":"x"} instead of "x". Please treat an element group with either value or typed_value as a shredded variant element and add coverage for the value-omitted array-element layout.
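
The suggested element check can be sketched as follows (Python, illustrative only; the dict element model is an assumption, not the reader's actual representation):

```python
# A shredded array element group may carry value, typed_value, or both.
# Recognizing either child as variant-like unwraps the element correctly;
# requiring value (the current check) mis-decodes typed_value-only layouts
# into objects like {"typed_value": "x"}.
def decode_element(element_group):
    names = set(element_group)
    if "value" in names or "typed_value" in names:
        # shredded variant element: unwrap per typed-then-residual precedence
        tv = element_group.get("typed_value")
        return tv if tv is not None else element_group.get("value")
    # plain struct element: keep as an object
    return element_group
```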

Comment thread be/src/format/parquet/vparquet_column_reader.cpp Outdated
@hello-stephen
Contributor

TPC-DS: Total hot run time: 170353 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e9e3bfd819ba9d8ccea3f9b57abb35147175a7e8, data reload: false

query5	4327	649	519	519
query6	333	223	206	206
query7	4254	565	306	306
query8	323	252	224	224
query9	8875	4094	4024	4024
query10	445	351	314	314
query11	5803	2358	2258	2258
query12	186	132	123	123
query13	1273	625	441	441
query14	6085	5342	5029	5029
query14_1	4355	4361	4357	4357
query15	216	201	181	181
query16	1000	450	394	394
query17	1112	764	607	607
query18	2510	480	345	345
query19	209	203	163	163
query20	138	132	131	131
query21	209	137	118	118
query22	13639	13556	13372	13372
query23	17199	16360	16093	16093
query23_1	16079	16237	16181	16181
query24	7431	1793	1365	1365
query24_1	1355	1390	1388	1388
query25	603	540	489	489
query26	1352	324	181	181
query27	2678	608	359	359
query28	4478	2023	2012	2012
query29	994	657	554	554
query30	313	246	202	202
query31	1118	1087	924	924
query32	87	75	79	75
query33	552	376	306	306
query34	1183	1132	666	666
query35	776	778	686	686
query36	1328	1361	1199	1199
query37	157	108	95	95
query38	3190	3171	3051	3051
query39	962	931	904	904
query39_1	894	867	877	867
query40	258	167	145	145
query41	71	67	68	67
query42	117	115	113	113
query43	326	331	302	302
query44	
query45	217	208	202	202
query46	1077	1200	725	725
query47	2281	2334	2185	2185
query48	406	416	292	292
query49	652	557	470	470
query50	722	300	223	223
query51	4324	4278	4321	4278
query52	109	105	98	98
query53	253	282	209	209
query54	327	293	268	268
query55	96	92	87	87
query56	311	334	329	329
query57	1413	1401	1298	1298
query58	319	281	269	269
query59	1526	1640	1377	1377
query60	351	352	340	340
query61	203	152	156	152
query62	665	607	563	563
query63	254	199	205	199
query64	2406	813	677	677
query65	
query66	1716	512	389	389
query67	30267	29983	29905	29905
query68	
query69	461	336	301	301
query70	994	930	966	930
query71	299	273	271	271
query72	2936	2803	2490	2490
query73	858	765	404	404
query74	5055	4967	4727	4727
query75	2765	2670	2323	2323
query76	2291	1120	735	735
query77	421	435	348	348
query78	12873	13050	12248	12248
query79	1506	942	760	760
query80	1364	579	501	501
query81	532	280	244	244
query82	1020	159	128	128
query83	318	284	247	247
query84	266	139	109	109
query85	894	550	432	432
query86	449	340	305	305
query87	3422	3352	3232	3232
query88	3518	2667	2652	2652
query89	442	388	338	338
query90	1906	186	184	184
query91	178	168	137	137
query92	80	80	73	73
query93	1102	940	560	560
query94	710	348	294	294
query95	655	466	350	350
query96	1061	731	346	346
query97	2717	2699	2591	2591
query98	240	238	228	228
query99	1131	1084	979	979
Total cold run time: 253877 ms
Total hot run time: 170353 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 169901 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5fe9ca52dd2edab6a76b7083da4f51d88076c8c5, data reload: false

query5	4329	655	519	519
query6	344	229	198	198
query7	4251	540	302	302
query8	334	234	206	206
query9	8828	4061	4005	4005
query10	447	353	316	316
query11	5828	2369	2201	2201
query12	178	128	128	128
query13	1284	654	423	423
query14	6612	5345	5024	5024
query14_1	4337	4351	4298	4298
query15	210	202	185	185
query16	1019	447	427	427
query17	1118	760	632	632
query18	2495	500	359	359
query19	218	208	159	159
query20	137	131	129	129
query21	217	138	115	115
query22	13679	13558	13470	13470
query23	17261	16299	15983	15983
query23_1	16164	16128	16226	16128
query24	7449	1767	1367	1367
query24_1	1373	1361	1361	1361
query25	593	546	492	492
query26	1294	333	175	175
query27	2716	614	341	341
query28	4469	1985	1985	1985
query29	1058	666	555	555
query30	311	250	203	203
query31	1131	1068	947	947
query32	93	81	78	78
query33	555	356	312	312
query34	1173	1145	626	626
query35	767	791	675	675
query36	1312	1361	1221	1221
query37	157	110	99	99
query38	3181	3151	3040	3040
query39	920	916	898	898
query39_1	885	881	874	874
query40	254	167	144	144
query41	76	69	69	69
query42	114	116	112	112
query43	320	329	287	287
query44	
query45	216	205	199	199
query46	1082	1209	710	710
query47	2332	2244	2129	2129
query48	408	421	324	324
query49	648	543	455	455
query50	694	284	219	219
query51	4281	4278	4175	4175
query52	104	108	97	97
query53	257	279	218	218
query54	334	285	280	280
query55	94	94	86	86
query56	321	317	340	317
query57	1405	1389	1338	1338
query58	314	284	280	280
query59	1540	1586	1379	1379
query60	362	358	345	345
query61	209	163	156	156
query62	675	623	559	559
query63	245	197	204	197
query64	2406	831	680	680
query65	
query66	1728	542	401	401
query67	30018	29869	29896	29869
query68	
query69	446	342	311	311
query70	1014	1021	971	971
query71	310	280	271	271
query72	3032	2707	2389	2389
query73	870	732	443	443
query74	5029	4890	4753	4753
query75	2763	2674	2344	2344
query76	2304	1109	742	742
query77	415	441	353	353
query78	12866	12893	12172	12172
query79	1554	932	750	750
query80	1362	566	529	529
query81	519	279	239	239
query82	963	162	119	119
query83	372	286	250	250
query84	268	138	113	113
query85	939	512	440	440
query86	491	337	346	337
query87	3477	3365	3223	3223
query88	3527	2653	2608	2608
query89	446	382	338	338
query90	1917	176	186	176
query91	181	170	138	138
query92	80	77	72	72
query93	1202	971	548	548
query94	715	345	298	298
query95	685	460	354	354
query96	1034	732	357	357
query97	2707	2674	2534	2534
query98	236	229	226	226
query99	1113	1129	1008	1008
Total cold run time: 254910 ms
Total hot run time: 169901 ms

@eldenmoon eldenmoon force-pushed the codex/iceberg-v3-variant branch from fa098c0 to bf0c548 on May 12, 2026 21:06
@eldenmoon
Member Author

/review

@eldenmoon
Member Author

run buildall

Contributor

@github-actions Bot left a comment

I found additional blocking compatibility issues for valid Parquet VARIANT shredding layouts that omit value when data is fully represented by typed_value. Existing review threads already cover non-finite typed floats, typed array elements, case-sensitive keys, and value/residual pruning, so I did not duplicate those.

Critical checkpoints:

  • Goal/test: the PR aims to read Iceberg/Parquet VARIANT, including shredded layouts, and adds local TVF regression coverage, but coverage does not include typed-value-only top-level or nested shredded field groups.
  • Scope/focus: the change is focused, but the schema/pruning logic is stricter than the Parquet shredding layout it is trying to support.
  • Concurrency/lifecycle/config/transactions/persistence: no new concurrency, lifecycle, config, transaction, or persistence concerns found in the reviewed paths.
  • Parallel paths: Hive/local and Iceberg pruning have duplicated logic; both need the typed-value-only fix.
  • Compatibility/data correctness: current code rejects or prunes away valid typed-value-only shredded data, causing scan failure or null/missing results.
  • Tests: existing tests cover unshredded/shredded happy paths and several pruning observables, but miss the typed-value-only layouts described in the inline comments.
  • Observability/performance: no additional observability or performance blocker found beyond the added profile string being used by tests.
  • User focus: no additional user-provided review focus was present.

Comment thread be/src/format/parquet/schema_desc.cpp Outdated
has_value = child.physical_type == tparquet::Type::BYTE_ARRAY;
}
}
if (!has_metadata || !has_value) {
Contributor

This rejects a valid fully-shredded VARIANT schema where the top-level value field is omitted and all data is represented by typed_value. The Parquet shredding layout makes the fallback value optional for fully shredded values, and the reader below already has code paths that can decode wrappers with typed_value but no value. With this check, such files fail schema parsing before reading. Please accept metadata plus either value or typed_value (validating their physical/logical shape), and add coverage for a top-level typed-value-only shredded variant.
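
The relaxed wrapper check suggested here amounts to requiring metadata plus at least one of value/typed_value; a minimal sketch (Python, illustrative only — schema modeled as a set of child names, not Doris's FieldSchema):

```python
# A Parquet VARIANT wrapper must have metadata, and the fallback value child
# is optional when data is fully represented by typed_value.
def is_valid_variant_wrapper(children):
    return "metadata" in children and (
        "value" in children or "typed_value" in children)
```

Physical/logical shape of each child (e.g. metadata and value being BYTE_ARRAY) would still be validated separately, as the comment notes.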

}

bool is_shredded_variant_field(const FieldSchema& field_schema) {
return find_child_by_structural_name(field_schema, "value") != nullptr &&
Contributor

This only treats a nested shredded field as variant-like when both value and typed_value are present. A fully shredded field may omit the fallback value, for example typed_value.metric { typed_value { x ... } }. For an access like v['metric']['x'], extract_variant_nested_column_ids() recurses into metric, this predicate returns false, and the generic struct path looks for a child named x next to typed_value, so it returns false and falls back to top-level v.value instead of selecting v.typed_value.metric.typed_value.x. Please also recognize typed_value-only shredded field groups and add a pruning/read regression for that layout.
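A hedged sketch of the broadened predicate, using a hypothetical recursive field tree in place of `FieldSchema`; real code should additionally validate the children's physical/logical shape so user structs that merely contain a field named `value` are not misclassified:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for FieldSchema: a named group with children.
struct Field {
    std::string name;
    std::vector<Field> children;
};

const Field* find_child(const Field& f, const std::string& name) {
    for (const auto& c : f.children) {
        if (c.name == name) return &c;
    }
    return nullptr;
}

// The original predicate required both `value` and `typed_value`.
// Per the shredding spec a field group carries value and/or
// typed_value, and a fully shredded group may omit the fallback
// `value` entirely, so either child should qualify.
bool is_shredded_variant_field(const Field& f) {
    return find_child(f, "typed_value") != nullptr ||
           find_child(f, "value") != nullptr;
}
```

With this, a `typed_value.metric { typed_value { x ... } }` group is recognized and pruning can select `v.typed_value.metric.typed_value.x` instead of falling back to `v.value`.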

Comment thread be/src/format/table/iceberg/iceberg_parquet_nested_column_utils.cpp Outdated
@hello-stephen
Contributor

TPC-H: Total hot run time: 29314 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit fa098c01643af773e82d99b9046d891338e1145f, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17784	4016	3937	3937
q2	q3	10704	888	609	609
q4	4677	454	346	346
q5	7441	1317	1142	1142
q6	204	174	138	138
q7	917	948	757	757
q8	9628	1397	1270	1270
q9	6188	5397	5328	5328
q10	6323	2084	1806	1806
q11	477	262	256	256
q12	687	405	294	294
q13	18197	3296	2757	2757
q14	292	282	263	263
q15	q16	911	870	787	787
q17	1011	1038	727	727
q18	6402	5668	5519	5519
q19	1470	1164	958	958
q20	505	385	258	258
q21	4852	2276	1856	1856
q22	420	353	306	306
Total cold run time: 99090 ms
Total hot run time: 29314 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4269	4176	4162	4162
q2	q3	4627	4770	4167	4167
q4	2093	2152	1380	1380
q5	4961	4992	5266	4992
q6	193	163	133	133
q7	2028	1762	2018	1762
q8	3411	3196	3191	3191
q9	8519	8460	8332	8332
q10	4522	4465	4240	4240
q11	625	431	395	395
q12	732	743	535	535
q13	3224	3547	2922	2922
q14	306	314	290	290
q15	q16	763	781	720	720
q17	1356	1296	1272	1272
q18	8062	7007	7128	7007
q19	1173	1167	1127	1127
q20	2315	2241	1938	1938
q21	6202	5384	4915	4915
q22	567	533	438	438
Total cold run time: 59948 ms
Total hot run time: 53918 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 168835 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6587ec2344072618a1af162b2e1735ca7ed86ca0, data reload: false

query5	4317	649	511	511
query6	326	215	199	199
query7	4226	579	308	308
query8	326	229	207	207
query9	8839	4036	4009	4009
query10	443	341	301	301
query11	5815	2397	2148	2148
query12	207	131	121	121
query13	1307	587	429	429
query14	5884	5325	5038	5038
query14_1	4355	4334	4381	4334
query15	214	201	180	180
query16	1004	464	422	422
query17	1093	728	590	590
query18	2437	482	345	345
query19	205	199	162	162
query20	136	133	128	128
query21	221	135	120	120
query22	13619	13511	13351	13351
query23	17260	16322	16070	16070
query23_1	16220	16250	16063	16063
query24	7435	1736	1333	1333
query24_1	1292	1306	1313	1306
query25	554	493	413	413
query26	1312	328	172	172
query27	2691	532	340	340
query28	4444	1951	1910	1910
query29	1006	628	489	489
query30	296	239	203	203
query31	1097	1074	939	939
query32	86	74	72	72
query33	551	343	296	296
query34	1182	1118	645	645
query35	772	762	676	676
query36	1322	1378	1169	1169
query37	155	107	93	93
query38	3219	3143	3048	3048
query39	927	915	914	914
query39_1	882	873	877	873
query40	227	147	127	127
query41	68	63	61	61
query42	108	109	109	109
query43	348	324	281	281
query44	
query45	210	202	197	197
query46	1059	1200	732	732
query47	2293	2328	2256	2256
query48	397	412	292	292
query49	630	489	378	378
query50	967	353	261	261
query51	4303	4295	4243	4243
query52	103	109	97	97
query53	262	284	207	207
query54	330	306	271	271
query55	97	94	87	87
query56	319	348	310	310
query57	1415	1389	1315	1315
query58	318	290	277	277
query59	1568	1644	1409	1409
query60	330	334	321	321
query61	188	182	183	182
query62	673	646	568	568
query63	249	208	213	208
query64	2500	863	695	695
query65	
query66	1735	493	372	372
query67	29971	29953	29865	29865
query68	
query69	470	353	307	307
query70	1015	993	984	984
query71	314	295	283	283
query72	3252	2765	2398	2398
query73	839	795	418	418
query74	5056	4890	4697	4697
query75	2672	2574	2256	2256
query76	2276	1115	762	762
query77	403	414	328	328
query78	12021	12108	11567	11567
query79	1413	999	713	713
query80	646	533	452	452
query81	453	282	243	243
query82	1389	153	126	126
query83	350	267	249	249
query84	277	143	112	112
query85	892	535	445	445
query86	397	331	339	331
query87	3400	3395	3203	3203
query88	3502	2644	2637	2637
query89	447	377	338	338
query90	1960	182	178	178
query91	186	170	146	146
query92	77	79	73	73
query93	1497	1510	827	827
query94	548	352	328	328
query95	674	378	335	335
query96	1033	811	314	314
query97	2695	2687	2558	2558
query98	239	231	261	231
query99	1109	1095	962	962
Total cold run time: 252874 ms
Total hot run time: 168835 ms

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 60.87% (42/69) 🎉
Increment coverage report
Complete coverage report

eldenmoon added a commit to eldenmoon/incubator-doris that referenced this pull request May 16, 2026
### What problem does this PR solve?

Issue Number: N/A

Related PR: apache#63192

Problem Summary: Doris could not read Iceberg v3 VARIANT columns from Parquet files. This change maps Iceberg VARIANT to Doris VARIANT, validates the Parquet VARIANT wrapper shape from the VariantShredding spec, decodes unshredded metadata/value encoding, reads shredded typed_value columns, and prunes shredded Parquet leaf columns for accessed variant paths with profile observability. It keeps selected typed-only shredded projections on native Parquet typed columns when residual value columns are not selected, including top-level, nested, root-array, nested-array, typed-only, value-only residual, typed-map key lookup, binary, temporal, UUID, optional top-level, nullable ARRAY/MAP layouts, and Iceberg field-id pruning. It falls back to row-wise reconstruction only for complex or selected-residual layouts, while preserving binary, binary-array, complex binary arrays, residual null, UUID, decimal-array, typed-array null element, empty-object, root-null, out-of-order residual object, and user fields named value semantics. It also keeps full VARIANT projection separate from predicate subpath pruning, supports element paths produced by variant array explode, forces a root read for dynamic VARIANT element access, and explicitly rejects Iceberg VARIANT writes because this PR only implements read support.

### Release note

Support reading Iceberg v3 VARIANT Parquet columns, including shredded typed_value column pruning and binary/UUID VARIANT values. Non-finite floating-point VARIANT values remain unsupported and return explicit errors. Writing Iceberg VARIANT columns is rejected with an explicit unsupported error.

### Check List (For Author)

- Test: Regression test / Unit Test / Manual test / Static analysis

    - Unit Test: ./run-be-ut.sh --run --filter='ParquetVariantReaderTest.*:IcebergReaderCreateColumnIdsTest.*:NestedColumnAccessHelperTest.*' (97 tests passed: 68 ParquetVariantReaderTest, 7 IcebergReaderCreateColumnIdsTest, 22 NestedColumnAccessHelperTest)

    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest,org.apache.doris.planner.IcebergTableSinkTest,org.apache.doris.planner.IcebergMergeSinkTest (55 tests passed: 51 PruneNestedColumnTest, 1 IcebergTableSinkTest, 3 IcebergMergeSinkTest; Maven reactor/checkstyle succeeded)

    - Regression test: performance regression coverage is included in regression-test/suites/external_table_p0/tvf/test_local_tvf_iceberg_variant.groovy; not run locally in this worktree because no local Doris cluster/output BE+FE runtime is available.

    - Manual test: env PATH=/tmp/codex-clang-format:$PATH build-support/clang-format.sh

    - Manual test: env PATH=/tmp/codex-clang-format:$PATH build-support/check-format.sh

    - Manual test: git diff --cached --check

    - Static analysis: CLANG_TIDY_BINARY=/mnt/disk1/claude-max/ldb_toolchain16/bin/clang-tidy build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN (attempted earlier in this PR loop; local clang-tidy could not analyze the changed files because existing be/src/util/jni-util.h static_assert(false) diagnostics are promoted to errors before producing actionable changed-line diagnostics.)

- Behavior changed: Yes. Doris can read Iceberg v3 VARIANT Parquet columns, supports spec-compliant binary and UUID VARIANT values, validates wrapper fields more strictly, avoids row-wise reconstruction for supported typed-only shredded projections, preserves residual binary/null values, typed-array null elements, nullable complex pruning, user value-only fields, and complex array elements with binary leaves, preserves full VARIANT projection when predicates read typed subpaths, handles dynamic root element access, optional top-level VARIANT, root/nested variant-array element paths, typed-map key lookup paths, and rejects Iceberg VARIANT writes explicitly.

- Does this need documentation: No
@eldenmoon eldenmoon force-pushed the codex/iceberg-v3-variant branch from 6587ec2 to 4dacd58 Compare May 16, 2026 17:22
@eldenmoon
Member Author

run buildall

@eldenmoon
Member Author

/review

eldenmoon added a commit to eldenmoon/incubator-doris that referenced this pull request May 16, 2026
### What problem does this PR solve?

Issue Number: N/A

Related PR: apache#63192

Problem Summary: Doris could not read Iceberg v3 VARIANT columns from Parquet files. This change maps Iceberg VARIANT to Doris VARIANT, validates the Parquet VARIANT wrapper shape from the VariantShredding spec, decodes unshredded metadata/value encoding, reads shredded typed_value columns, and prunes shredded Parquet leaf columns for accessed variant paths with profile observability. It keeps selected typed-only shredded projections on native Parquet typed columns when residual value columns are not selected, including Iceberg field-id pruning, and falls back to row-wise reconstruction only for complex or selected-residual layouts. It also keeps full VARIANT projection separate from predicate subpath pruning, supports element paths produced by variant array explode, forces a root read for dynamic VARIANT element access, and explicitly rejects Iceberg VARIANT writes because this PR only implements read support.

### Release note

Support reading Iceberg v3 VARIANT Parquet columns, including shredded typed_value column pruning and binary/UUID VARIANT values. Non-finite floating-point VARIANT values remain unsupported and return explicit errors. Writing Iceberg VARIANT columns is rejected with an explicit unsupported error.

### Check List (For Author)

- Test: Regression test / Unit Test / Manual test

    - Unit Test: ./run-be-ut.sh --run --filter='ParquetVariantReaderTest.*:NestedColumnAccessHelperTest.*:DeltaByteArrayDecoderTest.*:IcebergReaderCreateColumnIdsTest.*' (107 tests passed)

    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest,org.apache.doris.planner.IcebergTableSinkTest,org.apache.doris.planner.IcebergMergeSinkTest (55 tests passed: 51 PruneNestedColumnTest, 1 IcebergTableSinkTest, 3 IcebergMergeSinkTest; Maven reactor succeeded)

    - Regression test: performance regression coverage is included in regression-test/suites/external_table_p0/tvf/test_local_tvf_iceberg_variant.groovy, including profile assertions that typed-only projections increment VariantDirectTypedValueReadRows and keep VariantRowWiseReadRows at 0. Not run locally in this worktree because no local Doris cluster/output BE+FE runtime is available.

    - Manual test: PATH=/mnt/disk6/common/ldb_toolchain_toucan/bin:$PATH build-support/clang-format.sh

    - Manual test: PATH=/mnt/disk6/common/ldb_toolchain_toucan/bin:$PATH build-support/check-format.sh

    - Manual test: git diff --check

- Behavior changed: Yes. Doris can read Iceberg v3 VARIANT Parquet columns, supports spec-compliant binary and UUID VARIANT values, validates wrapper fields more strictly, avoids row-wise reconstruction for supported typed-only shredded projections, preserves residual binary/null values, typed-array null elements, nullable complex pruning, user value-only fields, and complex array elements with binary leaves, preserves full VARIANT projection when predicates read typed subpaths, handles dynamic root element access, optional top-level VARIANT, root/nested variant-array element paths, typed-map key lookup paths, and rejects Iceberg VARIANT writes explicitly.

- Does this need documentation: No
@eldenmoon eldenmoon force-pushed the codex/iceberg-v3-variant branch from 4dacd58 to e1dce69 Compare May 16, 2026 17:25
@eldenmoon
Member Author

run buildall

@eldenmoon
Member Author

/review

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 76.57% (2137/2791) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.67% (20800/38754)
Line Coverage 37.37% (197208/527695)
Region Coverage 33.76% (154499/457618)
Branch Coverage 34.84% (67625/194127)

@eldenmoon
Member Author

run buildall

@eldenmoon
Member Author

/review

Contributor

@github-actions github-actions Bot left a comment

Automated review summary for head 4dacd58. I reviewed the current diff with the existing inline review threads as known context and did not find an additional distinct issue that was not already covered by those threads. The existing VARIANT/Parquet correctness comments should still be resolved before merge if they remain applicable.

Critical checkpoint conclusions:

  • Goal/test coverage: the PR aims to add Iceberg/Parquet VARIANT read support, nested pruning, write rejection for unsupported VARIANT writes, and focused unit/regression coverage. Tests were added, but existing review threads identify remaining uncovered edge cases.
  • Scope: the change is large but focused on Iceberg/Parquet VARIANT support and related pruning/planning paths.
  • Concurrency/lifecycle/config: I did not identify new concurrency, lifecycle, or configuration risks in the reviewed diff.
  • Compatibility/write behavior: Iceberg VARIANT reads are introduced while writes are rejected in FE planning; BE/FE type mapping paths were checked at a high level.
  • Parallel paths: local Parquet, Hive, Iceberg, FE collector, and TVF paths were reviewed for consistency; no new distinct inconsistency beyond existing threads was found.
  • Data correctness/error handling: the highest risk remains VARIANT reconstruction/pruning correctness, already covered by existing inline threads.
  • Performance/observability: no additional blocking performance or observability issue found beyond the cost inherent in row-wise fallback paths.
  • User focus: no additional user-provided review focus was specified.

Contributor

@github-actions github-actions Bot left a comment

I found one remaining blocking correctness issue in the Parquet VARIANT reader.

Critical checkpoint conclusions:

  • Goal/test: the PR aims to read Iceberg/Parquet VARIANT, including shredded typed values and nested pruning. The current tests do not cover optional complex typed_value dispatch, and that valid path can still fail reader creation.
  • Scope/focus: the change is generally focused on VARIANT read/pruning support, but this reader factory dispatch gap affects the new VARIANT struct-reader path.
  • Concurrency/lifecycle/config/compatibility: no new concurrency, lifecycle, configuration, or storage-format compatibility concern found beyond the read-path correctness issue.
  • Parallel paths: the same factory dispatch is shared by local/Hive/Iceberg Parquet reads, so the fix should apply centrally.
  • Error handling/data correctness: the issue can turn valid optional complex typed shards into scalar readers with invalid physical column indexes, causing scan failure instead of returning data.
  • Test coverage: add coverage for optional complex typed_value shards (struct/list/map) selected through VARIANT pruning/full projection.
  • Observability/performance: no separate observability or performance blocker found.
  • User focus: no additional user-provided review focus was specified.

Please address the inline issue before merging.

@@ -115,7 +1667,7 @@ Status ParquetColumnReader::create(io::FileReaderSPtr file, FieldSchema* field,
size_t total_rows = row_group.num_rows;
if (field->data_type->get_primitive_type() == TYPE_ARRAY) {
std::unique_ptr<ParquetColumnReader> element_reader;
Contributor

The factory still dispatches ARRAY/MAP/STRUCT using field->data_type->get_primitive_type() without unwrapping nullable, so the new VARIANT struct-reader path can still send a valid optional complex typed_value child into the scalar-reader branch. For example, a VARIANT schema with optional group typed_value { optional int64 a; } selected by v['a'] reaches the top synthetic struct reader, then recurses into the typed_value child whose data_type is Nullable(Struct(...)); this skips the STRUCT branch and falls through with the group field's physical_column_index == -1. This is distinct from the earlier optional top-level VARIANT thread because the top synthetic struct is now non-nullable, but its optional complex children remain nullable. Please dispatch ARRAY/MAP/STRUCT on remove_nullable(field->data_type) as well, and add coverage for optional complex typed_value struct/list/map shards.
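The suggested fix amounts to unwrapping the nullable wrapper before branching. A minimal sketch with a hypothetical type model (the real code branches on `field->data_type->get_primitive_type()` over Doris `DataType` objects):

```cpp
#include <cassert>

// Hypothetical model: a Nullable wrapper points at its nested type,
// mirroring Doris DataTypeNullable around complex types.
enum class Kind { SCALAR, ARRAY, MAP, STRUCT };

struct DataType {
    Kind kind;
    const DataType* nested = nullptr; // set when this type is Nullable
    bool nullable = false;
};

const DataType& remove_nullable(const DataType& t) {
    return (t.nullable && t.nested != nullptr) ? *t.nested : t;
}

// Dispatch the reader-factory branch on the unwrapped type so an
// optional complex typed_value child (Nullable(Struct(...))) reaches
// the STRUCT branch instead of falling through to the scalar reader
// with the group field's physical_column_index == -1.
Kind reader_branch(const DataType& t) {
    return remove_nullable(t).kind;
}
```

The same unwrapping would apply to the ARRAY and MAP branches, since the factory is shared by local/Hive/Iceberg Parquet reads.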

Contributor

@github-actions github-actions Bot left a comment

I found one additional correctness issue in the new VARIANT row-wise typed array path. Existing threads already cover several other blocking VARIANT pruning/decoding gaps, including nullable dispatch, residual handling, binary nested arrays, metadata validation, and FE full-column access collection; I did not repeat those.

Critical checkpoint conclusions:

  • Goal/test: the PR goal is to support Iceberg/Parquet VARIANT reads and pruning; the implementation is broad and has tests, but the current code still misses valid typed-array cases and therefore does not fully satisfy data correctness.
  • Scope/parallel paths: the change is focused on VARIANT support but touches several parallel BE/FE paths, and the existing review context shows multiple parallel paths still need alignment.
  • Concurrency/lifecycle/config/transactions/persistence: I did not identify new concerns in this pass.
  • Compatibility: FE-BE protocol compatibility appears intentionally read-focused, with Iceberg VARIANT writes rejected in FE.
  • Test coverage: coverage has been expanded, but it is still missing the empty/all-null typed decimal array case below and the already-raised cases.
  • User focus: no additional user-provided review focus was specified.

case TYPE_ARRAY:
value->field = field;
fill_variant_field_info(value);
fill_variant_leaf_type_info(type, value);
Contributor

This direct typed-array branch derives base_scalar_type_id and dimensions from the row value before applying schema metadata. For a valid typed leaf such as typed_value: array<decimal(18,2)>, rows containing [] or [null] have no non-null element, so variant_util::get_field_info() leaves the scalar type as INVALID_TYPE/Nothing; fill_variant_leaf_type_info() only fills precision/scale and never restores the array element type or dimensions from field_schema.data_type. Those rows are then inserted with different/invalid VARIANT type metadata from non-empty rows, losing the Parquet typed-array fidelity. This is distinct from the existing decimal-array JSON round-trip thread because it happens in the direct typed leaf path even without serializing through JSON. Please fill the array scalar type/dimensions from the typed schema when field-derived info is invalid, and add coverage for empty and all-null typed decimal arrays.
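The suggested schema fallback can be sketched like this — hypothetical names (`FieldInfo`, `resolve_typed_array_info`) standing in for whatever `variant_util::get_field_info()` and `fill_variant_leaf_type_info()` actually produce:

```cpp
#include <cassert>

// Hypothetical condensed type metadata for a VARIANT typed-array leaf.
struct FieldInfo {
    int scalar_type_id;  // element type id, e.g. a decimal type
    int num_dimensions;  // array nesting depth
};

constexpr int INVALID_TYPE = -1;

// Rows like [] or [null] carry no non-null element, so the row-derived
// scalar type stays INVALID_TYPE; in that case take the element type
// and dimensions from the Parquet typed schema instead, keeping the
// VARIANT type metadata consistent across empty and non-empty rows.
FieldInfo resolve_typed_array_info(const FieldInfo& from_row,
                                   const FieldInfo& from_schema) {
    if (from_row.scalar_type_id == INVALID_TYPE) {
        return from_schema;
    }
    return from_row;
}
```

The real fix would route `field_schema.data_type` through the equivalent of `from_schema` in the direct typed-array branch before inserting the row.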

@hello-stephen
Contributor

TPC-H: Total hot run time: 30734 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e1dce6934ce20d01059a46068509f8d202249881, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17657	3841	3881	3841
q2	q3	10787	1379	826	826
q4	4688	464	357	357
q5	7569	2223	2077	2077
q6	239	175	137	137
q7	927	787	610	610
q8	9352	1807	1573	1573
q9	6517	4907	4883	4883
q10	6428	2097	1806	1806
q11	439	273	244	244
q12	691	422	292	292
q13	18253	3373	2770	2770
q14	259	260	240	240
q15	q16	820	762	709	709
q17	978	874	1007	874
q18	6863	5627	5510	5510
q19	1273	1274	1085	1085
q20	499	384	267	267
q21	5595	2545	2333	2333
q22	428	353	300	300
Total cold run time: 100262 ms
Total hot run time: 30734 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4254	4165	4153	4153
q2	q3	4508	4873	4296	4296
q4	2089	2237	1371	1371
q5	4384	4250	4290	4250
q6	226	174	129	129
q7	2049	1868	1731	1731
q8	2402	2130	2076	2076
q9	7797	7732	7611	7611
q10	4522	4439	4035	4035
q11	579	401	382	382
q12	892	751	509	509
q13	3328	3599	3073	3073
q14	277	303	292	292
q15	q16	702	749	672	672
q17	1310	1311	1311	1311
q18	7962	7392	6903	6903
q19	1111	1085	1127	1085
q20	2196	2194	1924	1924
q21	5235	4559	4352	4352
q22	521	448	399	399
Total cold run time: 56344 ms
Total hot run time: 50554 ms

eldenmoon added a commit to eldenmoon/incubator-doris that referenced this pull request May 16, 2026
### What problem does this PR solve?

Issue Number: N/A

Related PR: apache#63192

Problem Summary: Doris could not read Iceberg v3 VARIANT columns from Parquet files. This change maps Iceberg VARIANT to Doris VARIANT, validates the Parquet VARIANT wrapper shape from the VariantShredding spec, decodes unshredded metadata/value encoding, reads shredded typed_value columns, and prunes shredded Parquet leaf columns for accessed variant paths with profile observability. It keeps selected typed-only shredded projections on native Parquet typed columns when residual value columns are not selected, including Iceberg field-id pruning below VARIANT typed_value trees, and falls back to row-wise reconstruction only for complex or selected-residual layouts. It also keeps full VARIANT projection separate from predicate subpath pruning, supports element paths produced by variant array explode, forces a root read for dynamic VARIANT element access, and explicitly rejects Iceberg VARIANT writes because this PR only implements read support.

### Release note

Support reading Iceberg v3 VARIANT Parquet columns, including shredded typed_value column pruning and binary/UUID VARIANT values. Non-finite floating-point VARIANT values remain unsupported and return explicit errors. Writing Iceberg VARIANT columns is rejected with an explicit unsupported error.

### Check List (For Author)

- Test: Regression test / Unit Test / Manual test

    - Unit Test: ./run-be-ut.sh --run --filter='ParquetVariantReaderTest.*:NestedColumnAccessHelperTest.*:DeltaByteArrayDecoderTest.*:IcebergReaderCreateColumnIdsTest.*' (109 tests passed)

    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.SlotTypeReplacerTest,org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest (53 tests passed; Maven reactor succeeded)

    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest,org.apache.doris.planner.IcebergTableSinkTest,org.apache.doris.planner.IcebergMergeSinkTest (55 tests passed earlier in this worktree)

    - Regression test: performance regression coverage is included in regression-test/suites/external_table_p0/tvf/test_local_tvf_iceberg_variant.groovy, including profile assertions that typed-only projections increment VariantDirectTypedValueReadRows and keep VariantRowWiseReadRows at 0. Not run locally in this worktree because no local Doris cluster/output BE+FE runtime is available.

    - Manual test: PATH=/mnt/disk6/common/ldb_toolchain_toucan/bin:$PATH build-support/clang-format.sh

    - Manual test: PATH=/mnt/disk6/common/ldb_toolchain_toucan/bin:$PATH build-support/check-format.sh

    - Manual test: git diff --check

    - Manual test: CLANG_TIDY_BINARY=/mnt/disk6/common/ldb_toolchain_toucan/bin/clang-tidy-16 build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN (blocked by local analyzer errors in included be/src/util/jni-util.h after the changed-line warnings in this patch were fixed)

- Behavior changed: Yes. Doris can read Iceberg v3 VARIANT Parquet columns, supports spec-compliant binary and UUID VARIANT values, validates wrapper fields more strictly, avoids row-wise reconstruction for supported typed-only shredded projections, preserves residual binary/null values, typed-array null elements, nullable complex pruning, user value-only fields, and complex array elements with binary leaves, preserves full VARIANT projection when predicates read typed subpaths, handles dynamic root element access, optional top-level VARIANT, root/nested variant-array element paths, typed-map key lookup paths, Iceberg field-id paths below VARIANT typed_value trees, and rejects Iceberg VARIANT writes explicitly.

- Does this need documentation: No
@eldenmoon eldenmoon force-pushed the codex/iceberg-v3-variant branch from e1dce69 to f78e331 Compare May 16, 2026 17:56
@eldenmoon
Member Author

run buildall

@eldenmoon
Member Author

/review

### What problem does this PR solve?

Issue Number: N/A

Related PR: apache#63192

Problem Summary: Doris could not read Iceberg v3 VARIANT columns from Parquet files. This change maps Iceberg VARIANT to Doris VARIANT, validates the Parquet VARIANT wrapper shape from the VariantShredding spec, decodes unshredded metadata/value encoding, reads shredded typed_value columns, and prunes shredded Parquet leaf columns for accessed variant paths with profile observability. It keeps selected typed-only shredded projections on native Parquet typed columns when residual value columns are not selected, including Iceberg field-id pruning below VARIANT typed_value trees, and falls back to row-wise reconstruction only for complex or selected-residual layouts. It also keeps full VARIANT projection separate from predicate subpath pruning, supports element paths produced by variant array explode, forces a root read for dynamic VARIANT element access, and explicitly rejects Iceberg VARIANT writes because this PR only implements read support.

### Release note

Support reading Iceberg v3 VARIANT Parquet columns, including shredded typed_value column pruning and binary/UUID VARIANT values. Non-finite floating-point VARIANT values remain unsupported and return explicit errors. Writing Iceberg VARIANT columns is rejected with an explicit unsupported error.

### Check List (For Author)

- Test: Regression test / Unit Test / Manual test

    - Unit Test: ./run-be-ut.sh --run --filter='ParquetVariantReaderTest.*:IcebergReaderCreateColumnIdsTest.*:NestedColumnAccessHelperTest.*' (99 tests passed)

    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.SlotTypeReplacerTest (2 tests passed)

    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.SlotTypeReplacerTest,org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest,org.apache.doris.planner.IcebergTableSinkTest,org.apache.doris.planner.IcebergMergeSinkTest (57 tests passed; Maven reactor succeeded)

    - Regression test: performance regression coverage is included in regression-test/suites/external_table_p0/tvf/test_local_tvf_iceberg_variant.groovy, including profile assertions that typed-only projections increment VariantDirectTypedValueReadRows and keep VariantRowWiseReadRows at 0. Not run locally in this worktree because no local Doris cluster/output BE+FE runtime is available.

    - Manual test: env PATH=/tmp/codex-clang-format:$PATH build-support/clang-format.sh

    - Manual test: env PATH=/tmp/codex-clang-format:$PATH build-support/check-format.sh

    - Manual test: git diff --check

    - Static analysis: CLANG_TIDY_BINARY=/mnt/disk1/claude-max/ldb_toolchain16/bin/clang-tidy-16 build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN was attempted. The first checked file reported no warnings; later files were blocked by pre-existing analyzer errors from included be/src/util/jni-util.h static_assert(false) branches.

- Behavior changed: Yes. Doris can read Iceberg v3 VARIANT Parquet columns, supports spec-compliant binary and UUID VARIANT values, validates wrapper fields more strictly, avoids row-wise reconstruction for supported typed-only shredded projections, preserves residual binary/null values, typed-array null elements, nullable complex pruning, user value-only fields, and complex array elements with binary leaves, preserves full VARIANT projection when predicates read typed subpaths, handles dynamic root element access, optional top-level VARIANT, root/nested variant-array element paths, typed-map key lookup paths, Iceberg field-id paths below VARIANT typed_value trees, and rejects Iceberg VARIANT writes explicitly.

- Does this need documentation: No
@eldenmoon eldenmoon force-pushed the codex/iceberg-v3-variant branch from f78e331 to 798ea04 Compare May 16, 2026 17:59
@eldenmoon
Member Author

run buildall

@eldenmoon
Member Author

/review

1 similar comment
@eldenmoon
Member Author

/review

@eldenmoon
Member Author

run buildall

@hello-stephen
Contributor

TPC-DS: Total hot run time: 169296 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e1dce6934ce20d01059a46068509f8d202249881, data reload: false

query5	4319	646	513	513
query6	333	229	205	205
query7	4237	529	307	307
query8	327	237	244	237
query9	8828	4078	4041	4041
query10	455	340	312	312
query11	5784	2402	2202	2202
query12	189	131	124	124
query13	1317	643	439	439
query14	6037	5329	5068	5068
query14_1	4316	4659	4360	4360
query15	233	204	186	186
query16	1038	460	445	445
query17	1160	741	627	627
query18	2640	479	349	349
query19	211	197	158	158
query20	137	132	127	127
query21	217	134	118	118
query22	13613	13544	13359	13359
query23	17102	16356	15948	15948
query23_1	16137	16112	16026	16026
query24	7751	1736	1272	1272
query24_1	1295	1277	1286	1277
query25	565	479	419	419
query26	1309	302	175	175
query27	2713	556	321	321
query28	4448	1928	1955	1928
query29	968	637	495	495
query30	305	252	196	196
query31	1114	1092	935	935
query32	85	76	74	74
query33	531	347	287	287
query34	1167	1172	628	628
query35	754	777	669	669
query36	1328	1334	1190	1190
query37	154	104	89	89
query38	3165	3118	3058	3058
query39	928	936	895	895
query39_1	857	883	861	861
query40	229	148	125	125
query41	67	67	65	65
query42	111	110	111	110
query43	336	336	287	287
query44	
query45	212	196	191	191
query46	1074	1199	710	710
query47	2285	2302	2200	2200
query48	373	419	302	302
query49	625	508	410	410
query50	1043	353	257	257
query51	4329	4229	4210	4210
query52	110	117	98	98
query53	253	279	210	210
query54	341	289	276	276
query55	94	93	87	87
query56	324	319	323	319
query57	1405	1387	1291	1291
query58	303	294	279	279
query59	1537	1614	1430	1430
query60	338	333	330	330
query61	185	182	185	182
query62	680	642	564	564
query63	248	211	209	209
query64	2480	882	702	702
query65	
query66	1736	500	381	381
query67	30063	29986	29860	29860
query68	
query69	451	346	299	299
query70	1003	982	945	945
query71	302	282	269	269
query72	3088	2746	2371	2371
query73	858	775	442	442
query74	5130	5002	4774	4774
query75	2706	2619	2305	2305
query76	2303	1161	778	778
query77	410	422	350	350
query78	12530	12231	11522	11522
query79	1438	1037	733	733
query80	1340	538	481	481
query81	502	273	246	246
query82	1308	159	117	117
query83	346	273	256	256
query84	260	144	111	111
query85	913	537	445	445
query86	454	362	297	297
query87	3454	3366	3177	3177
query88	3543	2652	2624	2624
query89	447	390	329	329
query90	1790	191	183	183
query91	180	177	144	144
query92	82	77	74	74
query93	1516	1423	821	821
query94	682	360	306	306
query95	670	400	443	400
query96	1007	766	347	347
query97	2697	2678	2552	2552
query98	241	228	229	228
query99	1094	1084	1032	1032
Total cold run time: 254516 ms
Total hot run time: 169296 ms

@github-actions Bot (Contributor) left a comment


I found two additional correctness issues in the BE Parquet VARIANT paths that are distinct from the existing review threads.

Checkpoint conclusions:

  • Goal/test: the PR aims to add Iceberg/Parquet VARIANT read, pruning, and write mapping support with BE/unit/regression coverage. The current tests cover many paths, but they do not prove residual non-JSON-native scalar type fidelity or metadata selection for root non-STRUCT typed_value residuals.
  • Scope/clarity: the change is broad but generally focused on VARIANT support; the remaining issues are localized to VariantMap decoding and nested column id selection.
  • Concurrency/lifecycle: no new concurrency, locking, or special lifecycle issue found in the reviewed paths.
  • Configuration/compatibility: no new config found. This does involve external Parquet/Iceberg format compatibility; the issues below affect valid VARIANT encodings and can produce incorrect decoded values or missing metadata.
  • Parallel paths: Hive/Iceberg/local pruning and JSON/VariantMap decode paths were checked against existing comments; I avoided duplicating already-known nullable/pruning/array/binary/object-offset issues.
  • Tests/results: added coverage is substantial, but missing for the two scenarios commented inline. Regression outputs appear deterministic where reviewed.
  • Observability/performance: no additional observability or obvious hot-path performance blocker found beyond existing comments.
  • Transaction/persistence/data writes: no FE transaction or persisted metadata issue found in this pass; Iceberg write support concerns already have existing context.
  • User focus: no additional user-provided review focus was specified.

Given the data-correctness risk for valid Parquet VARIANT encodings, I recommend changes before merge.

return Status::OK();
}

std::string json;

This VariantMap path still round-trips non-binary Parquet VARIANT primitives through JSON, which loses the primitive type carried by the VARIANT value encoding. For example, a residual root value encoded with primitive header 8/9/10 for DECIMAL reaches decode_primitive_to_variant_map(), decode_primitive() emits only a JSON number, and parse_json_to_variant_map() reconstructs it as a normal numeric field without the original decimal precision/scale metadata; the same applies to date/time/timestamp/UUID headers that are emitted as JSON numbers/strings. This is distinct from the existing typed-array JSON round-trip comments because it affects residual scalar VARIANT values on the VariantMap path. Please decode these primitive headers directly into FieldWithDataType instead of reparsing JSON, and add coverage for residual DECIMAL and temporal/UUID scalar values, including array elements.

if (typed_value != nullptr) {
const auto typed_value_type = remove_nullable(typed_value->data_type);
if (typed_value_type->get_primitive_type() != TYPE_STRUCT) {
typed_result = extract_variant_typed_nested_column_ids(*typed_value, {path},

This non-STRUCT root typed_value branch drops the needs_metadata signal returned by extract_variant_typed_nested_column_ids(). For a root typed array/map that contains a shredded residual child, e.g. typed_value: array<struct<metric: { value: binary }>> queried as v[0]['metric']['x'], the recursive extraction selects the nested value leaf and returns needs_metadata=true, but this branch never adds the top-level metadata column. Row-wise reconstruction then tries to decode the residual value without the inherited VARIANT metadata and can fail or decode from missing metadata. This is distinct from the existing root typed-array selection thread because the typed leaf is selected, but its required metadata is not. Please mirror the STRUCT branch by adding add_variant_metadata(variant_field, column_ids) when the non-STRUCT typed result needs metadata, with coverage for root typed array/map residual children.

@github-actions Bot (Contributor) left a comment

I found one additional issue. The main feature goal is to add Iceberg/Parquet VARIANT read support and nested pruning; the PR adds broad BE/FE tests, but this remaining FE collector case can still force full VARIANT reads for unused CTE producer outputs.

Checkpoint conclusions:

  • Scope/clarity: the change is large and mostly focused on VARIANT support, but many correctness edge cases are already tracked in existing review threads.
  • Concurrency/lifecycle: no new relevant shared-state concurrency or special lifecycle issue found in the reviewed FE collector path.
  • Configuration/compatibility: no new config; Iceberg VARIANT write compatibility is already covered by an existing thread.
  • Parallel paths: BE/Hive/Iceberg/local variant paths have many already-known parallel-path comments; I did not duplicate them.
  • Test coverage: broad regression/unit coverage was added, but this unused CTE producer pruning scenario is not covered.
  • User focus: no additional user-provided review focus was specified.

@@ -230,11 +231,15 @@ public Void visitLogicalGenerate(LogicalGenerate<? extends Plan> generate, State
public Void visitLogicalProject(LogicalProject<? extends Plan> project, StatementContext context) {
AccessPathExpressionCollector exprCollector
= new AccessPathExpressionCollector(context, allSlotToAccessPaths, false);

This uses an empty global allSlotToAccessPaths as proof that the current project is the query result, but inner producer projects can also be visited with no propagated demand. For example, WITH c AS (SELECT v FROM variant_tbl) SELECT 1 FROM c visits the CTE consumer first, records no access path for v, then visits the producer project with this map still empty. This branch records a root [v] path even though the outer query never uses v, so the scan is forced to read the full VARIANT column unnecessarily. Please distinguish true root/result projection demand from a producer/subquery with no consumer demand, and add coverage for an unused VARIANT CTE/subquery output.

