Skip to content

fix: avoid internal errors for OneOf signature mismatches#21032

Open
myandpr wants to merge 2 commits intoapache:mainfrom
myandpr:20501-oneof-error-message
Open

fix: avoid internal errors for OneOf signature mismatches#21032
myandpr wants to merge 2 commits intoapache:mainfrom
myandpr:20501-oneof-error-message

Conversation

@myandpr
Copy link
Copy Markdown
Contributor

@myandpr myandpr commented Mar 18, 2026

Which issue does this PR close?

Rationale for this change

After #18769, unsupported calls such as SUM(Boolean) started producing misleading internal errors.

The immediate problem is that TypeSignature::OneOf aggregates branch failures into an Internal error, which then becomes visible during planning together with internal TypeSignatureClass::... details.

This PR fixes that path in the shared coercion logic, while preserving more actionable diagnostics where they can be determined reliably.

This PR is still narrower than #20070: it focuses on TypeSignature::OneOf error selection and the affected public function contracts, rather than changing the broader error-formatting flow.

What changes are included in this PR?

In datafusion/expr/src/type_coercion/functions.rs:

  • keep the existing success path when any OneOf branch matches
  • stop turning all-failed OneOf resolution into an Internal error
  • restore concrete arity errors instead of falling back to a generic mismatch
  • preserve concrete type errors for unique matching branches
  • preserve hinted errors such as the Binary -> String cast hint
  • combine same-arity Coercible mismatches into a concrete type error instead of falling back to a generic one
  • continue to propagate non-Plan errors only when no matching branch succeeds and no better planner diagnostic is available

For aggregate functions that need semantic diagnostics beyond what the generic OneOf logic can express:

  • restore function-specific diagnostics for sum(Boolean) and avg(Boolean)

For shared window-function implementations:

  • narrow the public signature() of first_value, last_value, and nth_value so validation, candidate signatures, and introspection all reflect the real public arity contract

Are these changes tested?

Yes.

Added or updated unit tests covering:

  • OneOf mismatches no longer returning Internal error
  • OneOf arity mismatches restoring concrete planning errors
  • unique matching branches preserving concrete type errors
  • same-arity Coercible mismatches combining into a concrete type error
  • hinted mismatches preserving the hint
  • function-specific semantic diagnostics for sum(Boolean) / avg(Boolean)

Updated sqllogictests for affected user-visible errors, including:

  • sum(Boolean)
  • avg(Boolean)
  • nth_value(...)
  • first_value(...)
  • last_value(...)
  • substr(...)
  • hex(...)
  • generate_series(...)

Are there any user-facing changes?

Yes.

For failed TypeSignature::OneOf function calls, DataFusion now returns normal planning errors instead of an Internal error, while preserving more specific diagnostics where possible, including:

  • concrete arity errors
  • concrete type mismatch errors
  • cast hints
  • restored semantic errors such as Sum not supported for Boolean

For first_value, last_value, and nth_value, the advertised candidate signatures now match the real public arity contract.

@github-actions github-actions bot added logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) labels Mar 19, 2026
@myandpr myandpr requested a review from martin-g March 19, 2026 09:23
@myandpr
Copy link
Copy Markdown
Contributor Author

myandpr commented Mar 19, 2026

fixed ci

@myandpr myandpr force-pushed the 20501-oneof-error-message branch from f13570a to 8866d95 Compare March 19, 2026 13:14
@myandpr
Copy link
Copy Markdown
Contributor Author

myandpr commented Mar 19, 2026

The current CI failures are due to an outdated clickbench.slt expectation rather than the OneOf changes in this PR.

The failing EXPLAIN for COUNT(DISTINCT ...) now gets optimized to ProjectionExec + PlaceholderRowExec because exact NDV can be derived from Parquet metadata.

That was fixed in #21050.

I have rebased this branch onto the latest main, so the updated expectation is now included here as well.

@myandpr
Copy link
Copy Markdown
Contributor Author

myandpr commented Mar 24, 2026

Hi @martin-g, just a gentle ping on this PR when you have a chance. Thanks!

);

let err = fields_with_udf(&current_fields, &MockUdf(signature)).unwrap_err();
let err = err.to_string();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can assert that the Err type is DataFusionError::Plan before calling .to_string() on it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

);

let err = fields_with_udf(&current_fields, &MockUdf(signature)).unwrap_err();
let err = err.to_string();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.


let err =
fields_with_udf(&current_fields, &AlwaysExecErrUdf(signature)).unwrap_err();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you can assert that err is DataFusionError::Exec

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

776F726C64

statement error Function 'hex' expects 1 arguments but received 2
statement error DataFusion error: Error during planning: Function 'hex' failed to match any signature
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the previous error was more actionable by the user.
Now the user will need to consult with the documentation of hex and compare it against the way (s)he tries to use it.

Copy link
Copy Markdown
Contributor Author

@myandpr myandpr Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @martin-g I agree the previous hex(1, 2) error was more actionable.

The current change intentionally fixes the Internal error path in TypeSignature::OneOf, but it does make some overload failures fall back to a more generic planning error. I avoided surfacing a branch-specific OneOf error directly because that turned out to be incorrect in other cases, for example mixed-arity overloads like nth_value(c5, 2, 3) or mixed-type overloads like sum(bool_col).

Do you think we should try to preserve more actionable planner diagnostics for cases like this in the current PR, or handle that in a follow-up by choosing the best OneOf planning error?

Let me know if you have any preference. Thanks!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do understand the reason for this PR!
Let's see what others think.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should try to preserve more actionable planner diagnostics for cases like this in the current PR, or handle that in a follow-up by choosing the best OneOf planning error?

It would be nice to avoid a regression in error messages, even if temporarily

How hard is it to get the old behavior back?

@myandpr
Copy link
Copy Markdown
Contributor Author

myandpr commented Mar 25, 2026

Hey @timsaucer @alamb, would appreciate another look at this PR when you have a chance. Thanks!

Copy link
Copy Markdown
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a clear example of the before & after error messages of this change? The tests only match a subset of the error message so it's hard to understand how this actually displays to the user.

I'll also link my thoughts on an approach from my previous PR: #20070 (comment)

@myandpr myandpr force-pushed the 20501-oneof-error-message branch from 33cf7ab to 3845631 Compare March 31, 2026 20:59
@github-actions github-actions bot added the functions Changes to functions implementation label Mar 31, 2026
Signed-off-by: yaommen <myanstu@163.com>
@myandpr myandpr force-pushed the 20501-oneof-error-message branch from 3845631 to 2ddec30 Compare April 1, 2026 19:58
@myandpr
Copy link
Copy Markdown
Contributor Author

myandpr commented Apr 1, 2026

@alamb @Jefffrey sorry for the delayed reply. I spent some time digging into this further and also looked through the discussion in #20070.

I updated the implementation to keep the OneOf fix in the shared coercion path, while restoring more actionable diagnostics where they can be determined reliably.

The current approach is:

  • keep the OneOf logic in the shared coercion path rather than special-casing individual functions there
  • preserve concrete arity errors instead of falling back to a generic mismatch
  • preserve concrete type errors for unique matching branches
  • preserve hinted errors such as the Binary -> String cast hint
  • combine same-arity Coercible mismatches into a concrete error instead of falling back to a generic one
  • keep function-specific semantic diagnostics only where the generic OneOf logic cannot express the intended error well, such as sum(Boolean) / avg(Boolean)
  • narrow the public signature() for first_value / last_value / nth_value so validation, candidate signatures, and introspection all reflect the real public contract

I also put together a few representative before/after examples from actual runs on origin/main vs this branch:

Case Why this case SQL Before (origin/main) After (current branch)
hex_arity arity mismatch on Spark function mentioned in review select hex(1, 2); DataFusion error: Error during planning: Internal error: Function 'hex' failed to match any signature, errors: Error during planning: Function 'hex' expects 1 arguments but received 2,Error during planning: Function 'hex' expects 1 arguments but received 2,Error during planning: Function 'hex' expects 1 arguments but received 2. This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues. No function matches the given name and argument types 'hex(Int64, Int64)'. You might need to add explicit type casts. Candidate functions: hex(Int64) hex(String) hex(Binary) DataFusion error: Error during planning: Function 'hex' expects 1 arguments but received 2. No function matches the given name and argument types 'hex(Int64, Int64)'. You might need to add explicit type casts. Candidate functions: hex(Int64) hex(String) hex(Binary)
substr_type unique same-arity type mismatch SELECT substr(1, 3); Error: Error during planning: Internal error: Function 'substr' failed to match any signature, errors: Error during planning: Function 'substr' requires String, but received Int64 (DataType: Int64).,Error during planning: Function 'substr' expects 3 arguments but received 2. This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues. No function matches the given name and argument types 'substr(Int64, Int64)'. You might need to add explicit type casts. Candidate functions: substr(str: String, start_pos: Int64) substr(str: String, start_pos: Int64, length: Int64) Error: Error during planning: Function 'substr' requires String, but received Int64 (DataType: Int64).. No function matches the given name and argument types 'substr(Int64, Int64)'. You might need to add explicit type casts. Candidate functions: substr(str: String, start_pos: Int64) substr(str: String, start_pos: Int64, length: Int64)
substr_binary_hint hinted type mismatch SELECT substr(arrow_cast('foo', 'Binary'), 1); Error: Error during planning: Internal error: Function 'substr' failed to match any signature, errors: Error during planning: Function 'substr' requires String, but received Binary (DataType: Binary). Hint: Binary types are not automatically coerced to String. Use CAST(column AS VARCHAR) to convert Binary data to String.,Error during planning: Function 'substr' expects 3 arguments but received 2. This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues. No function matches the given name and argument types 'substr(Binary, Int64)'. You might need to add explicit type casts. Candidate functions: substr(str: String, start_pos: Int64) substr(str: String, start_pos: Int64, length: Int64) Error: Error during planning: Function 'substr' requires String, but received Binary (DataType: Binary). Hint: Binary types are not automatically coerced to String. Use CAST(column AS VARCHAR) to convert Binary data to String.. No function matches the given name and argument types 'substr(Binary, Int64)'. You might need to add explicit type casts. Candidate functions: substr(str: String, start_pos: Int64) substr(str: String, start_pos: Int64, length: Int64)
sum_bool aggregate semantic diagnostic SELECT sum(bool_col) FROM (VALUES (true), (false), (null)) AS t(bool_col); Error: Error during planning: Internal error: Function 'sum' failed to match any signature, errors: Error during planning: Function 'sum' requires Decimal, but received Boolean (DataType: Boolean).,Error during planning: Function 'sum' requires UInt64, but received Boolean (DataType: Boolean).,Error during planning: Function 'sum' requires Int64, but received Boolean (DataType: Boolean).,Error during planning: Function 'sum' requires Float64, but received Boolean (DataType: Boolean).,Error during planning: Function 'sum' requires Duration, but received Boolean (DataType: Boolean).. This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues. No function matches the given name and argument types 'sum(Boolean)'. You might need to add explicit type casts. Candidate functions: sum(Decimal) sum(UInt64) sum(Int64) sum(Float64) sum(Duration) Error: Error during planning: Sum not supported for Boolean. No function matches the given name and argument types 'sum(Boolean)'. You might need to add explicit type casts. Candidate functions: sum(Decimal) sum(UInt64) sum(Int64) sum(Float64) sum(Duration)
nth_value_arity window function arity contract SELECT nth_value(c1, 2, 3) OVER (ORDER BY c1) FROM (VALUES (1), (2), (3)) AS t(c1); Error: Error during planning: Internal error: Function 'nth_value' failed to match any signature, errors: Error during planning: The function 'nth_value' expected zero argument but received 3,Error during planning: The function 'nth_value' expected 1 arguments but received 3,Error during planning: The function 'nth_value' expected 2 arguments but received 3. This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues. No function matches the given name and argument types 'nth_value(Int64, Int64, Int64)'. You might need to add explicit type casts. Candidate functions: nth_value(NullAry()) nth_value(Any) nth_value(Any, Any) Error: Error during planning: The function 'nth_value' expected 2 arguments but received 3. No function matches the given name and argument types 'nth_value(Int64, Int64, Int64)'. You might need to add explicit type casts. Candidate functions: nth_value(Any, Any)
log_same_arity ambiguous same-arity overload SELECT log(1, ''); Error: Error during planning: Internal error: Function 'log' failed to match any signature, errors: Error during planning: Function 'log' expects 1 arguments but received 2,Error during planning: Function 'log' expects 1 arguments but received 2,Error during planning: Function 'log' requires Decimal, but received String (DataType: Utf8).,Error during planning: Function 'log' requires Float, but received String (DataType: Utf8).. This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues. No function matches the given name and argument types 'log(Int64, Utf8)'. You might need to add explicit type casts. Candidate functions: log(Decimal) log(Float) log(Float, Decimal) log(Float, Float) Error: Error during planning: Function 'log' requires one of Decimal, Float, but received String (DataType: Utf8).. No function matches the given name and argument types 'log(Int64, Utf8)'. You might need to add explicit type casts. Candidate functions: log(Decimal) log(Float) log(Float, Decimal) log(Float, Float)

Would appreciate another look. Thanks.

Signed-off-by: yaommen <myanstu@163.com>
@myandpr myandpr requested a review from alamb April 1, 2026 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Type signature matching errors are overly verbose and show internal details

4 participants