Split type-checking into interface and implementation in parallel workers #21119

Open
ilevkivskyi wants to merge 14 commits into python:master from ilevkivskyi:intf-impl-parallel

@ilevkivskyi
Member

The general idea is very straightforward: when doing type-checking, we first type-check only module top-levels and those functions/methods that define/infer externally visible variables. Then we write the cache and send the new interface hash back to the coordinator to unblock more SCCs early. This makes parallel type-checking ~25% faster.
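To illustrate why unblocking dependents after the interface phase helps, here is a toy model (not mypy's actual scheduler). The module names, costs, and the `finish_time` helper are all hypothetical, and the model assumes unlimited workers; it only shows how a dependency chain shortens when dependents wait for the interface rather than the full check.

```python
from functools import lru_cache

# Hypothetical modules: name -> (interface_cost, implementation_cost, dependencies).
# The costs are made-up time units, not measurements from mypy.
MODULES = {
    "a": (1, 4, []),
    "b": (1, 4, ["a"]),
    "c": (1, 4, ["b"]),
}


def finish_time(split: bool) -> int:
    """Return when the last module is fully checked, assuming unlimited workers.

    If split is True, a dependent may start as soon as the interface of each
    dependency is ready; otherwise it waits for the full check to finish.
    """

    @lru_cache(maxsize=None)
    def interface_ready(name: str) -> int:
        intf_cost, _, deps = MODULES[name]
        start = max((unblocked(dep) for dep in deps), default=0)
        return start + intf_cost

    @lru_cache(maxsize=None)
    def fully_done(name: str) -> int:
        _, impl_cost, _ = MODULES[name]
        return interface_ready(name) + impl_cost

    def unblocked(name: str) -> int:
        return interface_ready(name) if split else fully_done(name)

    return max(fully_done(name) for name in MODULES)


print(finish_time(split=False), finish_time(split=True))
```

The speedup in this toy chain is much larger than the ~25% reported above, since real dependency graphs are not pure chains and workers are limited.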

However, this simple idea surfaced multiple quirks and old hacks. I address some of them in this PR, but I decided to handle the rest in follow-up PR(s) to limit the size of this one.

First, important implementation details:

  • On each select() call, the coordinator collects all responses, both interface and implementation ones, and processes them as a single batch. This simplifies reasoning and shouldn't affect performance.
  • We need to write indirect dependencies to a separate cache file, since they are only known after processing function bodies. I combine them with error messages in files called foo.meta_ex.ff. Not 100% sure about the name, couldn't find anything more meaningful.
  • Overload signatures are now processed as part of the top-level in the type checker. This is a big change, but it is unavoidable and it didn't cause any problems with the daemon.
  • Initializers (default values of function arguments) are now processed as part of the top-levels (to match runtime semantics). Btw @hauntsaninja you optimized them away in some cases. I am not sure this is safe in the presence of walrus, see e.g. testWalrus.
  • local_definitions() no longer yields methods of classes nested in functions. Such methods are added both to the symbol table of their actual class and to the module top-level symbol table, so yielding them caused double-processing.
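As a small runtime illustration of the point about initializers (this is plain Python behavior, not code from the PR): default values are evaluated when the `def` statement executes, so a parenthesized walrus inside a default binds a name in the enclosing scope before the function is ever called. The names `f` and `y` are made up for the example.

```python
# The default expression runs once, at definition time, in the enclosing scope.
def f(x: int = (y := 42)) -> int:
    return x


# `y` is already bound at module level here, although f() has not been called.
print(y)
print(f())
```

This is why a default containing a walrus can define a module-level variable, and so plausibly belongs with the top-level pass rather than with the function body.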

Now some smaller things I already fixed:

  • We used to have three scoping systems to track the current class in the type checker. One existed purely for the purpose of TypeForm support. I think two is enough, so I deleted the last one.
  • AwaitableGenerator return type wrapping used to happen during processing of the function body, which is obviously wrong.
  • Invalid function redefinitions sometimes caused duplicate errors in case of partial types/deferrals. Now they should not, as I explicitly skip them after emitting the first error.
  • Some generated methods were not marked as such. Now they are.

Finally, some remaining problems and how I propose to address them in followups:

  • Narrowing of final global variables is no longer preserved in functions, see testNarrowingOfFinalPersistsInFunctions. Supporting this would be tricky/expensive: it would require preserving binder state at the point of each function definition and restoring it later. IMO this is a relatively niche edge case, and we can simply "un-support" it (there is a simple workaround: add an assert in the function body). To be clear, there are no problems with a much more common use of this feature: preserving narrowing in nested functions/lambdas.
  • Support for --disallow-incomplete-defs in plugins doesn't work, see testDisallowIncompleteDefsAttrsPartialAnnotations. I think this should not be hard to fix (with some dedicated cleaner support). I can do this in a follow-up PR soon.
  • Around a dozen incremental tests are skipped in parallel mode because the order of error messages is more unstable now (which is expected). To be clear, we still group errors per module, but the order of modules is much less predictable now. If there are no objections, I am going to ignore the order of modules when comparing errors in incremental tests in a follow-up PR.
  • When inferred type variable variance is not ready, we fall back to covariance, see testPEP695InferVarianceNotReadyWhenNeeded. However, when processing function/method bodies in a later phase, variance is ready more often. Although this is an improvement, it creates an inconsistency between parallel mode and regular mode. I propose to address this by making the two-phase logic the default even without parallel checking, see below.
  • Finally, there are a few edge cases with --local-partial-types where behavior is different in parallel mode, see e.g. testLocalPartialTypesWithGlobalInitializedToNone. Again, the new behavior is IMO clearly better. However, it again creates an inconsistency with non-parallel mode. I propose to address this by enabling two-phase (interface then implementation) checking whenever --local-partial-types is enabled (globally, not per-file), even without parallel checking. Since --local-partial-types will be the default behavior soon (and hopefully the only behavior at some point), this will allow us to avoid discrepancies between parallel and regular checking. @JukkaL what do you think?
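For the first item in the list above, the workaround might look roughly like the sketch below. This is an assumed shape of the affected pattern, not a copy of testNarrowingOfFinalPersistsInFunctions; `load`, `CONFIG`, and `use` are hypothetical names.

```python
from typing import Final, Optional


def load() -> Optional[str]:
    # Stand-in for something that may return None.
    return "value"


CONFIG: Final = load()
assert CONFIG is not None  # narrows CONFIG at module top level


def use() -> int:
    # Workaround: repeat the assert so the narrowing is established inside
    # the function body, instead of relying on the top-level narrowing.
    assert CONFIG is not None
    return len(CONFIG)


print(use())
```

The extra assert costs one line per function, which seems to be the trade-off being proposed for not preserving binder state at every function definition.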

@ilevkivskyi ilevkivskyi requested a review from JukkaL March 31, 2026 18:34
@ilevkivskyi ilevkivskyi changed the title Split type-checking into interface and impplementation in parallel workers Split type-checking into interface and implementation in parallel workers Mar 31, 2026
@ilevkivskyi
Member Author

Oh btw, @JukkaL I think there is a bug in misc/diff-cache.py that may cause spurious diffs, see a TODO I added.

@github-actions
Contributor

Diff from mypy_primer, showing the effect of this PR on open source code:

cki-lib (https://gitlab.com/cki-project/cki-lib)
- cki_lib/krb_ticket_refresher.py:26: error: Call to untyped function "_close_to_expire_ticket" in typed context  [no-untyped-call]
+ cki_lib/krb_ticket_refresher.py:26: error: Call to untyped function "_close_to_expire_ticket" of "RefreshKerberosTicket" in typed context  [no-untyped-call]

discord.py (https://github.com/Rapptz/discord.py)
- discord/backoff.py:63: error: Incompatible default for parameter "integral" (default has type "Literal[False]", parameter has type "Literal[True]")  [assignment]
+ discord/backoff.py:63: error: Incompatible default for parameter "integral" (default has type "Literal[False]", parameter has type "T")  [assignment]
- discord/interactions.py:1109: error: Incompatible default for parameter "delay" (default has type "float | None", parameter has type "float")  [assignment]
- discord/interactions.py:1255: error: Incompatible default for parameter "delay" (default has type "float | None", parameter has type "float")  [assignment]
- discord/interactions.py:1645: error: Incompatible default for parameter "delay" (default has type "float | None", parameter has type "float")  [assignment]
- discord/webhook/async_.py:969: error: Incompatible default for parameter "delay" (default has type "float | None", parameter has type "float")  [assignment]

@ilevkivskyi
Member Author

All things in (small) mypy_primer are either good or neutral.

@hauntsaninja
Collaborator

Could be worth adding a test for the discord.py improvement
