feat: pruning migration #3597

Open

EgeCaner wants to merge 8 commits into main from feat/pruning-migration

Conversation

@EgeCaner EgeCaner commented May 2, 2026

Adds an optional migration for history pruning (migration/historyprunner) that brings an existing node's database in line with the rolling pruner's retention window. It is activated whenever --prune-mode is passed (with or without a value, including --prune-mode=0), and is a no-op otherwise.

The migration runs in two pipelined, resumable phases. The keeper window is anchored on the L1-verified head (matching the rolling pruner's reorg-safe floor): with minHead = min(l1Head, chainHeight) and retainedBlocks taken from the --prune-mode=N value (defaulting to 0 when the flag is passed bare), the window covers [minHead - retainedBlocks + 1, chainHeight]. If minHead < retainedBlocks the chain is shorter than the window and the migration no-ops, leaving the rolling pruner to take over once blocks arrive.
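For concreteness, the window arithmetic amounts to this (a minimal sketch; keeperWindow and its signature are illustrative, not this PR's code):

// keeperWindow returns the inclusive block range the migration keeps.
// ok is false when the chain is shorter than the window, in which case
// the migration no-ops. min is the Go 1.21+ builtin.
func keeperWindow(l1Head, chainHeight, retainedBlocks uint64) (first, last uint64, ok bool) {
	minHead := min(l1Head, chainHeight) // reorg-safe floor
	if minHead < retainedBlocks {
		return 0, 0, false
	}
	return minHead - retainedBlocks + 1, chainHeight, true
}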

  • Stager — range-deletes the cold number-keyed buckets and wipes reverse-lookup buckets (TransactionBlockNumbersAndIndicesByHash, L1HandlerTxnHashByMsgHash, BlockHeaderNumbersByHash), then copies in-window entries from the live history buckets (ContractStorageHistory, ContractClassHashHistory, ContractNonceHistory) into a scratch namespace.
  • Restorer — wipes the live history buckets, copies keepers back from scratch, and rebuilds the reverse-lookup indices per block. Wipes the scratch namespace on completion.

Resume state (stagerProgress | restorerProgress | numRetainedBlocks, 24 bytes BE) is persisted between runs; the originally-configured retention pins the cutoff so a --prune-mode change mid-flight cannot retarget an in-progress migration (logged when ignored).
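For reference, the persisted layout can be pictured like this, using encoding/binary (a sketch assuming three big-endian uint64s in the order given above; the type and function names are illustrative):

// resumeState mirrors the persisted 24-byte big-endian layout:
// stagerProgress | restorerProgress | numRetainedBlocks.
type resumeState struct {
	stagerProgress, restorerProgress, numRetainedBlocks uint64
}

func (s resumeState) encode() []byte {
	buf := make([]byte, 24)
	binary.BigEndian.PutUint64(buf[0:8], s.stagerProgress)
	binary.BigEndian.PutUint64(buf[8:16], s.restorerProgress)
	binary.BigEndian.PutUint64(buf[16:24], s.numRetainedBlocks)
	return buf
}

func decodeResumeState(buf []byte) resumeState {
	// buf is assumed to be exactly 24 bytes, as persisted.
	return resumeState{
		stagerProgress:    binary.BigEndian.Uint64(buf[0:8]),
		restorerProgress:  binary.BigEndian.Uint64(buf[8:16]),
		numRetainedBlocks: binary.BigEndian.Uint64(buf[16:24]),
	}
}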

@EgeCaner EgeCaner requested a review from MaksymMalicki May 2, 2026 03:03

codecov Bot commented May 2, 2026

Codecov Report

❌ Patch coverage is 73.67150% with 109 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.87%. Comparing base (af1fe00) to head (b45d05e).

Files with missing lines Patch % Lines
migration/historyprunner/migrator.go 72.46% 32 Missing and 25 partials ⚠️
migration/historyprunner/restorer.go 61.01% 16 Missing and 7 partials ⚠️
migration/historyprunner/copy.go 67.50% 8 Missing and 5 partials ⚠️
migration/historyprunner/stager.go 78.12% 4 Missing and 3 partials ⚠️
node/migration.go 0.00% 6 Missing ⚠️
migration/historyprunner/committer.go 89.47% 1 Missing and 1 partial ⚠️
migration/progresslogger/block_progress_tracker.go 50.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main    #3597    +/-   ##
========================================
  Coverage   75.87%   75.87%            
========================================
  Files         387      393     +6     
  Lines       34918    35330   +412     
========================================
+ Hits        26493    26806   +313     
- Misses       6556     6614    +58     
- Partials     1869     1910    +41     

@EgeCaner EgeCaner force-pushed the feat/pruning-migration branch from 480986b to f403dea on May 4, 2026 09:10
@EgeCaner EgeCaner requested a review from rodrodros May 4, 2026 12:52
@EgeCaner EgeCaner force-pushed the feat/history-pruner branch from 15a5437 to 22030a1 on May 5, 2026 12:51
@EgeCaner EgeCaner force-pushed the feat/pruning-migration branch 2 times, most recently from 2220c26 to 6907a65 on May 5, 2026 13:00
Comment thread migration/historyprunner/committer.go
Comment thread migration/historyprunner/committer.go
Comment thread migration/historyprunner/copy.go
@EgeCaner EgeCaner force-pushed the feat/history-pruner branch from 4c820cc to b899d50 on May 7, 2026 17:06
@EgeCaner EgeCaner force-pushed the feat/pruning-migration branch from 6907a65 to 32ceaff on May 7, 2026 17:25
Base automatically changed from feat/history-pruner to main May 7, 2026 18:12
@EgeCaner EgeCaner force-pushed the feat/pruning-migration branch from d3606cb to 8ef5a41 on May 8, 2026 09:39
@EgeCaner EgeCaner force-pushed the feat/pruning-migration branch from 41c23ee to b45d05e on May 8, 2026 14:38
@rodrodros rodrodros enabled auto-merge (squash) May 9, 2026 14:39
@rodrodros rodrodros disabled auto-merge May 9, 2026 14:39
Comment on lines +106 to +109
l1Head, err := core.GetL1Head(database)
if err != nil {
return nil, err
}

If we have blocks but no L1 head, I agree we should stop, but a freshly synced node with no L1 head yet will fail this, right? Could the lack of an L1 head on a fresh node cause bad UX? 🤔

I've tested it: it crashes, and with no extra info about why, just:

16:13:19.401 09/05/2026 +01:00	ERROR	node/node.go:662	Error while running migrations	{"error": "key not found"}
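A guard along these lines would turn that into a clean no-op instead of a crash (a sketch on top of the quoted snippet; it assumes db.ErrKeyNotFound is the sentinel GetL1Head surfaces when the L1 head has never been written):

l1Head, err := core.GetL1Head(database)
if err != nil {
	if errors.Is(err, db.ErrKeyNotFound) {
		// Fresh sync: no L1-verified head to anchor on yet, so skip
		// the migration and let the rolling pruner take over later.
		return nil, nil
	}
	return nil, fmt.Errorf("get L1 head: %w", err)
}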

@rodrodros rodrodros left a comment

Overall this looks like really good work. Just sharing some questions and nitpicks.

I am curious about the cases where we start the migration close to the prune threshold:

  1. L1Head == L2Head: do we delete everything at this stage?
  2. L1Head - 1 == L2Head: are we left with only one block?

Comment on lines +85 to +90
func (m *Migrator) Migrate(
ctx context.Context,
database db.KeyValueStore,
network *networks.Network,
logger log.StructuredLogger,
) ([]byte, error) {

Please add descriptive errors to each of the err returns in this function; otherwise the error cause is obscure in many cases. The other functions are optional, but this is the main one.
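For example, each return could name the step that failed (a sketch; the messages are illustrative):

l1Head, err := core.GetL1Head(database)
if err != nil {
	return nil, fmt.Errorf("history pruning migration: get L1 head: %w", err)
}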

Comment on lines +54 to +73
return err
}
}
}

for addr := range diff.Nonces {
if err := move(
fillNonceHistoryKey(scratch.historyKey[:], &addr, blockBE),
fillNonceScratchKey(scratch.scratchKey[:], &addr, blockBE),
); err != nil {
return err
}
}

for addr := range diff.ReplacedClasses {
if err := move(
fillClassHashHistoryKey(scratch.historyKey[:], &addr, blockBE),
fillClassHashScratchKey(scratch.scratchKey[:], &addr, blockBE),
); err != nil {
return err

Error wrapper for this as well please

Comment on lines +186 to +189
batch := database.NewBatch()
if err := pruner.PruneBlockDataUpto(batch, oldestBlockKept); err != nil {
return err
}

This would benefit from error wrapping

Comment on lines +186 to +195
batch := database.NewBatch()
if err := pruner.PruneBlockDataUpto(batch, oldestBlockKept); err != nil {
return err
}
if err := wipeReverseLookupBuckets(batch); err != nil {
return err
}
if err := batch.Write(); err != nil {
return err
}

I am thinking that this batch can potentially be very big 🤔: you're pruning up to the first block to keep, and the bigger mainnet gets, the more changes the batch is going to hold.

Add to that wipeReverseLookupBuckets(batch), which I am not sure how much it adds, but it compounds too.

I think for safety, even if we are fine now, think of a huge mainnet (10 TB of data): would this still hold, or would it be better to batch it through?
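Something like a chunked flush would bound the batch size (a sketch; maxBatchEntries, the per-block loop, and deleteBlockData are illustrative stand-ins for however the pruner would expose incremental deletion):

const maxBatchEntries = 10_000 // illustrative threshold

batch := database.NewBatch()
pending := 0
flush := func() error {
	if err := batch.Write(); err != nil {
		return fmt.Errorf("write prune batch: %w", err)
	}
	batch = database.NewBatch()
	pending = 0
	return nil
}
for b := uint64(0); b < oldestBlockKept; b++ {
	if err := deleteBlockData(batch, b); err != nil { // hypothetical per-block delete
		return err
	}
	pending++
	if pending >= maxBatchEntries {
		if err := flush(); err != nil {
			return err
		}
	}
}
if err := flush(); err != nil { // flush the final partial chunk
	return err
}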

@rodrodros

I am thinking that instead of defaulting prune mode to 0 we could use a higher number (?). Wouldn't that keep these edge cases from being the default?

Comment on lines +293 to +301
batch := database.NewBatch()
err = core.WriteBlockHeaderNumberByHash(batch, header.Hash, oldestBlockKept-uint64(1))
if err != nil {
return nil, false, err
}
if err := batch.Write(); err != nil {
return nil, false, err
}

Maybe this can be part of setupRestorer, what do you think?
