Add skipDuplicates parameter on @entity directive#6458
Conversation
Add kw::SKIP_DUPLICATES constant, skip_duplicates bool field to ObjectType, and parsing logic in ObjectType::new() defaulting to false when absent.
Added SkipDuplicatesRequiresImmutable error variant, bool_arg validation for skipDuplicates in validate_entity_directives(), and three test functions covering non-boolean value, mutable entity, and timeseries+skipDuplicates.
Add TypeInfo::skip_duplicates(), InputSchema::skip_duplicates(), and EntityType::skip_duplicates() following the is_immutable() three-layer delegation pattern. Object types return immutable && skip_duplicates; Interface and Aggregation types return false.
Add skip_duplicates: bool field to RowGroup struct alongside immutable, update RowGroup::new() to accept the parameter, and wire it from entity_type.skip_duplicates() in RowGroups::group_entry(). All other call sites (RowGroupForPerfTest, test helpers, example) pass false.
Modify RowGroup::append_row() so that when immutable=true and skip_duplicates=true, cross-block duplicate inserts and Overwrite/Remove operations log warnings and return Ok(()) instead of failing. Same-block duplicates remain allowed. Default behavior (skip_duplicates=false) is preserved exactly. Added Logger field to RowGroup/RowGroups with CacheWeight impl, threaded through all construction sites. 5 unit tests covering all scenarios.
Added skip_duplicates: bool field to Table struct in relational.rs, wired from EntityType::skip_duplicates() in Table::new(), copied in Table::new_like(), and defaulted to false in make_poi_table().
Added logger parameter to Layout::update() and Layout::delete() in relational.rs. When table.immutable && table.skip_duplicates, these methods now log a warning and return Ok(0) instead of returning an error. Default immutable behavior (skip_duplicates=false) is preserved. Updated all callers including deployment_store.rs and test files. Added 4 unit tests with SkipDupMink entity type to verify both skip_duplicates and default immutable behavior.
Add conditional ON CONFLICT (primary_key) DO NOTHING clause to InsertQuery::walk_ast() when table.immutable && table.skip_duplicates. This handles cross-batch duplicates where an entity committed in a previous batch is re-inserted due to the immutable entity cache skipping store lookups. Two unit tests verify: skip_duplicates inserts include ON CONFLICT, default immutable inserts do not.
Restructured Layout::insert() from if-let-Err to match to capture affected_rows from InsertQuery::execute(). Tracks expected vs actual row counts across both the main batch path and the row-by-row fallback path. Logs a warning when affected_rows < expected for skip_duplicates immutable tables.
Add end-to-end runner test exercising @entity(immutable: true, skipDuplicates: true) with duplicate entity inserts across blocks. - tests/runner-tests/skip-duplicates/: subgraph with Ping entity using skipDuplicates, block handler that saves same ID every block - tests/tests/runner_tests.rs: skip_duplicates() test syncs 4 blocks and verifies entity is queryable (indexing did not fail)
3a426e4 to
78c451c
Compare
lutter
left a comment
There was a problem hiding this comment.
One thing to consider here, too, is that there is really a strict hierarchy of mutability: mutable entities, immutable entities, and immutable with skipUpdates entities. That's currently expressed with 2 booleans. It might be clearer to express that with an enum
| warn!(self.logger, "Skipping duplicate insert for immutable entity"; | ||
| "entity" => row.key().to_string(), | ||
| "block" => row.block(), | ||
| "previous_block" => prev.block()); |
There was a problem hiding this comment.
That shouldn't be a warning; it's what the user asked for. I actually don't think we should log here and below and can therefore avoid giving a logger to the RowGroup. It seems these logs are mostly good during development to make sure what we expect to happen does happen.
| if self.skip_duplicates { | ||
| warn!(self.logger, "Skipping unsupported operation for immutable entity"; | ||
| "entity_type" => self.entity_type.to_string(), | ||
| "operation" => format!("{:?}", row)); |
There was a problem hiding this comment.
That's not worth a warning either - it is expected behavior
|
|
||
| fn skip_duplicates(&self) -> bool { | ||
| match self { | ||
| TypeInfo::Object(obj_type) => obj_type.immutable && obj_type.skip_duplicates, |
There was a problem hiding this comment.
At this point, we should have checked enough that we don't need the obj_type.immutable guard
| } | ||
|
|
||
| #[test] | ||
| fn validate_entity_directives_skip_duplicates_non_boolean() { |
There was a problem hiding this comment.
There's a whole setup where you can just create test schemas and say what result validation should provide in schema/test_schemas. Just add these as new test cases there. It's all driven by the badly named agg() test.
| "expected_rows" => expected_rows, | ||
| "affected_rows" => affected_rows, | ||
| "skipped" => expected_rows - affected_rows); | ||
| } |
There was a problem hiding this comment.
We shouldn't log this as a warning; maybe as a debug! log - there's nothing wrong, and we don't need to alarm anybody with this log. The message should make it clear that there is no reason to worry.
| let ids = group.ids().join(", "); | ||
| warn!(logger, "Skipping immutable entity update in store layer"; | ||
| "entity_type" => group.entity_type.to_string(), | ||
| "ids" => ids); |
| let ids = group.ids().join(", "); | ||
| warn!(logger, "Skipping immutable entity delete in store layer"; | ||
| "entity_type" => group.entity_type.to_string(), | ||
| "ids" => ids); |
Summary
Add
skipDuplicatesparameter to the@entitydirective for immutable entities.@entity(immutable: true, skipDuplicates: true)skips duplicate inserts with warnings instead of failing indexing. This unblocks subgraph composition on Amp-powered subgraphs where SQL queries can produce the same entities across block ranges.Changes
skipDuplicatesfrom@entitydirective (requiresimmutable: true)ON CONFLICT (id) DO NOTHINGfor skip-duplicates INSERT queries, with affected-rows loggingDefault behavior (
skipDuplicates: false) is unchanged - duplicates still fail indexing.