Feature Request: Support Parallel Processing (Sharding) for DB and File Sources

# Support Parallel Processing (Sharding) for DB and File Sources

### Context

To support high-volume data imports, we need the ability to run multiple importer containers simultaneously. Each container should automatically process a unique slice of the data, so that no record is imported twice and containers do not contend for the same rows.

### Requirements

The system should respect two standard environment variables:
- `SHARD_COUNT`: The total number of parallel containers (the divisor used in the modulo check).
- `SHARD_INDEX`: The unique ID of the current container (0 to `SHARD_COUNT - 1`).
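A minimal sketch of how an importer might read and validate these two variables. The function name `read_shard_config` is hypothetical, and the defaults (`SHARD_COUNT=1`, `SHARD_INDEX=0`, meaning "no sharding") are an assumption; this issue does not specify what should happen when the variables are unset.

```python
import os
from typing import Mapping, Tuple


def read_shard_config(env: Mapping[str, str] = os.environ) -> Tuple[int, int]:
    """Return (shard_count, shard_index) parsed from environment variables.

    Assumption: when the variables are unset, fall back to a single shard
    so that an unsharded deployment keeps processing every record.
    """
    shard_count = int(env.get("SHARD_COUNT", "1"))
    shard_index = int(env.get("SHARD_INDEX", "0"))

    if shard_count < 1:
        raise ValueError(f"SHARD_COUNT must be >= 1, got {shard_count}")
    if not 0 <= shard_index < shard_count:
        raise ValueError(
            f"SHARD_INDEX must be in [0, {shard_count - 1}], got {shard_index}"
        )
    return shard_count, shard_index
```

Failing fast on an out-of-range index seems safer than silently processing nothing, since a misconfigured container would otherwise leave its slice of the data unimported.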

### Approach

**Database sources** can apply the logic to the primary key (or a configured field) in SQL, e.g. `WHERE MOD(record_id, SHARD_COUNT) = SHARD_INDEX`.
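The database filter could be sketched as follows, using an in-memory SQLite database purely for illustration. The table name `records` and the helper `shard_rows` are hypothetical. The sketch uses SQLite's `%` operator instead of `MOD()`, because `MOD()` is not available in every SQL dialect or build; the shard values are bound as parameters rather than interpolated into the query.

```python
import sqlite3
from typing import List


def shard_rows(
    conn: sqlite3.Connection,
    table: str,
    key_column: str,
    shard_count: int,
    shard_index: int,
) -> List[int]:
    """Return the keys of the rows that belong to one shard."""
    # Identifiers cannot be bound as parameters, so table and key_column
    # are assumed to come from trusted configuration, not user input.
    query = (
        f"SELECT {key_column} FROM {table} "
        f"WHERE {key_column} % ? = ? ORDER BY {key_column}"
    )
    return [row[0] for row in conn.execute(query, (shard_count, shard_index))]


# Illustrative data: ten records with integer keys 1..10.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (record_id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO records (record_id) VALUES (?)", [(i,) for i in range(1, 11)]
)

# Three shards together cover every record exactly once.
shards = [shard_rows(conn, "records", "record_id", 3, i) for i in range(3)]
```

This approach assumes a non-negative integer key. For string or UUID keys, or for databases where the modulo of a negative number is negative, a different partitioning expression would be needed.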

**File sources**, such as CSV, can apply the logic to the line number during iteration, e.g. `if current_line_number % SHARD_COUNT == SHARD_INDEX: ...`.
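A rough sketch of the file-source filter for CSV. The helper `iter_shard_rows` and its `has_header` flag are hypothetical. Note one deliberate deviation, stated as an assumption: the sketch counts parsed CSV records (zero-based, after the header) rather than physical line numbers, because a quoted CSV field may span several lines.

```python
import csv
import io
from typing import Iterable, Iterator, List


def iter_shard_rows(
    lines: Iterable[str],
    shard_count: int,
    shard_index: int,
    has_header: bool = True,
) -> Iterator[List[str]]:
    """Yield only the CSV records that belong to the given shard."""
    reader = csv.reader(lines)
    if has_header:
        # Skip the header so every shard sees the same record numbering.
        next(reader, None)
    for record_number, row in enumerate(reader):
        if record_number % shard_count == shard_index:
            yield row


# Illustrative input: a header plus five data records.
sample = "id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n"
shard_0 = list(iter_shard_rows(io.StringIO(sample), 2, 0))
shard_1 = list(iter_shard_rows(io.StringIO(sample), 2, 1))
```

Every container still reads the whole file and discards the records it does not own, so this divides the downstream processing work but not the file I/O.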

### Configuration

No changes to `config.yml` should be required. Sharding should be driven entirely by environment variables to allow for easy scaling in Docker Compose.
