This file provides guidance to Claude Code when working with the Osiris repository.
Osiris MVP is an LLM-first conversational ETL pipeline generator that uses AI to understand intent, discover schemas, generate SQL, and create YAML pipelines - replacing template-based approaches with intelligent conversation.
- User starts conversation: osiris chat
- AI discovers database: Profiles tables and schemas
- User describes intent: Natural language request
- AI generates pipeline: Creates YAML pipeline
- Human validates: Reviews before execution
- Optional execution: Approved pipelines run locally or in E2B cloud
- Version: v0.5.4 PRODUCTION READY (October 2025)
- Testing: 1577+ tests passing (98.1% pass rate)
- Coverage: 78.4% overall (85.1% adjusted)
- Features: E2B Integration, Component Registry, Rich CLI, AIOP System, MCP v0.5.4
# Create and activate environment
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Initialize (sets base_path to current directory)
cd testing_env
python ../osiris.py init

# Core operations
make chat # Main conversational interface
osiris run pipeline.yaml --e2b # Run in E2B cloud sandbox
osiris logs aiop --last # Export AIOP for LLM analysis
# MCP operations (v0.5.4+)
osiris mcp run --selftest # Server self-test (<1.3s)
osiris mcp connections list --json
osiris mcp aiop list --json
# Development
make fmt # Auto-format code
make test # Run tests
make lint # Lint checks
make security # Security checks
# Pro Mode (custom LLM prompts)
osiris dump-prompts --export # Export prompts
# Edit in .osiris_prompts/
osiris chat --pro-mode # Use custom prompts

Skills are specialized tools invoked via the Skill tool when needed:
- codex: Interact with OpenAI Codex CLI for second opinions, multi-model analysis, and structured output generation. Useful for code review from different AI perspective, architectural validation, or when you need structured JSON responses with schemas.
Usage: "Use the codex skill to [task]" or "Invoke codex skill"
For building new Osiris components with AI assistance:
Entry Point: docs/developer-guide/ai/START-HERE.md
- Task-based routing for component development
- Decision trees (API type, auth, pagination)
- Working recipes (REST, GraphQL, SQL)
- 57 validation rules (COMPONENT_AI_CHECKLIST.md)
Key Documentation:
- E2B compatibility: docs/developer-guide/ai/e2b-compatibility.md (792 lines, 100% coverage)
- Error patterns: docs/developer-guide/ai/error-patterns.md (18+ common errors with fixes)
- Dependencies: docs/developer-guide/ai/dependency-management.md (requirements.txt, venv, E2B)
- Recipes: docs/developer-guide/ai/recipes/ (REST, GraphQL, SQL templates)
Coverage: 98% of critical topics (metrics, secrets, filesystem contract, data passing, E2B, dependencies)
- osiris/core/ - LLM-first functionality (agent, discovery, state, AIOP)
- osiris/connectors/ - Database adapters (MySQL, Supabase)
- osiris/drivers/ - Runtime implementations
- osiris/remote/ - E2B cloud execution
- osiris/cli/ - Rich-powered CLI with helpers for code reuse
- osiris/mcp/ - Model Context Protocol server with CLI-first security
- components/ - Component specs with x-connection-fields for override control
- LLM-First: AI handles discovery, SQL generation, pipeline creation
- Security-First: MCP zero-secret access via CLI delegation, x-connection-fields prevent credential overrides
- DRY Code: Shared helpers eliminate duplication
- Progressive Discovery: Intelligent schema exploration
- Human Validation: Expert approval required
- Agent-First Workflow: When the work plan allows, prefer using sub-agents (Task tool) for complex, multi-step tasks. This enables parallel execution, specialized expertise, and better resource management. Use sub-agents for exploration, testing, documentation, and any task that can be delegated.
MCP Server (NO SECRETS) → CLI Bridge → CLI Subcommands (HAS SECRETS)
- Code Reuse: MCP commands MUST reuse CLI logic via osiris/cli/helpers/
- Secret Masking: Use mask_connection_for_display() with spec-aware detection
- Delegation Pattern: MCP tools call CLI via run_cli_json()
- Filesystem Contract: All paths MUST be config-driven, never hardcoded (Path.home() forbidden)
- Override Control: Component specs use x-connection-fields to declare which fields can be overridden (see docs/reference/x-connection-fields.md)
- Handshake Instructions: MCP server provides usage instructions during the initialize handshake, guiding LLMs on proper OML workflow and validation requirements
Example:
# Shared helper (single source)
from osiris.cli.helpers.connection_helpers import mask_connection_for_display
result = mask_connection_for_display(config, family=family)  # Spec-aware

Components declare which fields come from connections and control override policies:
- override: allowed - Infrastructure fields (host, port) can be overridden for testing
- override: forbidden - Security fields (password, token) cannot be overridden
- override: warning - Ambiguous fields (headers) can override but emit a warning
See docs/reference/x-connection-fields.md for full specification.
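A sketch of what such a declaration might look like in a component spec (the field names are illustrative, not from a specific Osiris component; see the reference doc above for the actual schema):

```yaml
# Hypothetical x-connection-fields fragment showing the three override policies
x-connection-fields:
  - name: host
    override: allowed      # infrastructure: may be overridden for testing
  - name: password
    override: forbidden    # security: can never be overridden in a pipeline
  - name: headers
    override: warning      # ambiguous: override permitted, warning emitted
```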
- All tests in tests/ - Never create tests elsewhere
- Run from testing_env/ - Isolates artifacts
- Real credentials only - Never use fake passwords
- Use fixtures - tmp_path for temp directories
- Add suppressions - # noqa for intentional violations
# Test credentials (not real secrets)
config = {"password": "test123"} # pragma: allowlist secret
conn_str = "mysql://user:pass@localhost" # nosec B105
# Lazy imports for performance
import yaml # noqa: PLC0415
from osiris.core import Config # noqa: PLC0415, I001
# Required initialization order
setup_environment()
from osiris.mcp import Server # noqa: E402
# Complex CLI routers (naturally verbose)
def handle_command(args):  # noqa: PLR0915

Key suppression codes:
- pragma: allowlist secret - detect-secrets suppression
- nosec B105 - Bandit password false positives
- PLC0415 - Import not at top-level (lazy imports)
- E402 - Module import not at top (required order)
- PLR0915 - Too many statements (>50 lines)
- 3-Layer Validation: Schema → Semantic → Runtime
- Business Logic: Validator enforces requirements (e.g., primary_key for replace/upsert modes)
- Reference: See docs/reference/oml-validation.md for the complete validation architecture
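The business-logic layer above can be sketched as a simple semantic check. This is a hedged illustration, not the actual Osiris validator (see docs/reference/oml-validation.md); the step dict shape and the `check_write_mode` helper are assumptions:

```python
# Illustrative sketch of the semantic-layer rule: writers using
# replace/upsert mode must declare a primary_key.
def check_write_mode(step: dict) -> list[str]:
    """Return semantic validation errors for a hypothetical writer step."""
    errors = []
    config = step.get("config", {})
    mode = config.get("write_mode", "append")
    if mode in ("replace", "upsert") and not config.get("primary_key"):
        errors.append(
            f"step '{step.get('id')}': write_mode '{mode}' requires primary_key"
        )
    return errors

bad = {"id": "load_users", "config": {"write_mode": "upsert"}}
print(check_write_mode(bad))
# ["step 'load_users': write_mode 'upsert' requires primary_key"]
```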
- ~43 tests skip without credentials (normal)
- Supabase isolation: pytest -m "not supabase"
- Format first: Run make fmt before committing
- main: Protected, PR-only, always stable
- feature branches: All development work
- No direct commits to main
- Line length: 120 chars
- Pre-commit: Black, isort, Ruff, detect-secrets
- CI: Strict checks without auto-fix
- Emergency: make commit-wip or make commit-emergency
Structure:
- ADRs: docs/adr/ - Short, immutable architecture decisions
- Milestones: docs/milestones/<slug>/ - Initiative folders with:
  - 00-initiative.md - Index, goal, DoD, KPIs
  - 10-plan.md - Scope, risks, estimates
  - 20-execution.md - Checklists, PR links
  - 30-verification.md - Tests, metrics
  - 40-retrospective.md - Learnings
  - attachments/ - Reports, coverage data
- Reference: docs/reference/ - Stable specifications (e.g., oml-validation.md, x-connection-fields.md)
- AI Guides: docs/developer-guide/ai/ - AI-assisted component development
  - START-HERE.md - Entry point with task routing
  - Decision trees, recipes, error patterns, E2B guide
- Design: docs/design/ - Work-in-progress technical designs
- Reports: docs/reports/<date>-<topic>/ - One-off reports
- Archive: docs/archive/<slug>-v<semver>/ - Completed initiatives
Rules:
- Every non-trivial ADR spawns an initiative folder
- Update initiative index when scope/DoD changes
- Link reports from the initiative's attachments/
- Archive completed initiatives to keep active folders clean
AI Operation Package for LLM-consumable debugging:
- Multi-layered: Evidence, Semantic, Narrative, Metadata
- Deterministic: Stable IDs and timestamps
- Secret-free: Automatic DSN redaction
- Size-controlled: ≤300KB packages
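The "secret-free" property above rests on redacting credentials in DSN-style URLs. The real logic lives in osiris/core and may differ; this is only an illustrative sketch of the idea:

```python
import re

# Illustrative DSN redaction: mask the password portion of any
# scheme://user:password@host URL before it enters an AIOP package.
DSN_RE = re.compile(r"(?P<scheme>[a-z0-9+]+)://(?P<user>[^:/@]+):(?P<pw>[^@]+)@")

def redact_dsn(text: str) -> str:
    """Replace the password in DSN-style URLs with ***."""
    return DSN_RE.sub(lambda m: f"{m.group('scheme')}://{m.group('user')}:***@", text)

print(redact_dsn("mysql://user:s3cret@db.example.com:3306/app"))
# mysql://user:***@db.example.com:3306/app
```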
Critical: Never change core function signatures without review.
- osiris init automatically sets filesystem.base_path to the absolute path of the current directory
- Example: cd testing_env && osiris init → base_path: "/Users/padak/github/osiris/testing_env"
- All paths MUST be config-driven - Never use Path.home() or hardcode paths
- MCP logs structure: <base_path>/.osiris/mcp/logs/{audit,cache,telemetry}/
- Session artifacts: <base_path>/{.osiris_sessions, .osiris_cache, logs, output}/
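The filesystem contract above can be sketched as deriving every path from the configured base_path. The config key names and the `mcp_log_dir` helper here are assumptions for illustration, not the exact Osiris API:

```python
from pathlib import Path

# Config-driven path construction: everything hangs off filesystem.base_path,
# never Path.home() or a hardcoded location.
def mcp_log_dir(config: dict, kind: str) -> Path:
    """Resolve an MCP log subdirectory (audit/cache/telemetry) under base_path."""
    base = Path(config["filesystem"]["base_path"])
    return base / ".osiris" / "mcp" / "logs" / kind

cfg = {"filesystem": {"base_path": "/Users/padak/github/osiris/testing_env"}}
print(mcp_log_dir(cfg, "audit"))
```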
# Search order for .env files:
1. $OSIRIS_HOME/.env (if OSIRIS_HOME is set) - HIGHEST PRIORITY
2. Current directory
3. Project root
4. testing_env/ (when CWD)
# Setting secrets:
export MYSQL_PASSWORD="value" # Option 1: Export
echo "MYSQL_PASSWORD=value" > .env # Option 2: File
MYSQL_PASSWORD="value" osiris run ... # Option 3: Inline
# Using OSIRIS_HOME:
export OSIRIS_HOME="/path/to/project" # Ensures .env is loaded from project directory
osiris run pipeline.yaml # Works from any directory

❌ DON'T:
- Duplicate code between CLI and MCP
- Access secrets in MCP process
- Create tests outside tests/
- Use fake credentials
- Commit directly to main
✅ DO:
- Extract shared logic to helpers
- Delegate MCP to CLI subcommands
- Use pytest fixtures
- Run from testing_env/
- Create PRs for all changes
- Ensure validator checks business logic (e.g., primary_key for replace/upsert modes)
- For component development, start with docs/developer-guide/ai/START-HERE.md
- Test components with the --e2b flag for cloud compatibility
- Follow the 57 validation rules in COMPONENT_AI_CHECKLIST.md
Drivers use DuckDB tables for data exchange between pipeline steps. All data flows through a shared pipeline_data.duckdb file per session.
Drivers receive a ctx object with these methods:
Available methods:
- ✅ ctx.get_db_connection() - Get shared DuckDB connection for data exchange
- ✅ ctx.log_metric(name, value, **kwargs) - Log metrics to metrics.jsonl
- ✅ ctx.output_dir - Path to step's artifacts directory (Path object)

NOT available:
- ❌ ctx.log() - Does NOT exist! Use logger.info() instead
ALWAYS use Python's standard logging module. Never use ctx.log().
import logging

logger = logging.getLogger(__name__)

def run(*, step_id: str, config: dict, inputs: dict, ctx):
    logger.info(f"[{step_id}] Starting extraction")
    logger.error(f"[{step_id}] Failed: {error}")
    # Metrics go via ctx
    ctx.log_metric("rows_read", 1000)

Extractor pattern:

def run(self, *, step_id: str, config: dict, inputs: dict, ctx) -> dict:
    conn = ctx.get_db_connection()
    table_name = step_id
    # Stream data in batches
    for i, batch_df in enumerate(fetch_batches()):
        if i == 0:
            conn.execute(f"CREATE TABLE {table_name} AS SELECT * FROM batch_df")
        else:
            conn.execute(f"INSERT INTO {table_name} SELECT * FROM batch_df")
    ctx.log_metric("rows_read", total_rows)
    return {"table": table_name, "rows": total_rows}

Transform pattern:

def run(self, *, step_id: str, config: dict, inputs: dict, ctx) -> dict:
    conn = ctx.get_db_connection()
    input_table = inputs.get("table")  # From upstream step
    query = config["query"]  # SQL referencing input_table
    conn.execute(f"CREATE TABLE {step_id} AS {query}")
    row_count = conn.execute(f"SELECT COUNT(*) FROM {step_id}").fetchone()[0]
    return {"table": step_id, "rows": row_count}

Writer pattern:

def run(self, *, step_id: str, config: dict, inputs: dict, ctx) -> dict:
    conn = ctx.get_db_connection()
    table_name = inputs["table"]  # From upstream step
    # Read data from DuckDB
    df = conn.execute(f"SELECT * FROM {table_name}").df()
    # Write to destination (API, file, etc.)
    write_to_destination(df, config)
    ctx.log_metric("rows_written", len(df))
    return {}  # Writers return empty dict

ALWAYS test drivers in both environments before committing:

# 1. Local execution
osiris compile your_pipeline.yaml
osiris run --last-compile
# 2. E2B cloud execution
osiris run --last-compile --e2b --e2b-install-deps

Both environments use identical DuckDB-based data exchange - no special handling needed.
Every component needs x-runtime dependencies declared:
x-runtime:
driver: osiris.drivers.your_driver.YourDriver
requirements:
imports:
- pandas
- requests
packages:
- pandas
- requests>=2.0

osiris/
├── cli/ # CLI with helpers/
├── connectors/ # Database connectors
├── core/ # Core functionality
├── drivers/ # Runtime drivers
├── mcp/ # MCP server
├── remote/ # E2B execution
└── runtime/ # Local execution
- Current: v0.5.4 (Production Ready)
- Model: Opus 4.1 (claude-opus-4-1-20250805)
- Python: 3.11+ required
- Test Suite: ~196 seconds full run
For detailed information, see documentation in docs/.