🌱 sqlseed

Declarative SQLite Test Data Generation Toolkit

One line of code, tens of thousands of rows. Zero-config smart generation, AI-powered precision tuning.

import sqlseed

# Just one line. Auto-infers schema, auto-selects strategy, auto-optimizes writes.
result = sqlseed.fill("test.db", table="users", count=100_000)
print(result)
# → GenerationResult(table=users, count=100000, elapsed=2.34s, speed=42735 rows/s)

💡 Why sqlseed?

In development and testing workflows, we often need to populate SQLite databases with large volumes of realistic test data. Traditional approaches either require writing verbose data generation scripts or maintaining hard-to-scale SQL fixtures. sqlseed solves this with a declarative approach:

Feature	sqlseed	Hand-written Scripts	SQL Fixtures
Zero-config smart generation	✅	❌	❌
Automatic FK maintenance	✅	Manual	Manual
100K+ rows	✅ Streaming	⚠️ OOM	❌
Column semantic inference	✅ 9-level strategy	❌	❌
Reproducible generation	✅ seed	⚠️ Manual	✅
AI-powered tuning	✅ LLM	❌	❌
Config reuse	✅ YAML	❌	❌

✨ Core Features

🚀 Zero-Config Smart Generation

Auto-infers database schema and selects the best generator for each column via a 9-level strategy chain. Column named email? Generates email addresses. Column named *_at? Generates timestamps. No configuration needed.

🎯 Declarative Fine-Grained Control

Precisely control each column's data generation strategy, constraints, and null ratio via Python API or YAML/JSON configuration.

🔗 Automatic FK Ordering

Table dependencies are detected from foreign keys, and tables are filled in topological order. SharedPool shares values across tables to maintain referential integrity with zero configuration.

🌊 Streaming Memory Safety

DataStream yields batches via Iterator[list[dict]], so memory usage depends on the batch size, not the total row count: 1 million rows use about the same memory as 1,000 rows.
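The batching idea can be illustrated with a plain Python generator. This is a sketch, not sqlseed's DataStream: `stream_rows`, `make_user`, and the batch size are hypothetical. Rows are produced lazily and only one batch of dicts is alive at a time, so peak memory tracks `batch_size` rather than `count`.

```python
from collections.abc import Callable, Iterator
from itertools import islice

Row = dict[str, object]


def stream_rows(count: int, make_row: Callable[[int], Row], batch_size: int = 1_000) -> Iterator[list[Row]]:
    """Yield `count` rows in lists of at most `batch_size` (hypothetical helper)."""
    rows = (make_row(i) for i in range(count))  # lazy: nothing is built up front
    while batch := list(islice(rows, batch_size)):
        yield batch


def make_user(i: int) -> Row:
    return {"name": f"user_{i}", "age": 18 + i % 48}


total = 0
for batch in stream_rows(100_000, make_user, batch_size=1_000):
    total += len(batch)  # a real writer would INSERT the batch here, then drop it
print(total)  # → 100000
```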

🧮 Expression Engine & Constraint Solving

Supports derived column computation (short_code = project_no[-8:]), unique constraint backtracking, and timeout protection against infinite loops.

🤖 AI First-Class Citizen

The sqlseed-ai plugin uses an LLM to analyze schema semantics and auto-generates YAML config suggestions, refined by a self-correction loop.

🧩 11 Lifecycle Hooks

A pluggy-based plugin architecture that covers every stage, from provider registration to batch insertion.

📊 3-Tier PRAGMA Optimization

Automatically switches between LIGHT / MODERATE / AGGRESSIVE write strategies based on data volume to maximize throughput.
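A minimal sketch of volume-based tier selection using Python's sqlite3 module. The thresholds (10,000 and 100,000 rows) and the PRAGMA set per tier are assumptions chosen for illustration, not the values used by sqlseed's PragmaOptimizer; the PRAGMAs themselves are standard SQLite. Each step up trades durability for write speed, which is acceptable for disposable test data.

```python
import sqlite3

# Hypothetical tiers: each trades more durability for more write throughput.
TIERS: dict[str, list[str]] = {
    "LIGHT": ["PRAGMA synchronous = NORMAL"],
    "MODERATE": ["PRAGMA synchronous = NORMAL", "PRAGMA journal_mode = WAL",
                 "PRAGMA cache_size = -64000"],  # negative value = size in KiB (about 64 MB)
    "AGGRESSIVE": ["PRAGMA synchronous = OFF", "PRAGMA journal_mode = MEMORY",
                   "PRAGMA cache_size = -256000", "PRAGMA temp_store = MEMORY"],
}


def choose_tier(row_count: int) -> str:
    """Pick a tier from the planned row count (thresholds are assumptions)."""
    if row_count < 10_000:
        return "LIGHT"
    if row_count < 100_000:
        return "MODERATE"
    return "AGGRESSIVE"


def apply_tier(conn: sqlite3.Connection, row_count: int) -> str:
    tier = choose_tier(row_count)
    for stmt in TIERS[tier]:
        conn.execute(stmt)
    return tier


conn = sqlite3.connect(":memory:")
print(apply_tier(conn, 500_000))  # → AGGRESSIVE
print(conn.execute("PRAGMA synchronous").fetchone()[0])  # → 0 (OFF)
```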

📦 Installation

Basic

pip install sqlseed

Choose Data Engine

# Recommended: Mimesis (high performance, great locale support)
pip install "sqlseed[mimesis]"

# Optional: Faker (rich ecosystem)
pip install "sqlseed[faker]"

# Install all
pip install "sqlseed[all]"

Optional Plugins

# AI analysis plugin (requires openai SDK)
pip install sqlseed-ai

# MCP server (requires mcp SDK, lets AI assistants operate sqlseed)
pip install mcp-server-sqlseed

# MCP server + AI support (all-in-one)
pip install "mcp-server-sqlseed[ai]"

Docs Build (Developers)

pip install "sqlseed[docs]"   # mkdocs-material + mkdocstrings

📋 Full Dev Environment Setup

git clone https://github.com/sunbos/sqlseed.git
cd sqlseed

# Install core + all providers + dev dependencies
pip install -e ".[dev,all]"

# Optional plugins
pip install -e "./plugins/sqlseed-ai"
pip install -e "./plugins/mcp-server-sqlseed"

# Verify installation
pytest
ruff check src/ tests/
mypy src/sqlseed/

🚀 Quick Start

Try with Demo Database

Want to try sqlseed right away? Build the demo database:

python examples/build_demo_db.py

Then explore:

sqlseed preview examples/sqlseed_demo.db --table members --count 5
sqlseed inspect examples/sqlseed_demo.db --show-mapping
sqlseed fill examples/sqlseed_demo.db --table members --count 100

Get Started in 30 Seconds

Suppose you have a SQLite database app.db with a users table:

CREATE TABLE users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    email TEXT,
    age INTEGER,
    phone TEXT,
    created_at TEXT,
    is_active INTEGER DEFAULT 1,
    balance REAL
);

One line of code fills 10,000 rows of high-quality test data:

import sqlseed

result = sqlseed.fill("app.db", table="users", count=10_000)
print(result)
# → GenerationResult(table=users, count=10000, elapsed=0.52s, speed=19230 rows/s)

sqlseed automatically:

✅ Skips id (autoincrement PK)
✅ Skips is_active (has default value)
✅ name → generates real names
✅ email → generates email addresses
✅ age → generates integers 18–100
✅ phone → generates phone numbers
✅ created_at → generates datetime (matches *_at pattern)
✅ balance → generates floats

Fully zero-config. Smart inference for everything.

📖 Tutorials

Tutorial 1: Python API — Fine-Grained Control

For precise control over each column, declare generation strategies via the columns parameter:

import sqlseed

result = sqlseed.fill(
    "app.db",
    table="users",
    count=50_000,
    columns={
        # Shorthand: specify generator name directly
        "email": "email",
        "phone": "phone",

        # Full config: specify parameters
        "age": {"type": "integer", "min_value": 18, "max_value": 65},
        "balance": {"type": "float", "min_value": 0.0, "max_value": 100000.0, "precision": 2},
        "name": "name",

        # Random selection from candidate list
        "status": {"type": "choice", "choices": ["active", "inactive", "banned"]},
    },
    provider="mimesis",      # Use Mimesis engine
    locale="en_US",          # English locale
    seed=42,                 # Fixed seed for reproducibility
    clear_before=True,       # Clear table before generation
    enrich=True,             # Infer distribution from existing data
    transform="./transform_users.py",  # Custom transform per row
)
print(result)

Supported Generator Types

Generator	Description	Example Parameters
`string`	Random string	`min_length`, `max_length`, `charset`
`integer`	Integer	`min_value`, `max_value`
`float`	Float	`min_value`, `max_value`, `precision`
`boolean`	Boolean	—
`name`	Full name	—
`first_name`	First name	—
`last_name`	Last name	—
`email`	Email address	—
`phone`	Phone number	—
`address`	Address	—
`company`	Company name	—
`url`	URL	—
`ipv4`	IPv4 address	—
`uuid`	UUID	—
`date`	Date	`start_year`, `end_year`
`datetime`	Datetime	`start_year`, `end_year`
`timestamp`	Unix timestamp	—
`text`	Long text	`min_length`, `max_length`
`sentence`	Sentence	—
`password`	Password	`length`
`choice`	Pick from list	`choices`
`json`	JSON string	`schema`
`pattern`	Regex match	`regex`
`bytes`	Binary data	`length`
`username`	Username	—
`city`	City	—
`country`	Country	—
`state`	State/Province	—
`zip_code`	Zip/Postal code	—
`job_title`	Job title	—
`country_code`	Country code	—
`foreign_key`	FK reference	`ref_table`, `ref_column`, `strategy`
`skip`	Skip (use default/NULL)	—

Tutorial 2: Multi-Table Associations — Automatic FK Integrity

Use the context manager pattern to handle cross-table data dependencies:

import sqlseed

with sqlseed.connect("app.db", provider="mimesis", locale="en_US") as db:
    # Step 1: Fill parent table first
    db.fill("users", count=10_000, seed=42)

    # Step 2: Fill child table — sqlseed auto-detects FK constraints
    #         and picks random values from users.id for orders.user_id
    db.fill("orders", count=50_000, columns={
        "amount": {"type": "float", "min_value": 9.99, "max_value": 999.99, "precision": 2},
        "quantity": {"type": "integer", "min_value": 1, "max_value": 20},
        "status": {"type": "choice", "choices": ["pending", "paid", "shipped", "delivered"]},
    })

    # Step 3: View generation report
    print(db.report())
    # → Database: app.db
    # → ==================================================
    # →   users: 10000 rows
    # →   orders: 50000 rows
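Conceptually, the `random` FK strategy draws each child value from the set of existing parent keys. The sketch below reproduces that idea with the standard sqlite3 module on an in-memory database; the `orders` schema and the `fill_orders` helper are hypothetical, and this is not sqlseed's RelationResolver.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the FK below
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL REFERENCES users(id),
    amount REAL
);
""")


def fill_orders(conn: sqlite3.Connection, count: int, seed: int = 42) -> None:
    rng = random.Random(seed)
    # Parent keys must exist before the child table is filled (topological order).
    parent_ids = [row[0] for row in conn.execute("SELECT id FROM users")]
    rows = [(rng.choice(parent_ids), round(rng.uniform(9.99, 999.99), 2)) for _ in range(count)]
    conn.executemany("INSERT INTO orders (user_id, amount) VALUES (?, ?)", rows)


conn.executemany("INSERT INTO users (name) VALUES (?)", [(f"user_{i}",) for i in range(100)])
fill_orders(conn, 500)
orphans = conn.execute(
    "SELECT COUNT(*) FROM orders o LEFT JOIN users u ON o.user_id = u.id WHERE u.id IS NULL"
).fetchone()[0]
print(orphans)  # → 0
```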

💡 Tip: If two tables share a column name (e.g., member_no), even without a declared FK constraint, sqlseed automatically maintains cross-table consistency via the SharedPool implicit association mechanism.
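The implicit association can be pictured as a registry keyed by column name: the first table to generate member_no publishes its values, and later tables with a column of the same name draw from them. The `SharedPool` class below is a hypothetical, simplified stand-in written for illustration; sqlseed's real implementation lives in core/relation.py and may behave differently.

```python
import random


class SharedPool:
    """Hypothetical, simplified cross-table value pool keyed by column name."""

    def __init__(self, seed: int = 0) -> None:
        self._pools: dict[str, list[object]] = {}
        self._rng = random.Random(seed)

    def register(self, column: str, values: list[object]) -> None:
        self._pools.setdefault(column, []).extend(values)

    def has(self, column: str) -> bool:
        return bool(self._pools.get(column))

    def draw(self, column: str) -> object:
        return self._rng.choice(self._pools[column])


pool = SharedPool(seed=42)
members = [{"member_no": f"M-{i:04d}"} for i in range(50)]  # table A generates fresh values
pool.register("member_no", [m["member_no"] for m in members])

# Table B has a column with the same name, so it reuses values instead of inventing new ones.
visits = [{"member_no": pool.draw("member_no")} for _ in range(200)]
known = {m["member_no"] for m in members}
print(all(v["member_no"] in known for v in visits))  # → True
```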

Explicit Cross-Table Associations (ColumnAssociation)

When the target column name differs from the source (e.g., department_id → id), or there's no FK constraint but you need an association, declare it explicitly via associations:

db_path: "app.db"
provider: mimesis

tables:
  - name: departments
    count: 5
    clear_before: true
  - name: employees
    count: 20
    clear_before: true

associations:
  - column_name: department_id     # Column name in the target table
    source_table: departments      # Source table providing values
    source_column: id              # Column name in source table (defaults to column_name)
    target_tables:                 # Target tables using this association
      - employees
    strategy: shared_pool          # Association strategy

With this config, employees.department_id values are drawn from departments.id even though there is no FOREIGN KEY (department_id) REFERENCES departments(id) constraint.

Tutorial 3: YAML Config-Driven Batch Generation

For complex multi-table scenarios, use YAML configuration:

1. Generate config template

sqlseed init generate.yaml --db app.db

2. Edit config file

# generate.yaml
db_path: "app.db"
provider: mimesis
locale: en_US
optimize_pragma: true

tables:
  - name: users
    count: 100000
    clear_before: true
    seed: 42
    columns:
      - name: username
        generator: name
      - name: email
        generator: email
      - name: phone
        generator: phone
      - name: age
        generator: integer
        params:
          min_value: 18
          max_value: 65
      - name: status
        generator: choice
        params:
          choices: [0, 1, 2]
        null_ratio: 0.05       # 5% chance of NULL

  - name: orders
    count: 500000
    batch_size: 10000          # 10K rows per batch, optimizes memory
    columns:
      - name: user_id
        generator: foreign_key
        params:
          ref_table: users
          ref_column: id
          strategy: random
      - name: amount
        generator: float
        params:
          min_value: 1.0
          max_value: 9999.99
          precision: 2
      - name: created_at
        generator: datetime
        params:
          start_year: 2024

3. Execute generation

sqlseed fill --config generate.yaml

Or in Python:

import sqlseed

results = sqlseed.fill_from_config("generate.yaml")
for r in results:
    print(r)

Tutorial 4: Derived Columns & Expression Engine

sqlseed v2.0 introduces a column dependency DAG and an expression engine for computing derived columns:

# Project info table scenario
tables:
  - name: projects
    count: 10000
    columns:
      - name: project_no
        generator: pattern
        params:
          regex: "PRJ-\\d{6}"       # Project number pattern
        constraints:
          unique: true

      - name: short_code
        derive_from: project_no       # Depends on project_no
        expression: "value[-6:]"   # Last 6 chars
        constraints:
          unique: true

      - name: region_code
        derive_from: project_no
        expression: "value[-4:]"   # Last 4 chars

      - name: member_no
        generator: pattern
        params:
          regex: "M-\\d{6}"         # Member number pattern; \d{4} would allow only 10,000 values, too few for 10,000 unique rows
        constraints:
          unique: true

How it works:

1. sqlseed builds a column dependency DAG: project_no → short_code, region_code
2. A topological sort determines the generation order
3. project_no is generated first, then short_code is computed via value[-6:]
4. If short_code violates its unique constraint, sqlseed backtracks and regenerates project_no
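The four steps can be sketched in plain Python with the standard library's graphlib.TopologicalSorter. This is a simplified model, not sqlseed's ColumnDAG or ConstraintSolver: derived columns are plain lambdas instead of expression strings, every derived column is assumed to derive from project_no, and `generate_row` and its retry limit are hypothetical.

```python
import random
from graphlib import TopologicalSorter

rng = random.Random(42)

# column -> set of columns it depends on (project_no is the root column)
deps = {"project_no": set(), "short_code": {"project_no"}, "region_code": {"project_no"}}
derive = {  # stand-ins for the expressions value[-6:] and value[-4:]
    "short_code": lambda value: value[-6:],
    "region_code": lambda value: value[-4:],
}
unique_cols = {"project_no", "short_code"}
order = list(TopologicalSorter(deps).static_order())  # parents always come first


def gen_project_no() -> str:
    return f"PRJ-{rng.randrange(1_000_000):06d}"


def generate_row(seen: dict[str, set[str]], max_attempts: int = 100) -> dict[str, str]:
    for _ in range(max_attempts):
        row: dict[str, str] = {}
        for col in order:
            # Simplification: every derived column reads project_no as its `value`.
            row[col] = gen_project_no() if col == "project_no" else derive[col](row["project_no"])
        # Backtrack: if any unique column collides, regenerate the whole chain from the root.
        if all(row[c] not in seen[c] for c in unique_cols):
            for c in unique_cols:
                seen[c].add(row[c])
            return row
    raise RuntimeError("unique constraint could not be satisfied")


seen: dict[str, set[str]] = {c: set() for c in unique_cols}
rows = [generate_row(seen) for _ in range(1_000)]
print(order[0], len({r["short_code"] for r in rows}))  # → project_no 1000
```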

Expression Engine Functions (21 total)

Function	Usage	Description
`len(s)`	`len(value)`	Length
`int(s)`	`int(value)`	To integer
`str(s)`	`str(value)`	To string
`float(s)`	`float(value)`	To float
`hex(n)`	`hex(value)`	To hexadecimal
`oct(n)`	`oct(value)`	To octal
`bin(n)`	`bin(value)`	To binary
`abs(n)`	`abs(value)`	Absolute value
`min(*args)`	`min(a, b)`	Minimum
`max(*args)`	`max(a, b)`	Maximum
`upper(s)`	`upper(value)`	Uppercase
`lower(s)`	`lower(value)`	Lowercase
`strip(s)`	`strip(value)`	Trim both ends
`lstrip(s)`	`lstrip(value)`	Trim left
`rstrip(s)`	`rstrip(value)`	Trim right
`zfill(s, width)`	`zfill(value, 10)`	Zero-fill
`replace(s, old, new)`	`replace(value, "-", "")`	Replace
`substr(s, start, end)`	`substr(value, 0, 8)`	Substring
`lpad(s, width, char)`	`lpad(value, 8, "0")`	Left-pad
`rpad(s, width, char)`	`rpad(value, 8, "0")`	Right-pad
`concat(*args)`	`concat("PRE_", value)`	Concatenate
Slicing	`value[-8:]`	Python slice syntax
Math	`value * 2 + 1`	Basic arithmetic

⚠️ Safety: The expression engine is built on simpleeval with 5-second timeout protection. Imports, exec, and file I/O are not allowed.
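To show what a whitelist-based evaluator rejects in practice, here is a deliberately tiny one built on Python's ast module (Python 3.9+). It is a stand-in for simpleeval, not sqlseed's ExpressionEngine: it supports only constants, the name `value`, slicing, + - * arithmetic, and a few of the functions listed above, and it omits the timeout. Anything else, such as `__import__("os")` or attribute access, raises before any code runs.

```python
import ast
import operator
from typing import Any

FUNCS = {"len": len, "int": int, "str": str, "upper": str.upper, "lower": str.lower,
         "zfill": str.zfill, "concat": lambda *a: "".join(map(str, a))}
BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def safe_eval(expr: str, value: Any) -> Any:
    """Evaluate `expr` using only whitelisted syntax; raise ValueError otherwise."""

    def ev(node: ast.AST) -> Any:
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name) and node.id == "value":
            return value
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)  # e.g. the -6 in value[-6:]
        if isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
            return BINOPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Slice):
            parts = (node.lower, node.upper, node.step)
            return slice(*(ev(p) if p is not None else None for p in parts))
        if isinstance(node, ast.Subscript):
            return ev(node.value)[ev(node.slice)]
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in FUNCS and not node.keywords):
            return FUNCS[node.func.id](*(ev(a) for a in node.args))
        raise ValueError(f"disallowed syntax: {type(node).__name__}")

    return ev(ast.parse(expr, mode="eval"))


print(safe_eval("value[-6:]", "PRJ-123456"))              # → 123456
print(safe_eval('concat("PRE_", upper(value))', "abc"))   # → PRE_ABC
print(safe_eval("value * 2 + 1", 20))                     # → 41
```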

Tutorial 5: Transform Scripts — Complex Business Logic

For complex business logic that can't be expressed declaratively, write Python transform scripts:

1. Write transform script

# transform_users.py
def transform_row(row, ctx):
    """Called for every generated row."""

    # Calculate VIP level based on age (assumes users has a vip_level column)
    age = row.get("age") or 0  # nullable columns may yield None
    if age >= 60:
        row["vip_level"] = 3
    elif age >= 40:
        row["vip_level"] = 2
    else:
        row["vip_level"] = 1

    # Normalize phone format
    phone = row.get("phone", "")
    if phone and not phone.startswith("+1"):
        row["phone"] = f"+1{phone}"

    return row

2. Use in CLI

sqlseed fill app.db --table users --count 10000 --transform transform_users.py

3. Use in YAML

tables:
  - name: users
    count: 10000
    transform: "./transform_users.py"
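sqlseed's TransformLoader (core/transform.py) loads these scripts dynamically. A comparable effect can be had with importlib from the standard library; the sketch below writes a throwaway script, loads its `transform_row` function, and applies it to rows. `load_transform` is a hypothetical helper, and it passes `ctx=None` because the real context object is not described here.

```python
from __future__ import annotations

import importlib.util
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_transform(path: str | Path) -> Callable[[dict[str, Any], Any], dict[str, Any]]:
    """Import a user script by file path and return its transform_row (hypothetical helper)."""
    spec = importlib.util.spec_from_file_location("user_transform", str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the script's top-level code
    return module.transform_row


script = '''
def transform_row(row, ctx):
    age = row.get("age") or 0
    row["vip_level"] = 3 if age >= 60 else 2 if age >= 40 else 1
    return row
'''

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "transform_users.py"
    path.write_text(script)
    transform = load_transform(path)

print(transform({"age": 45}, None))  # → {'age': 45, 'vip_level': 2}
```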

Tutorial 6: Preview & Debug

Preview data before generating at scale:

Python API:

rows = sqlseed.preview("app.db", table="users", count=5, seed=42)
# Also supports enrich and transform parameters
rows = sqlseed.preview("app.db", table="users", count=5, seed=42, enrich=True)
for row in rows:
    print(row)
# → {'name': 'John Smith', 'email': 'jsmith@example.com', 'age': 32, ...}
# → {'name': 'Jane Doe', 'email': 'jdoe@test.org', 'age': 28, ...}
# → ...

CLI (Rich table output):

sqlseed preview app.db --table users --count 5

# ┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
# ┃ name       ┃ email                ┃ age ┃ created_at          ┃
# ┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
# │ John Smith │ jsmith@example.com   │ 32  │ 2024-03-15 08:23:11 │
# │ ...        │ ...                  │ ... │ ...                 │
# └────────────┴──────────────────────┴─────┴─────────────────────┘

View column mapping strategy:

sqlseed inspect app.db --table users --show-mapping

# See what generation strategy sqlseed chose for each column
# ┏━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
# ┃ Column     ┃ Type    ┃ Nullable ┃ Generator    ┃ Params       ┃
# ┡━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
# │ id         │ INTEGER │ ✗        │ skip         │ {}           │
# │ name       │ TEXT    │ ✗        │ name         │ {}           │
# │ email      │ TEXT    │ ✓        │ email        │ {}           │
# │ age        │ INTEGER │ ✓        │ integer      │ {min: 18...} │
# │ ...        │ ...     │ ...      │ ...          │ ...          │
# └────────────┴─────────┴──────────┴──────────────┴──────────────┘

Tutorial 7: Snapshots & Replay

Save a successful generation config for exact replay later:

# Generate and save snapshot
sqlseed fill app.db --table users --count 10000 --seed 42 --snapshot
# → Snapshot saved: <cache_dir>/snapshots/YYYY-MM-DD_HHMMSS_users.yaml

# Replay anytime
sqlseed replay <cache_dir>/snapshots/YYYY-MM-DD_HHMMSS_users.yaml
# → GenerationResult(table=users, count=10000, elapsed=0.52s, speed=19230 rows/s)

Use cases:

🧪 Reproducible test data in CI/CD
📋 Consistent test environments across teams
🔄 Quick database state reconstruction during development
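Replay relies on the same seed producing the same values. The principle can be shown with Python's random.Random: two generators seeded identically emit identical sequences. This only illustrates the idea; how sqlseed seeds Mimesis or Faker internally is not shown, and `sample_rows` is a hypothetical helper.

```python
import random


def sample_rows(seed: int, n: int = 3) -> list[dict[str, int]]:
    rng = random.Random(seed)  # isolated RNG: unaffected by other random() calls
    return [{"age": rng.randint(18, 65), "status": rng.choice([0, 1, 2])} for _ in range(n)]


first = sample_rows(42)
replay = sample_rows(42)
print(first == replay)  # → True: same seed, same rows
print(first == sample_rows(43))  # a different seed gives different rows (with high probability)
```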

Tutorial 8: AI-Powered Configuration (sqlseed-ai Plugin)

Let an LLM analyze your database schema and auto-generate config suggestions:

# Install AI plugin
pip install sqlseed-ai

# Set API credentials
export SQLSEED_AI_API_KEY="your-api-key"
export SQLSEED_AI_BASE_URL="https://your-llm-api-endpoint"

# AI analysis and config generation
sqlseed ai-suggest app.db --table projects --output projects.yaml

# AI suggestions with self-correction (3 rounds by default)
sqlseed ai-suggest app.db --table projects --output projects.yaml --verify

# Specify model (defaults to most popular free model)
sqlseed ai-suggest app.db --table projects --output projects.yaml --model nvidia/nemotron-3-super-120b-a12b:free

# Skip cache
sqlseed ai-suggest app.db --table projects --output projects.yaml --no-cache

AI Workflow:

1. Extract schema context (columns, indexes, sample data, FK, distribution)
2. Build LLM prompt with few-shot examples
3. LLM returns JSON column config suggestions
4. AiConfigRefiner auto-validates config correctness
5. If errors found (unknown generator, type mismatch, etc.), sends correction request to LLM
6. Up to 3 self-correction rounds, outputs validated YAML config
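Steps 4 to 6 form a validate-and-retry loop. The sketch below models it with a fake LLM callable so it runs offline; `KNOWN_GENERATORS`, `validate`, and `refine` are hypothetical simplifications and not the AiConfigRefiner API. With `max_retries=0` no correction is attempted, mirroring `--max-retries 0`.

```python
from typing import Callable

KNOWN_GENERATORS = {"name", "email", "integer", "choice", "datetime"}  # subset, for illustration
Config = dict[str, dict[str, str]]


def validate(config: Config) -> list[str]:
    """Return error messages; an empty list means the config is valid."""
    return [f"{col}: unknown generator {spec.get('generator')!r}"
            for col, spec in config.items() if spec.get("generator") not in KNOWN_GENERATORS]


def refine(ask_llm: Callable[[str], Config], prompt: str, max_retries: int = 3) -> Config:
    config = ask_llm(prompt)
    for _ in range(max_retries):
        errors = validate(config)
        if not errors:
            return config
        # Feed the errors back so the model can correct its own output.
        config = ask_llm(prompt + "\nFix these errors:\n" + "\n".join(errors))
    if validate(config):
        raise ValueError("config still invalid after retries")
    return config


def fake_llm(prompt: str) -> Config:
    # First answer uses a made-up generator; the "corrected" answer fixes it.
    if "Fix these errors" in prompt:
        return {"email": {"generator": "email"}}
    return {"email": {"generator": "email_address"}}


print(refine(fake_llm, "Suggest a config for users"))  # → {'email': {'generator': 'email'}}
```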

💡 Environment Variables: Supports SQLSEED_AI_API_KEY, SQLSEED_AI_BASE_URL, SQLSEED_AI_MODEL. Also supports OPENAI_API_KEY / OPENAI_BASE_URL as fallback. Defaults to auto-selecting the most popular free model from OpenRouter (base_url https://openrouter.ai/api/v1). Set --model or SQLSEED_AI_MODEL to specify a model.

Tutorial 9: MCP Server Integration

Let AI assistants (Claude, Cursor, etc.) operate sqlseed directly via the Model Context Protocol (MCP):

# Install MCP server
pip install mcp-server-sqlseed

# All-in-one: MCP server + AI support
pip install "mcp-server-sqlseed[ai]"

# Manual start (usually managed by MCP client)
python -m mcp_server_sqlseed

Configure MCP client (Claude Desktop example):

{
  "mcpServers": {
    "sqlseed": {
      "command": "mcp-server-sqlseed"
    }
  }
}

MCP Capabilities:

Type	Name	Description
📖 Resource	`sqlseed://schema/{db_path}/{table_name}`	Get table schema as JSON
🔍 Tool	`sqlseed_inspect_schema`	Inspect schema (columns, FK, indexes, samples, schema_hash)
🤖 Tool	`sqlseed_generate_yaml`	AI-driven YAML config generation with self-correction. Supports `api_key`/`base_url`/`model` overrides
⚡ Tool	`sqlseed_execute_fill`	Execute data generation (supports YAML config string, includes `enrich` option)

This means you can tell your AI assistant:

"Analyze the structure of the projects table in app.db, generate a YAML config, then fill 5000 rows."

The AI assistant will call sqlseed_inspect_schema → sqlseed_generate_yaml → sqlseed_execute_fill in sequence, without you writing any code.

Tutorial 10: Custom Provider Plugin

You can create your own data generation provider:

# my_provider.py
from __future__ import annotations
from typing import Any

from sqlseed.generators import UnknownGeneratorError

class MyCustomProvider:
    """Just implement the DataProvider Protocol. No base class required."""

    def __init__(self) -> None:
        self._locale: str = "en_US"

    @property
    def name(self) -> str:
        return "my_custom"

    def set_locale(self, locale: str) -> None:
        self._locale = locale

    def set_seed(self, seed: int) -> None:
        ...

    def generate(self, type_name: str, **params: Any) -> Any:
        if type_name == "string":
            return "custom_string"
        if type_name == "email":
            return "user@example.com"
        raise UnknownGeneratorError(type_name)

    # ... handle generator names you want to support
    # Full Protocol: src/sqlseed/generators/_protocol.py
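Because DataProvider is a Protocol, conformance is structural: any class with the right members qualifies, with no inheritance. The sketch below declares a hypothetical `ProviderLike` protocol containing only the four members shown above and checks it at runtime with typing.runtime_checkable. The real protocol in src/sqlseed/generators/_protocol.py may declare more members, so treat this as an illustration of the mechanism only.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ProviderLike(Protocol):
    """Hypothetical subset of sqlseed's DataProvider protocol."""

    @property
    def name(self) -> str: ...
    def set_locale(self, locale: str) -> None: ...
    def set_seed(self, seed: int) -> None: ...
    def generate(self, type_name: str, **params: Any) -> Any: ...


class TinyProvider:  # no base class, yet structurally compatible
    name = "tiny"

    def set_locale(self, locale: str) -> None: ...
    def set_seed(self, seed: int) -> None: ...

    def generate(self, type_name: str, **params: Any) -> Any:
        return "x"


print(isinstance(TinyProvider(), ProviderLike))  # → True
print(isinstance(object(), ProviderLike))        # → False
```

Note that a runtime isinstance check only verifies that the members exist, not their signatures; a static checker such as mypy is what validates the signatures.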

To reuse the built-in generator-name dispatch logic instead of hand-writing the routing in generate(), inherit from BaseProvider and override only what you need.

Registration method 1: via pyproject.toml entry-point (recommended)

[project.entry-points."sqlseed"]
my_custom = "my_provider:MyCustomProvider"

Registration method 2: via plugin hook

from sqlseed.plugins.hookspecs import hookimpl

class MyPlugin:
    @hookimpl
    def sqlseed_register_providers(self, registry):
        from my_provider import MyCustomProvider
        registry.register(MyCustomProvider())

🖥️ CLI Quick Reference

# ═══════════════════════════════════════
# 📋 Data Generation
# ═══════════════════════════════════════

# Fill data (--count required when not using --config)
sqlseed fill app.db --table users --count 10000

# Full parameters
sqlseed fill app.db -t users -n 100000 \
    --provider mimesis \
    --locale en_US \
    --seed 42 \
    --batch-size 10000 \
    --clear \
    --enrich \
    --snapshot

# YAML config-driven (count from config file)
sqlseed fill --config generate.yaml

# Transform script
sqlseed fill app.db -t users -n 10000 --transform transform.py

# Enable debug logging
SQLSEED_LOG_LEVEL=DEBUG sqlseed fill app.db -t users -n 10

# ═══════════════════════════════════════
# 🔍 Inspect & Preview
# ═══════════════════════════════════════

# Preview data (no write)
sqlseed preview app.db --table users --count 5

# List all tables
sqlseed inspect app.db

# View column mapping strategy
sqlseed inspect app.db --table users --show-mapping

# ═══════════════════════════════════════
# 📸 Snapshots & Replay
# ═══════════════════════════════════════

# Generate config template
sqlseed init generate.yaml --db app.db

# Replay snapshot
sqlseed replay <cache_dir>/snapshots/YYYY-MM-DD_HHMMSS_users.yaml

# ═══════════════════════════════════════
# 🤖 AI Features
# ═══════════════════════════════════════

# AI suggestions (requires sqlseed-ai)
sqlseed ai-suggest app.db -t users -o users.yaml
sqlseed ai-suggest app.db -t users -o users.yaml --verify

# Specify API config
sqlseed ai-suggest app.db -t users -o users.yaml --api-key sk-xxx --base-url https://api.openai.com/v1

# Control self-correction
sqlseed ai-suggest app.db -t users -o users.yaml --max-retries 0   # Disable
sqlseed ai-suggest app.db -t users -o users.yaml --no-verify       # Skip verification

# Skip cache
sqlseed ai-suggest app.db -t users -o users.yaml --no-cache

🧠 9-Level Smart Column Mapping

One of sqlseed's core highlights is the ColumnMapper's 9-level strategy chain. Each column is checked against the levels in priority order, and the first level that matches decides its generator:

Level 1 │ Autoincrement PK    PK + AUTOINCREMENT / INTEGER → skip
        ▼
Level 2 │ User config         columns={"email": "email"} highest priority
        ▼
Level 3 │ Custom exact match  Rules registered via plugin hooks
        ▼
Level 4 │ Built-in exact      74 rules: email→email, phone→phone, age→integer...
        ▼
Level 5 │ DEFAULT check       Has default → skip / __enrich__ (when enrich=True)
        ▼
Level 6 │ Custom pattern      Regex rules registered via plugin hooks
        ▼
Level 7 │ Built-in pattern    26 regexes: *_at→datetime, *_id→foreign_key, is_*→boolean...
        ▼
Level 8 │ NULLABLE fallback   Nullable → skip / __enrich__
        ▼
Level 9 │ Type-faithful       VARCHAR(32)→max 32 chars, INT8→0~255, BLOB(1024)→1024 bytes

What this means:

Column user_email → Level 7 pattern *_email → email generator ✅
Column is_verified → Level 7 pattern is_* → boolean generator ✅
Column type VARCHAR(20) → Level 9 type fallback → max 20-char string ✅
Column with DEFAULT 1 → Level 5 → skip generation ✅
Column gender with DEFAULT 'male' → Level 4 exact match → choice generator (exact match takes priority over DEFAULT) ✅
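The chain behaves like a first-match-wins lookup. The sketch below models some of the levels with tiny hypothetical rule tables (three exact names, three regexes) to show why gender beats its DEFAULT while is_active with a default is skipped. It is not ColumnMapper's code: levels 3 and 6 (plugin rules) and the `__enrich__` variants are omitted, and Level 9 is reduced to SQLite's "type name contains INT" affinity rule.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    type: str = "TEXT"
    nullable: bool = True
    has_default: bool = False
    autoincrement_pk: bool = False


EXACT = {"email": "email", "phone": "phone", "gender": "choice"}  # tiny stand-in for 74 rules
PATTERNS = [(re.compile(r".*_email$"), "email"), (re.compile(r".*_at$"), "datetime"),
            (re.compile(r"^is_.*"), "boolean")]  # tiny stand-in for 26 regexes


def map_column(col: Column, user: dict[str, str] | None = None) -> str:
    if col.autoincrement_pk:                      # Level 1
        return "skip"
    if user and col.name in user:                 # Level 2
        return user[col.name]
    if col.name in EXACT:                         # Level 4
        return EXACT[col.name]
    if col.has_default:                           # Level 5
        return "skip"
    for rx, gen in PATTERNS:                      # Level 7
        if rx.match(col.name):
            return gen
    if col.nullable:                              # Level 8
        return "skip"
    # Level 9 (simplified): SQLite gives INTEGER affinity to any type containing "INT"
    return "integer" if "INT" in col.type.upper() else "string"


print(map_column(Column("gender", has_default=True)))                 # → choice
print(map_column(Column("is_active", "INTEGER", has_default=True)))   # → skip
print(map_column(Column("user_email")))                               # → email
print(map_column(Column("code", "VARCHAR(20)", nullable=False)))      # → string
```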

🧩 Plugin System

sqlseed provides 11 hook points via pluggy, covering the full data generation lifecycle:

Hook	firstresult	Trigger
`sqlseed_register_providers`		Register custom data providers
`sqlseed_register_column_mappers`		Register custom column mapping rules
`sqlseed_ai_analyze_table`	✓	AI analyzes table schema (returns column config)
`sqlseed_pre_generate_templates`	✓	AI pre-computes candidate value pools
`sqlseed_before_generate`		Before data generation loop
`sqlseed_after_generate`		After data generation completes
`sqlseed_transform_row`		Per-row transform (hot path, mind performance)
`sqlseed_transform_batch`		Per-batch transform (supports chaining)
`sqlseed_before_insert`		Before each batch write to DB
`sqlseed_after_insert`		After each batch write to DB
`sqlseed_shared_pool_loaded`		After SharedPool registration (pool readable)
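The firstresult column follows pluggy's semantics: for those hooks, implementations are called in turn, the first non-None return value is used, and the remaining implementations are skipped; for the other hooks, every implementation runs and the non-None results are collected into a list. The stdlib-only sketch below mimics both behaviors with hypothetical plugin functions; it does not use pluggy itself and ignores pluggy's registration-order details.

```python
from typing import Any, Callable

Impl = Callable[..., Any]


def call_hook(impls: list[Impl], *, firstresult: bool, **kwargs: Any) -> Any:
    """Mimic pluggy's dispatch: return the first non-None result, or collect all of them."""
    results = []
    for impl in impls:
        result = impl(**kwargs)
        if result is None:
            continue
        if firstresult:
            return result  # later implementations are never called
        results.append(result)
    return None if firstresult else results


calls: list[str] = []


def analyzer_a(table: str) -> None:
    calls.append("a")  # declines by returning None


def analyzer_b(table: str) -> dict[str, str]:
    calls.append("b")
    return {"email": "email"}


def analyzer_c(table: str) -> dict[str, str]:
    calls.append("c")
    return {"email": "name"}


chosen = call_hook([analyzer_a, analyzer_b, analyzer_c], firstresult=True, table="users")
print(chosen)  # → {'email': 'email'}
print(calls)   # → ['a', 'b']
```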

🏗️ Project Architecture

src/sqlseed/
├── __init__.py              # Public API (fill, connect, fill_from_config, preview)
├── core/                    # ===== Core Orchestration =====
│   ├── orchestrator.py      # DataOrchestrator main engine
│   ├── mapper.py            # ColumnMapper 9-level strategy chain
│   ├── schema.py            # SchemaInferrer — columns, indexes, distribution
│   ├── relation.py          # RelationResolver + SharedPool — FK & cross-table sharing
│   ├── column_dag.py        # ColumnDAG — column dependency graph + topological sort
│   ├── expression.py        # ExpressionEngine — safe expressions (simpleeval + timeout)
│   ├── constraints.py       # ConstraintSolver — unique backtracking
│   ├── transform.py         # TransformLoader — dynamic user script loading
│   └── result.py            # GenerationResult dataclass
├── generators/              # ===== Generator Layer =====
│   ├── _protocol.py         # DataProvider Protocol + UnknownGeneratorError
│   ├── registry.py          # ProviderRegistry (entry-point auto-discovery)
│   ├── base_provider.py     # Built-in base generators (zero dependencies)
│   ├── faker_provider.py    # Faker adapter
│   ├── mimesis_provider.py  # Mimesis adapter
│   └── stream.py            # DataStream streaming + constraint backtracking
├── database/                # ===== Database Layer =====
│   ├── _protocol.py         # DatabaseAdapter Protocol (ColumnInfo, ForeignKeyInfo, IndexInfo)
│   ├── sqlite_utils_adapter.py   # Default adapter
│   ├── raw_sqlite_adapter.py     # sqlite3 fallback adapter
│   └── optimizer.py         # PragmaOptimizer 3-tier optimization
├── plugins/                 # ===== Plugin Layer =====
│   ├── hookspecs.py         # 11 pluggy hook definitions
│   └── manager.py           # PluginManager
├── config/                  # ===== Config Management =====
│   ├── models.py            # Pydantic models (GeneratorConfig/TableConfig/ColumnConfig)
│   ├── loader.py            # YAML/JSON load & save
│   └── snapshot.py          # Snapshot save & replay
├── cli/                     # ===== CLI =====
│   └── main.py              # click commands (fill/preview/inspect/init/replay/ai-suggest)
└── _utils/                  # ===== Internal Utilities =====
    ├── sql_safe.py          # quote_identifier — SQL injection protection
    ├── schema_helpers.py    # AUTOINCREMENT detection
    ├── metrics.py           # MetricsCollector performance metrics
    ├── paths.py             # get_cache_dir — platform cache directory
    ├── progress.py          # Rich progress bar
    └── logger.py            # structlog logging

plugins/
├── sqlseed-ai/              # AI plugin — LLM-driven smart configuration
│   └── src/sqlseed_ai/      # SchemaAnalyzer, AiConfigRefiner, few-shot examples...
└── mcp-server-sqlseed/      # MCP server — AI assistant integration
    └── src/mcp_server_sqlseed/   # FastMCP tools (sqlseed_inspect_schema/sqlseed_generate_yaml/sqlseed_execute_fill)

🛠️ Development

# Run tests (with coverage)
pytest

# Lint
ruff check src/ tests/

# Auto-fix
ruff check --fix src/ tests/

# Type check
mypy src/sqlseed/

Tests cover all core modules, with path structure mirroring src/: test_core/, test_database/, test_generators/, test_plugins/, test_config/, test_utils/.

Dependencies

Package	Core Dependencies	Description
`sqlseed`	sqlite-utils, pydantic, pluggy, structlog, pyyaml, click, rich, typing_extensions, simpleeval, rstr	rstr used for `pattern` generator regex matching
`sqlseed[faker]`	+ faker>=30.0	Faker data engine
`sqlseed[mimesis]`	+ mimesis>=18.0	Mimesis data engine (recommended)
`sqlseed[docs]`	+ mkdocs-material, mkdocstrings	Documentation build
`sqlseed-ai`	sqlseed, openai>=1.0	AI plugin, auto-registered via entry-point
`mcp-server-sqlseed`	sqlseed, mcp>=1.0	MCP server, standalone CLI tool
`mcp-server-sqlseed[ai]`	+ sqlseed-ai	MCP server with AI support

📄 License

AGPL-3.0-or-later

🌱 sqlseed — Stop writing fixtures. Start generating data.
