Skip to content

Query-Doctor/analyzer

Repository files navigation

Query Doctor Analyzer

PostgreSQL index recommendation tool that analyzes your queries and suggests optimal indexes.

Features

  • Works with any language, ORM, or query builder
  • Automatic index recommendations based on real query execution
  • Cost-based analysis using PostgreSQL EXPLAIN plans
  • Optional production statistics sync for accurate cost estimates
  • Support for TimescaleDB hypertables and materialized views
  • Non-destructive (all analysis done in a local copy of the schema)

Modes

The analyzer can run in two modes:

Server Mode

Run as an HTTP server for real-time query analysis during development.

docker run --pull always -t -p 2345:2345 ghcr.io/query-doctor/analyzer

The server runs on port 2345 and can be connected to from app.querydoctor.com.

CI Mode

Run as a GitHub Action to automatically analyze queries from your test suite and post index recommendations as PR comments.

There are a couple assumptions about your CI pipeline we make for this to work.

  1. There are database queries that hit up a real postgres database in your pipeline. The source is not important, it could be e2e, load or integration tests. The queries can be run in a rolled-back transaction and it will still work fine.
  2. The final schema (after migrations run) is available for analyzer to introspect. And that every table has at least 1 row in it as part of your db seed. We use the database to do extra work by testing your query against different index configurations with your production stats, but all of that work is done in a transaction that's always rolled back. Data is never modified
  3. Your postgres is configured with at least one of the supported query sources.

Using pg_stat_statements (simpler, works well with service containers):

shared_preload_libraries='pg_stat_statements'

Using auto_explain (captures full execution plans):

shared_preload_libraries='auto_explain'
auto_explain.log_min_duration=0
auto_explain.log_analyze=true
auto_explain.log_verbose=true
auto_explain.log_buffers=true
auto_explain.log_format='json'
logging_collector=on
log_directory='/var/log/postgresql' # or any path you like
log_filename='postgres.log' # or any name you like

Optional

You have a production database you can pull statistics from (using a query given by us).

CI Mode Setup

Currently we only support GitHub Actions but it would not be difficult to add support for other CI platforms like Azure Pipelines.

GitHub Actions

There are two ways to run postgres in GitHub Actions: as a service container, or by configuring the postgres instance that ships with ubuntu-latest runners.

Option A: Service container (recommended)

This approach gives you full control over the postgres version and configuration via the services block. Pass postgres flags directly with command:.

  1. Set up the workflow trigger. Include both pull_request (for PR analysis) and push to your main branch (to establish a baseline for comparison).
on:
  pull_request:
  push:
    branches: [main]
  1. Add a postgres service with pg_stat_statements loaded and a health check.
jobs:
  run:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: myapp_test
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        command: >-
          postgres
          -c shared_preload_libraries=pg_stat_statements
    steps:
      - uses: actions/checkout@v4
      - name: Enable pg_stat_statements
        run: psql -h localhost -U postgres -d myapp_test -c "CREATE EXTENSION IF NOT EXISTS pg_stat_statements;"
        env:
          PGPASSWORD: postgres
  1. Run your migrations and seed scripts, then your test suite.
      - name: Migrate
        run: pnpm run migrate && pnpm run seed
        env:
          POSTGRES_URL: postgres://postgres:postgres@localhost/myapp_test
      - name: Run integration tests
        run: pnpm run test:integration
        env:
          POSTGRES_URL: postgres://postgres:postgres@localhost/myapp_test
  1. Run the analyzer.
      - name: Run query-doctor/analyzer
        uses: query-doctor/analyzer@main
        env:
          GITHUB_TOKEN: ${{ github.token }}
          SITE_API_ENDPOINT: https://api.querydoctor.com
          SOURCE_DATABASE_URL: postgres://postgres:postgres@localhost/myapp_test
          TOKEN: ${{ secrets.QUERY_DOCTOR_TOKEN }}
  1. Add pull-requests: write permission to your job.
jobs:
  run:
    permissions:
      contents: read
      pull-requests: write
    runs-on: ubuntu-latest

Option B: System postgres on ubuntu-latest

ubuntu-latest runners ship with postgres pre-installed. This skips the service container overhead but requires modifying the system config to load extensions.

  1. Set up the workflow trigger (same as above).

  2. Configure and start postgres.

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Postgres
        run: |
          sudo tee -a /etc/postgresql/16/main/postgresql.conf <<EOF
            shared_preload_libraries = 'auto_explain'
            auto_explain.log_min_duration = 0
            auto_explain.log_analyze = true
            auto_explain.log_verbose = true
            auto_explain.log_buffers = true
            auto_explain.log_format = 'json'
            logging_collector = on
            log_directory = '/var/log/postgresql'
            log_filename = 'postgres.log'
          EOF
          sudo tee /etc/postgresql/16/main/pg_hba.conf > /dev/null <<EOF
            host all all 127.0.0.1/32 trust
            host all all ::1/128 trust
            local all all peer
          EOF
          sudo systemctl start postgresql.service
          sudo -u postgres createuser -s -d -r -w me
          sudo -u postgres createdb testing
          sudo chmod 666 /var/log/postgresql/postgres.log

you can change sudo -u postgres createuser -s -d -r -w me to create a new user with a name of your choosing and sudo -u postgres createdb testing to create a db with a different name.

  1. Run your migrations and seed scripts.
      - name: Migrate
        run: pnpm run migrate && pnpm run seed
        env:
          POSTGRES_URL: postgres://me@localhost/testing
  1. Run your test suite against the same database. You can do this with any tool and use any query builder or ORM you like.
  2. Run the analyzer.
      - name: Run query-doctor/analyzer
        uses: query-doctor/analyzer@main
        env:
          GITHUB_TOKEN: ${{ github.token }}
          SITE_API_ENDPOINT: https://api.querydoctor.com
          SOURCE_DATABASE_URL: postgres://me@localhost/testing
          TOKEN: ${{ secrets.QUERY_DOCTOR_TOKEN }}
  1. Add pull-requests: write permission to your job.
jobs:
  run:
    permissions:
      contents: read
      pull-requests: write
    runs-on: ubuntu-latest
...

Statistics Synchronization

To make sure that we can most accurately emulate your production database, we need access to its stats.

You can use the following function to dump the stats for your database. Copy paste this into psql to create the function.

Since Postgres also keeps track of things like "most common values in table", part of the statistics table includes copies of cells from all tables. include_sensitive_info

CREATE OR REPLACE FUNCTION _qd_dump_stats(include_sensitive_info boolean) RETURNS jsonb AS $$
SELECT
  json_agg(t)
  FROM (
    SELECT
  c.table_name as "tableName",
  c.table_schema as "schemaName",
  cl.reltuples,
  cl.relpages,
  cl.relallvisible,
  n.nspname as "schemaName",
  json_agg(
    json_build_object(
      'columnName', c.column_name,
      'dataType', c.data_type,
      'isNullable', (c.is_nullable = 'YES')::boolean,
      'characterMaximumLength', c.character_maximum_length,
      'numericPrecision', c.numeric_precision,
      'numericScale', c.numeric_scale,
      'columnDefault', c.column_default,
      'stats', (
        select json_build_object(
          'starelid', s.starelid,
          'staattnum', s.staattnum,
          'stainherit', s.stainherit,
          'stanullfrac', s.stanullfrac,
          'stawidth', s.stawidth,
          'stadistinct', s.stadistinct,
          -- slot 1
          'stakind1', s.stakind1,
          'staop1', s.staop1,
          'stacoll1', s.stacoll1,
          'stanumbers1', s.stanumbers1,
          -- slot 2
          'stakind2', s.stakind2,
          'staop2', s.staop2,
          'stacoll2', s.stacoll2,
          'stanumbers2', s.stanumbers2,
          -- slot 3
          'stakind3', s.stakind3,
          'staop3', s.staop3,
          'stacoll3', s.stacoll3,
          'stanumbers3', s.stanumbers3,
          -- slot 4
          'stakind4', s.stakind4,
          'staop4', s.staop4,
          'stacoll4', s.stacoll4,
          'stanumbers4', s.stanumbers4,
          -- slot 5
          'stakind5', s.stakind5,
          'staop5', s.staop5,
          'stacoll5', s.stacoll5,
          'stanumbers5', s.stanumbers5,
          -- non-anonymous stats
          'stavalues1', case when $1 then s.stavalues1 else null end,
          'stavalues2', case when $1 then s.stavalues2 else null end,
          'stavalues3', case when $1 then s.stavalues3 else null end,
          'stavalues4', case when $1 then s.stavalues4 else null end,
          'stavalues5', case when $1 then s.stavalues5 else null end
        )
          from pg_statistic s
        where
          s.starelid = a.attrelid
          and s.staattnum = a.attnum
      )
    )
  ORDER BY c.ordinal_position) as columns
FROM
    information_schema.columns c
JOIN
    pg_attribute a
    ON a.attrelid = (quote_ident(c.table_schema) || '.' || quote_ident(c.table_name))::regclass
    AND a.attname = c.column_name
JOIN
    pg_class cl
    ON cl.relname = c.table_name
JOIN
    pg_namespace n
    ON n.oid = cl.relnamespace
WHERE
    c.table_name not like 'pg_%'
    and n.nspname <> 'information_schema'
    and c.table_name not in ('pg_stat_statements', 'pg_stat_statements_info')
GROUP BY
    c.table_name, c.table_schema, cl.reltuples, cl.relpages, cl.relallvisible, n.nspname /* @qd_introspection */
) t;
$$ LANGUAGE sql STABLE SECURITY DEFINER;

Note: The function is defined with SECURITY DEFINER so that it can be called either manually, or automatically by the analyzer if you set up stats pull integration.

Regular PSQL

psql -d <yourdb> -At -F "" -c "select _qd_dump_stats(false)" > stats.json

Postgres in Kubernetes

This example uses cloudnative-pg, but it can apply to any pod that has access to psql as superuser.

kubectl exec <podname> -n cnpg-system -c postgres -- psql -d <yourdb> -At -F "" -c "select _qd_dump_stats(false)" > stats.json

About

Check SQL queries in your tests to see if they'll run slow in production

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages