Replworks AI Pipeline: Introducing WORKING_SPEC as a Compiled IR for LLM-Based Development #63

AI Generated • Published by @replworks-bot

I’ve been thinking about whether our current development pipeline (IDEAS.md → PRODUCT_SPEC.md / ARCHITECTURE.md / FRAMEWORK.md → TASKS.md → AI execution) can be viewed as a kind of probabilistic compiler for LLM-based development.

In particular, I’m exploring whether introducing a compiled intermediate representation (WORKING_SPEC.md) could improve consistency and reduce repeated full-context interpretation by the execution model.

The current intuition is:

  • PRODUCT_SPEC.md defines what we are building
  • ARCHITECTURE.md defines how we build it
  • FRAMEWORK.md defines constraints and tools
  • TASKS.md defines execution units

In practice, however, every task execution still requires the model to re-interpret all of these documents from scratch, which introduces:

  • contextual noise
  • inconsistent prioritization of constraints
  • model-dependent interpretation differences
  • unnecessary interpretation work at generation time

The proposed idea is to introduce a compilation step:

IDEAS.md
→ PRODUCT_SPEC.md / ARCHITECTURE.md / FRAMEWORK.md
→ TASKS.md
→ WORKING_SPEC.md (compiled, task-specific IR)
→ Execution LLM

Here, WORKING_SPEC.md acts as a distilled, task-specific, execution-ready specification that:

  • removes ambiguity between competing constraints
  • explicitly encodes non-negotiables
  • prioritizes architectural intent for the current task
  • standardizes “what matters most right now”
  • reduces repeated full-document interpretation per task
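
To make the properties above concrete, here is a minimal sketch of how constraints might be resolved and rendered into a WORKING_SPEC.md body. Everything in it is an assumption for illustration: the `Constraint` fields, the integer priority scale, the MUST/SHOULD labels, and the example rules are hypothetical and not part of any existing Replworks format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    topic: str            # what the rule is about, e.g. "storage"
    rule: str             # the rule text carried into the working spec
    source: str           # upstream document the rule came from
    priority: int         # hypothetical scale: higher wins
    non_negotiable: bool = False


def resolve(constraints):
    """Keep one rule per topic: non-negotiables win first, then priority."""
    winners = {}
    for c in constraints:
        cur = winners.get(c.topic)
        if cur is None or (c.non_negotiable, c.priority) > (cur.non_negotiable, cur.priority):
            winners[c.topic] = c
    # Non-negotiables are listed first, then descending priority, then topic.
    return sorted(winners.values(),
                  key=lambda c: (not c.non_negotiable, -c.priority, c.topic))


def render(constraints):
    """Render the resolved constraints as a plain bullet list."""
    lines = []
    for c in resolve(constraints):
        tag = "MUST" if c.non_negotiable else "SHOULD"
        lines.append(f"- [{tag}] {c.topic}: {c.rule} ({c.source})")
    return "\n".join(lines)


example = [
    Constraint("storage", "Use SQLite", "FRAMEWORK.md", priority=1),
    Constraint("storage", "Use Postgres", "ARCHITECTURE.md", priority=5),
    Constraint("secrets", "Never log tokens", "PRODUCT_SPEC.md", priority=0,
               non_negotiable=True),
]
print(render(example))
```

In this sketch the two competing "storage" rules collapse into one, so the execution model never sees the conflict. Whether priorities should be assigned by a human or inferred is left open, as in the questions below.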

An extension of this idea is backend-specific compilation:

WORKING_SPEC.gpt.md
WORKING_SPEC.claude.md
WORKING_SPEC.gemini.md

where each backend pass adapts the same IR into model-specific prompting formats.
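
A rough sketch of what backend passes could look like, assuming the IR is a simple mapping from section name to a list of rules. The two renderers and the assignment of renderers to backends are placeholders chosen only to show the mechanism; they are not claims about which format any particular model handles best.

```python
from typing import Callable

# Hypothetical neutral IR: section name -> list of rule strings.
IR = dict[str, list[str]]


def render_markdown(ir: IR) -> str:
    """Render the IR as markdown headings with bullet items."""
    parts = []
    for section, items in ir.items():
        parts.append(f"## {section}")
        parts.extend(f"- {item}" for item in items)
    return "\n".join(parts)


def render_tagged(ir: IR) -> str:
    """Render the IR as tag-delimited blocks, one block per section."""
    parts = []
    for section, items in ir.items():
        tag = section.lower().replace(" ", "_")
        parts.append(f"<{tag}>")
        parts.extend(items)
        parts.append(f"</{tag}>")
    return "\n".join(parts)


# Placeholder mapping: which renderer each backend uses is an assumption.
BACKENDS: dict[str, Callable[[IR], str]] = {
    "gpt": render_markdown,
    "claude": render_tagged,
    "gemini": render_markdown,
}


def compile_backends(ir: IR) -> dict[str, str]:
    """Produce one output file body per backend from the same IR."""
    return {f"WORKING_SPEC.{name}.md": fn(ir) for name, fn in BACKENDS.items()}
```

The point of the design is that the IR is the single source of truth and each backend file is derived from it, so the per-model files never need to be edited by hand.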


Key questions for discussion:

  1. Is WORKING_SPEC.md better understood as:
    • a cache of interpretation
    • a compilation artifact (IR)
    • or a task-specific prompt program?
  2. Does introducing a deterministic “interpretation compression step” actually improve:
    • code consistency
    • constraint adherence
    • architectural stability?
  3. How should this step be produced, and where should it live in the system?
    • human-written
    • LLM-generated
    • hybrid (LLM generates, human approves)
    • fully automated compiler pass
  4. How should versioning work when upstream documents (ARCHITECTURE.md, etc.) change?
    • Should WORKING_SPEC be regenerated per task?
    • Or maintained as a reproducible artifact?
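
One possible answer to the versioning question, sketched under assumptions: stamp each WORKING_SPEC with a hash of the task text and the upstream documents, and regenerate only when the stamp no longer matches. The header format and function names here are hypothetical.

```python
import hashlib

# Hypothetical header line written at the top of each generated WORKING_SPEC.
HEADER = "<!-- upstream-sha256: {digest} -->"


def fingerprint(task_text: str, sources: dict[str, str]) -> str:
    """Hash the task plus every upstream document, in a stable order."""
    h = hashlib.sha256()
    h.update(task_text.encode("utf-8"))
    for name in sorted(sources):
        # NUL separators keep the name/content boundaries unambiguous.
        h.update(b"\0" + name.encode("utf-8") + b"\0")
        h.update(sources[name].encode("utf-8"))
    return h.hexdigest()


def stamp(body: str, task_text: str, sources: dict[str, str]) -> str:
    """Prefix a working-spec body with the fingerprint of its inputs."""
    return HEADER.format(digest=fingerprint(task_text, sources)) + "\n" + body


def is_stale(working_spec: str, task_text: str, sources: dict[str, str]) -> bool:
    """True if the spec was built from different inputs than the current ones."""
    lines = working_spec.splitlines()
    expected = HEADER.format(digest=fingerprint(task_text, sources))
    return not lines or lines[0] != expected
```

This only detects that the inputs changed. It does not make an LLM-generated spec reproducible byte for byte, so whether the stamped file is committed or regenerated on demand remains a separate decision.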

I’m trying to understand whether this should be treated as:

  • a prompt engineering technique
  • an evaluation-driven compiler pipeline
  • or a new abstraction layer for LLM-native software systems

Curious if anyone has tried a similar “compiled specification” approach in production systems or internal AI tooling.
