THOUGHTS

Building With Specs, Not Prompts: How I Turn Ideas Into Executable Projects

A spec-first workflow for AI-assisted development — using Claude, Obsidian, and an LLM wiki to turn rough ideas into executable plans and preserve the outcome of each cycle.

A lot of AI-assisted development still happens one prompt at a time. That approach is fast for small experiments, but it starts to break once a project needs planning, consistency, and a way to carry decisions across multiple cycles. Spec-driven development offers a better model: define the work through structured artifacts first, then use AI to execute against those artifacts instead of improvising from chat history.

That is the workflow I have been building around Claude, Obsidian, and an LLM wiki. The seed idea comes from Andrej Karpathy’s LLM Wiki pattern, and the system I built around it is a personal development knowledge base where source material enters through raw/, gets distilled into wiki/, and becomes reusable project context over time.1 But the real point of the setup is not “memory” on its own. The point is to support a spec-first development process where each phase of a project produces a concrete artifact the next phase can build on.

In practice, that means I do not treat the prompt as the unit of work.

The unit of work is the spec.

The operating model

The system has three working surfaces.

The first is raw/, where I capture the material that defines or informs the project: idea notes, research, PRDs, feature plans, article clippings, repo notes, and ad hoc project context. The second is wiki/, where the LLM maintains a more durable layer of distilled knowledge: source summaries, project pages, decision pages, patterns, technologies, and journals. The third is the actual project repository, where implementation, testing, commits, and pull requests happen.

// RAW I write / drop · idea notes · research · PRDs · feature plans // WIKI Claude writes · summaries · patterns · decisions · cross-links DISTIL READ ON DEMAND // PROJECT REPO where the software gets built · features · tests · commits / PRs

That split matters because each surface has a different job. raw/ is where I write or collect inputs. wiki/ is where the LLM turns those inputs into connected knowledge I can query later. The project repo is where the software gets built. When those boundaries are clear, Claude has a much better environment to work inside because it is not trying to infer the whole project from one conversation.2

The flow is intentionally one-way at the knowledge level. Material comes in through raw/, gets distilled into wiki/, and future questions are answered by reading the maintained knowledge base instead of re-researching the same topic from scratch. That gives me a stable place to preserve the outcome of each development cycle, but the central mechanism is still specification: ideas become documents, documents become plans, and plans drive execution.

CRAFTED

To make that process repeatable, I use a workflow I call CRAFTED: Conceive, Research, Architect, Frame, Try, Evaluate, Deliver.

Those phases are not just labels. They map directly to a chain of project artifacts and to a clear split between the vault and the codebase. Early phases happen in the vault because they define what should be built. Middle phases happen in the repo because they are about building and validating it. Final output returns to the vault so the finished work becomes reusable context instead of disappearing into a commit history.

The idea behind CRAFTED is simple: every project should move from ambiguity to execution through a series of progressively sharper specs.

C CONCEIVE 00-idea.md VAULT R RESEARCH 01-research.md VAULT A ARCHITECT 02-prd.md VAULT F FRAME 03-plan.md + features/*.md VAULT → REPO T TRY implementation + history-ingest REPO E EVALUATE tests, validation vs. plan REPO D DELIVER wiki/projects/<slug>/* + decisions REPO → VAULT

Conceive

Every project starts with a rough idea, not a roadmap.

I create a project folder and capture the first version of the idea in raw/projects/<slug>/00-idea.md. That file is not meant to be polished. It exists to pin down the original shape of the problem before it gets flattened by implementation details. What is the idea. Why does it matter. Who might it be for. What makes it interesting enough to continue.

This stage is intentionally light, because early project thinking is usually fragile. The goal is not to force clarity too early. It is to make sure the project enters the system in a structured way so the later stages have something concrete to refine.

Typical Claude invocation:

> Stub raw/projects/repolens/00-idea.md from the idea template:
  problem, hypothesis, non-goals, why it's interesting. Don't
  polish it — I just want the shape pinned down.

If the idea has a soft edge worth pressing on, /grill-me is an optional next pass before research. It interviews me on the assumptions I didn’t realise I was making, so research starts from sharper questions.

> /grill-me   (optional — press on the idea note before research)

Research

Once the idea exists, I test it against reality.

This phase produces 01-research.md, usually with the help of a research skill that can perform multi-round web search and write out a landscape review with an honest verdict. That is an important detail: I do not want research that merely collects links. I want research that helps decide whether the idea is still worth building after seeing the market, the adjacent tools, the likely gaps, and the parts of the idea that are weaker than they first looked.

Good research changes the spec. It narrows scope, exposes false novelty, and forces better questions. By the end of this phase, the project should feel more grounded and less aspirational.

> /wiki-research raw/projects/repolens/00-idea.md
> Produce 01-research.md: market landscape, adjacent tools, likely
  gaps, and a verdict on whether to continue. Cite every claim.

Architect

After research, I turn the project into a product definition.

This is where 02-prd.md gets written. The PRD defines the problem, the target user, the core workflow, the scope of the first version, the non-goals, and the constraints that should shape the build. In a traditional workflow this might live in a doc tool or ticketing system; in mine, it lives alongside the rest of the lifecycle so it can feed directly into the later steps.

This phase is where speculation becomes commitment. Once the PRD is written, I can stop asking “what are we even building?” and start asking “what is the cleanest path to the first correct version?”

Which tool I reach for depends on size. For larger projects I lean on /to-prd, which walks a structured PRD template and asks the questions I’d otherwise miss. For smaller, well-defined features I open Claude Code’s plan mode (Shift+Tab) and let the PRD settle there.

> /to-prd
> From 00-idea.md and 01-research.md, draft 02-prd.md. Be explicit
  about v1 non-goals — I'd rather cut scope than carry it.

Frame

Frame is where the project leaves the vault and becomes executable.

At this point the work moves into the actual code repo, and the main task is to convert the PRD into an implementation plan. Claude Code’s workflow supports planning before editing, which fits this phase well because the model can propose steps before touching files.2 The result is a high-level 03-plan.md plus individual feature plans that break the project into bounded units of execution.

This is the most important transition in the workflow. The PRD answers what and why. The plan answers how. Once a feature has its own plan, dependencies, acceptance criteria, and scope boundaries, Claude is no longer guessing what success looks like. It is working against a defined artifact.

Same size call as Architect. For substantial features I run /superpowers:brainstorming — it forces the question-by-question shape a plan needs before any code is touched. For smaller, more obvious features, plan mode is enough.

> /superpowers:brainstorming
> Topic: feature shape for repolens v1, derived from 02-prd.md.
  Output: one features/<slug>.md per bounded unit, with scope,
  dependencies, acceptance criteria, and a phased build order.

Try

Try is the implementation phase.

This is where the code gets written in the project repo: features are built, branches move, commits accumulate, and the project starts to take real shape. But in a spec-driven workflow, implementation is never supposed to drift too far from the feature document that led to it. The point is not just to write code. The point is to execute against the planned shape of the work.

This is also where I want the system to preserve context from the development cycle itself. My repo includes a history-ingest skill designed to mine Claude sessions, update project state, advance feature statuses, and surface blockers or decisions back into the vault. That means the implementation trail can become part of the project record instead of staying trapped in ephemeral terminal sessions.

> Implement features/repository-ingestion.md, phase 1 only. Stop
  before moving to phase 2 so I can review.
> /claude-history-ingest
  → log this session, advance the feature status, surface decisions.

Evaluate

Evaluation happens in the code repo as well, because that is where the software has to prove itself.

This includes tests, debugging, validation against acceptance criteria, and the more qualitative question of whether the implementation still matches the intent of the spec. In a prompt-first workflow, evaluation often happens as a loose conversation after code already exists. In a spec-first workflow, evaluation is much tighter: did the feature do what the plan said it should do, and where did the spec itself turn out to be weak.

That second question matters a lot. Good evaluation does not just catch bugs. It improves the next version of the spec. If the implementation drifted, maybe the code was wrong. But sometimes the more interesting answer is that the plan was incomplete, overconfident, or blind to some constraint the build exposed.

> /wiki-query "symbol resolution drift in monorepos"
  → if it's been hit before, link the prior notes. If not, write
  today's investigation into raw/decisions/.

Deliver

Deliver is the phase where shipped work becomes reusable knowledge.

Once a feature is done, it gets promoted from the working feature doc in raw/projects/<slug>/features/ into a schema-compliant page under wiki/projects/<slug>/features/. The deliver step can also surface decision pages and reusable patterns worth preserving beyond the current project. This is what keeps the system from becoming just another planning layer with no long-term payoff.

At this point the project has completed a full development cycle. An idea became a research artifact, then a PRD, then a plan, then an implementation, then a validated result, and finally a durable reference. That is the loop I care about: not one perfect prompt, but a system where each cycle leaves the next one in a stronger position.

> /wiki-promote-feature repository-ingestion
  → carry through the decision about chunking strategy. If the
  shape is reusable, add a pattern page for "codebase-to-knowledge
  transformation".

Example project: RepoLens

A concrete example makes the workflow easier to see. RepoLens is a tool for turning a codebase into onboarding documentation: architecture notes, feature summaries, and source-grounded explanations of how a system is organized. It is a useful example because it has enough surface area to require research, a product definition, a multi-step implementation plan, and feature-level delivery.

Each CRAFTED phase produces a file. Below is what each one looks like in practice.

Conceive — raw/projects/repolens/00-idea.md

# RepoLens — idea note

## Problem
Teams move fast, repo context decays fast. Onboarding is rebuilt
from tribal knowledge instead of a maintained source of truth.

## Hypothesis
A tool that ingests a codebase and produces architecture notes,
feature summaries, and source-grounded explanations — kept fresh
as the repo evolves.

## Not (v0)
- Real-time indexing
- Auto-remediation / "fix it for me"
- Code review

## Why interesting
The artifact is a *living* doc, not a static export. The repo is
the source of truth; the doc is a derived view.

Research — raw/projects/repolens/01-research.md (excerpt)

## Landscape verdict
Crowded at the edges (devportals, code-summary tools), thinner in
the middle — "always-fresh, source-grounded onboarding docs" is
where the real angle sits. Continue.

## Closest neighbours
- Backstage — devportal, not source-grounded
- Sourcegraph Cody — source-grounded, not onboarding-shaped
- Mintlify — docs-as-code, no codebase ingestion

## Risks to spec
- "Always-fresh" is a much bigger commitment than v0 should make
- Onboarding doc quality is hard to evaluate automatically

Architect — raw/projects/repolens/02-prd.md (sketch)

# RepoLens PRD

Target user      : eng teams onboarding new developers
Core workflow    : ingest repo → detect structure → generate docs → human review
v1 scope         : single-repo TypeScript / Python projects
v1 non-goals     : monorepos, real-time updates, auto-PRs to docs
Hard constraints : local-first; no upload of source to external services
Success signal   : a new hire can answer "where does X live?" without asking

Frame — features/architecture-summarizer.md (excerpt)

# Feature: Architecture summarizer

## Scope
Take a repo tree + entry points → produce architecture.md with:
module map, data flow, key boundaries.

## Acceptance criteria
- Runs offline against the local repo
- Generated doc references real file paths
- A human reviewer can mark sections "looks right / looks wrong"
- Re-run keeps the human verdicts unless the code changed

## Phased build
1. Tree walk + entry-point detection
2. LLM pass to draft sections with citations
3. Verdict file + diff-aware re-run

Try — implementation in the repo

The work happens against features/architecture-summarizer.md, not against a free-form conversation. Phase 1 lands, gets reviewed, then phase 2 starts. claude-history-ingest runs in the background and writes session summaries back into wiki/projects/repolens/sessions/.

Evaluate — testing against the spec

## Verdict for architecture-summarizer v1
- AC1 (offline)              : pass
- AC2 (real file paths)       : pass
- AC3 (human verdict UI)      : pass
- AC4 (diff-aware re-run)     : FAIL on rename — file move resets verdict

→ Spec was incomplete. Add: AC5 — rename detection feeds prior verdicts forward.

The failure isn’t just a bug. It’s a missing line in the spec. That’s the kind of insight a spec-first loop is supposed to surface.

Deliver — wiki/projects/repolens/features/architecture-summarizer.md

# Architecture Summarizer (delivered)

Status     : shipped, v1
Plan       : [[features/architecture-summarizer]]
Decisions  : [[decisions/diff-aware-rerun]]
Pattern    : [[patterns/codebase-to-knowledge-transformation]]

## What changed about the spec
v1 shipped with AC5 (rename detection) added mid-build, after a
failure mode the original plan didn't see. The pattern page
captures the general shape ("derive a doc from a repo and keep
human verdicts attached to evolving code") so the next project
doesn't relearn it.

That trail — 00-idea01-research02-prdfeatures/* → verdict → promoted wiki entry — is the project’s permanent record. The next project starts with the pattern page already on hand, and the decision page is one wiki-query away.

Why this works better

The main advantage of this workflow is not that it makes AI look smarter. It is that it reduces drift.

Spec-first development creates stable checkpoints. The model works across them because the project has a structure it can keep returning to.

Prompt-first development tends to distribute important reasoning across temporary conversations. That makes it easy to move fast at the beginning and surprisingly hard to stay coherent later. Spec-first development gives Claude (and me) anchors: the idea file, the research artifact, the PRD, the plan, the feature spec, the evaluation trail, and the final promoted result.3

This is also why Obsidian and the LLM wiki are useful here. They are not there to romanticize note-taking. They are there to give the project lifecycle a durable file-based interface, one where the artifacts can be read, linked, reviewed, and updated over time. The wiki preserves what each cycle produced, but the center of gravity is still execution through specs.4

That matters more as projects become real. It is easy to vibe-code a toy project. It is much harder to build something that can survive research, planning, implementation, validation, and iteration without losing its shape. A spec-first workflow gives the model better instructions, but more importantly, it gives the project better boundaries.

Closing

The way I think about AI-assisted development has changed pretty sharply.

I no longer see the prompt as the main interface for building software. The prompt is only useful if the project already has a structure behind it. The real interface is the chain of artifacts that define the work: idea notes, research, PRDs, plans, feature files, tests, decisions, and promoted results. Claude helps me move through that chain, but the chain itself is what makes the work coherent.

That is what this workflow is trying to do. It turns rough ideas into executable projects by making specs the center of the process and by preserving the result of each development cycle in a form that future work can build on. Not prompts first. Specs first.


Footnotes

  1. dev-llm-wiki — the repo this workflow lives in. https://github.com/YoniRaviv/dev-llm-wiki

  2. Claude Code — Common workflows. https://code.claude.com/docs/en/common-workflows 2

  3. Martin Fowler — Exploring Gen AI: spec-driven development tools. https://martinfowler.com/articles/exploring-gen-ai/sdd-3-tools.html

  4. Agentpedia — Karpathy’s LLM wiki “idea file” pattern. https://agentpedia.codes/blog/karpathy-llm-wiki-idea-file