
Evaluate Before You Build: A Practitioner's Guide to Disciplined Vibe Coding

Vibe coding ships fast. Disciplined vibe coding ships code that lasts. A step-by-step walkthrough of the pre-build checkpoint that closes the production gap.

Published April 3, 2026

Jasem Neaimi

AI Collaboration Researcher

Your AI coding assistant wrote a working backend service in under two hours. The demo looked great. The client signed off. Three months later, your team is debugging an outage caused by code nobody fully understood — because nobody asked what "production-ready" meant before the first prompt.

This is the production gap. Not a tool problem. A thinking problem.

In the first article in this series, we mapped the gap: vibe coding operates at the lower cognitive levels — remember, understand, apply — while production decisions live at the top: analyze, evaluate, create. The fix isn't to stop using AI. It's to think before you prompt.

This article is the how. A walkthrough of the pre-build checkpoint that turns vibe coding from a prototype generator into a production workflow.

The Framework: Evaluate, Analyze, Build

Most developers start a vibe coding session with a prompt like:

"Build me an API that connects to the payments provider and processes refunds."

That's a Level 3 instruction — Apply. It tells the AI what to build but nothing about why, what could go wrong, or what good looks like. The AI fills in every gap with generic defaults. Those defaults are where outages come from.

The fix is three phases, in order:

Evaluate — You lead. The AI waits. You answer six questions that define the boundaries of what gets built.

Analyze — You and the AI collaborate. Research patterns, compare options, surface trade-offs.

Build — The AI leads. You review. The code generated now reflects your decisions, not the AI's guesses.

The entire Evaluate phase takes about ten minutes. That is a small price next to the months of technical debt that unexamined defaults can accumulate.

The Six Questions

Before writing a single prompt that generates code, answer these:

1. What is the purpose? Not what it does, but why it exists. "Process refunds" is a feature. "Reduce support tickets by letting customers self-serve refunds under $50" is a purpose. The AI can't know this. You do.
2. What already exists? What patterns, utilities, or conventions are already in the codebase? What should be reused? What should be avoided? If you skip this, the AI will reinvent what's already there or, worse, introduce a conflicting pattern.
3. What are the risks? What breaks if this fails? What data is sensitive? What are the edge cases that matter? A payment refund API that doesn't handle idempotency will double-refund customers. The AI won't think about that unless you do first.
4. What are the alternatives? Is an API the right approach, or should this be a background job? Should it call the provider directly or go through an existing service layer? Alternatives force you to justify the approach before committing to it.
5. What does success look like? Not "it works" but specific criteria. Response time under 200ms. Retry logic for provider timeouts. Audit log for every transaction. These become your review checklist after the AI generates code.
6. What should we save? What decisions, trade-offs, or learnings from this session should be documented for the next developer (or the next AI session) that touches this code?

None of these require technical depth. A product manager could answer them. A junior developer could answer them. The point is that someone answers them before the AI starts generating.
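One way to make "someone answers them" checkable is to record the answers in a small structure that can report which questions are still open. The sketch below is illustrative only: the `Checkpoint` class and its field names are assumptions made for this example, not part of the published framework.

```python
from dataclasses import dataclass, fields


@dataclass
class Checkpoint:
    """Answers to the six pre-build questions (field names are illustrative)."""
    purpose: str = ""
    existing: str = ""
    risks: str = ""
    alternatives: str = ""
    success_criteria: str = ""
    save: str = ""

    def missing(self) -> list[str]:
        """Return the names of the questions that have no answer yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def ready_to_build(self) -> bool:
        """True only when all six questions have a non-empty answer."""
        return not self.missing()


draft = Checkpoint(
    purpose="Self-serve refunds for orders under $50",
    risks="Double refunds if the endpoint is not idempotent",
)
print(draft.missing())         # ['existing', 'alternatives', 'success_criteria', 'save']
print(draft.ready_to_build())  # False
```

A structure like this could also be saved alongside the code, which covers part of the sixth question.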

The Walkthrough: Building a Payment Integration

Here's what disciplined vibe coding looks like in practice, using an AI coding assistant in the terminal.

Phase 1: Evaluate (You Lead — 10 Minutes)

You open your project. Before typing any build instruction, you start with the checkpoint.

You tell the assistant: "I need to build a refund processing endpoint. Before we write any code, I want to work through the pre-build checkpoint."

A well-configured assistant asks you the six questions. You answer:

Purpose: Self-serve refunds for orders under $50, reducing support volume by an estimated 30%.

What exists: The project already has a payment service wrapper in the services directory. There's a middleware pattern for authentication. The team uses a specific error format across all API routes.

Risks: Double refunds if the endpoint isn't idempotent. The provider API can take up to 5 seconds before it times out, which would hold the customer's request open. Refund amounts need validation against the original order: you can't refund more than what was paid.

Alternatives: Could be a background job triggered from the admin panel instead of a customer-facing endpoint. Decision: customer-facing endpoint, but with an async fallback — if the provider times out, queue the refund and notify the customer.

Success criteria: Idempotent (same request twice = one refund). Timeout handling with async fallback. Audit log entry for every attempt. Response under 500ms for the synchronous path. Uses existing error format and auth middleware.

Save: Document the timeout fallback pattern — this will apply to every provider integration we build.

Ten minutes. No code. But every architectural decision is made.

Phase 2: Analyze (Collaborate — 5 Minutes)

Now you bring the AI in as a research partner:

"Look at the existing payment service wrapper and the error handling middleware. Show me how the current patterns work so we can match them."

The AI reads the codebase, surfaces the conventions, and confirms the patterns you'll follow. You might discover that the existing wrapper already handles retries — one less thing to build. Or you might find that the error format changed two months ago and half the routes are using the old one. Better to know now.

Phase 3: Build (AI Leads, You Review)

Now, and only now, you give the build instruction. But it's not a vague prompt anymore. It's specific:

"Build a POST endpoint for processing refunds. Use the existing payment service wrapper. Apply the auth middleware. Only accept orders under $50, and reject any refund amount greater than what was paid on the original order. Make it idempotent using the order ID as the key. If the provider times out after 5 seconds, queue the refund as a background job and return a 202 with a tracking ID. Log every attempt to the audit table. Follow the existing error response format."

Every decision from Phase 1 is encoded in the prompt. The AI isn't guessing. It's executing your architecture.

The generated code won't be perfect. But the flaws will be implementation details — a missing null check, an off-by-one in validation — not architectural mistakes that take weeks to unwind.
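As a rough illustration of the shape the reviewed result might take, here is a minimal in-memory sketch of the Phase 1 decisions: idempotency keyed on the order ID, validation against the original order, the timeout fallback with a 202 and a tracking ID, and an audit entry per attempt. Everything named here is hypothetical (`RefundService`, the `provider` callable, the dict and list stand-ins for storage, the 422 body); a real version would use the project's own wrapper, auth middleware, error format, and a durable store.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable

MAX_SELF_SERVE_CENTS = 5_000  # "orders under $50" from the purpose statement


@dataclass
class RefundService:
    # provider(order_id, amount_cents) stands in for the existing payment
    # service wrapper; it is assumed to raise TimeoutError on a provider timeout.
    provider: Callable[[str, int], None]
    results: dict = field(default_factory=dict)    # order_id -> (status, body)
    queue: list = field(default_factory=list)      # stand-in for a job queue
    audit_log: list = field(default_factory=list)  # stand-in for the audit table

    def refund(self, order_id: str, amount_cents: int, paid_cents: int):
        """Return (http_status, body) for one refund request."""
        # Idempotency: a repeated request for the same order replays the
        # stored response instead of calling the provider again.
        if order_id in self.results:
            self.audit_log.append((order_id, "replayed"))
            return self.results[order_id]

        # Validate against the original order before touching the provider.
        too_large = paid_cents >= MAX_SELF_SERVE_CENTS
        bad_amount = amount_cents <= 0 or amount_cents > paid_cents
        if too_large or bad_amount:
            self.audit_log.append((order_id, "rejected"))
            return 422, {"error": "refund not allowed"}

        try:
            self.provider(order_id, amount_cents)
            response = (200, {"status": "refunded"})
            outcome = "refunded"
        except TimeoutError:
            # Async fallback: queue the refund and hand back a tracking ID.
            tracking_id = str(uuid.uuid4())
            self.queue.append((tracking_id, order_id, amount_cents))
            response = (202, {"status": "queued", "tracking_id": tracking_id})
            outcome = "queued"

        self.results[order_id] = response
        self.audit_log.append((order_id, outcome))
        return response


calls = []
service = RefundService(provider=lambda order_id, cents: calls.append(order_id))
first = service.refund("order-1", 1_200, paid_cents=1_200)
second = service.refund("order-1", 1_200, paid_cents=1_200)
print(first == second, len(calls))  # True 1
```

Storing the response per order ID is what makes the retry safe in this sketch. In production the same idea would likely need a unique constraint in the database or a provider-side idempotency key, because a call that timed out may still have succeeded at the provider.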

The Anti-Pattern: Same Task, No Checkpoint

For contrast, here's the same task vibe-coded without the checkpoint:

"Build me a refund API endpoint."

The AI generates a working endpoint. It processes refunds. The demo works. But:

It creates a new payment client instead of using the existing wrapper — now there are two ways to talk to the provider, and they handle errors differently.
No idempotency. A retry from the frontend double-refunds the customer. You find out from the finance team, not from your logs.
The provider timeout isn't handled. The endpoint hangs for 30 seconds, the load balancer kills it, the customer sees a generic error, and the refund may or may not have gone through.
No audit trail. When finance asks "what happened to order #4471," nobody can answer without digging through provider logs.

Every one of these gaps existed because nobody asked the six questions. The AI did exactly what it was told. The problem is what it wasn't told.
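The double-refund failure is easy to reproduce in miniature. The sketch below assumes a hypothetical `naive_refund` handler with no idempotency check and uses a list to stand in for the provider; the amount is made up for the example.

```python
provider_calls = []  # each entry is one refund the provider actually issued


def naive_refund(order_id: str, amount_cents: int) -> dict:
    """Hypothetical handler with no idempotency check."""
    provider_calls.append((order_id, amount_cents))
    return {"status": "refunded"}


# The frontend gives up waiting for the first response and retries.
naive_refund("order-4471", 1_999)
naive_refund("order-4471", 1_999)

refunded = sum(cents for _, cents in provider_calls)
print(len(provider_calls), refunded)  # 2 3998
```

Both calls return a success response, so nothing in the handler's own output signals that the customer was refunded twice.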

Make It Automatic: The Pre-Build Checkpoint

The six questions work on paper. They work better when they're built into your workflow.

If you use an AI coding assistant that supports project-level instructions, you can encode the checkpoint directly. Add this to your project configuration:

## Pre-Build Checkpoint

Before implementing any new feature or significant change, ask me
these questions and wait for my answers before generating code:

1. What is the purpose of this change?
2. What existing code or patterns should we reuse?
3. What could break? What are the risks?
4. What alternatives did we consider?
5. How will we know this works? (success criteria)
6. What should we document for next time?

Do not generate implementation code until I have answered all six.

Drop that into your project's AI configuration file. Your assistant will ask before it builds. Every session. Every feature. The checkpoint becomes automatic — not something you have to remember.
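If you maintain several repositories, a small script can add the section wherever it is missing. This is a sketch under assumptions: the instruction file's name and location depend on your assistant, so the path is passed in rather than hard-coded, and `ensure_checkpoint` is a hypothetical helper, not part of any tool.

```python
from pathlib import Path

HEADING = "## Pre-Build Checkpoint"


def ensure_checkpoint(config_path: Path, checkpoint_text: str) -> bool:
    """Append checkpoint_text to config_path unless the heading is already there.

    Returns True if the file was created or changed, False if left alone.
    """
    existing = ""
    if config_path.exists():
        existing = config_path.read_text(encoding="utf-8")
    if HEADING in existing:
        return False
    # Separate the new section from any existing instructions with a blank line.
    separator = "\n\n" if existing.strip() else ""
    updated = existing.rstrip("\n") + separator + checkpoint_text.strip() + "\n"
    config_path.write_text(updated, encoding="utf-8")
    return True
```

Calling it twice on the same file is harmless, because the second call finds the heading and leaves the file untouched.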

For the full interactive version — with codebase scanning, selectable options, and domain-adapted questions for any type of decision — the complete skills are open source at github.com/jneaimi/evaluate-before-you-build.

The Wider Pyramid

Vibe coding has widened the developer pyramid more than any technology shift in decades. More people can build. That's unambiguously good.

But wider entry doesn't mean the code holds. The developers who last — who ship products that survive their first hundred users, their first security audit, their first on-call rotation — are the ones who evaluate before they build.

The thinking is the discipline. The AI is the tool. Don't confuse which one carries the weight.
