A definition of done (DoD) is a checklist of conditions a feature must meet before it counts as shipped. For traditional teams, the list is short: code works, tests pass, reviewed, deployed. For AI-built features, the list needs to be longer — because AI tools routinely produce work that looks finished on screen but fails the moment a real user touches it.
This post explains why AI-built features need a stricter definition of done, gives you an acceptance criteria template you can use today, and lists the symptoms that show your team is shipping without one.
Why AI-built features need a stricter definition of done
AI coding tools — Cursor, Claude Code, Lovable, Bolt.new — optimize for visible output. They generate screens, wire up forms, and produce demos that look polished in minutes. But visible output is not the same as working software.
A traditional developer, finishing a signup flow, thinks about what happens when the email is already taken, what the user sees on a slow connection, whether the session persists after a page refresh, and whether the rest of the app still works after the change. An AI tool, given the same task, produces the happy path. The form appears. The button submits. The success message shows. Everything else — error states, edge cases, regressions in adjacent flows — is left as an exercise for whoever reviews the output.
This is not a flaw in the tool. It is a gap in the process. AI tools do not know when a feature is done because “done” is a product decision, not a code-generation problem. Without a definition of done adapted for AI-built features, your acceptance criteria default to “it looks right,” and that standard will cost you users.
Acceptance criteria template for AI-built features
Use this checklist after every feature an AI tool generates. It covers the gaps AI tools leave most often.
Functional completeness
- The happy path works end to end (form submits, data saves, confirmation appears).
- Error states exist for every input: empty fields, invalid formats, duplicate entries, server errors.
- Loading and disabled states prevent double submissions.
- The feature works on mobile viewports, not just desktop.
- Data persists after page refresh and across sessions.
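Several of these checks can be enforced in code rather than by eyeballing the UI. As a minimal, framework-agnostic sketch (the names `createSubmitGuard` and `SubmitState` are illustrative, not from any library), a submit wrapper can make the loading, error, and double-submission criteria explicit:

```typescript
// Illustrative sketch only: a submit wrapper that tracks loading/error
// state and blocks double submissions. Names are hypothetical.
type SubmitState = "idle" | "submitting" | "success" | "error";

function createSubmitGuard<T>(submit: (input: T) => Promise<void>) {
  let state: SubmitState = "idle";
  return {
    getState: () => state,
    async run(input: T): Promise<void> {
      if (state === "submitting") return; // ignore repeat clicks
      state = "submitting";
      try {
        await submit(input);
        state = "success";
      } catch {
        state = "error"; // surface an error state instead of crashing
      }
    },
  };
}
```

Wiring the button's `disabled` attribute to `getState() === "submitting"` covers the loading-state and duplicate-submission criteria in one place instead of scattering them across the form.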
Integration safety
- Authentication still works. The user can sign in, stay signed in, and reach only what they should.
- Navigation to and from the new feature works without dead ends or loops.
- Adjacent flows (onboarding, billing, settings, dashboards) behave the same as before the change.
- Shared components (headers, footers, sidebars) render correctly on the new screens.
Code quality
- No duplicated logic. The AI did not create a second version of something that already exists.
- Names are clear. Functions, variables, and files describe what they do.
- No dead code or commented-out blocks left behind by the generation process.
- The feature uses the project’s existing patterns for data fetching, state management, and styling.
Data and security
- User input is validated on the server, not just the client.
- Sensitive actions (delete, payment, role change) require confirmation.
- The feature does not expose data from other users or roles.
- Database changes are reversible or have a migration path.
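The first criterion on this list is the one AI tools skip most often: the client validates, the server trusts. A minimal sketch of what server-side validation might look like (plain TypeScript; the `SignupInput` shape and the specific rules are assumptions you should adapt to your own schema):

```typescript
// Hypothetical server-side validator. The input shape and rules are
// examples; replace them with your product's actual requirements.
interface SignupInput {
  email: string;
  password: string;
}

function validateSignup(input: SignupInput): string[] {
  const errors: string[] = [];
  // Basic format check; runs on the server regardless of client-side checks.
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(input.email)) {
    errors.push("email: invalid format");
  }
  if (input.password.length < 8) {
    errors.push("password: must be at least 8 characters");
  }
  return errors;
}
```

The point is not the specific rules but where they run: a request that bypasses your UI entirely should hit the same checks.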
Paste this list into your project documentation and run through it for every AI-generated feature before you call it shipped.
Example: acceptance criteria for an AI-built signup flow
Suppose your AI tool just generated a signup flow. Here is what the definition of done looks like in practice:
Happy path: User enters name, email, and password. Clicks “Sign Up.” Account is created. User lands on the dashboard, signed in.
Acceptance criteria the AI likely missed:
- Submitting with an existing email shows a clear error, not a crash or a blank page.
- Password requirements are enforced and communicated before submission, not after.
- The form disables the button during submission so the user cannot create duplicate accounts.
- On mobile, the form is usable without horizontal scrolling and the keyboard does not hide the submit button.
- After signup, refreshing the dashboard does not redirect to the login page.
- The existing login flow still works. Signing in with a valid account still reaches the dashboard.
- The password reset flow, if it exists, still sends emails and resolves correctly.
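Criteria like these only hold up if they are re-checked every time the AI regenerates the flow, which is far easier when each one becomes a small automated test. A hedged sketch of the duplicate-email criterion, using an in-memory stand-in for a real user store (`signup` and its return shape are illustrative):

```typescript
// Sketch of one acceptance criterion as testable code: a duplicate email
// must return a clear error, never throw or render a blank page.
// In-memory set stands in for a real database lookup.
const existingEmails = new Set<string>(["taken@example.com"]);

function signup(email: string): { ok: boolean; error?: string } {
  if (existingEmails.has(email)) {
    return { ok: false, error: "An account with this email already exists." };
  }
  existingEmails.add(email);
  return { ok: true };
}
```

A test asserting that `signup("taken@example.com")` returns an error string rather than throwing catches the regression the moment a regeneration session drops the check.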
Each of these is a real failure mode we have seen in AI-generated signup flows. None of them show up in a quick demo. All of them show up the first week in production.
Signs your AI-built features ship without a definition of done
If you recognize three or more of these symptoms, your team is shipping features without meaningful acceptance criteria:
- Users report bugs you thought were finished features. The screen looked complete during your review, but it fails when someone enters unexpected input or uses a different device.
- Every new feature breaks something that used to work. The AI generated new code instead of modifying existing code, and the two versions now conflict.
- Login stops working after unrelated changes. Authentication is a common casualty because AI tools frequently regenerate auth-adjacent components without preserving session logic.
- Data disappears on refresh. The AI wired up local state instead of persistent storage, so the feature works during the session and resets afterward.
- Your demo works but your product does not. The happy path is the only path. Anything outside the script — a slow connection, a back-button press, an empty state — produces confusion or errors.
- You fix the same bug more than once. Without a checklist, the same gap reappears every time the AI generates new code that touches the same area.
- Mobile users churn faster than desktop users. The AI optimized for the viewport it was shown. If you only reviewed on desktop, mobile was never tested.
These symptoms share a root cause: the team treats AI output as finished work instead of a first draft that needs verification against explicit criteria.
How to introduce a definition of done without slowing down
Founders worry that checklists kill momentum. The opposite is true. A definition of done for AI-built features reduces rework, which is the real source of slowdown in AI-generated codebases. Every hour spent on a checklist saves multiple hours of bug-fixing, user complaints, and emergency patches.
Start small:
- Pick the five criteria that matter most to your product right now. For most early-stage apps, those are: error states exist, data persists, auth works, mobile works, adjacent flows unbroken.
- Run the checklist on one feature. Just one. See what it catches.
- Expand the list as your product grows. Add security criteria before launch. Add performance criteria when traffic increases. Add accessibility criteria when your user base diversifies.
The goal is not bureaucracy. The goal is to close the gap between what the AI produced and what your users need. A definition of done makes that gap visible, which is the first step toward closing it.
When your definition of done reveals too many gaps
If running this checklist on a feature produces more failures than passes, you are not dealing with a process problem. You are dealing with a codebase that needs stabilization.
AI-generated projects accumulate structural gaps faster than hand-coded ones because each generation session starts fresh, without full awareness of what the last session built. Over time, these gaps compound: duplicated logic, inconsistent patterns, auth flows that work differently on different screens.
At Spin by Fryga, we step into AI-built and vibe-coded projects at exactly this point. We audit the codebase, stabilize the flows users depend on, and establish the kind of structural consistency that makes a definition of done passable feature after feature. Your AI tools got you to traction. A steady hand keeps you there.