The Zeus Playbook
Project-agnostic methodology for AI-native startups. Validated against Anthropic's Founder's Playbook (May 2026), threaded through our GBrain, FEASY Room, War Room, and stage gates. One system, any project.
How We Operate
Founder = Orchestrator
M~=Vision. Zeus=Hands+Devil's Advocate. AI agents execute. The founder directs systems, not tasks. Every stage has a human checkpoint.
Validate Before Build
No line of code before the evidence justifies it. AI compresses build time, making it dangerously easy to skip validation. We don't.
Adversarial by Default
AI amplifies confirmation bias. At every gate, we run adversarial checks: argue the opposing case, search for disconfirming evidence, and run the competitor neglect exercise.
Context Prevents Drift
Agentic tech debt kills. Every session ends with updated context docs. 5 minutes of documentation = insurance against architectural drift that compounds silently.
Stage Gates, Not Vibes
We don't advance on gut feel. Each stage has explicit exit criteria: measurable, adversarially tested, with false-positive definitions built in.
Methodology > Project
Every framework, gate, and checklist is project-agnostic. What we learn on KNQX feeds the next project. The operating system compounds.
4-Tier Memory Architecture
Persistent knowledge that survives across sessions. The right tier for the right content: no duplication, no bloat.
If a fact lives in GBrain, it does NOT go in L1 (core memory). L1 costs tokens every turn. GBrain costs zero until queried. The duplication test: search GBrain first. If found, don't duplicate.
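The duplication test can be sketched in a few lines. This is a minimal illustration, not GBrain's actual interface: `may_add_to_l1`, `fake_search`, and the in-memory note list are hypothetical names, and the substring match merely stands in for GBrain's semantic search.

```python
from typing import Callable, Sequence


def may_add_to_l1(fact: str, search_gbrain: Callable[[str], Sequence[str]]) -> bool:
    """Duplication test: return False when GBrain already holds the fact."""
    hits = search_gbrain(fact)  # always search GBrain first
    return len(hits) == 0       # found in GBrain -> do not duplicate in L1


# Hypothetical stand-in for a real GBrain query: substring match over a tiny store.
_FAKE_GBRAIN = ["competitor map by tier", "FEASY Room scoring rubric"]


def fake_search(query: str) -> list[str]:
    return [note for note in _FAKE_GBRAIN if query.lower() in note.lower()]


if __name__ == "__main__":
    print(may_add_to_l1("competitor map", fake_search))
    print(may_add_to_l1("founder prefers weekly briefs", fake_search))
```

Passing the search function in as a parameter keeps the rule itself (search first, then decide) independent of however GBrain is actually queried.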
Framework Arsenal
We don't just have opinions; we have instruments.
FEASY Room
8-lens rapid feasibility scoring. Market demand, size, competition, capital, cost, complexity, time-to-revenue, risk. Quantified + radar chart.
War Room
6 Thinking Hats. Full-spectrum perspective analysis: facts, emotion, caution, optimism, creativity, process. Run at every stage gate.
SCAMPER Room
7 mutation verbs. Substitute, Combine, Adapt, Modify, Put to Other Use, Eliminate, Reverse. For ideation, not analysis.
Stage Gates
Explicit exit criteria for Idea → MVP → Launch → Scale. No advancing without passing. Adversarial check mandatory at every gate.
GBrain
Persistent knowledge graph. Domain expertise, competitive intel, project context. Semantic search + wikilinks. The compounding memory.
Brainstorming Gate
Before ANY creative work: explore intent, propose 2-3 approaches, get approval. Hard gate: no skipping to implementation.
gstack
Full dev lifecycle framework: plan reviews (/plan-ceo-review, /plan-eng-review, /plan-design-review), QA (/qa, /qa-only), code review (/review), ship (/ship), deploy verification (/canary), retros (/retro), context persistence (/context-save, /context-restore), and more.
| Framework | Best For | When to Run |
|---|---|---|
| FEASY Room | Scoring feasibility, comparing ideas | Idea stage entry, gate review |
| War Room | Full-spectrum perspective on a decision | Every stage gate |
| SCAMPER Room | Forcing mutations on a concept | Post-validation ideation |
| Stage Gates | Go/no-go decision before advancing | End of each stage |
| gstack | Dev lifecycle execution: plan reviews, QA, ship, canary, retros | MVP stage onward (build/ship cycle) |
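As one illustration of how the FEASY Room's 8-lens scoring could be quantified, the sketch below averages the eight lenses. The 1-5 scale, the "higher is more favorable" convention, and the unweighted mean are assumptions made for the example, not the playbook's actual rubric, and the radar chart is left out.

```python
from statistics import mean

# The eight lenses named in the FEASY Room description.
FEASY_LENSES = (
    "market_demand", "market_size", "competition", "capital",
    "cost", "complexity", "time_to_revenue", "risk",
)


def feasy_score(scores: dict[str, int]) -> float:
    """Return the unweighted mean of the eight lens scores.

    Assumption: each lens is scored 1-5, where 5 is the most favorable
    (so a 5 on "risk" means low risk).
    """
    missing = [lens for lens in FEASY_LENSES if lens not in scores]
    if missing:
        raise ValueError(f"missing lenses: {missing}")
    for lens in FEASY_LENSES:
        if not 1 <= scores[lens] <= 5:
            raise ValueError(f"{lens} must be between 1 and 5")
    return float(mean(scores[lens] for lens in FEASY_LENSES))


# Hypothetical scores for a single idea.
example_idea = {
    "market_demand": 4, "market_size": 4, "competition": 4, "capital": 4,
    "cost": 4, "complexity": 3, "time_to_revenue": 4, "risk": 2,
}

if __name__ == "__main__":
    print(feasy_score(example_idea))
```

Scoring two ideas with the same function gives a like-for-like comparison; the individual lens values, not just the mean, are what a radar chart would display.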
gstack: The Stage-by-Stage Toolkit
gstack is our execution layer: the CLI skills that turn the playbook's principles into action. Here's which gstack commands map to each stage:
| Stage | gstack Command | Purpose |
|---|---|---|
| Idea | /office-hours | YC-style forcing questions: biggest risk, simplest test, kill criteria |
| Idea | /plan-ceo-review | Founder-mode plan review: rethink the problem, find the 10x path |
| Idea | /design-consultation | Understand the product, research competition, propose design direction |
| MVP | /plan-eng-review | Lock in execution plan: architecture, scope, sprint sequence |
| MVP | /qa | Systematic QA testing: find bugs, fix them, capture evidence |
| MVP | /review | Pre-landing code review: security, quality, auto-fix |
| MVP | /ship | Detect base branch, merge, run tests, create PR |
| MVP | /context-save | Save session context: the 5-minute antidote to agentic tech debt |
| Launch | /canary | Post-deploy monitoring: watch live app, alert on regressions |
| Launch | /devex-review | Audit developer experience: time-to-hello-world, friction, docs |
| Launch | /design-review | Visual audit: spacing, consistency, responsive, accessibility |
| Launch | /benchmark | Performance regression detection |
| Scale | /retro | Weekly retrospective: commit analysis, velocity, blockers |
| Scale | /health | Code quality dashboard: test coverage, lint, type-check |
| Scale | /document-generate | Auto-generate missing docs from code |
| Scale | /document-release | Post-ship docs update: changelog, API docs, migration guides |
| All Stages | /investigate | Systematic debugging with root cause analysis |
| All Stages | /careful or /guard | Safety modes: destructive command warnings, full safety guardrails |
The playbook says "agentic tech debt kills." gstack's /context-save and /context-restore are the operational implementation of that principle. Every session ends with /context-save. Every new session starts with /context-restore. No context drift.
The 4-Stage Flow
AI compresses quarters into weeks. These gates are the braking system that prevents you from building fast in the wrong direction.
Validate Before You Build
The most important work happens before any code. AI makes it dangerously easy to skip this and jump to building. We don't.
What We Do
- Define problem hypothesis with testable specificity: "Finance managers at mid-market companies spend 3+ days/week on reconciliation because..." not "expense reporting is painful"
- Run FEASY Room (8-lens feasibility) on the hypothesis
- Run War Room (6 Hats) adversarial analysis
- Market research: competitor mapping by tier, trend analysis, TAM/SAM/SOM
- Customer discovery: who to talk to, what to ask, post-interview synthesis
- Only then: build a lightweight prototype for user conversations (NOT production)
What We Do NOT Do
- Build production before validation
- Treat a prototype as evidence (conversations are the evidence)
- Ask AI to validate our idea (it will find confirming evidence)
- Scale execution before validating problem-solution fit
Problem-Solution Fit
All 3 must be YES to advance:
- Is the problem real and specific? Can you name exactly who experiences it, how often, how severely, and what they currently do about it?
- Does your solution address the actual problem? Not the problem you assumed, but the one validation revealed.
- Enough signal to justify building? Not certainty, but enough qualitative evidence that committing to MVP is reasoned, not faith.
Adversarial checks (mandatory at this gate):
- Competitor neglect exercise: "Make the most compelling argument for why a competitor in this space would succeed while we do not." Accept the uncomfortable answer.
- Confirmation bias check: Run War Room White Hat β what does the data actually say, stripped of narrative? If supporting evidence >> challenging evidence, ask: does that reflect reality, or what you hoped to find?
- Disconfirming evidence search: Actively seek: failed competitors, negative market signals, structural obstacles, customer behavior patterns that contradict your thesis.
"Ask AI to validate your startup idea and it will find supporting evidence; ask it to size your TAM and it will find the number that makes it look fundable. The antidote is the same tool, pointed in the opposite direction."
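A gate of this shape (explicit exit criteria plus mandatory adversarial checks) can be modeled as a small data structure. The sketch below is illustrative only: `StageGate` and its fields are hypothetical names, and the `require` option is an assumed knob for gates that need only one criterion rather than all of them.

```python
from dataclasses import dataclass


@dataclass
class StageGate:
    """A go/no-go gate: exit criteria plus mandatory adversarial checks."""
    name: str
    criteria: dict[str, bool]            # exit criterion -> is it met?
    adversarial_checks: dict[str, bool]  # adversarial check -> has it been run?
    require: str = "all"                 # hypothetical knob: "all" or "any" criteria

    def passes(self) -> bool:
        if self.require not in ("all", "any"):
            raise ValueError("require must be 'all' or 'any'")
        if not self.criteria or not self.adversarial_checks:
            return False  # no criteria or no adversarial checks: the gate proves nothing
        combine = all if self.require == "all" else any
        criteria_met = combine(self.criteria.values())
        checks_run = all(self.adversarial_checks.values())
        return criteria_met and checks_run


problem_solution_fit = StageGate(
    name="Problem-Solution Fit",
    criteria={
        "problem is real and specific": True,
        "solution addresses the validated problem": True,
        "enough signal to justify building": True,
    },
    adversarial_checks={
        "competitor neglect exercise": True,
        "confirmation bias check": True,
        "disconfirming evidence search": False,  # not yet run
    },
)

if __name__ == "__main__":
    print(problem_solution_fit.passes())
```

In this example all three criteria are met, yet the gate still blocks because one adversarial check has not been run, which is the point of making the checks mandatory.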
Build the Smallest Thing That Generates Real Evidence
The MVP stage is still an evidence-gathering exercise. We're gathering evidence about the solution, not the problem. Ship the smallest focused iteration that puts a real solution in front of real users.
What We Do
- Define architecture and scope BEFORE building: CLAUDE.md / project README with scope definition, what it does, what it deliberately does NOT do
- Build MVP with agentic coding: each session executes pre-made decisions, not impromptu new ones
- End every session: update context docs + log.md (5-min rule against agentic tech debt)
- Define measurement framework before first user: retention benchmarks, activation criteria, Day 7/30 targets, false positive definitions
- Security review before any user touches the product
- Manage feedback logistics: outreach, scheduling, structured intake, weekly synthesis
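The measurement framework in the list above can be made concrete by writing the Day 7 and Day 30 targets down as data before any user arrives, then checking actual retention against them. The sketch below assumes one particular definition (a user counts as retained on day N if they were active exactly N days after signup); the function names and the target values are hypothetical.

```python
from datetime import date, timedelta


def day_n_retention(signups: dict[str, date],
                    activity: dict[str, set[date]],
                    n: int) -> float:
    """Share of signed-up users who were active exactly n days after signup."""
    if not signups:
        raise ValueError("no signups to measure")
    retained = sum(
        1 for user, signed_up in signups.items()
        if signed_up + timedelta(days=n) in activity.get(user, set())
    )
    return retained / len(signups)


def meets_targets(signups: dict[str, date],
                  activity: dict[str, set[date]],
                  targets: dict[int, float]) -> dict[int, bool]:
    """Compare measured retention with targets that were fixed before launch."""
    return {n: day_n_retention(signups, activity, n) >= goal
            for n, goal in targets.items()}


# Hypothetical pre-launch targets and toy data.
TARGETS = {7: 0.40, 30: 0.20}
SIGNUPS = {"a": date(2026, 1, 1), "b": date(2026, 1, 1),
           "c": date(2026, 1, 2), "d": date(2026, 1, 3)}
ACTIVITY = {"a": {date(2026, 1, 8)},
            "b": {date(2026, 1, 5)},
            "c": {date(2026, 1, 9), date(2026, 2, 1)}}

if __name__ == "__main__":
    print(meets_targets(SIGNUPS, ACTIVITY, TARGETS))
```

Other retention definitions (for example, "active on or after day N") are equally common; what matters for the playbook is that the definition and the targets are fixed before the data is examined.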
What We Do NOT Do
- Add features because they're easy with AI (zero-friction scope creep)
- Declare PMF from launch energy (friends, HN spike ≠ product-market fit)
- Skip specs/architecture docs (agentic tech debt compounds invisibly)
- Define metrics after launch instead of before (that invites choosing metrics that confirm rather than surface what's wrong)
Product-Market Fit
Genuine evidence, not launch energy. Pass at least ONE:
- Sean Ellis Test: 40%+ of active users say they'd be "very disappointed" if the product disappeared.
- Effort Test: Product pulls users in (post-PMF) instead of you pushing them (pre-PMF). You pass when outreach, incentives, and personal follow-up stop being necessary for retention.
- Retention + Revenue: Users return without prompting. Day 7 and Day 30 hit pre-defined benchmarks. Revenue follows usage, not pushing.
Adversarial checks (mandatory at this gate):
- False positive definition: Before looking at data, write down what a FALSE positive looks like for this product (signups without activation, revenue without retention, initial enthusiasm that fades by week 6).
- Skeptic's audit: "What would a skeptic say about these numbers?" Run the numbers through Black Hat (War Room).
- Scope audit: Has the product sprawled beyond its original scope definition? If every feature took an afternoon, how many "just one more thing" have accumulated?
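The Sean Ellis Test above is simple arithmetic: the share of surveyed active users who answer "very disappointed", compared against the 40% threshold. A minimal sketch follows; the function names and the sample responses are illustrative, and the answer labels assume the usual three-option survey.

```python
def sean_ellis_share(responses: list[str]) -> float:
    """Fraction of respondents who answered "very disappointed"."""
    if not responses:
        raise ValueError("no survey responses")
    very = sum(1 for r in responses if r.strip().lower() == "very disappointed")
    return very / len(responses)


def passes_sean_ellis(responses: list[str], threshold: float = 0.40) -> bool:
    """True when the "very disappointed" share meets the 40% threshold."""
    return sean_ellis_share(responses) >= threshold


# Hypothetical survey of 50 active users.
SAMPLE = (["very disappointed"] * 21
          + ["somewhat disappointed"] * 19
          + ["not disappointed"] * 10)

if __name__ == "__main__":
    print(sean_ellis_share(SAMPLE), passes_sean_ellis(SAMPLE))
```

Only active users belong in `responses`; surveying everyone who ever signed up dilutes the signal, which is one of the false positives worth writing down beforehand.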
Prove the Business Deserves to Grow
MVP proved the product deserves to exist. Launch proves the business deserves to grow. Harden infrastructure, build operational systems, remove founder bottlenecks.
What We Do
- Technical debt audit: Systematic pass to identify where the codebase is brittle, which shortcuts compound, and where test coverage is thin. Feed findings → triage → sprint sequencing.
- Founder bottleneck audit: Map every recurring task, decision, and workflow that only happens because the founder personally remembers. Categorize: automate, delegate, or keep.
- Security & compliance hardening: Code-level review for SOC 2 / GDPR / PDPA. Not a one-time project; build it into the development cycle.
- Product management processes: Sprint cadence, spec templates, bug triage decision trees, weekly metrics briefs. Automated via Cron/agents.
- GTM foundation: Segmentation, messaging architecture, sales playbook, not just shipping features.
What We Do NOT Do
- Expand into new markets before nailing the original one (this is where PMF goes to die)
- Stay in builder mode while the organization stalls (founder bottleneck)
- Treat security/compliance as deferrable (was fine at MVP, liability at Launch)
Repeatable, Hardened, Self-Running
All 3 must be TRUE:
- Growth is repeatable & channel-driven. CAC, LTV, payback period: understood unit economics.
- Product handles production workloads. Infrastructure hardened. Security & compliance in order. Reliability holds under real conditions.
- Ops run without founder bottlenecks. Processes exist. Automation in place. Not personally handling support, triage, sprint planning, reporting.
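For the unit economics named in the first criterion, a worked example helps. The formulas below are common simplified definitions, not the only ones in use and not necessarily the ones this playbook applies; all input figures are hypothetical.

```python
def cac(sales_and_marketing_spend: float, new_customers: int) -> float:
    """Customer acquisition cost: spend divided by customers acquired."""
    if new_customers <= 0:
        raise ValueError("new_customers must be positive")
    return sales_and_marketing_spend / new_customers


def ltv(monthly_revenue_per_customer: float,
        gross_margin: float,
        monthly_churn_rate: float) -> float:
    """Simplified lifetime value: monthly gross profit divided by monthly churn."""
    if not 0 < monthly_churn_rate <= 1:
        raise ValueError("monthly_churn_rate must be in (0, 1]")
    return monthly_revenue_per_customer * gross_margin / monthly_churn_rate


def payback_months(cac_value: float,
                   monthly_revenue_per_customer: float,
                   gross_margin: float) -> float:
    """Months of gross profit needed to recover the acquisition cost."""
    monthly_gross_profit = monthly_revenue_per_customer * gross_margin
    if monthly_gross_profit <= 0:
        raise ValueError("monthly gross profit must be positive")
    return cac_value / monthly_gross_profit


if __name__ == "__main__":
    # Hypothetical channel: 50,000 spent, 100 customers won,
    # 100 per month per customer, 80% gross margin, 5% monthly churn.
    acquisition_cost = cac(50_000, 100)
    lifetime_value = ltv(100, 0.80, 0.05)
    print(round(acquisition_cost, 2))
    print(round(lifetime_value, 2))
    print(round(payback_months(acquisition_cost, 100, 0.80), 2))
    print(round(lifetime_value / acquisition_cost, 2))
```

Computing these per channel rather than in aggregate is what makes growth "channel-driven": a blended CAC can hide one channel that never pays back.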
Sustainable Without the Founder
The product is still central, but the founder's attention expands to the company itself. Build defensible moats, mature operations, and organizational infrastructure that withstands scrutiny.
What We Do
- Delegate the operational layer: Bottleneck map: what stalls if the founder is unavailable for a week? Those are the handoff candidates.
- Scale tech into enterprise-grade: Documentation, SLAs, support playbooks, monitoring, incident response. Discord support → enterprise support.
- Build GTM function: Market segmentation, analyst relations, sales playbooks, content pipelines, CRM hygiene. Founder hustle → sustainable growth engine.
- Compound domain expertise into moat: Encode edge cases into the product. Every domain-specific gotcha is a test case. Test suite = map of moat.
- Create workflow lock-in: Integrations, automations, trained teams. The deeper the product embeds in daily operations, the harder it is to leave.
Systematic, Auditable, Defensible
- Systematic growth: Not founder-dependent. Auditable metrics.
- Organizational maturity: Governance, compliance, financial controls withstand external scrutiny.
- Defensible moat: "If a well-funded incumbent copied your product today, would your users stay?"
Typical exit forms: sustainable profitability, IPO-readiness, or acquisition.
How AI-Native Startups Fail
| Failure Mode | Stage | What It Looks Like | Our Defense |
|---|---|---|---|
| Mistaking building for validating | Idea | Prototype becomes "proof" without user conversations | Stage gate: conversations are the evidence |
| Confirmation bias on steroids | Idea | AI finds evidence for whatever you ask it to validate | War Room adversarial check at every gate |
| Premature scaling | Idea/MVP | Building ahead of demand because it's easy | Scope definition before build; feature amendment criteria |
| Agentic tech debt | MVP | Codebase drifts: no coherent mental model, pieces never designed to fit together | CLAUDE.md / README updated every session; 5-min doc rule |
| False PMF | MVP → Launch | Launch energy confused with genuine retention | Sean Ellis 40% test, effort test, false positive definitions before data |
| Zero-friction scope creep | MVP | Every feature takes an afternoon, so why not? | Written scope: what it does NOT do + evidence threshold for additions |
| Founder bottleneck | Launch | Decisions pile up because only the founder can make them | Bottleneck audit: what stalls if you're gone a week? |
| Insecure by inexperience | MVP | Functional code ≠ secure code. No natural feedback loop for vulnerabilities. | Security review before any user touches it |
| Expansion before readiness | Launch | New markets kill PMF: too many variables, lose ability to read data | Nail original market first. Expand only with data. |
Pivot Criteria
If evidence doesn't support the current direction after 3+ iteration cycles, run a diagnostic, not a hope cycle.
Three Questions to Answer
- Is there a segment responding differently? Often the right audience is already in your data, just underweighted.
- Is the gap a positioning problem or a product problem? Adjust messaging vs. rebuild.
- What would have to be true for PMF? Is that scenario realistic given what you're seeing?
Let the answers determine: adjust, pivot, or return to Idea stage.
The Moat Playbook
At scale, the question is: "If a well-funded incumbent copied your product today, would your users stay?" Build moats deliberately from day one.
Accumulated Depth
Edge cases, domain logic, industry gotchas that competitors can't replicate. Every bug workaround, every 340B-style exception = a test case = a map of your moat.
Integration Depth
Workflows built on top of your product: automations, trained teams, connected data sources. Switching becomes an operational project, not a product decision.
Data Flywheel
User behavioral signals compound improvement. Time-locked, context-specific, impossible to buy. A well-resourced competitor starting today can't replicate 2 years of usage data.
Workflow Lock-In
Prompts, automations, standardized outputs shaped around your product. The more integrations, the more surface area for building moat. APIs, webhooks, SDKs.
Sources & Integration
| Source | What We Adopted | Where It Lives |
|---|---|---|
| Anthropic Founder's Playbook (May 2026) | Stage gates, adversarial validation protocol, agentic tech debt concept, measurement framework, moat playbook, scope definition discipline | GBrain: founders-playbook-2026 |
| OPC Operating Model | M~=Vision, Zeus=MD+Hands+Devil's Advocate. Founder orchestrates, agents execute. | Core memory (L1) |
| 4-Tier Memory Architecture | 4-layer knowledge persistence. GBrain → Memory → README → log.md. Duplication test. | Skill: 3-tier-memory |
| FEASY Room | 8-lens feasibility scoring + radar chart. Run at Idea gate. | Skill: opportunity-scorecard |
| War Room (6 Hats) | Full-spectrum perspective coverage. Run at every gate. | Skill: opc-war-room |
| SCAMPER Room | 7 mutation verbs for ideation. Post-validation, not pre-. | Skill: scamper-room |
| Brainstorming Hard Gate | No skipping to implementation. Always design before build. | Skill: brainstorming |
| Stage Gates | Explicit exit criteria per stage. Adversarial check mandatory. | Skill: startup-stage-gates |
| Verification Before Completion | No claiming done until evidence confirms. | Skill: verification-before-completion |
Every project runs the same operating system. KNQX validates the gates, FEASY scores the ideas, War Room pressure-tests the decisions, GBrain remembers everything. The methodology gets sharper with every project. That's the real moat for our OPC.