OPC Operating System v1.0

The Zeus Playbook

Project-agnostic methodology for AI-native startups. Validated against Anthropic's Founder's Playbook (May 2026), threaded through our GBrain, FEASY Room, War Room, and stage gates. One system, any project.

Principles

How We Operate

🧠

Founder = Orchestrator

M~=Vision. Zeus=Hands+Devil's Advocate. AI agents execute. The founder directs systems, not tasks. Every stage has a human checkpoint.

🛡️

Validate Before Build

No line of code before the evidence justifies it. AI compresses build time — making it dangerously easy to skip validation. We don't.

⚔️

Adversarial by Default

AI amplifies confirmation bias. At every gate, we run adversarial checks: argue against the plan, seek disconfirming evidence, and run the competitor neglect exercise.

📝

Context Prevents Drift

Agentic tech debt kills. Every session ends with updated context docs. 5 minutes of documentation = insurance against architectural drift that compounds silently.

🎯

Stage Gates, Not Vibes

We don't advance on gut feel. Each stage has explicit exit criteria: measurable, adversarially tested, with false-positive definitions built in.

🔄

Methodology > Project

Every framework, gate, and checklist is project-agnostic. What we learn on KNQX feeds the next project. The operating system compounds.

Infrastructure

4-Tier Memory Architecture

Persistent knowledge that survives across sessions. The right tier for the right content — no duplication, no bloat.

GBrain

Knowledge graph. Project facts, domain expertise, competitive intel. Queried on demand. Unlimited capacity.

Core Memory

Behavioral directives only. Who we are, how we communicate. Always in context. Target size: ~400 characters.

Project README

Active project context. Tech stack, decisions, NEXT_STEP. Read before every session.

log.md

Detailed session history: what was tried, what failed, and which decisions were made. Read only when deep context is needed.

The Rule

If a fact lives in GBrain, it does NOT go in Core Memory (L1). L1 costs tokens every turn; GBrain costs zero tokens until queried. The duplication test: search GBrain first. If the fact is found there, don't duplicate it.
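The duplication test can be read as a routing rule: check GBrain first, then place a new fact in exactly one tier. Below is a minimal Python sketch, assuming a hypothetical `gbrain_search` callable (stubbed here with an exact-match lookup in a set) and a hypothetical `kind` label on each fact. Neither is GBrain's actual interface.

```python
from typing import Callable, Literal

# Hypothetical labels; GBrain does not necessarily classify facts this way.
Kind = Literal["behavioral", "project_state", "session_detail", "knowledge"]
Tier = Literal["GBrain", "Core Memory", "Project README", "log.md", "already in GBrain"]


def route_fact(fact: str, kind: Kind, gbrain_search: Callable[[str], bool]) -> Tier:
    """Return the one tier a new fact should be written to."""
    # Duplication test: if GBrain already holds the fact, write it nowhere else.
    # This matters most for Core Memory (L1), which costs tokens every turn.
    if gbrain_search(fact):
        return "already in GBrain"
    if kind == "behavioral":
        return "Core Memory"      # who we are, how we communicate
    if kind == "project_state":
        return "Project README"   # tech stack, decisions, NEXT_STEP
    if kind == "session_detail":
        return "log.md"           # what was tried, what failed
    return "GBrain"               # project facts, domain expertise, competitive intel


# Stand-in for a real GBrain query (assumption): exact-match lookup in a set.
known_facts = {"competitor pricing tiers (hypothetical)"}


def search(query: str) -> bool:
    return query in known_facts


print(route_fact("competitor pricing tiers (hypothetical)", "knowledge", search))  # already in GBrain
print(route_fact("reply in concise bullet points", "behavioral", search))  # Core Memory
```

Routing on an explicit `kind` keeps Core Memory limited to behavioral directives, which matters because it is the only tier paid for on every turn.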

Tools

Framework Arsenal

We don't just have opinions — we have instruments.

🎯

FEASY Room

8-lens rapid feasibility scoring: market demand, market size, competition, capital, cost, complexity, time-to-revenue, risk. Each lens is scored, and the results are plotted on a radar chart.
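The scoring step can be sketched as a validated average over the eight lenses. This is a minimal Python sketch under stated assumptions: each lens is scored 1 to 10, higher always means more favorable (so a low-risk idea scores high on "risk"), and all lenses are weighted equally. The real FEASY Room scale and weighting may differ.

```python
from __future__ import annotations

from statistics import fmean

# The eight FEASY lenses as named in the playbook.
LENSES = (
    "market_demand", "market_size", "competition", "capital",
    "cost", "complexity", "time_to_revenue", "risk",
)


def feasy_score(scores: dict[str, float]) -> float:
    """Equal-weight mean of the eight lens scores (assumed 1-10, higher = better)."""
    missing = [lens for lens in LENSES if lens not in scores]
    if missing:
        raise ValueError(f"unscored lenses: {missing}")
    for lens in LENSES:
        if not 1 <= scores[lens] <= 10:
            raise ValueError(f"{lens} must be in 1..10, got {scores[lens]}")
    # The individual lens values, not this mean, are what the radar chart plots.
    return round(fmean(scores[lens] for lens in LENSES), 2)


# Hypothetical scores for illustration only.
example = {
    "market_demand": 8, "market_size": 6, "competition": 4, "capital": 7,
    "cost": 7, "complexity": 5, "time_to_revenue": 6, "risk": 5,
}
print(feasy_score(example))  # 6.0
```

Two ideas with the same mean can have very different profiles, which is why the radar chart stays alongside the single number.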

🎩

War Room

6 Thinking Hats. Full-spectrum perspective analysis: facts, emotion, caution, optimism, creativity, process. Run at every stage gate.

🔀

SCAMPER Room

7 mutation verbs. Substitute, Combine, Adapt, Modify, Put to Other Use, Eliminate, Reverse. For ideation, not analysis.

📋

Stage Gates

Explicit exit criteria for Idea→MVP→Launch→Scale. No advancing without passing. Adversarial check mandatory at every gate.

🧠

GBrain

Persistent knowledge graph. Domain expertise, competitive intel, project context. Semantic search + wikilinks. The compounding memory.

🎨

Brainstorming Gate

Before ANY creative work: explore intent, propose 2-3 approaches, get approval. Hard gate — no skipping to implementation.

⚡

gstack

Full dev lifecycle framework: plan reviews (/plan-ceo-review, /plan-eng-review, /plan-design-review), QA (/qa, /qa-only), code review (/review), ship (/ship), deploy verification (/canary), retros (/retro), context persistence (/context-save, /context-restore), and more.

Tool	Best For	When to Run
FEASY Room	Scoring feasibility, comparing ideas	Idea stage entry, gate review
War Room	Full-spectrum perspective on a decision	Every stage gate
SCAMPER Room	Forcing mutations on a concept	Post-validation ideation
Stage Gates	Go/no-go decision before advancing	End of each stage
gstack	Dev lifecycle execution: plan reviews, QA, ship, canary, retros	MVP stage onward (build/ship cycle)

Execution Engine

gstack: The Stage-by-Stage Toolkit

gstack is our execution layer — the CLI skills that turn the playbook's principles into action. Here's which gstack commands map to each stage:

Stage	gstack Command	Purpose
💡 Idea	/office-hours	YC-style forcing questions: biggest risk, simplest test, kill criteria
	/plan-ceo-review	Founder-mode plan review: rethink the problem, find the 10x path
	/design-consultation	Understand the product, research competition, propose design direction
🔧 MVP	/plan-eng-review	Lock in execution plan: architecture, scope, sprint sequence
	/qa	Systematic QA testing — find bugs, fix them, capture evidence
	/review	Pre-landing code review: security, quality, auto-fix
	/ship	Detect and merge the base branch, run tests, create PR
	/context-save	Save session context — the 5-minute antidote to agentic tech debt
🚀 Launch	/canary	Post-deploy monitoring: watch live app, alert on regressions
	/devex-review	Audit developer experience: time-to-hello-world, friction, docs
	/design-review	Visual audit: spacing, consistency, responsive, accessibility
	/benchmark	Performance regression detection
📈 Scale	/retro	Weekly retrospective: commit analysis, velocity, blockers
	/health	Code quality dashboard: test coverage, lint, type-check
	/document-generate	Auto-generate missing docs from code
	/document-release	Post-ship docs update: changelog, API docs, migration guides
All Stages	/investigate	Systematic debugging with root cause analysis
All Stages	/careful or /guard	Safety modes: destructive command warnings, full safety guardrails

Key Insight

The playbook says "agentic tech debt kills." gstack's /context-save and /context-restore are the operational implementation of that principle. Every session ends with /context-save. Every new session starts with /context-restore. No context drift.

Lifecycle

The 4-Stage Flow

AI compresses quarters into weeks. These gates are the braking system that prevents you from building fast in the wrong direction.

💡

Idea

"Is this worth building?"

🔧

MVP

"What should we build first?"

🚀

Launch

"Does the business deserve to grow?"

📈

Scale

"Is it sustainable without me?"

💡 Stage 1

Idea Stage

Validate Before You Build

The most important work happens before any code. AI makes it dangerously easy to skip this and jump to building. We don't.

What We Do

Define problem hypothesis with testable specificity — "Finance managers at mid-market companies spend 3+ days/week on reconciliation because..." not "expense reporting is painful"
Run FEASY Room (8-lens feasibility) on the hypothesis
Run War Room (6 Hats) adversarial analysis
Market research: competitor mapping by tier, trend analysis, TAM/SAM/SOM
Customer discovery: who to talk to, what to ask, post-interview synthesis
Only then: build a lightweight prototype for user conversations (NOT production)

What We Do NOT Do

Build production before validation
Treat a prototype as evidence (conversations are the evidence)
Ask AI to validate our idea (it will find confirming evidence)
Scale execution before validating problem-solution fit

🚧 Exit Gate — Idea → MVP

Problem-Solution Fit

All 3 must be YES to advance:

Is the problem real and specific? Can you name exactly who experiences it, how often, how severely, and what they currently do about it?
Does your solution address the actual problem? Not the problem you assumed — the one validation revealed.
Enough signal to justify building? Not certainty — but enough qualitative evidence that committing to MVP is reasoned, not faith.

⚔️ Adversarial Check (Mandatory Before Advancing)

Competitor neglect exercise: "Make the most compelling argument for why a competitor in this space would succeed while we do not." Accept the uncomfortable answer.
Confirmation bias check: Run War Room White Hat — what does the data actually say, stripped of narrative? If supporting evidence >> challenging evidence, ask: does that reflect reality, or what you hoped to find?
Disconfirming evidence search: Actively seek: failed competitors, negative market signals, structural obstacles, customer behavior patterns that contradict your thesis.
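The exit gate and its adversarial check can be sketched as one checklist that blocks advancement unless every item is true. A minimal Python sketch; the field names are hypothetical, and each boolean still comes from human judgment on the evidence, not from the code.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class IdeaExitGate:
    """Problem-Solution Fit checklist for Idea -> MVP (hypothetical field names)."""
    problem_real_and_specific: bool
    solution_addresses_validated_problem: bool
    enough_signal_to_build: bool
    competitor_neglect_done: bool
    confirmation_bias_check_done: bool
    disconfirming_search_done: bool


def blockers(gate: IdeaExitGate) -> list[str]:
    """Names of every item that is not yet true; an empty list means the gate passes."""
    return [f.name for f in fields(gate) if not getattr(gate, f.name)]


gate = IdeaExitGate(True, True, True, True, False, True)
print(blockers(gate))  # ['confirmation_bias_check_done']
```

Returning the list of blockers, rather than a bare pass/fail, makes it explicit which criterion is holding the idea back.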

Anthropic Playbook Insight

"Ask AI to validate your startup idea and it will find supporting evidence; ask it to size your TAM and it will find the number that makes it look fundable. The antidote is the same tool, pointed in the opposite direction."

🔧 Stage 2

MVP Stage

Build the Smallest Thing That Generates Real Evidence

The MVP stage is still an evidence-gathering exercise. We're gathering evidence about the solution, not the problem. Ship the smallest focused iteration that puts a real solution in front of real users.

What We Do

Define architecture and scope BEFORE building — CLAUDE.md / project README with scope definition, what it does, what it deliberately does NOT do
Build MVP with agentic coding — each session executes pre-made decisions, not impromptu new ones
End every session: update context docs + log.md (5-min rule against agentic tech debt)
Define measurement framework before first user: retention benchmarks, activation criteria, Day 7/30 targets, false positive definitions
Security review before any user touches the product
Manage feedback logistics: outreach, scheduling, structured intake, weekly synthesis
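Committing the measurement framework as an immutable record before the first user arrives makes it visible if targets are later changed to fit the data. A minimal Python sketch; the field names and every number in the example are placeholders, not benchmarks from the playbook.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementPlan:
    """Targets written down before the first user (placeholder values only)."""
    activation_event: str              # what counts as an activated user
    day7_retention_target: float       # fraction of a cohort active on Day 7
    day30_retention_target: float      # fraction of a cohort active on Day 30
    false_positives: tuple[str, ...]   # signals that must NOT be read as success

    def __post_init__(self) -> None:
        # Reject targets that are not fractions, e.g. 40 instead of 0.40.
        for name in ("day7_retention_target", "day30_retention_target"):
            value = getattr(self, name)
            if not 0.0 < value <= 1.0:
                raise ValueError(f"{name} must be a fraction in (0, 1], got {value}")


plan = MeasurementPlan(
    activation_event="completed first reconciliation",  # hypothetical
    day7_retention_target=0.40,                          # placeholder
    day30_retention_target=0.25,                         # placeholder
    false_positives=(
        "signups without activation",
        "revenue without retention",
        "initial enthusiasm that fades by week 6",
    ),
)
```

Because the dataclass is frozen, `plan.day7_retention_target = 0.2` raises `dataclasses.FrozenInstanceError`; changing a target means creating a new, visibly different plan.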

What We Do NOT Do

Add features because they're easy with AI (zero-friction scope creep)
Declare PMF from launch energy (friends, HN spike ≠ product-market fit)
Skip specs/architecture docs (agentic tech debt compounds invisibly)
Define metrics after launch instead of before (metrics chosen after the fact tend to confirm what you hoped rather than surface what's wrong)

🚧 Exit Gate — MVP → Launch

Product-Market Fit

Genuine evidence, not launch energy. Pass at least ONE:

Sean Ellis Test: 40%+ of active users say they'd be "very disappointed" if the product disappeared.
Effort Test: The product pulls users in (post-PMF) instead of you pushing them (pre-PMF). It passes when outreach, incentives, and personal follow-up are no longer necessary for retention.
Retention + Revenue: Users return without prompting. Day 7 and Day 30 hit pre-defined benchmarks. Revenue follows usage, not pushing.
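The two quantitative tests reduce to simple ratios. A minimal Python sketch: the Sean Ellis share is the fraction of surveyed active users answering "very disappointed", and Day N retention is assumed here to mean "active exactly N days after signup" (definitions vary, so fix one in the measurement framework). The input shapes and sample data are hypothetical.

```python
from __future__ import annotations


def sean_ellis_share(answers: list[str]) -> float:
    """Fraction of respondents who would be "very disappointed" without the product."""
    if not answers:
        raise ValueError("no survey responses")
    return sum(a == "very disappointed" for a in answers) / len(answers)


def day_n_retention(signup_day: dict[str, int], active_days: dict[str, set[int]], n: int) -> float:
    """Fraction of a signup cohort active exactly n days after signing up.

    signup_day: user -> day index of signup; active_days: user -> day indices with activity.
    """
    if not signup_day:
        raise ValueError("empty cohort")
    retained = sum(
        1 for user, day in signup_day.items() if day + n in active_days.get(user, set())
    )
    return retained / len(signup_day)


answers = ["very disappointed"] * 4 + ["somewhat disappointed"] * 5 + ["not disappointed"]
print(sean_ellis_share(answers) >= 0.40)  # True: 4 of 10 is exactly 40%

signups = {"a": 0, "b": 0, "c": 1, "d": 2}
activity = {"a": {7, 30}, "b": {3}, "c": {8}, "d": {9, 32}}
print(day_n_retention(signups, activity, 7), day_n_retention(signups, activity, 30))  # 0.75 0.5
```

Compare these numbers only against targets written down before launch. The Effort Test has no formula here because it is a qualitative judgment.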

⚔️ Adversarial Check (Mandatory Before Advancing)

False positive definition: Before looking at data, write down what a FALSE positive looks like for this product (signups without activation, revenue without retention, initial enthusiasm that fades by week 6).
Skeptic's audit: "What would a skeptic say about these numbers?" Run the numbers through Black Hat (War Room).
Scope audit: Has the product sprawled beyond its original scope definition? If every feature took an afternoon, how many "just one more thing" additions have accumulated?

🚀 Stage 3

Launch Stage

Prove the Business Deserves to Grow

MVP proved the product deserves to exist. Launch proves the business deserves to grow. Harden infrastructure, build operational systems, remove founder bottlenecks.

What We Do

Technical debt audit: Systematic pass — identify where codebase is brittle, shortcuts that compound, thin test coverage. Feed findings → triage → sprint sequencing.
Founder bottleneck audit: Map every recurring task, decision, and workflow that only happens because the founder personally remembers. Categorize: automate, delegate, or keep.
Security & compliance hardening: Code-level review for SOC 2 / GDPR / PDPA. Not a one-time project — build into development cycle.
Product management processes: Sprint cadence, spec templates, bug triage decision trees, weekly metrics briefs. Automated via cron jobs and agents.
GTM foundation: Segmentation, messaging architecture, sales playbook — not just shipping features.

What We Do NOT Do

Expand into new markets before nailing the original one (premature expansion is where PMF goes to die)
Stay in builder mode while the organization stalls (founder bottleneck)
Treat security/compliance as deferrable (deferral was tolerable at MVP; at Launch it is a liability)

🚧 Exit Gate — Launch → Scale

Repeatable, Hardened, Self-Running

All 3 must be TRUE:

Growth is repeatable & channel-driven. Unit economics (CAC, LTV, payback period) are understood.
Product handles production workloads. Infrastructure hardened. Security & compliance in order. Reliability holds under real conditions.
Ops run without founder bottlenecks. Processes exist. Automation is in place. The founder is not personally handling support, triage, sprint planning, or reporting.
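The first criterion names three numbers. A worked Python sketch using common simplified formulas (LTV as monthly gross profit divided by monthly churn, payback as CAC divided by monthly gross profit); those formulas and the example inputs are assumptions, and the playbook sets no target values for them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class UnitEconomics:
    """Simplified per-customer unit economics (constant monthly churn assumed)."""
    cac: float            # cost to acquire one customer
    monthly_arpu: float   # average revenue per customer per month
    gross_margin: float   # fraction of revenue kept after cost of service, in (0, 1]
    monthly_churn: float  # fraction of customers lost per month, in (0, 1]

    def __post_init__(self) -> None:
        if self.cac <= 0 or self.monthly_arpu <= 0:
            raise ValueError("cac and monthly_arpu must be positive")
        if not (0 < self.gross_margin <= 1 and 0 < self.monthly_churn <= 1):
            raise ValueError("gross_margin and monthly_churn must be in (0, 1]")

    @property
    def monthly_gross_profit(self) -> float:
        return self.monthly_arpu * self.gross_margin

    @property
    def ltv(self) -> float:
        # Gross profit per month times expected lifetime in months (1 / churn).
        return self.monthly_gross_profit / self.monthly_churn

    @property
    def payback_months(self) -> float:
        # Months of gross profit needed to earn back the acquisition cost.
        return self.cac / self.monthly_gross_profit


# Hypothetical inputs for illustration only.
econ = UnitEconomics(cac=600, monthly_arpu=100, gross_margin=0.8, monthly_churn=0.02)
print(round(econ.ltv), econ.payback_months, round(econ.ltv / econ.cac, 2))  # 4000 7.5 6.67
```

Constant churn is the main simplification: real cohorts usually churn fastest early, so the same inputs measured per cohort can give a different LTV.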

📈 Stage 4

Scale Stage

Sustainable Without the Founder

The product is still central, but the founder's attention expands to the company itself. Build defensible moats, mature operations, and organizational infrastructure that withstands scrutiny.

What We Do

Delegate the operational layer: Bottleneck map — what stalls if the founder is unavailable for a week? Those are the handoff candidates.
Scale tech into enterprise-grade: Documentation, SLAs, support playbooks, monitoring, incident response. Discord support ≠ enterprise support.
Build GTM function: Market segmentation, analyst relations, sales playbooks, content pipelines, CRM hygiene. Founder hustle ≠ sustainable growth engine.
Compound domain expertise into moat: Encode edge cases into the product. Every domain-specific gotcha is a test case. Test suite = map of moat.
Create workflow lock-in: Integrations, automations, trained teams. The deeper the product embeds in daily operations, the harder it is to leave.

🎯 Exit Gate — Scale → Sustainable Business

Systematic, Auditable, Defensible

Systematic growth: Not founder-dependent. Auditable metrics.
Organizational maturity: Governance, compliance, financial controls withstand external scrutiny.
Defensible moat: "If a well-funded incumbent copied your product today, would your users stay?"

Typical exit forms: sustainable profitability, IPO-readiness, or acquisition.

Danger Zones

How AI-Native Startups Fail

Failure Mode	Stage	What It Looks Like	Our Defense
Mistaking building for validating	Idea	Prototype becomes "proof" without user conversations	Stage gate: conversations are the evidence
Confirmation bias on steroids	Idea	AI finds evidence for whatever you ask it to validate	War Room adversarial check at every gate
Premature scaling	Idea/MVP	Building ahead of demand because it's easy	Scope definition before build; feature amendment criteria
Agentic tech debt	MVP	Codebase drifts — no coherent mental model, pieces never designed to fit together	CLAUDE.md / README updated every session; 5-min doc rule
False PMF	MVP→Launch	Launch energy confused with genuine retention	Sean Ellis 40% test, effort test, false positive definitions before data
Zero-friction scope creep	MVP	Every feature takes an afternoon, so why not?	Written scope: what it does NOT do + evidence threshold for additions
Founder bottleneck	Launch	Decisions pile up because only the founder can make them	Bottleneck audit: what stalls if you're gone a week?
Insecure by inexperience	MVP	Functional code ≠ secure code. No natural feedback loop for vulnerabilities.	Security review before any user touches it
Expansion before readiness	Launch	New markets kill PMF — too many variables, lose ability to read data	Nail original market first. Expand only with data.

Decision Framework

Pivot Criteria

If evidence doesn't support the current direction after 3+ iteration cycles, run a diagnostic — not a hope cycle.

🔍 3-Iteration Diagnostic

Three Questions to Answer

Is there a segment responding differently? Often the right audience is already in your data, just underweighted.
Is the gap a positioning problem or a product problem? A positioning problem calls for adjusting the messaging; a product problem calls for a rebuild.
What would have to be true for PMF? Is that scenario realistic given what you're seeing?

Let the answers determine: adjust, pivot, or return to Idea stage.

Defensibility

The Moat Playbook

At scale, the question is: "If a well-funded incumbent copied your product today, would your users stay?" Build moats deliberately from day one.

🏗️

Accumulated Depth

Edge cases, domain logic, industry gotchas that competitors can't replicate. Every bug workaround, every 340B-style exception = a test case = a map of your moat.

🔗

Integration Depth

Workflows built on top of your product: automations, trained teams, connected data sources. Switching becomes an operational project, not a product decision.

📊

Data Flywheel

User behavioral signals compound into product improvement. Time-locked, context-specific, impossible to buy. A well-resourced competitor starting today can't replicate 2 years of usage data.

🧩

Workflow Lock-In

Prompts, automations, standardized outputs shaped around your product. The more integrations, the more surface area for building a moat: APIs, webhooks, SDKs.

Sources

Sources & Integration

Source	What We Adopted	Where It Lives
Anthropic Founder's Playbook (May 2026)	Stage gates, adversarial validation protocol, agentic tech debt concept, measurement framework, moat playbook, scope definition discipline	GBrain: founders-playbook-2026
OPC Operating Model	M~=Vision, Zeus=MD+Hands+Devil's Advocate. Founder orchestrates, agents execute.	Core memory (L1)
4-Tier Memory Architecture	4-tier knowledge persistence: GBrain → Core Memory → README → log.md. Duplication test.	Skill: 3-tier-memory
FEASY Room	8-lens feasibility scoring + radar chart. Run at Idea gate.	Skill: opportunity-scorecard
War Room (6 Hats)	Full-spectrum perspective coverage. Run at every gate.	Skill: opc-war-room
SCAMPER Room	7 mutation verbs for ideation. Post-validation, not pre-.	Skill: scamper-room
Brainstorming Hard Gate	No skipping to implementation. Always design before build.	Skill: brainstorming
Stage Gates	Explicit exit criteria per stage. Adversarial check mandatory.	Skill: startup-stage-gates
Verification Before Completion	No claiming done until evidence confirms.	Skill: verification-before-completion

The Compound Insight

Every project runs the same operating system. KNQX validates the gates, FEASY scores the ideas, War Room pressure-tests the decisions, GBrain remembers everything. The methodology gets sharper with every project — that's the real moat for our OPC.