7 Principles for Using AI Agents Safely in Production

The Problem

If you run Claude Code, Gemini Code Assist, and GitHub Copilot side by side, you eventually notice a pattern: AI is so convenient that defects pile up without anyone noticing.

API key overwrites, hallucination loops, auto-post spam... These are all hidden defects that slip in during AI-assisted development.

This post shares the 7 AI development principles I use in my solo SaaS project Jibun Kabushiki Kaisha (Flutter Web + Supabase).

The 7 Principles

Principle 1: Auth Layer (Single Source of Truth)

// ❌ Bad: ad hoc key lookup with a silent fallback, repeated wherever a key is needed
const key = Deno.env.get("OPENAI_KEY") || "fallback-value";

// ✅ Good: One source of truth
const getApiKey = (provider: string) => {
  const key = Deno.env.get(`${provider.toUpperCase()}_API_KEY`);
  if (!key) throw new Error(`${provider} API key not configured`);
  return key;
};

AI assistants tend to "helpfully" add fallback values or duplicate the key lookup. With a single accessor that throws on a missing key, any such change shows up in one place instead of hiding across files.

Principle 2: Deny-by-default Security

// Add auth + rate limit from day 1, not "later"
const { data: { user } } = await supabase.auth.getUser();
if (!user) return new Response("Unauthorized", { status: 401 });

AI-generated code often ships with no access checks at all. Deny-by-default flips this: every request is rejected unless it proves it is allowed.
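The snippet above covers the auth half. For the rate-limit half, here is a minimal sketch of a fixed-window limiter. The names (`allowRequest`, `WINDOW_MS`, `MAX_REQUESTS`) and the limits are my own assumptions for illustration, and the state lives in memory, so a real deployment would likely need a shared store instead.

```typescript
// Hypothetical fixed-window rate limiter; names and limits are illustrative.
// State is in memory, so it resets whenever the function instance restarts.
type RateWindow = { start: number; count: number };

const WINDOW_MS = 60_000; // assumed window length: 1 minute
const MAX_REQUESTS = 20;  // assumed cap per user per window

const rateWindows = new Map<string, RateWindow>();

function allowRequest(userId: string, now: number = Date.now()): boolean {
  const current = rateWindows.get(userId);
  // Start a fresh window if none exists or the previous one has expired.
  if (!current || now - current.start >= WINDOW_MS) {
    rateWindows.set(userId, { start: now, count: 1 });
    return true;
  }
  // Over the cap: deny until the window rolls over.
  if (current.count >= MAX_REQUESTS) return false;
  current.count += 1;
  return true;
}
```

In an Edge Function this check would sit right after the auth check, returning a 429 response whenever `allowRequest(user.id)` is false.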

Principle 3: Trace-based Observability

const traceId = crypto.randomUUID();
const startTime = Date.now();
const result = await callAI(prompt);
const elapsed = Date.now() - startTime;

if (elapsed > 5000) {
  console.warn(`[${traceId}] Slow AI call detected: ${elapsed}ms`);
}

Without a trace ID and timing data on every call, AI call failures in production are very hard to debug.

Principle 4: Cost Circuit Breaker (4 tiers)

const LIMITS = {
  request:  0.10,  // $0.10 per request
  agent:    1.00,  // $1.00 per agent run
  business: 10.00, // $10.00 per day
  platform: 50.00, // $50.00 per month
};

An infinite loop in an AI agent without circuit breakers = surprise cloud bill. Each tier caps spend at a wider scope: a single request, one agent run, a day, a month.
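The limits alone don't stop anything, so here is a rough sketch of how a breaker might enforce them. The `CostBreaker` class and its methods are hypothetical, and it assumes you can estimate a call's cost before making it.

```typescript
// Limits copied from the post; everything else is a hypothetical sketch.
const LIMITS = { request: 0.10, agent: 1.00, business: 10.00, platform: 50.00 } as const;

type Tier = keyof typeof LIMITS;

class CostBreaker {
  // Running spend in USD for the three cumulative tiers.
  private spent = { agent: 0, business: 0, platform: 0 };

  // Returns the first tier this call would breach, or null if it may proceed.
  check(estimatedCost: number): Tier | null {
    if (estimatedCost > LIMITS.request) return "request";
    for (const tier of ["agent", "business", "platform"] as const) {
      if (this.spent[tier] + estimatedCost > LIMITS[tier]) return tier;
    }
    return null;
  }

  // Record the actual spend once a call has completed.
  record(cost: number): void {
    this.spent.agent += cost;
    this.spent.business += cost;
    this.spent.platform += cost;
  }

  // Reset a cumulative tier when its period ends (agent run, day, month).
  reset(tier: Exclude<Tier, "request">): void {
    this.spent[tier] = 0;
  }
}
```

The idea is to call `check()` before every AI request and abort the loop as soon as it returns a tier name, then call `record()` with the real cost afterwards.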

Principle 5: Team Memory + Effectiveness Score

Track which prompts work and which fail. Automatically decay low-scoring patterns so the agent gets smarter over time instead of repeating mistakes.
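One possible reading of this, as a minimal sketch: keep a running score per prompt pattern, and periodically shrink the scores of patterns that are already below a threshold until they drop out. All names and constants here (`recordOutcome`, `decayLowScorers`, the 0.9 decay factor, and so on) are assumptions for illustration, not the scheme my project uses verbatim.

```typescript
// Hypothetical in-memory pattern store; field names and constants are assumptions.
type PatternRecord = { score: number; uses: number };

const LEARNING_RATE = 0.3; // assumed weight of the newest outcome
const DECAY_BELOW = 0.5;   // assumed: only patterns under this score decay
const DECAY = 0.9;         // assumed: a decay pass keeps 90% of the score
const PRUNE_BELOW = 0.2;   // assumed: patterns under this score are forgotten

const memory = new Map<string, PatternRecord>();

// Blend the latest outcome (1 = worked, 0 = failed) into a running average.
function recordOutcome(pattern: string, success: boolean): void {
  const rec = memory.get(pattern) ?? { score: 0.5, uses: 0 };
  rec.score = (1 - LEARNING_RATE) * rec.score + LEARNING_RATE * (success ? 1 : 0);
  rec.uses += 1;
  memory.set(pattern, rec);
}

// Periodic pass: shrink low scores, then forget patterns that fell too far.
function decayLowScorers(): void {
  memory.forEach((rec, pattern) => {
    if (rec.score >= DECAY_BELOW) return; // healthy patterns are left alone
    rec.score *= DECAY;
    if (rec.score < PRUNE_BELOW) memory.delete(pattern);
  });
}

// Highest-scoring patterns first, for the agent to prefer.
function bestPatterns(limit: number): string[] {
  return Array.from(memory.entries())
    .sort((a, b) => b[1].score - a[1].score)
    .slice(0, limit)
    .map(([pattern]) => pattern);
}
```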

Principle 6: Checkpoint + Retry + Dead Letter Queue

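// Save intermediate state before moving on to the next step
const { error } = await supabase.from("job_checkpoints").upsert({
  job_id: jobId,
  step: "generate",
  data: generatedContent,
});
// supabase-js returns errors instead of throwing, so check explicitly
if (error) throw new Error(`Checkpoint save failed: ${error.message}`);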

Long AI processes crash. Without checkpoints, you restart from zero.
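The checkpoint covers the first third of this principle. For retry and the dead letter queue, a minimal sketch could look like the following. `runStep`, `deadLetters`, and the backoff values are hypothetical names and defaults, and the in-memory array stands in for a real table or queue.

```typescript
// Hypothetical retry helper with a dead letter queue; names are illustrative.
type DeadLetter = { jobId: string; step: string; error: string; attempts: number };

const deadLetters: DeadLetter[] = []; // stand-in for a real table or queue

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runStep<T>(
  jobId: string,
  step: string,
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T | null> {
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      // Exponential backoff: base, 2x, 4x, ... (no wait after the final attempt).
      if (attempt < maxAttempts) await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  // Every attempt failed: park the job for inspection instead of losing it.
  deadLetters.push({ jobId, step, error: lastError, attempts: maxAttempts });
  return null;
}
```

A `null` result tells the caller to stop the pipeline for that job; the entry in `deadLetters` keeps enough context to replay it from the last checkpoint later.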

Principle 7: Quality Gate (Sentinel + Warden)

Two-layer check before any AI output goes public:

Sentinel: fact verification (hallucination detection)
Warden: quality scoring (output must score above 70% to pass)
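As a sketch of how the two layers might chain together: the gate below takes the Sentinel and Warden as plain functions, since the actual checkers depend on your stack. The type names, `qualityGate`, and the return shape are assumptions for illustration; only the 70% threshold comes from above.

```typescript
// Hypothetical two-layer gate; the checker functions are placeholders you supply.
type GateResult = { publish: boolean; reason: string };

type Sentinel = (text: string) => string[]; // returns claims it could not verify
type Warden = (text: string) => number;     // returns a quality score in [0, 1]

const QUALITY_THRESHOLD = 0.7; // the >70% threshold from the post

function qualityGate(text: string, sentinel: Sentinel, warden: Warden): GateResult {
  // Layer 1: block on any claim the Sentinel could not verify.
  const unverified = sentinel(text);
  if (unverified.length > 0) {
    return { publish: false, reason: `sentinel: ${unverified.length} unverified claim(s)` };
  }
  // Layer 2: block unless the score is strictly above the threshold.
  // Written as a negated comparison so a NaN score is also blocked.
  const score = warden(text);
  if (!(score > QUALITY_THRESHOLD)) {
    return { publish: false, reason: `warden: score ${score.toFixed(2)} is not above ${QUALITY_THRESHOLD}` };
  }
  return { publish: true, reason: "passed both layers" };
}
```

Running the Sentinel first means a hallucinated post is rejected even if it reads well, which is the failure mode I care about most.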

The Scoring System

For each new AI feature, score it on all 7 principles:

6+ ✅ → Ship it
4-5 ✅ → Redesign first
3 or fewer ✅ → Reject or major rework
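The thresholds above are simple enough to encode directly. Here is a small sketch; the principle keys and the `scoreFeature` function are names I made up for illustration.

```typescript
// Hypothetical checklist scorer; the principle keys are illustrative names.
const PRINCIPLES = [
  "auth", "denyByDefault", "tracing", "costBreaker",
  "memory", "checkpointRetry", "qualityGate",
] as const;

type Principle = typeof PRINCIPLES[number];
type Checklist = Record<Principle, boolean>;
type Verdict = "ship" | "redesign" | "reject";

function scoreFeature(checklist: Checklist): { score: number; verdict: Verdict; gaps: Principle[] } {
  // Every principle not satisfied counts as a gap.
  const gaps = PRINCIPLES.filter((p) => !checklist[p]);
  const score = PRINCIPLES.length - gaps.length;
  // 6+ ships, 4-5 needs a redesign, 3 or fewer is rejected or reworked.
  const verdict: Verdict = score >= 6 ? "ship" : score >= 4 ? "redesign" : "reject";
  return { score, verdict, gaps };
}
```

Returning the gaps alongside the verdict makes it easy to see what to fix next, not just whether a feature passed.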

My Current Scores

Feature	Score	Gap
ai-assistant Edge Function	5/7	Memory + Quality Gate missing
competitor-monitoring	3/7	Needs circuit breaker + retry + memory
blog-publish	2/7	Quality gate + circuit breaker critical

Low-scoring features get improved incrementally rather than blocked entirely.

Key Insight

AI tools speed up development dramatically. But invisible defects (key overwrites, hallucination loops, auto-post spam) carry a huge hidden cost.

The 7 principles aren't about perfection. They're a checklist habit that catches the most dangerous failure modes before they hit production.

Building in public: https://my-web-app-b67f4.web.app/