TechBlast - Tech News for Builders and Operators

If you read my earlier post on the Skill Vault pattern, you know I cut Claude Code's per-session overhead by 96%. After living with it for a few weeks, I went looking for what was still eating tokens — and found three more sinks worth killing.

This is the follow-up: smaller wins individually, but together they shaved another ~5,000 tokens off every single session with zero capability lost.

Where the Tokens Were Still Hiding

After the vault, my baseline was around 51K tokens per session. I dug into the system prompt to see what remained:

Skill list (still): ~3K tokens
Project CLAUDE.md: ~2K tokens
claude-mem auto-injected timeline: ~2K tokens
Plugin hook reminders: ~1K tokens (some recurring per turn)

Three of these were either unnecessary or way overweight. Here's how I killed each.

Sink #1: A Bloated Root CLAUDE.md (~1.5K tokens)

CLAUDE.md files auto-load into every session that touches the repo. Mine had grown to 326 lines / 8 KB with quickstart instructions for every component:

Flutter dev commands
Backend npm scripts
React build steps
ML service Python setup
A duplicate gstack skill listing

The problem: I was loading every component's instructions on every turn, even when I was only working in one of them.

Fix: Hierarchical CLAUDE.md Files

Claude Code loads CLAUDE.md hierarchically — only the ones in your current working tree get pulled in. So instead of one fat root file, I split it:

khetisahayak/
├── CLAUDE.md                    # 55 lines — overview, ports, creds only
├── kheti_sahayak_app/CLAUDE.md  # Flutter details
├── frontend/CLAUDE.md           # React details
├── ml/CLAUDE.md                 # ML service details
└── kheti_sahayak_backend/CLAUDE.md  # already existed

The root now contains only:

Project overview (5 lines)
Service ports table
Test credentials
Cross-cutting auth + DB notes
Troubleshooting

Component-specific details only load when I'm actually working in that subdir.

Result: root CLAUDE.md trimmed from 326 → 55 lines (8 KB → 1.6 KB). ~1.5K tokens saved per session.

Sink #2: Plugin SessionStart Hooks Injecting "Helpful" Context

I use claude-mem for persistent memory across sessions. Genuinely useful. But it has a SessionStart hook that auto-injects a timeline of recent observations at the top of every conversation — about 50 entries, ~2K tokens:

S499 Indeed Auto-Apply — User asked how to automate job applications
S498 Indeed MCP Integration Query — clarifying JobSpy vs Apify
2337 12:33p ✅ systemd/install.sh — Backend Services Enabled
... 47 more lines

I almost never needed this auto-recall. When I want past context, I call mem-search explicitly.

Fix: Disable the Auto-Inject, Keep the Memory

The hook config lives at:

~/.claude/plugins/cache/thedotmack/claude-mem/<version>/hooks/hooks.json

The SessionStart array has three hooks. The third is the timeline injection:

{
  "type": "command",
  "command": "... node bun-runner.js worker-service.cjs hook claude-code context ..."
}

I removed just that one entry. The other two SessionStart hooks (install + worker-start) and the recording hooks (PostToolUse, Stop, SessionEnd) stay intact, so memory is still being captured. The MCP search server still works — I just have to ask for it.

# Always back up before editing plugin internals
cp hooks.json hooks.json.bak
# Remove the 3rd SessionStart hook (jq or manual edit)

Result: ~2K tokens saved per session. Memory still works on demand.

⚠️ Caveat: editing a file in ~/.claude/plugins/cache/ will be overwritten on plugin upgrade. For durability, mirror the change in your user-level ~/.claude/settings.json hooks block, or add a small post-upgrade re-patch script.

Sink #3: Round 2 of the Skill Vault (~1.4K tokens)

The original vault was a one-time bulk move. After several weeks of actual usage, I saw which skills I'd installed and never touched. The vault was overdue for a second pass.

Audit script (same as the first post, run again):

for f in ~/.claude/skills/*/SKILL.md; do
  dir=$(dirname "$f")
  name=$(basename "$dir")
  size=$(wc -c < "$f")
  echo "$size $name"
done | sort -rn | head -30

I ended up vaulting 27 more skills out of 73 actively loaded:

8 marketing skills (mkt-content, mkt-seo, mkt-social, etc.) — I do real marketing in a separate context
6 research skills (research, research-deep, research-report, etc.) — episodic, not daily
5 niche tools (obsidian-vault, make-pdf, pair-agent, setup-browser-cookies, open-gstack-browser)
4 design-heavy (design-consultation, design-html, design-shotgun, devex-review) — restored only when designing
3 plan reviews (plan-design-review, plan-devex-review, plan-tune)
1 interview prep (staff-engineer-interview) — used a few times last quarter, not weekly

Bulk move:

for s in mkt-content mkt-email mkt-growth mkt-pr mkt-review mkt-seo mkt-social cmo \
         research research-add-fields research-add-items research-deep research-report edit-article \
         staff-engineer-interview \
         obsidian-vault make-pdf pair-agent setup-browser-cookies open-gstack-browser \
         plan-design-review plan-devex-review plan-tune \
         design-consultation design-html design-shotgun devex-review; do
  mv ~/.claude/skills/$s ~/.claude/skills-vault/ 2>/dev/null
done

The skill-vault index skill from the original post still bridges everything — Claude knows where each skill lives and restores it on demand.

Result: 73 → 46 active skills. ~1.4K tokens saved.

The Combined Result

Fix	Tokens Saved (per session)
Sink #1 — CLAUDE.md trim + per-component split	~1.5K
Sink #2 — claude-mem timeline disabled	~2K
Sink #3 — Round 2 skill vault	~1.4K
Total	~4.9K

On top of the original 96% reduction from the Skill Vault, this is another solid bite. But honestly, the dollar value isn't the point.

Why This Matters Beyond Cost

Every token in your context is a token Claude has to attend over before generating its response. The bigger your prompt, the more diluted attention becomes on the actual task.

Big context ≠ better answers. Frequently it's the opposite. Targeted context wins.

The pattern across all three fixes is the same:

Audit what's auto-loaded vs. what's actually useful. You'll be surprised.
Move episodic content out of the always-on path and into on-demand access.
Trust the model to pull what it needs — when it does need the vaulted skill, the subdir's CLAUDE.md, or the memory search, it will reach for it.

Less ambient noise. Sharper signal.

Tips for Your Own Audit

Inspect, don't guess. wc -c your CLAUDE.md files. ls ~/.claude/skills/. Read your plugin hooks.json files line by line.
Hierarchy is free. Per-directory CLAUDE.md files cost nothing when you're not in that directory.
Plugin hooks are debt. Every SessionStart or UserPromptSubmit hook is a tax. Audit them by hand. Some are essential (auth, telemetry); some inject "context" that's just clutter.
Re-vault every few weeks. Usage patterns shift. Skills that were daily three months ago may be quarterly now. Yesterday's must-have is tomorrow's vault candidate.
Watch for per-turn taxes. A UserPromptSubmit hook costs N tokens every single turn. Even a small reminder block adds up fast in a long session.

Conclusion

The Skill Vault pattern is still the heaviest hitter. These three follow-ups are the long tail:

CLAUDE.md → split per component, slim the root.
Plugin hooks → audit auto-injected context. Disable what you don't need.
Skill vault → revisit it. Vault more.

Together: another ~5K tokens per session, zero capability lost.

When your context is tight, your model is sharp.

I'm Prakash Ponali, a Staff Engineer with 16+ years in enterprise eCommerce. Currently building Khetisahayak — a farming helper app for Telugu-speaking farmers in Andhra Pradesh. Find me on LinkedIn.