The LLM Dependency Test: A New Way to Interview Software Engineers in the Age of AI
March 22, 2026

Originally published by Dev.to

Tags: ai, career, security, productivity

The Pentagon recently discovered that it could not comply with its own Secretary of Defense's direct order to remove an AI tool from its weapons targeting system. Not because the order was classified. Not because of a bureaucratic delay. Because the targeting workflows were so deeply embedded in that single commercial AI that the military β€” with a $900 billion annual budget and the entire US defense industrial base behind it β€” literally could not finish the job without it.

The same week, Pentagon staff resorted to Microsoft Excel to handle tasks previously managed by the AI.

This is not a story about the Pentagon. This is a story about every software team that has quietly built itself into the same trap β€” just at a smaller scale and with lower stakes.

The Problem Nobody Is Naming

There is a growing and largely unacknowledged skill crisis forming underneath the surface of AI-assisted software development.

A generation of engineers is learning to build with AI as a first-class team member. They are shipping features faster, writing tests more confidently, navigating unfamiliar codebases with ease. By every observable metric, they are more productive than engineers who came before them.

But strip away the AI β€” network outage, service disruption, vendor dispute, policy change β€” and a disturbing number of them cannot finish what they started.

The problem is not that they use AI. The problem is that the AI has become load-bearing infrastructure in their cognitive workflow. The understanding of what is being built, the reasoning behind architectural decisions, the ability to close the last 10% of a project under pressure β€” all of it has migrated into the chat window.

When the chat window goes dark, so does the team.

The Horror Story Is Real

On March 17, 2026, Claude went down for roughly five hours. Over 6,800 users reported problems. Developers working in Claude Code described it as a "snow day." They were mid-project. They stopped. (Source: 6,800 users report Claude AI down in major outage today, Rolling Out, March 17, 2026)

That is the benign version of the story. A team misses a deadline. A deployment slips. A demo gets rescheduled.

The catastrophic version is Palantir's Maven Smart System — a billion-dollar defense platform for intelligence analysis and weapons targeting — built so thoroughly on Claude Code prompts and workflows that recertifying it with a replacement model will take twelve to eighteen months, according to defense contractors. Meanwhile, the military is using it anyway, in an active conflict, in defiance of its own Secretary's order, because there is no alternative ready.

"Removing Claude will be a major undertaking. For example, Palantir's Maven Smart Systems β€” a software platform that supplies militaries with intelligence analysis and weapons targeting β€” uses multiple prompts and workflows that were built using Anthropic's Claude Code... Palantir will have to replace Claude with another AI model and rebuild parts of its software."

β€” Reuters / Military Times, Hegseth wants Pentagon to dump Claude, but military users say it's not so easy, March 19, 2026

"Tasks previously handled by Claude, such as querying large datasets for information, are in some cases now being done manually with tools such as Microsoft Excel."

β€” Reuters / U.S. News, Hegseth Wants Pentagon to Dump Anthropic's Claude, but Military Users Say It's Not So Easy, March 19, 2026

"An internal Pentagon memo said use of Anthropic's tools may continue beyond the six-month period if deemed 'mission-critical' with no viable alternative."

β€” CNBC, Palantir is still using Anthropic's Claude as Pentagon blacklist plays out, March 12, 2026

The underlying engineering failure is identical in both cases. A single external dependency became load-bearing. No fallback was built. The humans forgot how to execute without the tool.

Introducing the LLM Dependency Test

What if we could identify this problem before we hire β€” or before we deploy β€” rather than discovering it at the worst possible moment?

Here is a proposed interview format that directly measures the skill that actually matters:

Phase 1 β€” AI-Assisted Development

The candidate begins working on a novel software project with full access to their preferred LLM assistant. The project is unique to each candidate and each session. The AI helps them build. The candidate directs, reviews, and integrates the output. This phase continues for a set window of time β€” say, sixty to ninety minutes.

Phase 2 β€” The Cutoff

At a moment chosen at random within Phase 1, the AI is cut. No warning. No graceful transition. The service simply becomes unavailable, exactly as it would in a real outage.

Phase 3 β€” The Finish

The candidate must complete the remaining work without any LLM assistance. They have access to documentation, Stack Overflow, their own notes β€” everything a working engineer would have. Just not the AI.
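The cutoff in Phase 2 is easy to stage technically: route the candidate's LLM traffic through a thin gate that forwards requests until a randomly chosen moment, then returns the same 503 a real outage would. The sketch below is one possible shape; `CutoffProxy`, `forward_request`, and the injectable clock are illustrative names of my own, not part of any real interview tooling:

```python
import random
import time

class CutoffProxy:
    """Gate a candidate's LLM access: forward requests until a randomly
    chosen cutoff inside Phase 1, then mimic a real outage (HTTP 503)."""

    def __init__(self, phase_seconds, rng=None, clock=time.monotonic):
        self.clock = clock
        rng = rng or random.Random()
        # Pick the cutoff uniformly at random within the phase, so the
        # candidate cannot pace their work toward a known deadline.
        self.cutoff = self.clock() + rng.uniform(0, phase_seconds)

    def handle(self, forward_request):
        """forward_request is a callable that performs the real LLM call."""
        if self.clock() >= self.cutoff:
            # No warning, no graceful degradation: the service is just gone.
            return {"status": 503, "body": "Service Unavailable"}
        return {"status": 200, "body": forward_request()}
```

In a real setup this logic would sit inside whatever HTTP proxy the interview environment already uses; the only requirements are that the cutoff is random, silent, and indistinguishable from a genuine outage.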

The Evaluation

The test measures two things simultaneously:

First, what did the candidate do before the cutoff? Did they write clear comments? Did they commit incrementally? Did they ask the AI clarifying questions that forced explicit specification? Or did they passively accept generated output without building their own understanding of it? Their behavior during the AI-assisted phase reveals their architecture instincts.

Second, what do they do after the cutoff? Do they panic or shift gears? Can they read the AI-generated code they were steering and continue it coherently? Can they close the gap between where they are and a working deliverable?

A candidate who passes is not just good at using AI. They are good at engineering. The AI made them faster. Their fundamentals make them resilient.
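The two axes above can be captured in a simple scoring sketch. Everything here is hypothetical: the criteria names and the pass threshold are placeholders an interviewing team would tune to their own rubric, not a validated instrument:

```python
from dataclasses import dataclass

@dataclass
class DependencyTestScore:
    # Phase 1: observable habits while the AI was still available
    committed_incrementally: bool
    wrote_clear_comments: bool
    asked_clarifying_prompts: bool
    # Phase 3: what happened after the cutoff
    read_generated_code_fluently: bool
    shipped_working_deliverable: bool

    def pre_cutoff_score(self) -> int:
        """Count of good-practice signals observed before the cutoff."""
        return sum([self.committed_incrementally,
                    self.wrote_clear_comments,
                    self.asked_clarifying_prompts])

    def passed(self) -> bool:
        # Both axes matter: sound habits while assisted, AND the
        # ability to finish without assistance afterward.
        return self.pre_cutoff_score() >= 2 and self.shipped_working_deliverable
```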

What the Test Is Really Measuring

This test does not measure raw coding speed. It does not measure prompt engineering skill. It does not measure whether a candidate has memorized syntax or API signatures.

It measures mental model quality.

If a candidate genuinely understood the project as it was being built β€” if they were directing the AI rather than following it β€” then the cutoff is an inconvenience. They know what remains. They know why each piece exists. They can continue.

If the candidate was watching the AI generate and clicking accept, the cutoff is a wall. They have working code they do not understand, a half-finished project with undocumented reasoning, and no map forward.

The test surfaces the difference in about fifteen minutes.

The Architectural Principle Behind the Test

The insight the test is built on is simple: the dependency on AI is not the problem. The architecture of that dependency is.

A surgeon using a robotic system is not helpless when the system malfunctions β€” because their manual surgical skills are maintained. The robot made them more precise, not more dependent. Their training preserved the fallback.

An engineer who builds with AI as acceleration on top of solid fundamentals is not helpless when the AI goes down. The AI made them faster, not dependent. Their fundamentals preserved the fallback.

The test identifies which kind of engineer you are hiring. Not by asking. By showing.

Why This Matters More in Security Engineering

For security engineers specifically, the stakes of AI dependency are compounded.

AI coding agents introduced into security workflows — for code review, vulnerability scanning, threat modeling — generate working outputs that can appear correct while containing subtle flaws. The DryRun Security study from March 2026 found that Claude Code, OpenAI Codex, and Google Gemini all introduced broken access control, OAuth implementation failures, and business logic vulnerabilities into every application they were tasked with building from scratch.
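To make that failure mode concrete, here is the shape of a broken-access-control bug in deliberately minimal form. The endpoint, data, and names are hypothetical illustrations, not code from the DryRun report. The vulnerable version runs and returns real data, which is exactly why a reviewer who cannot read code will approve it:

```python
# Hypothetical in-memory "database" of invoices keyed by id.
DB = {
    1: {"owner": "alice", "total": 120},
    2: {"owner": "bob",   "total": 45},
}

def get_invoice_vulnerable(current_user, invoice_id):
    # Looks correct and returns real data, but any authenticated user
    # can read any invoice: a classic insecure direct object reference.
    return DB.get(invoice_id)

def get_invoice_fixed(current_user, invoice_id):
    invoice = DB.get(invoice_id)
    # The missing piece: object-level authorization, checking that the
    # requester actually owns the record they are asking for.
    if invoice is None or invoice["owner"] != current_user:
        return None
    return invoice
```

Both functions pass a superficial "does it work?" check. Only a reviewer who reads the code and asks "who is allowed to call this, with which ids?" catches the difference.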

A security engineer who cannot independently audit AI-generated code is not a security engineer. They are a human rubber stamp on an AI output pipeline.

The LLM Dependency Test applied to a security engineering candidate would reveal immediately whether they can actually read and reason about code β€” or whether they can only steer an AI that reads and reasons for them.

In security, that distinction is the difference between a defended system and a breach waiting to happen.

The Second-Order Effect

Here is the part of this proposal I find most interesting.

The moment a test like this exists and becomes known in the industry, it changes how candidates prepare. Engineers who know they will face a mid-project AI cutoff in their interviews cannot afford to let their fundamentals atrophy. They have to actually build their skills without the AI, not just alongside it.

The test does not just filter for the right candidates. It shapes the behavior of the candidate pool before anyone sits down to take it.

Most interview formats test for skills that candidates develop in order to pass the interview. This test forces candidates to develop the skill that protects them β€” and the teams that hire them β€” for the rest of their careers.

A Note on What This Is Not

This is not an argument against using AI tools. Engineers who use AI assistants well are genuinely more productive. The data is clear on that.

This is an argument for using AI tools in the right architectural relationship β€” as acceleration on top of maintained human capability, not as a replacement for it.

The Pentagon did not fail because it used AI. It failed because it forgot to remain capable without it. The distinction is everything.

The Challenge to the Industry

If you are running engineering interviews in 2026, consider adding a version of this test to your process. The implementation details are yours to design β€” the cutoff timing, the project scope, the evaluation rubric. But the core structure is sound.

If you are preparing for engineering interviews, consider what it would mean to face this test unprepared. Then build accordingly.

And if you are an engineering manager who has watched your team grind to a halt every time a major AI service goes down — you already know what this test is measuring. The question is whether you hire for it before the next outage, or discover the gap during one.

This post is based on a conversation about AI-assisted software engineering, the March 2026 Anthropic-Pentagon dispute, and what the engineering profession is not yet asking about AI dependency. The LLM Dependency Test concept was proposed by Tanveer Salim.

Discussion prompts:

  • Have you experienced an AI outage mid-project? What happened?
  • Would you pass the LLM Dependency Test today?
  • Should this become a standard part of engineering interviews?
