Threat Modeling on brain overflow

AI-Native Threat Modeling

Wed, 20 May 2026 11:29:02 -0700

When I ask hiring managers why they’re opening a product security role, the answer is usually the same: we can’t keep up. Development org grew, product surface expanded, and the security team is the bottleneck. It’s not a problem unique to any one organization — it’s the default state of product security. AI-accelerated development and vibe coding are making it worse: more code, shipped faster, with the same security team trying to keep up. The conventional wisdom is that vibe coding is a killer for AppSec — and on the current trajectory, it is.

In this post, I argue that linear scaling won’t solve that problem, and make the case that AI-generated code, treated the right way, can be a force multiplier for security.

1. The AppSec Scaling Problem

The 1:100 ratio — one AppSec engineer for every hundred developers — is the number the industry has quietly accepted as roughly accurate for mature organizations. It sounds manageable until you sit with what it means in practice: a team of five reviewing the output of five hundred, under sprint pressure, across a surface that keeps growing. It’s a demanding job — I wrote about what it actually takes.

The standard response is to hire more security engineers. That’s reasonable when the ratio is temporarily out of balance, but it doesn’t address the structural problem. If the development org doubles and the security team grows from five to ten, you’re at the same ratio. And the ratio assumes a roughly stable development velocity. AI coding assistants are shattering that assumption.

Developers using GitHub Copilot, Cursor, or Claude Code ship more, faster. Vibe coding — letting the model write code from a high-level natural language prompt — compresses timelines further. Features that took two weeks take days. The code surface is expanding at a rate that’s no longer proportional to engineering headcount, which means the AppSec scaling problem is now a two-sided function: development velocity increasing, security team capacity roughly flat. The gap is structural, and it is getting wider.
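The arithmetic behind this can be made concrete with a small sketch. The function name and the velocity multiplier are illustrative assumptions, not measured figures; the only numbers taken from the text are the 1:100 ratio and the five-to-ten hiring example.

```python
def review_load_per_engineer(developers: int, appsec_engineers: int,
                             velocity_multiplier: float = 1.0) -> float:
    """Developer-equivalents of output each AppSec engineer must cover.

    velocity_multiplier is a hypothetical factor for how much more code
    each developer ships with AI assistance (1.0 means no change).
    """
    if appsec_engineers <= 0:
        raise ValueError("need at least one AppSec engineer")
    return developers * velocity_multiplier / appsec_engineers


# Doubling both teams leaves the ratio unchanged.
before = review_load_per_engineer(500, 5)
after_hiring = review_load_per_engineer(1000, 10)

# Same headcount, but developers ship (hypothetically) 3x more code.
with_ai = review_load_per_engineer(500, 5, velocity_multiplier=3.0)

print(before, after_hiring, with_ai)  # 100.0 100.0 300.0
```

Hiring keeps the first two numbers equal; only the multiplier moves the third, which is why headcount alone cannot close the gap.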

2. Where Traditional Approaches Break Down

The vocabulary for addressing the AppSec scaling problem is well developed: shift-left, secure-by-design, developer enablement, creating paved roads. They’re not wrong ideas. The problem is that they all require the same scarce resource: AppSec time.

Threat modeling — the recommended practice for high-risk features — is the clearest example. The canonical process: the development team writes a design document; the security team (or a joint session) works through the STRIDE framework or similar, maps data flows and trust boundaries, produces a model; there’s back-and-forth and eventual sign-off. This is genuinely valuable when it happens. In practice, it often doesn’t — the process is time-consuming, and AppSec time is scarce.

What actually happens is one of three failure modes:

Delay — security reviews become release blockers, friction accumulates, relationships with engineering teams deteriorate.
Risk-accept — features ship with “accepted risk” security exceptions that go into a backlog and are rarely revisited.
No review at all — code ships without security involvement, entire product areas built and deployed without the security team ever being in the loop.

With AI now compressing time-to-exploitation — public vulnerabilities can have working proof-of-concept code within hours — the third option is no longer a viable gamble.

Security code reviews have the same structural problem one step later: someone writes code, another team reads it, back-and-forth, sign-off. Every handoff is a scheduling dependency that adds release latency.

3. The Threat Model Maintenance Problem

There’s a second-order problem with threat modeling that gets less attention than the initial production cost: drift.

A threat model is created as a snapshot, but the system keeps evolving: new endpoints are added, authentication flows are refactored. Six months after a threat model is signed off, it describes a system that no longer looks the same. The question of who owns maintenance is usually a gray area: the development team didn’t write the model and isn’t trained to maintain it; the security team is not aware of changes and has to context-switch back into a system they last looked at months ago. Neither path works well in practice.

Most organizations treat the threat model as a gate the security team required at feature launch — it was produced, the box was checked, and maintenance was never part of the contract. It documents what the system looked like at one point in time and then quietly expires.
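Drift of this kind is mechanically detectable once the model is stored as data. The following is a minimal sketch, assuming the threat model records the endpoints it covered at sign-off; the class, function, and endpoint names are hypothetical and only illustrate the comparison.

```python
from dataclasses import dataclass, field


@dataclass
class ThreatModelSnapshot:
    """What the threat model believed about the system at sign-off."""
    signed_off_commit: str
    endpoints: set[str] = field(default_factory=set)


def detect_drift(snapshot: ThreatModelSnapshot,
                 current_endpoints: set[str]) -> dict[str, set[str]]:
    """Compare modeled endpoints with the ones the system exposes today."""
    return {
        # Added to the system since sign-off, never modeled.
        "unmodeled": current_endpoints - snapshot.endpoints,
        # Still in the model, but no longer in the system.
        "stale": snapshot.endpoints - current_endpoints,
    }


snapshot = ThreatModelSnapshot(
    signed_off_commit="abc123",  # placeholder commit id
    endpoints={"POST /login", "GET /profile"},
)
current = {"POST /login", "GET /profile", "POST /password-reset"}

drift = detect_drift(snapshot, current)
print(drift["unmodeled"])  # {'POST /password-reset'}
print(drift["stale"])      # set()
```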

4. The Key Insight

Here’s where the mental model needs to shift.

In the current workflow, threat modeling is derivative work: a security person reads what a developer built and reconstructs the security-relevant picture from it — after the fact, inherently lossy, potentially inaccurate, and always one step behind.

Open-source projects such as Tachi and several commercial offerings recognize this and offer tools that automate the reconstruction: read the codebase, analyze diffs, apply a methodology, output a structured model. These tools are useful, but they’re still doing the same derivative work, just faster — reverse-engineering security structure from existing code rather than having a human do it. There’s also a cost dimension: analyzing an existing codebase means feeding it back through an LLM as new input, which is expensive at scale. The larger and more frequently updated the codebase, the higher the token cost of each analysis pass.

Now consider what changes when AI is writing the code — through vibe coding, spec-driven development, AI-generated scaffolding from a design document, or an agentic coding loop that implements a full feature end-to-end.

It doesn’t reverse-engineer anything — it knows, because it built it: every data flow it designed, every entry point it created, every asset it touched, every trust boundary it crossed or established, every authentication decision it made. The complete map required for a threat model exists as a natural byproduct of the design work the AI just did — and it exists at the moment of creation, not after. And because that context is already in the model’s working window, generating the threat model alongside the code is parallel effort on the same inputs, with little additional token cost.

The consequence of this observation is straightforward: threat models should be generated alongside code, as first-class artifacts, not assembled later as derivative documents.

gantt
    title 1. Current — human-driven, sequential
    dateFormat YYYY-MM-DD
    axisFormat %d
    section Developer
    Design doc            :a1, 2024-01-01, 3d
    Write code            :a2, 2024-01-04, 3d
    section Reconstruct (Security)
    Reconstruct & model   :a3, 2024-01-07, 3d
    section Review (Security)
    Review & sign-off     :a4, 2024-01-10, 2d

gantt
    title 2. AI-assisted — LLM writes code, LLM reads code
    dateFormat YYYY-MM-DD
    axisFormat %d
    section Developer
    Generate code         :b1, 2024-01-01, 3d
    section Reconstruct (AI-assisted)
    LLM reconstructs TM   :b2, 2024-01-04, 2d
    section Review (Security)
    Review & sign-off     :b3, 2024-01-06, 2d
    section Time saved
    time saved            :done, 2024-01-08, 4d

gantt
    title 3. AI-native — code and threat model in parallel
    dateFormat YYYY-MM-DD
    axisFormat %d
    section Code
    Generate code         :c1, 2024-01-01, 3d
    section Threat Model
    Generate threat model :c2, 2024-01-01, 3d
    section Review (Security)
    Review & sign-off     :c3, 2024-01-04, 2d
    section Time saved
    time saved            :done, 2024-01-06, 6d

Accuracy improves — the model is a direct output from the entity that designed the system, not a reconstruction. Maintenance improves because every code change can regenerate or update it in the same operation; the entity making the change already knows what changed and why. The multi-step, multi-team back-and-forth collapses into a single step. Security practitioners remain in the loop — for methodology, formal sign-off, challenging assumptions the AI didn’t surface — but the labor-intensive baseline work of constructing the model moves from a human bottleneck to an automatic output.
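One way to picture "regenerate or update it in the same operation" is to treat each change from the coding assistant as carrying its own threat model delta. This is a sketch under that assumption; the `ThreatModel` and `Change` shapes are hypothetical and far smaller than a real model would be.

```python
from dataclasses import dataclass, field


@dataclass
class ThreatModel:
    """Minimal structured threat model kept next to the code."""
    entry_points: set[str] = field(default_factory=set)
    assets: set[str] = field(default_factory=set)
    trust_boundaries: set[str] = field(default_factory=set)
    revision: int = 0


@dataclass
class Change:
    """One unit of work from the coding assistant: code plus model delta."""
    description: str
    new_entry_points: set[str] = field(default_factory=set)
    new_assets: set[str] = field(default_factory=set)
    new_trust_boundaries: set[str] = field(default_factory=set)


def apply_change(model: ThreatModel, change: Change) -> ThreatModel:
    """Return the model updated in the same operation as the code change."""
    return ThreatModel(
        entry_points=model.entry_points | change.new_entry_points,
        assets=model.assets | change.new_assets,
        trust_boundaries=model.trust_boundaries | change.new_trust_boundaries,
        revision=model.revision + 1,
    )


model = ThreatModel()
model = apply_change(model, Change(
    description="add password reset",
    new_entry_points={"POST /password-reset"},
    new_assets={"reset tokens"},
    new_trust_boundaries={"internet -> API"},
))
print(model.revision, sorted(model.entry_points))  # 1 ['POST /password-reset']
```

The reviewer then sees the code diff and the model diff together, instead of reconstructing the second from the first.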

This is what “shift-left” should actually mean: not having the security team review earlier, but producing the security model at the same moment the system is designed. The security artifact is contemporaneous with the code, not chasing it.

5. On Model Bias in Security Analysis

A legitimate concern about this approach is AI model bias. There’s a well-documented pattern in AI-assisted security review: when a model writes code and is then asked to evaluate it for security in the same context window, it tends to anchor to its own design decisions, finding reasons why its choices are sound rather than challenging them. An independent reviewer operating from a fresh context — a second model, or a human who didn’t write the code — is more likely to surface issues the original author missed. This is a real limitation, and it applies directly to using AI for code security review.

The core distinction here is that code security review and threat modeling are quite different. A security review asks the model to evaluate whether its own implementation is correct and secure — the question where anchoring bites hardest, because the model is judging choices it already committed to. A threat model asks something structurally different: document the architecture, establish trust boundaries, map data flows and assets, then apply a framework like STRIDE that poses a fixed set of questions across threat categories. The framework is external to the code; its questions don’t change based on how well or poorly the implementation is written. The question it asks — given what this system does, what can go wrong in each of these categories? — is answered from the architectural map, not from a judgment about implementation quality.
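The "fixed set of questions" property is easy to see in code. The sketch below poses each STRIDE category against each data flow that crosses a trust boundary. Restricting the questions to boundary-crossing flows is a simplifying assumption, and the `DataFlow` type and flow names are hypothetical.

```python
from dataclasses import dataclass
from itertools import product

STRIDE = (
    "Spoofing",
    "Tampering",
    "Repudiation",
    "Information disclosure",
    "Denial of service",
    "Elevation of privilege",
)


@dataclass(frozen=True)
class DataFlow:
    source: str
    destination: str
    crosses_trust_boundary: bool


def stride_questions(flows: list[DataFlow]) -> list[str]:
    """Pose every STRIDE category against every boundary-crossing flow."""
    crossing = [f for f in flows if f.crosses_trust_boundary]
    return [
        f"{category}: what can go wrong on {f.source} -> {f.destination}?"
        for f, category in product(crossing, STRIDE)
    ]


flows = [
    DataFlow("browser", "API", crosses_trust_boundary=True),
    DataFlow("API", "internal cache", crosses_trust_boundary=False),
]
questions = stride_questions(flows)
print(len(questions))  # 6
print(questions[0])    # Spoofing: what can go wrong on browser -> API?
```

Nothing in `stride_questions` looks at implementation quality: the question list depends only on the architectural map.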

What bias could still affect is the model’s assessment of severity — an AI that made a particular design trade-off might rate the resulting risk lower than an independent reviewer would. That’s a real concern, and it’s exactly why human review of the model’s outputs and assumptions is still valuable in this workflow.

6. Why Threat Modeling Still Matters

A reasonable objection at this point: if AI writes the code, why not just ask it to write secure code and skip the threat model entirely? We should absolutely ask for that — but threat modeling serves purposes that “write secure code” doesn’t address.

Security architecture documentation. Threat models capture architectural decisions and their security implications: trust boundaries, data classifications, what the system assumes about its environment, where the blast radius of a failure ends. These don’t live in code. A system can be implemented correctly while making architectural trade-offs that accept certain risks; those trade-offs need to be explicit, owned, and findable.

Known gaps and accepted risks. Every system ships with tradeoffs — incomplete defenses, deferred work, risks that were evaluated and accepted. A threat model makes these explicit: here is what we considered, here is what we’re not defending against, and here is why. This matters for accountability, for prioritization, and for the engineer who joins the team six months from now.

Compensating controls. Good security architecture is layered. WAF rules, rate limiting, network segmentation, monitoring and alerting — these don’t live in application code, but they’re part of the security posture. The threat model is where they’re connected to the threats they compensate for. This is also where code-analysis-based automated tools tend to generate false positives: they see the change in isolation, unaware of the external controls that already mitigate a given risk.

Compliance requirements. SOC 2, PCI-DSS, ISO 27001, HIPAA, and similar frameworks require documented evidence of threat analysis. Auditors want artifacts. A threat model that exists and is demonstrably current — generated from the same codebase it describes — is a far stronger compliance artifact than one that was carefully written at launch and hasn’t been touched since.

Incident response preparation. When something goes wrong — and eventually something does — a current threat model tells you what’s at risk, what attacker paths exist, and what to prioritize. You want this analysis done before the incident, not during it.

Stakeholder communication. Engineering leadership, legal, product, and board-level security committees need to understand risk in terms they can act on. The codebase doesn’t serve this purpose; a structured threat model does.

The case for threat modeling doesn’t weaken when AI writes the code — if anything, AI makes the security artifacts cheaper to produce, easier to keep current, and more consistently complete than the human-driven alternative.
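The compensating-controls point above can be sketched as a lookup that the threat model makes possible. This is a minimal illustration, assuming the model records which external controls mitigate which threat on which component; the mapping, threat names, and component names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    threat: str
    component: str


# Hypothetical mapping recorded in the threat model:
# (threat, component) -> external controls that mitigate it.
COMPENSATING_CONTROLS: dict[tuple[str, str], list[str]] = {
    ("brute force", "login API"): ["rate limiting", "alerting on failed logins"],
}


def triage(
    findings: list[Finding],
) -> tuple[list[Finding], list[tuple[Finding, list[str]]]]:
    """Split findings into unmitigated ones and ones covered by controls."""
    open_findings: list[Finding] = []
    mitigated: list[tuple[Finding, list[str]]] = []
    for finding in findings:
        controls = COMPENSATING_CONTROLS.get((finding.threat, finding.component))
        if controls:
            mitigated.append((finding, controls))
        else:
            open_findings.append(finding)
    return open_findings, mitigated


findings = [
    Finding("brute force", "login API"),
    Finding("SQL injection", "search API"),
]
open_findings, mitigated = triage(findings)
print([f.threat for f in open_findings])  # ['SQL injection']
```

A tool that only sees the code change would report both findings; the recorded controls are what let the first one be marked as already mitigated.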

Final thoughts

I think this is the direction the AI coding toolchain is already moving toward, even if the full vision hasn’t arrived yet. AI coding tools are increasingly integrating security into the development workflow: GitHub Copilot’s real-time vulnerability detection during code generation, Claude Code’s security analysis during code review, Replit’s Security Agent in the development environment. None of these offer AI-native threat model generation, but they signal that the industry is treating security as something the coding tool prioritizes and produces alongside code. The extension of that to living, maintained threat models is the logical next step.

The reframe for ProdSec practitioners is this: stop thinking of threat modeling as a process your team performs on code that developers write. Start thinking of it as an artifact the AI coding assistant produces alongside the code, which your team validates, challenges, and signs off on. The security team’s job shifts from construction to judgment — which is where human expertise actually compounds.

The dreaded 1:100 ratio won’t disappear. But the work of constructing and maintaining the threat model doesn’t have to stay a human-hours problem. The needle can move — but only if the security team’s role evolves with it.

TrustFall: The Perimeter Problem in Agentic Tools

Mon, 18 May 2026 10:00:00 -0700

On May 7, 2026, Adversa AI published TrustFall, a one-click remote code execution vulnerability in Claude Code, with variants across Gemini CLI, Cursor, and GitHub Copilot: clone a repository, open it, click “Yes, I trust this folder”, and an attacker-controlled process runs with your full OS privileges.

Anthropic declined the finding, describing it as outside their threat model and the behavior as functioning as designed. This post digs into that response and argues that the core issue is architectural: a perimeter security model that can’t carry the weight placed on it, and that makes the vulnerability structurally hard to surface through threat modeling. It then looks at what a zero trust alternative would look like for agentic tools.

1. The vulnerability

Project-level config files — .mcp.json and .claude/settings.json — committed to a repository can activate attacker-controlled MCP servers. When a developer clones the repo and clicks through the trust dialog, an unsandboxed Node.js process spawns with full user OS privileges — no further prompts. Three developer actions: clone, open, click. The attack also has a zero-click CI/CD variant where Claude Code runs headless in GitHub Actions against an untrusted pull request branch, bypassing the trust dialog entirely.

For the full technical details see TrustFall.

In threat modeling terms, the mechanism is a trust boundary problem. Project-scope config — committed to a repository — can activate MCP servers: external processes that run with full user OS privileges. The gate between untrusted repo content and those privileges is a single user prompt:

flowchart TB
    subgraph Untrusted["Untrusted · attacker-controlled"]
        PCfg["Project config\n(committed to the repo)"]
    end

    CLI["Claude Code CLI"]

    subgraph Gate["Trust gate"]
        Prompt["'Do you trust this folder?'\nsingle Yes/No"]
    end

    subgraph Privileged["Privileged · full user OS access"]
        MCP["MCP Server process\nNode.js · no sandbox"]
        OS["~/ · ~/.ssh · ~/.aws\nread / write / exec — unrestricted"]
    end

    PCfg --> CLI
    CLI --> Prompt
    Prompt -- "on 'Yes'" --> MCP

That single Yes/No covers four distinct capability grants:

Claude reading and editing project files (clearly implied)
Claude following project-level behavioral settings (reasonable)
Activating MCP servers defined in project config (not stated)
Those servers running as unsandboxed processes with full user privileges (definitely not stated)

Two of these carry significant security consequences — and neither appears in the prompt language. The gap between what the user consents to and what the system delivers is a textbook Elevation of Privilege (the E in STRIDE methodology): the subject grants more than they know.

The pattern holds across the tools TrustFall examined — Gemini CLI and Cursor do mention MCP servers in their consent language, Claude Code and Copilot don’t, but all four default to Yes or Trust.
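As a defensive illustration of the mechanism in this section, a developer could check a fresh clone for project-scope config before opening it in an agentic tool. The sketch below is not an official tool and makes assumptions: the file paths and setting names come from this post, but where each setting sits inside the JSON is assumed, and the function name is hypothetical.

```python
import json
from pathlib import Path

# Setting names taken from the post; their top-level placement is assumed.
RISKY_SETTINGS_KEYS = ("enableAllProjectMcpServers", "enabledMcpjsonServers")


def audit_repo(repo_root: str) -> list[str]:
    """Return human-readable warnings for project-scope config in a clone."""
    root = Path(repo_root)
    warnings: list[str] = []

    if (root / ".mcp.json").is_file():
        warnings.append(".mcp.json: project-level MCP server config present")

    settings_file = root / ".claude" / "settings.json"
    if settings_file.is_file():
        try:
            settings = json.loads(settings_file.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            warnings.append(".claude/settings.json: invalid JSON, review manually")
            return warnings
        if not isinstance(settings, dict):
            warnings.append(".claude/settings.json: unexpected shape, review manually")
            return warnings
        for key in RISKY_SETTINGS_KEYS:
            if settings.get(key):
                warnings.append(f".claude/settings.json: {key} is set")
        permissions = settings.get("permissions")
        if isinstance(permissions, dict) and permissions.get("allow"):
            warnings.append(".claude/settings.json: permissions.allow is set")

    return warnings


if __name__ == "__main__":
    for warning in audit_repo("."):
        print(warning)
```

A check like this only narrows the gap from the user's side; it does not change what the trust prompt grants.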

2. “Outside our threat model”

Anthropic’s phrase is worth examining literally. Per TrustFall’s analysis, the missing enforcement isn’t outside Anthropic’s defined boundary — it’s inside it. The trust dialog is the perimeter; what happens after it is, by their own framing, the trusted zone. “Outside our threat model” means, in practice: inside our perimeter, but below the granularity we protect at.

That granularity isn’t uniformly coarse — the capability is demonstrably known. bypassPermissions gets a dedicated warning because it is dangerous; enableAllProjectMcpServers, enabledMcpjsonServers, and permissions.allow activate equally dangerous behavior without equivalent disclosure. TrustFall also notes that earlier versions of Claude Code included an explicit MCP consent prompt that was later removed. These are the tell-tale signs of a threat model that’s coarser than the reality it represents — some dangerous capabilities are visible enough to gate explicitly, others slip through the same boundary unexamined.

That pattern of selective gating is further undermined by the CVE record. Anthropic’s response to TrustFall was that the behavior functions as designed: clicking “trust this folder” means accepting the project’s configuration, MCP servers included. Yet in the months before TrustFall, Anthropic patched three related vulnerabilities: it delayed MCP activation until after the trust dialog (CVE-2025-59536, Oct 2025), blocked ANTHROPIC_BASE_URL from project scope (CVE-2026-21852, Jan 2026), and blocked bypassPermissions from project scope (CVE-2026-33068, Mar 2026), the same setting that already carried a UI warning. Each fix adds a specific gate or blocklist entry, which is the signature of a perimeter being hardened incrementally, one dangerous capability at a time, without a unifying policy. If the trust dialog truly constituted full consent by design, there would be nothing to patch.

3. From perimeter to zero trust

The “functions as designed” response places the burden on the developer: audit what you clone. That position rests on a perimeter security architecture — verify once at the gate, trust everything inside:

flowchart LR
    A1["Repo"] -->|"✓ trust gate"| A2["CLI"] --> A3["MCP"] --> A4["OS"]

The perimeter pattern is one modern security has largely moved past. The alternative is zero trust: identity propagated through the capability chain and evaluated at each grant. Git provides the primitives: clone origin, remote URL, commit author. Conceptually, it would look like this:

flowchart LR
    B1["Repo\norigin · author"] -->|"✓ id check"| B2["CLI"] -->|"✓ capability?"| B3["MCP"] -->|"✓ scope?"| B4["OS"]
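The chain in the diagram can also be sketched in code: each capability is decided separately, using identity signals from Git, instead of being bundled into one Yes/No. Everything here is hypothetical, including the capability names, the allowlists, and the policy itself; it illustrates per-grant evaluation and is not how any of the tools discussed behave.

```python
from dataclasses import dataclass
from enum import Enum


class Capability(Enum):
    READ_PROJECT_FILES = "read_project_files"
    APPLY_PROJECT_SETTINGS = "apply_project_settings"
    START_MCP_SERVER = "start_mcp_server"
    UNSANDBOXED_OS_ACCESS = "unsandboxed_os_access"


@dataclass(frozen=True)
class RepoIdentity:
    """Identity signals Git already provides (values are examples)."""
    remote_url: str
    commit_author: str


# Hypothetical policy inputs.
TRUSTED_REMOTE_PREFIXES = ("https://github.com/my-org/",)
TRUSTED_AUTHORS = {"alice@example.com"}
LOW_RISK = {Capability.READ_PROJECT_FILES, Capability.APPLY_PROJECT_SETTINGS}


def evaluate_grant(identity: RepoIdentity, capability: Capability) -> bool:
    """Decide one capability grant at the point it is requested."""
    if capability in LOW_RISK:
        return True
    if capability is Capability.UNSANDBOXED_OS_ACCESS:
        return False  # never granted implicitly in this sketch
    # START_MCP_SERVER: require both a trusted origin and a trusted author.
    return (
        identity.remote_url.startswith(TRUSTED_REMOTE_PREFIXES)
        and identity.commit_author in TRUSTED_AUTHORS
    )


internal = RepoIdentity("https://github.com/my-org/app.git", "alice@example.com")
stranger = RepoIdentity("https://github.com/unknown/repo.git", "mallory@example.com")

print(evaluate_grant(internal, Capability.START_MCP_SERVER))  # True
print(evaluate_grant(stranger, Capability.START_MCP_SERVER))  # False
print(evaluate_grant(stranger, Capability.READ_PROJECT_FILES))  # True
```

The stranger's repository can still be read and edited, but it cannot start an MCP server, which is the separation the single trust prompt does not express.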

Read through the zero trust lens, Anthropic’s position has three problems.

Shared responsibility requires the system to carry its half. Under a perimeter model, all verification burden falls on the user at the gate — the system has the identity signals, but leaves them unused. Zero trust distributes the burden: each capability is evaluated at the point it’s granted.

The consent gate can’t convey per-capability trust. A perimeter gate concentrates all trust decisions into one moment. Anthropic’s gate covers MCP server activation, process spawning, and full OS access — none of it signaled. The coarser the gate, the harder it is to make consent meaningful.

The perimeter model puts expert-level burden on non-expert users. A perimeter gate requires the user to reason about all downstream consequences of a single click — a reasonable ask for a security engineer, not for a vibe-coder. Zero trust shifts that burden to the architecture: each capability grant is evaluated by the system, not the user.

The three problems compound: the system ignores available identity signals, the gate doesn’t compensate by informing the user what it actually grants, and the users left holding that gap aren’t equipped to close it.

Final thoughts

This post looked at two connected things: what the trust gate actually grants, and why the security architecture makes that easy to miss. At its core, TrustFall is a consent gap: a single prompt covers MCP server activation and unsandboxed OS access without stating either. The perimeter model is the structural reason that gap is hard to surface at design time: everything inside the perimeter is trusted by definition, leaving no natural STRIDE targets. When threat modeling a perimeter-architected system, finding the EoP requires the modeler to look past the gate and ask what it actually grants. That is a skill, not something the methodology surfaces automatically. The CVE record shows this playing out: each patch adds a specific gate or blocklist entry without restructuring the boundary, and “functioning as designed” remains the public response to TrustFall.

A zero trust security architecture changes the shape of the problem. Explicit trust boundaries at each capability grant are natural STRIDE targets — the EoP question surfaces at design time regardless of modeler experience, not because of better analysts, but because the architecture itself gives threat modeling more boundaries to work with.

TrustFall affected Claude Code, Cursor, Gemini CLI, and Copilot — evidently all due to the same perimeter model. The broader security industry made this architectural transition before: perimeter security dominated until systems grew complex enough that a single gate couldn’t hold, and zero trust emerged as the answer. Agentic tools are on a similar trajectory — gaining capability and OS access fast, with the same pressure building at the trust boundary.

Is zero trust the natural next step in the architectural evolution of agentic tools?

References

Adversa AI — TrustFall: Coding Agent Security Flaw Enabling RCE in Claude, Cursor, Gemini CLI, Copilot
Anthropic — Claude Code Settings