<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
	<channel>
		<title>Security Architecture on brain overflow</title>
		<link>https://brainoverflow.blog/categories/security-architecture/</link>
		<description>Recent content in Security Architecture on brain overflow</description>
		<generator>Hugo -- 0.162.1</generator>
		<language>en-us</language>
		<lastBuildDate>Mon, 18 May 2026 10:00:00 -0700</lastBuildDate>
		<atom:link href="https://brainoverflow.blog/categories/security-architecture/index.xml" rel="self" type="application/rss+xml" />
		
		
		<item>
			<title>TrustFall: The Perimeter Problem in Agentic Tools</title>
			<link>https://brainoverflow.blog/posts/perimeter-problem-in-agentic-tools/</link>
			<pubDate>Mon, 18 May 2026 10:00:00 -0700</pubDate><guid>https://brainoverflow.blog/posts/perimeter-problem-in-agentic-tools/</guid>
			<content type="text/html" mode="escaped"><![CDATA[<p><em>On May 7, 2026, Adversa AI published <a href="https://adversa.ai/blog/trustfall-coding-agent-security-flaw-rce-claude-cursor-gemini-cli-copilot/">TrustFall</a>, a one-click remote code execution flaw in Claude Code with variants across Gemini CLI, Cursor, and GitHub Copilot: clone a repository, open it, click &ldquo;Yes, I trust this folder&rdquo;, and an attacker-controlled process runs with your full OS privileges.</em></p>
<p><em>Anthropic declined the finding, describing it as outside their threat model and the behavior as functioning by design. This post digs into that response and argues that the core issue is architectural: a perimeter security model that can&rsquo;t carry the weight placed on it, and that makes the vulnerability structurally hard to surface through threat modeling. It then looks at what a zero trust alternative would look like for agentic tools.</em></p>
<hr>
<h2 id="1-the-vulnerability">1. The vulnerability<a href="#1-the-vulnerability" class="anchor" aria-hidden="true"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"
      stroke-linecap="round" stroke-linejoin="round" class="feather">
      <path d="M15 7h3a5 5 0 0 1 5 5 5 5 0 0 1-5 5h-3m-6 0H6a5 5 0 0 1-5-5 5 5 0 0 1 5-5h3"></path>
      <line x1="8" y1="12" x2="16" y2="12"></line>
   </svg></a></h2>
<p>Project-level config files — <code>.mcp.json</code> and <code>.claude/settings.json</code> — committed to a repository can activate attacker-controlled MCP servers. When a developer clones the repo and clicks through the trust dialog, an unsandboxed Node.js process spawns with full user OS privileges — no further prompts. Three developer actions: clone, open, click. The attack also has a zero-click CI/CD variant where Claude Code runs headless in GitHub Actions against an untrusted pull request branch, bypassing the trust dialog entirely.</p>
<p>For the full technical details see <a href="https://adversa.ai/blog/trustfall-coding-agent-security-flaw-rce-claude-cursor-gemini-cli-copilot/">TrustFall</a>.</p>
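<p>One way to see what a freshly cloned repository would ask the tool to launch is to read the project config before opening the folder. The sketch below is a minimal illustration, not part of TrustFall or of any tool: it assumes <code>.mcp.json</code> holds a top-level <code>mcpServers</code> object whose entries carry <code>command</code> and <code>args</code>, and the function name <code>list_project_mcp_servers</code> is hypothetical.</p>

```python
import json
from pathlib import Path


def list_project_mcp_servers(repo_root):
    """Return (name, command line) pairs declared in a repo's .mcp.json.

    Assumption: the file has a top-level "mcpServers" object whose entries
    carry "command" and an optional "args" list. Other layouts are ignored.
    """
    config_path = Path(repo_root) / ".mcp.json"
    if not config_path.is_file():
        return []
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Unreadable or malformed config is reported as "nothing found";
        # a stricter audit might prefer to fail loudly here.
        return []
    servers = config.get("mcpServers") if isinstance(config, dict) else None
    if not isinstance(servers, dict):
        return []

    found = []
    for name, spec in sorted(servers.items()):
        if not isinstance(spec, dict):
            continue
        command = str(spec.get("command", ""))
        raw_args = spec.get("args", [])
        args = [str(a) for a in raw_args] if isinstance(raw_args, list) else []
        found.append((name, " ".join([command, *args]).strip()))
    return found
```

<p>Running this against a clone before opening it in an agentic tool lists the processes the project config would start. It only reads the file and never executes anything it finds.</p>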
<p>In threat modeling terms, the mechanism is a trust boundary problem. Project-scope config — committed to a repository — can activate MCP servers: external processes that run with full user OS privileges. The gate between untrusted repo content and those privileges is a single user prompt:</p>
<pre class="mermaid">flowchart TB
    subgraph Untrusted["Untrusted · attacker-controlled"]
        PCfg["Project config\n(committed to the repo)"]
    end

    CLI["Claude Code CLI"]

    subgraph Gate["Trust gate"]
        Prompt["'Do you trust this folder?'\nsingle Yes/No"]
    end

    subgraph Privileged["Privileged · full user OS access"]
        MCP["MCP Server process\nNode.js · no sandbox"]
        OS["~/ · ~/.ssh · ~/.aws\nread / write / exec — unrestricted"]
    end

    PCfg --> CLI
    CLI --> Prompt
    Prompt -- "on 'Yes'" --> MCP
    MCP --> OS
</pre>
<p>That single Yes/No covers four distinct capability grants:</p>
<ol>
<li>Claude reading and editing project files <em>(clearly implied)</em></li>
<li>Claude following project-level behavioral settings <em>(reasonable)</em></li>
<li>Activating MCP servers defined in project config <em>(not stated)</em></li>
<li>Those servers running as unsandboxed processes with full user privileges <em>(definitely not stated)</em></li>
</ol>
<p>Two of these carry significant security consequences, and neither appears in the prompt language. The gap between what the user consents to and what the system delivers is a textbook Elevation of Privilege (the E in STRIDE): the user grants more than they know.</p>
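<p>The consent gap can be written down as data. The sketch below encodes the four grants listed above, using this post&rsquo;s own characterization of which ones the prompt states or implies; the <code>Grant</code> type and the <code>consent_gap</code> helper are hypothetical names introduced only for illustration.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    description: str
    disclosed: bool    # stated or reasonably implied by the prompt wording
    high_impact: bool  # leads to code execution beyond the project files


# The four grants behind the single Yes/No, as characterized in this post.
TRUST_FOLDER_GRANTS = [
    Grant("read and edit project files", disclosed=True, high_impact=False),
    Grant("follow project-level behavioral settings", disclosed=True, high_impact=False),
    Grant("activate MCP servers from project config", disclosed=False, high_impact=True),
    Grant("run those servers unsandboxed as the user", disclosed=False, high_impact=True),
]


def consent_gap(grants):
    """Return the grants that are high impact but not disclosed to the user."""
    return [g for g in grants if g.high_impact and not g.disclosed]


for grant in consent_gap(TRUST_FOLDER_GRANTS):
    print("undisclosed:", grant.description)
```

<p>The two entries it prints are grants 3 and 4 from the list: exactly the ones with security consequences and no mention in the prompt.</p>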
<p>The pattern holds across the tools TrustFall examined: Gemini CLI and Cursor mention MCP servers in their consent language while Claude Code and Copilot don&rsquo;t, but all four default to Yes or Trust.</p>
<hr>
<h2 id="2-outside-our-threat-model">2. &ldquo;Outside our threat model&rdquo;<a href="#2-outside-our-threat-model" class="anchor" aria-hidden="true"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"
      stroke-linecap="round" stroke-linejoin="round" class="feather">
      <path d="M15 7h3a5 5 0 0 1 5 5 5 5 0 0 1-5 5h-3m-6 0H6a5 5 0 0 1-5-5 5 5 0 0 1 5-5h3"></path>
      <line x1="8" y1="12" x2="16" y2="12"></line>
   </svg></a></h2>
<p>Anthropic&rsquo;s phrase is worth examining literally. Per TrustFall&rsquo;s analysis, the missing enforcement isn&rsquo;t outside Anthropic&rsquo;s defined boundary — it&rsquo;s inside it. The trust dialog is the perimeter; what happens after it is, by their own framing, the trusted zone. &ldquo;Outside our threat model&rdquo; means, in practice: inside our perimeter, but below the granularity we protect at.</p>
<p>That granularity isn&rsquo;t uniformly coarse — the capability is demonstrably known. <code>bypassPermissions</code> gets a dedicated warning because it is dangerous; <code>enableAllProjectMcpServers</code>, <code>enabledMcpjsonServers</code>, and <code>permissions.allow</code> activate equally dangerous behavior without equivalent disclosure. TrustFall also notes that earlier versions of Claude Code included an explicit MCP consent prompt that was later removed. These are the tell-tale signs of a threat model that&rsquo;s coarser than the reality it represents — some dangerous capabilities are visible enough to gate explicitly, others slip through the same boundary unexamined.</p>
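<p>Since these settings carry no dedicated warning, a reviewer has to look for them. The following sketch flags the three settings named above in a repository&rsquo;s <code>.claude/settings.json</code>. It assumes <code>permissions.allow</code> is an <code>allow</code> list nested under a <code>permissions</code> object, and <code>flag_project_settings</code> is a hypothetical helper, not a feature of Claude Code.</p>

```python
import json
from pathlib import Path

# Settings named in this post as activating MCP servers from project scope.
MCP_ACTIVATION_KEYS = ("enableAllProjectMcpServers", "enabledMcpjsonServers")


def flag_project_settings(repo_root):
    """List settings in .claude/settings.json that widen what the repo can do."""
    settings_path = Path(repo_root) / ".claude" / "settings.json"
    if not settings_path.is_file():
        return []
    try:
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return ["settings.json present but unreadable"]
    if not isinstance(settings, dict):
        return ["settings.json is not a JSON object"]

    findings = []
    for key in MCP_ACTIVATION_KEYS:
        if settings.get(key):  # true, or a non-empty list of server names
            findings.append(f"{key} = {json.dumps(settings[key])}")
    # Assumed layout: "permissions" is an object holding an "allow" list.
    permissions = settings.get("permissions")
    if isinstance(permissions, dict) and permissions.get("allow"):
        findings.append(f"permissions.allow = {json.dumps(permissions['allow'])}")
    return findings
```

<p>An empty result means none of the three settings is present, not that the repository is safe: the check covers only the keys this post names.</p>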
<p>That pattern of selective gating is further undermined by the CVE record. Anthropic&rsquo;s response to TrustFall was that the behavior functions as designed: clicking &ldquo;trust this folder&rdquo; means accepting the project&rsquo;s configuration, MCP servers included. Yet between October 2025 and March 2026, in the months leading up to TrustFall, Anthropic shipped three related fixes: it delayed MCP activation until after the trust dialog (CVE-2025-59536, Oct 2025), blocked <code>ANTHROPIC_BASE_URL</code> from project scope (CVE-2026-21852, Jan 2026), and blocked <code>bypassPermissions</code> from project scope (CVE-2026-33068, Mar 2026), the same setting that already carried a UI warning. Each fix adds a specific gate or blocklist entry, the signature of a perimeter being hardened incrementally, one dangerous capability at a time, without a unifying policy. If the trust dialog truly constituted full consent by design, there would have been nothing to patch.</p>
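<p>The difference between patching by blocklist and having a unifying policy can be sketched in a few lines. This is an illustration under simplifying assumptions, not a description of how Claude Code filters settings: every setting is treated as a flat key, and the allowlisted names (<code>formatOnSave</code>, <code>tabWidth</code>) are hypothetical stand-ins for settings known to be harmless.</p>

```python
# Project-scope settings blocked by the patches described above.
# Simplification: every setting is treated as a flat key.
DENYLIST = {"ANTHROPIC_BASE_URL", "bypassPermissions"}

# Hypothetical allowlist: only settings reviewed as harmless are accepted.
ALLOWLIST = {"formatOnSave", "tabWidth"}


def denylist_accepts(key):
    """Default-allow: anything not yet known to be dangerous passes."""
    return key not in DENYLIST


def allowlist_accepts(key):
    """Default-deny: anything not explicitly reviewed is rejected."""
    return key in ALLOWLIST


for key in ("bypassPermissions", "enableAllProjectMcpServers", "formatOnSave"):
    print(key, "denylist:", denylist_accepts(key), "allowlist:", allowlist_accepts(key))
```

<p>Under the denylist, <code>enableAllProjectMcpServers</code> passes until someone adds an entry for it. Under the allowlist it is rejected without anyone having to anticipate it, which is the property a unifying policy provides.</p>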
<hr>
<h2 id="3-from-perimeter-to-zero-trust">3. From perimeter to zero trust<a href="#3-from-perimeter-to-zero-trust" class="anchor" aria-hidden="true"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"
      stroke-linecap="round" stroke-linejoin="round" class="feather">
      <path d="M15 7h3a5 5 0 0 1 5 5 5 5 0 0 1-5 5h-3m-6 0H6a5 5 0 0 1-5-5 5 5 0 0 1 5-5h3"></path>
      <line x1="8" y1="12" x2="16" y2="12"></line>
   </svg></a></h2>
<p>The &ldquo;functions as designed&rdquo; response places the burden on the developer: audit what you clone. That position rests on a perimeter security architecture — verify once at the gate, trust everything inside:</p>
<pre class="mermaid">flowchart LR
    A1["Repo"] -->|"✓ trust gate"| A2["CLI"] --> A3["MCP"] --> A4["OS"]
</pre>
<p>The perimeter pattern is one that modern security has largely moved past. The alternative is zero trust: identity propagated through the capability chain and evaluated at each grant. Git provides the primitives: clone origin, remote URL, commit author. Conceptually, it would look like this:</p>
<pre class="mermaid">flowchart LR
    B1["Repo\norigin · author"] -->|"✓ id check"| B2["CLI"] -->|"✓ capability?"| B3["MCP"] -->|"✓ scope?"| B4["OS"]
</pre>
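<p>The per-grant evaluation in the diagram can be sketched as a small policy check. Everything here is an assumption made for illustration: <code>Capability</code>, <code>RepoIdentity</code>, and <code>Policy</code> are hypothetical types, the rules are arbitrary examples, and none of the tools discussed exposes such an API.</p>

```python
from dataclasses import dataclass
from enum import Enum


class Capability(Enum):
    READ_PROJECT = "read project files"
    SPAWN_MCP = "spawn an MCP server"
    ACCESS_HOME = "access files outside the project"


@dataclass(frozen=True)
class RepoIdentity:
    remote_url: str     # for example, the repo's configured origin URL
    commit_author: str  # author of the commit that added the config


@dataclass(frozen=True)
class Policy:
    trusted_remote_prefixes: tuple
    trusted_authors: frozenset

    def allows(self, identity, capability):
        """Evaluate one capability grant against the repo's identity."""
        if capability is Capability.READ_PROJECT:
            return True  # lowest-risk grant, allowed for any repo
        known_origin = identity.remote_url.startswith(self.trusted_remote_prefixes)
        if capability is Capability.SPAWN_MCP:
            return known_origin
        # Highest-risk grant: require a known origin and a known author.
        return known_origin and identity.commit_author in self.trusted_authors


policy = Policy(("https://github.com/my-org/",), frozenset({"alice@example.com"}))
stranger = RepoIdentity("https://github.com/unknown/repo.git", "mallory@example.com")
for capability in Capability:
    print(capability.value, "allowed:", policy.allows(stranger, capability))
```

<p>The point of the shape is that each arrow in the diagram becomes its own decision with its own inputs. Git author fields are self-asserted, so a real design would likely rely on stronger signals such as signed commits.</p>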
<p>Read through the zero trust lens, Anthropic&rsquo;s position has three problems.</p>
<p><strong>Shared responsibility requires the system to carry its half.</strong>
Under a perimeter model, all verification burden falls on the user at the gate — the system has the identity signals, but leaves them unused. Zero trust distributes the burden: each capability is evaluated at the point it&rsquo;s granted.</p>
<p><strong>The consent gate can&rsquo;t convey per-capability trust.</strong>
A perimeter gate concentrates all trust decisions into one moment. Anthropic&rsquo;s gate covers MCP server activation, process spawning, and full OS access — none of it signaled. The coarser the gate, the harder it is to make consent meaningful.</p>
<p><strong>The perimeter model puts expert-level burden on non-expert users.</strong>
A perimeter gate requires the user to reason about all downstream consequences of a single click — a reasonable ask for a security engineer, not for a vibe-coder. Zero trust shifts that burden to the architecture: each capability grant is evaluated by the system, not the user.</p>
<p>The three problems compound: the system ignores available identity signals, the gate doesn&rsquo;t compensate by informing the user what it actually grants, and the users left holding that gap aren&rsquo;t equipped to close it.</p>
<hr>
<h2 id="final-thoughts">Final thoughts<a href="#final-thoughts" class="anchor" aria-hidden="true"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"
      stroke-linecap="round" stroke-linejoin="round" class="feather">
      <path d="M15 7h3a5 5 0 0 1 5 5 5 5 0 0 1-5 5h-3m-6 0H6a5 5 0 0 1-5-5 5 5 0 0 1 5-5h3"></path>
      <line x1="8" y1="12" x2="16" y2="12"></line>
   </svg></a></h2>
<p>This post looked at two connected things: what the trust gate actually grants, and why the security architecture makes that easy to miss. At its core, TrustFall is a consent gap: a single prompt covering MCP server activation and unsandboxed OS access without stating either. The perimeter model is the structural reason that gap is hard to surface at design time: everything inside the perimeter is trusted by definition, leaving no natural STRIDE targets. When threat modeling a perimeter-architected system, finding the EoP requires the modeler to look past the gate and ask what it actually grants. That&rsquo;s a skill, not something the methodology surfaces automatically. The CVE record shows this playing out: each patch adds a specific gate or blocklist entry without restructuring the boundary, and &ldquo;functioning as designed&rdquo; remains the public response to TrustFall.</p>
<p>A zero trust security architecture changes the shape of the problem. Explicit trust boundaries at each capability grant are natural STRIDE targets, so the EoP question surfaces at design time regardless of modeler experience. That happens because the architecture itself gives threat modeling more boundaries to work with, not because the analysts are better.</p>
<p>TrustFall affected Claude Code, Cursor, Gemini CLI, and Copilot, evidently because all four rely on the same perimeter model. The broader security industry has made this architectural transition before: perimeter security dominated until systems grew complex enough that a single gate couldn&rsquo;t hold, and zero trust emerged as the answer. Agentic tools are on a similar trajectory, gaining capability and OS access fast, with the same pressure building at the trust boundary.</p>
<p>Is zero trust the natural next step in the architectural evolution of agentic tools?</p>
<hr>
<h2 id="references">References<a href="#references" class="anchor" aria-hidden="true"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"
      stroke-linecap="round" stroke-linejoin="round" class="feather">
      <path d="M15 7h3a5 5 0 0 1 5 5 5 5 0 0 1-5 5h-3m-6 0H6a5 5 0 0 1-5-5 5 5 0 0 1 5-5h3"></path>
      <line x1="8" y1="12" x2="16" y2="12"></line>
   </svg></a></h2>
<ul>
<li>Adversa AI — <a href="https://adversa.ai/blog/trustfall-coding-agent-security-flaw-rce-claude-cursor-gemini-cli-copilot/">TrustFall: Coding Agent Security Flaw Enabling RCE in Claude, Cursor, Gemini CLI, Copilot</a></li>
<li>Anthropic — <a href="https://code.claude.com/docs/en/settings">Claude Code Settings</a></li>
</ul>
<hr>
]]></content>
		</item>
		
	</channel>
</rss>
