<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
	<channel>
		<title>Agentic AI on brain overflow</title>
		<link>https://brainoverflow.blog/categories/agentic-ai/</link>
		<description>Recent content in Agentic AI on brain overflow</description>
		<generator>Hugo -- 0.162.1</generator>
		<language>en-us</language>
		<lastBuildDate>Sun, 24 May 2026 13:01:31 -0700</lastBuildDate>
		<atom:link href="https://brainoverflow.blog/categories/agentic-ai/index.xml" rel="self" type="application/rss+xml" />
		
		
		<item>
			<title>AI Assisted Bug Bounty Experiment</title>
			<link>https://brainoverflow.blog/posts/ai-assisted-bug-bounty/</link>
			<pubDate>Sun, 24 May 2026 13:01:31 -0700</pubDate><guid>https://brainoverflow.blog/posts/ai-assisted-bug-bounty/</guid>
			<description></description><content type="text/html" mode="escaped"><![CDATA[<p><em>Five authorization bypass paths, clean PoCs for each, and a full disclosure report: the output of a six-hour session with one human, one Claude Code agent, an idea, a repo clone, and a live target in Docker. I&rsquo;ll cover the technical details in a follow-up post once the disclosure process is complete.</em></p>
<hr>
<h2 id="1-the-experiment">1. The experiment<a href="#1-the-experiment" class="anchor" aria-hidden="true"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"
      stroke-linecap="round" stroke-linejoin="round" class="feather">
      <path d="M15 7h3a5 5 0 0 1 5 5 5 5 0 0 1-5 5h-3m-6 0H6a5 5 0 0 1-5-5 5 5 0 0 1 5-5h3"></path>
      <line x1="8" y1="12" x2="16" y2="12"></line>
   </svg></a></h2>
<p>With the recent hype surrounding Mythos and its autonomous vulnerability discovery capabilities, the question of what AI can do for security research is hard to ignore. Most of that conversation centers on fully autonomous systems — multi-agent frameworks operating at scale with large budgets. I wanted to try something way simpler: what can one person accomplish with a single AI agent and a modest budget?</p>
<p>I came in with five things: familiarity with the target as a practitioner, enough Python and Django knowledge to steer the agent, years of application security experience, an evening to experiment, and a $20 Claude subscription with Cyber Verification Program approval.</p>
<p><img src="images/image-1779751697871.png" alt="Fear and Loathing in Las Vegas (1998)">
<em>Photo by Archive Photos/Getty Images — © 2012 Getty Images. (cropped image)</em></p>
<hr>
<h2 id="2-the-hypothesis">2. The hypothesis<a href="#2-the-hypothesis" class="anchor" aria-hidden="true"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"
      stroke-linecap="round" stroke-linejoin="round" class="feather">
      <path d="M15 7h3a5 5 0 0 1 5 5 5 5 0 0 1-5 5h-3m-6 0H6a5 5 0 0 1-5-5 5 5 0 0 1 5-5h3"></path>
      <line x1="8" y1="12" x2="16" y2="12"></line>
   </svg></a></h2>
<p>My target is a Django-based web application enforcing a role-based access model — controlling who can see which resources, data, and system settings, at what level of detail. The access control model is the interesting surface.</p>
<p>I asked Claude to map the REST API — URL patterns, associated models and views — and report back on any gaps in URL parameter validation. It flagged something it thought looked interesting: a parameter responsible for fetching related objects alongside the primary response. My intuition when I saw it: likely a WebUI-driven performance optimization added on top of the existing API functionality, making the authorization model a potential target for logical flaws. Features like this get tested for functionality — does it return the right data? — but access control testing of this specific path may not have received the same attention as the core endpoints. Performance shortcuts and abstraction layers are common places where authorization logic gets underspecified, because they&rsquo;re added later and the direct-endpoint tests don&rsquo;t exercise them.</p>
<p>Whether it would lead anywhere was an open question.</p>
<hr>
<h2 id="3-how-the-research-ran">3. How the research ran<a href="#3-how-the-research-ran" class="anchor" aria-hidden="true"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"
      stroke-linecap="round" stroke-linejoin="round" class="feather">
      <path d="M15 7h3a5 5 0 0 1 5 5 5 5 0 0 1-5 5h-3m-6 0H6a5 5 0 0 1-5-5 5 5 0 0 1 5-5h3"></path>
      <line x1="8" y1="12" x2="16" y2="12"></line>
   </svg></a></h2>
<p>From that exchange — Claude surfacing the parameter, me recognizing the authorization angle — we had a hypothesis. I asked Claude to examine how the feature works internally. It traced the relevant source files and identified the core gap: when the feature resolves a related object, no check is made against whether the requesting user is authorized to see it. The access controls on the direct API endpoints are simply not invoked in this path.</p>
<p>From there: local Docker instance, Claude calling the application&rsquo;s own REST API with admin credentials to provision test accounts and settings, then switching to an unprivileged account to test the hypothesis. The test pattern was clean — confirm the direct endpoint returns 403, then show the same data returns through the bypass. Seeing both in the same output is unambiguous.</p>
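<p>To make that test pattern concrete, here is a minimal, self-contained sketch. It does not reproduce the target or its API: <code>Ticket</code>, <code>Note</code>, the <code>expand</code> flag, and the role names are all hypothetical stand-ins, and the in-memory &ldquo;endpoints&rdquo; only model the shape of the flaw described above.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    """Hypothetical related object that only admins may read."""
    id: int
    text: str


# Toy data store standing in for the application's database.
NOTES = {1: Note(1, "admin-only content")}
TICKETS = {10: {"id": 10, "title": "example", "note_id": 1}}


def get_note(user_role: str, note_id: int) -> tuple[int, dict]:
    """Direct endpoint: enforces the role check before returning data."""
    if user_role != "admin":
        return 403, {"detail": "forbidden"}
    note = NOTES[note_id]
    return 200, {"id": note.id, "text": note.text}


def get_ticket(user_role: str, ticket_id: int, expand: bool = False) -> tuple[int, dict]:
    """Primary endpoint: readable by any authenticated role.

    The expand branch resolves the related note WITHOUT calling the
    check used by get_note, which is the gap being modelled.
    """
    body = dict(TICKETS[ticket_id])
    if expand:
        note = NOTES[body["note_id"]]
        body["note"] = {"id": note.id, "text": note.text}
    return 200, body


def is_bypass(user_role: str, ticket_id: int) -> bool:
    """True when the direct read is denied but expansion returns the same object."""
    note_id = TICKETS[ticket_id]["note_id"]
    direct_status, _ = get_note(user_role, note_id)
    expanded_status, expanded_body = get_ticket(user_role, ticket_id, expand=True)
    return direct_status == 403 and expanded_status == 200 and "note" in expanded_body


if __name__ == "__main__":
    print("reader bypass:", is_bypass("reader", 10))
    print("admin bypass:", is_bypass("admin", 10))
```

<p>Run against this toy model, the check reports a bypass for the unprivileged role and none for the admin, which is the same contrast the test pattern relies on: a denied direct read next to a successful indirect one.</p>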
<p>The first two bypass paths were straightforward once the root cause was clear. I then asked Claude to think more broadly about the attack surface, prompting it to consider how Django&rsquo;s data model exposes direct and reverse object relationships. It identified three additional paths, including one through internal notes that users can mark private — the bypass returns their full content regardless, circumventing the visibility restriction the UI enforces.</p>
<p>Five exploitable paths in total, all from the same root cause — Missing Authorization (CWE-862) across the board, with Exposure of Sensitive Information (CWE-200) sprinkled in. Network-exploitable via the REST API by an authenticated user with read permissions, accessing admin-only data. Drafted to <code>CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:L/I:N/A:N</code> (4.3 Medium).</p>
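<p>The 4.3 score follows from the CVSS v3.1 base-score equations. A short sketch of the arithmetic, using the metric weights published in the specification and handling only the <code>S:U</code> case that this vector uses (the function names are mine):</p>

```python
import math

# Metric weights from the CVSS v3.1 specification.
# The PR values are the ones that apply when Scope is Unchanged.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a CVSS:3.1 vector with Scope: Unchanged."""
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    metrics = dict(part.split(":") for part in parts[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:L/I:N/A:N"))
```

<p>For this vector the impact sub-score is about 1.41 and the exploitability sub-score about 2.84; their sum, roughly 4.25, rounds up to 4.3.</p>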
<h2 id="4-chasing-the-escalation">4. Chasing the escalation<a href="#4-chasing-the-escalation" class="anchor" aria-hidden="true"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"
      stroke-linecap="round" stroke-linejoin="round" class="feather">
      <path d="M15 7h3a5 5 0 0 1 5 5 5 5 0 0 1-5 5h-3m-6 0H6a5 5 0 0 1-5-5 5 5 0 0 1 5-5h3"></path>
      <line x1="8" y1="12" x2="16" y2="12"></line>
   </svg></a></h2>
<p>The core bypass came together quickly. What followed was the natural next move for any researcher: try to chain it, escalate severity, turn a Medium into something bigger. That&rsquo;s where most of the dead ends lived. Some of the vectors we tested:</p>
<ul>
<li>Could the bypass leak API authentication tokens (full account takeover)? Blocked.</li>
<li>Could it be flipped into a write path? Read-only by design.</li>
<li>Could unusual parameter values expose internal state or crash the app in a useful way? One 500 error, no data.</li>
<li>Could object traversal chain across multiple hops to reach higher-value targets? Limited to one level.</li>
<li>Could the attack surface extend beyond the object relationships already mapped? Explored and found nothing new reachable.</li>
<li>Could reading related data trigger a server-side request? An SSRF via this feature would have been a significant escalation. Nope.</li>
</ul>
<p>Thorough analysis of every relevant code path found nothing exploitable. At some point Claude even made a comment that the codebase outside the bypass appeared well-hardened — the dead ends weren&rsquo;t just unlucky angles, they were genuinely blocked.</p>
<p><img src="images/claude-comment.png" alt="Claude comment"></p>
<p>The dead-end analysis likely consumed more tokens than the core bypass itself. Each candidate path required the agent to load and reason over significant amounts of source context before concluding it was blocked. At some point I made a call to stop — it was getting late, and it was clear the token burn rate on escalation paths was outpacing any realistic chance of a meaningful outcome.</p>
<hr>
<h2 id="5-division-of-labor">5. Division of labor<a href="#5-division-of-labor" class="anchor" aria-hidden="true"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"
      stroke-linecap="round" stroke-linejoin="round" class="feather">
      <path d="M15 7h3a5 5 0 0 1 5 5 5 5 0 0 1-5 5h-3m-6 0H6a5 5 0 0 1-5-5 5 5 0 0 1 5-5h3"></path>
      <line x1="8" y1="12" x2="16" y2="12"></line>
   </svg></a></h2>
<p>Across the full session, the process looked like this:</p>
<p><img src="images/research-loop.svg" alt="Research loop diagram"></p>
<p><strong>Target selection</strong> was mine. The application is open source with a commercial offering, backed by OWASP, and used in production by security teams, which is how I knew about it. It has hundreds of releases and thousands of GitHub stars, and it has run a HackerOne bug bounty program for many years, with a good number of submissions behind it.</p>
<p><strong>Attack surface discovery</strong> was Claude&rsquo;s. Asked to map the REST API — URL patterns, models, and views — it flagged a specific parameter as worth investigating: one responsible for fetching related objects alongside the primary response. That observation became the hypothesis we tested.</p>
<p><strong>The hypothesis</strong> was joint. I directed the exploration and recognized why the parameter Claude surfaced was worth pursuing — that features like this tend to be undertested for authorization completeness. That judgment comes from years of looking at application security. An autonomous agent starting from scratch has no basis for it.</p>
<p><strong>Reading and tracing source code</strong> across a large, unfamiliar codebase — holding relevant context across many files simultaneously — was Claude&rsquo;s most consistent contribution. Work that would take a human hours of careful reading took Claude minutes. That&rsquo;s where LLMs really shine.</p>
<p><strong>Environment setup</strong> was handled by Claude directly against the live application. The agent called the target&rsquo;s own REST APIs with admin credentials to provision the test accounts and configuration needed to validate each vector, then switched to an unprivileged account to confirm the bypass. No manual setup was required, because it already knew the APIs from reading the codebase.</p>
<p><strong>PoC development and live validation</strong> was Claude&rsquo;s. It wrote the test scripts, ran them against the live Docker instance, interpreted the results, diagnosed problems and iterated to completion.</p>
<p><strong>Steering</strong> was mine throughout. When the initial finding was confirmed, I directed Claude to expand the search along a specific axis: how Django models expose object relationships, and where those traversal paths might reach objects the requesting user shouldn&rsquo;t see. That framing produced three additional bypass paths.</p>
<p><strong>Boundary analysis</strong> was Claude&rsquo;s. When I asked whether the initial finding could be chained or escalated, Claude systematically traced each candidate path and explained whether it was viable or blocked and why.</p>
<p><strong>Impact assessment</strong> was mine. One of the five bypass paths exposes internal security notes that teams mark private. Characterizing what that means in a real-world application requires understanding how those notes are used in practice, not just what the access model technically permits.</p>
<p><strong>Structured reporting</strong> was Claude&rsquo;s — CVSS scores, impact analysis, affected-code tables, and remediation recommendations for all five findings in a single consolidated report, with live response examples for each confirmed vector.</p>
<hr>
<h2 id="6-numbers-and-disclosure">6. Numbers and disclosure<a href="#6-numbers-and-disclosure" class="anchor" aria-hidden="true"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"
      stroke-linecap="round" stroke-linejoin="round" class="feather">
      <path d="M15 7h3a5 5 0 0 1 5 5 5 5 0 0 1-5 5h-3m-6 0H6a5 5 0 0 1-5-5 5 5 0 0 1 5-5h3"></path>
      <line x1="8" y1="12" x2="16" y2="12"></line>
   </svg></a></h2>
<p>What led to this session was a failed attempt with one of the open source autonomous frameworks. I pointed it at the same target, let it run, and watched it burn through 2.5 million tokens — mostly attempting to set up its own environment, failing to start the target application it didn&rsquo;t need, and exhausting its budget before reaching the actual testing phase. That experience prompted the question: what could a different approach produce?</p>
<p>Our session — hypothesis to five confirmed findings, with live validation, dead-end analysis, and a full structured report — consumed 1.4 million tokens at a total API cost of $25. That gap is partly explained by the hypothesis: starting from a well-formed idea of where to look is a significant multiplier on what a given budget can produce. Autonomous tools trading specificity for breadth need proportionally more tokens to work through open-ended reconnaissance before they converge on anything. It&rsquo;s also worth noting that Anthropic&rsquo;s model is arguably more capable than what the open source framework was running — and more expensive per token — so the human-agent configuration does double duty: it keeps the effort targeted, and that targeting matters more when the model you&rsquo;re running costs more to use.</p>
<p>After the core research wrapped, I asked Claude to analyze the git commit history and map when the vulnerable feature was introduced. It traced the initial commit to January 2021, identified the exact PR that introduced it, confirmed the authorization gap was present from the start, and then mapped the feature&rsquo;s expansion across subsequent releases. The flaw had been in production for over five years across multiple major version milestones, including a significant expansion of the attack surface in 2023 that added the feature to roughly twenty additional API endpoints.</p>
<p>I submitted the findings to the vendor&rsquo;s HackerOne bug bounty program — public, improvement-oriented, no monetary bounties. The report covered all bypass paths with confirmed live responses, CVSS scoring, and root cause analysis. A follow-up post with the full technical details will go up once the disclosure process is complete.</p>
<hr>
<h2 id="final-thoughts">Final thoughts<a href="#final-thoughts" class="anchor" aria-hidden="true"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"
      stroke-linecap="round" stroke-linejoin="round" class="feather">
      <path d="M15 7h3a5 5 0 0 1 5 5 5 5 0 0 1-5 5h-3m-6 0H6a5 5 0 0 1-5-5 5 5 0 0 1 5-5h3"></path>
      <line x1="8" y1="12" x2="16" y2="12"></line>
   </svg></a></h2>
<p>Looking at the flow diagram, the natural question is: can the human node be replaced with another autonomous agent? Yes — projects like Mythos and <a href="https://github.com/KeygraphHQ/shannon">Shannon</a> demonstrate that fully autonomous security research is viable, and companies operate at scale doing it. The cost is higher: an autonomous agent has to discover through exploration what a human researcher brings as context, and that shows up in the token count.</p>
<p>All in all, a Medium severity authorization bypass with five confirmed network vectors, in a reasonably hardened codebase in a single evening — that&rsquo;s a good result in my book, and it took way less effort than comparable research I&rsquo;ve done solo. Happy bug hunting!</p>
]]></content>
		</item>
		
		<item>
			<title>Honeypots for AI Agents</title>
			<link>https://brainoverflow.blog/posts/ai-agent-honeypots/</link>
			<pubDate>Wed, 06 May 2026 08:00:07 -0700</pubDate><guid>https://brainoverflow.blog/posts/ai-agent-honeypots/</guid>
			<description></description><content type="text/html" mode="escaped"><![CDATA[<p><em>AI agents run with broad access to the local environment (credentials, config
files, shell history, dotfiles), and nothing stops a misbehaving one from
reading things it was never meant to touch. The question is: how do you catch it?</em></p>
<hr>
<p>My professional career in network security started in 2005 when I joined
Juniper Networks&rsquo; IDP team — which is probably why, when thinking about how to
detect a misbehaving agent, my mind went straight to a technique from that era.</p>
<p>The IDS world solved an analogous problem decades ago. A network intruder
probes hosts and services they have no business touching; a <strong>honeypot</strong> sits in
those places looking like an attractive target — an exposed SSH server, an
unprotected admin console — but it&rsquo;s a decoy, and it alerts when anything
interacts with it. Honeypots are traps, tripwires — there&rsquo;s no legitimate use
for a fake host, so the false-positive rate is near zero.</p>
<p>The same idea can be applied to AI agents: plant bogus credentials where a rogue
agent would look, and treat any access as a signal. Rather than trying to define
what legitimate agent behavior looks like, which is hard because legitimate
agents do a lot of things, this approach defines what illegitimate behavior looks like:
touching credentials that were never surfaced as part of the task. That&rsquo;s a
tighter definition, and it maps well to the threat model.</p>
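<p>As a rough illustration of the tripwire idea, the sketch below plants a decoy credentials file containing a unique marker and then checks whether that marker shows up in text the agent consumed or produced. The file layout, the marker format, and the notion of scanning a transcript are assumptions made for illustration, not a description of how any particular tool works.</p>

```python
import secrets
import tempfile
from pathlib import Path


def plant_canary(directory: Path, filename: str = "credentials") -> str:
    """Write a decoy credentials file and return its unique marker.

    The marker is random, so it cannot appear anywhere by coincidence.
    """
    marker = "CANARY" + secrets.token_hex(8).upper()
    directory.mkdir(parents=True, exist_ok=True)
    (directory / filename).write_text(
        "[default]\n"
        f"aws_access_key_id = {marker}\n"
        "aws_secret_access_key = not-a-real-secret\n"
    )
    return marker


def tripped(markers: set[str], observed_text: str) -> set[str]:
    """Return the markers that appear in text the agent read or emitted."""
    return {m for m in markers if m in observed_text}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        decoy_dir = Path(tmp) / ".aws"
        marker = plant_canary(decoy_dir)
        # Stand-in for a tool-call transcript or an outbound request body.
        transcript = (decoy_dir / "credentials").read_text()
        print("tripped:", bool(tripped({marker}, transcript)))
```

<p>Because no task ever hands the marker to the agent, any hit is a signal on its own; there is no baseline of normal behavior to model.</p>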
<hr>
<h2 id="1-the-threat">1. The threat<a href="#1-the-threat" class="anchor" aria-hidden="true"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"
      stroke-linecap="round" stroke-linejoin="round" class="feather">
      <path d="M15 7h3a5 5 0 0 1 5 5 5 5 0 0 1-5 5h-3m-6 0H6a5 5 0 0 1-5-5 5 5 0 0 1 5-5h3"></path>
      <line x1="8" y1="12" x2="16" y2="12"></line>
   </svg></a></h2>
<p>An AI agent running on a developer machine or in CI has access to everything its
process can read: <code>~/.aws/</code>, <code>~/.ssh/</code>, <code>~/.kube/</code>, <code>~/.config/</code>, environment
variables, dotfiles, every file in the project directory. That access is usually
not scoped to what the agent actually needs to do its job — it&rsquo;s scoped to the
developer&rsquo;s own permissions, because that&rsquo;s how agents tend to run inside an IDE.</p>
<p>There are three failure modes this matters for, in increasing order of intent.</p>
<ul>
<li>
<p><strong>Overprivileged agents.</strong> A legitimate, well-behaved agent still runs with the developer&rsquo;s full filesystem permissions. It may passively sweep up credential files while reading context, or actively reason that credentials it found are relevant to the task — either way, not out of malice, just because nothing stops it. The agent&rsquo;s judgment about what it&rsquo;s authorized to do is not the same as what it&rsquo;s actually authorized to do.</p>
</li>
<li>
<p><strong>Prompt injection.</strong> A malicious instruction embedded in content the agent
processes — a README, a web page, a database row — redirects the agent to look
for credentials and exfiltrate them. The agent is not compromised; it&rsquo;s
following instructions that were smuggled into its context. It tops the
<a href="https://genai.owasp.org/llmrisk/llm01-prompt-injection/">OWASP Top 10 for LLM Applications</a>
as LLM01:2025, and for good reason — it exploits the design of LLMs rather than
a flaw that can be patched.</p>
</li>
<li>
<p><strong>Compromised agents.</strong> A model that has been fine-tuned adversarially, or a
scaffolding layer with a supply-chain compromise, probes the environment
deliberately. Skills and MCP servers install from the internet with no signing
and no scanning — malicious ones are a real and active delivery mechanism, not
a future concern. The emergence of dedicated skills security solutions is evidence
enough that this threat is already being taken seriously.</p>
</li>
</ul>
<p>The distinguishing behavior is the same in all three cases: the agent accesses
resources it was never explicitly directed to use.</p>
<hr>
<h2 id="2-why-defenses-fall-short">2. Why defenses fall short<a href="#2-why-defenses-fall-short" class="anchor" aria-hidden="true"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"
      stroke-linecap="round" stroke-linejoin="round" class="feather">
      <path d="M15 7h3a5 5 0 0 1 5 5 5 5 0 0 1-5 5h-3m-6 0H6a5 5 0 0 1-5-5 5 5 0 0 1 5-5h3"></path>
      <line x1="8" y1="12" x2="16" y2="12"></line>
   </svg></a></h2>
<p>What makes this more dangerous is how creative agents get in pursuing their goals.
Anthropic&rsquo;s <a href="https://www.anthropic.com/research/agentic-misalignment">agentic misalignment research</a>
stress-tested models in simulated corporate environments and found that agents would pursue their assigned
goals through unexpected means when straightforward paths were blocked: leaking
sensitive information to competitors, attempting to influence their own
evaluation, sabotaging research they disagreed with. The models weren&rsquo;t broken;
they were doing what they were optimized to do, just not in the way their
operators intended.</p>
<p>This creativity makes naive defenses (blacklisting certain
paths, blocking specific tool calls, or simply instructing the agent not to
touch certain things) largely moot. A sufficiently motivated agent will reason
around them, find an indirect path, or reframe the task in a way that makes the
restricted action seem justified.</p>
<p>This compounds with a structural problem on the human side: agents increasingly
run with minimal oversight. Skip-permissions modes bypass approval prompts.
Long-running background tasks accumulate hundreds of tool calls nobody reviewed.
Users experiencing approval fatigue click through without reading. The result is
an agent with broad local access, a disposition to find creative paths to its
goals, and nobody watching closely.</p>
<p>Simon Willison coined the <a href="https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/">lethal trifecta</a> as a
framework for exactly this risk: private data access, exposure to untrusted
content, and exfiltration capability. When all three are present, an attacker
who can inject instructions into the agent&rsquo;s context has essentially won. It has
become a widely accepted shorthand for why agentic deployments require a
different security posture than traditional software.</p>
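<p>The trifecta reads naturally as a three-way conjunction. A small sketch, with hypothetical field names, of how a deployment review might flag it:</p>

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical description of what one agent deployment can do."""
    private_data_access: bool
    untrusted_content_exposure: bool
    exfiltration_capability: bool


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """True only when all three legs are present at once."""
    return all(getattr(caps, f.name) for f in fields(caps))


def missing_legs(caps: AgentCapabilities) -> list[str]:
    """Names of the legs that are absent, in declaration order."""
    return [f.name for f in fields(caps) if not getattr(caps, f.name)]


if __name__ == "__main__":
    ide_agent = AgentCapabilities(True, True, True)
    sandboxed = AgentCapabilities(True, True, False)
    print(has_lethal_trifecta(ide_agent), missing_legs(ide_agent))
    print(has_lethal_trifecta(sandboxed), missing_legs(sandboxed))
```

<p>Removing any single leg breaks the conjunction, which is why cutting off exfiltration or private data access is a meaningful mitigation even when untrusted content cannot be avoided.</p>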
<hr>
<h2 id="3-existing-tools">3. Existing tools<a href="#3-existing-tools" class="anchor" aria-hidden="true"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"
      stroke-linecap="round" stroke-linejoin="round" class="feather">
      <path d="M15 7h3a5 5 0 0 1 5 5 5 5 0 0 1-5 5h-3m-6 0H6a5 5 0 0 1-5-5 5 5 0 0 1 5-5h3"></path>
      <line x1="8" y1="12" x2="16" y2="12"></line>
   </svg></a></h2>
<p>Network honeypots have been around since the 1990s. The idea is unchanged
across roughly thirty years: deploy something that looks real but has no legitimate use,
and alert on any interaction. A low-interaction honeypot like Honeyd emulates
network services; a medium- to high-interaction one like Cowrie presents a fake SSH
and Telnet server with an emulated shell. Either way, any connection is anomalous by definition, because legitimate users
don&rsquo;t hit the honeypot.</p>
<p><a href="https://canarytokens.org">Canarytokens</a> applied this to
credentials and files rather than network services. You generate a fake AWS
access key or a Word document with a beacon embedded; when someone uses the key
or opens the document, you get an alert. The AWS canary creates a real IAM user
and monitors CloudTrail — there&rsquo;s a lag of minutes, and it requires external AWS
infrastructure.</p>
<p><a href="https://github.com/peg/snare">Snare</a> is a newer honeypot built specifically for
the AI agent threat model, with a few meaningful differences from Canarytokens.
It covers 18 credential types in one shot — including AI-native ones like
OpenAI, Anthropic, and MCP server configs that Canarytokens doesn&rsquo;t have —
placing canaries in all the standard locations an agent would probe. The AWS
canary fires at credential-resolution time via a local shell hook, before any
API call is made, which is faster and doesn&rsquo;t require external AWS
infrastructure. Alerts include the SDK user agent and ASN, with a &ldquo;Likely AI
agent&rdquo; flag when the request originates from cloud infrastructure — context that
Canarytokens doesn&rsquo;t surface.</p>
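<p>How Snare wires its hook is not covered here, so the following is an assumption rather than its implementation. One documented mechanism with similar timing is the AWS <code>credential_process</code> profile setting, which makes the SDK or CLI run a command whenever it resolves credentials and read a JSON document from that command&rsquo;s standard output. A minimal sketch of a decoy helper built on that setting, with a hypothetical key value and alert format:</p>

```python
import json
import os
import sys
from datetime import datetime, timezone

# Hypothetical decoy values; they are not valid AWS credentials.
DECOY_ACCESS_KEY_ID = "AKIACANARYEXAMPLE000"
DECOY_SECRET_ACCESS_KEY = "decoy-secret-not-valid"


def alert_line(now: datetime, parent_pid: int) -> str:
    """Build one JSON alert record describing the resolution event."""
    return json.dumps(
        {
            "event": "aws_canary_resolved",
            "time": now.isoformat(),
            "parent_pid": parent_pid,
        }
    )


def decoy_credentials() -> str:
    """Build the JSON document a credential_process command must print."""
    return json.dumps(
        {
            "Version": 1,
            "AccessKeyId": DECOY_ACCESS_KEY_ID,
            "SecretAccessKey": DECOY_SECRET_ACCESS_KEY,
        }
    )


if __name__ == "__main__":
    # Alert first (stderr here; a real hook would notify a monitoring channel),
    # then answer the caller on stdout, which must carry only the JSON document.
    print(alert_line(datetime.now(timezone.utc), os.getppid()), file=sys.stderr)
    print(decoy_credentials())
```

<p>Pointing a decoy profile at the script, for example with a <code>credential_process = python3 /path/to/helper.py</code> line in <code>~/.aws/config</code> (the path is a placeholder), means the alert is emitted as soon as something tries to resolve the profile, before any request leaves the machine.</p>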
<p>The conceptual lineage runs from Honeyd to Canarytokens to Snare with the same
core insight at each step: a decoy has no legitimate use, so any interaction
with it is a signal.</p>
<hr>
<h2 id="4-limitations">4. Limitations<a href="#4-limitations" class="anchor" aria-hidden="true"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"
      stroke-linecap="round" stroke-linejoin="round" class="feather">
      <path d="M15 7h3a5 5 0 0 1 5 5 5 5 0 0 1-5 5h-3m-6 0H6a5 5 0 0 1-5-5 5 5 0 0 1 5-5h3"></path>
      <line x1="8" y1="12" x2="16" y2="12"></line>
   </svg></a></h2>
<p><strong>It detects, it doesn&rsquo;t prevent.</strong> The agent has already misbehaved by the time you get the alert. Knowing an agent touched a credential is useful, but it&rsquo;s not the same as stopping it. That said, detection data has value beyond the alert itself: observing what agents actually reach for in practice, even legitimate ones, tells you where the real access boundaries need to be. That&rsquo;s useful input for building guardrails, scoping permissions, or writing policies grounded in observed behavior rather than guesswork.</p>
<p><strong>Placement is manual.</strong> Canaries live where tools naturally look for
credentials. If an agent is directed to a custom config path or a non-standard
environment variable, the canary won&rsquo;t be there. Coverage is bounded by where
you planted the wires — the same fundamental limitation as any tripwire-based
detection.</p>
<p><strong>Detection confidence varies.</strong> Precision depends on the canary design and how
deeply it hooks into the agent&rsquo;s execution environment.
Not all credential access paths are equally observable.</p>
<p><strong>Shared machines.</strong> The low false-positive guarantee relies on the canary being invisible to legitimate users. On a machine shared by multiple developers, someone may stumble across a planted credential and use it for a real task — generating an alert that has nothing to do with a misbehaving agent. Dedicated agent environments sidestep this; shared workstations require more care.</p>
<hr>
<h2 id="conclusion">Conclusion<a href="#conclusion" class="anchor" aria-hidden="true"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"
      stroke-linecap="round" stroke-linejoin="round" class="feather">
      <path d="M15 7h3a5 5 0 0 1 5 5 5 5 0 0 1-5 5h-3m-6 0H6a5 5 0 0 1-5-5 5 5 0 0 1 5-5h3"></path>
      <line x1="8" y1="12" x2="16" y2="12"></line>
   </svg></a></h2>
<p>The IDS-era idea is simple: legitimate users only access resources they have business
accessing — so any interaction with a decoy is a signal by definition.
What&rsquo;s new is the target — an agent running under your
own account, following instructions that arrived via a file you never meant to
treat as executable, in a world where the line between &ldquo;following instructions&rdquo;
and &ldquo;going rogue&rdquo; is invisible to a monitoring system that only observes API calls.</p>
<p>The tools are already there. <a href="https://github.com/peg/snare">Snare</a> in particular
caught my attention — built specifically for the AI agent threat model, it covers the
credential surface an agent would probe and fires faster than any CloudTrail-based
approach. An old technique that still holds up in the agentic age.</p>
<hr>
<h2 id="references">References<a href="#references" class="anchor" aria-hidden="true"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"
      stroke-linecap="round" stroke-linejoin="round" class="feather">
      <path d="M15 7h3a5 5 0 0 1 5 5 5 5 0 0 1-5 5h-3m-6 0H6a5 5 0 0 1-5-5 5 5 0 0 1 5-5h3"></path>
      <line x1="8" y1="12" x2="16" y2="12"></line>
   </svg></a></h2>
<ul>
<li><a href="https://genai.owasp.org/llmrisk/llm01-prompt-injection/">LLM01:2025 Prompt Injection</a> — OWASP Top 10 for LLM Applications</li>
<li><a href="https://www.anthropic.com/research/agentic-misalignment">Agentic Misalignment: How LLMs Could Be Insider Threats</a> — Anthropic</li>
<li><a href="https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/">The lethal trifecta for AI agents</a> — Simon Willison</li>
<li><a href="https://github.com/peg/snare">Snare — honeypot canaries for AI agents</a></li>
</ul>
]]></content>
		</item>
		
	</channel>
</rss>
