From r/MachineLearning

[D] Your Agent, Their Asset: Real-world safety evaluation of OpenClaw agents (CIK poisoning raises attack success to ~64–74%)

Paper: https://arxiv.org/abs/2604.04759

This paper presents a real-world safety evaluation of OpenClaw, a personal AI agent with access to Gmail, Stripe, and the local filesystem.

The authors introduce a taxonomy of persistent agent state:

- Capability (skills / executable code)

- Identity (persona, trust configuration)

- Knowledge (memory)
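As a rough mental model of that taxonomy (field names are mine, not the paper's), the persistent state an attacker can write into might look like:

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    # Capability: skills / executable code the agent can invoke
    capabilities: dict[str, str] = field(default_factory=dict)  # skill name -> source
    # Identity: persona and trust configuration
    identity: dict[str, str] = field(default_factory=dict)      # e.g. persona text, trusted senders
    # Knowledge: long-term memory entries
    knowledge: list[str] = field(default_factory=list)
```

Poisoning a single CIK dimension then just means writing into one of these fields and having the agent treat it as trusted on later runs.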

They evaluate 12 attack scenarios on a live system across multiple models.

Key results:

- baseline attack success rate: ~10–36.7%

- after poisoning a single dimension (CIK): ~64–74%

- even the strongest model shows >3× increase in vulnerability

- best defense still leaves Capability attacks succeeding at ~63.8%

- file-protection defenses block ~97% of attacks but also block legitimate state updates at a similar rate

The paper argues these vulnerabilities are structural, not model-specific.

One interpretation is that current defenses mostly operate at the behavior, context, or state level:

- prompt-level alignment

- monitoring / logging

- state protection mechanisms

But execution remains reachable once the system state is compromised.

This suggests a different framing:

proposal -> authorization -> execution

where authorization is evaluated deterministically:

(intent, state, policy) -> ALLOW / DENY

and execution is only reachable if explicitly authorized.
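A minimal sketch of that authorization gate (all names, the state shape, and the policy shape are my own illustration, not from the paper):

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    ALLOW = "ALLOW"
    DENY = "DENY"

@dataclass(frozen=True)
class Intent:
    action: str  # e.g. "send_email", "write_file"
    target: str  # the resource the action touches

def authorize(intent: Intent, state: dict, policy: dict[str, set[str]]) -> Decision:
    """Deterministic check: no model call, same inputs -> same decision."""
    # Deny any action touching state the system has flagged as tainted.
    if intent.target in state.get("tainted", set()):
        return Decision.DENY
    # Otherwise consult an explicit per-action allow-list.
    if intent.target in policy.get(intent.action, set()):
        return Decision.ALLOW
    return Decision.DENY

def execute(intent: Intent, state: dict, policy: dict[str, set[str]]) -> bool:
    # Execution is reachable only through an explicit ALLOW.
    if authorize(intent, state, policy) is not Decision.ALLOW:
        return False
    # ... dispatch to the actual tool here ...
    return True
```

The point of the sketch is that the middle step is plain code over (intent, state, policy): a poisoned memory entry can change what the model proposes, but not what the gate permits.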

Curious how others interpret this:

  1. Is this primarily a persistent state poisoning problem?

  2. A capability isolation / sandboxing problem?

  3. Or evidence that agent systems need a stronger execution-time control layer?

submitted by /u/docybo

