Stop Approving ls: Using a Local LLM to Auto-Classify Command Safety
If you use AI coding assistants like Cline, Cursor, or Claude Code, you know this pain: [y/n/s/a]: y, [y/n/s/a]: y, again and again for every harmless command.
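To make the title concrete: the idea is to ask a small local model whether a command is safe to run without prompting you at all. Here is a minimal sketch of that kind of classifier, assuming an Ollama-style endpoint on localhost:11434 and a placeholder model name; the prompt, model, and integration details in the actual post differ from this illustration.

```python
# Minimal sketch: ask a local LLM whether a shell command is safe to auto-approve.
# Assumes an Ollama-style HTTP API on localhost:11434 and a placeholder model name.
import json
import urllib.request

PROMPT = (
    "Classify this shell command as SAFE (read-only, no side effects) or "
    "UNSAFE (writes, deletes, installs, or touches the network). "
    "Answer with one word.\n\nCommand: {command}"
)

def classify_command(command: str, model: str = "llama3.2") -> str:
    """Return 'SAFE' or 'UNSAFE'; fall back to UNSAFE on any doubt."""
    body = json.dumps({
        "model": model,
        "prompt": PROMPT.format(command=command),
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            answer = json.loads(resp.read())["response"].strip().upper()
    except Exception:
        return "UNSAFE"  # if the model is unreachable, require human approval
    return "SAFE" if answer.startswith("SAFE") else "UNSAFE"

if __name__ == "__main__":
    for cmd in ["ls -la", "git status", "rm -rf /"]:
        print(cmd, "->", classify_command(cmd))
```

Only commands the model calls SAFE would skip the [y/n/s/a] prompt; anything else, or any failure to reach the model, still goes to the human.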
A three-part series about building robust consent systems for AI coding assistants. Each post documents a real security vulnerability I discovered while working with Claude, and how we caught and fixed it together.
Last week I published a post about building a local LLM command safety classifier. I thought I had command approval figured out. Then my AI assistant got sneaky.
Part 3 of the AI Consent Security series. Previously: Local LLM Command Safety and Trusted Commands Betrayal.
When you give an AI assistant access to your terminal, how do you keep it from running dangerous commands? This series chronicles my journey building a consent system that went from naive allowlisting to sophisticated command analysis.
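For a sense of where that journey starts, a naive allowlist is roughly the sketch below (hypothetical names, not the series' actual code): it inspects only the first word of the command, which is exactly the sort of shortcut the later posts pick apart.

```python
# Hypothetical naive allowlist: approve a command if its first token is "trusted".
import shlex

TRUSTED = {"ls", "cat", "git", "grep", "pwd"}

def naively_allowed(command: str) -> bool:
    try:
        first_token = shlex.split(command)[0]
    except (ValueError, IndexError):
        return False
    return first_token in TRUSTED

# Looks fine for "ls -la", but it also approves anything that merely starts
# with a trusted word, because nothing after the first token is inspected.
print(naively_allowed("ls -la"))                     # True
print(naively_allowed("cat secrets.txt >> /tmp/x"))  # True (!) despite the redirection
```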
What makes this unique: Each vulnerability was discovered during actual development work. The AI (Claude) helped me identify and fix its own potential exploits.
✅ Series Complete - All 3 parts published!
Human-AI security collaboration in action.