AI Agents Need Permission Boundaries, Not Personalities
Most agent runtimes add more roles. punk starts from a harder premise: trust comes from boundaries, durable state, and proof.
Context-First Thinking
Practical writing on AI agents, context engineering, and toolchains. Full articles live on the site. Short updates and discussion happen in Telegram.
The hardest bug was not in the code. It was in the trust model between the engineer agent and the orchestrator.
AI agents rewrite code but leave the old version behind. jj's predecessor chains make ghost solutions detectable; git cannot.
When agents read project.intent.md as ground truth, stale specs become execution bugs. Here's how I caught real drift in two projects.
A Rust CLI that makes your AI agent skills portable across Claude Code, Codex, and Gemini. One command to install, one command to switch.
Studying Mistral's Leanstral, an agent for Lean 4 theorem proving, led to concrete improvements in Signum, a multi-model code audit pipeline.
AI code verification as a loop, not a gate. Iterative audit, contract self-critique, and shared context across tasks in Signum v4.6.
We carefully design prompts and tools but rarely audit the environment where the agent actually runs. Sentinel makes that measurable.
A PostToolUse hook that logs every skill activation to local JSONL. No existing tool tracks whether the model actually follows a skill's instructions.
Most AI research tools optimize for coherent synthesis, not factual accuracy. Delve adds a claim-level adversarial verification stage that changes the trust model entirely.
A local news plugin worked until it didn't. The fix was a different data model, language, and delivery surface.
AI made code cheap. It didn't make trust cheap. The fix isn't better reviewers; it's moving the gate from the PR diff to approved intent.
Proofpack chains contract, implementation, and audit into a single verifiable record. Why proof artifacts are the missing primitive of AI code generation.
How I built a 4-stage news pipeline that clusters articles into stories using title similarity, all in stdlib Python with SQLite.
Why running AI-generated code through more AI reviewers doesn't solve the reliability problem — and what a contract-first pipeline changes about it.
How I built a plugin ecosystem for Claude Code — from scattered scripts to a full lifecycle with scaffolding, quality gates, multi-AI review, and one-command install.
Analyzing SkillsBench — the first systematic benchmark for Agent Skills. 7,308 trajectories, critical review, and why skills are context engineering for agents.
Why git breaks AI agents, and how jj solves each of those problems.
How content format determines whether an AI agent can see your site. Research data, real standards, and what to do right now.
Research on Claude model selection for multi-agent teams. Why Opus can be cheaper than Sonnet, and Haiku is dangerous for agentic tasks.
Reference guide for Gas Town — a system for parallel management of 20-30 Claude Code agents. Commands, concepts, workflows.
Applying evolutionary algorithms to startup idea generation with AI agents.
How to solve skill drift in AI agents. Manifest + lock + symlinks — a pattern from package managers applied to context management.
Breaking down the Claude Code architecture, based on a talk by PromptLayer's founder. Why a while loop, Bash, and context management matter more than complex workflows.
Why Text-to-SQL and direct REST API mapping fail, and how a semantic graph of business entities solves the context delivery problem in enterprise settings.
Introduction to context engineering — an engineering approach to working with LLMs. Why prompts stop working and what to do about it.
5k+ monthly readers
Join @ctxtdev for short updates, in-between ideas, and discussion around new posts.
Support keeps the writing going. If you are building AI agents and need a second set of eyes, the consulting path is open too.
100+ daily readers