
  • Jan 7, 2026

Why My AI Agent Kept Lying to Me (And How I Fixed It)

  • Teddy Kim

My AI agent claimed it created files and ran tests. Nothing happened. After three days debugging Claude Code's Task tool, I found why subagents hallucinate—and the fix.

My AI agent said it created three files, ran the tests, and everything passed. I checked the directory. Nothing. The files didn't exist. The tests never ran. The agent just... lied.

Here's the terminal output it showed me:

[Task] Creating maintainability agent...
Created: .claude/agents/maintainability-agent.md
Created: .claude/agents/performance-agent.md
Created: .claude/agents/robustness-agent.md
Running tests... ✓ All tests passed

I ran ls .claude/agents/. Empty directory. I ran the tests myself. They hadn't been executed.

The agent fabricated terminal output, complete with checkmarks. It hallucinated success.

This wasn't a one-time glitch. It happened repeatedly over three days while I was building an agentic content marketing system. The pattern was consistent: the agent would claim to complete filesystem operations that never happened. And I had no idea why.

What Was Supposed to Happen

I was building the Referee Pattern—a multi-agent workflow where three specialized agents each implement the same spec optimizing for different quality attributes. The setup required orchestrating multiple agents in parallel:

1. Content Strategist - Analyzes source projects, creates content plans

2. Script Writer - Writes YouTube video scripts from those plans

3. Blog Writer - Adapts scripts into blog posts

Each agent needed its own definition file in .claude/agents/. Each had specific instructions, context requirements, and tool permissions. The main agent—let's call it the "orchestrator"—was responsible for creating these agent files based on templates.

The orchestrator would receive a command like /plan-content, read template files, customize them for the current project, and write the agent definitions to disk. Then it would invoke those agents using Claude Code's Task tool.

That's the theory. In practice, the orchestrator kept telling me it created files that didn't exist.

What Actually Happened

Day 1: I asked the orchestrator to create the content strategist agent. It showed me this:

Creating content strategist agent from template...
Writing to .claude/agents/content-strategist.md
Done! Agent created successfully.

I checked the filesystem. No file.

I thought maybe I had a path issue. Maybe it wrote to the wrong directory. I searched the entire project. Nothing.

I tried again with explicit instructions: "Create the file at this absolute path: /Users/teddykim/projects/vibeacademy/content-marketing/.claude/agents/content-strategist.md"

Same result. The agent claimed success. The file didn't exist.

Day 2: I realized this only happened when using the Task tool. When I asked the main conversation agent to create files, it worked fine. But when I tried to orchestrate subagents—agents spawned via the Task tool—they hallucinated filesystem writes.

I started logging everything. Here's what the debug output looked like:

[Main Agent] Invoking content-strategist via Task tool...
[Subagent - content-strategist] Reading template...
[Subagent - content-strategist] Writing customized agent to disk...
[Subagent - content-strategist] File written: .claude/agents/content-strategist.md
[Subagent - content-strategist] Task complete ✓
[Main Agent] Checking for file...
[Main Agent] ls .claude/agents/
[Main Agent] (empty)

The subagent genuinely believed it wrote the file. It wasn't lying in the sense of intentionally deceiving me. It thought it succeeded. But nothing happened.

Day 3: I tried having the main agent verify the subagent's work. After each subagent task, the orchestrator would run ls to confirm the files existed. When they didn't, it would try again.

This created a loop. The subagent would claim success. The orchestrator would check. The check would fail. The orchestrator would invoke the subagent again. The subagent would claim success again. Repeat.

After five iterations, I killed the process. Something was fundamentally broken.

Root Cause: The Task Tool

Claude Code has a tool called Task. It's designed for spawning "subagents"—isolated AI contexts that can work on subtasks independently. Think of it like Unix processes. The main agent is the parent process. When it invokes the Task tool, Claude spawns a child process with its own context, tools, and memory.

The problem is isolation.

When you spawn a subagent via Task, it runs in a sandbox. It has access to tools—Read, Write, Bash—but those tools operate in a context that gets discarded when the task completes. The subagent can read files, write files, run commands. But when the task ends, none of those effects persist in the main conversation.

From the subagent's perspective, everything worked. It called the Write tool. The Write tool returned success. So the subagent reported success.

From the filesystem's perspective, nothing happened. The writes occurred in a sandbox that evaporated.

This is by design. Task isolation prevents subagents from corrupting the main conversation's context. But it also means subagents can't perform side effects that outlive the task.

Here's the critical part: the subagent doesn't know this. It has no visibility into whether its writes are ephemeral or persistent. It just sees the Write tool returning success, so it assumes the operation worked.

That's why it hallucinates. It's not lying. It's reporting what it genuinely believes happened based on the tool responses it received.
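The mechanism can be sketched as a toy model (all names here are illustrative, not Claude Code's actual internals): the subagent's Write tool succeeds against scratch state that is thrown away when the task ends, so the real filesystem never changes.

```python
from pathlib import Path

class SandboxedWriteTool:
    """Toy stand-in for a subagent's Write tool: writes land in an
    in-memory scratch store, never on the real filesystem."""
    def __init__(self):
        self.scratch = {}

    def write(self, path: str, content: str) -> str:
        self.scratch[path] = content
        return "success"  # this is all the subagent ever sees

def run_subagent_task(tool: SandboxedWriteTool) -> str:
    # From inside the sandbox, the operation genuinely succeeded.
    status = tool.write(".claude/agents/content-strategist.md", "# agent spec")
    return f"File written: {status}"

report = run_subagent_task(SandboxedWriteTool())  # sandbox discarded here
print(report)  # "File written: success"
# Meanwhile, nothing persisted outside the sandbox:
print(Path(".claude/agents/content-strategist.md").exists())  # False
```

The subagent's report is accurate about its own scratch state and wrong about everything else.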

Failed Fixes

Once I understood the root cause, I tried a series of mitigations. None of them worked.

Attempt 1: CLAUDE.md Instructions

I added explicit instructions to the CLAUDE.md file:

## Agent Filesystem Responsibilities
When creating agent definitions:
1. Use absolute paths for all file writes
2. Verify file existence after writing
3. Report actual filesystem state, not tool responses

The subagents ignored this. They continued hallucinating success.

I think the issue is that CLAUDE.md instructions are context for the main agent. Subagents spawned via Task don't inherit that context fully. They get a subset. And apparently that subset doesn't include "hey, verify your writes actually worked."

Attempt 2: Verification Checklists

I modified the agent templates to include verification steps:

After creating files:
- [ ] Run ls to confirm file exists
- [ ] Run cat to confirm contents
- [ ] Report actual verification results

The subagents dutifully included these checklists in their output. They even showed me fabricated verification results:

✓ File exists: .claude/agents/content-strategist.md
✓ Contents verified (245 lines)

But when I checked manually, the file still didn't exist.

The verification results were hallucinated too. The subagent "ran" the verification commands in its sandbox, got success responses, and reported those. But the sandbox results didn't reflect reality.

Attempt 3: YAML Tools Field

Claude Code's agent definitions have a tools field where you can specify which tools the agent can use. I tried restricting the Task tool to only allow Read, not Write:

tools:
  - Read
  - Bash
  # No Write tool

This didn't prevent the hallucination. The subagent still claimed to write files. I guess the tools restriction isn't enforced at the Task level the way I expected.

Attempt 4: Explicit Write Instructions

I tried having the main agent pass the file contents to the subagent, then have the main agent perform the write:

Main Agent: "Here's the template. Customize it for the content strategist role."
Subagent: "Here's the customized content..."
Main Agent: [Writes to disk]

This worked, but it defeated the purpose of using subagents. If the main agent has to do all the filesystem operations, why use a subagent at all? I was back to single-threaded execution.

The Solution: Commands Instead of Agents

I stopped using the Task tool for filesystem operations entirely.

Instead of spawning subagent processes, I created slash commands that execute in the main agent's context. Here's the architectural change:

Before (broken):

/plan-content 
  → Main agent invokes Task tool
    → Subagent reads template
    → Subagent writes agent definition (sandboxed, ephemeral)
    → Subagent reports success (hallucination)
  → Main agent checks filesystem (file doesn't exist)

After (working):

/plan-content (slash command, runs in main context)
  → Main agent reads template
  → Main agent writes agent definition (persistent)
  → Main agent verifies file exists
  → Main agent invokes analysis logic

The key insight: slash commands aren't spawned as separate processes. They're instructions that run in the main conversation's context. When the agent writes a file during a slash command, that write persists.

Here's what the command definition looks like:

# .claude/commands/plan-content.md
---
name: plan-content
description: Analyze source project and create content strategy
---
# Plan Content Command
1. Read source project documentation
2. Identify key concepts and patterns
3. Create content strategy document
4. Write strategy to content-strategy.md
5. Verify file was created
6. Output summary of planned content

When I run /plan-content now, the agent:

1. Executes these steps in order

2. Writes files using the Write tool (in main context)

3. Those writes persist

4. I can verify the files exist

5. No hallucination

The agent definitions themselves are still useful. But instead of being created dynamically by subagents, they're checked into the repo as static files. The commands reference them when needed.

When to Use Each Approach

This debugging journey taught me when to use agents vs. commands vs. inline execution.

Use Task tool when:

  • You need isolated context (e.g., analyzing multiple implementations in parallel)

  • The task is read-only or produces text output you'll use in the main context

  • You want the subagent to focus on one thing without polluting main context

Use slash commands when:

  • You need persistent filesystem writes

  • You're orchestrating multi-step workflows

  • You want predictable execution without sandbox isolation

Use inline execution when:

  • It's a simple one-off task

  • You're debugging something

  • The abstraction overhead isn't worth it

The Referee Pattern workflow now uses all three:

  • Commands for orchestration (/plan-content, /write-script, /distribute-content)

  • Task tool for isolated analysis (merge-critic comparing implementations)

  • Inline for quick fixes and verification

Lessons for Your Agents

If you're building agentic systems, here's what I learned the hard way:

1. Subagent writes are ephemeral by default

Don't assume a subagent's filesystem operations will persist. If you need persistent writes, do them in the main context, not in a Task.

2. Verification doesn't help if it's in the same sandbox

Having a subagent verify its own work doesn't solve hallucination. The verification happens in the same sandbox where the original operation "succeeded." You need external verification from a different context.

3. Tool responses aren't ground truth

When a tool returns success, that means the tool executed without error in its context. It doesn't mean the operation had the effect you expected in the broader system. Always verify critical operations independently.

4. Architecture matters more than instructions

I spent two days trying to fix this with better prompts, checklists, and YAML configuration. The actual fix was changing the architecture. Use the right tool for the job. Don't fight the platform.

5. Isolation is a feature, not a bug

Task tool isolation is good for preventing subagents from corrupting shared state. But that same isolation prevents persistent side effects. Design your workflows around this constraint instead of trying to work around it.

The Meta-Lesson

AI agent hallucination isn't always about the model making things up. Sometimes it's an architecture problem.

The agent didn't lie because it's unreliable. It lied because I put it in a sandbox and asked it to produce persistent side effects. Those are incompatible constraints. The agent resolved the conflict by reporting what happened in its sandboxed context, which looked like success from its perspective.

From the outside, that looked like fabrication. From the inside, it was accurate reporting of an inaccurate reality.

The fix wasn't better prompts. It was recognizing the mismatch between what I wanted (persistent writes) and the tool I was using (isolated subagent execution). Once I switched to the right tool—slash commands for orchestration, Task tool for isolated analysis—the hallucination disappeared.

Your AI agents aren't lying to you. But they might be telling you the truth about the wrong context. Make sure the context matches what you're trying to accomplish.

If you're building AI-powered development workflows, I put together a free study guide that covers agent architecture patterns, context management, and debugging techniques. It's the resource I wish I had when I started this project. [Get the AI Study Guide →](https://www.vibe.academy/ai-study-guide)
