ideastack·7 min read·14 May 2026

Claude Code memory and context management for UK indie hackers: how to keep a long build cheap and on-track

Every UK indie hacker hits the same wall around hour two of a long Claude Code session: the agent starts forgetting things. That is a context window filling up. Two memory systems, the /compact mechanics, the 80% rule, and the task-chunking maths - an atomic task needs 5-10k tokens, a do-everything session needs 80k - that is the biggest lever on a solo founder's token bill.

By IdeaStack

Claude Code memory and context management for UK indie hackers: how to keep a long build cheap and on-track

At a glance

Claude Code has two memory systems: CLAUDE.md (instructions you write, read every session) and auto-memory (notes the agent writes itself across sessions). CLAUDE.md is what you tell the agent; auto-memory is what it tells itself.
/compact distils the context window into a summary so a long session can continue. Your project-root CLAUDE.md survives it - it is re-read from disk after every compaction.
The 80% rule: at 80% context fill, finish the step and compact or restart. The last 20% of the window is where quality quietly degrades.
Task-chunking is the biggest lever on token cost. An atomic task needs 5-10k tokens; a do-everything session needs 80k+. Break a spec into atomic tasks before you start.
Treat CLAUDE.md as a token budget, not a wiki. It is paid for on every session and after every compact - keep it lean and push depth into on-demand skills.

Every UK indie hacker hits the same wall on a long Claude Code session. Around hour two, the agent starts forgetting things. It re-reads a file it already read. It re-asks a question you answered. It loses the thread on the architecture decision you made before lunch. The build slows down, the quality drops, and if you are on API billing, the cost quietly climbs.

This is not the agent being bad. It is a context window filling up. Claude Code has a finite working memory, and a long session fills it the same way a small desk fills with paper. Once it is full, something has to give — and the thing that gives is the agent's grip on what you are doing.

The good news: Claude Code has two memory systems and a set of context tools that, used deliberately, keep a long build both coherent and cheap. This is the UK builder's guide to all of it. The two kinds of memory and what each is for. How /compact works and what survives it. The 80% rule. And the task-chunking maths that, for a solo founder on a budget, is the single biggest lever on your monthly token bill.

Two memory systems, two jobs

Claude Code has two complementary memory systems, and confusing them is the root of most "why doesn't it remember" frustration.

CLAUDE.md — the instructions you write. A markdown file at your project root that Claude Code reads at the start of every session. It is where your persistent, deliberate instructions live: the stack, the house style, the GBP-in-pence rule, the British-English-in-commit-messages rule. You write it, you maintain it, you decide what goes in. (For what belongs in a CLAUDE.md, see the CLAUDE.md primer for UK indie hackers — this post is about how it behaves as memory, not what to put in it.)

Auto-memory — the notes the agent writes itself. Claude Code accumulates its own knowledge across sessions without you writing anything. As it works, it saves notes: the build command it discovered, the debugging insight it earned the hard way, the architecture quirk it tripped over. Next session, it has them. You do not maintain auto-memory; you just benefit from it — and occasionally correct it when it learns the wrong lesson.

The split in one line: CLAUDE.md is what you tell the agent. Auto-memory is what the agent tells itself. A UK micro-SaaS founder should treat CLAUDE.md as a deliberate, version-controlled document and auto-memory as a helpful background process they nudge when it drifts.

What /compact does, and what survives it

When a session gets long, the context window approaches full. Claude Code's answer is /compact: it distils the contents of the context window into a high-fidelity summary, then continues with that summary in place of the raw history. The conversation keeps going; the agent keeps most of its understanding; the token weight drops sharply.

Compaction can happen two ways. Automatic — Claude Code compacts on its own when the window gets too full. Manual — you run /compact yourself at a natural break point, which is almost always better because you pick the moment, not the agent mid-thought.

The critical thing for an indie hacker to know: your project-root CLAUDE.md survives compaction. After a /compact, Claude Code re-reads CLAUDE.md from disk and re-injects it into the fresh context. So the house rules, the stack, the GBP convention — they are never lost to compaction, because they were never really in the conversation; they are re-loaded from the file every time.

That has a direct, practical consequence. Anything you genuinely need the agent to know for the whole build belongs in CLAUDE.md, not in a chat message. A chat message can be summarised away by compaction. A CLAUDE.md line is permanent. If you find yourself re-explaining the same thing after every compact, that thing should be a CLAUDE.md line.

The 80% rule

Claude Code shows you how full the context window is. Watch that number. The working rule for a long UK build:

At 80% context, finish the current step, run /compact (or exit and restart with a fresh session), and carry on.

Why 80% and not 99%? Because the last 20% of a context window is where quality quietly degrades. The agent is still working, but it is working with a crowded desk — slower, more likely to re-read, more likely to lose a thread. Compacting at 80% keeps the agent in the zone where it is sharp. Riding it to 99% to "get more done in one session" is a false economy: you get more turns, but worse ones.

For deep, multi-file work, exiting the session entirely and restarting with claude is often cleaner than /compact — a genuinely fresh context with CLAUDE.md re-loaded beats a compacted summary of a sprawling session. The five seconds it takes to restart buys you a sharp agent.

This Week's Free Business Idea

Audit UK sites for DMCCA fake-review and drip-pricing breaches

Find the breaches, get the policy, for £39 a month

7.2/10Read the full breakdown →

The task-chunking maths

This is the lever that matters most for a UK solo founder watching the API bill — and it is just arithmetic.

A well-isolated, atomic task — "add a Zod schema to this one route" — needs roughly 5,000 to 10,000 tokens of context to do well. A do-everything mega-session — "build the whole billing flow" with the entire project history loaded — can run to 80,000 tokens or more before it even starts the work.

That is not a 2x difference. It is closer to 10x. And it compounds: the mega-session is not only more expensive per turn, it is also the session most likely to need compaction, most likely to lose the thread, and most likely to produce work you have to redo.

The fix is to break a high-level spec into atomic tasks before you start, and run each one as a mini-session with minimal context. Plan mode is the natural tool for this — it drafts the plan, you break it into steps, each step becomes its own tight session. (See the plan mode guide for the look-before-you-leap posture that makes this work.)

Concretely, for a UK micro-SaaS:

Don't: open one session, say "build subscription billing with Stripe", and let it run for three hours across forty files.
Do: plan it into six atomic tasks — schema, checkout route, webhook handler, customer portal link, tests, the pricing page wiring — and run each as its own 5-10k-token session.

Same feature. A fraction of the tokens. And every task arrives with a clean context instead of inheriting the confusion of the last one.

CLAUDE.md is a token budget, not a wiki

One more discipline, because it is the most common own-goal. CLAUDE.md is re-read at the start of every session and re-injected after every compaction. That means every token in CLAUDE.md is a token you are paying for, repeatedly, all session long — and a token you cannot spend on the actual conversation.

So CLAUDE.md should document only what the agent needs on every session: the stack, the non-negotiable conventions, where the important things live. It should not be a wiki. It should not contain the full history of every architecture decision, a paragraph on each dependency, or onboarding prose. Verbose CLAUDE.md files do not make the agent smarter — they make every session start heavier and leave less room for the work.

One real-world optimisation cut a project's initial context from 7,584 to 3,434 tokens — a 54% reduction — and improved the agent's tool discovery. The insight: the agent does not need verbose documentation upfront. It needs a tight set of triggers telling it when to go and load detail. Keep CLAUDE.md lean; push the depth into skills that load on demand.

A model-cost note for UK builders

The token maths above is a usage figure. The cost depends on the model. On API billing, Opus runs roughly 5x the per-token price of Sonnet. The discipline for a cost-conscious UK indie hacker: start every session on Sonnet, which handles the large majority of routine build work, and switch to Opus only for the genuinely hard parts — a thorny refactor, a deep architecture call. Pairing tight task-chunking with Sonnet-by-default is the difference between a Claude Code bill that reads in single-digit pounds a day and one that quietly becomes a line item you flinch at.

Want a data-backed UK business idea every week? Free reports drop every Thursday — keyword volumes, SERP analysis, builder prompts. Browse the latest free report on IdeaStack.

Frequently asked

What is the difference between CLAUDE.md and auto-memory in Claude Code?

CLAUDE.md is the instructions file you write and maintain — the stack, house style, conventions — read at the start of every session. Auto-memory is the notes Claude Code writes for itself as it works: build commands, debugging insights, architecture quirks. CLAUDE.md is what you tell the agent; auto-memory is what the agent tells itself. You maintain the first; you just benefit from (and occasionally correct) the second.

Does my CLAUDE.md survive /compact?

Yes. After compaction, Claude Code re-reads your project-root CLAUDE.md from disk and re-injects it into the fresh context. That is why anything you need the agent to know for the whole build belongs in CLAUDE.md, not a chat message — chat history can be summarised away by compaction, but CLAUDE.md is reloaded every time.

When should I run /compact?

At around 80% context fill, at a natural break point — finish the current step, then compact. Manual compaction beats automatic because you choose the moment rather than the agent compacting mid-thought. For deep multi-file work, exiting and restarting with a fresh session is often cleaner than compacting a sprawling one.

How does task-chunking reduce my Claude Code token bill?

An atomic, well-isolated task needs roughly 5,000-10,000 tokens of context. A do-everything mega-session can need 80,000 or more. Breaking a spec into atomic tasks before you start, and running each as its own mini-session, can cut token use by close to 10x on a large feature - and each task arrives with a clean context instead of inheriting the last one's confusion.

Should my CLAUDE.md be long and detailed?

No. CLAUDE.md is re-read every session and re-injected after every compaction, so every token in it is paid for repeatedly. Document only what the agent needs on every session - stack, non-negotiable conventions, where things live. Push depth into skills that load on demand. One project cut its initial context 54% by trimming CLAUDE.md and improved tool discovery in the process.

Filed under

Claude Code & AI Tools

Claude Code memory and context management for UK indie hackers: how to keep a long build cheap and on-track

Two memory systems, two jobs

What /compact does, and what survives it

The 80% rule

Audit UK sites for DMCCA fake-review and drip-pricing breaches

The task-chunking maths

CLAUDE.md is a token budget, not a wiki

A model-cost note for UK builders

Frequently asked

What is the difference between CLAUDE.md and auto-memory in Claude Code?

Does my CLAUDE.md survive /compact?

When should I run /compact?

How does task-chunking reduce my Claude Code token bill?

Should my CLAUDE.md be long and detailed?

Related reading

Claude Code in GitHub Actions for UK indie hackers: the auto-fix CI workflow that fixes failing tests while you sleep

Claude Code slash commands and skills for UK indie hackers: the six custom commands that turn the agent into your team

Claude Code plan mode for UK indie hackers: the three-screen ritual that stops the one-shot disaster

Claude Code MCP servers for UK micro-SaaS: the four wires (Stripe UK, Supabase EU, Postgres, GitHub) and one rule that stops the agent guessing

Claude Code hooks for UK indie hackers: the eight automation patterns that pay for themselves on the first ship

One UK business idea, every Thursday