Claude Code observability on a budget: OpenTelemetry to self-hosted SigNoz for GBP 0 (UK 2026)
Pool B spend visibility is the whole game from 15 June 2026. Claude Code already emits OpenTelemetry metrics and traces. Wire them to a self-hosted SigNoz on a Hetzner VPS and get a single dashboard for every agent, every token, every tool call, for GBP 0 in software.

From 15 June 2026, the Anthropic credit split puts every programmatic Claude Code call into a separate Pool B credit pool, itemised and capped at your subscription price. That is genuinely good news, but it makes one specific question urgent: which of your agents is actually eating the pool? Without observability you have a single aggregate number on the Anthropic console and you guess. With observability you have a span per run, a token count per call, a cost line per agent, and the guessing stops. This post is the UK indie hacker setup for that - free, self-hosted, on the Hetzner VPS you already have.
Why observability stopped being optional in 2026
Two changes converged.
First, Claude Code in 2026 ships with native OpenTelemetry support. Set one env var and Claude Code emits structured metrics and traces conforming to OTel semantic conventions on every invocation. Token counts, tool calls, latencies, costs, cache hit rates. The data is there for the price of an env var.
Second, the credit split makes Pool B visibility the most actionable number a UK indie hacker can look at. Today, all your Claude spend hides inside a Pro subscription line. Tomorrow, you can answer "did the nightly QA agent justify its GBP 14 of Pool B last month?" with a single dashboard query. The decision to keep or kill an agent becomes data-driven.
There is also a third reason, less urgent but still real: as your agent count grows past two or three, the cognitive overhead of "which one is failing silently this week?" starts to dominate. A dashboard is the cheapest way to keep that overhead under control.
What Claude Code emits
The full surface area, as of 2026:
- Token metrics. Input tokens, output tokens, cache read tokens, cache write tokens. Per session, per turn, per model.
- Cost metrics. Estimated cost in USD per session, broken down by model and token type.
- Tool call traces. Each tool call becomes a span with name, duration, success/failure, and any error message. The full trace shows the tree of tool calls within a session.
- Latency metrics. Time to first token, total session duration, per-tool latency.
- Session events. Session start, session end, prompt length, model version, hook invocations.
- Cache events. Cache hit/miss for prompt caching, with the size of the cached segment.
Everything is tagged with the session ID, the agent name (if you set one), the model, and any custom labels you add via env vars. That last bit is the key to per-agent dashboards: tag each agent with OTEL_RESOURCE_ATTRIBUTES=agent.name=morning-brief and the dashboard can group by it.
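If you run several agents, generating the tag string beats hand-editing it. The sketch below is a minimal helper, not part of Claude Code: the function name is made up, "nightly-qa" is a hypothetical second agent, and values are percent-encoded on the assumption that the receiving SDK decodes them, as the OTel spec describes for awkward characters.

```python
from urllib.parse import quote


def build_resource_attributes(attrs: dict[str, str]) -> str:
    """Join key=value pairs into the comma-separated OTEL_RESOURCE_ATTRIBUTES format."""
    # Percent-encode values so a comma, space, or '=' inside a value cannot break the list.
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in attrs.items())


# "morning-brief" comes from the article; "nightly-qa" is a hypothetical second agent.
for agent in ("morning-brief", "nightly-qa"):
    value = build_resource_attributes({"agent.name": agent, "service.name": "claude-code"})
    print(f"OTEL_RESOURCE_ATTRIBUTES={value}")
```

Paste each printed line into the matching agent's environment file.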
The cost decision: vendor-hosted vs self-hosted
The shape of the market in 2026, GBP per month, for a setup that handles a small UK indie hacker fleet (5-10 agents, a few thousand sessions a month).
| Option | Monthly cost (GBP) | Setup time | Pros | Cons |
|---|---|---|---|---|
| Datadog APM | ~60 (entry tier) | 30 min | Polished UI, integrations | Per-host pricing scales fast |
| Honeycomb Pro | ~50 | 20 min | Best-in-class trace tooling | Trace-heavy pricing model |
| New Relic | ~80 | 30 min | Full APM suite | Heavy for an indie setup |
| Grafana Cloud (free) | 0 (limits) | 45 min | LGTM stack, hosted | 10k series cap, 50GB logs |
| SigNoz self-hosted | 0 (+ VPS) | 30 min | Full control, single tool | You host it |
| LGTM self-hosted | 0 (+ VPS) | 90 min | Maximum flexibility | More moving parts |
Two notes.
The vendor-hosted options are all good products, and for an indie hacker who values their evening more than their VPS bill, Grafana Cloud free tier is a perfectly reasonable answer if the limits fit. The 10k metric series cap is generous for a Claude Code setup; the 50GB logs cap is the more likely pinch point.
The self-hosted path costs GBP 0 in software on top of the GBP 3.30 a month Hetzner VPS you (should) already have for the agents. SigNoz is the cleanest single-tool answer. The LGTM stack is more flexible but more moving parts to maintain.
The recommendation for a UK indie hacker: SigNoz on the same Hetzner box as your agents. One dashboard for everything, no separate infrastructure bill, full control of retention.
Setting up SigNoz on the Hetzner VPS
Thirty minutes from SSH to first dashboard.
1. Install Docker and Compose
On the VPS:
sudo apt update
sudo apt install -y docker.io docker-compose-plugin
sudo usermod -aG docker $USER
Log out and back in for the group change to take effect.
2. Clone and start SigNoz
git clone -b main https://github.com/SigNoz/signoz.git
cd signoz/deploy
docker compose -f docker/clickhouse-setup/docker-compose.yaml up -d
The compose stack pulls ClickHouse (the storage backend), the SigNoz query service, the frontend, and an OTel collector. Total RAM footprint is around 1.5GB - the CAX11 4GB box handles it comfortably alongside a handful of agents. First start takes about two minutes.
Open http://<your-vps-ip>:3301. SigNoz UI loads. Create an admin user.
3. Lock the UI down
Do not leave port 3301 open to the internet. Two options:
- Tailscale only. Close 3301 in UFW, access SigNoz over Tailscale: http://<tailscale-hostname>:3301. Same pattern as the SSH layer in the Hetzner setup guide.
- Cloudflare Tunnel + Access. Put SigNoz behind a Cloudflare Tunnel with Access policy (email allow-list). Public URL, locked down.
Either is fine. Tailscale is simpler.
4. Point Claude Code at the collector
In the agent environment files (typically /etc/claude/<agent>.env):
CLAUDE_CODE_ENABLE_TELEMETRY=1
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
OTEL_RESOURCE_ATTRIBUTES=agent.name=morning-brief,service.name=claude-code
OTEL_METRIC_EXPORT_INTERVAL=10000
Five env vars, four of which are identical for every agent. The agent.name value is the thing that makes per-agent dashboards work, so set it per agent.
Re-run an agent: claude -p < prompts/brief.md. Within ten seconds you see the session, token counts, and any tool calls in the SigNoz traces UI.
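If nothing shows up, the env file is the first thing to check. Here is a minimal sketch of a checker, assuming the simple KEY=VALUE format shown above; the function names and the list of required keys are my own choices for illustration, not anything Claude Code ships.

```python
REQUIRED_KEYS = (
    "CLAUDE_CODE_ENABLE_TELEMETRY",
    "OTEL_EXPORTER_OTLP_ENDPOINT",
    "OTEL_EXPORTER_OTLP_PROTOCOL",
    "OTEL_RESOURCE_ATTRIBUTES",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        env[key.strip()] = value.strip()
    return env


def check_agent_env(text: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks complete."""
    env = parse_env(text)
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in env]
    if "agent.name=" not in env.get("OTEL_RESOURCE_ATTRIBUTES", ""):
        problems.append("OTEL_RESOURCE_ATTRIBUTES has no agent.name")
    return problems


sample = """\
CLAUDE_CODE_ENABLE_TELEMETRY=1
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
OTEL_RESOURCE_ATTRIBUTES=agent.name=morning-brief,service.name=claude-code
OTEL_METRIC_EXPORT_INTERVAL=10000
"""
print(check_agent_env(sample) or "ok")
```

To check a real file, read it with open(path).read() and pass the text to check_agent_env.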
5. Build the five-metric dashboard
In SigNoz, create a dashboard with five panels:
- Per-agent daily Pool B spend. Bar chart, grouped by agent.name, summing cost over 1d windows.
- Per-tool latency (p95). Line chart, grouped by tool name, p95 of tool call duration.
- Cache hit rate. Gauge, ratio of cache hits to total cache lookups across all sessions in the last 24h.
- Error rate per agent. Stacked area, grouped by agent.name, sessions per hour with at least one error span.
- Concurrent runs. Line chart, count of in-flight sessions over time.
These five tell you which agents are cheap, fast, reliable, and worth keeping. Add more later if you find a real question they cannot answer.
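To make the panel definitions concrete, here is a rough sketch of the arithmetic behind three of them: summed spend per agent, cache hit ratio, and a nearest-rank p95. The records, field names, and numbers are invented for illustration; in practice SigNoz runs the equivalent aggregation over the real telemetry.

```python
from collections import defaultdict

# Hypothetical session records; field names and values are made up for illustration.
sessions = [
    {"agent": "morning-brief", "cost_usd": 0.12, "cache_hits": 9, "cache_lookups": 10},
    {"agent": "morning-brief", "cost_usd": 0.10, "cache_hits": 8, "cache_lookups": 10},
    {"agent": "nightly-qa", "cost_usd": 0.45, "cache_hits": 0, "cache_lookups": 5},
]
# Hypothetical tool call durations in seconds.
tool_durations = [0.4, 0.5, 0.6, 1.2, 30.0]


def spend_by_agent(records):
    """Sum cost per agent, the grouping behind the spend bar chart."""
    totals = defaultdict(float)
    for record in records:
        totals[record["agent"]] += record["cost_usd"]
    return dict(totals)


def cache_hit_rate(records):
    """Ratio of cache hits to total cache lookups across all records."""
    hits = sum(record["cache_hits"] for record in records)
    lookups = sum(record["cache_lookups"] for record in records)
    return hits / lookups if lookups else 0.0


def p95(values):
    """Nearest-rank 95th percentile, using integer maths to avoid float rounding."""
    if not values:
        raise ValueError("p95 needs at least one value")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


print(spend_by_agent(sessions))
print(cache_hit_rate(sessions))
print(p95(tool_durations))
```

Note how a single 30-second outlier dominates the p95 on a small sample; that is the point of the panel, but it also means the number is noisy until an agent has a few dozen runs behind it.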
The configuration that actually matters
Three knobs are worth tuning for a Claude Code workload.
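Sampling. By default the OTel SDK sends every span. For a low-volume indie setup that is fine and gives you the full picture. If you scale past a few hundred sessions a day, you have two ways to cut volume. Head sampling in the SDK is the simple one: set OTEL_TRACES_SAMPLER=parentbased_traceidratio and OTEL_TRACES_SAMPLER_ARG=0.1 to keep 10% of traces. The catch is that the decision is made before anything is known about the outcome, so a tail sampler on the collector can only keep errors and slow traces from that 10%. If you want 100% of errors, leave the SDK sampler at its default and do the sampling on the collector with a tail sampler instead. Indie hackers under 100 sessions a day can leave this alone.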
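One thing worth seeing in numbers: a ratio sampler in the SDK decides before a trace has finished, so it drops error traces at the same rate as healthy ones. The sketch below is a simplified stand-in modelled on how trace-ID ratio samplers commonly work (compare the low 64 bits of the trace ID to a threshold); it is an illustration with random trace IDs, not the code any particular SDK runs.

```python
import random

RATIO = 0.1  # matches OTEL_TRACES_SAMPLER_ARG=0.1 from the text
THRESHOLD = int(RATIO * 2**64)


def head_sampled(trace_id: int) -> bool:
    """Simplified ratio sampler: keep a trace when its low 64 bits fall under the threshold."""
    return (trace_id & (2**64 - 1)) < THRESHOLD


def surviving_errors(n_error_traces: int, seed: int = 0) -> int:
    """Count how many error traces a collector-side tail sampler would get to see."""
    rng = random.Random(seed)
    trace_ids = (rng.getrandbits(128) for _ in range(n_error_traces))
    return sum(head_sampled(trace_id) for trace_id in trace_ids)


kept = surviving_errors(10_000)
print(f"{kept} of 10000 error traces reach the collector")
```

Roughly one in ten error traces survives, which is why a tail sampler that should see every error needs the SDK to send everything.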
Retention. SigNoz defaults to 15 days for traces and 30 days for metrics. For Pool B spend analysis you want at least 31 days of metrics to see a full billing cycle. Bump to 90 days for metrics, keep traces at 15. Configure in clickhouse-setup/clickhouse-config.xml.
Resource attributes. The OTEL_RESOURCE_ATTRIBUTES env var is how every span gets tagged. Set at minimum agent.name, service.name=claude-code, and environment=prod per agent. Use dev for local experiments so they do not pollute the prod dashboards.
The vendor-hosted fallback
If you do not want to run SigNoz, the cleanest vendor answer is Grafana Cloud free tier. The setup is identical - same env vars on Claude Code - but OTEL_EXPORTER_OTLP_ENDPOINT points at Grafana Cloud's OTLP endpoint with an API token in the headers.
CLAUDE_CODE_ENABLE_TELEMETRY=1
OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-eu-west-2.grafana.net/otlp
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic <base64-encoded-user:token>
OTEL_RESOURCE_ATTRIBUTES=agent.name=morning-brief,service.name=claude-code
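The base64 part of the header is easy to get wrong by hand. A minimal sketch for generating it, assuming standard HTTP Basic auth over "user:token" as the placeholder above suggests; the function name is made up and the credentials are dummies.

```python
import base64


def grafana_basic_auth(user: str, token: str) -> str:
    """Build the OTEL_EXPORTER_OTLP_HEADERS value using HTTP Basic auth."""
    credentials = f"{user}:{token}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return f"Authorization=Basic {encoded}"


# Dummy credentials; substitute your real user and API token.
print("OTEL_EXPORTER_OTLP_HEADERS=" + grafana_basic_auth("user", "pass"))
```

Keep the real token out of shell history: run this once, paste the output into the environment file, and restrict that file's permissions.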
The free tier handles 10k metric series and 50GB logs a month, which is generous for the indie setup. You build the same five-metric dashboard in Grafana instead of SigNoz. The trade is zero VPS overhead in exchange for a cap that you might bump into if your agent count grows.
Datadog and Honeycomb are better products with better UIs, but at GBP 50-60 a month they are hard to justify for an indie hacker stack until the agents are generating real revenue.
What to actually do with the data
The dashboard exists to drive decisions. The questions a UK indie hacker should be answering monthly using it:
- Which agent is eating the most Pool B? Sort the per-agent daily spend bar chart descending. The top one is the candidate for "do I drop this to Sonnet from Opus?" or "do I cache the system prompt?"
- Which tool call is the slowest? The p95 latency chart. If the bash tool is sitting at 30 seconds because an agent keeps shelling out to a slow remote API, that is a prompt change.
- Is prompt caching actually working? Cache hit rate should be 60%+ for any agent with a stable system prompt across many runs. If it is 0%, your env var for caching is wrong.
- Which agent is silently failing? Error rate per agent. Anything climbing without you noticing is the case for an OnFailure alert handler (covered in the Linux scheduling guide).
- Are agents overlapping unexpectedly? Concurrent runs chart. Two agents firing at the same minute and both hitting the same API rate limit is an easy fix with RandomizedDelaySec= on the systemd timer.
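The overlap question is the least obvious of the five to compute, so here is a rough sketch of the idea behind a concurrent-runs panel: a sweep over start and end events that tracks the peak number of sessions in flight. The run windows are hypothetical, and SigNoz would derive the real ones from session spans.

```python
def max_concurrent(runs):
    """Return the peak number of overlapping (start, end) intervals using a sweep line."""
    events = []
    for start, end in runs:
        events.append((start, 1))   # a run begins
        events.append((end, -1))    # a run ends
    # Sort ends before starts at the same instant so back-to-back runs do not count as overlap.
    events.sort(key=lambda event: (event[0], event[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Hypothetical run windows, in minutes after midnight.
runs = [(360, 365), (360, 372), (365, 370), (600, 610)]
print(max_concurrent(runs))
```

A peak above 1 at a predictable time of day is the signal that two timers share a schedule and would benefit from a randomised delay.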
Five questions, five answers, one dashboard. That is the entire ROI of the setup.
The cost summary
The all-in monthly cost for full Claude Code observability:
- SigNoz self-hosted on existing Hetzner CAX11: GBP 0 (uses ~1.5GB RAM of the 4GB box).
- OpenTelemetry collector on the same box: GBP 0 (bundled with SigNoz).
- Tailscale (personal): GBP 0.
- Storage (90 days of metrics, 15 days of traces, ClickHouse on the box's 40GB disk): GBP 0.
Total: GBP 0 on top of the GBP 23.30 indie hacker baseline (Hetzner + Pro plan).
Compare to GBP 60 a month for Datadog APM and you are saving GBP 720 a year for the cost of thirty minutes setting it up. The savings buy you a year of Anthropic Routines headroom in Pool B, with change.
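The arithmetic behind those figures, spelled out as a small sketch. All inputs are the article's own estimates; the Pro plan share is not quoted directly, so it is derived here from the stated baseline minus the Hetzner cost.

```python
from decimal import Decimal

DATADOG_MONTHLY_GBP = Decimal("60")      # entry-tier estimate from the comparison table
SIGNOZ_SOFTWARE_GBP = Decimal("0")       # self-hosted, software only
HETZNER_MONTHLY_GBP = Decimal("3.30")
BASELINE_MONTHLY_GBP = Decimal("23.30")  # Hetzner + Pro plan, as stated above

# Twelve months of the avoided vendor bill.
annual_saving = (DATADOG_MONTHLY_GBP - SIGNOZ_SOFTWARE_GBP) * 12
# What the baseline implies for the subscription once the VPS is subtracted.
implied_pro_plan = BASELINE_MONTHLY_GBP - HETZNER_MONTHLY_GBP

print(f"Annual saving vs Datadog: GBP {annual_saving}")
print(f"Implied Pro plan share of baseline: GBP {implied_pro_plan}")
```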
The pattern that ships
The UK indie hacker observability stack in 2026:
- Already have a Hetzner CAX11 from the VPS setup.
- Install SigNoz via Docker Compose (30 min).
- Lock the UI down behind Tailscale or Cloudflare Access.
- Set CLAUDE_CODE_ENABLE_TELEMETRY=1 and the four OTLP env vars in every agent's EnvironmentFile.
- Tag each agent with OTEL_RESOURCE_ATTRIBUTES=agent.name=<name>.
- Build the five-metric dashboard: per-agent spend, per-tool latency, cache hit rate, error rate, concurrent runs.
- Review monthly, kill or tune anything not earning its Pool B share.
From 15 June onwards, every agent you ship has a cost line you can read at a glance. Every silent failure has a span. Every Pool B decision is backed by data. The whole thing for GBP 0 of software and half an hour of setup, on the same cheap VPS that is already running the agents. That is the indie hacker tooling sweet spot - small, sharp, free, and yours.
Frequently asked
Why do I need observability for Claude Code in 2026?
Because Pool B - the new programmatic credit pool from 15 June 2026 - makes per-agent spend visible for the first time, and you want a dashboard that turns that visibility into decisions. Without observability, the Anthropic console gives you one aggregate Pool B number and you guess which agent is eating it. With observability, every agent run has a span, every token has a count, every tool call has a latency. You can answer 'is this agent worth its monthly cost?' with data instead of feel.
What does CLAUDE_CODE_ENABLE_TELEMETRY actually emit?
Per-session token counts (input, output, cache read, cache write), per-call latency, tool call traces with name and duration, model version, prompt length events, cost estimates in USD. The data conforms to OpenTelemetry semantic conventions, so it flows into any OTel-compatible backend. You set CLAUDE_CODE_ENABLE_TELEMETRY=1 plus standard OTEL_EXPORTER_OTLP_ENDPOINT env vars and Claude Code ships the data to your collector on every invocation.
SigNoz vs Grafana LGTM stack: which should I pick?
SigNoz is the easier single-tool answer - one Docker compose file gives you traces, metrics, logs, and dashboards in a unified UI. The LGTM stack (Loki for logs, Grafana for dashboards, Tempo for traces, Mimir or Prometheus for metrics) is more flexible but more pieces to wire up. For a UK indie hacker with one VPS and a handful of agents, SigNoz is the right default. If you already run Grafana for other things, lean LGTM. Both are open source and both run free on the Hetzner box.
What does the vendor-hosted path cost in 2026?
Datadog APM starts around GBP 60 a month for the entry tier (per host plus per million traces). Honeycomb Pro is around GBP 50 a month with reasonable trace volume. New Relic has a free tier with limits and paid tiers from around GBP 80. Grafana Cloud free tier handles 10k metric series and 50GB logs a month, which is enough for a small Claude Code setup if you do not want to host. The self-hosted SigNoz path is GBP 0 in software on top of the GBP 3.30 a month Hetzner VPS you already have for the agents.
What metrics should I actually look at on the dashboard?
Five: per-agent daily Pool B spend (sums to your monthly cap), per-tool latency (which tool calls are slow), cache hit rate (are you actually benefitting from prompt caching), error rate per agent (which agents are failing silently), and concurrent runs (are agents overlapping unexpectedly). The five together tell you which agents are cheap, fast, reliable, and worth keeping. Anything else on the dashboard is decoration in the early days.