
UK SaaS error budget and SLO setup 2026: Vercel + Better Stack + Sentry, the 30-min ship-it

Key Takeaways

  • UK indies should target 99.5% availability over a rolling 30-day window for the first paying customer -- 3.6 hours of allowed downtime per month. Bump to 99.9% only after 50+ paying customers and 60 days of staying inside the 99.5% budget.
  • Use Better Stack synthetic monitors (5 regions, 1-min interval) as the availability SLI. Use Sentry as the user-visible error rate SLI with a budget of 1 error per 10,000 requests.
  • Wire burn-rate alerts at 14.4x over 1 hour (page on-call), 6x over 6 hours (Slack), and 3x over 24 hours (daily review). Without these you find out you blew the budget too late to act.
  • The error budget policy is the contract: 0-50% consumed ship freely, 50-75% senior review for infra, 75-100% feature freeze, >100% one-week freeze + postmortem.
  • The 30-minute ship-it on Vercel + Better Stack + Sentry costs GBP 0 pre-revenue and GBP 19/mo at first paying customer. There is no excuse to be running blind.

Most UK indies running SaaS in 2026 are flying blind. Sentry's installed, Better Stack pings the homepage every five minutes, and that's it. Nobody's defined an SLO. Nobody knows how much downtime is "ok". Every wobble feels like a five-alarm fire — or worse, nobody notices at all.

Here's the thing nobody tells you: at 99.0% availability you're allowed roughly 88 hours of downtime over 12 months. At 99.9% it's 8.8 hours. At 99.99% it's 53 minutes. The difference between those targets isn't pride, it's architecture, runbook discipline, and how much sleep you get. Pick the wrong one and you'll either ship nothing because you're too scared, or burn out chasing perfection a single-region Vercel project can't deliver anyway.

This post is the 30-minute ship-it for getting an actual error budget wired up on Vercel + Better Stack + Sentry. By the end you'll have an SLI, an SLO, a budget, and burn-rate alerts that ping the right channel at the right time. Then you can get back to shipping.

Want the underlying research that powers posts like this? Get this week's free deep-dive report at ideastack.co/reports -- UK business opportunities scored, with builder prompts and revenue projections.

What an SLO actually is for a UK indie SaaS

Three letters, one habit. Here's the unbundling:

  • SLI (Service Level Indicator) -- the thing you measure. Availability, error rate, p99 latency, signup-flow completion. Pick one number per critical path.
  • SLO (Service Level Objective) -- the target you set against the SLI. "99.5% of HTTP responses succeed over a rolling 30-day window." Internal goal. Not a contract.
  • Error budget -- the inverse of the SLO. If you target 99.5%, you've got 0.5% to spend. Over 30 days that's 3.6 hours. Spend it on chaos engineering, risky deploys, or actual incidents -- doesn't matter, the budget exists so you can.

The reason this matters for an indie isn't theory, it's psychology. Without a budget, every minor incident feels catastrophic. With one, you have explicit permission to ship features that might break things, because you've quantified the cost of breakage. Google calls this "operational courage". You'll call it "finally being able to deploy on a Friday".

The other half: the budget tells you when to stop shipping. Burn through it in week two and you've got receipts to justify pausing feature work for reliability investment. No more "should we fix the queue or build the new dashboard?" -- the budget answers it.

The four SLIs UK indies should pick

You don't need ten. You need these four, in this order. Stop at whichever one matches your stage.

| SLI | What it measures | Tool | UK price | When to add |
| --- | --- | --- | --- | --- |
| Availability (synthetic) | Did the homepage + /api/health respond 200 in under 3s? | Better Stack synthetic monitor | Free up to 10 monitors, GBP 19/mo Pro for on-call + multi-region | Day one |
| User-visible error rate | What % of requests threw a 5xx or unhandled exception? | Sentry | Free up to 5k events/mo, GBP 20/mo Team | First paying customer |
| p99 page latency | Slowest 1% of real-user page loads | Vercel Speed Insights + Web Analytics | Free on Hobby up to 2.5k events/mo, GBP 8/mo Pro tier extras | 100 paying users |
| Critical user journey | Did a synthetic Playwright run complete the signup or checkout? | Better Stack synthetic Playwright | Included in Pro at GBP 19/mo | First paying customer |

Two non-negotiables: availability from day one, and the critical user journey check as soon as there's a signup or checkout worth protecting. The homepage being up is meaningless if your /signup form is throwing 500s -- and that's exactly the failure mode every indie hits at least once. A sketch of that journey check follows.
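
Better Stack's Pro-tier Playwright monitors take a script roughly like the sketch below; the same file also runs as a plain @playwright/test spec from CI if you want the journey check before paying for Pro. The URL, selectors, and test email are placeholders for your own app.

// signup-journey.spec.ts -- critical-user-journey check (a sketch; adapt the selectors)
import { test, expect } from '@playwright/test';

test('signup flow completes', async ({ page }) => {
  await page.goto('https://yourapp.com/signup');

  // Throwaway address so the synthetic run never collides with a real user
  const email = `slo-check+${Date.now()}@example.com`;
  await page.getByLabel('Email').fill(email);
  await page.getByLabel('Password').fill('a-long-throwaway-password-1');
  await page.getByRole('button', { name: 'Create account' }).click();

  // The journey only counts as "up" if the new user actually lands on the dashboard
  await expect(page).toHaveURL(/\/dashboard/, { timeout: 10_000 });
});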

The default SLO targets

This is the bit nobody writes down. Here are the targets to use, mapped by stage. Stop overthinking it -- these are good enough.

| Stage | Availability SLO | Error rate SLO | p99 latency SLO | Allowed downtime / 30d |
| --- | --- | --- | --- | --- |
| Pre-revenue | 99.0% | none | none | 7h 12min |
| First paying customer | 99.5% | 1 error per 10,000 requests | none | 3h 36min |
| 50 paying customers | 99.9% | 1 error per 100,000 requests | p99 < 2s | 43 min |
| 1,000 paying customers | 99.95% | 1 error per 1M requests | p99 < 1s | 21 min |
| 10,000 paying customers | 99.99% | 1 error per 10M requests | p99 < 500ms | 4 min 19s |

A few things to internalise:

  • 99.99% on a single-region Vercel deployment is fiction. Vercel's underlying infrastructure won't hit four nines on a single function region. If you need four nines, you need multi-region failover and a database that can survive an AWS eu-west-1 wobble. That's a Phase 3 problem, not a launch problem.
  • Don't bump the SLO until the user count justifies it. A 99.9% SLO at five paying customers means you'll wake up at 3am for an outage that affected nobody. The target should track the cost of failure.
  • The error rate SLO is per user-visible error. Background job failures with retries don't count. A user clicking "save" and seeing a 500 does.

Calculating your error budget

The maths is straightforward, but the bit that catches people out is the window.

Error budget (minutes) = (1 - SLO) * 30 days * 24 * 60

99.0%   -> 432 min   (7h 12min)
99.5%   -> 216 min   (3h 36min)
99.9%   -> 43.2 min
99.95%  -> 21.6 min
99.99%  -> 4.32 min
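
If you'd rather not do the arithmetic by hand, a throwaway TypeScript snippet reproduces the table above (a sketch, nothing more):

// error-budget.ts -- reproduce the table above in code
const MINUTES_PER_30_DAYS = 30 * 24 * 60; // 43,200

function errorBudgetMinutes(slo: number): number {
  return (1 - slo) * MINUTES_PER_30_DAYS;
}

for (const slo of [0.99, 0.995, 0.999, 0.9995, 0.9999]) {
  console.log(`${(slo * 100).toFixed(2)}% -> ${errorBudgetMinutes(slo).toFixed(1)} min`);
}
// 99.00% -> 432.0 min, 99.50% -> 216.0 min, 99.90% -> 43.2 min, ...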

Use a rolling 30-day window, not a fixed monthly window. Here's why:

  • Fixed monthly: the budget resets at midnight on the 1st, so you could blow the whole thing on the 30th and feel fine a day later. That's not a budget, that's a casino chip.
  • Rolling 30-day: every minute of downtime ages out 30 days later. The budget represents your recent reliability, which is what your users actually care about.

Both Better Stack and Sentry support rolling windows natively. Pick rolling.
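
If you ever need to sanity-check the dashboards, the rolling-window computation is just interval clipping. A sketch, assuming you keep a list of incident start/end timestamps:

// rolling-budget.ts -- budget consumed over the trailing 30 days (sketch)
type Incident = { start: number; end: number }; // epoch milliseconds

const WINDOW_MS = 30 * 24 * 60 * 60 * 1000;

// Returns 1.0 when the budget is exactly spent, >1.0 when it's blown
function budgetConsumed(incidents: Incident[], budgetMinutes: number, now = Date.now()): number {
  const windowStart = now - WINDOW_MS;
  const downMs = incidents.reduce((sum, i) => {
    // Clip each incident to the trailing 30-day window before summing
    const overlap = Math.min(i.end, now) - Math.max(i.start, windowStart);
    return sum + Math.max(0, overlap);
  }, 0);
  return downMs / 60_000 / budgetMinutes;
}

// 3 hours of downtime three weeks ago against a 216-minute (99.5%) budget
const threeWeeksAgo = Date.now() - 21 * 24 * 60 * 60 * 1000;
const incident = { start: threeWeeksAgo, end: threeWeeksAgo + 3 * 60 * 60 * 1000 };
console.log(budgetConsumed([incident], 216)); // ~0.83 -- 83% consumed, feature-freeze band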

Better Stack as the availability SLI -- wiring

The first thing to build is a real health endpoint. Don't ping /, ping a route that actually checks your dependencies.

// app/api/health/route.ts
// checkDb / checkQueue are your own dependency probes (e.g. a SELECT 1 against
// the DB, a ping to the queue) -- the import path here is illustrative.
import { checkDb, checkQueue } from '@/lib/health-checks';

export async function GET() {
  // A throwing probe should report "degraded", not turn the route into a 500
  const [dbHealthy, queueHealthy] = await Promise.all([
    checkDb().catch(() => false),
    checkQueue().catch(() => false),
  ]);
  return Response.json({
    status: dbHealthy && queueHealthy ? 'healthy' : 'degraded',
    db: dbHealthy,
    queue: queueHealthy,
    region: process.env.VERCEL_REGION,
    timestamp: Date.now(),
  });
}

Two things to note:

  1. The endpoint returns 200 even when degraded -- you want to distinguish "service unreachable" from "service running but DB is down". Better Stack will match the body to flag the difference.
  2. Include VERCEL_REGION so when you graduate to multi-region you can see which region is degraded.

Now configure Better Stack:

  • Monitor type: HTTP keyword
  • Interval: 60 seconds (1 minute)
  • Regions: London, Frankfurt, Virginia, Sydney, Singapore. UK indies should at minimum pick London + Frankfurt + Virginia -- those three cover 90%+ of UK paying customers and surface AWS regional issues.
  • Match condition: HTTP 200 AND body contains "status":"healthy"
  • Timeout: 3 seconds (anything slower is effectively degraded for users)
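
If you prefer config-as-code, the same monitor can be created through Better Stack's Uptime API instead of the dashboard. The endpoint is real; the exact field names and region codes below are my best reading of the v2 API and should be checked against the current docs before you rely on them.

// create-monitor.ts -- optional: create the monitor via Better Stack's Uptime API
// (run with e.g. `npx tsx create-monitor.ts`; field names are assumptions -- verify in the docs)
const res = await fetch('https://uptime.betterstack.com/api/v2/monitors', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.BETTERSTACK_API_TOKEN}`, // your Uptime API token
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://yourapp.com/api/health',
    monitor_type: 'keyword',            // match on the response body, not just the status code
    required_keyword: '"status":"healthy"',
    check_frequency: 60,                // seconds
    request_timeout: 3,                 // seconds -- slower than 3s counts as down
    regions: ['us', 'eu', 'as', 'au'],  // region codes are an assumption -- check the docs
  }),
});
if (!res.ok) throw new Error(`Better Stack API error: ${res.status}`);
console.log(await res.json());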

Alerting policy:

  • Degraded for 5 minutes: Slack alert to #incidents
  • Degraded for 15 minutes: Page on-call (Better Stack's on-call rotation, GBP 19/mo Pro covers this)
  • 2 of 5 regions degraded: Slack alert (likely a regional AWS issue, not yours)
  • 5 of 5 regions degraded: Page on-call immediately (it's definitely yours)

The 5-minute threshold matters. Anything less and a 30-second blip during a Vercel deployment will wake you up. The first time you get paged for a deploy is also the last time -- set the floor higher than your typical deploy duration.

Sentry as the error rate SLI -- wiring

@sentry/nextjs v9 ships with first-class Next.js 16 support. The install is genuinely two commands, but the configuration is where indies get it wrong.

// sentry.client.config.ts
import * as Sentry from '@sentry/nextjs';

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  // EU region for UK SaaS -- residency matters
  // (your DSN URL will end in @oXXXXXXX.ingest.de.sentry.io)
  tracesSampleRate: 0.1,
  // Session Replay only runs when the integration is registered
  integrations: [Sentry.replayIntegration()],
  replaysSessionSampleRate: 0.1,
  replaysOnErrorSampleRate: 1.0,
  // Client code only sees NEXT_PUBLIC_-prefixed vars; Vercel exposes these when
  // "Automatically expose System Environment Variables" is enabled
  environment: process.env.NEXT_PUBLIC_VERCEL_ENV,
  release: process.env.NEXT_PUBLIC_VERCEL_GIT_COMMIT_SHA,
  beforeSend(event) {
    // Anything tagged internal at capture time is marked not-user-visible,
    // so it can be filtered out of the SLI alert
    if (event.tags?.internal === 'true') {
      event.tags.user_visible = 'false';
    }
    return event;
  },
});

Key points:

  • EU region. Sign up on the EU Sentry datacentre, not the default US one. Your UK users' replay data stays in Frankfurt, which simplifies the DUA Act + GDPR conversation when you eventually have one. See 2026-04-23-dua-act-cookie-exemption-uk-saas-2026 for why this matters for session replay residency.
  • tracesSampleRate: 0.1. Sample 10% of traces for performance. At 1.0 you'll burn through Sentry's free tier in a weekend at 100k requests.
  • replaysSessionSampleRate: 0.1, replaysOnErrorSampleRate: 1.0. Sample 10% of sessions for replay, but always replay errored sessions. This is the right ratio for UK indies under 5,000 MAU.
  • Tag user-visible vs internal. Background job errors should not count against the user-visible SLI. Tag them at source.
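
A minimal way to do that tagging from a background job or worker, assuming the same beforeSend convention exists in your server-side Sentry config (the helper name is hypothetical):

// lib/capture-internal-error.ts -- tag internal errors so they never count
// against the user-visible SLI
import * as Sentry from '@sentry/nextjs';

export function captureInternalError(err: unknown) {
  Sentry.withScope((scope) => {
    scope.setTag('internal', 'true');      // picked up by beforeSend
    scope.setTag('user_visible', 'false');
    Sentry.captureException(err);
  });
}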

Now wire the SLI alert. In Sentry, create an Issue Alert with:

  • Condition: "Number of users affected" > 10 in 1 hour
  • Filter: user_visible:true
  • Action: Webhook to your #incidents Slack channel

Use Sentry Performance to track p99 latency as a second SLI -- the Performance dashboard has a built-in p99 view per transaction. Set an alert if p99 of GET / exceeds your SLO (e.g. 2,000ms) for 15 minutes.

For the underlying observability stack and why Sentry beats DIY OpenTelemetry for indie scale, see 2026-04-27-uk-saas-observability-sentry-posthog-2026.

Burn-rate alerting -- the bit that actually helps you sleep

This is the bit most "monitoring tutorials" skip and it's the most important.

Without burn-rate alerting, you find out you've blown your budget at the end of the month, by which time it's too late to do anything. With it, you find out you're burning the budget faster than expected and you can intervene early.

The standard burn rates (Google's SRE workbook, adapted for UK indie scale):

| Burn rate | Window | % of monthly budget consumed | Alert level |
| --- | --- | --- | --- |
| 14.4x | 1 hour | 2% | Page on-call -- something's badly wrong |
| 6x | 6 hours | 5% | Slack alert to #incidents |
| 3x | 24 hours | 10% | Daily review -- log to retro |
| 1x | 30 days | 100% | Postmortem + 1-week feature freeze |

How to read this: a 14.4x burn rate means you're burning the budget 14.4 times faster than allowed. If it sustained for a month, you'd burn the budget 14.4 times over. So in 1 hour, you've already burned 1/720 * 14.4 = 2% of the monthly budget. That's the trigger.

The 14.4x and 6x are paired -- they catch fast, severe outages and slower, sustained degradations respectively. You need both. A site that's down 50% for 1 hour will trip 14.4x. A site that's degraded 5% for 6 hours will trip 6x but never 14.4x (on a 99.5% SLO, 14.4x needs a 7.2% error rate).

In Better Stack, wire these as multi-window alert rules. The thresholds below assume the 99.5% SLO -- each one is just 100% minus (burn rate * 0.5% budget):

  • Rule 1: Availability SLI < 92.8% over 1 hour AND < 92.8% over 5 minutes (14.4x) -- page on-call
  • Rule 2: Availability SLI < 97.0% over 6 hours AND < 97.0% over 30 minutes (6x) -- Slack alert
  • Rule 3: Availability SLI < 98.5% over 24 hours (3x) -- log to daily retro

The two-window check (long window for sustained, short window for "still happening right now") prevents alert fatigue from blips that already recovered.
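
For intuition, the whole mechanism reduces to a few lines of arithmetic. A sketch, assuming a 99.5% SLO and that you can pull downtime minutes per window out of your monitor's history:

// burn-rate.ts -- how the multi-window rules above reduce to arithmetic
const SLO = 0.995;
const BUDGET = 1 - SLO; // 0.5% of any window may fail

function burnRate(downMinutes: number, windowMinutes: number): number {
  return downMinutes / windowMinutes / BUDGET;
}

// Page only when BOTH windows are burning fast: a sustained problem (1h)
// that is still happening right now (5m)
function shouldPage(down1h: number, down5m: number): boolean {
  return burnRate(down1h, 60) >= 14.4 && burnRate(down5m, 5) >= 14.4;
}

// 10 minutes of downtime in the last hour, still failing now:
console.log(burnRate(10, 60).toFixed(1)); // "33.3" -- well past 14.4x, page
console.log(shouldPage(10, 2));           // true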

The error budget policy -- a one-page playbook

Here's what to actually do at each consumption level. Paste this into your README. Tell your future self what to do, because in the middle of an incident you'll forget.

| Budget consumed | Posture | What ships |
| --- | --- | --- |
| 0-25% | Ship freely | All features, all infra changes |
| 25-50% | Normal review | Standard PR review, no extra gates |
| 50-75% | Senior review for infra | Feature work fine, but DB migrations and infra changes need a second pair of eyes |
| 75-100% | Reliability sprint | Only fixes ship. New feature work pauses. Use the time to harden whatever's burning the budget |
| >100% | Freeze + postmortem | One-week feature freeze. Public postmortem written. Reliability work prioritised |

The freeze rule is the one indies skip. Don't skip it. The freeze isn't punishment -- it's the contract that lets you ship boldly when you have budget. If you don't honour the freeze, you don't have a budget, you have a chart.

Solo founder? The freeze still applies. The "second pair of eyes" in the 50-75% band can be Claude Code or Cursor in review mode -- pipe the diff through and explicitly ask for a reliability review. Better than nothing, often better than a sleep-deprived solo review.

GBP cost breakdown -- the full stack at each stage

The whole point of this stack is that it scales with you, not ahead of you.

| Stage | Better Stack | Sentry | Vercel | Total / mo |
| --- | --- | --- | --- | --- |
| Pre-revenue (free tiers) | GBP 0 (10 monitors free) | GBP 0 (5k events) | GBP 0 (Hobby) | GBP 0 |
| 100 users | GBP 19 (Pro for synthetic + on-call) | GBP 0 (5k events still fine) | GBP 0 (Hobby) | GBP 19 |
| 1,000 users | GBP 19 (Pro) | GBP 20 (Team, 50k events) | GBP 16 (Pro) | GBP 55 |
| 10,000 users | GBP 39 (Team) | GBP 20-40 (Team + extra events) | GBP 16-32 (Pro + bandwidth) | GBP 75-110 |

For perspective: at 10,000 users, GBP 110/mo is roughly 1.1p per user per month for full observability + uptime + error tracking. Datadog would charge you GBP 15/host/mo just for the agent. This stack is the right call for any UK indie SaaS under 50,000 users.

For the head-to-head on Better Stack vs UptimeRobot vs Pingdom and why Better Stack wins for indies, see 2026-04-28-uk-indie-hacker-uptime-monitoring-stack-2026.

Five UK indie failure modes

These are the patterns I see over and over from UK indies on the IdeaStack reader calls. Avoid them and you're already in the top 20%.

  1. Monitoring only the homepage. Your homepage is up, your /signup page is throwing 500s, and you find out from a Twitter/X DM. Fix: synthetic Playwright that completes a full signup flow every 5 minutes. Better Stack Pro covers this.

  2. Setting an SLO of 99.99% with no chance of hitting it. Single-region Vercel + single Supabase instance can't deliver four nines. Don't pick an SLO you can't architecturally hit -- the budget will be permanently negative and you'll learn to ignore the alerts. Pick 99.5% or 99.9% until your architecture earns the bump.

  3. Sentry sample rate at 1.0. At 100k requests/day with 1% error rate, you'll send 1,000 events to Sentry every day. At 1.0 trace sampling you'll send 100,000. The free tier dies in a week. Set tracesSampleRate: 0.1 from day one.

  4. Better Stack alerts to email-only. Phones don't ring for emails. Wire the alert to Slack (free) or PagerDuty integration (Better Stack Pro) and use Better Stack's on-call rotation to actually wake you up. If alerts can't reach you in under 60 seconds, they're not alerts, they're a diary.

  5. Treating every error as production-page. Background job retries, expected 404s on favicon.ico, third-party SDK noise -- none of these should fire alerts. Tag user-visible vs internal at source. Filter user_visible:true in your alert conditions. The first month of Sentry is mostly tuning this filter -- budget the time.

30-minute ship-it (the standard UK indie path)

Here's the actual sequence. Set a timer.

0:00 -- Better Stack signup

  • Create account, free tier
  • Add a single monitor against https://yourapp.com/api/health
  • Tick all 5 regions (London, Frankfurt, Virginia, Sydney, Singapore)
  • Set 60-second interval
  • Wire Slack webhook for alerts
  • Match condition: HTTP 200 + body contains "status":"healthy"

0:10 -- Sentry signup

  • Sign up on EU region (sentry.io will redirect, just confirm the URL contains .de.)
  • Create a Next.js project
  • npx @sentry/wizard@latest -i nextjs -- the wizard does Next.js 16 wiring including instrumentation hooks
  • Set tracesSampleRate: 0.1, replaysSessionSampleRate: 0.1, replaysOnErrorSampleRate: 1.0
  • Deploy to Vercel, confirm a test error appears in Sentry within 30 seconds
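
To generate that test error, a throwaway route like this does the job with the wizard's server instrumentation in place (the path is arbitrary -- delete the file once the event shows up in Sentry):

// app/api/sentry-test/route.ts -- throwaway route to confirm wiring, delete after
export async function GET() {
  // Surfaces in Sentry as an unhandled server error on this route
  throw new Error('Sentry SLO wiring test');
}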

0:20 -- Define SLO + paste policy

  • Pick your stage from the SLO table above
  • Add this to your README:
    ## SLO
    - Availability: 99.5% over rolling 30 days (3h 36min downtime budget)
    - User-visible error rate: 1 per 10,000 requests
    - Window: rolling 30 days
    
    ## Error budget policy
    - 0-50% consumed: ship freely
    - 50-75%: senior review for infra
    - 75-100%: feature freeze, fixes only
    - >100%: 1-week freeze + postmortem
    
  • Commit it. Push it. The README is now the contract.

0:30 -- Burn-rate alerts wired

  • In Better Stack, create three multi-window alert rules (14.4x / 6x / 3x as above)
  • Test the 14.4x rule by manually marking the monitor as down for 5 minutes (Better Stack has a "test alert" feature)
  • Confirm Slack ping
  • Confirm on-call ping (if Pro)

Done. Ship the next feature.

FAQs

What SLO should a pre-revenue indie target?

99.0% availability, no error rate target. Focus on shipping. The point of the budget at this stage is psychological -- you're allowed to deploy on Friday because 7h 12min of downtime over 30 days is fine. Once you have a paying customer, bump to 99.5%.

How do I calculate burn rate when my traffic is bumpy?

Use time-based burn rate, not request-based: burn rate = (downtime in the window / window length) / (1 - SLO), which works regardless of traffic volume. Over a 24-hour window that simplifies to downtime_minutes / budget_minutes * 30. Ten minutes of downtime in the last 24 hours against a 99.5% / 216-min budget is 10/216 * 30 = 1.39x burn -- below the 3x daily-review threshold. Better Stack and Sentry both compute this for you when you configure SLOs natively.

Better Stack vs UptimeRobot for synthetic monitoring SLI -- which does this better?

Better Stack, by a margin. UptimeRobot is fine for binary up/down at GBP 5/mo, but it doesn't compute SLOs, doesn't do burn-rate alerts, and its synthetic Playwright support is limited to paid tiers that cost more than Better Stack Pro. For SLO-based monitoring, Better Stack Pro at GBP 19/mo is the right call. See the full comparison in 2026-04-28-uk-indie-hacker-uptime-monitoring-stack-2026.

Do I need a separate SLO for each microservice?

No -- and probably you don't have microservices anyway, you have a Next.js monolith on Vercel. Define SLOs per user-facing critical journey: signup, checkout, dashboard load. Three SLOs, max. If you do have separate services (e.g. a worker queue), define one availability SLO for the user-facing API and let the worker have its own internal target.

When does it make sense to graduate from a 99.5% to 99.9% SLO?

Three signals: you've got 50+ paying customers, you've gone 60 days without burning more than 50% of the 99.5% budget, and a single hour of downtime now costs you measurable revenue (refund requests, churn, support tickets). All three. Don't bump just because you can -- the architecture investment to sustain 99.9% is real (multi-region DB reads, queue with retries, deploy gating).

Key takeaways

  • UK indies should target 99.5% availability over a rolling 30-day window for the first paying customer -- 3.6 hours of allowed downtime per month. Bump to 99.9% only after 50+ paying customers and 60 days of staying inside the 99.5% budget.
  • Use Better Stack synthetic monitors (5 regions, 1-min interval) as the availability SLI. Use Sentry as the user-visible error rate SLI with a budget of 1 error per 10,000 requests.
  • Wire burn-rate alerts at 14.4x over 1 hour (page on-call), 6x over 6 hours (Slack), and 3x over 24 hours (daily review). Without these you find out you blew the budget too late to act.
  • The error budget policy is the contract: 0-50% consumed ship freely, 50-75% senior review for infra, 75-100% feature freeze, >100% one-week freeze + postmortem.
  • The 30-minute ship-it on Vercel + Better Stack + Sentry costs GBP 0 pre-revenue and GBP 19/mo at first paying customer. There is no excuse to be running blind.

Get this week's free deep-dive report -- UK business opportunities scored, with builder prompts and revenue projections, at ideastack.co/reports. One idea, deeply researched, every Thursday.

Related reading

More UK-focused guides from the IdeaStack blog.

UK indie hacker uptime monitoring stack 2026: Better Stack vs UptimeRobot vs Pingdom

UptimeRobot Free, Better Stack Pro, Pingdom Starter. Pricing in GBP, UK data residency, Next.js 16 + Vercel monitor wiring with HMAC-validated heartbeat, decision matrix.

GitHub Actions + Vercel deploy pipeline for UK indie hackers in 2026

The right CI/CD setup for a solo UK Next.js SaaS in 2026: GitHub Actions for checks, Vercel native for deploys -- and why mixing them up costs you hours.

Cloudflare Pages vs Vercel for a UK indie SaaS in 2026: the actual decision

The Vercel vs Cloudflare Pages decision for a UK indie SaaS in 2026, with Next.js 16 compatibility, GBP pricing, UK latency, DUA Act 2025 implications, and a clean migration path.

UK SaaS Companies House confirmation statement 2026: the Form CS01 walkthrough for solo founders, PSC, and the dormant company route

Every UK limited company files a CS01 every 12 months. Pre-revenue dormant SaaS, side-project Ltd you incorporated last June, fully trading Stripe-connected product -- all of them owe Companies House a CS01 on the anniversary of incorporation.

UK indie hacker founder taxes 2026: director loan, dividend allowance, MTD ITSA -- the pre-revenue-to-first-50k playbook

You're shipping a SaaS in 2026. You've got Claude Code humming, a Vercel deploy that takes 19 seconds, and a Stripe payment link that just took its first GBP 29. Brilliant. Now here's the bit nobody talks about in the Twitter highlight reel: the tax decisions you make in your first 12 months will co
