Probes YAML Generator

Build liveness, readiness, and startup probes with sane defaults for the workload you're shipping. No data leaves your browser.

Startup: guards slow boots.

Readiness: controls service traffic.
Waits 5s after container start, then checks HTTP GET /healthz on port 8080 every 5s. After 3 consecutive failures (~15s), the pod is removed from service endpoints.

Liveness: restarts the container on failure.
Waits 10s after container start, then checks HTTP GET /healthz on port 8080 every 10s. After 3 consecutive failures (~30s), the container is restarted.
Output (app-probes.yaml):

```yaml
containers:
  - name: app
    image: "my-image:latest"
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 1
      successThreshold: 1
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 1
      successThreshold: 1
      failureThreshold: 3
```

How Kubernetes probes actually work

Probes are how the kubelet decides whether your container is alive, ready for traffic, or still booting. Three independent checks, three different consequences when they fail. Most probe bugs come from confusing the three — wiring readiness logic into liveness, then watching the pod restart-loop the moment a downstream dependency hiccups.

Key insight

Liveness restarts the container. Readiness pulls the pod from the Service. Startup buys you time. Pick the wrong one and you'll either lose traffic you should have served or restart containers that were perfectly fine.

When to use each probe type

Readiness probes control traffic. When a readiness probe fails, the kubelet removes the pod from the service's endpoints — kube-proxy stops sending it traffic. Use readiness to keep users away from a pod that's up but not yet able to serve: still warming a cache, waiting for a DB connection, mid-config-reload. The pod stays alive; it just doesn't get hit until it's actually ready.

Liveness probes control restarts. When a liveness probe fails, the kubelet kills and restarts the container. Use liveness to recover from states where the process is alive but unresponsive — deadlocks, runaway memory, stuck goroutines, frozen event loops. Liveness is not a substitute for readiness: a transient external dependency hiccup should not trigger a restart, it should trigger a temporary "remove from service" via readiness.

Startup probes gate liveness during boot. Added in Kubernetes 1.16 and GA in 1.20, the startup probe disables liveness probes entirely until startup succeeds. The kubelet still kills the container if startup never passes within the configured grace period, but liveness doesn't run yet. A Java app that takes 90 seconds to be ready can have a liberal 5-minute startup grace and an aggressive 10-second liveness check, instead of having to set a 5-minute initialDelaySeconds on liveness.
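A sketch of that pattern for the 90-second Java app above (the /healthz path and port 8080 are illustrative, carried over from the generator defaults):

```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30  # 30 × 10s = up to 5 minutes to finish booting
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 1   # aggressive: ~10s grace, but only after startup has passed
```

Until the startup probe succeeds, the aggressive liveness check never runs; once it succeeds, the startup probe never runs again.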

Tuning the timing knobs

Every probe has the same five integer knobs: initialDelaySeconds, periodSeconds, timeoutSeconds, successThreshold, and failureThreshold (successThreshold must be 1 for liveness and startup). The math is more important than the names:

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: http
  periodSeconds: 10
  failureThreshold: 3   # 30s grace before restart
```

The math to memorize: grace before action = initialDelaySeconds + (failureThreshold × periodSeconds). When debugging "why didn't Kubernetes do X by now?", sketch this out for the failing pod first.

Use a startupProbe for any container whose cold start exceeds failureThreshold × periodSeconds on liveness. After the startup probe succeeds once, it stops running and liveness/readiness take over. This is the cleanest way to protect slow boots without weakening liveness for the steady state.

Don't share endpoints across probes. A common mistake: pointing liveness at the same /health endpoint that checks downstream dependencies. The first time the database has a hiccup, every pod gets restarted simultaneously, taking the service down harder than the original blip would have. Liveness should answer "is this process wedged?" — nothing else.
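One way to keep the two concerns separate; the /livez and /readyz paths are illustrative conventions, not required names:

```yaml
livenessProbe:
  httpGet:
    path: /livez        # process-only check: no DB, no downstream calls
    port: 8080
readinessProbe:
  httpGet:
    path: /readyz       # may check DB connections, cache warmth, etc.
    port: 8080
```

When the database blips, /readyz fails and the pod drops out of the Service; /livez keeps passing, so nothing restarts.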

The exec probe is not free. Each invocation forks a process inside the container. On a busy node with hundreds of pods running probes every few seconds, the cumulative cost is real and shows up as kubelet PLEG warnings. Prefer httpGet or tcpSocket unless you specifically need the shell.

Choosing a handler: HTTP vs TCP vs Exec vs gRPC

httpGet — most common and usually right. The kubelet hits a path, expects 200–399. Best when the app exposes HTTP. Implement a dedicated /healthz endpoint that's cheap to call and doesn't touch the database.

tcpSocket — checks if a TCP connection succeeds. Works for non-HTTP services (Postgres, Redis, custom protocols). Be aware: TCP success ≠ application readiness. Postgres accepts connections during recovery but rejects queries; Redis accepts connections during AOF rewrite but blocks. tcpSocket is best as a startup probe and worst as a strict readiness probe.
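Used as a startup gate, a tcpSocket probe might look like this (port and thresholds are illustrative):

```yaml
startupProbe:
  tcpSocket:
    port: 5432          # e.g. Postgres; success only means the socket accepts connections
  periodSeconds: 5
  failureThreshold: 24  # up to 2 minutes for the process to start listening
```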

exec — runs a command inside the container. The most flexible — you can use CLI tools the app ships with (pg_isready, mysqladmin ping, redis-cli ping) for accurate readiness. Heaviest overhead; the kubelet forks a process every check. Be careful with tight periodSeconds on busy nodes.
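For example, a readiness probe for Postgres using the pg_isready tool it ships with (assumes the image has it on PATH; the user flag is illustrative):

```yaml
readinessProbe:
  exec:
    command: ["pg_isready", "-U", "postgres"]
  periodSeconds: 10     # keep this loose; each check forks a process in the container
  timeoutSeconds: 3
```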

grpc — native gRPC health checking, beta and enabled by default since Kubernetes 1.24, GA in 1.27. Uses the gRPC Health Checking Protocol so the kubelet talks to your service directly. Cleaner than running grpc_health_probe as exec, which is the workaround for older clusters.
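A minimal sketch of a native grpc probe; port 9090 is an assumption, and the server on that port must implement the standard grpc.health.v1.Health service:

```yaml
livenessProbe:
  grpc:
    port: 9090          # must serve grpc.health.v1.Health
  periodSeconds: 10
```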

FAQ

Why is my pod restart-looping during startup?

Almost always a liveness probe firing during legitimate boot. Add a startup probe (failureThreshold: 30, periodSeconds: 10 ≈ 5 minutes of grace) and Kubernetes will hold the liveness probe back until the app is up. The same liveness config you have today, plus a startup probe, fixes most slow-boot restart loops.
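That fix, concretely, with the path and port copied from the generator defaults above (adjust to your app):

```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30  # 30 × 10s ≈ 5 minutes of boot grace
```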

What's the difference between liveness, readiness, and startup probes?

Readiness controls traffic — failure removes the pod from service endpoints, so users don't see 502s while the pod is unready. Liveness controls restarts — failure tells the kubelet to kill and restart the container. Startup gates liveness during boot — while it's running, liveness probes are disabled.

Can I use the same endpoint for liveness and readiness?

You can, but you usually shouldn't. Liveness should be cheap and answer 'is this process alive?'. Readiness can include 'is this pod ready to serve traffic right now?' (DB connection up, cache warmed). Conflating them means a transient DB blip restarts your pod instead of just removing it from the load balancer.

Should liveness checks include database connectivity?

Almost never. If the database is down, restarting your pod doesn't help — the next pod will fail the same check and also restart, and so on. Use readiness to route traffic away from the pod so users don't hit it; let liveness keep the pod alive so it can recover when the DB comes back.

What does 'Failed: failureThreshold reached' mean in pod events?

The probe failed `failureThreshold` consecutive times. The action depends on the probe type: liveness restarts the container, readiness removes the pod from service endpoints, startup also restarts the container (and liveness still hasn't started). The grace period before action is `failureThreshold × periodSeconds`.

Why doesn't my service serve traffic right after the pod is Ready?

The endpoints controller typically takes 1–5 seconds to propagate the new pod to kube-proxy, which then reprograms iptables/IPVS rules; clients may also still be resolving against stale DNS caches. The mirror image happens on shutdown: set `terminationGracePeriodSeconds` and a `preStop` sleep to let traffic drain before SIGTERM.
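A sketch of that shutdown pattern; the 10-second sleep is an assumption, sized to your cluster's propagation delay:

```yaml
containers:
  - name: app
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 10"]  # keep serving while endpoints update
terminationGracePeriodSeconds: 30            # must exceed the preStop sleep plus app shutdown time
```

The pod is marked Terminating (and removed from endpoints) immediately; the sleep holds SIGTERM back while kube-proxy catches up.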