Note: This post was written with AI assistance (Claude). The opinions and experiences are entirely my own.


The first version of my homelab deployment process was a shell script that SSH'd into each server and ran docker pull && docker restart. When I migrated everything to K3s, I briefly used kubectl rollout restart over SSH triggered from CI. It worked, but it felt wrong — CI pushing credentials around, the cluster's state not matching any declarative source of truth, everything held together by a bash script nobody could interpret six months later.

The current setup has none of that. A push to main triggers a GitLab CI pipeline. The pipeline runs tests, builds a Docker image, and pushes it to the GitLab container registry. Keel — running inside the cluster — notices the new image and rolls it out automatically. I never run kubectl as part of a deploy.


The Pipeline

GitLab CI is configured in .gitlab-ci.yml at the repo root. Each app owns a build.yml included via local: references, so the root config stays clean and each app manages its own jobs.
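A minimal sketch of what that root file can look like (the second app is hypothetical, added only to show the pattern scaling):

```yaml
# .gitlab-ci.yml — root config stays tiny; each app's jobs live next to the app
stages:
  - test
  - build

include:
  - local: apps/myapp/build.yml
  - local: apps/otherapp/build.yml   # hypothetical second app
```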

The stages are test → build. A typical app:

myapp:test:
  image: node:24-alpine
  stage: test
  before_script:
    - cd apps/myapp/app/
    - npm install
  script:
    - npm test
  rules:
    - changes:
        - apps/myapp/**/*

myapp:docker-main:
  image: docker:latest
  stage: build
  services:
    - docker:dind
  before_script:
    - cd apps/myapp/app/
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY
  script:
    - docker build --pull --build-arg BUILD_TIME="$CI_PIPELINE_CREATED_AT"
        -t "${CI_REGISTRY_IMAGE}/myapp:latest" .
    - docker push "${CI_REGISTRY_IMAGE}/myapp:latest"
  rules:
    - if: $CI_COMMIT_BRANCH == 'main'
      changes:
        - apps/myapp/**/*

The rules block means a job only runs if something in apps/myapp/**/* changed — a commit that only touches documentation doesn't trigger a Docker build. $CI_REGISTRY_USER, $CI_REGISTRY_PASSWORD, and $CI_REGISTRY are built-in GitLab CI variables; no manual secret setup for the registry.

The docker-dev job is identical except for the branch condition and the image tag:

myapp:docker-dev:
  image: docker:latest
  stage: build
  services:
    - docker:dind
  before_script:
    - cd apps/myapp/app/
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY
  script:
    - docker build --pull --build-arg BUILD_TIME="$CI_PIPELINE_CREATED_AT"
        -t "${CI_REGISTRY_IMAGE}/myapp:dev-${CI_COMMIT_REF_SLUG}" .
    - docker push "${CI_REGISTRY_IMAGE}/myapp:dev-${CI_COMMIT_REF_SLUG}"
  rules:
    - if: $CI_COMMIT_BRANCH != 'main'
      changes:
        - apps/myapp/**/*

The staging branch therefore pushes :dev-staging ($CI_COMMIT_REF_SLUG is the branch name, URL-slugified by GitLab). The staging deployment's image tag is pinned to :dev-staging, so Keel picks it up automatically the same way it handles :latest in production.
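If you want to predict the tag a branch will produce, GitLab's slugification can be approximated in a few lines of shell (this `slugify` helper is my own sketch, not a GitLab tool; GitLab lowercases, replaces everything outside a-z and 0-9 with `-`, truncates to 63 bytes, and strips leading/trailing dashes):

```shell
# Approximation of GitLab's CI_COMMIT_REF_SLUG transformation.
slugify() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed 's/[^a-z0-9]/-/g' \
    | cut -c1-63 \
    | sed 's/^-*//; s/-*$//'
}

slugify "Feature/my-branch"   # -> feature-my-branch
slugify "staging"             # -> staging
```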


Keel

Keel is a Kubernetes operator that watches image registries and automatically updates deployments when new images are pushed. It runs as a deployment inside the cluster and polls the GitLab registry every minute.

Four annotations on each deployment enable it:

metadata:
  annotations:
    keel.sh/policy: force
    keel.sh/trigger: poll
    keel.sh/match-tag: "true"
    keel.sh/poll-schedule: "@every 1m"

policy: force means Keel updates the image even if the tag hasn't changed — :latest is always overwritten on push. match-tag: "true" means it only updates when the tag matches, so the production deployment (:latest) won't accidentally pick up :dev-staging pushes. poll-schedule: "@every 1m" overrides Keel's global default of @every 1h.
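One detail worth noting for a private registry: as I understand Keel's behavior, it reads registry credentials from the deployment's own imagePullSecrets when polling, so no separate Keel configuration is needed. The pod spec just needs the pull secret the cluster already uses (the `regcred` name below is an assumption for illustration):

```yaml
spec:
  template:
    spec:
      imagePullSecrets:
        - name: regcred   # dockerconfigjson secret for registry.gitlab.com; name assumed
      containers:
        - name: myapp
          image: registry.gitlab.com/cmunroe/ops/myapp:latest
```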

Here's what the two deployments look like side by side, cut down to the relevant parts:

# production.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: myapp
  annotations:
    keel.sh/policy: force
    keel.sh/trigger: poll
    keel.sh/match-tag: "true"
    keel.sh/poll-schedule: "@every 1m"
spec:
  replicas: 2
  template:
    spec:
      containers:
        - name: myapp
          image: registry.gitlab.com/cmunroe/ops/myapp:latest

---

# staging.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-staging
  namespace: myapp
  annotations:
    keel.sh/policy: force
    keel.sh/trigger: poll
    keel.sh/match-tag: "true"
    keel.sh/poll-schedule: "@every 1m"
spec:
  replicas: 1
  template:
    spec:
      containers:
        - name: myapp
          image: registry.gitlab.com/cmunroe/ops/myapp:dev-staging

The image tags are the only meaningful difference. match-tag: "true" is what keeps them independent — when CI pushes :latest, Keel updates myapp only; when it pushes :dev-staging, it updates myapp-staging only. Without that annotation, both deployments would race to pull whatever the most recent push was.

End to end, from a push to main through pipeline completion to Keel's rollout, takes about two minutes.


Branch → Environment Mapping

| Branch           | Image tag      | Picked up by          |
|------------------|----------------|-----------------------|
| main             | :latest        | Production deployment |
| staging          | :dev-staging   | Staging deployment    |
| Any other branch | :dev-&lt;slug&gt;    | Nothing automatic     |

The staging deployment is identical to production except for the image tag and replica count (1 instead of 2). Every non-trivial change goes through staging first — push to staging, verify on dev.myapp.com, merge to main.


What Flux Handles vs. What Keel Handles

Flux manages cluster state from the Git repository: it applies Kubernetes manifests, creates namespaces, ensures the right deployments and services exist. Keel handles the runtime image update loop — updating the image in an already-running deployment without requiring a manifest change.

The division is deliberate. If I change resource limits, replica count, or environment variables, that goes through a manifest change reconciled by Flux. If I push new application code, that goes through CI → registry → Keel. Neither system knows about the other.
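For reference, the Flux side of that division is just a GitRepository plus a Kustomization pointing at the manifests directory. A sketch under assumptions (the repo URL, path, and intervals here are illustrative, inferred from the registry path above):

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: ops
  namespace: flux-system
spec:
  interval: 5m                             # how often Flux fetches the repo
  url: https://gitlab.com/cmunroe/ops.git  # assumed from the registry path
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: ops
  path: ./apps        # assumed manifest directory
  prune: true         # delete cluster objects removed from Git
```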

Renovate rounds out the picture by opening merge requests when base Docker images or Helm chart versions have updates — keeping the non-application parts of the stack current without manual tracking.


What's Missing

The main gap is automatic rollback. If a new image starts crash-looping, Keel doesn't revert — I have to do that manually. For a homelab where "broken" means "my blog is down for five minutes", that's an acceptable tradeoff. For something more critical it would matter.

The other gap: liveness and readiness probes. Without a readiness probe, Kubernetes considers a new pod ready as soon as its container starts, not once it's actually serving traffic, so a rollout can briefly route requests to a pod that isn't listening yet. Adding /api/health as a readiness probe to every app is on the list.
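A sketch of what that would look like on the production deployment, assuming the app listens on port 3000 (the port and probe timings are placeholders, not values from my manifests):

```yaml
containers:
  - name: myapp
    image: registry.gitlab.com/cmunroe/ops/myapp:latest
    ports:
      - containerPort: 3000
    readinessProbe:            # gates traffic until the app responds
      httpGet:
        path: /api/health
        port: 3000
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:             # restarts the container if it stops responding
      httpGet:
        path: /api/health
        port: 3000
      initialDelaySeconds: 15
      periodSeconds: 20
```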