Add deploy failure diagnostics and safer backend health check.

Production deploy failed with no backend logs before rollback. Print backend and postgres logs on failure, wait longer for JVM startup, and probe /api/payment/swish-info instead of vehicle lookup (no external scrape).

- Document proof-first troubleshooting in README
- No volume reset workflow; fix only after reading job logs
2026-05-21 16:39:01 +02:00
commit db56fc58de
parent d652a5b862
2 changed files with 31 additions and 4 deletions
--- a/.forgejo/workflows/deploy.yml
+++ b/.forgejo/workflows/deploy.yml
@@ -64,12 +64,12 @@ jobs:
      - name: Health checks with rollback
        run: |
          echo "Waiting for services to start..."
-          sleep 20
+          sleep 30

          BACKEND_OK=false
-          for i in 1 2 3 4 5; do
+          for i in 1 2 3 4 5 6 7 8 9 10; do
            if docker run --rm --network bilhej-prod_default curlimages/curl:8.5.0 \
-               -s http://bilhej-backend-prod:8080/api/vehicles/ABC123 > /dev/null; then
+               -sf http://bilhej-backend-prod:8080/api/payment/swish-info > /dev/null; then
              echo "Backend is healthy"
              BACKEND_OK=true
              break
@@ -93,12 +93,25 @@ jobs:
          if [ "$BACKEND_OK" != "true" ] || [ "$FRONTEND_OK" != "true" ]; then
            echo ""
            echo "═══════════════════════════════════════════════════"
-            echo "  HEALTH CHECK FAILED — ROLLING BACK DEPLOYMENT"
+            echo "  HEALTH CHECK FAILED — DIAGNOSTICS"
+            echo "═══════════════════════════════════════════════════"
+            echo ""
+            docker compose -p bilhej-prod -f docker-compose.prod.yml ps
+            echo ""
+            echo "--- Backend logs ---"
+            docker logs bilhej-backend-prod 2>&1 | tail -80 || true
+            echo ""
+            echo "--- Postgres logs ---"
+            docker logs bilhej-postgres-prod 2>&1 | tail -30 || true
+            echo ""
+            echo "═══════════════════════════════════════════════════"
+            echo "  ROLLING BACK DEPLOYMENT"
            echo "═══════════════════════════════════════════════════"
            echo ""
            docker compose -p bilhej-prod -f docker-compose.prod.yml down
            echo ""
            echo "Rolled back. Containers stopped. DB volume preserved."
+            echo "Read Backend logs above to find the root cause before redeploying."
            exit 1
          fi
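
Note on the probe change in this file: curl's `-f` (`--fail`) flag makes curl exit non-zero for most HTTP error responses (status 400 and above), so with `-sf` an error page from the backend no longer counts as healthy. The retry pattern can be sketched as a standalone POSIX shell function. This is a minimal sketch under stated assumptions, not the workflow's actual code: `wait_for_healthy` and its arguments are hypothetical names, and the delay between attempts is an assumption because that part of the loop is not visible in this hunk.

```shell
#!/bin/sh
# Hypothetical helper: run a probe command until it succeeds or the
# attempt budget is used up.
# Usage: wait_for_healthy <attempts> <delay-seconds> <probe command...>
wait_for_healthy() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # The probe's own output is discarded; only its exit status matters.
    if "$@" > /dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    echo "attempt $i/$attempts failed" >&2
    # Do not sleep after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example call mirroring the probe in the diff (the 5-second delay is assumed):
# wait_for_healthy 10 5 docker run --rm --network bilhej-prod_default \
#   curlimages/curl:8.5.0 -sf http://bilhej-backend-prod:8080/api/payment/swish-info
```

Passing the probe as arguments keeps the retry logic independent of which endpoint is checked, so the same function could serve both the backend and the frontend check.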

--- a/README.md
+++ b/README.md
@ -311,6 +311,20 @@ Before the first deploy, complete these steps on the production server (`srvr.nu
 3. Enter a version tag (e.g., `v0.1.0`).
 4. Click **Run workflow**.

+### Deploy failed (backend health check)
+
+If the job passes the frontend check but the backend never becomes healthy:
+
+1. Open the failed job log and read **Backend logs** (printed before rollback).
+2. Match the error to a fix — do not guess:
+   - **`password authentication failed`** — DB credentials in the running stack do not match
+     what Postgres was initialized with; fix credentials or Postgres password to match (only
+     wipe the volume if you accept losing prod data).
+   - **`Production requires ADMIN_EMAIL and ADMIN_PASSWORD`** — add those Forgejo secrets.
+   - **Flyway / migration errors** — fix schema or migration history before redeploying.
+3. **DBeaver from your laptop** — prod Postgres binds to `127.0.0.1:5433` on the server only.
+   Use an SSH tunnel, then host `localhost` port `5433` (not `192.168.0.59` directly).
+
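
The "match the error to a fix" step above could be sketched as a small POSIX shell classifier over the captured backend log. This is an illustration only: `classify_backend_log` and the category labels it prints are hypothetical, the first two patterns are the strings quoted in the README text, and the Flyway/migration pattern is an assumption about how such errors appear in the log.

```shell
#!/bin/sh
# Hypothetical helper: map backend log text to one of the failure
# categories listed in the README troubleshooting section.
classify_backend_log() {
  log_text="$1"
  case "$log_text" in
    *"password authentication failed"*)
      # DB credentials do not match what Postgres was initialized with.
      echo "db-credentials-mismatch" ;;
    *"Production requires ADMIN_EMAIL and ADMIN_PASSWORD"*)
      # The Forgejo secrets for the admin account are missing.
      echo "missing-admin-secrets" ;;
    *[Ff]lyway*|*[Mm]igration*)
      # Assumed pattern: any mention of Flyway or a migration.
      echo "migration-error" ;;
    *)
      echo "unknown" ;;
  esac
}

# Example call against the same log tail the workflow prints:
# classify_backend_log "$(docker logs bilhej-backend-prod 2>&1 | tail -80)"
```

An `unknown` result means the log has to be read by hand, which matches the "do not guess" rule in the list above.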
 ### What Happens

 | Step | Action |