Add deploy failure diagnostics and safer backend health check.
All checks were successful
CI / Lint, type check, unit tests, coverage (push) Successful in 1m52s
CI / E2E browser tests (push) Successful in 46s

Production deploy failed with no backend logs before rollback. Print
backend and postgres logs on failure, wait longer for JVM startup, and
probe /api/payment/swish-info instead of vehicle lookup (no external scrape).

- Document proof-first troubleshooting in README
- No volume reset workflow; fix only after reading job logs
Joakim Mörling 2026-05-21 16:39:01 +02:00
parent d652a5b862
commit db56fc58de
2 changed files with 31 additions and 4 deletions


```diff
@@ -64,12 +64,12 @@ jobs:
 - name: Health checks with rollback
 run: |
 echo "Waiting for services to start..."
-sleep 20
+sleep 30
 BACKEND_OK=false
-for i in 1 2 3 4 5; do
+for i in 1 2 3 4 5 6 7 8 9 10; do
 if docker run --rm --network bilhej-prod_default curlimages/curl:8.5.0 \
--s http://bilhej-backend-prod:8080/api/vehicles/ABC123 > /dev/null; then
+-sf http://bilhej-backend-prod:8080/api/payment/swish-info > /dev/null; then
 echo "Backend is healthy"
 BACKEND_OK=true
 break
```
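This hunk changes the probe in two separate ways.

- **The curl flags.** With `-s` alone, curl exits 0 for any HTTP response, including a 404 or 500. The old probe therefore counted the backend as healthy as soon as it answered at all. Adding `-f` makes curl exit non-zero (code 22) on HTTP status 400 or higher.
- **The timing.** The longer initial sleep and the ten attempts give the JVM more time to start. The delay between attempts is outside the lines shown.

If the same wait-and-retry pattern is needed elsewhere, it could be factored into a helper. The sketch below assumes bash or a POSIX shell that supports `local`. `retry` is a hypothetical name, not something the workflow defines.

```shell
#!/usr/bin/env bash
# Minimal sketch of a retry helper (hypothetical name, not in the workflow).
# Runs "$@" up to $1 times, sleeping $2 seconds between failed attempts.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "Attempt $i/$attempts failed: $*" >&2
    # No sleep after the last attempt, so the rollback can start immediately.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (assumed 5 s delay; the workflow's real interval is not shown):
# if retry 10 5 docker run --rm --network bilhej-prod_default \
#      curlimages/curl:8.5.0 -sf -o /dev/null \
#      http://bilhej-backend-prod:8080/api/payment/swish-info; then
#   BACKEND_OK=true
# fi
```

Skipping the sleep after the final attempt avoids a pointless wait before the failure branch runs.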
```diff
@@ -93,12 +93,25 @@ jobs:
 if [ "$BACKEND_OK" != "true" ] || [ "$FRONTEND_OK" != "true" ]; then
 echo ""
 echo "═══════════════════════════════════════════════════"
-echo " HEALTH CHECK FAILED — ROLLING BACK DEPLOYMENT"
+echo " HEALTH CHECK FAILED — DIAGNOSTICS"
+echo "═══════════════════════════════════════════════════"
+echo ""
+docker compose -p bilhej-prod -f docker-compose.prod.yml ps
+echo ""
+echo "--- Backend logs ---"
+docker logs bilhej-backend-prod 2>&1 | tail -80 || true
+echo ""
+echo "--- Postgres logs ---"
+docker logs bilhej-postgres-prod 2>&1 | tail -30 || true
+echo ""
+echo "═══════════════════════════════════════════════════"
+echo " ROLLING BACK DEPLOYMENT"
 echo "═══════════════════════════════════════════════════"
 echo ""
 docker compose -p bilhej-prod -f docker-compose.prod.yml down
 echo ""
 echo "Rolled back. Containers stopped. DB volume preserved."
+echo "Read Backend logs above to find the root cause before redeploying."
 exit 1
 fi
```
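The logs must be printed before `docker compose ... down` runs. `down` stops and removes the containers, and `docker logs` only works on a container that still exists.

Two details of the new lines are worth noting:

- **`2>&1`** is needed because `docker logs` replays the container's stderr on its own stderr. Without it, that output would bypass `tail`.
- **`|| true`** only matters when `set -o pipefail` is on. Without pipefail, a pipeline's exit status is that of its last command, which is `tail` here.

The repeated "print a labelled tail of a command" pattern could be captured in a small helper. The sketch below assumes bash or a POSIX shell with `local`. `log_tail` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch of a diagnostics helper (hypothetical name, not in the workflow).
# Prints a labelled header, then the last N lines of a command's combined
# stdout and stderr. It always returns 0, so it is safe to call in the
# failure branch right before the rollback.
log_tail() {
  local title=$1 lines=$2
  shift 2
  printf '%s\n' "--- $title ---"
  # 2>&1 sends stderr into the pipe too; without it, anything the command
  # writes to stderr would bypass tail. If the command itself fails (for
  # example, the container no longer exists), say so instead of aborting.
  { "$@" 2>&1 || echo "(command failed: $*)"; } | tail -n "$lines"
  echo ""
}

# Equivalent of the two log sections added in this commit:
# log_tail "Backend logs" 80 docker logs bilhej-backend-prod
# log_tail "Postgres logs" 30 docker logs bilhej-postgres-prod
```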


```diff
@@ -311,6 +311,20 @@ Before the first deploy, complete these steps on the production server (`srvr.nu
 3. Enter a version tag (e.g., `v0.1.0`).
 4. Click **Run workflow**.
 
+### Deploy failed (backend health check)
+
+If the job passes the frontend check but the backend never becomes healthy:
+
+1. Open the failed job log and read **Backend logs** (printed before rollback).
+2. Match the error to a fix — do not guess:
+- **`password authentication failed`** — DB credentials in the running stack do not match
+what Postgres was initialized with; fix credentials or Postgres password to match (only
+wipe the volume if you accept losing prod data).
+- **`Production requires ADMIN_EMAIL and ADMIN_PASSWORD`** — add those Forgejo secrets.
+- **Flyway / migration errors** — fix schema or migration history before redeploying.
+3. **DBeaver from your laptop** — prod Postgres binds to `127.0.0.1:5433` on the server only.
+Use an SSH tunnel, then host `localhost` port `5433` (not `192.168.0.59` directly).
+
 ### What Happens
 
 | Step | Action |
```
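Step 2 of the new README section is essentially a lookup from a log signature to a fix. Below is a sketch of that lookup as a shell function, assuming the log text arrives on stdin.

- `classify_backend_log` is hypothetical.
- The two quoted error strings come from the README.
- The Flyway pattern is a loose assumption, since Flyway's exact messages vary by version.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): map the backend startup errors listed in the
# README to their fix. Reads log text on stdin, prints a hint, and returns
# 0 if a known signature matched, 1 otherwise.
classify_backend_log() {
  local log
  log=$(cat)
  # Order matters: a failed login can also show up inside a Flyway or
  # connection error, and the credentials fix is the more specific one.
  # Assumption: Flyway failures mention "Flyway" or "migration" somewhere.
  case $log in
    *'password authentication failed'*)
      echo "DB credentials do not match what Postgres was initialized with." ;;
    *'Production requires ADMIN_EMAIL and ADMIN_PASSWORD'*)
      echo "Add the ADMIN_EMAIL and ADMIN_PASSWORD Forgejo secrets." ;;
    *[Ff]lyway*|*[Mm]igration*)
      echo "Fix the schema or migration history before redeploying." ;;
    *)
      echo "No known error matched; read the full backend log."
      return 1 ;;
  esac
}

# Example, e.g. at the start of the workflow's failure branch:
# docker logs bilhej-backend-prod 2>&1 | classify_backend_log || true
```

Matching with `case` instead of `grep -q` in a pipeline avoids a subtle problem under `set -o pipefail`. There, `grep -q` exiting early can make the upstream writer fail with SIGPIPE, which turns a real match into a false negative.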