
Conversation

@david-baylibre

Existing Issue

Fixes #373

Contributor Checklist

  • Variables are documented in the README.md
  • Which branch are you merging into?
    • master is for changes related to the current release of the concourse/concourse:latest image and should be good to publish immediately

Reviewer Checklist

This section is intended for the core maintainers only, to track review progress. Please do not
fill out this section.

  • Code reviewed
  • Topgun tests run
  • Back-port if needed
  • Is the correct branch targeted? (master or dev)

Signed-off-by: David Rozé <droze@baylibre.com>
@taylorsilva
Member

I believe this is not needed because each deployment starts with a fresh work dir for Concourse.

If you look at the statefulset you'll see the workdir is a volume that will persist across deployments:

- name: concourse-work-dir
  mountPath: {{ .Values.concourse.worker.workDir | quote }}

and no such volume mount exists for the deployment:

volumeMounts:
  - name: concourse-keys
    mountPath: {{ .Values.worker.keySecretsPath | quote }}
    readOnly: true
  - name: pre-stop-hook
    mountPath: /pre-stop-hook.sh
    subPath: pre-stop-hook.sh
  {{- if and (not (kindIs "invalid" .Values.secrets.workerAdditionalCerts)) (.Values.secrets.workerAdditionalCerts | toString) }}
  - name: worker-additional-certs
    mountPath: "{{ .Values.worker.certsPath }}/worker-additional-certs.pem"
    subPath: worker-additional-certs.pem
    readOnly: true
  {{- end }}

Looking at your issue #373, it sounds like k8s isn't cleaning up the disk space from the crashed worker container. I don't think your PR here would fix that issue.

@david-baylibre
Author

You're right @taylorsilva, it does not fix #373, which also happens on clean upgrades/updates/restarts.

I suspect Kubernetes is not cleaning up the loop mounts; this also happened on some of my static workers running in Docker on bare machines (outside Kube). I'll dig into it...

@taylorsilva
Member

Maybe a cleanup on shutdown would help with that?
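
Something along these lines, purely as a sketch of the idea (nothing like this exists in the chart today; it assumes CONCOURSE_WORK_DIR is set in the worker container, as the worker config already expects):

# hypothetical "cleanup on shutdown" step, e.g. run from a pre-stop hook
# before the worker process is signalled: wipe the worker's local state so
# the next container start begins from a clean work dir
rm -rf "${CONCOURSE_WORK_DIR:?}"/*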

@taylorsilva
Member

taylorsilva commented Aug 26, 2025

@david-baylibre any progress on this? Wondering if this PR should be closed or not?

@david-baylibre
Author

@taylorsilva loop mounts are the actual problem.

From the node itself:

root@gke-ci-cluster-concourse-dev-wk-01-fc3b878d-wvtz:/# losetup -a; df -h /
/dev/loop0: [2049]:785313 (/concourse-work-dir/volumes.img)
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       745G   18G  727G   3% /

Delete the pod:

# k delete po concourse-worker-7798fb4664-4x46h
pod "concourse-worker-7798fb4664-4x46h" deleted

Check again:

root@gke-ci-cluster-concourse-dev-wk-01-fc3b878d-wvtz:/# losetup -a; df -h /
/dev/loop1: [2049]:2580553 (/concourse-work-dir/volumes.img)
/dev/loop0: [2049]:785313 (/volumes.img (deleted))
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       745G   18G  727G   3% /

Every time I delete the pod, I get an extra loop mount and the disk space isn't released.
The good part is that the deleted ones still show up in the new pod:

$ k exec -it concourse-worker-7798fb4664-75wx6 -- losetup -a
/dev/loop1: [2049]:2580553 (/volumes.img (deleted))
/dev/loop2: [2049]:3096649 (/concourse-work-dir/volumes.img)
/dev/loop0: [2049]:785313 (/volumes.img (deleted))

I would suggest:

  • adding umount ${CONCOURSE_WORK_DIR}/volumes to /pre-stop-hook.sh, before the kill -s, so the container shuts down properly
  • detaching the deleted loop devices on startup, as a postStart hook, to reclaim space from previously crashed workers (rough sketch below)
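
Roughly, and only as a sketch of those two suggestions (it assumes the worker container is privileged enough to run umount and losetup, which the chart does not configure on its own):

# in /pre-stop-hook.sh, before the existing kill -s: release the loop-backed
# volumes filesystem so its loop device can be detached
umount "${CONCOURSE_WORK_DIR}/volumes" || true

# on startup, e.g. from a postStart hook: detach loop devices whose backing
# volumes.img has already been deleted, reclaiming the leaked space
losetup -a | awk -F: '/\(deleted\)/ {print $1}' | xargs -r -n1 losetup -d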

@taylorsilva
Member

Ah okay! Thanks for digging into this and figuring it out. I'm a bit busy with other stuff at the moment, but happy to review any PR that fixes this. Not sure if you want to dust this one off or not?

@david-baylibre
Author

@taylorsilva I did not manage to detach the loop device in the pre-stop hook because the filesystem is still in use at that point; then the pod dies once Concourse is killed and it's too late, the pod is gone...
The only option would be a sidecar container (which isn't very elegant) that monitors the Concourse process and detaches the loop device once the main container has exited. And if that detach went into a pre-stop hook as well, a kubectl delete pod would kill the two containers, leaving the loop device still attached.
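
For the record, the sidecar would have to do something like this (only a sketch; it assumes shareProcessNamespace is enabled so the sidecar can see the worker process, that pgrep is available, and that the sidecar is privileged enough to run losetup):

# hypothetical sidecar entrypoint: wait for the worker to exit, then detach
# loop devices whose backing volumes.img has been deleted
while pgrep -x concourse > /dev/null; do
  sleep 5
done
losetup -a | awk -F: '/\(deleted\)/ {print $1}' | xargs -r -n1 losetup -d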
