Enable cleanUpWorkDirOnStart on workers deployments
#374
base: master
Conversation
Signed-off-by: David Rozé <droze@baylibre.com>
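For context, the change under discussion boils down to turning on the chart's work-dir cleanup for workers that run as a Deployment rather than a StatefulSet. A minimal sketch of enabling it at upgrade time, assuming the chart's existing worker.cleanUpWorkDirOnStart value is what this PR wires into the Deployment template:

```sh
# Illustrative only: enable work-dir cleanup on worker start via the chart value.
helm upgrade concourse concourse/concourse \
  --reuse-values \
  --set worker.cleanUpWorkDirOnStart=true
```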
I believe this is not needed because each deployment starts with a fresh work dir for Concourse.
If you look at the statefulset you'll see the workdir is a volume that will persist across deployments:
concourse-chart/templates/worker-statefulset.yaml, lines 133 to 134 in cae81f3:

```yaml
- name: concourse-work-dir
  mountPath: {{ .Values.concourse.worker.workDir | quote }}
```
and no such volume mount exists for the deployment:
concourse-chart/templates/worker-deployment.yaml, lines 106 to 118 in cae81f3:

```yaml
volumeMounts:
  - name: concourse-keys
    mountPath: {{ .Values.worker.keySecretsPath | quote }}
    readOnly: true
  - name: pre-stop-hook
    mountPath: /pre-stop-hook.sh
    subPath: pre-stop-hook.sh
  {{- if and (not (kindIs "invalid" .Values.secrets.workerAdditionalCerts)) (.Values.secrets.workerAdditionalCerts | toString) }}
  - name: worker-additional-certs
    mountPath: "{{ .Values.worker.certsPath }}/worker-additional-certs.pem"
    subPath: worker-additional-certs.pem
    readOnly: true
  {{- end }}
```
Looking at your issue #373, it sounds like k8s isn't cleaning up the disk space from the crashed worker container. I don't think your PR here would fix that issue.
You're right @taylorsilva, it did not fix #373, which also happens on clean upgrades/updates/restarts. I suspect Kubernetes is not cleaning up loop mounts; the same thing happened on some of my static workers running in Docker on bare machines (outside Kube). I'll dig into it...
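One generic way to check this from the node (illustrative commands, not the exact ones used in this thread) is to compare loop devices and disk usage before and after deleting a worker pod:

```sh
losetup -a                      # list loop devices and their backing files
losetup -a | grep '(deleted)'   # loop devices whose backing file was removed
df -h                           # confirm whether the space was actually freed
```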
Maybe a cleanup on shutdown would help with that?
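A "cleanup on shutdown" could, for instance, mean appending something like this to the mounted pre-stop-hook.sh (a sketch only; CONCOURSE_WORK_DIR is assumed to be set as usual for the worker, and as a later comment explains, the detach fails in practice because the filesystem is still in use while the container is alive):

```sh
# Hypothetical shutdown cleanup: wipe the work dir and detach loop devices
# whose backing files have already been deleted.
rm -rf "${CONCOURSE_WORK_DIR:?}"/*
losetup -a | grep '(deleted)' | cut -d: -f1 | xargs -r -n1 losetup -d
```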
@david-baylibre any progress on this? Wondering if this PR should be closed or not?
@taylorsilva loop mounts are the actual problem. From the node itself:

Delete the pod:

Check again:

Every time I delete the pod, I get an extra loop mount and the disk space isn't released. I would suggest adding:

and detaching deleted volumes on startup as a postStart hook, to reclaim space from previously crashed workers.
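A sketch of that suggestion (illustrative names only, not taken from the PR): a small script that detaches loop devices whose backing file has already been deleted. It could be wired into the worker container as a lifecycle postStart exec hook, much like the chart already mounts pre-stop-hook.sh, and would need a privileged container since loop devices belong to the node:

```sh
#!/bin/sh
# Detach stale loop devices left behind by a previously crashed worker so the
# node can reclaim the space; "(deleted)" marks devices whose backing file is gone.
losetup -a | grep '(deleted)' | cut -d: -f1 | while read -r dev; do
  echo "detaching stale loop device ${dev}"
  losetup -d "${dev}" || true
done
```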
Ah okay! Thanks for digging into this and figuring it out. I'm a bit busy with other stuff at the moment, but happy to review any PR that fixes this. Not sure if you want to dust this one off or not?
@taylorsilva I couldn't detach the loop device in the pre-stop hook because the filesystem is still in use at that point; the pod dies once Concourse is killed, and by then it's too late, the pod is gone...
Existing Issue
Fixes #373
Contributor Checklist
- README.md
- master is for changes related to the current release of the concourse/concourse:latest image and should be good to publish immediately

Reviewer Checklist
- (master or dev)