Add offline backup for foremanctl#507
Conversation
15ba9dd to
c6de1eb
Compare
ianballou
left a comment
There was a problem hiding this comment.
I did an automated test run, take these with a grain of salt, but I wanted to post early in case they're real for extra time to test and fix.
Firstly, the DBs did get backed up, so awesome, but there were some hiccups that stopped the full process from running:
Bug #1: podman_network.yaml fails when no custom networks exist
File: src/playbooks/backup/tasks/podman_network.yaml
Severity: Blocker — prevents backup from completing
Description: The shell command podman network ls --format '{{.Name}}' | grep -v '^podman$' | while read net; do ... returns exit code 1 when there are no custom networks, because grep -v finds no matching lines.
Fix: Add || true after the grep, or use a different approach:
# Option A: tolerate empty result
failed_when: networks_json.rc not in [0, 1]
# Option B: check first, skip if no custom networksBug #2: Wrong Foreman tasks API endpoint
File: src/playbooks/backup/tasks/preflight.yaml
Severity: High — preflight silently skips running task detection
Description: Uses https://{{ fqdn }}/api/v2/tasks?state=running which returns 404. The correct endpoint is https://{{ fqdn }}/foreman_tasks/api/tasks?state=running&search=state%3Drunning. Because failed_when: false is set, the error is silently ignored.
Impact: Backups will proceed even with running Foreman tasks, risking data inconsistency.
Bug #3: pg_isready and pg_dump not available on host
... I cut the output here, I'm not sure why these commands weren't on my box. It's not related to this PR I don't think.
Bug #4: Hardcoded parameters.yaml path in metadata task
File: src/playbooks/backup/tasks/metadata.yaml
Severity: Low — affects metadata accuracy only
Description: ansible.builtin.slurp reads from /var/lib/foremanctl/parameters.yaml but foremanctl's state directory is configurable via OBSAH_STATE. In dev/vagrant setups, the actual path is different (e.g., /vagrant/.var/lib/foremanctl/parameters.yaml).
Impact: enabled_features: [] in metadata despite features being configured. Doesn't affect DB dump correctness.
Fix: Use the state_dir variable instead of hardcoding the path.
|
Will update the tasks endpoint and parameters.yaml path.. About the podman networks, those are created for IOP.. Like https://github.com/theforeman/foremanctl/blob/master/src/roles/iop_network/tasks/main.yaml so I'd assume we have that present in production deployments. We can add some handling for when it's not. |
|
Maybe nitpick but does it make sense to include the not yet implemented flags when doing |
I am fine either way but it's helpful guidance for future PRs and documentation to look at. |
4da7174 to
10c3e00
Compare
|
not a full review (I stopped somewhere around the secrets backup), but overall this feels a lot like "let's write a huge bash script and then wrap it in YAML" and not like Ansible :( |
ce661a7 to
2161467
Compare
| register: backup_pulp_content_backup_check | ||
| failed_when: false | ||
|
|
||
| - name: Build backup metadata |
There was a problem hiding this comment.
If these are all in service of writing the metadata file, you should be able to set them as vars on the tasks:
- name: Write metadata file
ansible.builtin.copy:
content: "{{ backup_metadata | to_nice_yaml }}"
dest: "{{ backup_dir_full }}/metadata.yml"
mode: '0644'
vars:
backup_metadata:
hostname: "{{ ansible_fqdn }}"
...
You could also consider metadata.yml being a template.
There was a problem hiding this comment.
Updated as vars for the task..
| - name: Check for running Foreman tasks | ||
| theforeman.foreman.resource_info: | ||
| server_url: "https://{{ backup_foreman_server_fqdn }}" | ||
| oauth1_consumer_key: "{{ foreman_oauth_consumer_key }}" |
There was a problem hiding this comment.
This is tricky because you are bleeding other role variables into this role which goes against our design principle.
There was a problem hiding this comment.
Moved these to backup_* specific variables in base.yml following pattern for other roles..
4d5f779 to
256d3ba
Compare
|
Should be getting close here..Addressed prior reviews..2 pending reviews around using pulp API instead of pulp db queries for running tasks and documentation for backup.. |
256d3ba to
3a3f554
Compare
|
|
||
| - name: Wait for Pulp tasks to complete (if --wait-for-tasks) | ||
| ansible.builtin.uri: | ||
| url: "https://{{ ansible_fqdn }}/pulp/api/v3/tasks/?state__in=running,waiting" |
There was a problem hiding this comment.
I don't think it's a blocker, but we could use pulp.squeezer here https://galaxy.ansible.com/ui/repo/published/pulp/squeezer/content/module/task/?keywords=task
| - name: Backup pulp content directory with encryption keys # noqa: command-instead-of-module | ||
| ansible.builtin.command: | ||
| cmd: > | ||
| tar -czf {{ backup_dir_full }}/pulp-content.tar.gz |
There was a problem hiding this comment.
I think it makes sense to always use --listed-incremental with tar here so that the incremental metadata is available from the start even before we fully support incremental backup.
Foreman maintain backup passes the incremental flag to tar regardless of if the incremental flag was passed to foreman maintain: https://github.com/theforeman/foreman_maintain/blob/344d337491072e6bbed1b7987284d8f41889cbf9/definitions/procedures/backup/pulp.rb#L36-L44
When we do add support for incremental backups, then we can copy around the .snar files and use them properly.
There was a problem hiding this comment.
It is interconnected as in you'd need a full backup as level 0 backup and the snar and then move to incremental on top of it. Implementation wise, I'd propose getting there as part of incremental and scoping this PR to full backups only?
Implements comprehensive offline backup functionality for Foreman deployments: - Backs up all databases (foreman, candlepin, pulp, 5 IOP DBs) - Backs up podman secrets, networks, volumes, quadlet files - Backs up systemd units and foremanctl state - Includes metadata with container image digests for restore compatibility - Preflight checks for running tasks and database integrity (amcheck) - Automatic service restoration on failure Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
3a3f554 to
8db6812
Compare
| @@ -0,0 +1,207 @@ | |||
| # Backup | |||
There was a problem hiding this comment.
@ehelms Added user documentation for backup..We can reuse and update the file as we develop restore command and expand the backup feature.
I was not sure if we needed any developer documentation for this or backup/restore in general.
Implements comprehensive offline backup functionality for Foreman deployments:
Why are you introducing these changes? (Problem description, related links)
What are the changes introduced in this pull request?
How to test this pull request
I got a foremanctl box with normal deploy. On this box, clone foremanctl repo and checkout this branch.
cd /root/foremanctl
source .venv/bin/activate
export OBSAH_STATE=/var/lib/foremanctl
Then try ./foremanctl --help
Also,
The help section has placeholders for incremental, online and --tar-volume-size which will be implemented in follow up cards/PRs.
You can run :
Command:
I have a dummy restore script which can be used for testing and also getting steps to run manually when testing.:
Download the script
wget https://gist.githubusercontent.com/sjha4/35d98b318f15753a678a406fb0fb14ad/raw/test-restore-final.sh
Make it executable
chmod +x test-restore-final.sh
Run restore
./test-restore-final.sh /path/to/backup/foreman-backup-TIMESTAMP
Steps to reproduce:
Checklist