Skip to content

fix: fall back to lease file when DHCPLeases API is empty#190

Open
ajmeese7 wants to merge 8 commits into
fog:masterfrom
ajmeese7:fix/leasefile-fallback-for-external-dhcp
Open

fix: fall back to lease file when DHCPLeases API is empty#190
ajmeese7 wants to merge 8 commits into
fog:masterfrom
ajmeese7:fix/leasefile-fallback-for-external-dhcp

Conversation

@ajmeese7
Copy link
Copy Markdown
Contributor

@ajmeese7 ajmeese7 commented Apr 1, 2026

Summary

  • When a libvirt network uses an external dnsmasq (not managed by libvirt), the DHCPLeases API returns empty and the VM's IP address can't be discovered. This adds a fallback that reads the dnsmasq lease file directly from /var/lib/libvirt/dnsmasq/<network>.leases.
  • Common scenario: dnsmasq running inside a network namespace on WSL2 to work around port 67 conflicts with Windows ICS.
  • Includes hardening (error handling for file permission/race conditions, warning log on fallback) and a minitest suite for the new method.

Changes

  • lib/fog/libvirt/models/compute/server.rb
    • New ip_address_from_leasefile(net, mac) private method parses dnsmasq lease files, matches by MAC, returns IP with latest expiry
    • addresses() calls the fallback when dhcp_leases returns nil and a network exists
    • DNSMASQ_LEASE_DIR constant for the lease file directory
    • Fog::Logger.warning when the fallback path is entered (emitted only once per MAC)
    • Errno::EACCES/Errno::ENOENT rescue around file reads
    • Removed unused DOMAIN_CLEANUP_REGEXP constant
    • Retain Tempfile reference in generate_config_iso_in_dir to prevent GC from deleting the ISO file before File.size reads it (fixes flaky CI ENOENT failure)
  • minitests/server/leasefile_fallback_test.rb — 8 test cases covering happy path, expiry selection, case-insensitive MAC, missing file, malformed lines, no match, permission error, and nil network name

Test plan

  • bundle exec rake minitest passes
  • Manual verification on a WSL2 host with external dnsmasq in a network namespace
  • Verify existing public_ip_address behavior is unchanged when libvirt DHCP is active (bundle exec rake test)

The libvirt DHCPLeases API returns no results when the network was
defined without a <dhcp> block, even if an external dnsmasq is
serving DHCP on the bridge. This occurs on WSL2 with mirrored
networking where port 67 is held by Windows ICS, forcing dnsmasq
to run in an isolated network namespace outside libvirt's control.

Add ip_address_from_leasefile() as a fallback in addresses() that
reads the dnsmasq lease file directly, matching by MAC and picking
the lease with the latest expiry — mirroring the existing
dhcp_leases logic.
- Extract lease directory to DNSMASQ_LEASE_DIR constant
- Remove bare rescue on net.name (fog attribute won't raise)
- Handle Errno::EACCES/ENOENT during lease file reads
- Log warning when falling back to dnsmasq lease file
- Remove unused DOMAIN_CLEANUP_REGEXP constant
- Add minitest suite for ip_address_from_leasefile
The addresses() method is polled in a tight loop while waiting for
a VM to get an IP. The warning fired on every call, flooding logs
with dozens of identical lines.
The Tempfile object was immediately discarded after calling .path,
making it eligible for garbage collection. When GC finalized the
object, the underlying ISO file was deleted before File.size could
read it, causing flaky ENOENT failures in CI.
Copy link
Copy Markdown
Contributor

@ekohl ekohl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a networked setup this won't work and I'm not sure this is something I'd want to support. This gem is currently struggling for maintainers and all added code needs to be maintained. I'm inclined to reject this PR.

Comment thread lib/fog/libvirt/models/compute/server.rb
Address PR fog#190 review:

- Only read the dnsmasq lease file when service.uri is local. For a remote
  URI (e.g. qemu+ssh) the lease file lives on the libvirt host, not the
  client, so the old code would read the wrong machine's filesystem. This
  was the maintainer's core objection.
- Extract warn-once logic into #warn_leasefile_fallback using an instance
  var instead of a class var (Style/ClassVars).
- Split parallel assignment in the lease parser (Style/ParallelAssignment).
- Use Process.uid.zero? in the test (Style/NumericPredicate).
- Add coverage for both the local-connection (uses leasefile) and
  remote-connection (skips leasefile) paths through #addresses.
@ajmeese7
Copy link
Copy Markdown
Contributor Author

Thanks for the review and for being so attentive, you make a good point. I just introduced a guard so the fallback only runs for a local connection (!service.uri.remote?). For any qemu+ssh:// or qemu+tcp:// URI it's skipped and the behavior remains unchanged. It also only fires when the DHCPLeases API returns nothing, so existing local setups with libvirt-managed DHCP never hit it either.

If this still isn't the direction you are looking for I understand, but I will say that this change was critical for me when I was using this library for a production use case, so I do find value in it. I'm happy to keep working with you to find a way forward if you are open to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants