Multi depend fix #474

Open
troglobit wants to merge 6 commits into master from multi-depend

@troglobit
Collaborator

In a setup like this, when netd is marked dirty and subsequently is reloaded using initctl reload, zebra is properly restarted, but staticd isn't:

mgmtd <!> ← netd <pid/mgmtd> ← zebra <!pid/netd> ← staticd <!pid/zebra>

Finit must invalidate the condition of zebra so that staticd is restarted as well. This guards against daemons like zebra that may fail to clean up their pidfiles.
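
To make the expected propagation concrete, here is a minimal sketch that models the chain above as a graph and lists the services that transitively depend on a reloaded one. It is only an illustration, not how Finit tracks conditions internally; `DEPENDS_ON` and `affected_by` are hypothetical names.

```python
from collections import deque

# Hypothetical model of the chain above: service -> upstream services
# whose pid condition it depends on.
DEPENDS_ON = {
    "netd": ["mgmtd"],
    "zebra": ["netd"],
    "staticd": ["zebra"],
}


def affected_by(reloaded):
    """Return the services that transitively depend on `reloaded`, in BFS order."""
    # Invert the map: upstream -> direct dependents.
    dependents = {}
    for svc, upstreams in DEPENDS_ON.items():
        for up in upstreams:
            dependents.setdefault(up, []).append(svc)

    seen, order, queue = set(), [], deque([reloaded])
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


print(affected_by("netd"))  # ['zebra', 'staticd']
```

In this model a reload of netd should reach both zebra and staticd; the bug described here is that only the first hop was being handled.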

troglobit requested a review from mattiaswal on February 12, 2026 09:57
@mattiaswal
Collaborator

mattiaswal commented Feb 12, 2026

Still an issue on Infix:

finit[1]: cond_set_oneshot():service/netopeer/running
finit[1]: cond_set_oneshot_noupdate():service/netopeer/running => /run/finit/cond/service/netopeer/running
finit[1]: service_step():Reassert netopeer ready condition
finit[1]: cond_set_noupdate():service/netopeer/ready
finit[1]: cond_set_path():/run/finit/cond/service/netopeer/ready <= 2
finit[1]: service_step():            netopeer(4576): ->  running
finit[1]: service_step():            netopeer(4576):  running  enabled/clean   cond:on  
finit[1]: cond_set_noupdate():service/netopeer/ready
finit[1]: cond_set_path():/run/finit/cond/service/netopeer/ready <= 2
finit[1]: service_step():                netd(4673):   paused  enabled/updated cond:on  
finit[1]: cond_set_oneshot():service/netd/running
finit[1]: cond_set_oneshot_noupdate():service/netd/running => /run/finit/cond/service/netd/running
finit[1]: service_step():                netd(4673): ->  running
finit[1]: service_step():                netd(4673):  running  enabled/updated cond:on  
finit[1]: service_reload():netd[4673], sending SIGHUP
finit[1]: netd[4673], sending SIGHUP ...
finit[1]: service_step():                rauc(4577):  running  enabled/clean   cond:on  
finit[1]: service_step():          resolvconf(   0):     done  enabled/clean   cond:on  
finit[1]: service_step():               statd(4547):   paused  enabled/clean   cond:on  
finit[1]: cond_set_oneshot():service/statd/running
finit[1]: cond_set_oneshot_noupdate():service/statd/running => /run/finit/cond/service/statd/running
finit[1]: service_step():Reassert condition pid/statd
finit[1]: cond_set_path():/run/finit/cond/pid/statd <= 2
finit[1]: service_step():Reassert statd ready condition
finit[1]: cond_set_noupdate():service/statd/ready
finit[1]: cond_set_path():/run/finit/cond/service/statd/ready <= 2
finit[1]: service_step():               statd(4547): ->  running
finit[1]: service_step():               statd(4547):  running  enabled/clean   cond:on  
finit[1]: service_step():             staticd(6414):   paused  enabled/clean   cond:flux
finit[1]: service_step():             syslogd(2734):  running  enabled/clean   cond:on  
finit[1]: service_step():           watchdogd(2735):  running  enabled/clean   cond:on  
finit[1]: service_step():               zebra(6410):   paused  enabled/clean   cond:flux
finit[1]: service_step():                mdns(6398):  running  enabled/clean   cond:on  
finit[1]: service_step():               lldpd(4602):  running  enabled/clean   cond:on  
finit[1]: service_step():           netbrowse(6404):  running  enabled/clean   cond:on  
finit[1]: service_step():               nginx(4629):  running  enabled/clean   cond:on  
finit[1]: service_step():            rousette(4631):   paused  enabled/clean   cond:on  
finit[1]: cond_set_oneshot():service/rousette/running
finit[1]: cond_set_oneshot_noupdate():service/rousette/running => /run/finit/cond/service/rousette/running
finit[1]: service_step():Reassert rousette ready condition
finit[1]: cond_set_noupdate():service/rousette/ready
finit[1]: cond_set_path():/run/finit/cond/service/rousette/ready <= 2
finit[1]: service_step():            rousette(4631): ->  running
finit[1]: service_step():            rousette(4631):  running  enabled/clean   cond:on  
finit[1]: cond_set_noupdate():service/rousette/ready
finit[1]: cond_set_path():/run/finit/cond/service/rousette/ready <= 2
finit[1]: service_step():                sshd(4648):   paused  enabled/clean   cond:on  
finit[1]: cond_set_oneshot():service/sshd/running
finit[1]: cond_set_oneshot_noupdate():service/sshd/running => /run/finit/cond/service/sshd/running
finit[1]: service_step():Reassert condition pid/sshd
finit[1]: cond_set_path():/run/finit/cond/pid/sshd <= 2
finit[1]: service_step():Reassert sshd ready condition
finit[1]: cond_set_noupdate():service/sshd/ready
finit[1]: cond_set_path():/run/finit/cond/service/sshd/ready <= 2
finit[1]: service_step():                sshd(4648): ->  running
finit[1]: service_step():                sshd(4648):  running  enabled/clean   cond:on  
finit[1]: service_step():                ttyd(4649):  running  enabled/clean   cond:on  
finit[1]: cond_set_oneshot():nop
finit[1]: service_step():          resolvconf(   0):     done  enabled/clean   cond:on  
finit[1]: sm_step():Update configuration generation of device conditions ...
finit[1]: sm_step():Update configuration generation of unmodified non-native services ...
finit[1]: cond_set_noupdate():service/udevd/ready
finit[1]: cond_set_path():/run/finit/cond/service/udevd/ready <= 2
finit[1]: cond_set_noupdate():service/dbus/ready
finit[1]: cond_set_path():/run/finit/cond/service/dbus/ready <= 2
finit[1]: cond_set_noupdate():service/netopeer/ready
finit[1]: cond_set_path():/run/finit/cond/service/netopeer/ready <= 2
finit[1]: cond_set_noupdate():service/rousette/ready
finit[1]: cond_set_path():/run/finit/cond/service/rousette/ready <= 2
finit[1]: sm_step():Reconfiguration done
finit[1]: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
finit[1]: service_step():               udevd(1470):  running  enabled/clean   cond:on  
finit[1]: cond_set_noupdate():service/udevd/ready
finit[1]: cond_set_path():/run/finit/cond/service/udevd/ready <= 2
finit[1]: service_step():                dbus(2738):  running  enabled/clean   cond:on  
finit[1]: cond_set_noupdate():service/dbus/ready
finit[1]: cond_set_path():/run/finit/cond/service/dbus/ready <= 2
finit[1]: service_step():               confd(3728):  running  enabled/clean   cond:on  
finit[1]: service_step():             dnsmasq(2743):  running  enabled/clean   cond:on  
finit[1]: service_step():              tty:S0(4572):  running  enabled/clean   cond:on  
finit[1]: service_step():            tty:hvc0(4573):  running  enabled/clean   cond:on  
finit[1]: service_step():               iitod(2768):  running  enabled/clean   cond:on  
finit[1]: service_step():              klishd(4574):  running  enabled/clean   cond:on  
finit[1]: service_step():          mdns-alias(6400):  running  enabled/clean   cond:on  
finit[1]: service_step():               mgmtd(4575):  running  enabled/clean   cond:on  
finit[1]: service_step():               mstpd(   0):  stopped disabled/clean   cond:on  
finit[1]: service_step():            netopeer(4576):  running  enabled/clean   cond:on  
finit[1]: cond_set_noupdate():service/netopeer/ready
finit[1]: cond_set_path():/run/finit/cond/service/netopeer/ready <= 2
finit[1]: service_step():                netd(4673):  running  enabled/clean   cond:on  
finit[1]: service_step():                rauc(4577):  running  enabled/clean   cond:on  
finit[1]: service_step():          resolvconf(   0):     done  enabled/clean   cond:on  
finit[1]: service_step():               statd(4547):  running  enabled/clean   cond:on  
finit[1]: service_step():             staticd(6414):   paused  enabled/clean   cond:flux
finit[1]: service_step():             syslogd(2734):  running  enabled/clean   cond:on  
finit[1]: service_step():           watchdogd(2735):  running  enabled/clean   cond:on  
finit[1]: service_step():               zebra(6410):   paused  enabled/clean   cond:flux
finit[1]: service_step():                mdns(6398):  running  enabled/clean   cond:on  
finit[1]: service_step():               lldpd(4602):  running  enabled/clean   cond:on  
finit[1]: service_step():           netbrowse(6404):  running  enabled/clean   cond:on  
finit[1]: service_step():               nginx(4629):  running  enabled/clean   cond:on  
finit[1]: service_step():            rousette(4631):  running  enabled/clean   cond:on  
finit[1]: cond_set_noupdate():service/rousette/ready
finit[1]: cond_set_path():/run/finit/cond/service/rousette/ready <= 2
finit[1]: service_step():                sshd(4648):  running  enabled/clean   cond:on  
finit[1]: service_step():                ttyd(4649):  running  enabled/clean   cond:on  
finit[1]: pidfile_update_conds():path: /run/netd.pid, mask: 00000004
finit[1]: pidfile_update_conds():Found svc netd for /run/netd.pid with pid 4673
finit[1]: pidfile_update_conds():Setting netd PID file to /run/netd.pid
finit[1]: cond_set_noupdate():pid/netd
finit[1]: cond_set_path():/run/finit/cond/pid/netd <= 2
finit[1]: cond_update():pid/netd: match <pid/netd> Zebra routing daemon(zebra)
finit[1]: service_step():               zebra(6410):   paused  enabled/clean   cond:on  
finit[1]: cond_set_oneshot():service/zebra/running
finit[1]: cond_set_oneshot_noupdate():service/zebra/running => /run/finit/cond/service/zebra/running
finit[1]: service_step():Reassert condition pid/zebra
finit[1]: cond_set_path():/run/finit/cond/pid/zebra <= 2
finit[1]: service_step():Reassert zebra ready condition
finit[1]: cond_set_noupdate():service/zebra/ready
finit[1]: cond_set_path():/run/finit/cond/service/zebra/ready <= 2
finit[1]: service_step():               zebra(6410): ->  running
finit[1]: service_step():               zebra(6410):  running  enabled/clean   cond:on  
finit[1]: cond_set_noupdate():service/netd/ready
finit[1]: cond_set_path():/run/finit/cond/service/netd/ready <= 2

admin@R1:~$ initctl 
finit[1]: api_cb():get runlevel
PID   IDENT       STATUS   RUNLEVELS     DESCRIPTION                                                                                                         
1470  udevd       running  [S-12345-789] Device event daemon (udev)
2738  dbus        running  [S-123456789] D-Bus message bus daemon
3728  confd       running  [S-12345----] Configuration daemon
2743  dnsmasq     running  [S-12345----] DHCP/DNS proxy
4572  tty:S0      running  [--12345-789] Getty on ttyS0
4573  tty:hvc0    running  [--12345-789] Getty on hvc0
2768  iitod       running  [S0123456789] LED daemon
4574  klishd      running  [---2345----] CLI backend daemon
6400  mdns-alias  running  [---2345----] mDNS alias advertiser 
4575  mgmtd       running  [---2345----] FRR MGMT daemon
0     mstpd       stopped  [S0123456789] Spanning Tree daemon
4576  netopeer    running  [--12345----] NETCONF server
4673  netd        running  [S-12345----] Network route daemon
4577  rauc        running  [---2345----] Software update service
0     resolvconf  done     [S-12345----] Update DNS configuration
4547  statd       running  [S-12345----] Status daemon
6414  staticd     paused   [---2345----] Static routing daemon
2734  syslogd     running  [S0123456789] System log daemon
2735  watchdogd   running  [S0123456789] System watchdog daemon
6410  zebra       running  [---2345----] Zebra routing daemon
6398  mdns        running  [---2345----] Avahi mDNS-SD daemon
4602  lldpd       running  [---2345----] LLDP daemon (IEEE 802.1ab)
6404  netbrowse   running  [---2345----] Network browser
4629  nginx       running  [---2345----] Web server
4631  rousette    running  [--12345----] RESTCONF server
4648  sshd        running  [---2345----] OpenSSH daemon
4649  ttyd        running  [---2345----] Web terminal daemon (ttyd)
admin@R1:~$ 
admin@R1:~$ initctl show zebra
service <!pid/netd> pid:!/run/frr/zebra.pid env:-/etc/default/zebra  \
	[2345] zebra $ZEBRA_ARGS -- Zebra routing daemon
admin@R1:~$ 

@mattiaswal
Collaborator

mattiaswal commented Feb 12, 2026

Also reproduced it with an earlier version of FRR and a simpler dependency chain, just zebra -> staticd:

admin@tauri:~$ initctl --version
Finit 4.15
admin@tauri:~$ initctl 
PID   IDENT                         STATUS   RUNLEVELS     DESCRIPTION                                                                                        
1357  udevd                         running  [S-12345-789] Device event daemon (udev)
2682  dbus                          running  [S-123456789] D-Bus message bus daemon
3586  confd                         running  [S-12345----] Configuration daemon
4785  dnsmasq                       running  [S-12345----] DHCP/DNS proxy
4786  tty:0                         running  [--12345-789] Getty on tty0
4787  tty:S0                        running  [--12345-789] Getty on ttyS0
2722  iitod                         running  [S0123456789] LED daemon
4788  klishd                        running  [---2345----] CLI backend daemon
4801  mdns-alias                    running  [---2345----] mDNS alias advertiser 
0     mstpd                         stopped  [S0123456789] Spanning Tree daemon
4790  netopeer                      running  [--12345----] NETCONF server
4792  rauc                          running  [---2345----] Software update service
0     resolvconf                    done     [S-12345----] Update DNS configuration
4725  statd                         running  [S-12345----] Status daemon
0     staticd                       waiting  [---2345----] Static routing daemon
2678  syslogd                       running  [S0123456789] System log daemon
2679  watchdogd                     running  [S0123456789] System watchdog daemon
4799  zebra                         running  [---2345----] Zebra routing daemon
4800  mdns                          running  [---2345----] Avahi mDNS-SD daemon
4810  chronyd                       running  [---2345----] Chrony NTP v3/v4 daemon
0     dhcp-client:wan               waiting  [---2345----] DHCP client @wan
4813  firewalld                     running  [---2345----] Firewall daemon
4816  hostapd:radio0                running  [---2345----] Wi-Fi Access Point @radio0
4822  hostapd:radio1                running  [---2345----] Wi-Fi Access Point @radio1
4825  lldpd                         running  [---2345----] LLDP daemon (IEEE 802.1ab)
4827  netbrowse                     running  [---2345----] Network browser
4829  nginx                         running  [---2345----] Web server
4832  rousette                      running  [--12345----] RESTCONF server
4837  sshd                          running  [---2345----] OpenSSH daemon
4838  ttyd                          running  [---2345----] Web terminal daemon (ttyd)
4839  wpa_supplicant:wifi0-station  running  [---2345----] Wi-Fi Station @wifi0-station

initctl cond
PID   IDENT                         STATUS  CONDITION (+ ON, ~ FLUX, - OFF)                                                                                   
2682  dbus                          on      <+pid/syslogd>
3586  confd                         on      <+run/bootstrap/success>
4785  dnsmasq                       on      <+pid/syslogd>
2722  iitod                         on      <+usr/product,+usr/led>
4801  mdns-alias                    on      <+service/mdns/running>
4790  netopeer                      on      <+pid/confd>
4792  rauc                          on      <+service/dbus/running>
0     resolvconf                    on      <+usr/bootstrap>
4725  statd                         on      <+pid/confd>
0     staticd                       off     <-pid/zebra>
2678  syslogd                       on      <+run/udevadm:post/success>
0     dhcp-client:wan               off     <-net/wan/running>
4813  firewalld                     on      <+pid/syslogd>
4829  nginx                         on      <+usr/mkcert>
4832  rousette                      on      <+pid/confd>
4837  sshd                          on      <+pid/syslogd>

In a setup like this, when 'netd' is marked dirty and subsequently is
reloaded, e.g., using 'initctl reload', zebra is properly restarted,
but staticd isn't:

mgmtd <!> ← netd <pid/mgmtd> ← zebra <!pid/netd> ← staticd <!pid/zebra>

Finit must invalidate the condition of zebra to trigger a restart also
of staticd.  This guards against daemons like zebra that may fail to
clean up their pidfiles.

Fixes #475

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
@troglobit
Collaborator Author

@mattiaswal new fix, and updated test to better mimic your actual use-case

When 'initctl reload' is called after marking a service in a dependency
chain dirty, Finit fails to restart (unfreeze) affected services.

This patch updates the pidfile plugin to watch for IN_ATTRIB changes,
e.g. when a process uses utimensat() to update its pidfile, and adds
service_step_all() at end of reload cycle to guarantee convergence
after conditions are reasserted.
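
As a rough illustration of why IN_ATTRIB matters: a daemon that only touches its pidfile changes the file's timestamps but not its contents, which inotify(7) documents as a metadata (IN_ATTRIB) event. The sketch below does not use inotify itself; it only shows, via stat, that such a touch leaves content and size unchanged. `touch_pidfile`, `demo`, and the pidfile name are hypothetical, and `os.utime()` is assumed to map to utimensat() as it typically does on Linux.

```python
import os
import tempfile


def touch_pidfile(path):
    """Bump the pidfile timestamps without rewriting the file."""
    st = os.stat(path)
    # Move the timestamps one second ahead so the change is visible
    # regardless of filesystem timestamp granularity.
    new_ns = st.st_mtime_ns + 1_000_000_000
    os.utime(path, ns=(new_ns, new_ns))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "netd.pid")  # hypothetical pidfile
        with open(path, "w") as f:
            f.write("4673\n")

        before = os.stat(path)
        touch_pidfile(path)
        after = os.stat(path)

        with open(path) as f:
            content = f.read()

        return {
            "content_unchanged": content == "4673\n",
            "size_unchanged": before.st_size == after.st_size,
            "mtime_changed": after.st_mtime_ns != before.st_mtime_ns,
        }


print(demo())
```

A watcher that only reacts to content writes would see nothing here, which is consistent with the `mask: 00000004` (IN_ATTRIB) reported by `pidfile_update_conds()` in the log above.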

Issue #476

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Conditions in Finit are dependencies: if A is asserted, service B is
allowed to run.  When A goes through FLUX (e.g., upstream reloads),
dependents are PAUSED and then simply resumed when the condition is
reasserted -- this is the correct behavior for barrier-style deps
like <pid/syslogd>.

However, some setups have tightly coupled services where dependents
must be reloaded/restarted when an upstream service reloads, not just
resumed.  E.g., the FRR routing stack on Infix OS:

    netd <pid/mgmtd> ← zebra <!pid/netd> ← {staticd,ripd} <!pid/zebra>

When netd reloads (SIGHUP), zebra and its dependents must be restarted
to pick up the new configuration.

The new '~' condition prefix marks a dependency as flux-sensitive:

    service <!~pid/netd> name:zebra ...

When the upstream condition goes FLUX and returns to ON, the dependent
is reloaded (SIGHUP) or restarted (noreload '!') instead of merely
resumed.  Transitivity follows naturally through the condition chain.
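
The rule above can be sketched as a small decision function. This is a simplified model under stated assumptions, not Finit's parser: it assumes a leading '!' inside the angle brackets means noreload, that '~' prefixes an individual condition, and that conditions are comma-separated. `parse_cond` and `action_on_reassert` are hypothetical names.

```python
def parse_cond(spec):
    """Parse e.g. '<!~pid/netd>' into (noreload, [(condition, flux_sensitive), ...])."""
    body = spec.strip()[1:-1]  # drop the surrounding '<' and '>'
    noreload = body.startswith("!")
    if noreload:
        body = body[1:]

    conds = []
    for item in body.split(","):
        item = item.strip()
        if not item:
            continue
        sensitive = item.startswith("~")
        conds.append((item.lstrip("~"), sensitive))
    return noreload, conds


def action_on_reassert(spec, cond):
    """Action for a paused dependent when `cond` goes FLUX and returns to ON."""
    noreload, conds = parse_cond(spec)
    for name, sensitive in conds:
        if name == cond:
            if not sensitive:
                return "resume"  # barrier-style dependency
            return "restart" if noreload else "reload"
    return "none"  # the service does not depend on this condition


print(action_on_reassert("<!~pid/netd>", "pid/netd"))
print(action_on_reassert("<pid/syslogd>", "pid/syslogd"))
```

With this model, zebra declared as `<!~pid/netd>` is restarted when netd reloads, while a service with a plain `<pid/syslogd>` dependency is only resumed.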

Closes #416
Closes #476

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>