Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions doc/ChangeLog.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,13 +18,26 @@ All relevant changes are documented in this file.
systemd `RemainAfterExit=yes`, by Aaron Andersen
- Clear service conditions on `initctl reload NAME` to ensure dependent
services are properly updated
- Add `~` condition prefix to propagate reload from a dependency to the
dependent service. E.g., `<!~pid/netd>` means not only a regular
condition, but when `netd` reloads, restart this service too. Similar
to systemd's directive `PropagatesReloadTo=`, but declared on the
consumer side. Issue #416

### Fixes
- Fix #464: invalid user:group examples in cgroups.md
- Fix #466: elogind path for Debian-based distros, by Jackie Liu
- Fix #467: TTY services stuck in restart state after non-zero exit.
Throttling logic introduced in v4.15 had duplicate checks causing
infinite timer loop, and TTYs lacked default restart timeout
- Fix #475: clear pid condition on service collection to fix stale
deps. When a service crashes (SIGKILL), the RUNNING → HALTED path
bypasses STOPPING where `cond_clear()` is normally called, leaving
dependents stuck
- Fix #476: dependents not restarted after SIGHUP reload of service in
dependency chain. Add `service_step_all()` at end of reload cycle to
guarantee convergence after conditions are reasserted. See also the
new `~` condition prefix (above) to propagate reload to dependents
- Fix handling of already-mounted cgroups in `cgroup_init()`, can occur
after switch_root or in container environments
- Improve cgroups documentation clarity, grammar, and examples
Expand Down
37 changes: 33 additions & 4 deletions doc/conditions.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,11 +14,11 @@ specified separated by comma. Multiple conditions are logically AND'ed
during evaluation, i.e. all conditions must be satisfied in order for a
service to run.

A special syntax, using a leading `!` in run/task/service conditions,
denote if a:
Two special prefixes can be used inside the angle brackets:

- service does not support `SIGHUP`
- run/task should not block runlevel changes (i.e., bootstrap)
- `!` -- service does not support `SIGHUP` (noreload), or run/task
should not block runlevel changes (i.e., bootstrap)
- `~` -- propagate reload from this dependency, see below

Finit guarantees by default that all run/tasks run (at least) once
per runlevel. For most tasks this is a good default, for example
Expand Down Expand Up @@ -53,6 +53,30 @@ moving to the next runlevel after bootstrap, so we set `<!>`:
task [S0123456789] <!sys/pwr/fail> name:pwrfail initctl poweroff -- Power failure, shutting down


Propagating Reload in Dependencies
-----------------------------------

By default, when a service reloads during `initctl reload`, dependent
services are paused (`SIGSTOP`) and simply resumed (`SIGCONT`) when the
condition is reasserted. This is correct for barrier-style dependencies
like `<pid/syslogd>`, where dependents just need syslogd running and do
not care if it reloads its config.

For services that need to react when their upstream reloads, the `~`
prefix propagates the reload from the dependency:

service <pid/svc_a> name:svc_b /sbin/svc_b -- Needs A (barrier)
service <!~pid/svc_b> name:svc_c /sbin/svc_c -- Propagate reload from B

Here, `<~pid/svc_b>` means: propagate a reload of `svc_b` to `svc_c`.
When `svc_b` reloads, `svc_c` will be restarted (because of `!`,
noreload) instead of merely resumed. If `svc_c` supported `SIGHUP`
(no `!` prefix), it would be sent `SIGHUP` instead.

This is similar to systemd's `PropagatesReloadTo=` directive, but
declared on the consumer side rather than the provider side.


Triggering
----------

Expand Down Expand Up @@ -281,6 +305,11 @@ This STOP/CONT handling minimizes the number of unnecessary service
restarts that would otherwise occur because a depending service was sent
`SIGHUP` for example.

Services with the `~` prefix are an exception to this rule: when their
conditions return to `on` after being in `flux`, the reload is propagated
-- the service is reloaded (SIGHUP) or restarted (noreload `!`) instead
of simply being resumed.

Therefore, any plugin that supplies Finit with conditions must ensure
that their state is updated after each reconfiguration. This can be
done by binding to the `HOOK_SVC_RECONF` hook. For an example of how
Expand Down
8 changes: 8 additions & 0 deletions doc/config/services.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,14 @@ waits for another service, `zebra`, to have created its PID file in
*all* files in `/var/run`, for each file named `*.pid`, or `*/pid`,
Finit opens it and find the matching `NAME:ID` using the PID.

The condition can be prefixed with `!` and/or `~`:

- `<!pid/zebra>` -- `ospfd` does not support `SIGHUP` (noreload)
- `<~pid/zebra>` -- propagate reload from `zebra` to `ospfd`
- `<!~pid/zebra>` -- both: noreload and propagate reload

For details, see the [Finit Conditions](../conditions.md) document.

Some services do not maintain a PID file and rather than patching each
application Finit provides a workaround. A `pid` keyword can be set
to have Finit automatically create (when starting) and later remove
Expand Down
3 changes: 2 additions & 1 deletion plugins/pidfile.c
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@ static int pidfile_add_path(struct iwatch *iw, char *path)
}
}

return iwatch_add(iw, path, IN_ONLYDIR | IN_CLOSE_WRITE);
return iwatch_add(iw, path, IN_ONLYDIR | IN_CLOSE_WRITE | IN_ATTRIB);
}

static void pidfile_update_conds(char *dir, char *name, uint32_t mask)
Expand Down Expand Up @@ -276,6 +276,7 @@ static void pidfile_reconf(void *arg)
if (cond_get(cond) == COND_ON)
continue;

dbg("Reassert condition %s", cond);
cond_set_path(cond_path(cond), COND_ON);
}

Expand Down
13 changes: 13 additions & 0 deletions src/conf.c
Original file line number Diff line number Diff line change
Expand Up @@ -745,8 +745,21 @@ void conf_parse_cond(svc_t *svc, char *cond)
return;
}

/*
* The '~' prefix means a reload of the upstream service is
* propagated to this service -- it will be reloaded (SIGHUP)
* or restarted (noreload), not just paused and resumed.
* Syntax: <!~pid/foo,~pid/bar> or <~pid/foo>
*
* Strip '~' from each condition and set the service-level
* flag. This allows '~' on any condition in the list.
*/
svc->cond[0] = 0;
for (i = 0, c = strtok(ptr, ","); c; c = strtok(NULL, ","), i++) {
if (c[0] == '~') {
c++;
svc->flux_reload = 1;
}
devmon_add_cond(c);
if (i)
strlcat(svc->cond, ",", sizeof(svc->cond));
Expand Down
39 changes: 33 additions & 6 deletions src/service.c
Original file line number Diff line number Diff line change
Expand Up @@ -1093,6 +1093,7 @@ static void service_notify_stop(svc_t *svc)
*/
static void service_cleanup(svc_t *svc)
{
char cond[MAX_COND_LEN];
char *fn;

/* PID collected, cancel any pending SIGKILL */
Expand All @@ -1103,6 +1104,18 @@ static void service_cleanup(svc_t *svc)
logit(LOG_CRIT, "Failed removing service %s pidfile %s",
svc_ident(svc, NULL, 0), fn);

/*
* Invalidate the pid/ condition for this service to ensure
* dependent services are properly stopped and restarted.
* Without this, the condition is only cleared asynchronously
* via inotify on pidfile removal, which may not trigger when
* the daemon fails to clean up its own pidfile, or when the
* service dies during a reload cycle and goes directly from
* RUNNING to HALTED (skipping STOPPING where cond_clear()
* is normally called).
*/
cond_clear(mkcond(svc, cond, sizeof(cond)));

service_notify_stop(svc);

/* No longer running, update books. */
Expand Down Expand Up @@ -1140,7 +1153,7 @@ int service_stop(svc_t *svc)
service_timeout_cancel(svc);

if (svc->stop_script[0]) {
logit(LOG_CONSOLE | LOG_NOTICE, "%s[%d], calling stop:%s ...", id, svc->pid, svc->stop_script);
logit(LOG_CONSOLE | LOG_NOTICE, "Stopping %s[%d], calling stop:%s ...", id, svc->pid, svc->stop_script);
} else if (!svc_is_sysv(svc)) {
char *nm = pid_get_name(svc->pid, NULL, 0);
const char *sig = sig_name(svc->sighalt);
Expand All @@ -1155,10 +1168,10 @@ int service_stop(svc_t *svc)
}

dbg("Sending %s to pid:%d name:%s(%s)", sig, svc->pid, id, nm);
logit(LOG_CONSOLE | LOG_NOTICE, "%s[%d], stopping, sending %s ...", id, svc->pid, sig);
logit(LOG_CONSOLE | LOG_NOTICE, "Stopping %s[%d], sending %s ...", id, svc->pid, sig);
} else {
compose_cmdline(svc, cmdline, sizeof(cmdline));
logit(LOG_CONSOLE | LOG_NOTICE, "%s[%d], calling '%s stop' ...", id, svc->pid, cmdline);
logit(LOG_CONSOLE | LOG_NOTICE, "Stopping %s[%d], calling '%s stop' ...", id, svc->pid, cmdline);
}

/*
Expand Down Expand Up @@ -1278,16 +1291,16 @@ static int service_reload(svc_t *svc)
print_desc("Restarting ", svc->desc);

if (svc->reload_script[0]) {
logit(LOG_CONSOLE | LOG_NOTICE, "%s[%d], calling reload:%s ...", id, svc->pid, svc->reload_script);
logit(LOG_CONSOLE | LOG_NOTICE, "Reloading %s[%d], calling reload:%s ...", id, svc->pid, svc->reload_script);
rc = service_run_script(svc, svc->reload_script);
} else if (svc->sighup) {
if (svc->pid <= 1) {
dbg("%s[%d]: bad PID, cannot reload service", id, svc->pid);
svc->start_time = svc->pid = 0;
goto done;
}
dbg("%s[%d], sending SIGHUP", id, svc->pid);
logit(LOG_CONSOLE | LOG_NOTICE, "%s[%d], sending SIGHUP ...", id, svc->pid);
dbg("Reloading %s[%d], sending SIGHUP", id, svc->pid);
logit(LOG_CONSOLE | LOG_NOTICE, "Reloading %s[%d], sending SIGHUP ...", id, svc->pid);
rc = kill(svc->pid, SIGHUP);
if (rc == -1 && (errno == ESRCH || errno == ENOENT)) {
/* nobody home, reset internal state machine */
Expand Down Expand Up @@ -3133,6 +3146,20 @@ int service_step(svc_t *svc)
switch (cond) {
case COND_ON:
kill(svc->pid, SIGCONT);

/*
* Propagate reload (~): upstream reloaded, so
* we must also reload/restart, not just resume.
*/
if (svc->flux_reload && !svc_is_changed(svc)) {
svc_set_state(svc, SVC_RUNNING_STATE);
if (svc_is_noreload(svc))
service_stop(svc);
else
service_reload(svc);
break;
}

svc_set_state(svc, SVC_RUNNING_STATE);
/* Reassert condition if we go from waiting and no change */
if (!svc_is_changed(svc)) {
Expand Down
12 changes: 12 additions & 0 deletions src/sm.c
Original file line number Diff line number Diff line change
Expand Up @@ -510,15 +510,27 @@ void sm_step(void)
/* Cleanup stale services */
svc_clean_dynamic(service_unregister);

dbg("Step all services waiting for reasserted conditions ...");
service_step_all(SVC_TYPE_ANY);

dbg("Calling reconf hooks ...");
plugin_run_hooks(HOOK_SVC_RECONF);

dbg("Step all services waiting for reasserted conditions ...");
service_step_all(SVC_TYPE_ANY);

dbg("Update configuration generation of device conditions ...");
devmon_reconf();

dbg("Step all services waiting for reasserted conditions ...");
service_step_all(SVC_TYPE_ANY);

dbg("Update configuration generation of unmodified non-native services ...");
service_notify_reconf();

dbg("Step all services waiting for reasserted conditions ...");
service_step_all(SVC_TYPE_ANY);

dbg("Reconfiguration done");
sm.state = SM_RUNNING_STATE;
break;
Expand Down
1 change: 1 addition & 0 deletions src/svc.h
Original file line number Diff line number Diff line change
Expand Up @@ -146,6 +146,7 @@ typedef struct svc {
int starting; /* ... waiting for pidfile to be re-asserted */
int runlevels;
int sighup; /* This service supports SIGHUP :) */
int flux_reload; /* Propagate reload from dependency, '~' prefix */
int forking; /* This is a service/sysv daemon that forks, wait for it ... */
svc_block_t block; /* Reason that this service is currently stopped */
char cond[MAX_COND_LEN];
Expand Down
2 changes: 2 additions & 0 deletions test/Makefile.am
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ EXTRA_DIST += add-remove-dynamic-service-sub-config.sh
EXTRA_DIST += bootstrap-crash.sh
EXTRA_DIST += cond-start-task.sh
EXTRA_DIST += crashing.sh
EXTRA_DIST += dep-chain-reload.sh
EXTRA_DIST += depserv.sh
EXTRA_DIST += devmon.sh
EXTRA_DIST += failing-sysv.sh
Expand Down Expand Up @@ -74,6 +75,7 @@ TESTS += add-remove-dynamic-service-sub-config.sh
TESTS += bootstrap-crash.sh
TESTS += cond-start-task.sh
TESTS += crashing.sh
TESTS += dep-chain-reload.sh
TESTS += depserv.sh
TESTS += devmon.sh
TESTS += failing-sysv.sh
Expand Down
Loading