Describe the bug
The log settings defined by logging_on(), and therefore by any trollflow2 process, are not inherited by tasks scheduled using dask.distributed when called inside an if __name__ == "__main__" block. This is what happens when using trollflow2 normally. As a consequence, only messages of level warning or above are displayed and without the configured log definition.
To Reproduce
import logging
import dask
import dask.array as da
import numpy as np
from dask.distributed import Client
from trollflow2.logging import logging_on
logger = logging.getLogger(__name__)
@dask.delayed
def test():
logger.warning("xilx xilx")
logger.info("qral larq")
logger.debug("coic cioc")
return np.array([0])
def main():
arr = da.from_delayed(test(), shape=(1,), dtype="uint16")
print(arr.compute())
if __name__ == "__main__":
with logging_on():
with Client():
main()
Expected behavior
I expect all log messages to be shown, as they are when using the default scheduler.
Actual results
Full output (stderr + stdout):
[DEBUG: 2024-08-01 12:50:08 : asyncio] Using selector: EpollSelector
xilx xilx
[0]
For reference, when leaving out the Client() context manager, the output is as expected:
[WARNING: 2024-08-01 12:50:38 : __main__] xilx xilx
[INFO: 2024-08-01 12:50:38 : __main__] qral larq
[DEBUG: 2024-08-01 12:50:38 : __main__] coic cioc
[0]
When I call logging_on() outside the if __name__ == "__main__", I get the log messages (and others):
[DEBUG: 2024-08-01 12:56:41 : asyncio] Using selector: EpollSelector
[DEBUG: 2024-08-01 12:56:42 : asyncio] Using selector: EpollSelector
[DEBUG: 2024-08-01 12:56:42 : asyncio] Using selector: EpollSelector
[DEBUG: 2024-08-01 12:56:42 : asyncio] Using selector: EpollSelector
[DEBUG: 2024-08-01 12:56:42 : distributed.diskutils] Locking '/tmp/dask-scratch-space/worker-6flgbr8t.dirlock'...
[DEBUG: 2024-08-01 12:56:42 : distributed.diskutils] Checking lock file '/tmp/dask-scratch-space/scheduler-75snhs1x.dirlock'...
[DEBUG: 2024-08-01 12:56:42 : distributed.diskutils] Locking '/tmp/dask-scratch-space/worker-avvfalg_.dirlock'...
[DEBUG: 2024-08-01 12:56:42 : distributed.diskutils] Checking lock file '/tmp/dask-scratch-space/worker-6flgbr8t.dirlock'...
[DEBUG: 2024-08-01 12:56:42 : distributed.diskutils] Checking lock file '/tmp/dask-scratch-space/scheduler-75snhs1x.dirlock'...
[DEBUG: 2024-08-01 12:56:42 : distributed.diskutils] Checking lock file '/tmp/dask-scratch-space/worker-avvfalg_.dirlock'...
[DEBUG: 2024-08-01 12:56:42 : distributed.diskutils] Locking '/tmp/dask-scratch-space/worker-xwpb7xsk.dirlock'...
[INFO: 2024-08-01 12:56:42 : distributed.worker] Start worker at: tcp://127.0.0.1:36357
[INFO: 2024-08-01 12:56:42 : distributed.worker] Listening to: tcp://127.0.0.1:36357
[INFO: 2024-08-01 12:56:42 : distributed.worker] Worker name: 0
[INFO: 2024-08-01 12:56:42 : distributed.worker] dashboard at: 127.0.0.1:43539
[INFO: 2024-08-01 12:56:42 : distributed.worker] Waiting to connect to: tcp://127.0.0.1:34505
[INFO: 2024-08-01 12:56:42 : distributed.worker] Start worker at: tcp://127.0.0.1:46133
[INFO: 2024-08-01 12:56:42 : distributed.worker] -------------------------------------------------
[INFO: 2024-08-01 12:56:42 : distributed.worker] Listening to: tcp://127.0.0.1:46133
[INFO: 2024-08-01 12:56:42 : distributed.worker] Threads: 2
[INFO: 2024-08-01 12:56:42 : distributed.worker] Worker name: 2
[INFO: 2024-08-01 12:56:42 : distributed.worker] Memory: 7.80 GiB
[INFO: 2024-08-01 12:56:42 : distributed.worker] Local Directory: /tmp/dask-scratch-space/worker-avvfalg_
[INFO: 2024-08-01 12:56:42 : distributed.worker] dashboard at: 127.0.0.1:34943
[INFO: 2024-08-01 12:56:42 : distributed.worker] Waiting to connect to: tcp://127.0.0.1:34505
[INFO: 2024-08-01 12:56:42 : distributed.worker] -------------------------------------------------
[INFO: 2024-08-01 12:56:42 : distributed.worker] -------------------------------------------------
[INFO: 2024-08-01 12:56:42 : distributed.worker] Threads: 2
[INFO: 2024-08-01 12:56:42 : distributed.worker] Memory: 7.80 GiB
[INFO: 2024-08-01 12:56:42 : distributed.worker] Local Directory: /tmp/dask-scratch-space/worker-6flgbr8t
[DEBUG: 2024-08-01 12:56:42 : distributed.comm.core] Establishing connection to 127.0.0.1:34505
[INFO: 2024-08-01 12:56:42 : distributed.worker] -------------------------------------------------
[DEBUG: 2024-08-01 12:56:42 : distributed.comm.core] Establishing connection to 127.0.0.1:34505
[DEBUG: 2024-08-01 12:56:42 : distributed.comm.tcp] Setting TCP keepalive: nprobes=10, idle=10, interval=2
[DEBUG: 2024-08-01 12:56:42 : distributed.comm.tcp] Setting TCP user timeout: 30000 ms
[DEBUG: 2024-08-01 12:56:42 : distributed.comm.tcp] Setting TCP keepalive: nprobes=10, idle=10, interval=2
[DEBUG: 2024-08-01 12:56:42 : distributed.comm.tcp] Setting TCP user timeout: 30000 ms
[DEBUG: 2024-08-01 12:56:42 : distributed.comm.core] Connection to 127.0.0.1:34505 established
[DEBUG: 2024-08-01 12:56:42 : distributed.comm.core] Connection to 127.0.0.1:34505 established
[INFO: 2024-08-01 12:56:42 : distributed.worker] Start worker at: tcp://127.0.0.1:35289
[INFO: 2024-08-01 12:56:42 : distributed.worker] Listening to: tcp://127.0.0.1:35289
[INFO: 2024-08-01 12:56:42 : distributed.worker] Worker name: 3
[INFO: 2024-08-01 12:56:42 : distributed.worker] dashboard at: 127.0.0.1:36137
[INFO: 2024-08-01 12:56:42 : distributed.worker] Waiting to connect to: tcp://127.0.0.1:34505
[INFO: 2024-08-01 12:56:42 : distributed.worker] -------------------------------------------------
[INFO: 2024-08-01 12:56:42 : distributed.worker] Threads: 2
[INFO: 2024-08-01 12:56:42 : distributed.worker] Memory: 7.80 GiB
[INFO: 2024-08-01 12:56:42 : distributed.worker] Local Directory: /tmp/dask-scratch-space/worker-xwpb7xsk
[INFO: 2024-08-01 12:56:42 : distributed.worker] -------------------------------------------------
[DEBUG: 2024-08-01 12:56:42 : distributed.comm.core] Establishing connection to 127.0.0.1:34505
[DEBUG: 2024-08-01 12:56:42 : distributed.comm.tcp] Setting TCP keepalive: nprobes=10, idle=10, interval=2
[DEBUG: 2024-08-01 12:56:42 : distributed.comm.tcp] Setting TCP user timeout: 30000 ms
[DEBUG: 2024-08-01 12:56:42 : distributed.comm.core] Connection to 127.0.0.1:34505 established
[DEBUG: 2024-08-01 12:56:42 : asyncio] Using selector: EpollSelector
[DEBUG: 2024-08-01 12:56:42 : distributed.diskutils] Checking lock file '/tmp/dask-scratch-space/worker-6flgbr8t.dirlock'...
[DEBUG: 2024-08-01 12:56:42 : distributed.diskutils] Checking lock file '/tmp/dask-scratch-space/scheduler-75snhs1x.dirlock'...
[DEBUG: 2024-08-01 12:56:42 : distributed.diskutils] Checking lock file '/tmp/dask-scratch-space/worker-avvfalg_.dirlock'...
[DEBUG: 2024-08-01 12:56:42 : distributed.diskutils] Checking lock file '/tmp/dask-scratch-space/worker-xwpb7xsk.dirlock'...
[DEBUG: 2024-08-01 12:56:42 : distributed.diskutils] Locking '/tmp/dask-scratch-space/worker-j5jqbicd.dirlock'...
[INFO: 2024-08-01 12:56:42 : distributed.worker] Start worker at: tcp://127.0.0.1:43163
[INFO: 2024-08-01 12:56:42 : distributed.worker] Listening to: tcp://127.0.0.1:43163
[INFO: 2024-08-01 12:56:42 : distributed.worker] Worker name: 1
[INFO: 2024-08-01 12:56:42 : distributed.worker] dashboard at: 127.0.0.1:37003
[INFO: 2024-08-01 12:56:42 : distributed.worker] Waiting to connect to: tcp://127.0.0.1:34505
[INFO: 2024-08-01 12:56:42 : distributed.worker] -------------------------------------------------
[INFO: 2024-08-01 12:56:42 : distributed.worker] Threads: 2
[INFO: 2024-08-01 12:56:42 : distributed.worker] Memory: 7.80 GiB
[INFO: 2024-08-01 12:56:42 : distributed.worker] Local Directory: /tmp/dask-scratch-space/worker-j5jqbicd
[INFO: 2024-08-01 12:56:42 : distributed.worker] -------------------------------------------------
[DEBUG: 2024-08-01 12:56:42 : distributed.comm.core] Establishing connection to 127.0.0.1:34505
[DEBUG: 2024-08-01 12:56:42 : distributed.comm.tcp] Setting TCP keepalive: nprobes=10, idle=10, interval=2
[DEBUG: 2024-08-01 12:56:42 : distributed.comm.tcp] Setting TCP user timeout: 30000 ms
[DEBUG: 2024-08-01 12:56:42 : distributed.comm.core] Connection to 127.0.0.1:34505 established
[INFO: 2024-08-01 12:56:42 : distributed.worker] Starting Worker plugin shuffle
[INFO: 2024-08-01 12:56:42 : distributed.worker] Registered to: tcp://127.0.0.1:34505
[INFO: 2024-08-01 12:56:42 : distributed.worker] -------------------------------------------------
[INFO: 2024-08-01 12:56:42 : distributed.core] Starting established connection to tcp://127.0.0.1:34505
[INFO: 2024-08-01 12:56:42 : distributed.worker] Starting Worker plugin shuffle
[INFO: 2024-08-01 12:56:42 : distributed.worker] Registered to: tcp://127.0.0.1:34505
[INFO: 2024-08-01 12:56:42 : distributed.worker] Starting Worker plugin shuffle
[INFO: 2024-08-01 12:56:42 : distributed.worker] -------------------------------------------------
[INFO: 2024-08-01 12:56:42 : distributed.core] Starting established connection to tcp://127.0.0.1:34505
[INFO: 2024-08-01 12:56:42 : distributed.worker] Registered to: tcp://127.0.0.1:34505
[INFO: 2024-08-01 12:56:42 : distributed.worker] -------------------------------------------------
[INFO: 2024-08-01 12:56:42 : distributed.worker] Starting Worker plugin shuffle
[INFO: 2024-08-01 12:56:42 : distributed.worker] Registered to: tcp://127.0.0.1:34505
[INFO: 2024-08-01 12:56:42 : distributed.core] Starting established connection to tcp://127.0.0.1:34505
[INFO: 2024-08-01 12:56:42 : distributed.worker] -------------------------------------------------
[INFO: 2024-08-01 12:56:42 : distributed.core] Starting established connection to tcp://127.0.0.1:34505
[WARNING: 2024-08-01 12:56:42 : __main__] xilx xilx
[INFO: 2024-08-01 12:56:42 : __main__] qral larq
[DEBUG: 2024-08-01 12:56:42 : __main__] coic cioc
[DEBUG: 2024-08-01 12:56:42 : distributed.worker.state_machine] Data task <TaskState ('test-from-value-03edcc0a2b842080ed8e1e36abea720a', 0) memory> already known (stimulus_id=compute-task-1722509802.9544415)
[DEBUG: 2024-08-01 12:56:42 : distributed.comm.tcp] Incoming connection from 'tcp://127.0.0.1:42440' to 'tcp://127.0.0.1:35289'
[DEBUG: 2024-08-01 12:56:42 : distributed.comm.tcp] Setting TCP keepalive: nprobes=10, idle=10, interval=2
[DEBUG: 2024-08-01 12:56:42 : distributed.comm.tcp] Setting TCP user timeout: 30000 ms
[DEBUG: 2024-08-01 12:56:42 : distributed.core] Connection from 'tcp://127.0.0.1:42440' to Worker
[DEBUG: 2024-08-01 12:56:42 : distributed.core] Message from 'tcp://127.0.0.1:42440': {'op': 'get_data', 'keys': (('from-value-03edcc0a2b842080ed8e1e36abea720a', 0),), 'who': None, 'reply': True}
[DEBUG: 2024-08-01 12:56:42 : distributed.core] Calling into handler get_data
[DEBUG: 2024-08-01 12:56:42 : distributed.worker.state_machine] Purge task: <TaskState ('test-from-value-03edcc0a2b842080ed8e1e36abea720a', 0) memory>
[DEBUG: 2024-08-01 12:56:42 : distributed.worker.state_machine] Purge task: <TaskState ('from-value-03edcc0a2b842080ed8e1e36abea720a', 0) memory>
[DEBUG: 2024-08-01 12:56:42 : distributed.worker.state_machine] Purge task: <TaskState ('from-value-03edcc0a2b842080ed8e1e36abea720a', 0) released>
[DEBUG: 2024-08-01 12:56:42 : distributed.worker.state_machine] Purge task: <TaskState ('test-from-value-03edcc0a2b842080ed8e1e36abea720a', 0) released>
[INFO: 2024-08-01 12:56:42 : distributed.worker] Stopping worker at tcp://127.0.0.1:46133. Reason: nanny-close
[INFO: 2024-08-01 12:56:42 : distributed.worker] Stopping worker at tcp://127.0.0.1:36357. Reason: nanny-close
[INFO: 2024-08-01 12:56:42 : distributed.worker] Stopping worker at tcp://127.0.0.1:43163. Reason: nanny-close
[INFO: 2024-08-01 12:56:42 : distributed.worker] Stopping worker at tcp://127.0.0.1:35289. Reason: nanny-close
[INFO: 2024-08-01 12:56:42 : distributed.core] Connection to tcp://127.0.0.1:34505 has been closed.
[DEBUG: 2024-08-01 12:56:42 : distributed.worker] Attempted to close worker that is already Status.closing. Reason: worker-handle-scheduler-connection-broken
[INFO: 2024-08-01 12:56:42 : distributed.core] Connection to tcp://127.0.0.1:34505 has been closed.
[DEBUG: 2024-08-01 12:56:42 : distributed.worker] Attempted to close worker that is already Status.closing. Reason: worker-handle-scheduler-connection-broken
[INFO: 2024-08-01 12:56:42 : distributed.core] Connection to tcp://127.0.0.1:34505 has been closed.
[DEBUG: 2024-08-01 12:56:42 : distributed.worker] Attempted to close worker that is already Status.closing. Reason: worker-handle-scheduler-connection-broken
[INFO: 2024-08-01 12:56:42 : distributed.core] Connection to tcp://127.0.0.1:34505 has been closed.
[DEBUG: 2024-08-01 12:56:42 : distributed.worker] Attempted to close worker that is already Status.closing. Reason: worker-handle-scheduler-connection-broken
[DEBUG: 2024-08-01 12:56:42 : distributed.worker] Attempted to close worker that is already Status.closed. Reason: worker-close
[DEBUG: 2024-08-01 12:56:42 : distributed.worker] Attempted to close worker that is already Status.closed. Reason: worker-close
[DEBUG: 2024-08-01 12:56:42 : distributed.worker] Attempted to close worker that is already Status.closed. Reason: worker-close
[DEBUG: 2024-08-01 12:56:42 : distributed.core] Lost connection to 'tcp://127.0.0.1:42440' while reading message: in <TCP (closed) local=tcp://127.0.0.1:35289 remote=tcp://127.0.0.1:42440>: Stream is closed. Last operation: get_data
[DEBUG: 2024-08-01 12:56:42 : distributed.worker] Attempted to close worker that is already Status.closed. Reason: worker-close
[0]
Environment Info:
- OS: [e.g. OSX, Windows, Linux]
- Trollflow2 Version: main (739b1f9)
Additional context
From this stackoverflow question:
As of version 1.15.0 we now fork workers from a clean process, so changes that you make to your process prior to calling Client() won't affect forked workers. For more information search for forkserver here: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
so to solve this, logging_on() should be called after Client().
Describe the bug
The log settings defined by
logging_on(), and therefore by any trollflow2 process, are not inherited by tasks scheduled using dask.distributed when called inside anif __name__ == "__main__"block. This is what happens when using trollflow2 normally. As a consequence, only messages of level warning or above are displayed and without the configured log definition.To Reproduce
Expected behavior
I expect all log messages to be shown, as they are when using the default scheduler.
Actual results
Full output (stderr + stdout):
For reference, when leaving out the
Client()context manager, the output is as expected:When I call
logging_on()outside theif __name__ == "__main__", I get the log messages (and others):Environment Info:
Additional context
From this stackoverflow question:
so to solve this,
logging_on()should be called afterClient().