Commit 9ab5da8
Limit ray actors (#220)
Limit the number of Ray actors to 4, to avoid cases where the app crashes
due to running out of memory.
When we tested the OSS installation on an M1 with 10 cores, we got an
out-of-memory error:
`raise value\nray.exceptions.OutOfMemoryError: Task was killed due to
the node running low on memory.\nMemory on the node (IP: 10.5.0.6, ID:
ccaf0ebfab0bcfe9ace62f58b7b188bd70cec9f6e62154b6ab30751a) where the task
(actor ID: 49f3ed706a5e9912f2268b5501000000,
name=CheckExecutor-3:CheckPerWindowExecutor.__init__, pid=4143, memory
used=0.36GB) was running was 7.30GB / 7.67GB (0.95197), which exceeds
the memory usage threshold of 0.95. Ray killed this worker (ID:
9eb84eac980461996027638fce5a80848572761d7b55504ca96e4568) because it was
the most recently scheduled task; to see more information about memory
usage on this node, use ray logs raylet.out -ip 10.5.0.6. To see the
logs of the worker, use ray logs
worker-9eb84eac980461996027638fce5a80848572761d7b55504ca96e4568*out -ip
10.5.0.6. Top 10 memory
users:\nPID\tMEM(GB)\tCOMMAND\n277\t0.39\t/usr/local/bin/python -c from
multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5,
pipe...\n279\t0.39\t/usr/local/bin/python -c from multiprocessing.spawn
import spawn_main; spawn_main(tracker_fd=5,
pipe...\n278\t0.39\t/usr/local/bin/python -c from multiprocessing.spawn
import spawn_main; spawn_main(tracker_fd=5,
pipe...\n276\t0.39\t/usr/local/bin/python -c from multiprocessing.spawn
import spawn_main; spawn_main(tracker_fd=5,
pipe...\n4143\t0.36\tray::CheckPerWindowExecutor\n361\t0.36\tray::CheckPerWindowExecutor\n356\t0.36\tray::CheckPerWindowExecutor\n357\t0.36\tray::CheckPerWindowExecutor\n358\t0.36\tray::CheckPerWindowExecutor\n101\t0.02\t/usr/local/bin/python
/usr/local/lib/python3.11/site-packages/ray/dashboard/dashboard.py
--host=loca...\nRefer to the documentation on how to address the out of
memory issue:
https://docs.ray.io/en/latest/ray-core/scheduling/ray-oom-prevention.html.
Consider provisioning more memory on this node or reducing task
parallelism by requesting more CPUs per task. To adjust the kill
threshold, set the environment variable RAY_memory_usage_threshold when
starting Ray. To disable worker killing, set the environment variable
RAY_memory_monitor_refresh_ms to zero."}`
We consulted Yurii and Matan, and Yurii suggested lowering the number of
actors; that seems to solve the problem.
The limit is added to the default settings in order to reduce
out-of-memory errors on default installations.
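
The idea of capping the actor count can be sketched as follows. This is only an illustration, not the code changed in this commit: the function name `resolve_actor_count` and the setting name `MAX_CHECK_EXECUTOR_ACTORS` are hypothetical, and the only value taken from this commit is the default cap of 4.

```python
import os

# Default cap taken from this commit message; everything else is illustrative.
DEFAULT_MAX_ACTORS = 4


def resolve_actor_count(env=None, cpu_count=None, cap=DEFAULT_MAX_ACTORS):
    """Return how many executor actors to start.

    MAX_CHECK_EXECUTOR_ACTORS is a hypothetical setting name used only
    for this sketch; the real setting may be named differently.
    """
    env = os.environ if env is None else env
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    raw = env.get("MAX_CHECK_EXECUTOR_ACTORS")
    limit = int(raw) if raw else cap
    # Never start more actors than cores or than the limit, and at least one.
    return max(1, min(cpus, limit))


# On a 10-core machine with no override, the cap applies.
print(resolve_actor_count(env={}, cpu_count=10))  # 4
```

With a cap like this, a machine with many cores but little memory no longer starts one actor per core by default, while an explicit override is still possible.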
---------
Co-authored-by: Matan Perlmutter <matan@deepchecks.com>
1 parent bec8e82 commit 9ab5da8
1 file changed, 1 addition and 0 deletions (the added line is line 2 of the changed file).