Description
In Cubed we run a lot of Lithops tests on the localhost backend using GitHub Actions CI. They often hang or fail: https://github.com/cubed-dev/cubed/actions/workflows/lithops-tests.yml
Here's an example of the test hanging (from https://github.com/cubed-dev/cubed/actions/runs/16374935068/job/46272388673) and eventually being cancelled for exceeding the timeout:
```
2025-07-18 16:02:49,078 [INFO] invokers.py:119 -- ExecutorID d8700a-52 | JobID M002 - Selected Runtime: python
2025-07-18 16:02:49,079 [INFO] invokers.py:186 -- ExecutorID d8700a-52 | JobID M002 - Starting function invocation: <lambda>() - Total: 4 activations
2025-07-18 16:02:49,080 [INFO] invokers.py:225 -- ExecutorID d8700a-52 | JobID M002 - View execution logs at /tmp/lithops-runner/logs/d8700a-52-M002.log
2025-07-18 16:02:49,080 [INFO] wait.py:105 -- ExecutorID d8700a-52 - Waiting for any of 4 function activations to complete
Error: The operation was canceled.
```
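The log stops at "Waiting for any of 4 function activations to complete": none of the activations ever reports completion, so the job sits there until CI cancels it. As an analogy only (this uses the standard library's `concurrent.futures`, not Lithops' own `wait.py`), a "wait for the first of N" call with no timeout blocks indefinitely in that situation. With a timeout, the stall surfaces as an error. `wait_for_any` is a hypothetical helper, and the four never-completed `Future` objects stand in for stuck activations.

```python
from concurrent.futures import FIRST_COMPLETED, Future, wait


def wait_for_any(futures, timeout):
    """Hypothetical helper: wait for the first future to finish, but fail
    loudly if none does within `timeout` seconds instead of blocking forever."""
    done, not_done = wait(futures, timeout=timeout, return_when=FIRST_COMPLETED)
    if not done:
        raise TimeoutError(
            f"none of {len(futures)} activations completed within {timeout}s"
        )
    return done, not_done


# Four futures that nobody ever completes, standing in for stuck activations.
stuck = [Future() for _ in range(4)]
try:
    wait_for_any(stuck, timeout=0.5)
except TimeoutError as exc:
    print(exc)  # none of 4 activations completed within 0.5s
```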
Here's an example of a failure (from https://github.com/cubed-dev/cubed/actions/runs/16411249471/job/46366557026):
```
__________________________ test_default_spec[lithops] __________________________

executor = <cubed.runtime.executors.lithops.LithopsExecutor object at 0x7f533ccac450>

    def test_default_spec(executor):
        # default spec works for small computations
        a = xp.ones((3, 3), chunks=(2, 2))
        b = xp.negative(a)
        assert_array_equal(
>           b.compute(executor=executor),
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            -np.ones((3, 3)),
        )

cubed/tests/test_core.py:347:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cubed/core/array.py:151: in compute
    result = compute(
cubed/core/array.py:290: in compute
    plan.execute(
cubed/core/plan.py:317: in execute
    executor.execute_dag(
cubed/runtime/executors/lithops.py:271: in execute_dag
    execute_dag(
cubed/runtime/executors/lithops.py:180: in execute_dag
    function_executor = FunctionExecutor(**kwargs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^
/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/site-packages/lithops/executors.py:143: in __init__
    self.invoker = create_invoker(
/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/site-packages/lithops/invokers.py:55: in create_invoker
    return BatchInvoker(
/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/site-packages/lithops/invokers.py:257: in __init__
    self.compute_handler.init()
/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/site-packages/lithops/localhost/v1/localhost.py:91: in init
    self.env.setup()
/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/site-packages/lithops/localhost/v1/localhost.py:282: in setup
    self._copy_lithops_to_tmp()
/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/site-packages/lithops/localhost/v1/localhost.py:217: in _copy_lithops_to_tmp
    shutil.copytree(LITHOPS_LOCATION, os.path.join(LITHOPS_TEMP_DIR, 'lithops'))
/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/shutil.py:573: in copytree
    return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

entries = [<DirEntry 'serverless'>, <DirEntry 'version.py'>, <DirEntry '__init__.py'>, <DirEntry 'monitor.py'>, <DirEntry 'invokers.py'>, <DirEntry 'config.py'>, ...]
src = '/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/site-packages/lithops'
dst = '/tmp/lithops-runner/lithops', symlinks = False, ignore = None
copy_function = <function copy2 at 0x7f534db0ff60>
ignore_dangling_symlinks = False, dirs_exist_ok = False

    def _copytree(entries, src, dst, symlinks, ignore, copy_function,
                  ignore_dangling_symlinks, dirs_exist_ok=False):
        if ignore is not None:
            ignored_names = ignore(os.fspath(src), [x.name for x in entries])
        else:
            ignored_names = ()
        os.makedirs(dst, exist_ok=dirs_exist_ok)
        errors = []
        use_srcentry = copy_function is copy2 or copy_function is copy
        for srcentry in entries:
            if srcentry.name in ignored_names:
                continue
            srcname = os.path.join(src, srcentry.name)
            dstname = os.path.join(dst, srcentry.name)
            srcobj = srcentry if use_srcentry else srcname
            try:
                is_symlink = srcentry.is_symlink()
                if is_symlink and os.name == 'nt':
                    # Special check for directory junctions, which appear as
                    # symlinks but we want to recurse.
                    lstat = srcentry.stat(follow_symlinks=False)
                    if lstat.st_reparse_tag == stat.IO_REPARSE_TAG_MOUNT_POINT:
                        is_symlink = False
                if is_symlink:
                    linkto = os.readlink(srcname)
                    if symlinks:
                        # We can't just leave it to `copy_function` because legacy
                        # code with a custom `copy_function` may rely on copytree
                        # doing the right thing.
                        os.symlink(linkto, dstname)
                        copystat(srcobj, dstname, follow_symlinks=not symlinks)
                    else:
                        # ignore dangling symlink if the flag is on
                        if not os.path.exists(linkto) and ignore_dangling_symlinks:
                            continue
                        # otherwise let the copy occur. copy2 will raise an error
                        if srcentry.is_dir():
                            copytree(srcobj, dstname, symlinks, ignore,
                                     copy_function, ignore_dangling_symlinks,
                                     dirs_exist_ok)
                        else:
                            copy_function(srcobj, dstname)
                elif srcentry.is_dir():
                    copytree(srcobj, dstname, symlinks, ignore, copy_function,
                             ignore_dangling_symlinks, dirs_exist_ok)
                else:
                    # Will raise a SpecialFileError for unsupported file types
                    copy_function(srcobj, dstname)
            # catch the Error from the recursive copytree so that we can
            # continue with other files
            except Error as err:
                errors.extend(err.args[0])
            except OSError as why:
                errors.append((srcname, dstname, str(why)))
        try:
            copystat(src, dst)
        except OSError as why:
            # Copying file access times may fail on Windows
            if getattr(why, 'winerror', None) is None:
                errors.append((src, dst, str(why)))
        if errors:
>           raise Error(errors)
E           shutil.Error: [('/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/site-packages/lithops/__pycache__', '/tmp/lithops-runner/lithops/__pycache__', "[Errno 17] File exists: '/tmp/lithops-runner/lithops/__pycache__'")]

/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/shutil.py:527: Error
```
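The failing call is `shutil.copytree(LITHOPS_LOCATION, os.path.join(LITHOPS_TEMP_DIR, 'lithops'))` with the default `dirs_exist_ok=False`. The recursive call for `__pycache__` therefore runs `os.makedirs(dst, exist_ok=False)` and fails if that directory already exists. The outer loop collects the resulting `FileExistsError` into `errors` and re-raises it as `shutil.Error`. The sketch below reproduces the same error shape locally. It assumes (this is not confirmed) that something else creates `__pycache__` in the destination after `copytree` has created the top-level directory but before it recurses into `__pycache__`. It uses the `ignore` callback as a hook because CPython's `copytree` calls it for each source directory before creating the matching destination directory. `copy_with_simulated_race` is a hypothetical helper name.

```python
import os
import shutil
import tempfile


def copy_with_simulated_race(dirs_exist_ok: bool) -> list:
    """Hypothetical helper: copy a small tree with shutil.copytree, creating
    dst/__pycache__ just before copytree recurses into src/__pycache__.

    Returns the (src, dst, message) tuples from shutil.Error, or an empty
    list if the copy succeeded.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        dst = os.path.join(tmp, "dst")
        os.makedirs(os.path.join(src, "__pycache__"))
        with open(os.path.join(src, "__init__.py"), "w") as f:
            f.write("")

        def ignore(path, names):
            # copytree calls `ignore` for each source directory before it
            # creates the matching destination directory, so this stands in
            # for some other process creating dst/__pycache__ first.
            if os.path.basename(path) == "__pycache__":
                os.makedirs(os.path.join(dst, "__pycache__"), exist_ok=True)
            return []  # don't actually ignore anything

        try:
            shutil.copytree(src, dst, ignore=ignore, dirs_exist_ok=dirs_exist_ok)
        except shutil.Error as err:
            return err.args[0]
        return []


print(copy_with_simulated_race(dirs_exist_ok=False))
print(copy_with_simulated_race(dirs_exist_ok=True))  # []
```

With `dirs_exist_ok=False` this returns a single tuple whose message on Linux is `[Errno 17] File exists: '.../dst/__pycache__'`, matching the CI failure. With `dirs_exist_ok=True` it returns an empty list. This only shows where the error is raised; it does not establish what creates `__pycache__` in CI.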
Interestingly, I noticed that the tests almost always pass on macos-latest (currently macos-14); the problems occur on ubuntu-latest (currently ubuntu-24.04). I also noticed that ubuntu-22.04 seems to produce hanging tests, but not outright failures (cubed-dev/cubed#762).
@JosepSampe I noticed that you added a workflow for running on multiple OS versions (https://github.com/lithops-cloud/lithops/blob/master/.github/workflows/tests-all-os.yml) in #1349, but it doesn't run automatically. It might be worth enabling it to run alongside the regular tests, or nightly, and also adding ubuntu-24.04 to see whether it shows the same problem I'm hitting above.