
Lithops localhost consistently failing or hanging in CI #1438

@tomwhite

In Cubed, we run a lot of Lithops tests on the localhost backend in GitHub Actions CI. They often hang or fail: https://github.com/cubed-dev/cubed/actions/workflows/lithops-tests.yml

Here's an example of the test hanging (from https://github.com/cubed-dev/cubed/actions/runs/16374935068/job/46272388673) and eventually being cancelled for exceeding the timeout:

2025-07-18 16:02:49,078 [INFO] invokers.py:119 -- ExecutorID d8700a-52 | JobID M002 - Selected Runtime: python 
2025-07-18 16:02:49,079 [INFO] invokers.py:186 -- ExecutorID d8700a-52 | JobID M002 - Starting function invocation: <lambda>() - Total: 4 activations
2025-07-18 16:02:49,080 [INFO] invokers.py:225 -- ExecutorID d8700a-52 | JobID M002 - View execution logs at /tmp/lithops-runner/logs/d8700a-52-M002.log
2025-07-18 16:02:49,080 [INFO] wait.py:105 -- ExecutorID d8700a-52 - Waiting for any of 4 function activations to complete
Error: The operation was canceled.
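The log above stops at "Waiting for any of 4 function activations to complete", so CI only reports that the job was cancelled, with no indication of where it was stuck. One way to get that information is to have Python dump every thread's stack when a call exceeds a deadline. The sketch below uses the standard-library `faulthandler` module; the helper name `call_with_stack_dump` and the timeout values are illustrative assumptions, not part of Cubed or Lithops.

```python
import faulthandler
import sys
import tempfile
import time


def call_with_stack_dump(fn, timeout_s: float, file=sys.stderr):
    """Run fn(); if it has not returned after timeout_s seconds, write the
    stack of every thread to `file` (which must have a real file descriptor).
    The process is NOT killed, so a slow call still completes normally, but a
    hang leaves a stack trace in the log. Hypothetical helper for illustration."""
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=file, exit=False)
    try:
        return fn()
    finally:
        # Cancel the pending dump if fn() returned before the timeout.
        faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    # Simulate a call that blocks for longer than the watchdog timeout and
    # capture the dump in a temporary file instead of stderr.
    with tempfile.TemporaryFile(mode="w+") as log:
        call_with_stack_dump(lambda: time.sleep(0.5), timeout_s=0.1, file=log)
        log.seek(0)
        print(log.read())
```

In a pytest suite, the built-in `faulthandler_timeout` ini option applies the same mechanism to every test, which may be simpler than wrapping calls by hand. Setting the timeout somewhat below the CI job limit means the stack dump reaches the log before the job is cancelled.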

Here's an example of a failure (from https://github.com/cubed-dev/cubed/actions/runs/16411249471/job/46366557026):

__________________________ test_default_spec[lithops] __________________________

executor = <cubed.runtime.executors.lithops.LithopsExecutor object at 0x7f533ccac450>

    def test_default_spec(executor):
        # default spec works for small computations
        a = xp.ones((3, 3), chunks=(2, 2))
        b = xp.negative(a)
        assert_array_equal(
>           b.compute(executor=executor),
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            -np.ones((3, 3)),
        )

cubed/tests/test_core.py:347: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
cubed/core/array.py:151: in compute
    result = compute(
cubed/core/array.py:290: in compute
    plan.execute(
cubed/core/plan.py:317: in execute
    executor.execute_dag(
cubed/runtime/executors/lithops.py:271: in execute_dag
    execute_dag(
cubed/runtime/executors/lithops.py:180: in execute_dag
    function_executor = FunctionExecutor(**kwargs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^
/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/site-packages/lithops/executors.py:143: in __init__
    self.invoker = create_invoker(
/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/site-packages/lithops/invokers.py:55: in create_invoker
    return BatchInvoker(
/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/site-packages/lithops/invokers.py:257: in __init__
    self.compute_handler.init()
/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/site-packages/lithops/localhost/v1/localhost.py:91: in init
    self.env.setup()
/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/site-packages/lithops/localhost/v1/localhost.py:282: in setup
    self._copy_lithops_to_tmp()
/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/site-packages/lithops/localhost/v1/localhost.py:217: in _copy_lithops_to_tmp
    shutil.copytree(LITHOPS_LOCATION, os.path.join(LITHOPS_TEMP_DIR, 'lithops'))
/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/shutil.py:573: in copytree
    return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

entries = [<DirEntry 'serverless'>, <DirEntry 'version.py'>, <DirEntry '__init__.py'>, <DirEntry 'monitor.py'>, <DirEntry 'invokers.py'>, <DirEntry 'config.py'>, ...]
src = '/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/site-packages/lithops'
dst = '/tmp/lithops-runner/lithops', symlinks = False, ignore = None
copy_function = <function copy2 at 0x7f534db0ff60>
ignore_dangling_symlinks = False, dirs_exist_ok = False

    def _copytree(entries, src, dst, symlinks, ignore, copy_function,
                  ignore_dangling_symlinks, dirs_exist_ok=False):
        if ignore is not None:
            ignored_names = ignore(os.fspath(src), [x.name for x in entries])
        else:
            ignored_names = ()
    
        os.makedirs(dst, exist_ok=dirs_exist_ok)
        errors = []
        use_srcentry = copy_function is copy2 or copy_function is copy
    
        for srcentry in entries:
            if srcentry.name in ignored_names:
                continue
            srcname = os.path.join(src, srcentry.name)
            dstname = os.path.join(dst, srcentry.name)
            srcobj = srcentry if use_srcentry else srcname
            try:
                is_symlink = srcentry.is_symlink()
                if is_symlink and os.name == 'nt':
                    # Special check for directory junctions, which appear as
                    # symlinks but we want to recurse.
                    lstat = srcentry.stat(follow_symlinks=False)
                    if lstat.st_reparse_tag == stat.IO_REPARSE_TAG_MOUNT_POINT:
                        is_symlink = False
                if is_symlink:
                    linkto = os.readlink(srcname)
                    if symlinks:
                        # We can't just leave it to `copy_function` because legacy
                        # code with a custom `copy_function` may rely on copytree
                        # doing the right thing.
                        os.symlink(linkto, dstname)
                        copystat(srcobj, dstname, follow_symlinks=not symlinks)
                    else:
                        # ignore dangling symlink if the flag is on
                        if not os.path.exists(linkto) and ignore_dangling_symlinks:
                            continue
                        # otherwise let the copy occur. copy2 will raise an error
                        if srcentry.is_dir():
                            copytree(srcobj, dstname, symlinks, ignore,
                                     copy_function, ignore_dangling_symlinks,
                                     dirs_exist_ok)
                        else:
                            copy_function(srcobj, dstname)
                elif srcentry.is_dir():
                    copytree(srcobj, dstname, symlinks, ignore, copy_function,
                             ignore_dangling_symlinks, dirs_exist_ok)
                else:
                    # Will raise a SpecialFileError for unsupported file types
                    copy_function(srcobj, dstname)
            # catch the Error from the recursive copytree so that we can
            # continue with other files
            except Error as err:
                errors.extend(err.args[0])
            except OSError as why:
                errors.append((srcname, dstname, str(why)))
        try:
            copystat(src, dst)
        except OSError as why:
            # Copying file access times may fail on Windows
            if getattr(why, 'winerror', None) is None:
                errors.append((src, dst, str(why)))
        if errors:
>           raise Error(errors)
E           shutil.Error: [('/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/site-packages/lithops/__pycache__', '/tmp/lithops-runner/lithops/__pycache__', "[Errno 17] File exists: '/tmp/lithops-runner/lithops/__pycache__'")]

/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/shutil.py:527: Error
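The `shutil.Error` above is raised when `copytree` tries to create `/tmp/lithops-runner/lithops/__pycache__` and finds that it already exists, which suggests that something else wrote into the same destination while the copy was in progress (for example, another process running the same setup). That is a hypothesis, not a confirmed diagnosis. The sketch below first reproduces an error of the same shape by simulating a second writer from the `ignore` hook, which, as the `_copytree` source in the traceback shows, runs before `os.makedirs(dst)`. It then shows one possible race-tolerant alternative: copy into a private staging directory and `os.rename` it into place. Both helper names (`reproduce_pycache_race`, `copytree_once`) are hypothetical; neither is Lithops code.

```python
import os
import shutil
import tempfile
import threading


def reproduce_pycache_race(src: str, dst: str) -> list:
    """Copy src to dst while a simulated (hypothetical) second writer creates
    dst/.../__pycache__ just before copytree does. Returns the list of
    (src, dst, reason) tuples from the resulting shutil.Error, or []."""

    def ignore(path, names):
        # copytree calls `ignore` for each directory *before* os.makedirs(dst).
        if os.path.basename(path) == "__pycache__":
            os.makedirs(os.path.join(dst, os.path.relpath(path, src)), exist_ok=True)
        return []  # ignore nothing; the hook is only used for its side effect

    try:
        shutil.copytree(src, dst, ignore=ignore)
    except shutil.Error as err:
        return err.args[0]
    return []


def copytree_once(src: str, dst: str) -> bool:
    """Hypothetical race-tolerant copy: build the tree in a private staging
    directory, then rename it into place. Returns True if this call installed
    dst, False if another caller already had."""
    if os.path.isdir(dst):
        return False
    parent = os.path.dirname(os.path.abspath(dst))
    os.makedirs(parent, exist_ok=True)
    # Staging lives in the same parent so the rename stays on one filesystem.
    staging = tempfile.mkdtemp(prefix=".staging-", dir=parent)
    try:
        staged = os.path.join(staging, "tree")
        # Skip bytecode caches; the interpreter regenerates them on import.
        shutil.copytree(src, staged, ignore=shutil.ignore_patterns("__pycache__"))
        try:
            # On POSIX, renaming a directory is atomic and fails if dst is a
            # non-empty directory, so only one concurrent caller can install it.
            os.rename(staged, dst)
            return True
        except OSError:
            if os.path.isdir(dst):
                return False  # another caller won the race
            raise
    finally:
        shutil.rmtree(staging, ignore_errors=True)


if __name__ == "__main__":
    src = tempfile.mkdtemp()
    os.makedirs(os.path.join(src, "__pycache__"))
    open(os.path.join(src, "__init__.py"), "w").close()
    # Expect one (src, dst, "[Errno 17] File exists: ...") tuple, as in the CI log.
    print(reproduce_pycache_race(src, os.path.join(tempfile.mkdtemp(), "lithops")))
    # Eight threads race to install the same tree; exactly one should win.
    dst = os.path.join(tempfile.mkdtemp(), "lithops")
    results = []
    threads = [threading.Thread(target=lambda: results.append(copytree_once(src, dst)))
               for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sorted(results))
```

One trade-off of copy-then-rename: if a stale copy from an older Lithops version is already present, it is left in place, so a real fix would also need a version check or some cleanup step.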

Interestingly, I noticed that the tests almost always pass on macos-latest (currently macos-14); the problems occur on ubuntu-latest (currently ubuntu-24.04). I also noticed that on ubuntu-22.04 the tests seem to hang, but they do not fail outright (cubed-dev/cubed#762).

@JosepSampe I noticed that you added a workflow for running on multiple OS versions (https://github.com/lithops-cloud/lithops/blob/master/.github/workflows/tests-all-os.yml) in #1349, but it doesn't run automatically. It might be worth enabling it to run alongside the regular tests or nightly, and also adding ubuntu-24.04 to see whether it shows the same problem I'm hitting above.

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
