This repository was archived by the owner on Oct 11, 2025. It is now read-only.

"The datastore operation timed out" with many shards #109

Description

@Erfa

I tried starting a mapper job with the DatastoreInputReader on an entity kind that has about 30 million entities. I specified 30,000 shards, which causes the /mapreduce/kickoffjob_callback task to fail with a datastore timeout.
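
Roughly, this is how the job was started (a minimal sketch assuming the mapreduce.control.start_map API; the entity kind and handler names are placeholders, not my real code):

# Sketch of the configuration that triggers the timeout.
# "main.process_entity" and "models.MyEntity" are placeholder names.
from mapreduce import control

control.start_map(
    name="touch-all-entities",
    handler_spec="main.process_entity",                        # map function (placeholder)
    reader_spec="mapreduce.input_readers.DatastoreInputReader",
    mapper_parameters={"entity_kind": "models.MyEntity"},      # placeholder kind
    shard_count=30000,                                          # the problematic value
)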

It might be ridiculous to use a shard count that high, but I thought I'd share it anyway. Here's the full stack trace:

The datastore operation timed out, or the data was temporarily unavailable.
Traceback (most recent call last):
  File "/base/data/home/apps/e~appid/module:version/lib/mapreduce/base_handler.py", line 135, in post
    self.handle()
  File "/base/data/home/apps/e~appid/module:version/lib/mapreduce/handlers.py", line 1387, in handle
    readers, serialized_readers_entity = self._get_input_readers(state)
  File "/base/data/home/apps/e~appid/module:version/lib/mapreduce/handlers.py", line 1459, in _get_input_readers
    readers = input_reader_class.split_input(split_param)
  File "/base/data/home/apps/e~appid/module:version/lib/mapreduce/input_readers.py", line 722, in split_input
    return super(DatastoreInputReader, cls).split_input(mapper_spec)
  File "/base/data/home/apps/e~appid/module:version/lib/mapreduce/input_readers.py", line 355, in split_input
    query_spec.app, namespaces, shard_count, query_spec)
  File "/base/data/home/apps/e~appid/module:version/lib/mapreduce/input_readers.py", line 395, in _to_key_ranges_by_shard
    app)
  File "/base/data/home/apps/e~appid/module:version/lib/mapreduce/input_readers.py", line 464, in _split_ns_by_scatter
    random_keys = ds_query.Get(shard_count * oversampling_factor)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/datastore.py", line 1722, in Get
    return list(self.Run(limit=limit, offset=offset, **kwargs))
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_query.py", line 3321, in next
    next_batch = self.__batcher.next_batch(Batcher.AT_LEAST_OFFSET)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_query.py", line 3207, in next_batch
    batch = self.__next_batch.get_result()
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
    return self.__get_result_hook(self)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_query.py", line 2906, in __query_result_hook
    self._batch_shared.conn.check_rpc_success(rpc)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1371, in check_rpc_success
    raise _ToDatastoreError(err)
Timeout: The datastore operation timed out, or the data was temporarily unavailable.
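
For context on why this times out: the Get at input_readers.py line 464 fetches shard_count * oversampling_factor scatter keys in a single datastore query. Assuming the library's oversampling factor is 32 (I haven't verified the exact constant), 30,000 shards means close to a million keys requested in one RPC:

# Back-of-the-envelope for the failing Get at input_readers.py:464.
# The oversampling factor of 32 is an assumption about the library's constant.
shard_count = 30000
oversampling_factor = 32
print(shard_count * oversampling_factor)  # 960000 scatter keys in a single query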
