[Enhance]: Export DB #380

Description

@croqaz

Affected Component

Python API

Current Behavior

Hi. I'm not sure whether this is an enhancement or a feature request. I have searched the documentation repeatedly and still can't find a way to export a whole DB into something like JSON Lines.

Why? I run separate processes on different machines to calculate statistics and embeddings, and when they finish I want to join all the Zvec DBs into one. So maybe the main feature is really joining multiple DBs?

But exporting as JSON would also allow better compatibility with tools written in any other programming language.
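To make the request concrete, here is a minimal sketch of the export side, assuming some lazy source of (key, value) pairs exists. `write_jsonl` and the `{"id": ..., "fields": ...}` record layout are hypothetical names chosen for illustration; nothing here is part of the Zvec API.

```python
import io
import json
from typing import IO, Any, Dict, Iterable, Tuple


def write_jsonl(records: Iterable[Tuple[str, Dict[str, Any]]], out: IO[str]) -> int:
    """Write (key, value) pairs as one JSON object per line; return the count.

    `records` can be any iterable, including a lazy generator, so the whole
    database never has to fit in memory.
    """
    count = 0
    for key, value in records:
        # Hypothetical layout: the key under "id", the stored fields nested
        # under "fields" so they cannot collide with the key.
        out.write(json.dumps({"id": key, "fields": value}, ensure_ascii=False))
        out.write("\n")
        count += 1
    return count


# Stand-in data; a real export would pass a lazy iterator over the DB instead.
sample = [("doc-1", {"embedding": [0.1, 0.2], "lang": "en"})]
buffer = io.StringIO()
written = write_jsonl(sample, buffer)
```

Any language with a JSON parser can then read the file back line by line.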

The error I get when I try to export a whole DB by querying with a very large topk:

  File ".venv/lib/python3.14/site-packages/zvec/executor/query_executor.py", line 236, in execute
    docs = self._do_execute(query_vectors, collection)
  File ".venv/lib/python3.14/site-packages/zvec/executor/query_executor.py", line 189, in _do_execute
    docs = collection.Query(query)
ValueError: query validate failed: topk[10000000] is too large, max is 1024

I don't know which entries are in there, so I can't fetch them one by one either. The DBs are big: tens of millions of entries.

I know you are using RocksDB somewhere in there, so it should be easy to expose a lazy iter() function over keys and values, which would solve the problem.
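As a rough sketch of how such an iterator could be consumed for merging: chain the iterators of several DBs and process them in fixed-size batches, so nothing is fully materialised. `fake_db` is a hypothetical stand-in for the requested lazy iterator and `in_batches` is an illustrative helper; neither exists in Zvec.

```python
from itertools import chain, islice
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, Dict[str, Any]]


def in_batches(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Yield lists of at most `size` records without materialising the input."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def fake_db(prefix: str, n: int) -> Iterator[Record]:
    """Hypothetical stand-in for a lazy key/value iterator over one DB."""
    for i in range(n):
        yield f"{prefix}-{i}", {"value": i}


# Chain two sources and walk them in batches of two; each batch could be
# inserted into the target DB or appended to an export file.
merged = chain(fake_db("a", 3), fake_db("b", 2))
batch_sizes = [len(batch) for batch in in_batches(merged, size=2)]
```

Batching keeps memory bounded regardless of how many entries each source holds.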

Desired Improvement

Export feature / Merge DBs / Lazy iter

Impact

Metadata

Labels

enhancement (Improve an existing feature or component), feature (New feature wanted)


Projects

Status
Backlog
