feat(async search): allow to set docs limit on search start#142
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #142 +/- ##
==========================================
+ Coverage 71.59% 71.61% +0.01%
==========================================
Files 200 200
Lines 18176 18191 +15
==========================================
+ Hits 13013 13027 +14
- Misses 4446 4448 +2
+ Partials 717 716 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
12225c2 to
b09a027
Compare
📝 WalkthroughWalkthroughAdds a per-request size limit to async search across APIs and services, propagates WithDocs and Size through proxy/store/fracmanager layers, enforces a configurable max-documents-per-request in proxy API, updates config and documentation, and adjusts tests and marshaling to include the new Size field. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro ⛔ Files ignored due to path filters (4)
📒 Files selected for processing (12)
🚧 Files skipped from review as they are similar to previous changes (9)
🧰 Additional context used🧠 Learnings (1)📚 Learning: 2025-08-05T10:52:04.737ZApplied to files:
🧬 Code graph analysis (2)fracmanager/async_searcher.go (2)
proxyapi/grpc_async_search.go (3)
🔇 Additional comments (16)
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (9)
api/seqproxyapi/v1/seq_proxy_api.proto (1)
259-261: Clarify size semantics in comments (tie to global cap, with_docs).Tiny doc tweak to avoid ambiguity with Fetch page size and to mention the global cap.
Apply this diff to improve the field docs:
- // Maximum number of documents to find. Doesn't affect aggs and hist. - // Ignored if with_docs was set to false. + // Upper bound on the number of documents (IDs) to store for this async search. + // Doesn't affect aggs or hist. Ignored if with_docs is false. + // The effective cap is min(size, async_search.max_documents_per_request) where 0 disables the global cap. int64 size = 6;config/config.go (1)
241-246: Add a brief field comment (0 disables).For consistency with other “limit” settings, document that 0 disables the cap.
- MaxDocumentsPerRequest int64 `config:"max_documents_per_request" default:"100000"` + // MaxDocumentsPerRequest caps the number of documents stored by a single async search + // when with_docs=true. Set to 0 to disable the cap. + MaxDocumentsPerRequest int64 `config:"max_documents_per_request" default:"100000"`proxyapi/ingestor_config.go (1)
20-22: Document AsyncSearchMaxDocumentsPerRequest semantics (0 disables).Minor: add a comment to make intent obvious at call sites.
type APIConfig struct { SearchTimeout time.Duration ExportTimeout time.Duration QueryRateLimit float64 EsVersion string // GatewayAddr is grpc-gateway client address. Used for debugging purposes. - GatewayAddr string - AsyncSearchMaxDocumentsPerRequest int64 + GatewayAddr string + // Global cap on documents (IDs) stored per async search (with_docs=true). 0 means unlimited. + AsyncSearchMaxDocumentsPerRequest int64 }api/storeapi/store_api.proto (1)
151-152: Mirror public API comments for new size fields.Add short comments so internal API users see the same semantics.
message StartAsyncSearchRequest { @@ - int64 size = 9; + // Upper bound on docs (IDs) to store for this async search; 0 means unlimited. + // Mirrors public API's StartAsyncSearchRequest.size. + int64 size = 9; } @@ message FetchAsyncSearchResultResponse { @@ - int64 size = 16; + // Size configured for this async search (echo of StartAsyncSearchRequest.size). + int64 size = 16; } @@ message AsyncSearchesListItem { @@ - int64 size = 16; + // Size configured for this async search (echo of StartAsyncSearchRequest.size). + int64 size = 16; }Also applies to: 188-189, 230-231
docs/en/02-configuration.md (1)
155-156: Document semantics of async_search.max_documents_per_request.Clarify applicability (with_docs), enforcement point, and how to disable.
-| `async_search.max_documents_per_request` | int | `100000` | - | +| `async_search.max_documents_per_request` | int | `100000` | Upper bound on documents (IDs) retained for a single async search when `with_docs=true`. Enforced on StartAsyncSearch; set to `0` to disable. |Optional: consider marking the type as “int64” to match code.
pkg/seqproxyapi/v1/marshaler_test.go (1)
127-127: Add a round‑trip unmarshal assertion for Size (string-encoded) to guard regressions.We already assert the marshaled JSON; also unmarshal it and compare to ensure
"size":"100"is parsed back correctly.func TestFetchAsyncSearchResultResponseMarshalJSON(t *testing.T) { r := require.New(t) test := func(resp *FetchAsyncSearchResultResponse, expected string) { t.Helper() raw, err := json.Marshal(resp) r.NoError(err) r.Equal(expected, string(raw)) + // Round-trip to ensure string-encoded numbers like "size" parse back correctly. + var um FetchAsyncSearchResultResponse + r.NoError(json.Unmarshal(raw, &um)) + r.Equal(resp, &um) }Also applies to: 145-145
storeapi/grpc_async_search.go (2)
26-29: Validate non-negative Size and fail fast.A negative r.Size would flow into int(limit) and could produce surprising behavior. Guard it explicitly.
func (g *GrpcV1) StartAsyncSearch( _ context.Context, r *storeapi.StartAsyncSearchRequest, ) (*storeapi.StartAsyncSearchResponse, error) { + if r.Size < 0 { + return nil, status.Errorf(codes.InvalidArgument, "size must be non-negative, got=%d", r.Size) + } aggs, err := aggQueriesFromProto(r.Aggs) if err != nil { return nil, err } limit := 0 if r.WithDocs { limit = int(r.Size) }
17-21: Defense-in-depth: consider enforcing a max per-request docs limit here too (if Store API can be reached directly).Proxy enforces the cap; mirroring a cap in the store layer prevents bypass when the proxy isn’t on the path.
Would you confirm whether the Store gRPC service is externally reachable (bypassing proxy)? If yes, I can propose a minimal config-driven cap here as well.
proxyapi/grpc_async_search.go (1)
24-27: Also reject negative Size values.Complements the max-cap check and avoids ambiguous semantics for Size < 0.
if g.config.AsyncSearchMaxDocumentsPerRequest > 0 && r.Size > g.config.AsyncSearchMaxDocumentsPerRequest { return nil, status.Errorf(codes.InvalidArgument, "too many documents are requested: count=%d, max=%d", r.Size, g.config.AsyncSearchMaxDocumentsPerRequest) } + if r.Size < 0 { + return nil, status.Errorf(codes.InvalidArgument, "size must be non-negative, got=%d", r.Size) + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
pkg/seqproxyapi/v1/seq_proxy_api.pb.gois excluded by!**/*.pb.gopkg/seqproxyapi/v1/seq_proxy_api_vtproto.pb.gois excluded by!**/*.pb.gopkg/storeapi/store_api.pb.gois excluded by!**/*.pb.gopkg/storeapi/store_api_vtproto.pb.gois excluded by!**/*.pb.go
📒 Files selected for processing (13)
api/seqproxyapi/v1/seq_proxy_api.proto(1 hunks)api/storeapi/store_api.proto(3 hunks)cmd/seq-db/seq-db.go(1 hunks)config/config.go(1 hunks)docs/en/02-configuration.md(1 hunks)fracmanager/async_searcher.go(6 hunks)pkg/seqproxyapi/v1/marshaler_test.go(2 hunks)proxy/search/async.go(4 hunks)proxyapi/grpc_async_search.go(6 hunks)proxyapi/ingestor_config.go(1 hunks)proxyapi/mock/grpc_v1.go(1 hunks)storeapi/grpc_async_search.go(4 hunks)tests/integration_tests/integration_test.go(2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-05T10:52:04.737Z
Learnt from: forshev
PR: ozontech/seq-db#3
File: api/seqproxyapi/v1/seq_proxy_api.proto:88-90
Timestamp: 2025-08-05T10:52:04.737Z
Learning: In the seq-db project, the FetchAsyncSearchResult RPC method in api/seqproxyapi/v1/seq_proxy_api.proto was intentionally changed from HTTP GET to POST during development. This breaking change is acceptable because the async search feature is under active development and not yet in production use.
Applied to files:
api/seqproxyapi/v1/seq_proxy_api.protocmd/seq-db/seq-db.goproxyapi/grpc_async_search.go
📚 Learning: 2025-08-05T10:51:10.548Z
Learnt from: forshev
PR: ozontech/seq-db#3
File: proxy/search/async.go:155-171
Timestamp: 2025-08-05T10:51:10.548Z
Learning: In proxy/search/async.go, the FetchAsyncSearchResult method uses an `anyResponse` variable to track if any successful responses were received from any shard. If all shards return NotFound errors, the function properly returns a NotFound status error, so individual shard-level response tracking is not necessary.
Applied to files:
pkg/seqproxyapi/v1/marshaler_test.gofracmanager/async_searcher.go
🧬 Code graph analysis (3)
tests/integration_tests/integration_test.go (1)
seq/qpr.go (1)
QPR(68-74)
fracmanager/async_searcher.go (2)
frac/info.go (1)
Info(21-40)seq/qpr.go (1)
DocsOrderDesc(19-19)
proxyapi/grpc_async_search.go (3)
fracmanager/async_searcher.go (2)
AsyncSearchStatus(627-627)GetAsyncSearchesListRequest(929-932)pkg/seqproxyapi/v1/seq_proxy_api.pb.go (7)
AsyncSearchStatus(189-189)AsyncSearchStatus(230-232)AsyncSearchStatus(234-236)AsyncSearchStatus(243-245)GetAsyncSearchesListRequest(1516-1527)GetAsyncSearchesListRequest(1540-1540)GetAsyncSearchesListRequest(1555-1557)proxy/search/async.go (1)
GetAsyncSearchesListRequest(104-109)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
🔇 Additional comments (7)
proxyapi/mock/grpc_v1.go (1)
17-18: Generated file change is harmless.Import reordering only; no functional impact. Nothing to do.
tests/integration_tests/integration_test.go (2)
1970-1972: Good: exercising WithDocs + Size path.This validates per-search Size propagation and storage enablement.
2025-2026: Optional: add a cap-enforcement test.
- Add an integration test in tests/integration_tests/integration_test.go (around lines 2025–2026) that requests Size > async_search.max_documents_per_request and asserts InvalidArgument.
- Repo search returned no test helper to override async_search.max_documents_per_request — confirm how to set this limit in tests or add a test-scoped override.
cmd/seq-db/seq-db.go (1)
181-187: Config plumbed and limit enforcedStartAsyncSearch validates r.Size against g.config.AsyncSearchMaxDocumentsPerRequest and returns InvalidArgument when exceeded (proxyapi/grpc_async_search.go:24–26). The value is set from cfg.AsyncSearch.MaxDocumentsPerRequest in cmd/seq-db/seq-db.go:186.
proxy/search/async.go (1)
32-33: LGTM: Size is plumbed end-to-end (request -> store -> fetch/list).Propagation looks consistent and aligns with QPR merge behavior. No action needed.
Also applies to: 55-57, 249-250, 394-395
fracmanager/async_searcher.go (1)
136-137: LGTM: limit-aware merging and surfaced fields (WithDocs, Size) are correct.
- Using request Params.Limit during merge avoids reading more than needed.
- Exposing WithDocs and Size in fetch/list responses is coherent with the API.
- Persisting WithDocs in the on-disk info JSON is backward compatible (absent field defaults to false).
Also applies to: 707-709, 799-800, 955-955, 1010-1012
proxyapi/grpc_async_search.go (1)
81-91: LGTM: Start/Fetch/List propagate Size consistently in the public API.The request echo in FetchAsyncSearchResult and listing entries include Size as expected.
Also applies to: 185-195
b09a027 to
c9f0149
Compare
Description
Allow to set docs limit in the StartAsyncSearch request and added configurable global ids limit for single async search
Fixes #141
If you have used LLM/AI assistance please provide model name and full prompt:
Summary by CodeRabbit