Skip to content

feat(spider-grpc): Add gRPC protocol for storage; Add gRPC clients in execution manager and scheduler.#340

Open
sitaowang1998 wants to merge 12 commits into
y-scope:mainfrom
sitaowang1998:grpc-job
Open

feat(spider-grpc): Add gRPC protocol for storage; Add gRPC clients in execution manager and scheduler.#340
sitaowang1998 wants to merge 12 commits into
y-scope:mainfrom
sitaowang1998:grpc-job

Conversation

@sitaowang1998

@sitaowang1998 sitaowang1998 commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Description

This PR:

  • Adds remaining gRPC protocol for storage interface.
  • Adds task id and job state conversion between gRPC and spider-core.
  • Adds gRPC implementation of execution manager's liveness client.
  • Adds gRPC implementation of scheduler's storage client.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • GitHub workflows pass.

Summary by CodeRabbit

  • New Features
    • Job management API: submit, start, cancel, query state, retrieve outputs/errors, and expired-job cleanup
    • Resource group management: add, verify, delete with password/session scoping
    • Execution-manager liveness: register, heartbeat, liveliness checks, dead-manager discovery
    • Session retrieval API and scheduler polling endpoints for ready/commit/cleanup tasks, plus a gRPC-backed scheduler storage client

@sitaowang1998 sitaowang1998 requested a review from a team as a code owner June 10, 2026 17:08
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Warning

Review limit reached

@sitaowang1998, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 51 minutes. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more credits in the billing tab to continue.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 6a3676a7-9b1f-49c4-9a7d-134e1d8f5287

📥 Commits

Reviewing files that changed from the base of the PR and between a3a3f09 and fbc1c4c.

📒 Files selected for processing (1)
  • components/spider-scheduler/Cargo.toml

Walkthrough

This PR expands storage protobufs with job/resource-group/liveness APIs, adds protobuf↔core conversions for JobState and TaskId, implements a GrpcLivenessClient and updates execution-manager storage adapters to TaskInstance types, and introduces GrpcSchedulerStorageClient plus scheduler crate wiring and dependency updates.

Changes

Execution Manager Liveness and Job Management APIs

Layer / File(s) Summary
Proto API definitions and enums
components/spider-proto/storage/storage.proto
Adds JobManagementService, SchedulerStorageService, ResourceGroupManagementService, ExecutionManagerLivenessService, SessionManagementService, per-domain response/error messages, and a JobState enum; removes generic StorageOperationResponse/StorageError.
Protobuf ↔ core type conversions
components/spider-proto-rust/src/job.rs, components/spider-proto-rust/src/id.rs, components/spider-proto-rust/src/lib.rs
Implements From → protobuf and TryFromstorage::JobState → core (rejects Unspecified); implements TryFromstorage::TaskId → core TaskId with error on missing kind/invalid index; tests added/updated.
GrpcLivenessClient and response mapping
components/spider-execution-manager/src/client/grpc/liveness.rs
Adds GrpcLivenessClient with connect(), implements LivenessClient trait (register, heartbeat), response conversion helpers, storage-error→LivenessResponseError mapping, transport error helper, and unit tests.
Execution-manager gRPC adapter updates
components/spider-execution-manager/src/client/grpc/storage.rs, components/spider-execution-manager/src/client/grpc/mod.rs, components/spider-execution-manager/src/client.rs
Switches storage adapter to use TaskInstance* protobuf types, updates StorageResponseError::<-TaskInstanceError> mapping and operation-response handling; adds liveness submodule and re-exports GrpcLivenessClient; re-exports both GrpcLivenessClient and GrpcStorageClient from client.rs.
GrpcSchedulerStorageClient and scheduler conversions
components/spider-scheduler/src/grpc/storage_client.rs
Adds GrpcSchedulerStorageClient with connect(), session bootstrap, poll-ready methods (ready/commit/cleanup), job_state retrieval, protobuf→internal conversions for ReadyTasks/ReadyTask, error mapping helpers, transport error helper, and unit tests.
Scheduler crate wiring and deps
components/spider-scheduler/Cargo.toml, components/spider-scheduler/src/grpc/mod.rs, components/spider-scheduler/src/lib.rs, components/spider-scheduler/src/error.rs
Adds spider-proto-rust path dependency, declares grpc module and re-exports GrpcSchedulerStorageClient from crate root, and extends StorageClientError with StaleSession, InvalidInput(String), Server(String), and Transport(String) variants.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • y-scope/spider#329: Uses the GrpcLivenessClient added here during runtime bootstrap and liveness refresh flows.
  • y-scope/spider#328: Introduces liveness actor logic that relies on LivenessClient::register and heartbeat; directly related to the GrpcLivenessClient implementation.
  • y-scope/spider#327: Defines the LivenessClient trait and types that the new GrpcLivenessClient implements.

Suggested reviewers

  • LinZhihao-723
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main changes: adding gRPC protocol for storage and gRPC clients in execution manager and scheduler components.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@sitaowang1998 sitaowang1998 changed the title feat(spider-grpc): Add gRPC protocol for storage; Add gRPC liveness client in execution manager. feat(spider-grpc): Add gRPC protocol for storage; Add gRPC clients in execution manager and scheduler. Jun 11, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/spider-proto/storage/storage.proto (1)

1-352: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Add comprehensive documentation to the proto API.

The proto file defines multiple public gRPC services and complex message structures but contains zero documentation comments. For a cross-component API contract, comprehensive documentation is essential to ensure maintainability, enable safe evolution, and support developer onboarding.

Please add proto comments documenting:

  • Each service's purpose and usage context
  • Each RPC's behaviour, preconditions, and error scenarios
  • Message field semantics, constraints, and valid ranges
  • Error code meanings and client handling expectations
  • The JobState enum transitions and their lifecycle implications
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/spider-proto/storage/storage.proto` around lines 1 - 352, The
proto file lacks any documentation; add clear proto comments for all public
symbols: each service (JobManagementService, TaskInstanceManagementService,
SchedulerStorageService, ResourceGroupManagementService,
ExecutionManagerLivenessService, SessionManagementService) describing purpose
and usage; each RPC (e.g., SubmitJob, StartJob, PollReadyTasks,
RegisterTaskInstance, ReportTaskSuccess, AddResourceGroup,
RegisterExecutionManager, GetSession) explaining input preconditions, side
effects, success/failure semantics and typical error scenarios; each message and
important fields (SubmitJobRequest.serialized_task_graph, ReadyTask.task_id,
RegisterTaskInstanceRequest.execution_manager_id, etc.) documenting field
semantics, valid ranges and constraints; all error messages
(JobManagementError.ErrCode, TaskInstanceError.ErrCode,
SchedulerStorageError.ErrCode, ResourceGroupError.ErrCode,
ExecutionManagerLivenessError.ErrCode) describing when each code is returned and
recommended client handling; and the JobState enum documenting legal state
transitions (e.g., READY -> RUNNING -> COMMIT_READY/CLEANUP_READY ->
SUCCEEDED/FAILED/CANCELLED) and lifecycle implications. Keep comments concise,
use proto // or /** */ comments above definitions so they are included in
generated docs and ensure examples or notes for session_id and storage_session
consistency where relevant.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@components/spider-proto/storage/storage.proto`:
- Around line 1-352: The proto file lacks any documentation; add clear proto
comments for all public symbols: each service (JobManagementService,
TaskInstanceManagementService, SchedulerStorageService,
ResourceGroupManagementService, ExecutionManagerLivenessService,
SessionManagementService) describing purpose and usage; each RPC (e.g.,
SubmitJob, StartJob, PollReadyTasks, RegisterTaskInstance, ReportTaskSuccess,
AddResourceGroup, RegisterExecutionManager, GetSession) explaining input
preconditions, side effects, success/failure semantics and typical error
scenarios; each message and important fields
(SubmitJobRequest.serialized_task_graph, ReadyTask.task_id,
RegisterTaskInstanceRequest.execution_manager_id, etc.) documenting field
semantics, valid ranges and constraints; all error messages
(JobManagementError.ErrCode, TaskInstanceError.ErrCode,
SchedulerStorageError.ErrCode, ResourceGroupError.ErrCode,
ExecutionManagerLivenessError.ErrCode) describing when each code is returned and
recommended client handling; and the JobState enum documenting legal state
transitions (e.g., READY -> RUNNING -> COMMIT_READY/CLEANUP_READY ->
SUCCEEDED/FAILED/CANCELLED) and lifecycle implications. Keep comments concise,
use proto // or /** */ comments above definitions so they are included in
generated docs and ensure examples or notes for session_id and storage_session
consistency where relevant.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a9894bac-13ee-4fdd-8177-6fede9a84856

📥 Commits

Reviewing files that changed from the base of the PR and between abd22d4 and a3a3f09.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • components/spider-proto-rust/src/generated/storage.rs is excluded by !**/generated/**
📒 Files selected for processing (8)
  • components/spider-execution-manager/src/client/grpc/liveness.rs
  • components/spider-execution-manager/src/client/grpc/storage.rs
  • components/spider-proto/storage/storage.proto
  • components/spider-scheduler/Cargo.toml
  • components/spider-scheduler/src/error.rs
  • components/spider-scheduler/src/grpc/mod.rs
  • components/spider-scheduler/src/grpc/storage_client.rs
  • components/spider-scheduler/src/lib.rs
✅ Files skipped from review due to trivial changes (2)
  • components/spider-scheduler/Cargo.toml
  • components/spider-scheduler/src/grpc/mod.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/spider-execution-manager/src/client/grpc/liveness.rs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant