lihao-figma commented Dec 3, 2025

Add heartbeat mechanism to prevent stealing active tests

Previously, if a test ran past its estimated deadline, other workers
could treat it as lost and steal it even while the original worker was
still actively running it. This could cause duplicate test runs and
wasted resources.

This change adds a heartbeat mechanism:

  1. Workers automatically send heartbeats every 10 seconds (configurable
    via heartbeat_interval) while running a test
  2. Each heartbeat extends the test's deadline and records the heartbeat
    time in Redis
  3. Before claiming a "lost" test, reserve_lost.lua checks whether it
    received a heartbeat recently (within heartbeat_grace_period, default 30s)
  4. Tests with recent heartbeats are skipped, allowing the active worker
    to continue
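The four steps above amount to a small lease protocol. Below is a minimal in-memory sketch of that bookkeeping, assuming plain Ruby hashes in place of the Redis sorted set (test → deadline) and the Redis hash (test → last heartbeat time). `LeaseTable` and its method names are hypothetical, not the PR's Lua scripts, and the `timeout` a heartbeat extends the deadline by is an assumed parameter because the PR does not state its value. Time is passed in explicitly so the behaviour is deterministic.

```ruby
# Hypothetical in-memory model of the heartbeat protocol (not the PR's code).
# Plain Hashes stand in for the Redis sorted set (test -> deadline score)
# and the Redis hash (test -> last heartbeat time).
class LeaseTable
  def initialize(timeout:, grace_period: 30)
    @timeout = timeout            # seconds a deadline is extended by (assumed)
    @grace_period = grace_period  # mirrors heartbeat_grace_period (default 30s)
    @deadlines = {}               # test_id => deadline
    @heartbeats = {}              # test_id => last heartbeat time
    @owners = {}                  # test_id => worker_id
  end

  # A worker reserves a test; its deadline starts at now + timeout.
  def reserve(test_id, worker_id, now:)
    @owners[test_id] = worker_id
    @deadlines[test_id] = now + @timeout
  end

  # Step 2: only the current owner may heartbeat. The deadline moves to
  # now + timeout (not just now), and the heartbeat time is recorded.
  def heartbeat(test_id, worker_id, now:)
    return false unless @owners[test_id] == worker_id
    @deadlines[test_id] = now + @timeout
    @heartbeats[test_id] = now
    true
  end

  # Steps 3-4: claim the earliest expired test whose last heartbeat is
  # older than the grace period; recently heartbeated tests are skipped.
  def reserve_lost(worker_id, now:)
    @deadlines.sort_by { |_, deadline| deadline }.each do |test_id, deadline|
      break if deadline > now  # remaining tests have not expired yet
      last = @heartbeats[test_id]
      next if last && now - last < @grace_period
      reserve(test_id, worker_id, now: now)
      return test_id
    end
    nil
  end

  # Finished tests leave the table entirely.
  def acknowledge(test_id)
    @deadlines.delete(test_id)
    @heartbeats.delete(test_id)
    @owners.delete(test_id)
  end
end

table = LeaseTable.new(timeout: 20)
table.reserve("test_a", "worker-1", now: 0)
table.heartbeat("test_a", "worker-1", now: 50)  # deadline moves to 70
table.reserve_lost("worker-2", now: 75)         # => nil (heartbeat 25s ago)
table.reserve_lost("worker-2", now: 85)         # => "test_a" (heartbeat 35s ago)
```

The timeout of 20s is deliberately shorter than the 30s grace period so the example shows a test whose deadline has already passed but which is still protected by a recent heartbeat.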

Changes:

  • Fix heartbeat.lua to properly extend the deadline (it was setting the
    score to current_time instead of current_time + timeout)
  • Track heartbeat times in a Redis hash for "recent activity" checks
  • Update reserve_lost.lua to respect heartbeat grace period
  • Add automatic heartbeating via background thread in Worker#poll
  • Add heartbeat_interval and heartbeat_grace_period config options
  • Add logging when a heartbeat extends a deadline
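The "automatic heartbeating via background thread" change can be pictured with a small helper that runs one test while a second thread fires a heartbeat on a fixed interval. This is a sketch under assumptions, not the code in `Worker#poll`: `with_heartbeat` and its `beat:` callable are hypothetical names, with `beat` standing in for whatever invokes heartbeat.lua, and the default of 10 seconds mirrors `heartbeat_interval`.

```ruby
# Hypothetical sketch: run a block while a background thread calls `beat`
# every `interval` seconds, stopping the thread as soon as the block returns.
# `beat` stands in for whatever invokes heartbeat.lua; it is not the PR's API.
def with_heartbeat(beat:, interval: 10)
  mutex = Mutex.new
  cond = ConditionVariable.new
  done = false

  thread = Thread.new do
    loop do
      # Wait up to `interval` seconds, waking early once the block finishes.
      stop = mutex.synchronize do
        cond.wait(mutex, interval) unless done
        done
      end
      break if stop

      begin
        beat.call
      rescue StandardError => e
        # A failed heartbeat should not abort the test run itself.
        warn "heartbeat failed: #{e.message}"
      end
    end
  end

  yield # run the test; its value becomes the method's return value
ensure
  if mutex
    mutex.synchronize do
      done = true
      cond.signal
    end
  end
  thread&.join
end

beats = 0
result = with_heartbeat(beat: -> { beats += 1 }, interval: 0.05) do
  sleep 0.3 # stands in for running one test
  :passed
end
```

Waiting on a `ConditionVariable` with a timeout, rather than calling `sleep interval`, lets the heartbeat thread exit promptly when the test finishes instead of lingering for up to one more interval.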

lihao-figma force-pushed the heart_beating_job branch 3 times, most recently from fdeb0fe to 9d0c6f8, December 3, 2025 04:56
lihao-figma marked this pull request as ready for review December 3, 2025 05:35
lihao-figma merged commit f65f7ff into master Dec 3, 2025 (5 checks passed)