-
Notifications
You must be signed in to change notification settings - Fork 856
feat: triggers suspended mode #7297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Claude finished @hugocasa's task —— View job PR Review: Triggers Suspended Mode ✅I've completed a comprehensive review of this PR. This is a substantial and well-architected feature that adds a "suspended mode" to triggers, allowing jobs to be queued without automatic execution. Here's my detailed analysis: 🎯 OverviewThis PR introduces a third state for triggers beyond enabled/disabled: suspended mode. When a trigger is suspended, it continues accepting payloads and creating jobs, but those jobs are marked with special "unassigned" job kinds ( Scope: ~5,200 additions, ~2,300 deletions across 100 files ✅ Strengths1. Clean Database MigrationThe migration from
2. Good Abstraction with TriggerCrud TraitThe // backend/windmill-api/src/triggers/handler.rs:209-276
async fn set_trigger_mode(&self, ...) -> Result<bool>
3. Backward CompatibilityThe deprecated // backend/windmill-api/src/triggers/mod.rs:126-142
pub fn mode(&self) -> &TriggerMode {
self.mode.as_ref().unwrap_or(
#[allow(deprecated)]
if self.enabled.unwrap_or(true) { &TriggerMode::Enabled }
else { &TriggerMode::Disabled }
)
}4. Smart Job Reassignment LogicThe
5. Listener ArchitectureThe listener system correctly fetches both enabled AND suspended triggers: // backend/windmill-api/src/triggers/listener.rs:81
.and_where("(mode = 'enabled'::TRIGGER_MODE OR mode = 'suspended'::TRIGGER_MODE)")6. OpenAPI UpdatedGood practice - the OpenAPI spec was properly updated with new endpoints and the 🐛 Issues FoundCRITICAL: Potential Data Loss in Down MigrationLocation: Issue: The down migration converts suspended triggers to disabled, potentially losing important state. See inline comment for details and recommendations. Impact: If users roll back while triggers are in suspended mode with queued jobs:
HIGH: Missing Error Context in TransactionLocation: Issue: Multiple DELETE operations in job cleanup lack error context. See inline comment for recommendations. Impact: If a delete fails midway, debugging will be difficult without proper error messages. MEDIUM: SQL Injection DefenseLocations:
Issue: Table names constructed from enum values. While currently safe, defense-in-depth would improve security. See inline comment for explicit validation pattern. MEDIUM: Inconsistent Error ReportingLocation: Issue: The Impact: Critical errors in capture mode might not alert operators properly. Recommendation: Unify error reporting to use LOW: Potential Race Condition in Mode ToggleLocation: Issue: Impact: If a listener picks up the trigger between these two operations, it might process with inconsistent state. Recommendation:
LOW: Missing Index ConsiderationLocation: Database migration Issue: Queries frequently filter by 🎨 Code Quality ObservationsGood Practices
Minor Improvements
tokio::time::sleep(tokio::time::Duration::from_secs(5)).await;Extract to constant:
🔒 Security Considerations
🚀 Performance Considerations
📝 Documentation & TestingNeeds Documentation
Test Coverage Recommendations:
📋 Testing InstructionsTo manually test this feature through the app interface: Navigate to any trigger page (e.g., HTTP Routes under Triggers section). Create or select an existing trigger. In the trigger editor toolbar, you'll see a toggle for enable/disable along with a dropdown menu. Click the dropdown and select "Suspend job execution" to put the trigger in suspended mode. The toggle will change to show three states: Disabled, Suspended, and Enabled. Once suspended, trigger the endpoint/event that would normally execute the trigger. Go to the Runs page and add a filter for job status. You should see jobs with "Unassigned" status appearing - these are suspended jobs that won't execute automatically. Click on any suspended job to view its details. Notice the job shows as queued but won't start execution. Back on the trigger page, if there are suspended jobs, you'll see an alert/modal appear when trying to change the trigger mode. The modal allows you to review all suspended jobs and choose to either resume them (which will execute them with the current trigger configuration) or cancel them. You can select individual jobs or batch process all suspended jobs. After resuming, jobs should transition to normal execution. After canceling, jobs should be marked as canceled in the runs page. Test edge cases: edit the trigger configuration while jobs are suspended, then resume - jobs should use the NEW configuration. Delete a trigger with suspended jobs - verify appropriate cleanup. Switch between suspended and enabled/disabled modes multiple times to ensure state transitions work correctly. SummaryThis is a well-designed and thoughtfully implemented feature with good architecture and backward compatibility. The main concerns are:
The code follows Windmill's patterns well, maintains consistency across trigger types, and provides a valuable feature for users who need manual review before job execution. Recommendation: Fix the critical down migration issue and high-priority items before merge. Medium and low priority items can be addressed in follow-up PRs if needed. Inline comments added: 5 specific code locations with detailed recommendations |
| ALTER TABLE email_trigger | ||
| ADD COLUMN enabled BOOLEAN; | ||
|
|
||
| UPDATE email_trigger SET enabled = CASE WHEN mode = 'enabled'::TRIGGER_MODE THEN true ELSE false END; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical Issue: Potential Data Loss
When rolling back, triggers in suspended mode will be converted to disabled, but any unassigned jobs will remain orphaned in the database. This could lead to:
- Jobs stuck in limbo that users can't see or manage
- Loss of user intent (which triggers were suspended vs disabled)
Recommendation: Add cleanup logic before the migration:
-- Clean up unassigned jobs before reverting mode column
-- Option A: Cancel all unassigned jobs
DELETE FROM v2_job WHERE kind IN ('unassigned_script', 'unassigned_flow', 'unassigned_singlestepflow');
-- OR Option B: Auto-execute them by converting to regular jobs
UPDATE v2_job SET kind = CASE
WHEN kind = 'unassigned_script' THEN 'script'::JOB_KIND
WHEN kind = 'unassigned_flow' THEN 'flow'::JOB_KIND
WHEN kind = 'unassigned_singlestepflow' THEN 'singlescriptflow'::JOB_KIND
ELSE kind
END WHERE kind IN ('unassigned_script', 'unassigned_flow', 'unassigned_singlestepflow');Document which option is chosen and why.
| _ => {} | ||
| } | ||
|
|
||
| let table_name = format!("{}_trigger", trigger_kind.to_string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Medium Priority: SQL Injection Defense
While trigger_kind is an enum (which provides safety), constructing table names dynamically could become a vulnerability if the enum handling ever changes.
Recommendation: Add explicit validation:
let table_name = match trigger_kind {
JobTriggerKind::Http => "http_trigger",
JobTriggerKind::Kafka => "kafka_trigger",
JobTriggerKind::Websocket => "websocket_trigger",
JobTriggerKind::Mqtt => "mqtt_trigger",
JobTriggerKind::Nats => "nats_trigger",
JobTriggerKind::Postgres => "postgres_trigger",
JobTriggerKind::Sqs => "sqs_trigger",
JobTriggerKind::Gcp => "gcp_trigger",
JobTriggerKind::Email => "email_trigger",
JobTriggerKind::Webhook | JobTriggerKind::Schedule | _ => {
return Err(Error::BadRequest(format!(
"{} triggers do not support this operation",
trigger_kind
)));
}
};This provides defense-in-depth and makes supported trigger types explicit.
| }; | ||
|
|
||
| // Delete the unassigned job from all related tables | ||
| sqlx::query!("DELETE FROM v2_job_queue WHERE id = $1", job.id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High Priority: Improve Error Context
These cascading deletes cleanup orphaned data after job reassignment. While they're in a transaction, consider adding error context for better debugging:
sqlx::query!("DELETE FROM v2_job_queue WHERE id = $1", job.id)
.execute(&mut *tx)
.await
.context("Failed to delete job from queue during reassignment")?;
sqlx::query!("DELETE FROM v2_job_runtime WHERE id = $1", job.id)
.execute(&mut *tx)
.await
.context("Failed to delete job runtime during reassignment")?;
// ... etcAlso consider if cascade deletes via foreign keys could simplify this (delete only from v2_job and let cascade handle the rest).
| } | ||
|
|
||
| impl BaseTriggerData { | ||
| pub fn mode(&self) -> &TriggerMode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good Practice: Backward Compatibility
Nice handling of the deprecated enabled field! This ensures older API clients continue to work during the transition period. Consider documenting when this can be removed (e.g., "Remove in v2.0" or after a certain number of releases).
| for job in jobs { | ||
| // If job was created before trigger was edited, simply update it to unsuspend | ||
| // instead of deleting and repushing | ||
| if job.created_at > trigger.edited_at { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great Logic: Smart Job Reassignment
Excellent decision to handle two scenarios differently:
- Jobs created AFTER trigger edit: Get new config (delete & recreate)
- Jobs created BEFORE trigger edit: Keep original config (simple update)
This prevents the confusing scenario where a user edits a trigger and then wonders why old suspended jobs are executing with the new config. The timestamp comparison is a smart way to determine user intent.
| "(last_server_ping IS NULL OR last_server_ping < now() - interval '15 seconds')", | ||
| ); | ||
| sqlb.fields(&fields) | ||
| .and_where("(mode = 'enabled'::TRIGGER_MODE OR mode = 'suspended'::TRIGGER_MODE)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good Practice: Listener Includes Suspended Triggers
Correct implementation - listeners need to pick up both enabled and suspended triggers since suspended mode still processes events (just doesn't auto-execute jobs). This ensures the trigger continues to queue jobs even in suspended state.
Minor optimization consideration: If you have many triggers, consider adding an index on mode column for better query performance:
CREATE INDEX CONCURRENTLY idx_trigger_mode ON {table_name}(workspace_id, mode)
WHERE mode IN ('enabled', 'suspended');| JobKind::AIAgent => "jobs.run.ai_agent", | ||
| JobKind::UnassignedScript => "jobs.run.unassigned_script", | ||
| JobKind::UnassignedFlow => "jobs.run.unassigned_flow", | ||
| JobKind::UnassignedSinglestepFlow => "jobs.run.unassigned_singlestepflow", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typographical note: The enum variant UnassignedSinglestepFlow appears to be using a non-standard compound word format. Consider renaming it to UnassignedSingleStepFlow (and updating the corresponding mapping to "jobs.run.unassigned_single_step_flow") for improved readability and consistency.
No description provided.