Potential Deadlock Issue with Workflow Status Field in Kubernetes Environment #246
Unanswered
travisaustin
asked this question in
Q&A
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
I encountered a deadlock situation today where multiple workflows became stuck in an infinite loop. The symptoms were:
status = "running"
even when no job was actively processingWithoutOverlappingMiddleware
was not blocking the jobsEnvironment Context
While Horizon containers are frequently replaced due to auto-scaling, they normally shut down cleanly. However, Kubernetes can occasionally evict containers, forcing unclean shutdowns that we need to account for in system design.
Root Cause Analysis (Theory)
I suspect the issue occurred when:
Questions for Discussion
1. Is the database status check redundant?
Since
WithoutOverlappingMiddleware
already prevents concurrent execution of the same workflow, is it necessary to also validate the databasestatus
field before allowing a workflow to run?2. Should we rely on database state for runtime logic?
Given that:
Would it be better to treat the
status
field as informational only (for human readability) rather than using it for execution control?3. Proposed solution
Remove the database status validation from the execution path and rely solely on
WithoutOverlappingMiddleware
for concurrency control. This would:Request for Feedback
status
field for execution control?Any insights or alternative approaches would be greatly appreciated!
Beta Was this translation helpful? Give feedback.
All reactions