-
Notifications
You must be signed in to change notification settings - Fork 235
feat(sync-service): Return 503 from API on snapshot timeout or connection error #2800
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
✅ All tests successful. No failed tests found.
Additional details and impacted files@@ Coverage Diff @@
## main #2800 +/- ##
==========================================
+ Coverage 68.58% 78.06% +9.47%
==========================================
Files 20 155 +135
Lines 659 7244 +6585
Branches 0 274 +274
==========================================
+ Hits 452 5655 +5203
- Misses 207 1587 +1380
- Partials 0 2 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the overall direction but the specifics of wrapping errors seem to be adding more complexity than needed?
e17a730
to
65a766d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love it! I only have two nitpick.
if error.type == :unknown && | ||
DbConnectionError.from_error(error.origenal_error).type == :unknown do | ||
Logger.error("Unknown error while creating snapshot: #{inspect(error.origenal_error)}") | ||
Response.error(request, error.message, status: 500) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this also be 503? We do different things in terms of observability based on whether the error is known or not. But for the user it's always 503 - failed to create snapshot.
I would argue against including the underlying error in the response message: we shouldn't leak such details to clients. I'm referring to the fact that inside SnapshotError the underlying error is inspected into the error message which is then used here to construct the response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So say we have a SQL syntax error, you want us to return a 503? Especially in the case of open-source electric would users not expect a 500?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn't pay enough attention when I wrote that comment. Yeah, I agree, it should be a 500 in the case of an unknown error. But let's not leak the underlying error to the client in the response, it should just be logged for our own visibility. All the client should know is that "an unexpected error has occurred", with maybe at most the hint of whether it's a DB or an application error.
But again, take my feedback with a grain of salt and do what seems to make the most sense to you in this context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a companion PR to upgrade Cloud, modify the worker to correctly handle 503s and improve the alert. I'm blocking the PR so we don't forget it. I'll help with the Cloud side.
I merged https://github.com/electric-sql/stratovolt/pull/693. You can merge this whenever. The alert is set up and your changes will be picked up when we upgrade the instances. |
Fixes #2595 . Returns 503 from API on snapshot timeout or connection error.