-
Notifications
You must be signed in to change notification settings - Fork 971
perf: don't call GetUserByID unnecessarily for Agents metrics loops #19395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
reduce calls to GetUserByID Signed-off-by: Callum Styan <callumstyan@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing as the only referenced field of user
is Username
, why not use workspace.OwnerUsername
instead?
Good point, I'll make that change so that we're only caching the username in the map 👍 |
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
if username == "" { | ||
logger.Warn(ctx, "in prometheusmetrics.Agents the username on workspace was empty string, this should not be possible", | ||
slog.F("workspace_id", workspace.ID), | ||
slog.F("template_id", workspace.TemplateID)) | ||
// Fallback to GetUserByID if OwnerUsername is empty (e.g., in tests) | ||
user, err := db.GetUserByID(ctx, workspace.OwnerID) | ||
if err != nil { | ||
logger.Error(ctx, "can't get user from the database", slog.F("user_id", workspace.OwnerID), slog.Error(err)) | ||
agentsGauge.WithLabelValues(VectorOperationAdd, 0, user.Username, workspace.Name, templateName, templateVersionName) | ||
continue | ||
} | ||
username = user.Username |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell, this situation would only be possible if we completely messed up the view. In this case, many other things would be also broken. However, in this case we not only potentially do a whole bunch of user lookups, we also spam a bunch of log errors.
IMO this should just error loudly so it's very obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell, this situation would only be possible if we completely messed up the view.
Yes it seems that way, though I'm not sure if there's some other edge case where this could happen, in which case dropping the fallback path of querying GetUserByID
would techncially be a breaking change.
Removing the logging (warning level for the "workspace did not have a username set") feels reasonable. I would lean towards keeping the fallback DB query unless we have no requirements around breaking changes for metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would lean towards keeping the fallback DB query unless we have no requirements around breaking changes for metrics.
My point is, if we mess up the view in such a way that OwnerUsername
is empty, it's very likely that more things would be broken than metrics. Having a non-empty username is a pretty fundamental assumption baked into the codebase right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but it could be possible that OwnerUsername
is empty for only a very small subset of workspaces.
Is your suggestion that we error updating the metric as a whole (for all found workspace agents) if we see any empty username, or just for those workspaces which have an empty username (and do not call the GetUserByID query as a fallback)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see two possible scenarios here:
-
Some users end up actually having empty usernames (which is theoretically possible based on the database schema). In this case, there would be no benefit from calling
GetUserByID
. -
Some error updating the view results in an empty string being returned for
owner_username
. As far as I can tell, this would break things like application routing, SSH access, and also be distinctly visible in the UI. In this case, we have three options:
a) Continue submitting metrics with an emptyusername
field,
b) Fall back to theGetUserByID
query,
c) Skip over the user/agent.
d) Refuse to generate potentially invalid metrics at all, error and exit early.
a) could actually signal the issue more quickly, but at the cost of messed up metrics.
b) would likely correct the issue, but we would suddenly be performing more database queries unexpectedly.
c) would also surface the error in a way
d) might actually be overkill now that I think about it.
Out of curiousity, I decided to see what would happen if a migration messed up the view and pushed #19426. The main takeaway I got from that is that a number of existing tests failed, but -- most notably -- coderd/prometheusmetrics
and coderd/workspacestats
didn't fail. If this is an area of concern, I'd suggest modifying the existing tests to guard for this so that we catch it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiousity, I decided to see what would happen if a migration messed up the view and pushed #19426. The main takeaway I got from that is that a number of existing tests failed, but -- most notably -- coderd/prometheusmetrics and coderd/workspacestats didn't fail. If this is an area of concern, I'd suggest modifying the existing tests to guard for this so that we catch it.
IIUC that's because we would still have the GetUserByID
call, so the view doesn't matter, we're taking the workspace owner ID and looking up that user.
I'm not sure I understand your point about B would likely correct the issue, but we would suddenly be performing more database queries unexpectedly
since we're currently making a DB call for every active workspace.
Unless we want to introduce a potentially breaking change I think B is our only option. Otherwise we can remove the fallback query and go with option C, then workspaces with correct usernames set would still emit agent related metrics.
continue | ||
username := workspace.OwnerUsername | ||
// This should never be possible, but we'll guard against it just in case. | ||
// 1. The owner_id field on the workspaces table is a reference to IDs in the users table and has a `NOT NULL` constraint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we not just add a foreign key constraint on this field? Seems like an odd omission.
Along with that, a constraint to enforce the minimum length of the username would add a lot more integrity.
This would likely render the rest of this discussion moot as this situation would not be possible.
I do appreciate the defensiveness of this code though, thanks @cstyan 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good observation @dannykopping - I might suggest a separate PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea 👍 that PR should block this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand the FK constraint suggestion, what would that look like or do for us? The query here just uses a view workspaces_expanded
, and the ownerID
on the workspaces
table is already a FK to users
. So I think it's just the constraint to enforce a minimum username length that we need to add?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh whoops, I didn't see the FK at the bottom of the file: https://github.com/coder/coder/blob/main/coderd/database/dump.sql#L3325-L3326
Username length and format, via regex, are already enforced at the application layer, but we have some code paths with database queries where we could optimize away many of the DB query calls if we could be sure at the database level that the username is never an empty string. For example: #19395 --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>
constraint to users table Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
At the moment, the loop which retrieves and updates the values of the agents metrics excessively calls
GetUserByID
(a DB query). First it retrieves a list of all workspaces, filtering out inactive agents (not entirely clear to me whether this is non-running workspaces, or just dead agents), and then iterates over those workspaces to get the rest of the relevant data for the metrics. The next call isGetUserByID
forworkspace.OwnerID
This should at least partially resolve coder/internal#726
by caching seen Useruuid.UUID
in a map for each iteration of the loop.UPDATE: we now have a constraint for the
username
field in theusers
table which allows us to safely access the username field from the workspaces_expanded view. See #19453UPDATE: It looks like, in theory, the calls here forGetUserByID
should not even be necessary as we already have adatabase.Workspace
object which also already has the owner ID and Username.I left comments in both spots as to why the username should never be empty on the workspace again, but I'll reiterate here:1. Theowner_id
field on theworkspaces
table is a FK reference to IDs in theusers
table and has aNOT NULL
constraint, so the owner fields of a workspace will always be populated2. While the users table technically only has a constraint that the username has to beNOT NULL
(meaning empty string is valid), at user creation time our httpmw package enforces non-empty usernames (for example it callscodersdk.NameValid
which enforces that the name is at least 1 character and fits theUsernameValidRegex
)3. Theworkspaces_expanded
view has an inner join onworkspaces.owner_id = visible_users.id
, and if the owner is valid in theusers
table (whichvisible_users
is a view of) then they will have a username set