feat(agent/agentcontainers): implement sub agent injection #18245
This change adds support for sub agent creation and injection into dev containers. Closes coder/internal#621
I'm still working on an integration test and the existing mocks are being a PITA (think those are about sorted now though). Promoting this to "ready for review" to get some feedback on the approach @DanielleMaywood @johnstcn. (Also going to break out the "follow-up PR" tasks into new issues before merging this.)
I still have to read some more but adding my comments so far.
agent/agentcontainers/api.go
Outdated
err := api.dccli.Exec(agentCtx, dc.WorkspaceFolder, dc.ConfigPath, agentPath, []string{"agent"},
	WithContainerID(container.ID),
	WithRemoteEnv(
		"CODER_AGENT_URL="+api.subAgentURL,
		"CODER_AGENT_TOKEN="+agent.AuthToken.String(),
	),
)
Would it make more sense to background this? If the parent agent ends up crashing and being restarted, we'll lose the sub-agents and have to re-inject them. We can keep track of the expected PID in e.g. /.coder-agent/pid
We could probably background it either on the host or inside the container, but not doing so has some nice properties:
- We immediately discover if a sub agent exits/crashes and we could restart immediately (we don't currently)
- Job control is simpler (simply cancel the context vs looking up processes and verifying against pid)
- With prebuilds, we can exit all sub-agents on claim and re-inject afterwards to ensure a clean slate
For the case where the parent agent crashes, keeping those sub-agents may be a bit hit-and-miss and those dev containers could be affected anyway on agent startup. I'm not aware of agents crashing though so this might not even be a concern we need to be mindful of now?
Fair enough!
if _, err := api.ccli.ExecAs(ctx, container.ID, "root", "setcap", "cap_net_admin+ep", coderPathInsideContainer); err != nil {
	logger.Warn(ctx, "set CAP_NET_ADMIN on agent binary failed", slog.Error(err))
}
This will probably fail unless the container is running as privileged or has the specific CAP_NET_ADMIN privilege set on the container?
As per the comment, this is an optional networking boost. (See regular agent bootstrap script, I'll update the comment to reference it.) Did you have some action in mind?
We could check for both of these things before trying? Not a blocker though.
Sure, I don't think it's very high priority but let's create a ticket for future enhancement. 👍🏻
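For the future enhancement, one possible pre-check is sketched below. It assumes the container's capability masks are obtained by reading a `/proc/<pid>/status` file inside the container, and that inspecting the bounding set (`CapBnd`) is a reasonable proxy. Both are assumptions rather than anything decided in this thread, and `setcap` itself likely also needs `CAP_SETFCAP`. `hasCapability` is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// capNetAdmin is the Linux capability number for CAP_NET_ADMIN.
const capNetAdmin = 12

// hasCapability reports whether capability bit capNum is set in the named
// field (for example "CapBnd" or "CapEff") of text in the format of
// /proc/<pid>/status. It is a hypothetical helper, not part of this PR.
func hasCapability(status, field string, capNum uint) (bool, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		name, value, ok := strings.Cut(sc.Text(), ":")
		if !ok || name != field {
			continue
		}
		// The mask is printed as a hexadecimal bit field.
		mask, err := strconv.ParseUint(strings.TrimSpace(value), 16, 64)
		if err != nil {
			return false, fmt.Errorf("parse %s mask: %w", field, err)
		}
		return mask&(1<<capNum) != 0, nil
	}
	return false, fmt.Errorf("field %q not found", field)
}
```

If the check reports false, the `setcap` call could be skipped without logging a warning.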
agent/agentcontainers/api.go
Outdated
// Make sure the agent binary is executable so we can run it.
if _, err := api.ccli.ExecAs(ctx, container.ID, "root", "chmod", "+x", coderPathInsideContainer); err != nil {
	return xerrors.Errorf("set agent binary executable: %w", err)
}
Do we also need to `chown` the binary so that it's readable by the default container user?
Good callout. I didn't consider this, but `docker cp` seems to follow the permissions of the file on disk. So unless we `chown` it, the ownership could be nonsense within the container (non-existent user, etc.).
It's unlikely that the permissions will be bad for the user (typically 0755), but we could improve it for sure. It might make sense to turn this into a script rather than N `docker exec`s.
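As a rough sketch of the "one script instead of N execs" idea, the snippet below builds a single argv that does both `chown` and `chmod`. `prepareBinaryArgs` is a hypothetical helper, and how the owner is determined (for example, the container's default user) is left as an assumption.

```go
package main

// prepareBinaryArgs returns an argv that runs chown and chmod in a single
// shell invocation. The path and owner are passed as positional parameters
// ($1 and $2), so they need no shell quoting or escaping.
// This is a hypothetical helper, not code from this PR.
func prepareBinaryArgs(path, owner string) []string {
	const script = `chown "$2" "$1" && chmod 0755 "$1"`
	// The fourth element becomes $0 inside the script; path is $1, owner is $2.
	return []string{"sh", "-c", script, "sh", path, owner}
}
```

The resulting slice could be handed to whatever exec helper runs commands as root in the container, replacing the separate calls.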
logger.Info(ctx, "starting subagent in dev container")

err := api.dccli.Exec(agentCtx, dc.WorkspaceFolder, dc.ConfigPath, agentPath, []string{"agent"},
Do we try to execute this as a non-root user?
AFAIK this will get executed as the remote user configured by `devcontainer.json` (or, if unconfigured, the container user), which seems like the correct behavior to me.
injected := make(map[uuid.UUID]bool, len(api.injectedSubAgentProcs))
for _, proc := range api.injectedSubAgentProcs {
	injected[proc.agent.ID] = true
}
This could probably be a `map[uuid.UUID]struct{}` instead, and then below on line 888 just check for `_, found := injected[agent.ID]`
I don't foresee the memory savings being necessary here (will we have 1000s of sub agents?). The current form reads better and is simpler to use IMO (I always prefer this form for readability where applicable).
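For comparison, the two set forms under discussion look roughly like this. The sketch uses string IDs instead of `uuid.UUID` to stay within the standard library, and all function names are hypothetical.

```go
package main

// injectedBool builds the map[K]bool form of the set.
func injectedBool(ids []string) map[string]bool {
	m := make(map[string]bool, len(ids))
	for _, id := range ids {
		m[id] = true
	}
	return m
}

// injectedSet builds the map[K]struct{} form of the set.
func injectedSet(ids []string) map[string]struct{} {
	m := make(map[string]struct{}, len(ids))
	for _, id := range ids {
		m[id] = struct{}{}
	}
	return m
}

// staleBool returns the IDs not in the set; the lookup is a plain condition.
func staleBool(all []string, injected map[string]bool) []string {
	var stale []string
	for _, id := range all {
		if injected[id] {
			continue
		}
		stale = append(stale, id)
	}
	return stale
}

// staleSet does the same with the comma-ok idiom that struct{} values need.
func staleSet(all []string, injected map[string]struct{}) []string {
	var stale []string
	for _, id := range all {
		if _, found := injected[id]; found {
			continue
		}
		stale = append(stale, id)
	}
	return stale
}
```

Both forms behave identically here; the difference is only in per-entry memory and how the lookup reads.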
for _, agent := range agents {
	if injected[agent.ID] {
		continue
	}
	err := api.subAgentClient.Delete(ctx, agent.ID)
	if err != nil {
		api.logger.Error(ctx, "failed to delete agent",
			slog.Error(err),
			slog.F("agent_id", agent.ID),
			slog.F("agent_name", agent.Name),
		)
	}
}
Should we set an upper bound on deletion attempts and raise if more than say 3 attempts fail?
Are you suggesting silently ignoring failures unless >= 3 fail? Or perhaps adding retry logic?
I'm mainly worried about spamming error logs into the void.
These will be part of the parent agent log 🤔
We can leave it as-is for now, but I think if this does start happening frequently (or all the time) it may be difficult to catch if it just goes into the parent agent log.
@DanielleMaywood @johnstcn I've added I also added
This change adds support for sub agent creation and injection into dev containers.
TODO:
devcontainer.json parsing, follow-up PR).
customizations.coder.devcontainer.name from docker container label (materialized devcontainer.json on creation, follow-up PR)
Updates coder/internal#621