feat(agent/agentcontainers): implement sub agent injection #18245

Open
wants to merge 17 commits into main

Conversation

mafredri
Member

@mafredri mafredri commented Jun 5, 2025

This change adds support for sub agent creation and injection into dev
containers.

TODO:

  • Pass the correct access URL to sub agent
  • Add integration test
  • Use correct directory for sub agent (requires on-disk devcontainer.json parsing, follow-up PR)
  • Parse .customizations.coder.devcontainer.name from docker container label (materialized devcontainer.json on creation, follow-up PR)
  • Add support for downloading agent binaries for different architectures (follow-up PR)
  • Make sure there are reduced capabilities for sub-agents (e.g. no containers API, follow-up PR)

Updates coder/internal#621

  1. chore(agent): update agent proto client #18242
  2. feat(agent/agentcontainers): refactor Lister to ContainerCLI and implement new methods #18243
  3. feat(agent/agentcontainers): add Exec method to devcontainers CLI #18244
  4. 👉🏻 feat(agent/agentcontainers): implement sub agent injection #18245

@mafredri mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-3 branch from d49f84e to 011a8aa Compare June 5, 2025 12:51
@mafredri mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from 91ff08e to 3960774 Compare June 5, 2025 12:52
@mafredri mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-3 branch from 011a8aa to 63f93bc Compare June 5, 2025 13:59
@mafredri mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from 3960774 to 1cf1905 Compare June 5, 2025 13:59
@mafredri mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-3 branch from 63f93bc to 0deaab8 Compare June 6, 2025 08:44
@mafredri mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from 1cf1905 to f190036 Compare June 6, 2025 08:44
@mafredri mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-3 branch from 0deaab8 to 8796ba3 Compare June 6, 2025 09:30
@mafredri mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch 2 times, most recently from dc146ab to d1447f3 Compare June 6, 2025 09:45
@mafredri mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-3 branch from 8796ba3 to adbfd45 Compare June 6, 2025 11:20
@mafredri mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from d1447f3 to 3547372 Compare June 6, 2025 11:27
Base automatically changed from mafredri/feat-agent-devcontainer-injection-3 to main June 6, 2025 11:39
This change adds support for sub agent creation and injection into dev
containers.

Closes coder/internal#621
@mafredri mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from 3547372 to 7358ee0 Compare June 6, 2025 11:39
@mafredri mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from a8e4495 to eb29bba Compare June 6, 2025 15:59
@mafredri
Member Author

mafredri commented Jun 6, 2025

I'm still working on an integration test and the existing mocks are being a PITA (think those are about sorted now though). Promoting this to "ready for review" to get some feedback on the approach @DanielleMaywood @johnstcn.

(Also going to break out the "follow-up PR" tasks into new issues before merging this.)

@mafredri mafredri marked this pull request as ready for review June 6, 2025 16:14
Member

@johnstcn johnstcn left a comment

I still have to read some more but adding my comments so far.

Comment on lines 1093 to 1099
err := api.dccli.Exec(agentCtx, dc.WorkspaceFolder, dc.ConfigPath, agentPath, []string{"agent"},
	WithContainerID(container.ID),
	WithRemoteEnv(
		"CODER_AGENT_URL="+api.subAgentURL,
		"CODER_AGENT_TOKEN="+agent.AuthToken.String(),
	),
)
Member

Would it make more sense to background this? If the parent agent ends up crashing and being restarted, we'll lose the sub-agents and have to re-inject them. We can keep track of the expected PID in e.g. /.coder-agent/pid

Member Author

We could probably background it either on the host or inside the container, but not doing so has some nice properties:

  1. We immediately discover if a sub agent exits/crashes and we could restart immediately (we don't currently)
  2. Job control is simpler (simply cancel the context vs looking up processes and verifying against pid)
  3. With prebuilds, we can exit all sub-agents on claim and re-inject afterwards to ensure a clean slate

For the case where the parent agent crashes, keeping those sub-agents may be a bit hit-and-miss and those dev containers could be affected anyway on agent startup. I'm not aware of agents crashing though so this might not even be a concern we need to be mindful of now?

Member

Fair enough!

Comment on lines +1009 to +1011
if _, err := api.ccli.ExecAs(ctx, container.ID, "root", "setcap", "cap_net_admin+ep", coderPathInsideContainer); err != nil {
	logger.Warn(ctx, "set CAP_NET_ADMIN on agent binary failed", slog.Error(err))
}
Member

This will probably fail unless the container is running as privileged or has the specific CAP_NET_ADMIN privilege set on the container?

Member Author

As per the comment, this is an optional networking boost. (See regular agent bootstrap script, I'll update the comment to reference it.) Did you have some action in mind?

Member

We could check for both of these things before trying? Not a blocker though.

Member Author

Sure, I don't think it's very high priority but let's create a ticket for future enhancement. 👍🏻
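
For the future enhancement, one possible pre-check is to read the capability masks from `/proc/<pid>/status` inside the container and test the CAP_NET_ADMIN bit (bit 12 on Linux) before calling `setcap`. The sketch below only shows the parsing step; `hasCapability` is a hypothetical helper and the sample masks are illustrative, not output captured from this PR.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// capNetAdmin is the bit index of CAP_NET_ADMIN in Linux capability masks.
const capNetAdmin = 12

// hasCapability reports whether bit is set in the named field (for example
// "CapBnd" or "CapEff") of the text of a /proc/<pid>/status file.
func hasCapability(status, field string, bit uint) (bool, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		name, value, ok := strings.Cut(sc.Text(), ":")
		if !ok || strings.TrimSpace(name) != field {
			continue
		}
		// Capability masks are printed as hexadecimal without a prefix.
		mask, err := strconv.ParseUint(strings.TrimSpace(value), 16, 64)
		if err != nil {
			return false, fmt.Errorf("parse %s mask: %w", field, err)
		}
		return mask&(1<<bit) != 0, nil
	}
	return false, fmt.Errorf("field %s not found", field)
}

func main() {
	// Illustrative masks: one without bit 12 set, one with all bits set.
	restricted := "Name:\tsh\nCapBnd:\t00000000a80425fb\n"
	full := "Name:\tsh\nCapBnd:\t000001ffffffffff\n"

	for _, status := range []string{restricted, full} {
		ok, err := hasCapability(status, "CapBnd", capNetAdmin)
		fmt.Println(ok, err)
	}
}
```

In practice the status text would come from something like a `cat /proc/self/status` exec in the container, and the `setcap` call would be skipped when the bit is absent from the bounding set.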


Comment on lines 1002 to 1005
// Make sure the agent binary is executable so we can run it.
if _, err := api.ccli.ExecAs(ctx, container.ID, "root", "chmod", "+x", coderPathInsideContainer); err != nil {
	return xerrors.Errorf("set agent binary executable: %w", err)
}
Member

Do we also need to chown the binary so that it's readable by the default container user?

Member Author

Good callout. I didn't consider this, but docker cp seems to follow the ownership and permissions of the file on disk. So unless we chown it, the ownership could be nonsense within the container (non-existent user, etc).

It's unlikely that the permissions will be bad for the user (typically 0755), but we could improve it for sure. It might make sense to turn this into a script rather than N amount of docker execs.
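
A rough sketch of the "one script instead of N execs" idea: build a single `sh -c` invocation that performs chown, chmod, and the optional setcap in one round trip. Everything here is an assumption for illustration (`setupCommand`, the owner value, and the path are hypothetical); the resulting argument list would be handed to whatever exec method the container CLI exposes.

```go
package main

import "fmt"

// setupScript runs inside the container via one `sh -c` call. The owner and
// path are passed as positional parameters ($1, $2) so they need no quoting
// inside the script itself.
const setupScript = `set -e
chown "$1" "$2"
chmod 0755 "$2"
# Optional networking boost; tolerate failure in unprivileged containers.
setcap cap_net_admin+ep "$2" 2>/dev/null || true
`

// setupCommand returns the argv for a single exec as root. The "sh" after
// the script becomes $0, followed by $1 (owner) and $2 (path).
func setupCommand(owner, path string) []string {
	return []string{"sh", "-c", setupScript, "sh", owner, path}
}

func main() {
	// Hypothetical values for illustration only.
	argv := setupCommand("1000:1000", "/tmp/agent-binary")
	fmt.Printf("%q\n", argv)
}
```

With `set -e`, a failing chown or chmod aborts the script, while the setcap step stays best-effort, matching the warn-only handling in the diff.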


logger.Info(ctx, "starting subagent in dev container")

err := api.dccli.Exec(agentCtx, dc.WorkspaceFolder, dc.ConfigPath, agentPath, []string{"agent"},
Member

Do we try to execute this as a non-root user?

Member Author

AFAIK this will get executed as the remote user configured by devcontainer.json (or if unconfigured, container user), which seems like the correct behavior to me.

Comment on lines +879 to +882
injected := make(map[uuid.UUID]bool, len(api.injectedSubAgentProcs))
for _, proc := range api.injectedSubAgentProcs {
	injected[proc.agent.ID] = true
}
Member

This could probably be a map[uuid.UUID]struct{} instead, and then below on line 888 just check for _, found := injected[agent.ID]

Member Author

I don't foresee the memory savings being necessary here (will we have 1000s of sub agents?). The current form reads better and is simpler to use IMO (I always prefer this form for readability where applicable).
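
For readers comparing the two forms discussed here, a small side-by-side sketch using string IDs in place of `uuid.UUID` (the function names are made up for illustration):

```go
package main

import "fmt"

// injectedBool builds a membership map with bool values. Lookups read as a
// plain condition: if injected[id] { ... }.
func injectedBool(ids []string) map[string]bool {
	m := make(map[string]bool, len(ids))
	for _, id := range ids {
		m[id] = true
	}
	return m
}

// injectedSet builds the same membership map with empty-struct values, which
// occupy no storage per entry but require the two-value lookup form.
func injectedSet(ids []string) map[string]struct{} {
	m := make(map[string]struct{}, len(ids))
	for _, id := range ids {
		m[id] = struct{}{}
	}
	return m
}

func main() {
	ids := []string{"agent-a", "agent-b"}

	b := injectedBool(ids)
	if b["agent-a"] {
		fmt.Println("bool form: agent-a is injected")
	}

	s := injectedSet(ids)
	if _, found := s["agent-a"]; found {
		fmt.Println("struct{} form: agent-a is injected")
	}
}
```

Both behave identically for membership checks; the difference is the per-entry value size versus the shorter lookup expression.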

Comment on lines +887 to +899
for _, agent := range agents {
	if injected[agent.ID] {
		continue
	}
	err := api.subAgentClient.Delete(ctx, agent.ID)
	if err != nil {
		api.logger.Error(ctx, "failed to delete agent",
			slog.Error(err),
			slog.F("agent_id", agent.ID),
			slog.F("agent_name", agent.Name),
		)
	}
}
Member

Should we set an upper bound on deletion attempts and raise if more than say 3 attempts fail?

Member Author

Are you suggesting silently ignoring failures unless >= 3 fail? Or perhaps adding retry logic?

Member

I'm mainly worried about spamming error logs into the void.

Member Author

These will be part of the parent agent log 🤔

Member

We can leave it as-is for now, but I think if this does start happening frequently (or all the time) it may be difficult to catch if it just goes into the parent agent log.

@mafredri mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from 466bc6b to 780483b Compare June 9, 2025 09:30
@mafredri mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from abe9116 to d5eb3fc Compare June 9, 2025 16:04
@mafredri
Member Author

mafredri commented Jun 9, 2025

@DanielleMaywood @johnstcn I've added WithContainerLabelIncludeFilter to filter out injection in tests and prevent them from interfering with non-test dev containers.

I also added WithSubAgentEnv to update the autostart integration test in agent package. It now verifies that a sub agent is started as well.
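
As a rough illustration of what a label include filter could look like, the sketch below matches a container's labels against a set of required key/value pairs. The semantics are an assumption (every filter entry must match exactly, and an empty filter matches everything); the actual behavior of `WithContainerLabelIncludeFilter` is defined by the PR's code, and `matchesIncludeFilter` and the label key are hypothetical.

```go
package main

import "fmt"

// matchesIncludeFilter reports whether labels contains every key/value pair
// in include. An empty filter matches all containers.
func matchesIncludeFilter(labels, include map[string]string) bool {
	for key, want := range include {
		if got, ok := labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical label used to mark containers created by a test.
	filter := map[string]string{"com.example.test": "TestAutostart"}

	testContainer := map[string]string{
		"com.example.test":        "TestAutostart",
		"devcontainer.local_folder": "/workspace",
	}
	otherContainer := map[string]string{
		"devcontainer.local_folder": "/home/user/project",
	}

	fmt.Println(matchesIncludeFilter(testContainer, filter))
	fmt.Println(matchesIncludeFilter(otherContainer, filter))
}
```

Filtering this way keeps a test from injecting into dev containers it did not create.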
