Skip to content

refactor: convert workspacesdk.AgentConn to an interface #19392

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Aug 20, 2025

Conversation

DanielleMaywood
Copy link
Contributor

@DanielleMaywood DanielleMaywood commented Aug 18, 2025

Fixes coder/internal#907

We convert workspacesdk.AgentConn to an interface and generate a mock for it. This allows writing coderd tests that rely on the agent's HTTP api to not have to set up an entire tailnet networking stack.

Comment on lines +328 to 333
if options.PrometheusRegistry == nil {
options.PrometheusRegistry = prometheus.NewRegistry()
}
if options.Authorizer == nil {
options.Authorizer = rbac.NewCachingAuthorizer(options.PrometheusRegistry)
if buildinfo.IsDev() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This re-order fixes a nil pointer access when no PrometheusRegistry and Authorizer are passed through options.

Comment on lines -1578 to -1652
t.Run("PayloadTooLarge", func(t *testing.T) {
t.Parallel()

var (
ctx = testutil.Context(t, testutil.WaitSuperLong)
logger = slogtest.Make(t, &slogtest.Options{IgnoreErrors: true}).Leveled(slog.LevelDebug)
mClock = quartz.NewMock(t)
updaterTickerTrap = mClock.Trap().TickerFunc("updaterLoop")
mCtrl = gomock.NewController(t)
mCCLI = acmock.NewMockContainerCLI(mCtrl)

client, db = coderdtest.NewWithDatabase(t, &coderdtest.Options{Logger: &logger})
user = coderdtest.CreateFirstUser(t, client)
r = dbfake.WorkspaceBuild(t, db, database.WorkspaceTable{
OrganizationID: user.OrganizationID,
OwnerID: user.UserID,
}).WithAgent(func(agents []*proto.Agent) []*proto.Agent {
return agents
}).Do()
)

// WebSocket limit is 4MiB, so we want to ensure we create _more_ than 4MiB worth of payload.
// Creating 20,000 fake containers creates a payload of roughly 7MiB.
var fakeContainers []codersdk.WorkspaceAgentContainer
for range 20_000 {
fakeContainers = append(fakeContainers, codersdk.WorkspaceAgentContainer{
CreatedAt: time.Now(),
ID: uuid.NewString(),
FriendlyName: uuid.NewString(),
Image: "busybox:latest",
Labels: map[string]string{
agentcontainers.DevcontainerLocalFolderLabel: "/home/coder/project",
agentcontainers.DevcontainerConfigFileLabel: "/home/coder/project/.devcontainer/devcontainer.json",
},
Running: false,
Ports: []codersdk.WorkspaceAgentContainerPort{},
Status: string(codersdk.WorkspaceAgentDevcontainerStatusRunning),
Volumes: map[string]string{},
})
}

mCCLI.EXPECT().List(gomock.Any()).Return(codersdk.WorkspaceAgentListContainersResponse{Containers: fakeContainers}, nil)
mCCLI.EXPECT().DetectArchitecture(gomock.Any(), gomock.Any()).Return("<none>", nil).AnyTimes()

_ = agenttest.New(t, client.URL, r.AgentToken, func(o *agent.Options) {
o.Logger = logger.Named("agent")
o.Devcontainers = true
o.DevcontainerAPIOptions = []agentcontainers.Option{
agentcontainers.WithClock(mClock),
agentcontainers.WithContainerCLI(mCCLI),
agentcontainers.WithWatcher(watcher.NewNoop()),
}
})

resources := coderdtest.NewWorkspaceAgentWaiter(t, client, r.Workspace.ID).Wait()
require.Len(t, resources, 1, "expected one resource")
require.Len(t, resources[0].Agents, 1, "expected one agent")
agentID := resources[0].Agents[0].ID

updaterTickerTrap.MustWait(ctx).MustRelease(ctx)
defer updaterTickerTrap.Close()

containers, closer, err := client.WatchWorkspaceAgentContainers(ctx, agentID)
require.NoError(t, err)
defer func() {
closer.Close()
}()

select {
case <-ctx.Done():
t.Fail()
case _, ok := <-containers:
require.False(t, ok)
}
})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case is now covered by the test in coderd/workspaceagents_internal_test.go.

We convert `workspacesdk.AgentConn` to an interface and generate a mock
for it. This allows writing `coderd` tests that rely on the agent's HTTP
api to not have to set up an entire tailnet networking stack.
@DanielleMaywood DanielleMaywood force-pushed the danielle/flake/payload-too-large branch from d579e1d to ea4de03 Compare August 18, 2025 13:53
@DanielleMaywood DanielleMaywood marked this pull request as ready for review August 18, 2025 14:31
Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactor looks fine to me 👍 Thanks for de-flaking this!

Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The humongous new interface does not feel ideal, but I also don't have a better suggestion at the moment. One suggestion to improve the test but otherwise LGTM.

decodeCh := decoder.Chan()

// When: We close the `containersCh`
close(containersCh)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have at least one test message sent first to confirm the base case works and then the exit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me

Conn: conn,
opts: opts,
}
}

// AgentConn represents a connection to a workspace agent.
// @typescript-ignore AgentConn
type AgentConn struct {
type AgentConn interface {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this is an SDK are we really ready to commit to this just to fix a flake? It feels a little heavy handed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this or something like it it's basically impossible to test Coderd routes that need to interact with Agents without the whole baggage of a real tailnet and a real agent.

Our testing is straining under the weight of spinning up all this stuff to test individual API handlers.

SDKs that we develop need to be designed with mocking in mind, otherwise we can't effectively unit test code that depends on them.

@sreya
Copy link
Collaborator

sreya commented Aug 18, 2025

@DanielleMaywood could you elaborate on the underlying cause of the flake?

@DanielleMaywood
Copy link
Contributor Author

@DanielleMaywood could you elaborate on the underlying cause of the flake?

Sure thing.

The underlying cause of the flake is that it is just incredibly slow. On my workspace it could take up to around 10 seconds, and that is dedicated hardware without contention. Running it in CI means results in it often exceeding the testutil.WaitLong duration of 25 seconds, and relying on testutil.WaitSuperLong feels like a code smell.

You can see in this flake that it took almost 8 seconds just for the tailnet to begin accepting TCP connections (test started at 05:29:36.490 and only accepted the connection at 05:29:42.288)
https://github.com/coder/coder/actions/runs/17031823892/job/48276142706#step:15:932

As you can see in the logs, most of the time is spent setting up the agent and the networking stack. We don't need any of this running to test for a specific behavior on coderd.

The idea behind the change to make workspacesdk.AgentConn an interface, rather than a struct, is that we can entirely eliminate the agent and the agent's tailnet. This removes a very slow part of the test, and allows more control over triggering certain behavior. This would also allow future tests to be written without this heavy dependency (so not just aiming for this specific flake).

@DanielleMaywood DanielleMaywood requested a review from sreya August 20, 2025 00:12
@DanielleMaywood DanielleMaywood merged commit 5e84d25 into main Aug 20, 2025
31 checks passed
@DanielleMaywood DanielleMaywood deleted the danielle/flake/payload-too-large branch August 20, 2025 09:00
@github-actions github-actions bot locked and limited conversation to collaborators Aug 20, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

flake: TestWatchWorkspaceAgentDevcontainers/PayloadTooLarge
5 participants
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy