
Prometheus metrics aggregator pegging coderd CPU and potentially causing OOM kill #11775

@mafredri

Description


In the latest scale test, we ran into coderd restarts. Upon inspecting the logs, we saw that the OOM killer had been summoned and that there were a lot of log messages from the aggregator: update queue is full.
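For reference, that log line suggests the usual bounded-queue producer/consumer pattern: handlers do a non-blocking send into the aggregator's update channel, and when the consumer falls behind the batch is dropped and logged. Below is a minimal sketch of that pattern (hypothetical names and types, not the actual coderd aggregator code):

    package main

    import (
        "log"
        "time"
    )

    // metric is a stand-in for a single agent metric update (hypothetical type).
    type metric struct {
        name  string
        value float64
    }

    // aggregator owns a bounded update queue; run() is the only consumer.
    type aggregator struct {
        updateCh chan []metric
    }

    func newAggregator(queueSize int) *aggregator {
        return &aggregator{updateCh: make(chan []metric, queueSize)}
    }

    // Update is called from request handlers. The send is non-blocking so a slow
    // consumer cannot stall HTTP requests; instead the batch is dropped and logged.
    func (a *aggregator) Update(batch []metric) {
        select {
        case a.updateCh <- batch:
        default:
            log.Println("update queue is full, dropping metrics batch")
        }
    }

    // run drains the queue. If processing a batch is slow, the channel fills up
    // and producers start emitting the log message above.
    func (a *aggregator) run() {
        for batch := range a.updateCh {
            for range batch {
                time.Sleep(time.Millisecond) // simulate slow per-metric work
            }
        }
    }

    func main() {
        a := newAggregator(8)
        go a.run()

        // Simulate many agents reporting faster than the consumer can drain.
        for i := 0; i < 100; i++ {
            a.Update(make([]metric, 50))
        }
        time.Sleep(2 * time.Second)
    }

Under that assumption, the queue-full messages are a symptom; the interesting question is why the consumer side became so slow.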

As can be seen from these graphs, both CPU and memory usage spikes coincided with the restart(s).

(screenshot: CPU and memory usage graphs around the restarts)

What can also be observed above is that one coderd instance had its CPU pegged. Upon CPU/trace inspection, the finger is once again pointed at the aggregator:

(screenshot: execution trace pointing at the aggregator)

The same is shown in the CPU profile:

(screenshot: CPU profile)

One code path that is executed for a while (as shown in the trace above) is this loop:

    for _, m := range req.metrics {

// ping @mtojek for insights since you worked on the initial feature.
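A guess at why that loop dominates the profile (this is an assumption about the shape of the work, not a statement about the actual implementation): if every incoming metric is matched against the already-stored series by comparing names and labels, the cost per update grows with the number of stored series, and the total work during a scale test grows roughly quadratically. A rough sketch of that behavior (all names hypothetical):

    package main

    import "fmt"

    // Hypothetical types; field names are assumptions, not the aggregator's own.
    type storedMetric struct {
        name   string
        labels map[string]string
        value  float64
    }

    type updateRequest struct {
        metrics []storedMetric
    }

    func labelsEqual(a, b map[string]string) bool {
        if len(a) != len(b) {
            return false
        }
        for k, v := range a {
            if b[k] != v {
                return false
            }
        }
        return true
    }

    // merge scans the entire store for every incoming metric, so one update costs
    // O(len(req.metrics) * len(store)). With thousands of agents each reporting
    // many series, this alone can peg a core and let the update queue back up.
    func merge(store []storedMetric, req updateRequest) []storedMetric {
        for _, m := range req.metrics {
            matched := false
            for i := range store {
                if store[i].name == m.name && labelsEqual(store[i].labels, m.labels) {
                    store[i].value = m.value
                    matched = true
                    break
                }
            }
            if !matched {
                store = append(store, m)
            }
        }
        return store
    }

    func main() {
        var store []storedMetric
        // 500 agents with 20 series each: the store grows to 10,000 series and the
        // total number of label comparisons grows quadratically with that count.
        for agent := 0; agent < 500; agent++ {
            req := updateRequest{}
            for s := 0; s < 20; s++ {
                req.metrics = append(req.metrics, storedMetric{
                    name:   fmt.Sprintf("metric_%d", s),
                    labels: map[string]string{"agent": fmt.Sprintf("a%d", agent)},
                    value:  1,
                })
            }
            store = merge(store, req)
        }
        fmt.Println("stored series:", len(store))
    }

If the real loop does something similar, keying the stored series by a precomputed name+labels key would make each update roughly constant time and should keep the update queue from backing up.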

