Managing Kubernetes Performance at Scale
Introduction
Enterprises are investing in Kubernetes for the promise of rapid time-to-market, business agility, and elasticity at multicloud scale. Modern containerized applications composed of loosely coupled services are built, deployed, and iterated on faster than ever before. The potential for businesses (the ability to bring ideas to market faster) has opened the Kubernetes adoption floodgates. Nevertheless, these modern applications introduce extraordinary complexity that challenges the best of teams. Ensuring that you build your platforms for growth and scale today is critical to accelerating the successful adoption of Kubernetes and the cloud-native practices that enable innovation-first operations.
This ebook is for Kubernetes operators who have a platform-first strategy in their sights and need to assure that all services perform to meet the Service-Level Objectives (SLOs) set by their organization. Kubernetes administrators and systems architects will learn about common challenges and operational mechanisms for running production Kubernetes infrastructure, based on proven environments across many organizations. As you learn about the software-defined levers that Kubernetes provides, consider what must be managed by you versus what can and should be managed by software.
Building for scale is all about automation. From the mindset and culture to the technologies you adopt and the architectures you introduce, managing elasticity requires that IT organizations adopt automation to assure performance without introducing labor or inefficiency. But automation is not a binary state in which you are either doing it or not; everyone is automating. The crux of automation is the extent to which you allow software to manage the system. From container configuration to autoscaling to full-stack management, there are levers for controlling the system. The question is: are you controlling them (deciding what to do and when to do it), or are you letting software do it?
Managing Multitenancy
Kubernetes allows you to orchestrate and manage the life cycle of containerized services. As adoption grows in your environment, you will be challenged to manage a growing set of services from different applications, each with its own resource demands, without allowing workloads to affect one another. Let’s first review how containerized services gain access to the compute resources of memory and CPU. You can deploy pods without any capacity defined. This allows containers to consume as much memory and CPU as is available on the node, competing with other containers that can grow the same way. Although this might sound like the ultimate definition of freedom, nothing inherent in the orchestration platform manages the trade-offs between what each workload consumes and the capacity available across the cluster. Because pods cannot “move” to redistribute workload throughout the cluster, allowing all of your services untethered access to any resource could cause node starvation and performance issues such as congestion, and it would complicate planning for onboarding new services.
Although containers are cattle, not pets, the services themselves can be mission critical. You want your cattle to have enough room to graze but not overtake the entire field. To avoid these scenarios, containers can have specifications that define how much compute capacity is reserved exclusively for that container (a request) and the upper bound it is allowed to consume (a limit). If you specify both limits and requests, the ratio of these values, whether 1:1 or otherwise, changes the Quality of Service (QoS) class for that workload. We don’t go into detail here about setting limits and requests, and implications such as QoS, but we do explore in the next section the benefits of autoscaling.
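For illustration, here is a minimal sketch of a pod that sets both values; the names, image, and sizes are hypothetical. Because the requests are lower than the limits, Kubernetes assigns this pod the Burstable QoS class; matching values would make it Guaranteed, and omitting both would make it BestEffort.

```yaml
# A hypothetical pod spec illustrating requests and limits.
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: web
    image: example.com/web:1.0   # placeholder image
    resources:
      requests:                  # reserved for this container alone
        cpu: "250m"
        memory: "256Mi"
      limits:                    # upper bound it may consume
        cpu: "500m"
        memory: "512Mi"
```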
Autoscaling
Suppose that you have followed the aforementioned patterns to assure that workloads will not put other services at risk: you set up namespaces with quotas (which place a requirement on every service to specify limits and requests), and you are testing to make sure the container specifications are neither too constrained nor overallocated. This will help you manage multitenancy, but it does not guarantee service performance when demand increases. What else can you do?
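As a point of reference, a namespace quota might look like the following minimal sketch; the namespace name and the sizes are hypothetical.

```yaml
# A hypothetical quota capping the aggregate requests and limits
# that all pods in the team-a namespace may claim.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
```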
The next direction to look at is how many instances, or replicas, you need to run your service effectively under what you define as a reasonable amount of demand. As with container sizing, gather data on how your services perform when running with a specific number of replicas. Are you getting the correct throughput? Response time? Manually adjust the replica count to see whether you can sustain a predictable and desired SLO. This might require several rounds of adjustment.
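A quick sketch of that manual loop, assuming a Deployment named web and the metrics-server add-on for kubectl top:

```shell
kubectl scale deployment/web --replicas=4    # try a new replica count
kubectl top pods -l app=web                  # observe CPU/memory under load
kubectl get deployment web                   # check desired vs. ready replicas
```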
You can manage node resources either on platform by using the Cluster Autoscaler (CA) project, which is also part of Google Kubernetes Engine (the GKE Cluster Autoscaler); off platform by using scale groups (autoscaling groups, availability sets, etc.) offered by cloud providers; or by setting thresholds tracked from the on-premises infrastructure, which needs someone to make a decision about how and where to add or remove capacity.
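On GKE, for example, node autoscaling is enabled per node pool; a sketch with hypothetical cluster, pool, and zone names:

```shell
gcloud container clusters update demo-cluster \
  --node-pool=default-pool \
  --enable-autoscaling --min-nodes=1 --max-nodes=10 \
  --zone=us-central1-a
```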
After you have found the combination that achieves the outcome you want, you need to repeat this process for the next service, and the next. For your first application, and for services that are very similar, the scale of this exercise can be manageable; but as more services want to use the Horizontal Pod Autoscaler (HPA), and as services can change in how they behave from release to release, this is a task to which you need to allocate time and people.
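Once you have settled on thresholds, the policy itself is declarative. A minimal sketch, assuming a Deployment named web and an illustrative 70% CPU target:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:               # the workload this policy scales
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # add replicas above ~70% average CPU
```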
As the number of services that utilize HPA policies grows, there are a couple more questions that you must answer: how can I assure that each service still gets the resources it needs when it scales out, and does the cluster have enough capacity to absorb many services scaling at the same time?
Capacity Management
Congratulations! You’ve rolled out your first set of services using Kubernetes, and you even utilized some of the techniques to influence pod placement, manage scaling, and stay within business compliance. Don’t get too comfortable. The success of this Phase 1 project has opened the floodgates, and now more services want to get onboard. Many more. What’s the golden rule? Never keep an application waiting. Even though the pods can deploy in a minute or less, planning for growth can take longer, much longer.
Now you are ready to plan for growth. Borrowing directly from the concepts laid out in the three-part series “How Full is My Cluster,” the first step is to take inventory of how you are managing multitenancy and whether you are using quotas, and then account for the resources your workloads actually request and consume, so that you know how much headroom remains for growth.
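A starting inventory can come straight from the cluster itself; a sketch, again assuming the metrics-server add-on for kubectl top:

```shell
kubectl get resourcequota --all-namespaces                 # quotas in force
kubectl describe nodes | grep -A 8 "Allocated resources"   # requests vs. allocatable
kubectl top nodes                                          # actual utilization
```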
Conclusion
Kubernetes promises rapid time-to-market, business agility, and
elasticity at multicloud scale. Demand for platforms that allow your
lines of business to bring ideas to market faster will quickly grow.
What you build today and the best practices you establish will last
for years to come. How will you continuously assure performance
and maintain compliance while minimizing cost?
References
Horizontal Pod Autoscaling
Cluster API (subproject of sig-cluster-lifecycle)
Vertical Pod Autoscaling
Cluster Autoscaler
About the Authors
Eva Tuczai has more than 15 years of experience in IT solutions, including application performance management, virtualization optimization, and automation and cloud native platform integration. As part of Turbonomic’s Advanced Engineering team, she is committed to bringing a customer-centric solution approach to solving challenges with performance and efficiency, while leveraging elasticity.
Asena Hertz brings more than a decade of experience in disruptive
technologies, spanning workload automation, energy and resource
analytics, developer tools, and more. As a Product Marketing leader
at Turbonomic, Asena is passionate about the long-term impact and
role that cloud native architectures and distributed systems will have
on the future of IT and the way businesses bring new ideas to
market.