Policies
Manage security and best-practices with policies.
1: Limit Ranges
2: Resource Quotas
3: Process ID Limits And Reservations
4: Node Resource Managers
NetworkPolicies can be used to restrict ingress and egress traffic for a workload.
LimitRanges manage resource allocation constraints across different object kinds.
ResourceQuotas limit resource consumption for a namespace.
Kubernetes has several built-in admission controllers that are configurable via the API server
--enable-admission-plugins flag.
Details on admission controllers, with the complete list of available admission controllers, are
documented in a dedicated section:
Admission Controllers
Dynamic admission controllers can be used to apply policies on API requests and trigger other
policy-based workflows. A dynamic admission controller can perform complex checks
including those that require retrieval of other cluster resources and external data. For
example, an image verification check can lookup data from OCI registries to validate the
container image signatures and attestations.
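As an illustration only (every name below is hypothetical, not taken from this page), a dynamic admission controller is typically registered through a ValidatingWebhookConfiguration that routes matching API requests to a webhook service for a policy decision:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: image-verify.example.com        # hypothetical webhook name
webhooks:
- name: image-verify.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Fail                   # reject matching requests if the webhook is unavailable
  clientConfig:
    service:
      namespace: image-verify           # hypothetical Service backing the policy engine
      name: image-verify-webhook
      path: /validate
  rules:                                # send Pod CREATE requests to the webhook
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]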
Implementations
Note: This section links to third party projects that provide functionality required by
Kubernetes. The Kubernetes project authors aren't responsible for these projects, which
are listed alphabetically. To add a project to this list, read the content guide before
submitting a change. More information.
Dynamic Admission Controllers that act as flexible policy engines are being developed in the
Kubernetes ecosystem, such as:
Kubewarden
Kyverno
OPA Gatekeeper
Polaris
Process ID limits and reservations are used to limit and reserve allocatable PIDs.
Node Resource Managers can manage compute, memory, and device resources for
latency-critical and high-throughput workloads.
1 - Limit Ranges
By default, containers run with unbounded compute resources on a Kubernetes cluster. Using
Kubernetes resource quotas, administrators (also termed cluster operators) can restrict
consumption and creation of cluster resources (such as CPU time, memory, and persistent
storage) within a specified namespace. Within a namespace, a Pod can consume as much CPU
and memory as is allowed by the ResourceQuotas that apply to that namespace. As a cluster
operator, or as a namespace-level administrator, you might also be concerned about making
sure that a single object cannot monopolize all available resources within a namespace.
A LimitRange is a policy to constrain the resource allocations (limits and requests) that you
can specify for each applicable object kind (such as Pod or PersistentVolumeClaim) in a
namespace.
A LimitRange provides constraints that can:
Enforce minimum and maximum compute resource usage per Pod or Container in a
namespace.
Enforce minimum and maximum storage request per PersistentVolumeClaim in a
namespace.
Enforce a ratio between request and limit for a resource in a namespace.
Set default request/limit for compute resources in a namespace and automatically inject
them to Containers at runtime.
For example, consider a LimitRange defined by the following manifest:
concepts/policy/limit-range/problematic-limit-range.yaml
apiVersion: v1
kind: LimitRange
metadata:
name: cpu-resource-constraint
spec:
limits:
- default: # this section defines default limits
cpu: 500m
defaultRequest: # this section defines default requests
cpu: 500m
max: # max and min define the limit range
cpu: "1"
min:
cpu: 100m
type: Container
along with a Pod that declares a CPU resource request of 700m, but not a limit:
concepts/policy/limit-range/example-conflict-with-limitrange-cpu.yaml
apiVersion: v1
kind: Pod
metadata:
name: example-conflict-with-limitrange-cpu
spec:
containers:
- name: demo
image: registry.k8s.io/pause:2.0
resources:
requests:
cpu: 700m
then that Pod will not be scheduled, failing with an error similar to:
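Pod "example-conflict-with-limitrange-cpu" is invalid: spec.containers[0].resources.requests: Invalid value: "700m": must be less than or equal to cpu limit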
If you set both request and limit, then that new Pod will be scheduled successfully even
with the same LimitRange in place:
concepts/policy/limit-range/example-no-conflict-with-limitrange-cpu.yaml
apiVersion: v1
kind: Pod
metadata:
name: example-no-conflict-with-limitrange-cpu
spec:
containers:
- name: demo
image: registry.k8s.io/pause:2.0
resources:
requests:
cpu: 700m
limits:
cpu: 700m
Example policies that could be created using a LimitRange are (a sketch implementing them follows below):
In a 2 node cluster with a capacity of 8 GiB RAM and 16 cores, constrain Pods in a
namespace to request 100m of CPU with a max limit of 500m for CPU, and to request
200Mi for memory with a max limit of 600Mi for memory.
Define the default CPU limit and request to 150m and the memory default request to 300Mi for
Containers started with no cpu and memory requests in their specs.
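A minimal sketch of a LimitRange expressing those constraints might look like this; the object name is illustrative, and the memory default limit of 300Mi is an assumption, since the example above only states the default memory request:
apiVersion: v1
kind: LimitRange
metadata:
  name: example-constraints   # illustrative name
spec:
  limits:
  - type: Container
    max:                      # maximum limits allowed per Container
      cpu: 500m
      memory: 600Mi
    min:                      # minimum requests allowed per Container
      cpu: 100m
      memory: 200Mi
    default:                  # default limits injected when none are set
      cpu: 150m
      memory: 300Mi
    defaultRequest:           # default requests injected when none are set
      cpu: 150m
      memory: 300Mi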
In the case where the total limits of the namespace are less than the sum of the limits of the
Pods/Containers, there may be contention for resources. In this case, the Containers or Pods
will not be created.
Neither contention nor changes to a LimitRange will affect already created resources.
What's next
For examples on using limits, see the how-to tasks on configuring default, minimum, and maximum resource constraints for a namespace.
Refer to the LimitRanger design document for context and historical information.
2 - Resource Quotas
When several users or teams share a cluster with a fixed number of nodes, there is a concern
that one team could use more than its fair share of resources.
Resource quotas work like this:
Different teams work in different namespaces. This can be enforced with RBAC.
Users create resources (pods, services, etc.) in the namespace, and the quota system
tracks usage to ensure it does not exceed hard resource limits defined in a
ResourceQuota.
If creating or updating a resource violates a quota constraint, the request will fail with
HTTP status code 403 FORBIDDEN with a message explaining the constraint that would
have been violated.
If quota is enabled in a namespace for compute resources like cpu and memory , users
must specify requests or limits for those values; otherwise, the quota system may reject
pod creation. Hint: Use the LimitRanger admission controller to force defaults for pods
that make no compute resource requirements.
Note:
For cpu and memory resources, ResourceQuotas enforce that every (new) pod in
that namespace sets a limit for that resource. If you enforce a resource quota in a
namespace for either cpu or memory , you, and other clients, must specify either
requests or limits for that resource, for every new Pod you submit. If you don't,
the control plane may reject admission for that Pod.
For other resources, a ResourceQuota still works, but it ignores pods in the namespace
that do not set a limit or request for that resource. This means you can create a
new pod without an ephemeral-storage limit or request even if the resource quota limits the
ephemeral storage of the namespace. You can use a LimitRange to automatically
set a default request for these resources.
Examples of policies that could be created using namespaces and quotas are:
In a cluster with a capacity of 32 GiB RAM and 16 cores, let team A use 20 GiB and 10
cores, let team B use 10 GiB and 4 cores, and hold 2 GiB and 2 cores in reserve for future
allocation.
Limit the "testing" namespace to using 1 core and 1 GiB RAM. Let the "production"
namespace use any amount.
In the case where the total capacity of the cluster is less than the sum of the quotas of the
namespaces, there may be contention for resources. This is handled on a first-come-first-
served basis.
Neither contention nor changes to quota will affect already created resources.
Compute Resource Quota
You can limit the total sum of compute resources that can be requested in a given namespace. The following resource types are supported:
limits.cpu: Across all pods in a non-terminal state, the sum of CPU limits cannot exceed this value.
limits.memory: Across all pods in a non-terminal state, the sum of memory limits cannot exceed this value.
requests.cpu: Across all pods in a non-terminal state, the sum of CPU requests cannot exceed this value.
requests.memory: Across all pods in a non-terminal state, the sum of memory requests cannot exceed this value.
hugepages-<size>: Across all pods in a non-terminal state, the number of huge page requests of the specified size cannot exceed this value.
As overcommit is not allowed for extended resources, it makes no sense to specify both
requests and limits for the same extended resource in a quota. So for extended
resources, only quota items with the requests. prefix are allowed for now.
Take the GPU resource as an example: if the resource name is nvidia.com/gpu and you want
to limit the total number of GPUs requested in a namespace to 4, you can define a quota as
follows:
requests.nvidia.com/gpu: 4
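Expressed as a full manifest, that might look like the sketch below (the object and namespace names are illustrative):
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota           # illustrative name
  namespace: gpu-workloads  # illustrative namespace
spec:
  hard:
    requests.nvidia.com/gpu: 4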
In addition, you can limit consumption of storage resources based on associated storage-
class.
For example, if an operator wants to quota storage with gold storage class separate from
bronze storage class, the operator can define a quota as follows:
gold.storageclass.storage.k8s.io/requests.storage: 500Gi
bronze.storageclass.storage.k8s.io/requests.storage: 100Gi
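As a sketch, those two entries sit in the hard section of a ResourceQuota; the object name is illustrative, and the gold and bronze StorageClasses are assumed to exist in the cluster:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota   # illustrative name
spec:
  hard:
    gold.storageclass.storage.k8s.io/requests.storage: 500Gi
    bronze.storageclass.storage.k8s.io/requests.storage: 100Gi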
In release 1.8, quota support for local ephemeral storage was added as an alpha feature:
requests.ephemeral-storage: Across all pods in the namespace, the sum of local ephemeral storage requests cannot exceed this value.
limits.ephemeral-storage: Across all pods in the namespace, the sum of local ephemeral storage limits cannot exceed this value.
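A minimal sketch of such a quota object (the name and values are illustrative):
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ephemeral-storage-quota       # illustrative name
spec:
  hard:
    requests.ephemeral-storage: 2Gi   # sum of ephemeral storage requests in the namespace
    limits.ephemeral-storage: 4Gi     # sum of ephemeral storage limits in the namespace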
Note: When using a CRI container runtime, container logs will count against the
ephemeral storage quota. This can result in the unexpected eviction of pods that have
exhausted their storage quotas. Refer to Logging Architecture for details.
Here is an example set of resources users may want to put under object count quota:
count/persistentvolumeclaims
count/services
count/secrets
count/configmaps
count/replicationcontrollers
count/deployments.apps
count/replicasets.apps
count/statefulsets.apps
count/jobs.batch
count/cronjobs.batch
The same syntax can be used for custom resources. For example, to create a quota on a
widgets custom resource in the example.com API group, use count/widgets.example.com .
When using count/* resource quota, an object is charged against the quota if it exists in
server storage. These types of quotas are useful to protect against exhaustion of storage
resources. For example, you may want to limit the number of Secrets in a server given their
large size. Too many Secrets in a cluster can actually prevent servers and controllers from
starting. You can set a quota for Jobs to protect against a poorly configured CronJob. CronJobs
that create too many Jobs in a namespace can lead to a denial of service.
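For instance, a single ResourceQuota could combine several object count entries, including one for a custom resource; the object name and the limit values here are illustrative:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-count-quota          # illustrative name
spec:
  hard:
    count/secrets: "100"            # guard against Secret sprawl
    count/jobs.batch: "30"          # guard against a misconfigured CronJob creating too many Jobs
    count/widgets.example.com: "10" # custom resource in the example.com API group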
It is also possible to do generic object count quota on a limited set of resources. The following
types are supported:
configmaps: The total number of ConfigMaps that can exist in the namespace.
pods: The total number of Pods in a non-terminal state that can exist in the namespace. A pod is in a terminal state if .status.phase in (Failed, Succeeded) is true.
resourcequotas: The total number of ResourceQuotas that can exist in the namespace.
services: The total number of Services that can exist in the namespace.
services.loadbalancers: The total number of Services of type LoadBalancer that can exist in the namespace.
services.nodeports: The total number of Services of type NodePort that can exist in the namespace.
secrets: The total number of Secrets that can exist in the namespace.
For example, pods quota counts and enforces a maximum on the number of pods created
in a single namespace that are not terminal. You might want to set a pods quota on a
namespace to avoid the case where a user creates many small pods and exhausts the
cluster's supply of Pod IPs.
Quota Scopes
Each quota can have an associated set of scopes . A quota will only measure usage for a
resource if it matches the intersection of enumerated scopes.
When a scope is added to the quota, it limits the number of resources it supports to those
that pertain to the scope. Resources specified on the quota outside of the allowed set results
in a validation error.
The defined scopes are Terminating, NotTerminating, BestEffort, NotBestEffort, PriorityClass, and
CrossNamespacePodAffinity. The BestEffort scope restricts a quota to tracking the pods resource,
while the Terminating, NotTerminating, NotBestEffort, and PriorityClass scopes restrict a quota to
tracking the following resources: pods, cpu, memory, requests.cpu, requests.memory, limits.cpu,
limits.memory.
Note that you cannot specify both the Terminating and the NotTerminating scopes in the
same quota, and you cannot specify both the BestEffort and NotBestEffort scopes in the
same quota either.
The scopeSelector supports the following values in the operator field:
In
NotIn
Exists
DoesNotExist
When using one of the following values as the scopeName when defining the scopeSelector ,
the operator must be Exists .
Terminating
NotTerminating
BestEffort
NotBestEffort
If the operator is In or NotIn , the values field must have at least one value. For example:
scopeSelector:
matchExpressions:
- scopeName: PriorityClass
operator: In
values:
- middle
If the operator is Exists or DoesNotExist , the values field must NOT be specified.
Pods can be created at a specific priority. You can control a pod's consumption of system
resources based on a pod's priority, by using the scopeSelector field in the quota spec.
A quota is matched and consumed only if scopeSelector in the quota spec selects the pod.
When a quota is scoped for priority class using the scopeSelector field, the quota object is restricted
to track only the following resources:
pods
cpu
memory
ephemeral-storage
limits.cpu
limits.memory
limits.ephemeral-storage
requests.cpu
requests.memory
requests.ephemeral-storage
This example creates a quota object and matches it with pods at specific priorities. The
example works as follows:
Pods in the cluster have one of the three priority classes, "low", "medium", "high".
One quota object is created for each priority.
apiVersion: v1
kind: List
items:
- apiVersion: v1
kind: ResourceQuota
metadata:
name: pods-high
spec:
hard:
cpu: "1000"
memory: 200Gi
pods: "10"
scopeSelector:
matchExpressions:
- operator : In
scopeName: PriorityClass
values: ["high"]
- apiVersion: v1
kind: ResourceQuota
metadata:
name: pods-medium
spec:
hard:
cpu: "10"
memory: 20Gi
pods: "10"
scopeSelector:
matchExpressions:
- operator : In
scopeName: PriorityClass
values: ["medium"]
- apiVersion: v1
kind: ResourceQuota
metadata:
name: pods-low
spec:
hard:
cpu: "5"
memory: 10Gi
pods: "10"
scopeSelector:
matchExpressions:
- operator : In
scopeName: PriorityClass
values: ["low"]
resourcequota/pods-high created
resourcequota/pods-medium created
resourcequota/pods-low created
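Verify that the Used quota is 0 for each of them using kubectl describe quota:
kubectl describe quota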
Name: pods-high
Namespace: default
Resource Used Hard
-------- ---- ----
cpu 0 1k
memory 0 200Gi
pods 0 10
Name: pods-low
Namespace: default
Resource Used Hard
-------- ---- ----
cpu 0 5
memory 0 10Gi
pods 0 10
Name: pods-medium
Namespace: default
Resource Used Hard
-------- ---- ----
cpu 0 10
memory 0 20Gi
pods 0 10
Create a pod with priority "high". Save the following YAML to a file high-priority-pod.yml .
apiVersion: v1
kind: Pod
metadata:
name: high-priority
spec:
containers:
- name: high-priority
image: ubuntu
command: ["/bin/sh"]
args: ["-c", "while true; do echo hello; sleep 10;done"]
resources:
requests:
memory: "10Gi"
cpu: "500m"
limits:
memory: "10Gi"
cpu: "500m"
priorityClassName: high
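Apply it with kubectl create:
kubectl create -f ./high-priority-pod.yml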
Verify that "Used" stats for "high" priority quota, pods-high , has changed and that the other
two quotas are unchanged.
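Again, using kubectl describe quota:
kubectl describe quota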
Name: pods-high
Namespace: default
Resource Used Hard
-------- ---- ----
cpu 500m 1k
memory 10Gi 200Gi
pods 1 10
Name: pods-low
Namespace: default
Resource Used Hard
-------- ---- ----
cpu 0 5
memory 0 10Gi
pods 0 10
Name: pods-medium
Namespace: default
Resource Used Hard
-------- ---- ----
cpu 0 10
memory 0 20Gi
pods 0 10
Operators can use the CrossNamespacePodAffinity quota scope to limit which namespaces are
allowed to have pods with affinity terms that cross namespaces. Specifically, it controls which
pods are allowed to set the namespaces or namespaceSelector fields in pod affinity terms.
Preventing users from using cross-namespace affinity terms might be desired, since a pod
with anti-affinity constraints can block pods from all other namespaces from getting
scheduled in a failure domain.
Using this scope, operators can prevent certain namespaces (foo-ns in the example below)
from having pods that use cross-namespace pod affinity, by creating a resource quota object
in that namespace with the CrossNamespaceAffinity scope and a hard limit of 0:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: disable-cross-namespace-affinity
  namespace: foo-ns
spec:
  hard:
    pods: "0"
  scopeSelector:
    matchExpressions:
    - scopeName: CrossNamespaceAffinity
      operator: Exists
If operators want to disallow using namespaces and namespaceSelector by default, and only
allow it for specific namespaces, they could configure CrossNamespaceAffinity as a limited
resource by setting the kube-apiserver flag --admission-control-config-file to the path of the
following configuration file:
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: "ResourceQuota"
  configuration:
    apiVersion: apiserver.config.k8s.io/v1
    kind: ResourceQuotaConfiguration
    limitedResources:
    - resource: pods
      matchScopes:
      - scopeName: CrossNamespaceAffinity
        operator: Exists
With the above configuration, pods can use namespaces and namespaceSelector in pod
affinity only if the namespace where they are created has a resource quota object with the
CrossNamespaceAffinity scope and a hard limit greater than or equal to the number of pods
using those fields.
If the quota has a value specified for requests.cpu or requests.memory , then it requires that
every incoming container makes an explicit request for those resources. If the quota has a
value specified for limits.cpu or limits.memory , then it requires that every incoming
container specifies an explicit limit for those resources.
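For example, the output below comes from a namespace named myspace where two ResourceQuota objects, compute-resources and object-counts, are assumed to already exist. List them with kubectl get quota:
kubectl get quota --namespace=myspace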
NAME AGE
compute-resources 30s
object-counts 32s
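Describe the compute quota for details:
kubectl describe quota compute-resources --namespace=myspace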
Name: compute-resources
Namespace: myspace
Resource Used Hard
-------- ---- ----
limits.cpu 0 2
limits.memory 0 2Gi
requests.cpu 0 1
requests.memory 0 1Gi
requests.nvidia.com/gpu 0 4
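And the object count quota:
kubectl describe quota object-counts --namespace=myspace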
Name: object-counts
Namespace: myspace
Resource Used Hard
-------- ---- ----
configmaps 0 10
persistentvolumeclaims 0 4
pods 0 4
replicationcontrollers 0 20
secrets 1 10
services 0 10
services.loadbalancers 0 2
Kubectl also supports object count quota for all standard namespaced resources using the
syntax count/<resource>.<group> :
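For example, the quota named test shown below could be created and exercised like this; the nginx Deployment is only an illustration of objects that consume the quota:
kubectl create quota test --hard=count/deployments.apps=2,count/replicasets.apps=4,count/pods=3,count/secrets=4 --namespace=myspace
kubectl create deployment nginx --image=nginx --namespace=myspace --replicas=2
kubectl describe quota --namespace=myspace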
Name: test
Namespace: myspace
Resource Used Hard
-------- ---- ----
count/deployments.apps 1 2
count/pods 2 3
count/replicasets.apps 1 4
count/secrets 1 4
Note that resource quota divides up aggregate cluster resources, but it creates no restrictions
around nodes: pods from several namespaces may run on the same node.
With this mechanism, operators are able to restrict usage of certain high-priority classes to a
limited number of namespaces, so that not every namespace is able to consume these
priority classes by default. To enforce this, pass the path of a configuration file like the
following to the kube-apiserver --admission-control-config-file flag:
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: "ResourceQuota"
configuration:
apiVersion: apiserver.config.k8s.io/v1
kind: ResourceQuotaConfiguration
limitedResources:
- resource: pods
matchScopes:
- scopeName: PriorityClass
operator: In
values: ["cluster-services"]
Then, create a resource quota object in the kube-system namespace:
policy/priority-class-resourcequota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
name: pods-cluster-services
spec:
scopeSelector:
matchExpressions:
- operator : In
scopeName: PriorityClass
values: ["cluster-services"]
resourcequota/pods-cluster-services created
What's next
See ResourceQuota design doc for more information.
See a detailed example for how to use resource quota.
Read Quota support for priority class design doc.
See LimitedResources
3 - Process ID Limits And Reservations
Kubernetes allows you to limit the number of process IDs (PIDs) that a Pod can use. You can
also reserve a number of allocatable PIDs for each node for use by the operating system and
daemons (rather than by Pods).
Process IDs (PIDs) are a fundamental resource on nodes. It is trivial to hit the task limit
without hitting any other resource limits, which can then cause instability to a host machine.
Cluster administrators require mechanisms to ensure that Pods running in the cluster cannot
induce PID exhaustion that prevents host daemons (such as the kubelet or kube-proxy, and
potentially also the container runtime) from running. In addition, it is important to ensure that
PIDs are limited among Pods in order to ensure they have limited impact on other workloads
on the same node.
Note: On certain Linux installations, the operating system sets the PIDs limit to a low
default, such as 32768. Consider raising the value of /proc/sys/kernel/pid_max.
You can configure a kubelet to limit the number of PIDs a given Pod can consume. For
example, if your node's host OS is set to use a maximum of 262144 PIDs and is expected to
host fewer than 250 Pods, you can give each Pod a budget of 1000 PIDs to prevent using up
that node's overall number of available PIDs. If an admin wants to overcommit PIDs, similar
to CPU or memory, they may do so as well, with some additional risks. Either way, a single
Pod will not be able to bring the whole machine down. This kind of resource limiting helps to
prevent simple fork bombs from affecting the operation of an entire cluster.
Per-Pod PID limiting allows administrators to protect one Pod from another, but does not
ensure that all Pods scheduled onto that host are unable to impact the node overall. Per-Pod
limiting also does not protect the node agents themselves from PID exhaustion.
You can also reserve an amount of PIDs for node overhead, separate from the allocation to
Pods. This is similar to how you can reserve CPU, memory, or other resources for use by the
operating system and other facilities outside of Pods and their containers.
PID limiting is an important sibling to compute resource requests and limits. However, you
specify it in a different way: rather than defining a Pod's resource limit in the .spec for a Pod,
you configure the limit as a setting on the kubelet. Pod-defined PID limits are not currently
supported.
Caution: This means that the limit that applies to a Pod may be different depending on
where the Pod is scheduled. To make things simple, it's easiest if all Nodes use the same
PID resource limits and reservations.
To configure the limit, you can specify the command line parameter --pod-max-pids to the
kubelet, or set PodPidsLimit in the kubelet configuration file.
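A minimal sketch of that kubelet configuration, combined with the node-level PID reservations described above; all values are illustrative, and the pid entries under systemReserved and kubeReserved are an assumption about how you choose to reserve PIDs on your nodes:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
podPidsLimit: 1000    # per-Pod PID limit (PodPidsLimit)
systemReserved:
  pid: "1000"         # PIDs reserved for operating system daemons
kubeReserved:
  pid: "500"          # PIDs reserved for Kubernetes node agents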
PID limiting, at both the per-Pod and per-Node level, sets a hard limit. Once the limit is hit, the
workload will start experiencing failures when trying to obtain a new PID. This may or may not
lead to a Pod being rescheduled, depending on how the workload reacts to these failures and
how the liveness and readiness probes are configured for the Pod. However, if the limits are
set correctly, you can guarantee that other Pods' workloads and system processes will not run
out of PIDs when one Pod is misbehaving.
What's next
Refer to the PID Limiting enhancement document for more information.
For historical context, read Process ID Limiting for Stability Improvements in Kubernetes
1.14.
Read Managing Resources for Containers.
Learn how to Configure Out of Resource Handling.
4 - Node Resource Managers
In order to support latency-critical and high-throughput workloads, Kubernetes offers a suite of
resource managers for compute, memory, and device resources. The main manager, the Topology
Manager, is a Kubelet component that co-ordinates the overall resource management process
through its policy.