Kasten EGI Lab-Insight
May 2023
Overview
Developers are continuing to embrace Kubernetes-based container environments in order to streamline
and simplify application development and operation. For all the agility that Kubernetes (K8s) brings to the
application development lifecycle, K8s can add complexity for IT Operations as applications move into
production. Most container applications require persistent data, and protecting these enterprise-critical
applications requires enterprise-grade data protection services. In addition to protecting production
applications, creating consistent copies of application data can also be an important capability for
developers to roll back, migrate, or recover from cyber-attacks.
The developer-driven nature of K8s has led to an inclination towards open-source tools as a starting point
for functionality. The industry as a whole, though, is moving toward non-DIY approaches that favor
simplicity and scale; for example, starting with a managed K8s distro such as EKS. Evaluator Group
anticipates that a similar trend will occur on the data protection side as the market matures. Regardless
of the source of data protection tools, it is critical to have the features and capabilities needed for creating
and managing K8s application data protection services at scale.
While the “do it yourself” approach is appealing to developers and DevOps teams, there are several
considerations, including the risk of human error and the extensive time and staff resources required to
develop and maintain customized data protection tools.
Few companies can sustain the extensive, long-haul investment required to operate custom
tools at enterprise scale. According to Evaluator Group's research, limited IT staff is the leading challenge
customers are facing with their data protection implementations. Companies of all sizes require tools that
enable IT staff to manage at scale efficiently and to meet their data protection needs cost-effectively,
without trade-offs.
With this context, Evaluator Group was commissioned to compare the critical protection capabilities,
performance metrics, and usability features of Kasten K10 with those of a commonly used
open-source data protection toolset for protecting Kubernetes applications.
Testing was accomplished by creating a multi-cluster OpenShift / Kubernetes environment as the platform
for comparing data protection in a realistic environment. Kasten K10 and Velero (with Restic) were
installed on both OpenShift clusters.
Scality ARTESCA was used as a repository for external data copies. ARTESCA is a lightweight, cloud-native
S3 object storage solution suited for containerized applications, with a highly integrated user interface for
management and performance monitoring. It supports S3 object locking for immutability, and variable
data protection policies as the system scales from one to six nodes.
With bad actors increasingly targeting the backup environment, backup copies and data recoverability have
never been more at risk. Commercial data protection software is designed to provide enhanced features
for creating and managing data copies and providing mechanisms for rapid recoverability. Examples
include creating and storing multiple data protection points for ransomware resiliency, ensuring the ability
to meet backup windows, while also providing multiple migration and recovery options to enable a
consistent application recovery to minimize downtime and data loss.
Kasten K10
Kasten K10 by Veeam is a data protection solution purpose-built for Kubernetes environments. It provides
backup and operational recovery, disaster recovery (DR), and application mobility. This includes an
architected framework to streamline the path to data consistency during backups. It also encompasses
built-in flexibility to adjust to heterogeneous target environments, including various Kubernetes
distributions, hybrid multi-cloud and on-and-off premises resources, during data recovery and migration
operations.
As tested by Evaluator Group, Kasten K10 includes documented Kubernetes-native APIs and policy-based
operations with automated discovery. The K10 solution protects entire application namespaces including
all microservices and other K8s artifacts required for recovery. This understanding of complex
interdependencies within applications allows Kasten to create consistent protection points ensuring
recoverability. Kasten K10’s biggest differentiator when operating at enterprise scale, compared to
open-source tools, is its ability to execute multi-cluster management via a web-based UI. Additionally,
cyber-resiliency capabilities including RBAC, encryption with KMS, and data immutability are
important for enterprise environments.
Evaluator Group tested protecting, migrating and recovering two container applications:
• A two-pod application, in the form of WordPress with stateful database
• A larger, three-pod app, with three persistent volume claims (PVCs) per pod
o The custom VDB application had 3 pods * 3 PVCs at 50 GB each, for a total of 450 GB of data
An overview of the test environment is shown in Figure 2, with additional details in the appendix.
[Figure 2 diagram: infrastructure layers labeled Containers (including Velero), Server Hardware, and vSAN Storage]
Figure 2: Test Environment – Kasten K10 vs. Open-source Tools (Source: Evaluator Group)
Testing primarily focused on multiple functional comparisons, including the ability to:
• Create multiple protection points, using snapshots and external copies for recovery
• Manage expiration and retention of data protection points
• Recover from a choice of retention copies
• Migrate / move the applications to another cluster
Additionally, relative performance of full and incremental backup as well as restore and migration for a
three-pod, nine-PVC application was tested with Kasten and the Velero / Restic open-source solution.
Performance
Backup and recovery performance has two key implications for IT Operations. The first is the ability to
meet backup windows and, as a result, ensure that required recovery points are available. The second is
the ability to minimize downtime for the business. In testing, Evaluator Group measured the time required
to collect and restore 450 GB of data (3 pods, each with 3 PVCs at 50 GB). Results are summarized in Table 2.
Test                                      Kasten K10   Velero / Restic   Kasten K10 Advantage
Initial Backup (3 pods, w/ 3 PVCs ea.)    23m 30s      46m 55s           2X faster
Incremental Backup (10% data change)      6m 19s       36m 16s           6X faster
Table 2: Backup and Recovery Performance Comparison of Kasten and Velero (Source: Evaluator Group)
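The relative figures in Table 2 can be checked with simple arithmetic. The sketch below is illustrative only: it assumes the faster time in each row belongs to Kasten K10 (consistent with the comments that follow), uses decimal units (1 GB = 1000 MB), and derives an average throughput for the initial backup that is not a figure stated in the report.

```python
# Illustrative arithmetic only; times are copied from Table 2.

def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'Xm Ys' duration into seconds."""
    return minutes * 60 + seconds

# 3 pods x 3 PVCs x 50 GB, as described in the test setup
DATA_GB = 3 * 3 * 50

results = {
    "initial": {"kasten": to_seconds(23, 30), "velero": to_seconds(46, 55)},
    "incremental": {"kasten": to_seconds(6, 19), "velero": to_seconds(36, 16)},
}

def speedup(row: dict) -> float:
    """How many times faster the Kasten run was than the Velero run."""
    return row["velero"] / row["kasten"]

def throughput_mb_s(data_gb: float, seconds: int) -> float:
    """Average throughput in MB/s (derived value, not reported in the source)."""
    return data_gb * 1000 / seconds

for name, row in results.items():
    print(f"{name}: {speedup(row):.1f}x faster")

initial = results["initial"]
print(f"initial backup, Kasten: {throughput_mb_s(DATA_GB, initial['kasten']):.0f} MB/s")
print(f"initial backup, Velero: {throughput_mb_s(DATA_GB, initial['velero']):.0f} MB/s")
```

The ratios come out to roughly 2.0x and 5.7x, which the table rounds to 2X and 6X.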
Evaluator Group Comments: Relative performance is the important metric given the
shared hardware infrastructure. Kasten’s 6X advantage in incremental backup
performance would become further magnified at enterprise scale. Additionally,
backup performance correlates directly to the ability to meet application availability
requirements. Missing backup windows can result in missed application
SLAs or other requirements.
Deployment
The initial setup of a data protection solution carries implications for its overall ease of use, and it can help
to remove barriers to adoption for IT Operations teams that are facing the pinch of limited staffing
resources. Kasten has multiple documented options for deployment, which helped Evaluator Group complete
deployment in five minutes. Velero deployment took several days and required consulting multiple documentation sources.
Scality ARTESCA was used for the backup target. ARTESCA can scale from a single physical server or VM
to multiple petabytes of capacity, providing choice between redundancy, cost, and performance. It
supports multiple object protection methods, and it can be deployed in both private and public cloud
environments due to its lightweight, cloud-native architecture. In this instance, it served as a private cloud
S3 object storage solution with comprehensive API support, and other important capabilities including
secure multi-tenancy through an AWS-compatible IAM, and S3 object locking for immutability.
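To make the immutability point concrete, the following is a minimal sketch of how an S3 Object Lock default retention rule behaves. The configuration shape mirrors the S3 Object Lock API as commonly documented; whether ARTESCA accepts exactly this structure was not covered in testing and is an assumption, and the helper names and the 30-day period are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def default_retention_config(mode: str, days: int) -> dict:
    """Build a bucket-level Object Lock configuration (S3-style shape, assumed)."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError(f"unsupported retention mode: {mode}")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Days": days}},
    }

def retain_until(written_at: datetime, days: int) -> datetime:
    """Earliest time at which a locked object version may be deleted."""
    return written_at + timedelta(days=days)

def can_delete(now: datetime, written_at: datetime, days: int) -> bool:
    """True once the retention period has elapsed (hypothetical helper)."""
    return now >= retain_until(written_at, days)

config = default_retention_config("COMPLIANCE", 30)
written = datetime(2023, 5, 1, tzinfo=timezone.utc)
print(config)
print(can_delete(written + timedelta(days=10), written, 30))  # still locked
```

The design point is that the retention decision is enforced by the object store rather than by the backup software, so a compromised backup account cannot shorten it.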
Protection Policies
Flexibility in applying protection policies is important when it comes to being able to meet a range of
required recovery points (that is, the amount of data loss that can be tolerated) and retention periods.
Furthermore, the ability to control when backups occur is useful for minimizing potential disruption to
business operations.
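The relationship between backup frequency, recovery points, and retention can be sketched as follows. This is a hypothetical illustration of retention logic in general, not the algorithm used by Kasten K10 or Velero; the function name, the six-hour interval, and the retention counts are all assumptions.

```python
from datetime import datetime, timedelta

def retained(points, keep_latest: int, keep_daily: int):
    """Keep the newest `keep_latest` points, plus the newest point from each
    of the most recent `keep_daily` calendar days that have a point."""
    ordered = sorted(points, reverse=True)
    keep = set(ordered[:keep_latest])
    seen_days = []
    for point in ordered:
        day = point.date()
        if day not in seen_days:
            if len(seen_days) == keep_daily:
                break
            seen_days.append(day)
            keep.add(point)
    return sorted(keep)

# Hypothetical schedule: one protection point every 6 hours for 4 days
interval = timedelta(hours=6)
start = datetime(2023, 5, 1, 0, 0)
points = [start + i * interval for i in range(16)]

kept = retained(points, keep_latest=4, keep_daily=3)
expired = [p for p in points if p not in kept]

# With a fixed schedule, the worst-case data loss equals the backup interval
print(f"worst-case data loss: {interval}")
print(f"kept {len(kept)} points, expired {len(expired)}")
```

Shortening the interval tightens the recovery point at the cost of more copies to store and manage, which is why retention needs to be adjustable per policy.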
Velero allows for backup jobs and other operations to be created, managed, and tracked, but it lacks a
graphical UI (GUI). Administrators must work with the Kubernetes API from the command line, using “crontab”-style
entries, which increases the difficulty of setting and customizing protection policies for IT Operations.
Specifically, Evaluator Group encountered issues creating label or tag-based backups with Velero.
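As an illustration of what a “crontab”-style definition looks like, the sketch below builds a Velero Schedule resource as a Python dictionary and prints it as JSON. The field names follow the Velero Schedule custom resource as commonly documented and should be verified against the installed Velero version; the schedule name, namespace, label, and retention value are hypothetical. Given the label-based backup issues noted above, the label selector in particular should be treated as untested.

```python
import json

def velero_schedule(name, cron, namespaces, match_labels, ttl="720h0m0s"):
    """Build a Velero Schedule manifest (field names assumed from Velero docs)."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {
            # Standard five-field cron expression: minute hour day month weekday
            "schedule": cron,
            "template": {
                "includedNamespaces": namespaces,
                "labelSelector": {"matchLabels": match_labels},
                # How long each backup is retained before it expires
                "ttl": ttl,
            },
        },
    }

# Hypothetical example: back up the WordPress namespace daily at 01:00
manifest = velero_schedule(
    name="wordpress-nightly",
    cron="0 1 * * *",
    namespaces=["wordpress"],
    match_labels={"app": "wordpress"},
)
print(json.dumps(manifest, indent=2))
```

Each schedule of this kind has to be authored and applied separately on every cluster, which is the overhead a policy UI is intended to remove.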
In contrast, the Kasten K10 UI enables highly customizable backup windows, retention definitions, and
more. Applications are automatically discovered, and policies can be applied to new namespaces based
on labels. If desired, administrators also have the flexibility to manually run or pause protection jobs via
API with YAML definitions.
Velero management is per cluster, which leads to low levels of efficiency. All tasks, including creating user
roles and protection policies, must be done per cluster causing management time to increase linearly in
tandem with the number of clusters.
Kasten K10 includes a Multi-Cluster Dashboard which allows for centralized protection policy creation,
backup target management, user creation and distribution via RBAC, across a network of distributed
clusters. This limits the added management burden for additional clusters.
Evaluator Group Comments: The Kasten Multi-Cluster Dashboard is one of the most
useful features and areas of competitive differentiation for Kasten K10, particularly
compared to open-source tools. Providing centralized policy-based management for
features including data protection, backup targets, RBAC roles along with alerting
and reporting are all critical to improving IT efficiency.
Velero creates only one or the other: snapshots or externally protected data copies. Kasten, on the other hand,
allows for the creation of snapshots with optional external copies.
With K10, the ability to create and export a point in time snapshot is the basis for application consistency,
further expanded by Kasten's framework for interacting directly with applications to quiesce ancillary data
services and/or perform logical backups. This provides flexibility to users by not restricting the kinds of
data services they can adopt on Kubernetes based on whether or not they can be properly protected.
Additionally, Kasten includes application artifacts and persistent storage (PVCs) by default. In contrast,
Velero with Restic does not protect persistent storage volumes (PVCs) by default, thereby leading to the
possibility of not being able to recover application data when attempting to restore the application.
Kasten K10, on the other hand, allows multiple granular recovery options. Recoveries can occur from
either a local snapshot or an external copy, and specific objects (single artifacts, storage PVCs, etc.) can
be recovered. Additionally, K10 offers administrators policies for application migrations, called
“Transformation Sets,” which allow users to identify, add, remove, and modify the application’s
Kubernetes manifest data during the restore process (e.g., modifying the storage class of a Persistent
Volume Claim when moving between clusters with different storage infrastructure, or updating network
settings to ensure immediate availability). This helps when exporting an application from one cluster to
another, or when cloning an application into a different namespace to refresh a dev/test environment.
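The storage class example can be sketched as a small patch operation applied to a PVC manifest during restore. This is a simplified, hypothetical illustration of the idea, not Kasten K10's implementation or its Transformation Set syntax; the function, storage class names, and PVC details are assumptions, and the path handling covers only nested dictionary keys.

```python
import copy

def apply_ops(manifest: dict, ops: list) -> dict:
    """Apply add/replace/remove operations to a copy of a manifest.
    Paths are slash-separated dictionary keys (no list indexes or escapes)."""
    result = copy.deepcopy(manifest)
    for op in ops:
        keys = op["path"].strip("/").split("/")
        parent = result
        for key in keys[:-1]:
            parent = parent[key]
        leaf = keys[-1]
        if op["op"] == "replace":
            if leaf not in parent:
                raise KeyError(f"cannot replace missing path {op['path']}")
            parent[leaf] = op["value"]
        elif op["op"] == "add":
            parent[leaf] = op["value"]
        elif op["op"] == "remove":
            del parent[leaf]
        else:
            raise ValueError(f"unsupported operation: {op['op']}")
    return result

# Hypothetical PVC as captured on the source cluster
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "data-0", "namespace": "vdb"},
    "spec": {
        "storageClassName": "source-storage",
        "resources": {"requests": {"storage": "50Gi"}},
    },
}

# Swap the storage class for one that exists on the target cluster
ops = [{"op": "replace", "path": "/spec/storageClassName", "value": "target-storage"}]
restored = apply_ops(pvc, ops)
print(restored["spec"]["storageClassName"])
```

Because the original manifest is copied rather than modified, the same protection point can be restored to several targets with different transformations.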
Evaluator Group Comments: Kasten's ability to monitor, alert, and report on backup
jobs enables administrators to manage larger environments by focusing attention
on potential issues or exceptions. Additionally, the K8s data collection tool Prometheus
and customizable Grafana dashboards for visualization can further enhance the
automation of data protection in enterprise operations at scale.
Final Thoughts
Kasten K10 has significant differentiation and meaningful advantages compared to open-source tools such
as the Velero / Restic solution tested. One of the most important for efficiently scaling operations is K10’s
Multi-Cluster UI for management, monitoring, and alerting, which can significantly lower OPEX compared
to open-source options.
Streamlining policy-based protection at enterprise scale is critical because IT Operations teams are
struggling with limited staff resources, and because Kubernetes environments are dynamic and
continuously changing with multiple complex components and interdependencies. This is complemented
by 24/7 support capabilities, which can greatly assist IT Operations teams in the event troubleshooting is
required.
From the standpoint of enterprise-grade data protection capabilities, K10’s approach of creating snapshots for
consistency and then creating backup copies stored in an external repository is a critical best practice.
Additionally, Kasten K10 helps enable the use of multiple repositories, in contrast to the extreme
difficulties encountered when attempting to utilize additional repository locations with the open-source tools tested.
The pre-configured Kanister Blueprints framework for data protection can simplify application-consistent
backups for IT Operations. A wide range of applications are supported, and the Blueprints are
customizable to cater to complex applications. This is important because backups are only as good as the
ability to recover them, and crash-consistent or live-filesystem backups can leave certain databases
unrecoverable.
Kasten K10’s faster performance for full and incremental backups as well as restore operations increases
the ability to meet application availability SLAs, as well as RPOs and RTOs required by the business. In
other words, it has implications for cost, backup windows, and recovery times that can materially impact
business continuity. Also of note, the ability to granularly restore specific items can speed time-to-
recovery and help to reduce overall downtime, and moving across various clouds, distributions and
underlying infrastructure is enabled through Kasten's ability to modify application specifications and
components during restore operations.
Appendix
Test Environment Details
Testing was performed by Evaluator Group personnel in the Evaluator Group lab. The test environment
utilized the following hardware, software, and other elements.
Hardware Infrastructure
Figure 2, shown previously, depicts the test environment, consisting of:
● 3-node cluster of Intel Xeon Scalable systems
o Each with 2x 6154 CPUs and 384 GiB DRAM
o 1 x 100 Gb Ethernet connectivity to each system
● Local NVMe drives for vSAN with 2 disk groups per node
o Each node used 2x Optane 400 GB, plus 6x Intel P4510 4 TB drives
● 100 Gb Ethernet network
o Mellanox SN2100 – 16 x 100 GbE
Software Infrastructure
● VMware vSphere (vCenter 7.0U3 with ESXi 7.0U3 on hosts) with vSAN storage
● 2 K8s clusters, each running Red Hat OpenShift 4.9
o Each cluster had 3 master nodes and 3 worker nodes running as VMs
▪ Master nodes: Each w/ 4 vCPU, 16 GB RAM, 200 GB storage
▪ Worker nodes: Each w/ 22 vCPU, 58 GB RAM, 200 GB storage
Test Environment
OpenShift Container Platform
● The following container native applications were installed
o WordPress w/ MariaDB, with 1 GB of persistent storage
o Custom App, w/ 9 PVCs and 450 GB of persistent storage
● Additionally, Kasten and Velero components as required
This document was developed with funding from Kasten by Veeam. Although the document may utilize publicly
available material from various vendors, including Kasten, Veeam, and others, it does not necessarily reflect such
vendors' positions on the issues addressed in this document.