
Scaling AI Workloads in Cloud Environments: A Managed Kubernetes Approach
Authors: Vaishnavi Pangare, Akanksha Singh, Hritika Pawar
Abstract:
This research investigates optimizing AI workload scaling in cloud environments through
a managed Kubernetes approach, specifically Amazon Elastic Kubernetes Service
(EKS). As AI applications become increasingly resource-intensive, traditional scaling
methods fail to efficiently balance performance, cost, and availability. This study proposes
a novel framework for dynamically orchestrating AI workloads across managed Kubernetes
clusters, addressing challenges related to resource utilization, latency, and fault tolerance.
Using a mixed-methods approach that combines experimental benchmarking with case studies
across multiple AI workload types, the research aims to demonstrate significant
improvements in resource efficiency and cost reduction while maintaining or enhancing AI
model performance. The findings will provide cloud architects and AI engineers with
practical guidelines for implementing resilient, cost-effective infrastructure for modern AI
applications.
Problem Statement:
Organizations deploying AI workloads in cloud environments face several critical
challenges:
• Resource Inefficiency: Standard Kubernetes deployments often lead to underutilized
compute resources, especially GPUs and specialized AI accelerators.
• Scaling Latency: Traditional auto-scaling mechanisms are not responsive enough for the
bursty nature of AI workload demands.
• Cost Management: Organizations struggle to balance performance requirements with
budget constraints.
• Workload Heterogeneity: Different AI applications have varying resource profiles and
scaling requirements.
• Multi-tenant Optimization: Efficiently sharing infrastructure across multiple AI
applications or teams.
Research Objectives:
• Design a reference architecture for managed Kubernetes clusters optimized for AI
workloads using Amazon EKS.
• Develop and implement predictive scaling algorithms that minimize latency for
changing AI workload demands.
• Create a classification framework for AI workloads that guides optimal infrastructure
provisioning.
• Establish benchmarking methodologies to evaluate the performance and cost-
efficiency of the proposed solution.
• Formulate best practices and implementation guidelines for organizations deploying
AI workloads on EKS.
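The predictive-scaling objective above can be illustrated with a minimal sketch. This is not the framework's actual algorithm; the moving-average forecast, the per-replica request capacity, and the burst margin are hypothetical parameters chosen for illustration:

```python
import math
from collections import deque

class PredictiveScaler:
    """Toy predictive autoscaler: forecasts the next demand sample with a
    moving average plus a burst margin, then converts the forecast into a
    replica count. All parameters here are illustrative assumptions."""

    def __init__(self, window=5, requests_per_replica=100, burst_margin=0.2):
        self.history = deque(maxlen=window)          # recent demand samples
        self.requests_per_replica = requests_per_replica
        self.burst_margin = burst_margin             # headroom for bursty AI traffic

    def observe(self, requests_per_second):
        """Record one demand observation (e.g. inference requests/sec)."""
        self.history.append(requests_per_second)

    def desired_replicas(self):
        """Forecast demand and return the replica count to provision."""
        if not self.history:
            return 1
        forecast = sum(self.history) / len(self.history)
        forecast *= 1 + self.burst_margin
        return max(1, math.ceil(forecast / self.requests_per_replica))
```

A production version would feed such a forecast into the Kubernetes Horizontal Pod Autoscaler via an external-metrics adapter rather than computing replica counts directly.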
Significance of the Study:

This research addresses a critical gap in cloud computing and AI infrastructure, with
significant benefits for:
• Academic Community
• Industry Practitioners
• Technology Ecosystem
Gap Analysis:
Despite these advancements, several gaps remain in the literature:
1. Limited integration between workload characterization and automated scaling
mechanisms
2. Insufficient attention to heterogeneous AI workloads sharing infrastructure
3. Lack of comprehensive frameworks that balance performance, cost, and resource
utilization
4. Minimal research on the application of managed Kubernetes clusters for AI workloads
5. Few empirical studies comparing different configuration approaches for EKS in AI
contexts
Methodology:
What Is Local Kubernetes?
Definition: A self-contained Kubernetes deployment running on local infrastructure
Key Characteristics:
- Runs on physical servers or VMs within organizational boundaries
- Complete control over hardware, networking, and storage resources
- Full Kubernetes functionality in an on-premises environment
- Direct management of the entire Kubernetes stack

Components:
- Control plane (API server, scheduler, controller manager, etcd)
- Worker nodes running containerized applications
- Local persistent storage solutions
- Physical networking infrastructure
- Local load balancers and ingress controllers

• Kubernetes can be run locally via Minikube, kind, k3s, kubeadm, or OpenShift
Fig. Local Kubernetes For AI Workload Deployment
What Is Managed Kubernetes?
Definition: A Kubernetes service operated and maintained by a third-party cloud provider

Key Characteristics:
•Kubernetes control plane managed by the provider
•Automated deployment, scaling, and updates
•Built-in monitoring and security features
•Pay-as-you-go pricing model
•Reduced operational overhead

Leading Providers:
•Amazon EKS (Elastic Kubernetes Service)
•Google GKE (Google Kubernetes Engine)
•Microsoft AKS (Azure Kubernetes Service)
•IBM Cloud Kubernetes Service
•DigitalOcean Kubernetes
Fig . Architecture Diagram of Deploying AI workloads on Managed Kubernetes
Proposed Framework:
1. Workload Classification System
2. Managed Cluster Architecture
3. Predictive Scaling Engine
4. Resource Optimization Strategies
5. Implementation on Amazon EKS
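As a sketch of how the workload classification system (item 1) might drive infrastructure provisioning, the following maps a simple resource profile to a coarse workload class. The class names, thresholds, and profile fields are illustrative assumptions, not the published framework:

```python
from dataclasses import dataclass

@dataclass
class WorkloadProfile:
    gpu_util: float      # average GPU utilization, 0..1
    burstiness: float    # coefficient of variation of the request rate
    latency_slo_ms: int  # target p99 latency in milliseconds

def classify(profile: WorkloadProfile) -> str:
    """Map a workload profile to a coarse class that could guide
    node-group selection and scaling policy. Thresholds are made up."""
    if profile.gpu_util > 0.6 and profile.latency_slo_ms >= 1000:
        return "training"          # GPU-heavy, latency-tolerant
    if profile.latency_slo_ms < 200 and profile.burstiness > 1.0:
        return "bursty-inference"  # needs fast, predictive scaling
    if profile.latency_slo_ms < 200:
        return "steady-inference"  # stable traffic, tight latency SLO
    return "batch"                 # everything else runs opportunistically
```

Each class could then be pinned to a different EKS node group (e.g. GPU spot capacity for batch, on-demand GPU nodes for training, CPU or inference-accelerator nodes for serving).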
Expected Outcomes & Impact
Technical Outcomes:
- Optimized architecture framework for AI workloads on managed K8s
- Performance benchmarks comparing managed vs. local deployments
- Provider comparison (AWS EKS, GCP GKE, Azure AKS)
Practical Applications:
- Best practices for AI workload deployment and scaling
- Cost-optimization strategies for GPU/TPU resources
- MLOps workflow integration patterns
Measurable Impacts:
- Training time reduction metrics for distributed workloads
- Inference latency improvements at scale
- Operational cost analysis and ROI calculations
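The operational cost analysis can be sketched as simple arithmetic over node counts and hourly rates. All rates and discounts below are hypothetical placeholders, not actual AWS or GPU-instance prices:

```python
def monthly_gpu_cost(on_demand_hourly, hours=730, utilization=1.0, discount=0.0):
    """Monthly cost of one GPU node. `discount` models savings plans or
    spot pricing; all inputs are hypothetical, not real AWS rates."""
    return on_demand_hourly * hours * utilization * (1 - discount)

def savings(baseline_nodes, optimized_nodes, hourly_rate, discount):
    """Absolute and relative monthly savings of an optimized deployment
    (fewer nodes at a discounted rate) over an on-demand baseline."""
    base = baseline_nodes * monthly_gpu_cost(hourly_rate)
    opt = optimized_nodes * monthly_gpu_cost(hourly_rate, discount=discount)
    return base - opt, (base - opt) / base
```

For example, under these assumed inputs, consolidating from 10 on-demand GPU nodes at $4/hour to 6 nodes with a 30% committed-use discount cuts the monthly bill by about 58%.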
Delivered Artifacts:
- Reference implementation templates
- Deployment automation scripts
- AI-specific monitoring configurations
REFERENCES:
[1] Burns, B., Grant, B., Oppenheimer, D., Brewer, E., & Wilkes, J. (2016). Borg, Omega, and Kubernetes:
Lessons learned from three container-management systems over a decade. ACM Queue, 14(1), 70-93.
[2] Chen, T., Moreau, T., Jiang, Z., Zheng, L., Yan, E., Shen, H., et al. (2018). TVM: An
automated end-to-end optimizing compiler for deep learning. In Proceedings of the 13th USENIX
Symposium on Operating Systems Design and Implementation (OSDI '18).
[3] Gupta, A., & Singh, R. (2023). Inference workload patterns in production AI systems: A study of resource
utilization and latency requirements. Journal of Machine Learning Operations, 5(2), 128-145.
[4] Li, K., Zhou, M., Wu, X., & Yu, H. (2022). Performance analysis of managed Kubernetes services for AI
workloads: A comparative study. In Proceedings of the International Conference on Cloud Computing.
[5] Martinez, J., Patel, K., & Rodriguez, L. (2024). Deployment strategies for large language models on
Kubernetes: Challenges and solutions. arXiv preprint arXiv:2401.12345.
Thank You
