F5 AI Reference Architecture
Questions? businessdevelopment@f5.com

AI / ML Reference Architecture Overview

Mike Rau, SVP, Enterprise Technical Strategy
Alysia Groves, Sr. Business Manager, Business Development
Ian Lauth, Senior Manager, Product Marketing for AI
Generative AI threatens to make this scary complexity even more acute, with new GPU-centric clouds and new foundational model providers entering the picture.
[Diagram: an application estate spanning AWS, SaaS, edge, and AI apps]
Are you building an AI Product or
delivering Operational Efficiency?
Four Deployment Models

SaaS AI
The AI solution is provided as a fully managed service by a third-party provider. Customers can access and use the AI capabilities over the internet without worrying about the underlying infrastructure, maintenance, or updates, making it a convenient and scalable option.

Cloud-Hosted AI
The AI solution runs on cloud infrastructure provided by cloud service providers such as AWS, Google Cloud, or Azure. It offers flexibility, scalability, and ease of integration with other cloud services, while the customer maintains control over the configuration and management of their AI systems.

Self-Hosted AI
The AI solution is deployed on the customer's own infrastructure, such as on-premises servers or private data centers. This provides maximum control and customization options but requires significant resources for setup, maintenance, and management of the hardware and software components.

Edge-Hosted AI
The AI solution runs in an edge environment, outside traditional cloud or data center infrastructure; an example is a machine learning solution operating on a device like a kiosk in a retail storefront. This model reduces latency, enhances privacy, and ensures real-time processing by bringing the computation closer to the data source or end user.
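To make the contrast concrete, here is a minimal sketch (our illustration, not part of the deck) assuming an OpenAI-compatible chat completions API, which many SaaS providers and self-hosted inference servers expose. The URLs, model name, and environment variable are placeholders; the point is that the application code can stay the same across deployment models, while the base URL, credentials, and operational responsibility change.

```python
import os
import requests

# Illustrative endpoints; swap one in depending on the deployment model.
SAAS_BASE_URL = "https://api.example-ai-provider.com/v1"      # SaaS: provider-managed
SELF_HOSTED_BASE_URL = "http://inference.internal:8000/v1"    # Self-hosted: your own cluster

def chat(base_url: str, prompt: str, model: str = "example-model") -> str:
    """Send one chat-completion request to an OpenAI-compatible endpoint."""
    resp = requests.post(
        f"{base_url}/chat/completions",
        headers={"Authorization": f"Bearer {os.environ.get('AI_API_KEY', '')}"},
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Same call, different deployment model behind it.
# print(chat(SAAS_BASE_URL, "Summarize our return policy."))
# print(chat(SELF_HOSTED_BASE_URL, "Summarize our return policy."))
```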
OWASP LLM Top Ten
Educate developers, designers, architects, managers, and organizations about the potential
security risks when deploying and managing LLM and Generative AI applications.
Design Requirements
Define the essential capabilities, technologies, and principles needed to address technical
challenges and ensure effective solution implementation.
Seven AI Building Blocks

[Diagram: the seven AI building blocks in the context of web apps & APIs, RAG corpus management, fine-tuning, training, and app development]

Focus Area: in this deck we will be showing two of the seven building blocks. For access to the full deck, please reach out to your F5 account team or email businessdevelopment@f5.com.
AI Component Architecture

[Diagram: Development Services, Fine-Tuning Services with fine-tuning data, Training Services with training data, source/config control, CI/CD, and end users]
Seven AI Building Blocks
Inference
This building block involves the process of making predictions or generating outputs
based on input data using pre-trained AI models. It's the core function where the AI
system applies its learned knowledge to new, unseen data.
[Diagram: the AI component architecture (Development, Fine-Tuning, and Training Services; fine-tuning and training data; source/config control; CI/CD; end users), with knowledge corpus data drawn from databases, websites, and queues]
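As a concrete illustration (ours, not the deck's), the sketch below runs inference with a small pre-trained Hugging Face model; "distilgpt2" is just an example, and any pre-trained model served behind the inference cluster plays the same role.

```python
# Minimal inference sketch: apply a pre-trained model to new, unseen input.
# Requires the `transformers` package.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")

prompt = "An AI reference architecture should"
outputs = generator(prompt, max_new_tokens=40, num_return_sequences=1)
print(outputs[0]["generated_text"])
```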
Inference with Retrieval Augmented Generation (RAG)
RAG combines the capabilities of retrieval and generation models to produce more informed and
accurate responses. It retrieves relevant information from a predefined corpus and uses it to enhance
the generation process, resulting in more contextually appropriate outputs.
[Diagram: the AI component architecture, as above]
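A minimal RAG sketch follows, assuming a stand-in embed() helper (any real embedding model would do) and a small in-memory corpus: the query is embedded, the closest documents are retrieved by cosine similarity, and the retrieved text is prepended to the prompt sent to the generation model.

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model: a tiny normalized character-frequency vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already normalized, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# Predefined corpus the retrieval step draws from.
corpus = [
    "Refunds are issued within 14 days of purchase.",
    "Support is available 24/7 via chat.",
    "Enterprise plans include a dedicated account team.",
]
index = [(doc, embed(doc)) for doc in corpus]

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

question = "How long do refunds take?"
context = "\n".join(retrieve(question))
augmented_prompt = f"Use this context to answer:\n{context}\n\nQuestion: {question}"
# augmented_prompt is what gets sent to the inference service from the previous block.
print(augmented_prompt)
```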
RAG Corpus Management
This focuses on maintaining and curating the database or corpus of information that the AI system
uses for Retrieval-Augmented Generation. It includes updating, organizing, and ensuring the quality
of the data to support accurate and relevant retrieval.
[Diagram: the AI component architecture, as above]
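A sketch of the curation side, reusing the toy embed() helper from the RAG sketch above: documents are chunked, embedded, and upserted into a vector store keyed by document ID, and stale entries are dropped when a source document changes or is retired. A real deployment would use an actual vector database and embedding model; the structure here is only illustrative.

```python
from datetime import datetime, timezone

# Toy vector store: chunk_id -> {"text", "vector", "updated_at"}.
vector_store: dict[str, dict] = {}

def chunk(text: str, size: int = 200) -> list[str]:
    """Split a document into fixed-size character chunks."""
    return [text[i : i + size] for i in range(0, len(text), size)]

def delete_document(doc_id: str) -> None:
    """Remove every chunk belonging to a retired or changed source document."""
    for key in [k for k in vector_store if k.split("#")[0] == doc_id]:
        del vector_store[key]

def upsert_document(doc_id: str, text: str) -> None:
    """Re-chunk, re-embed, and replace all entries for a source document."""
    delete_document(doc_id)
    for n, piece in enumerate(chunk(text)):
        vector_store[f"{doc_id}#{n}"] = {
            "text": piece,
            "vector": embed(piece),  # embed() as defined in the RAG sketch above
            "updated_at": datetime.now(timezone.utc),
        }

upsert_document("returns-policy", "Refunds are issued within 14 days of purchase.")
upsert_document("returns-policy", "Refunds are issued within 30 days of purchase.")  # update in place
print(len(vector_store), "chunk(s) indexed")
```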
External Services Integration
This involves connecting the AI system with external services and APIs, enabling it to interact, retrieve data, or
perform actions based on user requests or model inference. It allows the AI to leverage external tools and
databases to extend its functionality and autonomously make decisions or take actions as necessary.
[Diagram: the AI component architecture, as above]
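A minimal sketch of the pattern, with a hypothetical tool registry and a hard-coded "model decision" standing in for real function-calling output from an LLM: the orchestration layer looks up the requested tool, passes the arguments, and executes the call on the model's behalf.

```python
import json
from datetime import date

# Hypothetical external services exposed to the model as tools.
def get_order_status(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped", "eta": str(date.today())}

def create_ticket(subject: str) -> dict:
    return {"ticket_id": "T-1001", "subject": subject}

TOOLS = {"get_order_status": get_order_status, "create_ticket": create_ticket}

def dispatch(tool_call_json: str) -> dict:
    """Route a model-produced tool call to the matching external service."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise ValueError(f"Unknown tool: {call['name']}")
    return fn(**call["arguments"])

# In practice this JSON would come from the model's function-calling output.
model_output = '{"name": "get_order_status", "arguments": {"order_id": "A123"}}'
print(dispatch(model_output))
```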
Fine-Tuning
This process involves adjusting a pre-trained AI model on specific datasets to improve its
performance for a particular task or domain. Fine-tuning helps tailor the model's capabilities to
better meet the unique needs of specific applications or industries.
[Diagram: the AI component architecture, as above]
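A minimal PyTorch sketch of the idea, using a small stand-in network instead of a real foundation model: the pre-trained backbone is frozen and only a new task-specific head is adjusted on (toy) domain data.

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained backbone (assume its weights were learned elsewhere).
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 32))
head = nn.Linear(32, 2)  # new task-specific head for the target domain

# Freeze the backbone so only the head is tuned on the domain-specific data.
for p in backbone.parameters():
    p.requires_grad = False

opt = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 16)          # toy domain-specific features
y = torch.randint(0, 2, (64,))   # toy labels

for epoch in range(5):
    logits = head(backbone(x))
    loss = loss_fn(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```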
Training
This is the process of teaching an AI model by exposing it to large amounts of data and allowing it
to learn patterns and features. Training involves multiple iterations and optimizations to develop
a model that can generalize well to new, unseen data.
[Diagram: the AI component architecture, as above]
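A toy training loop under the same assumptions, showing the pieces the paragraph names: repeated passes over the data, optimization of the loss, and a held-out validation set to check that the model generalizes to unseen examples.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy dataset with a train/validation split to measure generalization.
X, y = torch.randn(256, 16), torch.randint(0, 2, (256,))
X_train, y_train, X_val, y_val = X[:200], y[:200], X[200:], y[200:]

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):                      # multiple iterations over the data
    model.train()
    loss = loss_fn(model(X_train), y_train)
    opt.zero_grad()
    loss.backward()
    opt.step()

    model.eval()
    with torch.no_grad():                    # validation on unseen data
        val_acc = (model(X_val).argmax(dim=1) == y_val).float().mean()
    print(f"epoch {epoch}: train_loss={loss.item():.3f} val_acc={val_acc:.2f}")
```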
Development
This encompasses the overall creation, testing, and deployment of AI solutions.
It involves coding, integrating various AI components, and ensuring that the
system is robust, scalable, and ready for production use.
[Diagram: the AI component architecture, as above]
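As one small illustration of the testing side (an assumption on our part, not an F5 recommendation), here is a pytest-style check that a hypothetical inference wrapper behaves robustly before it ships: it returns text for normal input and rejects empty input instead of failing silently.

```python
import pytest

def answer(prompt: str) -> str:
    """Hypothetical wrapper around the deployed inference service."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return f"(model response to: {prompt})"  # stand-in for a real model call

def test_returns_text_for_valid_prompt():
    assert isinstance(answer("What is RAG?"), str)

def test_rejects_empty_prompt():
    with pytest.raises(ValueError):
        answer("   ")
```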
Inference with Retrieval-Augmented Generation (RAG)
[Diagram: the AI component architecture, showing where inference with RAG fits]
[Diagram: Inference Services, with end users reaching front-end applications, orchestration, the LLM inference cluster, and the model repository]
OWASP LLM Top Ten
Risks called out on the inference path include LLM01 Prompt Injection, LLM09 Misinformation, and LLM10 Unbounded Consumption, with additional callouts for LLM02, LLM05, and LLM08.
[Diagram: the Inference Services components annotated with these OWASP risks]
F5 Application Delivery Top Ten
Alongside the OWASP risks above, the F5 Application Delivery Top Ten maps onto the same path:
ADC01 Weak DNS Practices
ADC02 Lack of Fault Tolerance & Resilience
ADC03 Incomplete Observability
ADC04 Insufficient Traffic Controls
ADC05 Unoptimized Traffic Steering
ADC06 Inability to Handle Latency
[Diagram: Inference Services and Retrieval Augmentation Services (retrieval engine, embedding LLM) annotated with the OWASP and ADC risks and numbered callouts]
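As a rough sketch of how a gateway in front of the inference services might address two of these risks, LLM01 Prompt Injection and LLM10 Unbounded Consumption / ADC04 Insufficient Traffic Controls, here is illustrative pre-processing logic; the patterns and limits are placeholder assumptions, not F5 product behavior.

```python
import re
from collections import defaultdict

# Placeholder heuristics for obviously suspicious instructions (LLM01 Prompt Injection).
INJECTION_PATTERNS = [
    re.compile(r"ignore (all|previous) instructions", re.IGNORECASE),
    re.compile(r"reveal (the )?system prompt", re.IGNORECASE),
]

# Per-user request budget to bound consumption (LLM10 / ADC04).
REQUESTS_PER_MINUTE = 30
usage: dict[str, int] = defaultdict(int)

def admit(user_id: str, prompt: str) -> bool:
    """Return True if the request may be forwarded to the inference cluster."""
    if any(p.search(prompt) for p in INJECTION_PATTERNS):
        return False                      # block likely injection attempts
    if usage[user_id] >= REQUESTS_PER_MINUTE:
        return False                      # enforce traffic controls
    usage[user_id] += 1                   # a real gateway would reset this each minute
    return True

print(admit("alice", "Summarize this document."))           # True
print(admit("alice", "Ignore previous instructions ..."))   # False
```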
Design Requirements
1 Distributed Compute Services
2 AI Compute Resources
5 Centralized Security Policy Management
6 AI/ML Data Loss Prevention
7 AI/ML Security
[Diagram: these design requirements mapped by number onto the inference-with-RAG components]
SaaS Deployment
[Diagram: SaaS deployment with global site management, an AI Gateway in front of the Inference Services (front-end applications, orchestration, LLM inference cluster, model repository), and an app site for end users, annotated with the numbered design requirements]
Cloud-Hosted Deployment
[Diagram: the same topology hosted on cloud infrastructure: site management, an AI Gateway, the Inference Services, and an app site, annotated with the numbered design requirements]
Self-Hosted Deployment
[Diagram: the Inference Services running within the customer's own site, with global and site tiers and the numbered design requirements annotated]
RAG Corpus Management
[Diagram: the AI component architecture, showing where RAG corpus management fits]
[Diagram: RAG corpus management pipeline with document pre-processing & embedding, a retrieval engine, an embedding LLM, object storage, and a vector DB]
OWASP LLM Top Ten and F5 Application Delivery Top Ten
Risks called out on the corpus management path include LLM01 Prompt Injection, LLM09 Misinformation, and LLM10 Unbounded Consumption, along with ADC01 Weak DNS Practices, ADC02 Lack of Fault Tolerance & Resilience, and ADC03 Incomplete Observability.
[Diagram: the corpus management components annotated with these OWASP and ADC risks]
Design Requirements
1 Distributed Compute Services
2 AI Compute Resources
5 Centralized Security Policy Management
6 AI/ML Data Loss Prevention
7 AI/ML Security
8 AI/ML Observability
[Diagram: these design requirements mapped by number onto the corpus management components]
Cloud Deployment
[Diagram: cloud-hosted deployment of the corpus management pipeline (document pre-processing & embedding, retrieval engine, embedding LLM, object storage, vector DB) with site management, annotated with the numbered design requirements]
Self-Hosted Deployment
[Diagram: self-hosted deployment of the corpus management pipeline within the customer's own sites, annotated with the numbered design requirements]