F5 AI Reference Architecture

The document outlines the architecture and deployment models for AI and ML applications, emphasizing the importance of generative AI and its multi-modal, decomposed nature. It discusses various deployment options including SaaS, cloud-hosted, self-hosted, and edge-hosted solutions, along with key considerations for building AI products. Additionally, it highlights security risks associated with LLM and generative AI applications and provides insights into design requirements and AI building blocks.


PREVIEW DECK – More to come AppWorld 2025

Questions? businessdevelopment@f5.com

AI / ML Reference Architecture Overview

Mike Rau - SVP, Enterprise Technical Strategy
Alysia Groves - Sr. Business Manager, Business Development
Mark J Menger - Solution Architect, Business Development
Eric Ji - Senior Solution Architect, Business Development
Paul Pindell - Principal Solution Architect, Business Development
Gregory Coward - Senior Solution Architect, Business Development
Ian Lauth - Senior Manager, Product Marketing for AI
Generative AI threatens to make this scary complexity even more acute

[Diagram: a hybrid estate of environments hosting AI apps: SaaS, AWS, Azure, Google Cloud, colocation, data centers, traditional and private cloud, and edge, now joined by new GPU-centric clouds and new foundational model providers.]

1 Generative AI app experiences will be multi-modal
2 Generative AI apps will be highly decomposed
3 “Data gravity” will significantly influence placement of apps and models
4 Generative AI apps will be especially dependent on APIs
2 © 2024 F5
What are your objectives?

Are you building an AI Product or delivering Operational Efficiency?
Do you want to build, buy, or outsource the solution?
How mature is your AI practice? Are you exploring, integrating, or transforming?
3 © 2024 F5
Four Deployment Models

SaaS AI
The AI solution is provided as a fully managed service by a third-party provider. Customers can access and use the AI capabilities over the internet without worrying about the underlying infrastructure, maintenance, or updates, making it a convenient and scalable option.

Cloud-Hosted AI
The AI solution runs on cloud infrastructure provided by cloud service providers such as AWS, Google Cloud, or Azure. It offers flexibility, scalability, and ease of integration with other cloud services, while the customer maintains control over the configuration and management of their AI systems.

Self-Hosted AI
The AI solution is deployed on the customer's own infrastructure, such as on-premises servers or private data centers. This provides maximum control and customization options but requires significant resources for setup, maintenance, and management of the hardware and software components.

Edge-Hosted AI
The AI solution runs in an edge environment, outside traditional cloud or data center infrastructure. An example is a machine learning solution operating on a device such as a kiosk in a retail storefront. This model reduces latency, enhances privacy, and enables real-time processing by bringing computation closer to the data source or end user.
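To make the contrast concrete, here is a minimal Python sketch of how the same client code might target a SaaS provider versus a self-hosted inference cluster. The endpoint URLs, model name, and OpenAI-style /v1/chat/completions request and response shape are illustrative assumptions, not part of the reference architecture.

```python
import os
import requests

# Hypothetical endpoints: a SaaS provider vs. a self-hosted inference cluster.
# Both are assumed to expose an OpenAI-compatible chat completions API.
ENDPOINTS = {
    "saas": "https://api.example-ai-provider.com/v1/chat/completions",
    "self_hosted": "http://inference.internal.example.com:8000/v1/chat/completions",
}

def chat(prompt: str, deployment: str = "saas") -> str:
    """Send a prompt to the selected deployment and return the reply text."""
    resp = requests.post(
        ENDPOINTS[deployment],
        headers={"Authorization": f"Bearer {os.environ.get('AI_API_KEY', '')}"},
        json={
            "model": "example-model",  # placeholder model name
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Summarize our Q3 results in one sentence.", deployment="self_hosted"))
```

Only the endpoint (and who operates it) changes between the models; the application-facing contract can stay the same.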

4 © 2024 F5
AI Ecosystem Considerations

OWASP LLM Top Ten
Educate developers, designers, architects, managers, and organizations about the potential security risks when deploying and managing LLM and Generative AI applications.

F5 Application Delivery Top Ten
The top unforeseen challenges that arise in today’s hybrid multicloud application delivery model, caused by too many point solutions, a lack of interoperability, multiple management consoles, and manual complexity.

Design Requirements
Define the essential capabilities, technologies, and principles needed to address technical challenges and ensure effective solution implementation.
5 © 2024 F5
Seven AI Building Blocks

[Diagram: the seven AI building blocks grouped by layer. Web apps & APIs: Inference, Retrieval-Augmented Generation (focus area), External Services Integration (agentic). Hybrid multicloud & data ingest: RAG Corpus Management (focus area), Fine-Tuning, Training. App development: Development.]

In this deck we will be showing two of the seven building blocks. For access to the full deck, please reach out to your F5 account team or email businessdevelopment@f5.com.

6 © 2024 F5
AI Component Architecture

[Diagram: AI component architecture. End users reach front-end applications, which connect to LLM orchestration, inference services, and plugins/data connectors; retrieval augmentation services draw on knowledge corpus data, and downstream services include databases, websites, and queues. Fine-tuning services (fine-tuning data), training services (training data), and development services (IDE, CI/CD, source/config control) support developers. Legend: primary data path, secondary data path, development path.]
7 © 2024 F5
Seven AI Building Blocks

8 © 2024 F5
Inference
This building block involves the process of making predictions or generating outputs
based on input data using pre-trained AI models. It's the core function where the AI
system applies its learned knowledge to new, unseen data.
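As a minimal illustration of this building block (not something the deck prescribes), the sketch below loads a pre-trained model and applies it to new input. The Hugging Face transformers library and the particular model are arbitrary choices.

```python
from transformers import pipeline

# Load a pre-trained model once (assumed model name; any causal LM would do).
generator = pipeline("text-generation", model="gpt2")

def infer(prompt: str) -> str:
    """Apply the model's learned knowledge to new, unseen input."""
    outputs = generator(prompt, max_new_tokens=50, num_return_sequences=1)
    return outputs[0]["generated_text"]

print(infer("The key benefit of retrieval-augmented generation is"))
```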

[Diagram: the full AI component architecture, repeated to situate the Inference building block.]
9 © 2024 F5
Inference with Retrieval Augmented Generation (RAG)
RAG combines the capabilities of retrieval and generation models to produce more informed and
accurate responses. It retrieves relevant information from a predefined corpus and uses it to enhance
the generation process, resulting in more contextually appropriate outputs.
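A toy Python sketch of that retrieve-then-generate flow, using TF-IDF similarity as a stand-in for the embedding LLM and vector DB, and a placeholder call_llm function instead of a real inference service. The corpus contents and all function names are illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy knowledge corpus standing in for the RAG corpus (vector DB + object storage).
corpus = [
    "F5 publishes an AI reference architecture with seven building blocks.",
    "Retrieval-augmented generation retrieves relevant documents before generating.",
    "Edge-hosted AI runs on devices such as retail kiosks to reduce latency.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(corpus)  # TF-IDF stands in for an embedding LLM

def retrieve(query: str, k: int = 2) -> list[str]:
    """Retrieval engine: return the k most similar corpus entries."""
    scores = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]
    return [corpus[i] for i in scores.argsort()[::-1][:k]]

def call_llm(prompt: str) -> str:
    """Placeholder for any inference endpoint."""
    return f"[LLM response to a {len(prompt)}-character augmented prompt]"

def answer(query: str) -> str:
    """Augment the prompt with retrieved context, then hand it to the LLM."""
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)

print(answer("What is retrieval-augmented generation?"))
```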

[Diagram: the full AI component architecture, repeated to situate the Inference with RAG building block.]
10 © 2024 F5
RAG Corpus Management
This focuses on maintaining and curating the database or corpus of information that the AI system
uses for Retrieval-Augmented Generation. It includes updating, organizing, and ensuring the quality
of the data to support accurate and relevant retrieval.
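A simplified sketch of that maintenance loop, using in-memory dictionaries as stand-ins for object storage and a vector DB. The chunking, placeholder embedding, and re-indexing logic are illustrative assumptions rather than a prescribed pipeline.

```python
import hashlib

# In-memory stand-ins for object storage (raw documents) and a vector DB (chunks).
object_storage: dict[str, str] = {}
vector_db: dict[str, dict] = {}

def embed(text: str) -> list[float]:
    """Placeholder embedding; a real pipeline would call an embedding LLM."""
    return [float(ord(c)) for c in text[:8]]

def chunk(text: str, size: int = 200) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def ingest(doc_id: str, text: str, source: str) -> None:
    """Update the corpus: store the raw doc, then (re)index its chunks with metadata."""
    object_storage[doc_id] = text
    # Drop stale chunks for this document before re-indexing (keeps the corpus current).
    for key in [k for k in vector_db if vector_db[k]["doc_id"] == doc_id]:
        del vector_db[key]
    for n, piece in enumerate(chunk(text)):
        key = hashlib.sha256(f"{doc_id}:{n}".encode()).hexdigest()
        vector_db[key] = {"doc_id": doc_id, "chunk": piece, "source": source,
                          "embedding": embed(piece)}

ingest("kb-001", "Edge-hosted AI runs outside the data center...", source="intranet wiki")
print(len(vector_db), "chunks indexed")
```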

[Diagram: the full AI component architecture, repeated to situate the RAG Corpus Management building block.]
11 © 2024 F5
External Services Integration
This involves connecting the AI system with external services and APIs, enabling it to interact, retrieve data, or
perform actions based on user requests or model inference. It allows the AI to leverage external tools and
databases to extend its functionality and autonomously make decisions or take actions as necessary.
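A hedged sketch of how an orchestration layer might dispatch a model-issued tool call to an external service. The tool registry, the get_weather function, and its URL are hypothetical, and a real system would add authentication, input validation, and guardrails against excessive agency.

```python
import json
import requests

def get_weather(city: str) -> dict:
    """Hypothetical external service; the URL is a made-up example."""
    resp = requests.get("https://weather.internal.example.com/v1/current",
                        params={"city": city}, timeout=10)
    resp.raise_for_status()
    return resp.json()

# Registry of external services the orchestration layer is allowed to call.
TOOLS = {"get_weather": get_weather}

def dispatch(tool_call: str) -> dict:
    """Parse a tool call emitted by the model and invoke the matching service."""
    call = json.loads(tool_call)  # e.g. {"name": "get_weather", "arguments": {"city": "Seattle"}}
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise ValueError(f"Model requested an unknown tool: {call['name']}")
    return fn(**call["arguments"])

# In practice the JSON below would come from the LLM's function/tool-calling output.
result = dispatch('{"name": "get_weather", "arguments": {"city": "Seattle"}}')
```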

[Diagram: the full AI component architecture, repeated to situate the External Services Integration building block.]
12 © 2024 F5
Fine-Tuning
This process involves adjusting a pre-trained AI model on specific datasets to improve its
performance for a particular task or domain. Fine-tuning helps tailor the model's capabilities to
better meet the unique needs of specific applications or industries.
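For illustration, a minimal fine-tuning sketch using the Hugging Face Trainer API. The base model (distilbert-base-uncased), the IMDB dataset slice, and the hyperparameters are arbitrary examples, not recommendations from the architecture.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Start from a pre-trained checkpoint and adapt it to a domain-specific task.
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("imdb", split="train[:1%]")  # small slice of fine-tuning data
dataset = dataset.map(lambda x: tokenizer(x["text"], truncation=True,
                                          padding="max_length", max_length=128),
                      batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="fine-tuned-model", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
)
trainer.train()       # adjusts the pre-trained weights on the task-specific data
trainer.save_model()  # the tuned checkpoint would feed back into the model repository
```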

[Diagram: the full AI component architecture, repeated to situate the Fine-Tuning building block.]
13 © 2024 F5
Training
This is the process of teaching an AI model by exposing it to large amounts of data and allowing it
to learn patterns and features. Training involves multiple iterations and optimizations to develop
a model that can generalize well to new, unseen data.
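A compact PyTorch sketch of that iterate-and-optimize loop on synthetic data, included only to show the mechanics of training and a generalization check; the model and data are toys.

```python
import torch
from torch import nn

# Toy dataset and model; the point is the iterate-and-optimize loop, not the architecture.
X = torch.randn(512, 16)                     # stand-in for "large amounts of data"
y = (X.sum(dim=1) > 0).float().unsqueeze(1)  # labels derived from a hidden pattern

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(20):              # multiple iterations over the data
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)      # measure how far predictions are from labels
    loss.backward()                  # learn: push weights toward lower loss
    optimizer.step()

# Generalization check on new, unseen data.
X_test = torch.randn(64, 16)
truth = (X_test.sum(dim=1, keepdim=True) > 0).float()
accuracy = ((model(X_test) > 0).float() == truth).float().mean()
print(f"held-out accuracy: {accuracy.item():.2f}")
```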

[Diagram: the full AI component architecture, repeated to situate the Training building block.]
14 © 2024 F5
Development
This encompasses the overall creation, testing, and deployment of AI solutions.
It involves coding, integrating various AI components, and ensuring that the
system is robust, scalable, and ready for production use.
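One small piece of that lifecycle, sketched as a hypothetical CI smoke test (pytest style) against a deployed inference endpoint; the URL and response shape are assumptions for illustration only.

```python
import requests

# Hypothetical internal endpoint, assumed to expose an OpenAI-compatible API.
INFERENCE_URL = "http://inference.internal.example.com:8000/v1/chat/completions"

def test_inference_endpoint_returns_text():
    """Smoke test run from CI/CD: the deployed service answers a trivial prompt."""
    resp = requests.post(
        INFERENCE_URL,
        json={"model": "example-model",
              "messages": [{"role": "user", "content": "ping"}]},
        timeout=30,
    )
    assert resp.status_code == 200
    body = resp.json()
    assert body["choices"][0]["message"]["content"].strip() != ""
```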

[Diagram: the full AI component architecture, repeated to situate the Development building block.]
15 © 2024 F5
Inference with Retrieval-Augmented Generation (RAG)

16 © 2024 F5
INFERENCE WITH RAG

Featured AI Building Block


[Diagram: the full AI component architecture, repeated to situate the featured Inference with RAG building block.]
17 © 2024 F5
INFERENCE WITH RAG

Detailed Component Architecture

[Diagram: detailed component architecture for inference with RAG. End users reach front-end applications and LLM orchestration; inference services comprise an inference cluster and model repository; retrieval augmentation services comprise a retrieval engine and embedding LLM backed by a vector DB and object storage.]

18 © 2024 F5
INFERENCE WITH RAG

OWASP LLM Top Ten Insights

LLM01 Prompt Injection
LLM02 Sensitive Information Disclosure
LLM03 Supply Chain
LLM04 Data and Model Poisoning
LLM05 Improper Output Handling
LLM06 Excessive Agency
LLM07 System Prompt Leakage
LLM08 Vector and Embedding Weaknesses
LLM09 Misinformation
LLM10 Unbounded Consumption

[Diagram: the detailed inference-with-RAG architecture annotated with where these risks apply across the front-end, orchestration, inference, and retrieval augmentation components, for example LLM01/LLM02 on the user-facing path, LLM05 at inference and retrieval, and LLM08 at the vector DB.]
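For a sense of what controls at the orchestration or AI gateway layer can look like, here is a deliberately naive Python sketch touching two of the risks above (LLM01 and LLM02). The patterns and redaction rule are illustrative only and not a substitute for real mitigations such as dedicated prompt and response inspection.

```python
import re

INJECTION_PATTERNS = [
    r"ignore (all|previous|prior) instructions",
    r"reveal (the )?system prompt",
]

def screen_prompt(user_input: str) -> str:
    """LLM01 Prompt Injection: reject obviously adversarial instructions."""
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, user_input, re.IGNORECASE):
            raise ValueError("Potential prompt injection detected")
    return user_input

def screen_output(llm_output: str) -> str:
    """LLM02 Sensitive Information Disclosure: redact data that looks like secrets."""
    return re.sub(r"(?i)(api[_-]?key|password)\s*[:=]\s*\S+", r"\1: [REDACTED]", llm_output)

safe_prompt = screen_prompt("Summarize our refund policy.")
safe_reply = screen_output("Sure. By the way, api_key: sk-12345 was in the logs.")
```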

19 © 2024 F5
INFERENCE WITH RAG

F5 ADC Top Ten Insights

OWASP LLM Top Ten
LLM01 Prompt Injection
LLM02 Sensitive Information Disclosure
LLM03 Supply Chain
LLM04 Data and Model Poisoning
LLM05 Improper Output Handling
LLM06 Excessive Agency
LLM07 System Prompt Leakage
LLM08 Vector and Embedding Weaknesses
LLM09 Misinformation
LLM10 Unbounded Consumption

F5 Application Delivery Top Ten
ADC01 Weak DNS Practices
ADC02 Lack of Fault Tolerance & Resilience
ADC03 Incomplete Observability
ADC04 Insufficient Traffic Controls
ADC05 Unoptimized Traffic Steering
ADC06 Inability to Handle Latency
ADC07 Incompatible Delivery Policies
ADC08 Lack of Security & Regulatory Compliance
ADC09 Bespoke Application Requirements
ADC10 Poor Resource Utilization

[Diagram: the detailed inference-with-RAG architecture annotated with the application delivery challenges paired to the LLM risks they aggravate at each component (for example ADC02 with LLM02 on the user-facing path), plus numbered callouts (1-9) referencing the design requirements on the next slide.]
20 © 2024 F5
INFERENCE WITH RAG

Design Requirements

1 Distributed Compute Services
2 AI Compute Resources
3 Centralized Networking Management
4 Distributed App & API Security Services
5 Centralized Security Policy Management
6 AI/ML Data Loss Prevention
7 AI/ML Security
8 AI/ML Observability
9 Inter-Cluster Traffic Management

[Diagram: the detailed inference-with-RAG architecture with numbered callouts mapping these requirements to the front-end, orchestration, inference, and retrieval augmentation components.]

21 © 2024 F5
INFERENCE WITH RAG

SaaS Deployment

[Diagram: SaaS deployment of inference with RAG, with AI gateways fronting the front-end applications, inference services, and retrieval augmentation services, global and site-level management, and numbered callouts mapping the design requirements to each tier.]

22 © 2024 F5
INFERENCE WITH RAG

Cloud-Hosted Deployment

[Diagram: cloud-hosted deployment of inference with RAG, with AI gateways fronting the front-end applications, inference services, and retrieval augmentation services, global and site-level management, and numbered callouts mapping the design requirements to each tier.]

23 © 2024 F5
INFERENCE WITH RAG

Self-Hosted Deployment

[Diagram: self-hosted deployment of inference with RAG, with the front-end applications, inference services, and retrieval augmentation services placed across global and site tiers and numbered callouts mapping the design requirements to each tier.]

24 © 2024 F5
RAG Corpus
Management

25 © 2024 F5
RAG CORPUS MANAGEMENT

Featured AI Building Block


[Diagram: the full AI component architecture, repeated to situate the featured RAG Corpus Management building block.]
26 © 2024 F5
RAG CORPUS MANAGEMENT

Detailed Component Architecture

[Diagram: detailed component architecture for RAG corpus management. Retrieval augmentation services include document pre-processing and embedding, a retrieval engine, and an embedding LLM backed by object storage and a vector DB; they draw on enterprise data stores and external data.]

27 © 2024 F5


RAG CORPUS MANAGEMENT

OWASP LLM Top Ten Insights

LLM01 Prompt Injection
LLM02 Sensitive Information Disclosure
LLM03 Supply Chain
LLM04 Data and Model Poisoning
LLM05 Improper Output Handling
LLM06 Excessive Agency
LLM07 System Prompt Leakage
LLM08 Vector and Embedding Weaknesses
LLM09 Misinformation
LLM10 Unbounded Consumption

[Diagram: the RAG corpus management architecture annotated with where these risks apply, including LLM02/LLM03 on the enterprise data ingest path, LLM05/LLM06 around pre-processing and embedding, and LLM03/LLM04/LLM06/LLM10 around external data sources.]

28 © 2024 F5


RAG CORPUS MANAGEMENT

F5 ADC Top Ten Insights

OWASP LLM Top Ten
LLM01 Prompt Injection
LLM02 Sensitive Information Disclosure
LLM03 Supply Chain
LLM04 Data and Model Poisoning
LLM05 Improper Output Handling
LLM06 Excessive Agency
LLM07 System Prompt Leakage
LLM08 Vector and Embedding Weaknesses
LLM09 Misinformation
LLM10 Unbounded Consumption

F5 Application Delivery Top Ten
ADC01 Weak DNS Practices
ADC02 Lack of Fault Tolerance & Resilience
ADC03 Incomplete Observability
ADC04 Insufficient Traffic Controls
ADC05 Unoptimized Traffic Steering
ADC06 Inability to Handle Latency
ADC07 Incompatible Delivery Policies
ADC08 Lack of Security & Regulatory Compliance
ADC09 Bespoke Application Requirements
ADC10 Poor Resource Utilization

[Diagram: the RAG corpus management architecture annotated with the application delivery challenges paired to the LLM risks that apply at each component, plus numbered callouts referencing the design requirements on the next slide.]

29 © 2024 F5


RAG CORPUS MANAGEMENT

Design Requirements

1 Distributed Compute Services
2 AI Compute Resources
3 Centralized Networking Management
4 Distributed App & API Security Services
5 Centralized Security Policy Management
6 AI/ML Data Loss Prevention
7 AI/ML Security
8 AI/ML Observability
9 Inter-Cluster Traffic Management

[Diagram: the RAG corpus management architecture with numbered callouts mapping these requirements to pre-processing and embedding, the retrieval engine, the embedding LLM, object storage, the vector DB, and the enterprise and external data paths.]

30 © 2024 F5


RAG CORPUS MANAGEMENT

Cloud Deployment

[Diagram: cloud deployment of RAG corpus management, with the retrieval augmentation services and enterprise data stores distributed across sites under global management and numbered callouts mapping the design requirements.]

31 © 2024 F5


RAG CORPUS MANAGEMENT

Self-Hosted Deployment

[Diagram: self-hosted deployment of RAG corpus management, with the retrieval augmentation services and enterprise data stores distributed across sites and numbered callouts mapping the design requirements.]

32 © 2024 F5
