Own Your AI - Tech Deck
Friendly reminder:
We have one evening, so we will start with only the hottest open-source topics.
Why all the hype?
Agenda
2017 – Present
Introduced in the 2017 paper “Attention Is All You Need”
❑ Type of deep neural network
❑ Leverages attention/self-attention, including multi-head attention
❑ Expressive: feed-forward layers
❑ Optimizable: backpropagation, gradient descent
❑ Efficient: highly parallel compute graph
❑ Examples: LLaMA-3, Phi-3, GPT-4, Claude-3
❑ Learn it all from Karpathy: https://www.youtube.com/watch?v=zjkBMFhNj_g
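The core of the architecture fits in a few lines. A minimal single-head scaled dot-product self-attention sketch in NumPy (multi-head attention runs several of these in parallel and concatenates the results; shapes and weights here are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (seq, seq) pairwise similarities
    return softmax(scores) @ V               # attention-weighted sum of values

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Every output position mixes information from all input positions in one matrix multiply, which is what makes the compute graph so parallel.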
Brief New Age Tech Glossary
❑ Transformers: A type of general-purpose neural network architecture that facilitates the modeling of sequences without the
need for recurrent connections, prominently used in language processing tasks
❑ Foundational Model: A large-scale model that is trained on vast amounts of data and can be fine-tuned for a variety of
downstream tasks, serving as a base for further specialized models
❑ Large Language Model: A substantial neural network model trained on extensive textual data to understand and generate human-like text across many languages and contexts
❑ Small Language Model: A more compact version of a language model designed for efficiency and lower resource consumption while performing natural language processing tasks
❑ Visual Language Model: A model that combines language and vision processing to understand and generate content related to
both text and images
❑ Multimodal Models: AI models that can process and understand information from different types of data, such as text, images,
and audio, simultaneously
❑ RWKV: A recurrent-neural-network variant, whose name stands for Receptance, Weight, Key, Value, designed for efficiency and performance in sequence modeling tasks
❑ Mamba/Jamba, Hawk/Griffin, DPO, DORPO, Flash Attention…
Where does open-source AI live
❑ Founded in 2016
❑ 170 employees
❑ 130K+ public datasets
❑ 600K+ models
❑ 1M+ daily downloads
❑ 700K+ daily visitors
❑ 30+ libraries
❑ Models in production: NVIDIA DGX Cloud, Hugging Face on Azure, Amazon SageMaker, Google Cloud
❑ Cloud platforms: managed inference on AWS, Azure, and GCP; deploy anywhere; no-code AutoML
❑ Open datasets: 130,000+ datasets on the hub
❑ Open models: 600,000+ models on the hub
❑ Libraries: Transformers, Accelerate, Diffusers
Open-Source vs. Closed/Proprietary
❑ Security: Open-source models can be self-hosted, so data stays in your environment. Proprietary models cannot be self-hosted; data is sent outside your environment to vendor(s).
❑ Control: With open source, the lifecycle is controlled by you. With proprietary, updates and changes to performance happen without notice.
❑ Latency: Open source offers lower latency thanks to on-premise deployment and smaller model sizes. Proprietary often has greater latency due to larger model sizes plus API latency.
❑ Quality: No single approach is best; each use case will vary. Proprietary is typically closer to the frontier of performance.
Energy/carbon footprint and LLMs
❑ Start by testing existing models on your domain and task(s) of interest: do you really need to pretrain an LLM?
❑ Energy budget will likely be dominated by inference costs; select a compute-efficient model:
- smallest size
- quantized
- classification models > generative
❑ Compute-optimal models (Chinchilla law) are no longer efficient for inference: train smaller models for longer if you plan to deploy at large scale
❑ Train in a local cluster/provider with a good energy mix
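The quantization point can be made concrete. A minimal sketch of symmetric per-tensor int8 weight quantization, illustrative only; production stacks rely on libraries such as bitsandbytes or GPTQ-style methods:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: store int8 weights plus one scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(q.nbytes / w.nbytes)  # 0.25: weights take 4x less memory
```

The memory (and bandwidth, hence energy) saving is 4x versus float32, at the cost of a small rounding error bounded by half the scale.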
❑ Started in 2023
❑ 55.1K+ stars on GitHub
❑ 7.8K+ forks
❑ 664 contributors
❑ 180+ active PRs
❑ 50+ other project integrations
❑ 40+ examples
MLX (Apple open source)
❑ Started in 2023
❑ 13.8K+ stars on GitHub
❑ 770+ forks
❑ 95 contributors
❑ 10+ active PRs
❑ 20+ other project integrations
❑ 15+ examples
One Click Tools
CoreNet: Train SLM on your Mac
❑ Started in 2024
❑ 2K+ stars on GitHub
❑ 84 forks
❑ 6+ active PRs
❑ 3 other project integrations
❑ 3 examples
Mergekit Evolve
❑ Started in 2023
❑ 3.4K+ stars on GitHub
❑ 272 forks
❑ 16 contributors
❑ 10+ active PRs
❑ 10+ other project integrations
❑ 5 examples
Open AI Platform Architecture
❑ Data lakehouse (your own data): Delta Lake, Spark, Trino
❑ Embeddings model: E5 Mistral (quantized); embeddings DB
❑ IaC: OpenTofu; DevSecOps across the stack
❑ Model factory, synthetic data pipeline: DSPy, distilabel
❑ Playground: DSPy
❑ Orchestration/routing: DSPy
❑ Guardrails: DSPy, AICI
❑ APIs/plugins: CodeInterpreter
❑ Vault: OpenBao
❑ Local service: in-memory databases, memmap, pgvector
❑ Base inferencing backend: Llama-3 70B
❑ Function calling backend: Phi-3
❑ Base model/function calling agent: Llama-3 8B / Phi-3
❑ Lifecycle/control plane: Rust; LLMOps: Phoenix
❑ Backend API services: FastAPI
❑ CodeGen backend: Wavecoder Ultra
❑ Cache: Valkey
❑ Local platform on computer/edge GW/phone: app/API/inference on CPU and GPU
❑ Backend inferencing and APIs on GPU, CPU, IPU, NPU
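The orchestration/router layer can start out as a simple policy that sends easy queries to the small model and hard or code-related ones elsewhere. A hypothetical sketch; the backend names and the length-based heuristic are illustrative, and real routers (e.g. DSPy programs) use learned or rule-based policies:

```python
from dataclasses import dataclass

@dataclass
class Route:
    backend: str   # which inferencing backend serves the query
    reason: str

def route(query: str, max_small_tokens: int = 32) -> Route:
    """Toy router: code goes to the codegen backend, short queries to the
    small model, everything else to the large model."""
    tokens = query.split()
    if "```" in query or "def " in query:
        return Route("codegen-backend", "code detected")
    if len(tokens) <= max_small_tokens:
        return Route("llama-3-8b", "short query")
    return Route("llama-3-70b", "long query")

print(route("What is RAG?").backend)            # llama-3-8b
print(route("def f(x): return x").backend)      # codegen-backend
print(route(" ".join(["word"] * 100)).backend)  # llama-3-70b
```

Routing cheap queries to the 8B model is where most of the latency and cost savings of the two-tier layout come from.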
Open AI Project Team
❑ Data lakehouse (your own data): Delta Lake, Spark, Trino
❑ Embeddings model: E5 Mistral (quantized); embeddings DB
❑ IaC: OpenTofu; DevSecOps across the stack
❑ Model factory, synthetic data pipeline: DSPy, distilabel
❑ Playground: DSPy
❑ Orchestration/routing: DSPy
❑ Guardrails: DSPy, AICI
❑ APIs/plugins: CodeInterpreter
❑ Vault: OpenBao
❑ Local service: in-memory databases, memmap, pgvector
❑ Base inferencing backend: Llama-3 70B
❑ Function calling backend: Phi-3
❑ Base model/function calling agent: Llama-3 8B / Phi-3
❑ Lifecycle/control plane: Rust; LLMOps: Phoenix
❑ Backend API services: FastAPI
❑ ConfigGen backend: llama3-70b-orpo-industrial
❑ Cache: Valkey
❑ Local platform on computer/edge GW/phone: app/API/inference on CPU and GPU
❑ Backend inferencing and APIs on GPU, CPU, IPU, NPU
Production System with NVIDIA/Microsoft
❑ Data lakehouse (your own data): Microsoft Fabric
❑ Embeddings model: NIM embed-qa-4 (quantized); embeddings DB
❑ IaC; DevSecOps across the stack
❑ Model factory, playground and prompt programming: DSPy
❑ Orchestration/routing: DSPy
❑ Guardrails: NeMo Guardrails
❑ APIs/plugins: Zapier
❑ Vault: Azure Key Vault
❑ Hybrid identity service: Entra ID
❑ Local service: in-memory databases, quantized client
❑ Local guardrails: NeMo Guardrails; orchestration/router: AICI/DSPy
❑ Base inferencing backend: NIM Llama-3 70B
❑ Documentation backend: NIM Llama-3 8B
❑ Base model/function calling agent: NIM Llama-3 8B
❑ Lifecycle/control plane: Azure Arc; LLMOps: W&B
❑ Backend API services: FastAPI
❑ Coding backend: NIM Llama-3 70B
❑ Edge: NIM Edge
❑ Cache: Redis
❑ NIM server backend and APIs on CUDA GPU
From Proof of Concept
to Pilot
to Production
Lessons Learned
❑ Set expectations
❑ Minimize risks
❑ Always experiment and build with the North Star to take it to production
❑ Work 3x faster so the path from product start to launch happens in 6 months
Set Expectations
❑ If you want cool demos to show everyone externally that you’re ahead of the curve, just do it!
❑ If you want your team to experiment and build out AI muscles for production, just do it!
❑ If you want a product, build data, get compute and train talents to build it, and just do it!
There are a lot of things GenAI can do
❑ Model factory: synthetic data pipeline; playground; orchestration/routing; guardrails; APIs/plugins; vault
❑ Local service: in-memory databases; query/API call; orchestration/router
❑ Base inferencing backend; function calling backend
❑ Base model/function calling agent; lifecycle/control plane; agent LLMOps
❑ Backend API services; coding backend
❑ App/API/inference on CPU and GPU; local platform on computer/edge GW/phone; cache
❑ Backend inferencing and APIs on GPU, CPU, IPU, NPU
❑ DevSecOps across all layers
Lessons Learned
Qwen-1.5-MoE-A2.7B
OLMo1.7-7B
Cloud Inferencing Race to the Bottom
❑ Training:
Data preparation
Efficient training techniques
Evaluation
❑ Fine-tuning:
RLHF, RLAIF
❑ Inference:
Quantization
Deployment
Training Your Own Model
https://arxiv.org/abs/2404.14619
Training Your Own Model
https://nonint.com/2023/06/10/the-it-in-ai-models-is-the-dataset/
Select Your Data
https://arxiv.org/abs/2402.16827
Data Preparation
Safety Filtering
Topic Filtering
Language Filtering
Semantic Filtering
❑ 15T tokens, clean and deduplicated
❑ 45 TB dataset size
❑ Created in 2024
❑ License: Open Data Commons (ODC)
https://huggingface.co/datasets/HuggingFaceFW/fineweb
Generate Synthetic Datasets Locally
❑ Started in 2022
❑ 693 stars on GitHub
❑ 16 contributors
❑ 8 active PRs
Create a synthetic dataset seed locally on your own AI platform to align models to a specific domain
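As a toy illustration of seeding, a template-based seed generator; real pipelines such as distilabel then use an LLM to expand and score these. All domain terms and templates below are made up for the example:

```python
import json
import random

# Hypothetical domain vocabulary and instruction templates (illustrative only).
TERMS = ["heat exchanger", "PLC ladder logic", "vibration spectrum"]
TEMPLATES = [
    "Explain {term} to a new technician.",
    "List three failure modes of a {term}.",
    "Write a checklist for inspecting a {term}.",
]

def make_seed(n, rng=random.Random(0)):
    """Cross instruction templates with domain terms to produce seed rows."""
    rows = []
    for _ in range(n):
        term = rng.choice(TERMS)
        rows.append({"instruction": rng.choice(TEMPLATES).format(term=term),
                     "domain_term": term})
    return rows

seed = make_seed(5)
print(json.dumps(seed[0]))
```

The seed stays on your own platform; only the later expansion step needs model inference, which can also run locally.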
Synthetic Dataset Example: Cosmopedia
❑ 30M synthetic samples
❑ 8 domain splits
❑ Generated in 2024
https://huggingface.co/datasets/HuggingFaceTB/cosmopedia
Coding Dataset Example: The Stack v2
❑ 67.5 TB full dataset
❑ 32.1 TB deduplicated dataset
❑ 658 programming languages
❑ Created in 2024
https://huggingface.co/datasets/bigcode/the-stack-v2
Data Filtering
❑ Fuzzy
Bloom filters: hash items into a fixed-size bit vector for fast membership tests
MinHash: hash and sort document shingles to estimate set similarity
❑ Exact
Exact substrings with a suffix array
Sentence deduplication
❑ Over-deduplication may keep only the bad data
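A minimal sketch of the MinHash idea behind fuzzy dedup: the fraction of matching signature slots estimates the Jaccard similarity of the shingle sets. Toy signature only; production systems use libraries such as datasketch or datatrove with banded LSH:

```python
import hashlib

def shingles(text, k=3):
    """Split text into overlapping word k-grams ("shingles")."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def minhash(shingle_set, num_perm=64):
    """Signature: minimum hash per seeded hash function over all shingles."""
    sig = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # seeds the hash via BLAKE2b salt
        sig.append(min(
            int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8,
                                           salt=salt).digest(), "big")
            for s in shingle_set))
    return sig

def similarity(a, b):
    sa, sb = minhash(shingles(a)), minhash(shingles(b))
    return sum(x == y for x, y in zip(sa, sb)) / len(sa)

near_dup = similarity("the cat sat on the mat today",
                      "the cat sat on the mat yesterday")
different = similarity("the cat sat on the mat today",
                       "stochastic gradient descent updates weights")
print(near_dup > different)  # near-duplicates share far more signature slots
```

Comparing fixed-size signatures instead of full shingle sets is what makes fuzzy dedup tractable at trillion-token scale.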
Prepare the Data for Pre-Training
❑ Shuffle
❑ Tokenizers
❑ Tokenization scaling
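A toy sketch of the shuffle-then-tokenize step, packing a corpus into a flat token-ID array ready for memory-mapped training. A whitespace tokenizer stands in for a real BPE/unigram tokenizer such as SentencePiece or tiktoken:

```python
import random
import numpy as np

def build_vocab(docs):
    """Toy whitespace vocabulary; real pipelines train a subword tokenizer."""
    words = sorted({w for d in docs for w in d.split()})
    return {w: i + 1 for i, w in enumerate(words)}  # id 0 reserved for <eos>

def tokenize_corpus(docs, vocab, seed=0):
    rng = random.Random(seed)
    docs = docs[:]
    rng.shuffle(docs)                      # shuffle documents before packing
    ids = []
    for d in docs:
        ids.extend(vocab[w] for w in d.split())
        ids.append(0)                      # <eos> marker between documents
    return np.array(ids, dtype=np.uint16)  # compact dtype for memmap storage

docs = ["the model reads tokens", "tokens become ids", "ids feed the model"]
vocab = build_vocab(docs)
arr = tokenize_corpus(docs, vocab)
print(arr.dtype, len(arr))
```

The flat uint16/uint32 array is what `np.memmap` then serves to the training loop without loading the corpus into RAM.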
Parallelism
❑ Data
Compute efficiency of gradient all-reduce; training efficiency of batch size
❑ Tensor
Rewrite model code
Reduce sync points with combined column/row slicing
❑ Pipeline
Group sub-parts of the network
Optimize GPU utilization
❑ Sequence
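The data-parallel case can be simulated in plain NumPy: each worker computes gradients on its own data shard, then an all-reduce averages them so every replica applies the identical update. This is a sketch of the communication pattern, not a distributed implementation:

```python
import numpy as np

def local_gradient(w, X, y):
    """Gradient of mean squared error 0.5*|Xw - y|^2 on one worker's shard."""
    return X.T @ (X @ w - y) / len(y)

def all_reduce_mean(grads):
    """Average gradients across workers (what an NCCL all-reduce computes)."""
    return np.mean(grads, axis=0)

rng = np.random.default_rng(0)
X, w_true = rng.normal(size=(64, 4)), np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true
shards = np.array_split(np.arange(64), 4)  # 4 data-parallel workers

w = np.zeros(4)
for _ in range(200):
    grads = [local_gradient(w, X[s], y[s]) for s in shards]
    w -= 0.1 * all_reduce_mean(grads)      # identical update on every replica

print(np.allclose(w, w_true, atol=1e-2))  # True: replicas converge together
```

Because equal-size shards make the averaged gradient exactly the full-batch gradient, the compute cost that matters is the all-reduce itself, hence the slide's focus on its efficiency.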
❑ Initialization
❑ Stabilization
❑ Learning rate
❑ Transferring hyper-parameter results across scales
MiniCPM V2.0 https://huggingface.co/openbmb/MiniCPM-V-2
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Transfer https://arxiv.org/abs/2203.03466
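Learning-rate handling in these recipes usually means linear warmup followed by cosine decay. A minimal schedule sketch, where the warmup fraction and peak/minimum rates are illustrative choices, not values from any of the cited models:

```python
import math

def lr_schedule(step, total_steps, peak_lr=3e-4, min_lr=3e-5, warmup_frac=0.01):
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    warmup = max(1, int(total_steps * warmup_frac))
    if step < warmup:
        return peak_lr * (step + 1) / warmup        # linear ramp-up
    progress = (step - warmup) / max(1, total_steps - warmup)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

total = 10_000
print(lr_schedule(0, total) < lr_schedule(99, total))  # still warming up
print(abs(lr_schedule(100, total) - 3e-4) < 1e-9)      # at peak after warmup
print(abs(lr_schedule(total, total) - 3e-5) < 1e-9)    # decayed to min_lr
```

Warmup is one of the main stabilization levers early in training; muP (Tensor Programs V) is what lets the peak rate found on a small model transfer to a large one.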
Synthetic AI Recipe as an Emerging Trend
The Mystic WizardLM-2
❑ Data Pre-Processing
Weighted Sampling
Progressive Learning
❑ Evol-Instruct
❑ AI Aligns AI (AAA)
Co-Teaching
Self-Teaching
❑ Supervised Learning
Staged-DPO
RLEIF with IRM and PRM
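Staged-DPO builds on the DPO objective, which raises the log-probability margin of chosen over rejected answers relative to a frozen reference model. A minimal sketch of the per-example loss computed from summed token log-probs; the numeric values are made-up placeholders:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss: -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))).
    Inputs are summed log-probabilities of each response under each model."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Made-up log-probs: policy prefers the chosen answer more than the reference does.
good = dpo_loss(pi_chosen=-10.0, pi_rejected=-30.0,
                ref_chosen=-12.0, ref_rejected=-25.0)
# Policy mistakenly prefers the rejected answer.
bad = dpo_loss(pi_chosen=-20.0, pi_rejected=-11.0,
               ref_chosen=-12.0, ref_rejected=-25.0)
print(good < bad)  # loss is lower when the policy ranks the chosen answer higher
```

Because the reference terms anchor the margin, DPO needs no reward model or RL loop, which is why recipes like this stage it directly on preference pairs.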
Practical Use Cases
1. Content Creation
2. Automation of Routine Tasks
3. Human-Computer Interface Personalization
4. Assisted Software Development
5. Design and Prototyping
6. Synthetic Data Generation
Thank You!
We Will Meet Again!
Backup slides