Documentation
Written by:
El Akramine Yassir Salim
April 2025
Contents
1 Introduction
1.1 Summary
1.2 PC Requirements for Running the Project
1.2.1 GPU Requirements
1.2.2 Alternatives for Limited Hardware
1.3 Technology Choices
1.3.1 Mistral-7B-Instruct
1.3.2 PyTorch
1.3.3 Hugging Face Transformers
1.3.4 Gradio
1.3.5 CUDA and GPU Usage
2 Environment Setup
2.1 Prepare WSL2 Environment
2.2 Create Python Virtual Environment
2.3 Fix Missing pip inside the Virtual Environment
2.4 Install PyTorch with CUDA Support
2.5 Install Required Libraries
3 Model Setup
3.1 Option 1: Automatic Download from Hugging Face
3.2 Option 2: Manual Download
5 Troubleshooting
5.1 Out of Memory Errors
5.2 Slow Generation
5.3 CUDA Not Available
6 Model Information
7 Results
8 Conclusion and Future Applications
1 Introduction
1.1 Summary
The Mistral-7B-Instruct model runs on PyTorch, a popular machine learning framework. PyTorch performs the computations, using CUDA to accelerate processing on the GPU. The model and tokenizer come from Hugging Face, which provides pre-trained models like Mistral, so the model can be loaded and run without being trained from scratch. When you send input to the chatbot, the tokenizer breaks it into tokens, the model processes them, and PyTorch computes the response on your GPU, which keeps the model's weights in VRAM for faster access.
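To make the tokenization step concrete, here is a minimal sketch (the model name matches the one used later in this document; it assumes you have access to the model repository on Hugging Face, and the exact tokens your run prints may differ):
from transformers import AutoTokenizer

# Load the tokenizer that ships with the model
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

ids = tok("Hello, how are you?")["input_ids"]
print(ids)                              # the integer token IDs the model consumes
print(tok.convert_ids_to_tokens(ids))  # the corresponding subword pieces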
1.2 PC Requirements for Running the Project
1.2.1 GPU Requirements
Note: Although 8GB of VRAM is sufficient for running the model with 8-bit quantization, 16GB of VRAM is recommended for optimal performance, especially with larger batch sizes or longer sequence lengths. On smaller GPUs, such as those with 4GB of VRAM, the model may struggle to run without a significant loss of speed or precision.
1.2.2 Alternatives for Limited Hardware
• Quantization: 4-bit or 8-bit quantization significantly reduces the model's memory footprint, enabling it to run on GPUs with lower VRAM (e.g., 4GB or 6GB); see the sketch after this list.
• Cloud Deployment: For users with limited local resources, cloud platforms such as Google Colab, AWS EC2, or Azure provide access to GPUs with more VRAM and computational power. You can deploy the model in the cloud and access it remotely for inference.
• Model Optimization: Techniques such as model parallelism or offloading parts of the model to the CPU can help distribute the computational load, though they may slow down processing.
With these alternatives, you can still run and experiment with the Mistral-7B model even on hardware
with more limited resources.
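As a concrete sketch, both quantization modes can be requested through bitsandbytes via the Transformers BitsAndBytesConfig (the VRAM figures above are rules of thumb, not guarantees):
import torch
from transformers import BitsAndBytesConfig

# 8-bit mode: fits the 7B model on roughly 8GB cards
quant_8bit = BitsAndBytesConfig(load_in_8bit=True)

# 4-bit mode: roughly halves that again, for ~4-6GB cards
quant_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16 while weights stay 4-bit
)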
1.3 Technology Choices
1.3.1 Mistral-7B-Instruct
Mistral-7B-Instruct was chosen because it is a fine-tuned version of the Mistral-7B base model, optimized
for instruction-based tasks. It is specifically designed to understand and generate responses in natural
language, making it ideal for building a conversational AI. The model offers:
• High Performance: Mistral-7B-Instruct is a large-scale model that can process complex inputs
with high accuracy.
• Instruct Tuning: This version of Mistral has been fine-tuned for better understanding of instruc-
tions, making it more effective in handling a wide range of user queries.
• State-of-the-Art Tokenizer: It uses the latest tokenizer, ensuring efficient text processing and
understanding.
1.3.2 PyTorch
PyTorch was selected as the deep learning framework because of its flexibility, efficiency, and ease of use.
Key reasons for its selection include:
• Dynamic Computation Graphs: PyTorch allows for dynamic computation graphs, which makes
it easier to modify and debug models during training and inference.
• CUDA Support: PyTorch provides seamless integration with CUDA for GPU acceleration, mak-
ing it ideal for running large models like Mistral-7B on GPUs.
• Community and Support: PyTorch has an active community and extensive documentation,
making it easier to find resources and solutions to problems.
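As a toy illustration of dynamic graphs (not taken from the project code), the forward pass below branches on a runtime value, and autograd still tracks whichever branch ran:
import torch

x = torch.randn(3, requires_grad=True)
# The branch taken depends on runtime data; the graph is built on the fly
y = (x * 2).sum() if x.sum() > 0 else (x * 3).sum()
y.backward()
print(x.grad)  # gradients reflect whichever branch actually executed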
1.3.4 Gradio
Gradio was used to create the user interface for the chatbot. The reasons for choosing Gradio include:
• Ease of Use: Gradio makes it simple to create interactive web applications with minimal code,
allowing for rapid prototyping and deployment.
• Integration with ML Models: Gradio integrates seamlessly with models built using PyTorch,
making it ideal for displaying the results of the Mistral-7B-Instruct model.
• Real-time Interaction: It enables real-time interactions with the model, offering users an intu-
itive chat interface.
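A minimal sketch of such an interface (hypothetical wiring; generate_response here is a placeholder standing in for the model-backed function defined later in chatbot_app.py):
import gradio as gr

def generate_response(message):
    return f"Echo: {message}"  # placeholder for the real model call

# ChatInterface expects fn(message, history) and renders a chat UI
demo = gr.ChatInterface(fn=lambda message, history: generate_response(message))
demo.launch()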
2 Environment Setup
2.1 Prepare WSL2 Environment
# In Windows PowerShell (as Administrator)
wsl --install   # If you haven't already installed WSL
# Upgrade pip
python -m pip install --upgrade pip
2.2 Create Python Virtual Environment
Problem encountered: When trying to create the virtual environment using Python 3.12, the following error appeared:
salim@MSI:~$ python3 -m venv ~/mistral-tutor-env
The virtual environment was not created successfully because ensurepip is not
available. On Debian/Ubuntu systems, you need to install the python3-venv
package using the following command.
You may need to use sudo with that command. After installing the python3-venv
package, recreate your virtual environment.
Explanation: This error appears because Ubuntu 24.04 ships Python 3.12 by default without the python3-venv package, and Python 3.12 is not yet fully compatible with several AI libraries such as transformers and bitsandbytes.
Correction: To avoid future compatibility problems, we decided to install Python 3.11 manually.
# Install dependencies
sudo apt update
sudo apt install software-properties-common
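The remaining installation commands are not shown in the original listing; a common way to get Python 3.11 on Ubuntu 24.04 (an assumption, consistent with installing software-properties-common, which provides add-apt-repository) is the deadsnakes PPA:
# Add the deadsnakes PPA and install Python 3.11 with venv support
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.11 python3.11-venv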
After installing Python 3.11, we create and activate the virtual environment again (notice that the
terminal prompt changes, confirming activation):
# Create and activate the virtual environment
python3.11 -m venv ~/mistral-tutor-env
source ~/mistral-tutor-env/bin/activate
2.3 Fix Missing pip inside the Virtual Environment
Explanation: This happens because, on Ubuntu 24.04 with Python 3.11, the virtual environment does not automatically include pip. We need to install pip inside the environment manually using the official get-pip.py script.
Correction: First, ensure that python3.11-distutils is installed, then manually install pip:
# Ensure distutils is installed
sudo apt install python3.11-distutils
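The get-pip.py step itself is missing from the listing; based on the official script mentioned above, it was presumably along these lines (run with the virtual environment activated):
# Download and run the official pip bootstrap script inside the venv
curl -sSO https://bootstrap.pypa.io/get-pip.py
python get-pip.py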
This confirms that pip is now correctly associated with Python 3.11 inside the virtual environment.
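2.4 Install PyTorch with CUDA Support
The install command is not shown in the original listing; given the CUDA 12.6 build described below, it was presumably the one using PyTorch's official cu126 wheel index:
# Install PyTorch from the CUDA 12.6 wheel index
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126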
This command automatically installs PyTorch 2.7.0 built for CUDA 12.6, which is fully compatible with
our RTX GPU and driver version 566.14.
Verification: After installation, we verify that PyTorch correctly detects the GPU by running the
following Python code:
import torch
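# Print the values shown in the expected output below
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"PyTorch version: {torch.__version__}")
print(f"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'}")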
Expected Output:
CUDA available: True
PyTorch version: 2.7.0+cu126
GPU: Your GPU
This confirms that PyTorch is correctly installed and the GPU is available for model training and
inference.
2.5 Install Required Libraries
# Install the protobuf and sentencepiece libraries, which are required by the tokenizer
pip install protobuf sentencepiece
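The command that installs the remaining packages is not shown in the original listing; given the package list explained below, it was presumably:
# Install the core libraries for loading and serving the model
pip install transformers accelerate huggingface_hub einops safetensors bitsandbytes gradio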
Explanation:
• transformers: for loading large language models (LLMs).
• accelerate: for optimized device placement and model execution.
• huggingface_hub: to connect to the Hugging Face Hub and download models.
• einops, safetensors: helper libraries for tensor operations and secure weight loading.
• bitsandbytes: to enable 8-bit or 4-bit model quantization, making it feasible to run large models
on limited VRAM.
• gradio: to build an easy-to-use web interface for the chatbot.
All packages were successfully installed without errors inside the virtual environment (mistral-tutor-env).
3 Model Setup
3.1 Option 1: Automatic Download from Hugging Face
When launched with the command python chatbot_app.py, the script automatically downloads the model on first run:
import gradio as gr
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Configuration
MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.3"  # Latest Mistral 7B Instruct model
MAX_NEW_TOKENS = 512     # Maximum number of tokens to generate
MAX_HISTORY_LENGTH = 5   # Maximum number of conversation turns to keep
    # ... (the definition of a prompt-building helper is elided in the original listing)
    return prompt

def generate_response(prompt):
    """Generate a response from the model given a prompt."""
    # (tokenizer and model are loaded earlier in the script; see the sketch below)
    try:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        # ... (the generation and decoding steps are elided in the original listing)
        return response
    except Exception as e:
        return f"An error occurred: {str(e)}"
gr.Markdown("""
## About this Tutor
Note: While the AI tries to provide accurate information, always verify important facts.
""")
5 Troubleshooting
5.1 Out of Memory Errors
• Reduce MAX_NEW_TOKENS in the code
5.2 Slow Generation
• Reduce MAX_NEW_TOKENS
• Use do_sample=False (see the example below)
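For instance, a greedy-decoding call might look like this (a sketch; greedy decoding trades response diversity for speed):
# Fewer tokens and no sampling: faster, deterministic output
output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)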
6 Model Information
Model: Mistral 7B Instruct v0.3
Size: 7 billion parameters (approximately 4GB of VRAM in 4-bit quantized mode)
7 Results
8 Conclusion and Future Applications
We have successfully run the Mistral-7B-Instruct model locally, an important step towards building more sophisticated AI applications. This implementation opens up many opportunities for further development and integration across various domains.
Some potential future applications include:
• API Integration: We can link the model to an API, turning it into a service that can be used by
various applications. This would allow other systems or businesses to easily integrate the model
for natural language processing tasks, such as customer support, content generation, and more.
• Cloud Deployment: The model can be deployed on the cloud, making it accessible to users
through a web interface. This would allow businesses to offer scalable AI-powered services, acces-
sible from anywhere, at any time.
• Subject-Specific Fine-Tuning: The model can be fine-tuned to become hyper-specific to partic-
ular domains or subjects, such as medical, legal, or technical fields. This would allow the chatbot
to provide more specialized and accurate responses tailored to specific industries.
• Chatbot for Businesses: The model can be integrated into customer service solutions, providing
businesses with automated, real-time support for their customers. It could assist with answering
frequently asked questions, troubleshooting common problems, and more.
• Educational Applications: The model can be used as an educational assistant, helping students
learn new concepts, offering explanations, and providing personalized tutoring in various subjects.