
UNIT-III

Machine Learning for prediction


Predictive modeling – decision trees, logistic regression, neural networks, kNN, Bayesian method
Regression models
Assessing Predictive Models – Batch Approach to Model Assessment, Percent Correct Classification, Rank-Ordered Approach to Model Assessment, Assessing Regression Models
Decision Tree
• Decision Tree is a Supervised learning
technique that can be used for both classification
and Regression problems, but mostly it is
preferred for solving Classification problems. It
is a tree-structured classifier, where internal
nodes represent the features of a dataset,
branches represent the decision rules and each
leaf node represents the outcome.
• In a decision tree, there are two types of nodes: decision nodes and leaf nodes. Decision nodes are used to make decisions and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
• The decisions or tests are performed on the basis of the features of the given dataset.
• It is a graphical representation for getting all
the possible solutions to a problem/decision based
on given conditions.
• It is called a decision tree because, similar
to a tree, it starts with the root node, which
expands on further branches and constructs a
tree-like structure.
• In order to build a tree, we use the CART
algorithm, which stands for Classification and
Regression Tree algorithm.
• A decision tree simply asks a question, and based on the answer (Yes/No), it further splits the tree into subtrees.
• Below diagram explains the general structure of
a decision tree:
• Note: A decision tree can contain categorical data (YES/NO) as
well as numeric data.
[Figure: general structure of a decision tree]
• Why use Decision Trees?
• There are various algorithms in Machine learning,
so choosing the best algorithm for the given
dataset and problem is the main point to remember
while creating a machine learning model. Below are
the two reasons for using the Decision tree:
• Decision Trees usually mimic human thinking
ability while making a decision, so it is easy to
understand.
• The logic behind the decision tree can be easily
understood because it shows a tree-like structure.
• Decision Tree Terminologies:

Root Node: Root node is from where the decision
tree starts. It represents the entire dataset,
which further gets divided into two or more
homogeneous sets.
• Leaf Node: Leaf nodes are the final output
node, and the tree cannot be segregated further
after getting a leaf node.
• Splitting: Splitting is the process of dividing
the decision node/root node into sub-nodes
according to the given conditions.
• Branch/Sub Tree: A tree formed by splitting the
tree.
• Pruning: Pruning is the process of removing the
unwanted branches from the tree.
• Parent/Child node: The root node of the tree is
called the parent node, and other nodes are
called the child nodes.
• How does the Decision Tree algorithm work?
• Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
• Step-2: Find the best attribute in the dataset
using Attribute Selection Measure (ASM).
• Step-3: Divide S into subsets that contain possible values for the best attribute.
• Step-4: Generate the decision tree node, which
contains the best attribute.
• Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where you cannot further classify the nodes; the final node is then called a leaf node.
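As a minimal sketch of these steps in code (assuming scikit-learn is available; the iris dataset, max_depth, and other settings are illustrative), a CART tree can be built and inspected as follows:

# Minimal sketch: building a CART decision tree with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# criterion="entropy" selects splits by information gain; "gini" uses the Gini index.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=42)
tree.fit(X_train, y_train)

print(export_text(tree))                    # the learned root node, branches, and leaves
print("Test accuracy:", tree.score(X_test, y_test))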
• Example: Suppose there is a candidate who has a job offer and wants to decide whether he
should accept the offer or Not. So, to solve this problem, the decision tree starts with the
root node (Salary attribute by ASM). The root node splits further into the next decision node
(distance from the office) and one leaf node based on the corresponding labels. The next
decision node further gets split into one decision node (Cab facility) and one leaf node.
Finally, the decision node splits into two leaf nodes (Accepted offers and Declined offer).
Consider the below diagram:
• Attribute Selection Measures
• While implementing a decision tree, the main issue that arises is how to select the best attribute for the root node and for the sub-nodes. To solve such problems there is a technique called the Attribute Selection Measure (ASM). By this measurement, we can easily select the best attribute for the nodes of the tree. There are two popular techniques for ASM:
• Information Gain and Gini Index
• 1. Information Gain:
• Information gain is the measurement of changes in entropy
after the segmentation of a dataset based on an attribute.
• It calculates how much information a feature provides us
about a class.
• According to the value of information gain, we split the
node and build the decision tree.
• A decision tree algorithm always tries to maximize the value
of information gain, and a node/attribute having the highest
information gain is split first. It can be calculated using
the below formula:
• Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]
• Entropy: Entropy is a metric to measure the impurity in a given attribute. It specifies randomness in data. Entropy can be calculated as:
Entropy(S) = −P(yes) log2 P(yes) − P(no) log2 P(no)
where P(yes) and P(no) are the proportions of positive and negative examples in S.
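As a worked example of these two formulas (the class counts below are hypothetical), the following sketch computes the entropy of a dataset and the information gain of one candidate split:

import math

def entropy(labels):
    # Entropy = -sum(p_i * log2(p_i)) over the class proportions.
    n = len(labels)
    return -sum((labels.count(v) / n) * math.log2(labels.count(v) / n)
                for v in set(labels))

# Hypothetical dataset: 9 "yes" and 5 "no" outcomes (14 records).
S = ["yes"] * 9 + ["no"] * 5
print(round(entropy(S), 3))                      # ~0.940

# A hypothetical split produces two subsets of 8 and 6 records.
left = ["yes"] * 6 + ["no"] * 2
right = ["yes"] * 3 + ["no"] * 3
weighted = (len(left) / len(S)) * entropy(left) + (len(right) / len(S)) * entropy(right)
print("Information Gain:", round(entropy(S) - weighted, 3))   # ~0.048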
LOGISTIC REGRESSION
• Logistic regression is a statistical method used for binary
classification, which means it is used to predict the probability that an
input belongs to one of two possible classes (usually denoted as 0
and 1).
• Despite its name, logistic regression is a classification algorithm, not
a regression algorithm. It is widely used in various fields, including
machine learning, medical research, economics, and social sciences.
• Logistic regression is a simple yet powerful algorithm for binary
classification tasks, and its interpretability and efficiency make it a
popular choice in various applications. It's important to note that
logistic regression can be extended to handle multiclass classification
problems using techniques like one-vs-all (OvA) or softmax
regression.
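As a minimal sketch of the idea (the one-feature data below is hypothetical; scikit-learn is assumed available), logistic regression passes a weighted sum of the inputs through the sigmoid to produce a probability:

import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    # Maps any real number to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data: one feature, binary label.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[3.5]]))           # P(class 0), P(class 1)

# The same probability computed by hand: sigmoid(w*x + b).
w, b = clf.coef_[0, 0], clf.intercept_[0]
print(sigmoid(w * 3.5 + b))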
Neural networks
• ➢ Neural networks, inspired by the structure and functioning of the
human brain, are a class of machine learning models that excel at
learning complex patterns and representations from data.
• ➢ They consist of interconnected nodes, known as neurons,
organized into layers.
• ➢ Neural networks are a fundamental component of deep learning,
a subfield of machine learning characterized by the use of deep
architectures with multiple layers.
• Key Concepts:
• Neurons: Neurons are the basic building blocks of a neural network.
Each neuron processes input data and produces an output.
• Layers: Neural networks are organized into layers, including an input
layer, one or more hidden layers, and an output layer. The input layer
receives data, hidden layers process it, and the output layer produces
the final result.
• Weights and Biases: Weights and biases are parameters that the
neural network learns during training. They determine the strength of
connections between neurons and affect the output.
• Activation Functions: Activation functions introduce non-linearities to
the model, allowing it to learn complex relationships in the data.
Common activation functions include sigmoid, tanh, and rectified
linear unit (ReLU).
• Feedforward and Backpropagation: In the training phase, data is
passed through the network in a feedforward manner to make
predictions. The backpropagation algorithm is then used to adjust
weights and biases based on the error, optimizing the model (see the sketch after this list).
• Loss Function: The loss function measures the difference between
the predicted output and the actual target. During training, the goal is
to minimize this loss, guiding the network to make more accurate
predictions.
• Deep Learning: Deep neural networks have multiple hidden layers,
enabling them to learn hierarchical representations of data. This
depth allows them to handle intricate features and patterns.
• Types of Neural Networks: Different architectures cater to various
tasks, including: Convolutional Neural Networks (CNNs) for image-
related tasks. Recurrent Neural Networks (RNNs) for sequential data.
Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU)
networks for improved handling of long range dependencies.
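A minimal sketch of the feedforward pass and loss computation described above, using NumPy (the weights here are random and purely illustrative; a real network would learn them via backpropagation):

import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])                  # input layer: 3 features
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer: 4 neurons
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # output layer: 1 neuron

h = relu(W1 @ x + b1)                    # hidden activations (ReLU)
y_hat = sigmoid(W2 @ h + b2)[0]          # predicted probability

y_true = 1.0                             # hypothetical target label
loss = -(y_true * np.log(y_hat) + (1 - y_true) * np.log(1 - y_hat))  # cross-entropy
print(f"prediction={y_hat:.3f}  loss={loss:.3f}")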
Applications:
•Image and Speech Recognition: Neural networks have achieved
significant success in tasks such as image classification, object
detection, and speech recognition.
•Natural Language Processing (NLP): They are widely used in language-
related tasks, including sentiment analysis, machine translation, and
text generation.
•Medical Diagnostics: Neural networks contribute to medical image
analysis, disease diagnosis, and predicting patient outcomes.
•Autonomous Vehicles: In the field of autonomous driving, neural
networks play a crucial role in tasks like object detection, path
planning, and decision-making.
•Financial Forecasting: Neural networks are applied to predict stock
prices, analyze market trends, and model financial data.
• Example: Virtual Personal Assistant's Speech Recognition
• Imagine using a virtual personal assistant, such as Apple's Siri, Amazon's Alexa, or
Google Assistant. These virtual assistants employ neural networks, particularly
recurrent neural networks (RNNs) and convolutional neural networks (CNNs), for
speech recognition.
• 1.Data Input: You activate the virtual assistant by saying a command, such as
"Hey Siri" or "Alexa."
• 2.Neural Network Processing: The neural network within the virtual assistant's
system processes the incoming audio data in real-time. The network has been
trained on vast datasets containing various spoken phrases and words.
• 3.Feature Extraction: The neural network extracts relevant features from the
audio data, capturing the nuances of your voice, accent, and speech patterns.
Recurrent neural networks are particularly effective in handling sequential data
like spoken language.
• 4.Pattern Recognition: The neural network recognizes patterns and converts the
audio input into a sequence of words or commands. This involves complex
computations to understand context, syntax, and semantics.
• 5. Command Execution: Based on the recognized speech, the virtual
assistant executes the corresponding command. For example, if you
say, "What's the weather today?" the neural network interprets the
query and triggers the appropriate response by fetching real-time
weather information.
• 6.Continuous Learning: Neural networks in virtual assistants are
often designed for continuous learning. As users interact more, the
neural network adapts to individual speech patterns, accents, and
preferences, enhancing its performance over time.
• This example illustrates how neural networks in speech recognition
applications have become integral to our daily lives. The technology
enables natural and seamless interactions with devices, showcasing
the power of neural networks in understanding and processing
complex patterns in real-world scenarios.
What is an Artificial Neural Network?
[Figure: structure of an artificial neural network]
• Relationship between a biological neural network and an artificial neural network:

Biological Neural Network → Artificial Neural Network
Dendrites → Inputs
Cell nucleus → Nodes
Synapse → Weights
Axon → Output
• Input Layer: As the name suggests, it accepts inputs in several
different formats provided by the programmer.
• Hidden Layer: The hidden layer presents in-between input and
output layers. It performs all the calculations to find hidden features
and patterns.
• Output Layer: The input goes through a series of transformations
using the hidden layer, which finally results in output that is conveyed
using this layer. The artificial neural network takes input and
computes the weighted sum of the inputs and includes a bias. This
computation is represented in the form of a transfer function.
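A one-neuron sketch of this weighted-sum-plus-bias computation (the numbers are illustrative; the sigmoid stands in for the transfer function):

import math

inputs = [0.8, 0.2, -0.5]
weights = [0.4, -0.7, 0.1]      # illustrative values; learned during training
bias = 0.05

z = sum(w * x for w, x in zip(weights, inputs)) + bias   # weighted sum + bias
output = 1.0 / (1.0 + math.exp(-z))                      # sigmoid transfer function
print(output)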
K-nearest Neighbour
•K-Nearest Neighbor is one of the simplest Machine Learning
algorithms based on Supervised Learning technique.
•K-NN algorithm assumes the similarity between the new case/data and available cases and puts the new case into the category that is most similar to the available categories.
•K-NN algorithm stores all the available data and classifies a new data point based on the similarity. This means that when new data appears, it can be easily classified into a well-suited category by using the K-NN algorithm.
•K-NN algorithm can be used for Regression as well as for Classification
but mostly it is used for the Classification problems.
• K-NN is a non-parametric algorithm, which means it does not make any
assumption on underlying data.
▪ It is also called a lazy learner algorithm because it does not learn from the
training set immediately instead it stores the dataset and at the time of
classification, it performs an action on the dataset.
▪ KNN algorithm at the training phase just stores the dataset and when it
gets new data, then it classifies that data into a category that is much similar
to the new data.
Example: ▪ Suppose we have an image of a creature that looks similar to a cat and a dog, but we want to know whether it is a cat or a dog.
▪ So for this identification, we can use the KNN algorithm, as it works on a
similarity measure.
▪ Our KNN model will find the features of the new data set that are similar to the cat and dog images, and based on the most similar features it will put it in either the cat or the dog category.
• Why do we need a K-NN Algorithm?
• Suppose there are two categories, i.e., Category A and Category B, and we have a new data point x1. Which of these categories will this data point lie in?
• To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point.
• Consider the diagram below:
[Figure: choosing the category of a new data point x1 with K-NN]
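A minimal K-NN sketch with scikit-learn (the synthetic two-feature data stands in for Category A and Category B; k=5 and the new point are illustrative):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

knn = KNeighborsClassifier(n_neighbors=5)   # k=5, Euclidean distance by default
knn.fit(X_train, y_train)                   # a lazy learner: fit just stores the data

new_point = [[0.2, -0.4]]                   # the new data point x1
print("Predicted category:", knn.predict(new_point))
print("Test accuracy:", knn.score(X_test, y_test))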
Cp (Mallow's Cp):
• ● Purpose: Cp was developed by Colin Mallows to assess the quality of
linear regression models. It is primarily used in the context of regression
analysis.
• ● Calculation: Cp measures the trade-off between model fit and model
complexity. It is calculated as follows: Cp = (SSE_p / MSE) - (n - 2p) Where:
● SSE_p: The sum of squared errors for the model with p predictor
variables.
• ● MSE: The mean squared error for the full model with all predictor
variables.
• ● n: The number of data points.
• ● p: The number of predictor variables in the model.
• ● Interpretation: A smaller Cp value indicates a better trade-off between model fit and complexity. Cp is used to assess whether a model with a subset of predictor variables is competitive with the full model while penalizing for model complexity.
AIC (Akaike Information Criterion):
• ● Purpose: AIC is a general-purpose model selection criterion used in
various statistical models, including linear regression, time series
analysis, and more.
• ● Calculation: AIC is based on the likelihood function of the model
and is calculated as follows: AIC = -2 * log(Likelihood) + 2 * k Where:
• ● Likelihood: The likelihood of the model given the data.
• ● k: The number of estimated parameters in the model.
• ● Interpretation: AIC balances the fit of the model and its complexity.
A lower AIC value indicates a better model. It effectively penalizes
models with more parameters, encouraging simplicity.
BIC (Bayesian Information Criterion):
• ● Purpose: BIC is similar to AIC but tends to penalize model complexity more
heavily. It is also used for model selection in various statistical contexts.
• ● Calculation: BIC is calculated as follows: BIC = -2 * log(Likelihood) + k * log(n)
Where:
• Likelihood: The likelihood of the model given the data.
• ● k: The number of estimated parameters in the model.
• ● n: The number of data points.
• ● Interpretation: BIC favors simpler models more strongly than AIC. A lower BIC
value indicates a better model. Compared to AIC, BIC is more conservative when
it comes to model selection, often resulting in a more parsimonious choice.
• The choice between Cp, AIC, and BIC depends on the specific context and goals of
your analysis. Cp is suitable for linear regression, while AIC and BIC are versatile
and widely used in various statistical modeling scenarios. BIC is the most
conservative in terms of model selection and favors simpler models the most,
while AIC strikes a balance between model fit and complexity.
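As a hedged sketch of how these criteria can be computed for a least-squares fit (synthetic data; the Gaussian log-likelihood below is one common convention, and conventions for counting the parameters k vary):

import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=n)   # y = b0 + b1*x + noise

X = np.column_stack([np.ones(n), x])                # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
sse = np.sum((y - X @ beta) ** 2)

k = X.shape[1] + 1    # b0, b1, and the error variance (one common convention)
log_lik = -0.5 * n * (np.log(2 * np.pi * sse / n) + 1)   # Gaussian likelihood at the MLE
aic = -2 * log_lik + 2 * k
bic = -2 * log_lik + k * np.log(n)
print(f"AIC = {aic:.1f}, BIC = {bic:.1f}")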
Bayesian Estimation of the Parameters of a Function
• We now discuss the case where we estimate the parameters, not of a distribution, but of some function of the input, for regression or classification. Again, our approach is to consider these parameters as random variables with a prior distribution and use Bayes' rule to calculate a posterior distribution.
• We can then either evaluate the full integral, approximate it, or use the MAP estimate.
Assessing Predictive Models
• Assessing predictive models is a critical step in the data science and machine
learning workflow. The goal is to understand how well a model performs and
whether it can generalize to unseen data. Here’s an overview of the process and
key metrics to assess predictive models:
1. Split the Data
• Training Set: Used to train the model.
• Validation Set: Used to fine-tune hyperparameters and avoid overfitting.
• Test Set: Used to evaluate the final performance of the model.
A common practice is to split the data into 70-80% for training and the remaining
20-30% for testing. In more advanced setups, cross-validation (e.g., k-fold cross-
validation) is used for more robust evaluation.
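A minimal sketch of this split-and-evaluate workflow (assuming scikit-learn; the bundled breast-cancer dataset and 70/30 split are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# 70/30 train-test split, as described above.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print("Held-out accuracy:", model.score(X_test, y_test))

# 5-fold cross-validation for a more robust estimate.
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=5)
print("Cross-validated accuracy:", scores.mean())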
• 2. Key Performance Metrics
• The choice of evaluation metric depends on the type of model (classification,
regression, etc.) and the specific problem at hand.
• Assessing predictive models involves evaluating how well a model performs on
both seen (training) data and unseen (testing or validation) data. This process
helps in identifying the model’s generalizability, predictive power, and any
potential areas of improvement. Here’s a structured approach to assessing
predictive models:
Batch Approach to Model Assessment
• The first approach to assessing model accuracy is a batch approach,
which means that all the records in the test or validation data are
used to compute the accuracy without regard to the order of
predictions in the data.
• A second approach based on a rank-ordered sorting of the predictions
will be considered next.
• Throughout this chapter, the target variable for binary classification
results will be shown as having 1s and 0s, although any two values
can be used without loss of generality, such as “Y” and “N,” or “good”
and “bad.”
Percent Correct Classification
• Percent correct classification is a basic metric used to evaluate the
performance of a classification model.
• It measures the proportion of correct predictions made by the model
out of the total predictions.
• However, while it provides a quick idea of a model's accuracy, it
might not always be sufficient on its own, especially in certain
scenarios like imbalanced datasets.
• Percent correct classification refers to the percentage of instances
(data points) that the model correctly classifies. It is often used to
measure the effectiveness of classification algorithms. The higher the
percentage, the better the model is at making correct predictions.
Example Calculation
Suppose you have a model that classifies 100 data points into two classes (Class A and
Class B). Out of these 100 data points, the model correctly classifies 85 of them.
To calculate percent correct:
• Percent Correct=(85/100)×100=85%
So, the model has 85% accuracy in its predictions.
Confusion Matrix and Percent Correct
To better understand percent correct, it's helpful to look at the confusion matrix. The
confusion matrix summarizes the performance of a classification algorithm and includes
the following terms:
• True Positive (TP): The number of instances correctly predicted as the positive class.
• True Negative (TN): The number of instances correctly predicted as the negative class.
• False Positive (FP): The number of instances incorrectly predicted as the positive class.
• False Negative (FN): The number of instances incorrectly predicted as the negative class.
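A small worked example tying percent correct to these four counts (the ten labels below are hypothetical):

from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, TN, FP, FN:", tp, tn, fp, fn)                      # 5, 3, 1, 1
print("Percent correct:", (tp + tn) / (tp + tn + fp + fn))    # 0.8
print("Same via sklearn:", accuracy_score(y_true, y_pred))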
Limitations of Percent Correct
While percent correct is easy to understand, it has several limitations:
• Imbalanced Datasets: In cases where there is a significant imbalance
between the number of instances in different classes, percent correct
may give a false impression of model performance. For example, in a
dataset with 95% instances of Class A and 5% instances of Class B, a
model that always predicts Class A could achieve 95% accuracy but
would perform poorly on Class B.
• For example, consider:
• Class A: 95 samples
• Class B: 5 samples
• If the model predicts every sample as Class A, it will be correct 95% of
the time, but it will have missed every instance of Class B, leading to
poor performance on the minority class.
When to Use Percent Correct
Percent correct can be useful when:
• The dataset is balanced (i.e., each class has a similar number of instances).
• A quick, overall evaluation of model performance is needed.
• The costs of misclassification are relatively uniform across classes.
• However, if the data is imbalanced or if specific class performance is
important, other metrics like precision, recall, or the F1-score should be
considered in addition to percent correct.
Summary:
• Percent correct classification is a simple and easy-to-understand metric
that gives the percentage of correct predictions made by a model. It’s
calculated as the ratio of correct predictions to the total number of
predictions. While useful for quick assessments, it has limitations in cases
of imbalanced datasets or when different types of misclassification carry
different costs. More comprehensive metrics like precision, recall, and F1-score are often used alongside percent correct for a deeper evaluation of a model's performance.
• ROC Curve (Receiver Operating Characteristic Curve) and AUC (Area
Under the Curve) are used to evaluate the performance of binary
classification models. These metrics are particularly useful when
dealing with imbalanced datasets or when you want to assess the
model across different thresholds.
ROC Curve (Receiver Operating Characteristic Curve)
The ROC Curve is a graphical representation that shows the
performance of a classification model at different classification
thresholds. The curve plots two metrics:
• True Positive Rate (TPR) or Sensitivity (y-axis)
• False Positive Rate (FPR) (x-axis)
AUC (Area Under the Curve)
The AUC (Area Under the Curve) is a scalar value that summarizes the
overall performance of the model, based on the ROC curve. It
represents the area under the ROC curve.
What does AUC represent?
• AUC = 0.5: The model performs no better than random guessing. The
ROC curve would be a diagonal line from (0,0) to (1,1).
• AUC = 1: The model perfectly classifies all the positive and negative
instances, with no errors (perfect separation).
• 0.5 < AUC < 1: The model performs better than random guessing,
with higher values indicating better performance.
• AUC < 0.5: The model is performing worse than random guessing, which may indicate that it has learned the wrong decision boundary (or that it is simply a bad model).
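A minimal sketch computing the ROC curve and AUC with scikit-learn (the labels and scores below are hypothetical):

import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC:", roc_auc_score(y_true, y_score))

# Each (FPR, TPR) pair is one point on the ROC curve, one per threshold.
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")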
Rank-Ordered Approach to Model Assessment
• In contrast to batch approaches to computing model accuracy, rank-
ordered metrics begin by sorting the numeric output of the predictive
model, either the probability or confidence of a classification model
or the actual predicted output of a regression model.
• The rank-ordered predictions are binned into segments and summary
statistics related to the accuracy of the model are computed either
individually for each segment, or cumulatively as you traverse the
sorted file list.
• The three most common rank-ordered error metrics are gains charts,
lift charts, and ROI charts. In each of these charts, the x-axis is the
percent depth of the rank-ordered list of probabilities, and the y-axis
is the gain, lift, or ROI produced by the model at that depth.
• In the context of rank-ordered error metrics for evaluating machine
learning models, particularly in classification tasks, Gains Charts, Lift
Charts, and Return on Investment (ROI) Charts are often used to
assess model performance, especially when dealing with imbalanced
datasets. These charts are used primarily in the fields of marketing,
finance, and customer analytics to evaluate how well a model is
identifying the most relevant or profitable segments of data.
1. Gains Chart
• A Gains Chart is used to measure the effectiveness of a classification
model in identifying the target class (e.g., "churned customers",
"purchased product"). It compares the cumulative percentage of true
positive cases (or correct predictions) identified by the model at
various thresholds to a random classifier (which would select cases
randomly).
How a Gains Chart Works:
• X-Axis: Represents the percentage of the total data, ranked from the most
likely to the least likely to belong to the target class. This is typically
referred to as the decile or percentile.
• Y-Axis: Represents the cumulative percentage of actual positives or true positive cases (e.g., customers who actually churned).
• The chart plots two curves:
• Model curve: Cumulative number of true positives identified by the model at
various cutoffs (percentiles).
• Random curve: This represents a random classifier's performance, where the
percentage of true positives found is directly proportional to the percentage of data
sampled.
• Interpretation:
• A perfect model will have a steep initial slope, meaning it quickly identifies
the majority of the positive class with the smallest percentage of samples.
It should outperform the random classifier.
• The larger the area between the model curve and the random curve, the
better the model is at identifying the positive class efficiently.
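A sketch of the gains computation on synthetic data (the scores are artificially correlated with the labels purely for illustration):

import numpy as np

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=1000)        # hypothetical 0/1 labels
y_score = y_true * 0.3 + rng.random(1000)     # scores loosely correlated with labels

sorted_y = y_true[np.argsort(-y_score)]       # rank-order by descending score
cum_gains = np.cumsum(sorted_y) / sorted_y.sum()

for decile in range(1, 11):
    depth = int(len(sorted_y) * decile / 10)
    print(f"top {decile*10:3d}% of list -> {cum_gains[depth - 1]:.0%} of positives")
# The random-classifier baseline at depth d% is simply d%.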
Lift Chart
A Lift Chart is similar to the Gains Chart but focuses on how much
better the model is at identifying the target class over random
guessing, in a more quantifiable way.
How a Lift Chart Works:
• X-Axis: Again, represents the percentage of the population (samples)
ranked by their likelihood of being part of the target class (e.g.,
likelihood of churn).
• Y-Axis: Represents the lift—which is the ratio of the model’s
performance (true positive rate) to that of random guessing.
• Lift > 1 indicates that the model is effectively identifying more
positive cases than random guessing.
• Lift = 1 indicates that the model's performance is similar to random guessing.
• Lift charts can provide an easily interpretable view of model
performance, particularly in marketing or customer analytics, where
identifying the right segment of customers (e.g., the top 10% who are
most likely to churn) is crucial.
Example:
• In a direct mail campaign, a Lift Chart would show how much more
effective a model is at identifying customers who will respond
positively to an offer compared to a random selection.
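A sketch of the lift computation, reusing the rank-ordering idea from the gains sketch (synthetic data again):

import numpy as np

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=1000)
y_score = y_true * 0.3 + rng.random(1000)

sorted_y = y_true[np.argsort(-y_score)]
cum_gains = np.cumsum(sorted_y) / sorted_y.sum()

for decile in range(1, 11):
    depth_frac = decile / 10
    idx = int(len(sorted_y) * depth_frac) - 1
    lift = cum_gains[idx] / depth_frac          # lift = gains / depth
    print(f"top {decile*10:3d}%: lift = {lift:.2f}")   # lift > 1 beats random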
Return on Investment (ROI) Chart
• A Return on Investment (ROI) Chart measures the financial return or
benefit from a model’s predictions. The ROI typically focuses on how
much profit or value the model generates relative to the costs of
applying the model, especially in business settings like marketing,
finance, or sales.
How a ROI Chart Works:
• X-Axis: Represents the percentage of the population (samples) ranked
by their likelihood of being in the target class.
• Y-Axis: Represents the ROI—the cumulative financial gain or profit
achieved by applying the model’s predictions.
• The ROI is calculated by comparing the cost of targeting certain
customers or segments with the financial return from successfully
identifying those customers.
True Positives (TP): Customers who were predicted correctly to take an action (e.g.,
purchase, churn).
Cost per Action (C): The cost associated with targeting or acting on that prediction (e.g.,
cost of sending a promotional email).
Profit per Action (P): The profit made from a correct prediction (e.g., profit from
customers who purchase after receiving a promotion).
The ROI chart shows how much profit or benefit you can expect at each decile of the
population based on model predictions.
Example:
For a marketing campaign aimed at promoting a new product, the ROI chart could show
how much more profit could be made by focusing on the top decile (top 10% of
customers most likely to buy) compared to random targeting.
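A sketch of a cumulative ROI calculation under assumed economics (the cost C and profit P per action are hypothetical, as is the synthetic data):

import numpy as np

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=1000)
y_score = y_true * 0.3 + rng.random(1000)
sorted_y = y_true[np.argsort(-y_score)]       # rank-order by descending score

C, P = 1.0, 5.0                               # assumed cost and profit per action
cum_profit = np.cumsum(sorted_y) * P          # profit from true positives found so far
cum_cost = np.arange(1, len(sorted_y) + 1) * C
roi = (cum_profit - cum_cost) / cum_cost

for decile in range(1, 11):
    idx = int(len(sorted_y) * decile / 10) - 1
    print(f"top {decile*10:3d}%: ROI = {roi[idx]:+.0%}")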
Assessing Regression Models
• Metrics for assessing regression models are batch methods computed for the data partition you are using to assess the models (training, testing, validation, and so on). The most commonly used metric is the coefficient of determination, known as R², pronounced "r squared." R² measures the percentage of the variance of the target variable that is explained by the model.
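A minimal sketch computing R² both by hand and with scikit-learn (the five target values and predictions are hypothetical):

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.5, 9.0, 11.0])
y_pred = np.array([2.8, 5.3, 7.0, 9.4, 10.6])

# R^2 = 1 - SSE/SST: the share of target variance the model explains.
sse = np.sum((y_true - y_pred) ** 2)
sst = np.sum((y_true - y_true.mean()) ** 2)
print("R^2 by hand:", 1 - sse / sst)
print("R^2 via sklearn:", r2_score(y_true, y_pred))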