KNN Regression
KNN regression is a non-parametric method for predicting a continuous target value for a
data point based on the values of its k-nearest neighbors in the feature space. Unlike
parametric methods, which assume a specific functional form for the relationship between
the features and the target variable, KNN regression makes no such assumption; it relies
directly on the proximity of data points.
1. Choose the Value of 'k':
• The first crucial step is to select the value of 'k', which represents the number of
nearest neighbors to consider when making a prediction.
• The choice of 'k' significantly impacts the model's performance.
o Small k (e.g., k=1): The prediction is highly influenced by the single closest
neighbor. This can lead to noisy predictions that are sensitive to local data
variations and outliers (high variance, low bias).
o Large k (e.g., k close to the number of data points): The prediction tends to be
smoother as it averages over a larger neighborhood. This can smooth out noise but
might also average out important local patterns, potentially leading to underfitting
(low variance, high bias).
• The optimal 'k' is usually found through experimentation and techniques such as
cross-validation, as in the sketch below.
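For illustration, here is a minimal sketch of tuning 'k' by cross-validation with scikit-learn's KNeighborsRegressor and GridSearchCV. The synthetic data and the 1–20 search range are assumptions made for this example.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic 1-D regression data (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)

# 5-fold cross-validation over candidate values of k.
search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={"n_neighbors": range(1, 21)},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print("Best k:", search.best_params_["n_neighbors"])
```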
2. Calculate Distances:
• When you have a new data point for which you want to make a prediction, the KNN
algorithm calculates the distance between this new point and all the data points in
your training set.
• Common distance metrics include:
o Euclidean distance: the straight-line distance, √(Σ(xᵢ - yᵢ)²).
o Manhattan distance: the sum of absolute coordinate differences, Σ|xᵢ - yᵢ|.
o Minkowski distance: a generalization of both, with a tunable order parameter p.
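A small NumPy sketch of the first two metrics (the sample vectors are assumptions made for illustration):

```python
import numpy as np

# Two example feature vectors (assumed for illustration).
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))  # straight-line distance
manhattan = np.sum(np.abs(a - b))          # sum of absolute differences

print(f"Euclidean: {euclidean:.3f}, Manhattan: {manhattan:.3f}")
```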
3. Identify the k-Nearest Neighbors:
• After calculating the distances to all training data points, the algorithm identifies the
'k' training points with the smallest distances to the new data point. These are the
k-nearest neighbors.
4. Aggregate the Neighbor Values:
• Once the k-nearest neighbors are identified, their corresponding target (dependent
variable) values are used to make a prediction for the new data point.
• The standard aggregation method for regression is:
o Simple Averaging: the predicted value is the average of the target values of
the k-nearest neighbors (see the sketch below).
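Putting the four steps together, here is a minimal from-scratch sketch of KNN regression. The function name knn_regress and the choice of Euclidean distance are assumptions made for this example.

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k):
    """Predict a continuous value for x_query by averaging the targets
    of its k nearest training points (Euclidean distance)."""
    # Step 2: distance from the query point to every training point.
    dists = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
    # Step 3: indices of the k smallest distances.
    nearest = np.argsort(dists)[:k]
    # Step 4: simple averaging of the neighbors' target values.
    return y_train[nearest].mean()
```

In practice, scikit-learn's KNeighborsRegressor implements the same procedure with efficient neighbor search.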
Problem: Predict the efficiency at 135°C using the K-Nearest Neighbors (KNN)
algorithm.
Data:
Temperature (°C)    Efficiency
50                  0.4
100                 0.6
150                 0.55
200                 0.7
Steps:
1. Choose a value for K:
o K represents the number of nearest neighbors we'll consider. Let's start with K
= 2 for this example.
2. Calculate the distance between the query point (135°C) and each data
point:
o For one-dimensional data, the Euclidean distance reduces to the absolute
difference:
▪ Distance = |x₁ - x₂|
o The distances are:
▪ |135 - 50| = 85
▪ |135 - 100| = 35
▪ |135 - 150| = 15
▪ |135 - 200| = 65
3. Identify the K = 2 nearest neighbors:
o The two smallest distances are 15 and 35, so the nearest neighbors are
150°C (efficiency 0.55) and 100°C (efficiency 0.6).
4. Average the target values of the neighbors:
o Predicted efficiency = (0.55 + 0.6) / 2 = 0.575
Result:
• Using KNN with K = 2, the predicted efficiency at 135°C is 0.575.
• If we had chosen K = 1, the nearest neighbor would be 150°C, and the predicted
efficiency would be 0.55.
• If we had chosen K = 3, the three nearest neighbors would be 100°C, 150°C,
and 200°C. The predicted efficiency would be (0.6 + 0.55 + 0.7) / 3 = 0.6167.
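As a quick check, a short scikit-learn sketch reproduces all three predictions (the variable names are illustrative):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Temperature (°C) vs. efficiency data from the worked example.
X = np.array([[50.0], [100.0], [150.0], [200.0]])
y = np.array([0.4, 0.6, 0.55, 0.7])

for k in (1, 2, 3):
    model = KNeighborsRegressor(n_neighbors=k).fit(X, y)
    pred = model.predict([[135.0]])[0]
    print(f"K = {k}: predicted efficiency = {pred:.4f}")
# Prints 0.5500, 0.5750, and 0.6167, matching the hand calculation.
```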