AI LOGISTIC REGRESSION
SHAP
Additivity: The SHAP values (feature contributions) for a sample, added to the base value (the model’s average prediction), sum to the model’s prediction for that sample.
SHAP systematically evaluates the impact of a feature by observing how the prediction changes
when that feature is included or excluded across all possible combinations of features.
SHAP helps explain black-box models such as Random Forest, Gradient Boosting, or Neural
Networks.
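The idea of including and excluding a feature across all combinations can be sketched in a few lines. This is a minimal illustration, not the SHAP library: the toy model, the sample, and the all-zero baseline are made up, and treating an "excluded" feature as set to its baseline value is an assumption made for this example.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating every feature subset.

    Assumption: an excluded feature is replaced by its baseline value.
    """
    n = len(x)
    features = range(n)

    def value(subset):
        # Real value for included features, baseline value for excluded ones.
        mixed = [x[i] if i in subset else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                # Shapley weight for a coalition of this size.
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi


# Hypothetical toy model with an interaction term (not the wine model).
def toy_model(v):
    return 2.0 * v[0] - 1.0 * v[1] + 0.5 * v[0] * v[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
base_value = toy_model(baseline)
prediction = toy_model(x)

# Additivity: base value plus all contributions reproduces the prediction.
assert abs(base_value + sum(phi) - prediction) < 1e-9
```

Enumerating every subset grows exponentially with the number of features, which is why practical tools approximate this computation or exploit model structure.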
Logistic Regression is a linear model often used for binary or multi-class classification tasks. It
works by estimating the probability of a sample belonging to a class using a logistic function.
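As a rough sketch of the logistic function at work in the binary case, the snippet below turns a weighted sum of features into a probability. The weights, bias, and feature values are hypothetical and are not taken from the wine model. Multi-class logistic regression typically generalizes this with a softmax over one score per class.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability of the positive class for one sample."""
    # Linear score: bias plus the weighted sum of the features.
    z = bias + sum(w * f for w, f in zip(weights, features))
    return sigmoid(z)


# Hypothetical weights and sample, chosen only for illustration.
weights = [0.8, -1.2]
bias = 0.1
p = predict_proba(weights, bias, [1.0, 0.5])
print(round(p, 3))
```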
Accuracy: The overall accuracy is 55.69%, indicating that the model correctly predicts the wine
quality in about 56% of cases.
Precision and Recall: With a precision of 52.67% and recall of 55.69%, the model performs
moderately well in identifying true positives for certain wine quality classes, specifically in the
mid-range quality categories (5 and 6).
F1-Score: At 52.34%, the F1-score highlights a balance between precision and recall, but the
moderate value reflects challenges in handling the complexity of the dataset.
AUC (Area Under the Curve): The AUC is 0.9135, which is surprisingly high next to the accuracy. AUC
rewards ranking the correct class above the others, so this suggests the model separates quality
classes well across different thresholds even when its top prediction is wrong.
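The reported recall (55.69%) is identical to the accuracy, which is what happens when per-class recall is averaged with weights equal to each class's share of the samples. Assuming that weighted averaging was used (the notes do not say), the metrics can be sketched as follows. The labels below are made up and are not the wine results.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))

    accuracy = sum(t == p for t, p in pairs) / total
    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        prec = tp / predicted if predicted else 0.0
        rec = tp / actual if actual else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = actual / total  # class share of the true labels
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    return accuracy, precision, recall, f1


# Made-up quality labels for illustration only.
y_true = [5, 5, 5, 6, 6, 6, 7, 4]
y_pred = [5, 5, 6, 6, 6, 5, 6, 5]
acc, prec, rec, f1 = weighted_metrics(y_true, y_pred)

# With support weighting, recall always equals accuracy.
assert abs(rec - acc) < 1e-12
```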
● X-axis (Features or Predictions): The graph likely represents the model’s predictions
or the relationship between specific features (like alcohol or density) and their
contribution to classifying wine quality.
● Y-axis (Probabilities or Coefficients): This axis represents either the probability
distribution for predicted classes or the impact of feature coefficients on predictions.
● Feature Distribution: You can observe that the Logistic Regression model primarily
focuses on mid-range quality classes, such as 5 and 6. These are the most prevalent in
the dataset, which explains the model’s tendency to perform better for these categories.
Logistic Regression’s linear decision boundaries limit its ability to capture complex, non-linear
relationships in the data. This is evident in its struggles to predict outlier classes, such as very
low or very high wine quality.
Alcohol: Likely has a strong positive correlation with higher wine quality, as supported by SHAP
values on Slide 17.
Density and Volatile Acidity: Show negative contributions, meaning higher values for these
features might predict lower wine quality.
Strengths: Logistic Regression offers clear insights into the most critical factors affecting wine
quality, which is valuable for winemakers or researchers aiming to understand these
relationships.
Limitations: The linear approach limits its predictive power for complex datasets with intricate
feature interactions, as seen in its lower accuracy compared to models like Random Forest.
Conclusion
“In summary, the Logistic Regression model provides a straightforward and interpretable
approach to predicting wine quality. While it lacks the complexity needed to achieve high
accuracy, its clarity makes it an excellent baseline model. By analyzing its coefficients and
graphs, we gain valuable insights into how features like alcohol and acidity influence
predictions, guiding both winemaking and further model development.”
The graph illustrates the feature importance in the Logistic Regression model, highlighting how
each physicochemical property influences wine quality predictions. Key features like alcohol,
density, and volatile acidity dominate, with alcohol having the most positive impact on
predicting higher quality wines. Conversely, density and volatile acidity contribute negatively,
suggesting higher values of these features correlate with lower wine quality.
Definition: AUC measures the ability of a model to distinguish between classes across all
possible thresholds. It is derived from the ROC (Receiver Operating Characteristic) Curve,
which plots the True Positive Rate (Recall) against the False Positive Rate (1 - Specificity).
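For a binary problem, the area under the ROC curve equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes AUC that way on made-up scores. For the multi-class wine task, binary scores like this would have to be averaged (for example one-vs-rest); the notes do not state which averaging was used.

```python
def roc_auc(y_true, scores):
    """Binary AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and scores for illustration.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```

Because AUC depends only on how samples are ranked, a model can have a high AUC and still a modest accuracy at any single threshold.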
A linear model refers to a type of model in which the relationship between the input features
(independent variables) and the output (dependent variable) is expressed as a linear
combination of the input features. In simple terms, it assumes that the target value can be
predicted as a weighted sum of the input features.
A threshold is a value used to decide the classification or output of a model, particularly in binary
or multi-class classification tasks. It acts as a cutoff point to determine how a prediction is
interpreted.
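A small sketch of how a cutoff turns probabilities into class labels in the binary case. The probabilities are made up and 0.5 is used only as a common default. In multi-class settings the usual rule is instead to pick the class with the highest probability.

```python
def classify(probabilities, threshold=0.5):
    """Label a sample 1 when its probability reaches the threshold, else 0."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Made-up predicted probabilities for four samples.
probs = [0.2, 0.55, 0.7, 0.9]

print(classify(probs))                 # [0, 1, 1, 1]
print(classify(probs, threshold=0.8))  # [0, 0, 0, 1]
```

Raising the threshold makes the model more conservative about predicting the positive class, which usually trades recall for precision.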
The X-axis represents the coefficients of the logistic regression model. These coefficients
measure the impact of each feature on the model’s predictions:
● Positive values: Features with positive coefficients increase the predicted wine quality
when their values increase.
● Negative values: Features with negative coefficients decrease the predicted wine
quality when their values increase.
● Magnitude: The larger the absolute value of a coefficient (regardless of sign), the more
influential the feature is.
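The reading rules above can be sketched as a ranking by absolute coefficient. The coefficient values below are hypothetical and are not read off the plotted model. Comparing magnitudes this way assumes the features were put on a common scale (for example standardized) before fitting.

```python
# Hypothetical coefficients for illustration only, not the fitted model's values.
coefficients = {
    "alcohol": 0.9,
    "density": -0.6,
    "volatile acidity": -0.7,
    "pH": 0.1,
}


def rank_by_influence(coefs):
    """Sort features by absolute coefficient and note the direction of each."""
    ranked = sorted(coefs.items(), key=lambda item: abs(item[1]), reverse=True)
    return [
        (name, value, "raises" if value > 0 else "lowers")
        for name, value in ranked
    ]


ranking = rank_by_influence(coefficients)
for name, value, direction in ranking:
    print(f"{name}: {value:+.2f} ({direction} predicted quality as it increases)")
```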