Wine
March 9, 2025
1 3.20 0.68 9.8 Italy 5
2 3.26 0.65 9.8 Italy 5
3 3.16 0.58 9.8 Spain 6
4 3.51 0.56 9.4 UK 5
(1506, 15)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1506 entries, 0 to 1505
Data columns (total 15 columns):
id 1506 non-null int64
fixed acidity 1504 non-null float64
volatile acidity 1504 non-null float64
citric acid 1504 non-null float64
residual sugar 1504 non-null float64
flavonoids 1504 non-null float64
chlorides 1504 non-null float64
free sulfur dioxide 1504 non-null float64
total sulfur dioxide 1504 non-null float64
density 1504 non-null float64
pH 1504 non-null float64
sulphates 1504 non-null float64
alcohol 1504 non-null float64
country 1504 non-null object
quality 1506 non-null int64
dtypes: float64(12), int64(2), object(1)
memory usage: 170.6+ KB
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1504 entries, 0 to 1505
Data columns (total 15 columns):
id 1504 non-null int64
fixed acidity 1504 non-null float64
volatile acidity 1504 non-null float64
citric acid 1504 non-null float64
residual sugar 1504 non-null float64
flavonoids 1504 non-null float64
chlorides 1504 non-null float64
free sulfur dioxide 1504 non-null float64
total sulfur dioxide 1504 non-null float64
density 1504 non-null float64
pH 1504 non-null float64
sulphates 1504 non-null float64
alcohol 1504 non-null float64
country 1504 non-null object
quality 1504 non-null int64
dtypes: float64(12), int64(2), object(1)
memory usage: 182.1+ KB
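The row count falls from 1506 to 1504 between the two info() listings, and every column is fully populated in the second one, which is consistent with dropping the rows that contain missing values. The cell that did this is not shown, so the following is only a minimal sketch, assuming DataFrame.dropna() was used; the small frame `demo` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the wine table: two rows carry missing values.
demo = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "alcohol": [9.8, np.nan, 9.8, 9.4],
    "country": ["Italy", "Italy", None, "UK"],
    "quality": [5, 5, 6, 5],
})

print(demo.isnull().sum())  # missing values per column

# Drop every row that has at least one missing value.
# The original row labels are kept, so the index is no longer a RangeIndex.
clean = demo.dropna()
print(clean.shape)
print(list(clean.index))
```

Keeping the original labels would explain why the second listing reports "1504 entries, 0 to 1505"; calling reset_index(drop=True) afterwards would renumber the rows if that is preferred.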
[459]: wine.describe()
alcohol quality
count 1504.000000 1504.000000
mean 10.427238 5.635638
std 1.434245 0.815816
min -1.000000 3.000000
25% 9.500000 5.000000
50% 10.100000 6.000000
75% 11.100000 6.000000
max 45.300000 8.000000
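The summary reports a minimum alcohol value of -1 and a maximum of 45.3, neither of which is plausible for wine, so these rows probably deserve a look before modelling. A minimal sketch of flagging them follows; the bounds `LOW` and `HIGH` are assumptions chosen for illustration, not values taken from the notebook, and `demo` is made-up data.

```python
import pandas as pd

# Made-up alcohol readings, including the two extremes seen in describe().
demo = pd.DataFrame({"alcohol": [9.8, -1.0, 10.1, 45.3, 11.1]})

LOW, HIGH = 5.0, 20.0  # assumed plausible range in % vol; adjust to domain knowledge

in_range = demo["alcohol"].between(LOW, HIGH)  # inclusive on both ends
print(demo[~in_range])          # rows to inspect by hand
filtered = demo[in_range]       # rows kept for further analysis
print(filtered.describe())
```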
[461]: # plotting the quality of wine using a countplot
fig = plt.figure(figsize = (8,4))
sns.set_style("whitegrid")
sns.countplot(x='quality', data=wine, palette='plasma')
plt.title("QUALITY OF WINE", size=18)
[463]: # plotting the quality of wine against citric acid using a barplot
sns.set_style("whitegrid")
sns.barplot(x='quality', y='citric acid', data=wine, palette="Spectral")
plt.title('Quality vs Citric Acid', size=20)
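By default sns.barplot draws the mean of the y variable for each x category, with an error bar around it, so the bar heights in this and the following plots can also be read off as a table. A minimal sketch with made-up numbers (the frame `demo` is hypothetical):

```python
import pandas as pd

# Made-up sample standing in for the wine table.
demo = pd.DataFrame({
    "quality": [5, 5, 6, 6, 7],
    "citric acid": [0.10, 0.30, 0.20, 0.40, 0.50],
})

# Mean citric acid per quality score: the height of each bar.
bar_heights = demo.groupby("quality")["citric acid"].mean()
print(bar_heights)
```

The same groupby call with a different column name gives the numbers behind each of the later barplots.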
[464]: # plotting the quality of wine against residual sugar using a barplot
sns.set_style("whitegrid")
sns.barplot(x='quality', y='residual sugar', data=wine, palette="Spectral")
plt.title('Quality vs Residual Sugar', size=20)
[465]: # plotting the quality of wine against flavonoids using a barplot
sns.set_style("whitegrid")
sns.barplot(x='quality', y='flavonoids', data=wine, palette="Spectral")
plt.title('Quality vs Flavonoids', size=20)
[466]: # plotting the quality of wine against chlorides using a barplot
sns.set_style("whitegrid")
sns.barplot(x='quality', y='chlorides', data=wine, palette='Spectral')
plt.title('Quality vs Chlorides', size=20)
[467]: # plotting the quality of wine against free sulphur dioxide using a barplot
sns.set_style("whitegrid")
sns.barplot(x='quality', y='free sulfur dioxide', data=wine, palette='Spectral')
plt.title('Quality vs Free Sulphur Dioxide', size=20)
[468]: # plotting the quality of wine against total sulphur dioxide using a barplot
sns.set_style("whitegrid")
sns.barplot(x='quality', y='total sulfur dioxide', data=wine, palette='Spectral')
plt.title('Quality vs Total Sulphur Dioxide', size=20)
[469]: # plotting the quality of wine against the density using a barplot
sns.set_style("whitegrid")
sns.barplot(x='quality', y='density', data=wine, palette='Spectral')
plt.title('Quality vs Density', size=20)
[470]: # plotting the quality of wine against pH level using a barplot
sns.set_style("whitegrid")
sns.barplot(x='quality', y='pH', data=wine, palette='Spectral')
plt.title('Quality vs pH Level', size=20)
[471]: # plotting the quality of wine against Sulphates using a barplot
sns.set_style("whitegrid")
sns.barplot(x='quality', y='sulphates', data=wine, palette='Spectral')
plt.title('Quality vs Sulphates', size=20)
[472]: # plotting the quality of wine against the alcohol content using a barplot
sns.set_style("whitegrid")
sns.barplot(x='quality', y='alcohol', data=wine, palette='Spectral')
plt.title('Quality vs Alcohol', size=20)
5. Preprocessing Data for Building Machine Learning Algorithm
[473]: # Creating a binary classification for the prediction variable:
# classifying wine as good or bad by setting a limit on the quality score
bins = (2, 6.5, 8)
review = ['bad', 'good']
wine['quality'] = pd.cut(wine['quality'], bins = bins, labels = review)
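With bins = (2, 6.5, 8) and the default right=True, pd.cut forms the intervals (2, 6.5] and (6.5, 8], so scores 3 to 6 become 'bad' and scores 7 and 8 become 'good'. A small sketch on a made-up series of scores:

```python
import pandas as pd

# One example of every quality score seen in the data (3 to 8).
scores = pd.Series([3, 4, 5, 6, 7, 8])

# Same bins and labels as in the cell above.
labels = pd.cut(scores, bins=(2, 6.5, 8), labels=["bad", "good"])
print(pd.DataFrame({"score": scores, "label": labels}))
```

Any score outside (2, 8] would come back as NaN, so it is worth checking the result for missing labels when the bins are changed.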
[475]: # replacing the categorical variable with integers
country_to_nums = {'country': {'UK': 1, 'Italy': 2, 'Spain':3}}
wine.replace(country_to_nums, inplace=True)
wine.head()
4 3.51 0.56 9.4 1 0
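The last two columns of the row above show both country and quality as integers. The cell that encoded quality is not shown; one way to obtain 0 for 'bad' and 1 for 'good' is to take the categorical codes, sketched below as an assumption, next to a map-based equivalent of the country replacement. The frame `demo` is made up.

```python
import pandas as pd

demo = pd.DataFrame({
    "country": ["Italy", "Italy", "Spain", "UK"],
    # pd.cut returns an ordered categorical like this one.
    "quality": pd.Categorical(["bad", "bad", "good", "bad"],
                              categories=["bad", "good"], ordered=True),
})

country_codes = {"UK": 1, "Italy": 2, "Spain": 3}   # mapping used in the notebook
demo["country"] = demo["country"].map(country_codes)

# Categorical codes follow the category order: 'bad' -> 0, 'good' -> 1.
demo["quality"] = demo["quality"].cat.codes
print(demo)
```

Unlike replace, map turns any country missing from the dictionary into NaN, which makes unexpected values easy to spot. Integer codes also impose an arbitrary order on the countries; one-hot encoding with pd.get_dummies is a common alternative when that matters.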
[551]: # from the plot above, we can see that 9 principal components account for 90% of the variation in the data.
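The cells that ran the PCA are not included here. As a hedged sketch of how a component count like this can be chosen, the code below standardises the features, fits a full PCA and picks the smallest number of components whose cumulative explained variance reaches 90%. The data is synthetic and the 0.90 threshold is the only value taken from the comment above; the use of StandardScaler is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
# Made-up data: 200 samples, 12 correlated features built from 4 latent factors.
latent = rng.normal(size=(200, 4))
X = latent @ rng.normal(size=(4, 12)) + 0.3 * rng.normal(size=(200, 12))

# PCA is sensitive to scale, so standardise first.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA().fit(X_scaled)

cumulative = np.cumsum(pca.explained_variance_ratio_)
# Smallest number of components whose cumulative share reaches 90%.
n_components = int(np.argmax(cumulative >= 0.90) + 1)
print(n_components, cumulative[n_components - 1])

# Project onto that many components.
X_reduced = PCA(n_components=n_components).fit_transform(X_scaled)
print(X_reduced.shape)
```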
[553]: print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)
(1203, 9)
(301, 9)
(1203,)
(301,)
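These shapes are what an 80/20 split of 1504 rows produces: the test share is rounded up to 301 rows and the remaining 1203 go to training. The split cell is not shown, so this is a sketch assuming train_test_split with test_size=0.2; the random_state value and the placeholder arrays are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.zeros((1504, 9))   # placeholder features: 1504 rows, 9 principal components
y = np.zeros(1504)        # placeholder labels

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # both values are assumptions
)
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
```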
c:\users\gautham\python\python36-32\lib\site-packages\sklearn\linear_model\logistic.py:433: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
FutureWarning)
[[261 10]
[ 18 12]]
90.69767441860465
[[245 26]
[ 14 16]]
86.71096345514951
c:\users\gautham\python\python36-32\lib\site-packages\sklearn\ensemble\forest.py:246: FutureWarning: The default value of n_estimators will change from 10 in version 0.20 to 100 in 0.22.
"10 in version 0.20 to 100 in 0.22.", FutureWarning)
[[264 7]
[ 17 13]]
92.02657807308971
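Both FutureWarnings above come from relying on defaults that scikit-learn was about to change; passing solver and n_estimators explicitly silences them and keeps results stable across versions. The modelling cells are not shown, so the following is a sketch under assumptions: the data is synthetic (make_classification), and the hyperparameters and random_state values are illustrative, not the notebook's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 9 principal components and an imbalanced bad/good label.
X, y = make_classification(n_samples=500, n_features=9, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

models = {
    # solver given explicitly, so no "Default solver will be changed" warning
    "Logistic Regression": LogisticRegression(solver="lbfgs", max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    # n_estimators given explicitly, so no "default value of n_estimators" warning
    "Random Forest Classifier": RandomForestClassifier(n_estimators=100,
                                                       random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    pred = model.predict(x_test)
    results[name] = accuracy_score(y_test, pred)
    print(name)
    print(confusion_matrix(y_test, pred))
    print(results[name] * 100)
```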
acScore = pd.DataFrame()
acScore['Model'] = ['Logistic Regression', 'Decision Tree', 'Random Forest Classifier']
ac1 = lr_acc_score*100
ac2 = dt_acc_score*100
ac3 = rf_acc_score*100
acScore['Score'] = [ac1,ac2,ac3]
acScore
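The three scores collected here can be checked by hand from the confusion matrices printed earlier: accuracy is the sum of the diagonal divided by the total, for example (261 + 12) / 301 = 90.70%. The sketch below redoes that arithmetic and also reports recall for the second class, assuming scikit-learn's convention that rows are true classes and columns are predictions; the frame name `check` and the recall column are additions for illustration.

```python
import numpy as np
import pandas as pd

# Confusion matrices copied from the outputs above
# (assumed layout: rows = true class, columns = predicted class).
matrices = {
    "Logistic Regression": np.array([[261, 10], [18, 12]]),
    "Decision Tree": np.array([[245, 26], [14, 16]]),
    "Random Forest Classifier": np.array([[264, 7], [17, 13]]),
}

rows = []
for name, cm in matrices.items():
    accuracy = np.trace(cm) / cm.sum() * 100        # correct predictions / all predictions
    recall_second = cm[1, 1] / cm[1].sum() * 100    # share of the second class that was found
    rows.append({"Model": name, "Score": accuracy,
                 "Recall (2nd class)": recall_second})

check = pd.DataFrame(rows)
print(check)
```

Under that layout only 30 of the 301 test rows belong to the second class, so a model can reach about 90% accuracy while finding fewer than half of them; accuracy alone is a weak basis for ranking the three models on data this imbalanced.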
fontweight='bold',
size=20)