se python_merged (1) (1) (1)
ANOVA-ANALYSIS OF
VARIANCE
19/06/2024, 23:30 Untitled2.ipynb - Colab
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import skew, kurtosis
import statsmodels.api as sm
%matplotlib inline
highway-mpg price
0 27 13495
1 27 16500
2 26 16500
3 30 13950
4 22 17450
[5 rows x 26 columns]
https://colab.research.google.com/drive/18pq4PjPIeWmnKS0QDggj9qxZf2rTNa8v#scrollTo=sNHmyAXoc1Hv&printMode=true 1/6
price
count 205.000000
mean 13207.129353
std 7868.768212
min 5118.000000
25% 7788.000000
50% 10595.000000
75% 16500.000000
max 45400.000000
Skewness:
symboling 0.211072
normalized-losses 0.854802
wheel-base 1.050214
length 0.155954
width 0.904003
height 0.063123
curb-weight 0.681398
engine-size 1.947655
bore 0.020211
stroke -0.689784
compression-ratio 2.610862
horsepower 1.397763
peak-rpm 0.073591
city-mpg 0.663704
highway-mpg 0.539997
price 1.827324
dtype: float64
Kurtosis:
symboling -0.676271
normalized-losses 1.404644
wheel-base 1.017039
length -0.082895
width 0.702764
height -0.443812
curb-weight -0.042854
engine-size 5.305682
bore -0.785040
stroke 2.174471
compression-ratio 5.233054
horsepower 2.678182
peak-rpm 0.086770
city-mpg 0.578648
highway-mpg 0.440070
price 3.354216
dtype: float64
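The cell that produced the skewness and kurtosis tables is not shown. As a minimal sketch, assuming the values came from either pandas or `scipy.stats`, the snippet below computes both measures for the Honda prices printed later in this record. It shows that the two libraries agree only when scipy's bias correction is switched on. The variable names are illustrative, not taken from the notebook.

```python
import pandas as pd
from scipy.stats import skew, kurtosis

# Honda prices as printed by get_group('honda') in the ANOVA section
honda_prices = pd.Series(
    [6479, 6855, 5399, 6529, 7129, 7295, 7295,
     7895, 9095, 8845, 10295, 12945, 10345],
    name="price", dtype="float64",
)

# pandas applies a small-sample bias correction by default
pandas_skew = honda_prices.skew()
pandas_kurt = honda_prices.kurt()  # excess kurtosis (normal distribution = 0)

# scipy defaults to the biased (population) estimators;
# bias=False switches to the corrected formulas that pandas uses
scipy_skew_biased = skew(honda_prices)
scipy_skew = skew(honda_prices, bias=False)
scipy_kurt = kurtosis(honda_prices, fisher=True, bias=False)

print("pandas skew:", pandas_skew, "| scipy skew (bias=False):", scipy_skew)
print("pandas kurt:", pandas_kurt, "| scipy kurt (bias=False):", scipy_kurt)
print("scipy skew with default bias=True:", scipy_skew_biased)
```

A positive skewness, as for `price` and `engine-size` above, indicates a longer right tail; a positive excess kurtosis indicates heavier tails than a normal distribution.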
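# Keep only the numeric columns
# (assumed definition; the cell that defines numeric_columns is not shown)
numeric_columns = df.select_dtypes(include=np.number).columns
numeric_df = df[numeric_columns]
correlation_matrix = numeric_df.corr()
print(correlation_matrix)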
peak-rpm -0.066844 -0.435936 0.130971 1.000000
city-mpg -0.042179 0.324701 -0.803162 -0.113723
highway-mpg -0.043961 0.265201 -0.770903 -0.054257
price 0.082095 0.070990 0.757917 -0.100854
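To make a single entry of the correlation matrix concrete, here is a small sketch that computes the Pearson coefficient between `highway-mpg` and `price` using only the five rows printed by the first cell. With so few rows the number is illustrative only and will not match the full-dataset matrix. `scipy.stats.pearsonr` is used as a cross-check and is an addition, not something the notebook calls.

```python
import pandas as pd
from scipy.stats import pearsonr

# The five rows shown in the head() output at the top of this notebook
sample = pd.DataFrame({
    "highway-mpg": [27, 27, 26, 30, 22],
    "price": [13495, 16500, 16500, 13950, 17450],
})

# DataFrame.corr() uses the Pearson coefficient by default
corr_matrix = sample.corr()
r_pandas = corr_matrix.loc["highway-mpg", "price"]

# pearsonr returns the same coefficient plus a two-sided p-value
r_scipy, p_value = pearsonr(sample["highway-mpg"], sample["price"])

print(corr_matrix)
print("r =", r_scipy, "p =", p_value)
```

The sign is negative here, which is consistent with the negative `highway-mpg` entries in the printed matrix: cars with better mileage tend to be cheaper.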
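# 7. ANOVA Analysis
# Group by 'make' and perform ANOVA
grouped_test2 = df[['make', 'price']].groupby('make')
print(grouped_test2.head(2))
print(grouped_test2.get_group('honda')['price'])
# Fit price against make as a categorical factor
# (assumed model; its definition is not shown in this printout)
from statsmodels.formula.api import ols
model = ols('price ~ C(make)', data=df).fit()
# Perform ANOVA
anova_results = sm.stats.anova_lm(model, typ=2)
print(anova_results)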
make price
0 alfa-romero 13495.000000
1 alfa-romero 16500.000000
3 audi 13950.000000
4 audi 17450.000000
10 bmw 16430.000000
11 bmw 16925.000000
18 chevrolet 5151.000000
19 chevrolet 6295.000000
21 dodge 5572.000000
22 dodge 6377.000000
30 honda 6479.000000
31 honda 6855.000000
43 isuzu 6785.000000
44 isuzu 13207.129353
47 jaguar 32250.000000
48 jaguar 35550.000000
50 mazda 5195.000000
51 mazda 6095.000000
67 mercedes-benz 25552.000000
68 mercedes-benz 28248.000000
75 mercury 16503.000000
76 mitsubishi 5389.000000
77 mitsubishi 6189.000000
89 nissan 5499.000000
90 nissan 7099.000000
107 peugot 11900.000000
108 peugot 13200.000000
118 plymouth 5572.000000
119 plymouth 7957.000000
125 porsche 22018.000000
126 porsche 32528.000000
130 renault 9295.000000
131 renault 9895.000000
132 saab 11850.000000
133 saab 12170.000000
138 subaru 5118.000000
139 subaru 7053.000000
150 toyota 5348.000000
151 toyota 6338.000000
182 volkswagen 7775.000000
183 volkswagen 7975.000000
194 volvo 12940.000000
195 volvo 13415.000000
30 6479.0
31 6855.0
32 5399.0
33 6529.0
34 7129.0
35 7295.0
36 7295.0
37 7895.0
38 9095.0
39 8845.0
40 10295.0
41 12945.0
42 10345.0
Name: price, dtype: float64
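A one-way ANOVA of price across makes can be sketched directly on the grouped values with `scipy.stats.f_oneway`. The example below is only an illustration under stated assumptions: it uses the two rows per make printed above for three makes (honda, bmw, jaguar), not the full groups, so the F statistic is not the notebook's result.

```python
from scipy.stats import f_oneway

# First two prices per make, copied from the head(2) output above
honda = [6479.0, 6855.0]
bmw = [16430.0, 16925.0]
jaguar = [32250.0, 35550.0]

# f_oneway tests the null hypothesis that all group means are equal
f_stat, p_value = f_oneway(honda, bmw, jaguar)

print("F statistic:", f_stat)
print("p-value:", p_value)
if p_value < 0.05:
    print("Reject H0: at least one make has a different mean price")
else:
    print("Fail to reject H0: no evidence of a difference in mean price")
```

A large F statistic means the variation between makes is large relative to the variation within each make, which is what the widely separated price levels in the output suggest.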
# 8. Regression Plots
sns.regplot(x='engine-size', y='price', data=df)
plt.title('Engine Size vs Price')
plt.show()
WEEK 8
CALCULATING THE
SKEWNESS OF A DATA SET
WEEK 9
5-POINT SUMMARY
6/22/24, 2:46 PM Untitled19
# Interpolation if necessary
if position.is_integer():
    # position is a 1-based rank, so subtract 1 for list indexing
    value = data[int(position) - 1]
else:
    lower_index = int(position)
    upper_index = lower_index + 1
    lower_value = data[lower_index - 1]
    upper_value = data[upper_index - 1]
    # Linear interpolation between the two neighbouring values
    value = lower_value + (position - lower_index) * (upper_value - lower_value)
Minimum: 10
First Quartile (Q1): 32.5
Median (Q2): 55.0
Third Quartile (Q3): 77.5
Maximum: 100
localhost:8888/notebooks/Untitled19.ipynb 1/3
IQR: 45.0
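Only the interpolation step of the quantile routine survives in this printout. The sketch below is one way to complete it, under two assumptions that are not confirmed by the source: the data are the ten values 10, 20, ..., 100, and the 1-based position is computed as `(n - 1) * q + 1`. Both are consistent with the printed summary, and that position rule matches NumPy's default linear interpolation. The function name `quantile_linear` is hypothetical.

```python
import numpy as np

def quantile_linear(sorted_data, q):
    """Quantile q (0..1) by linear interpolation between order statistics."""
    n = len(sorted_data)
    position = (n - 1) * q + 1           # 1-based rank (assumed rule)
    lower_index = int(position)
    fraction = position - lower_index
    lower_value = sorted_data[lower_index - 1]
    if fraction == 0:
        return float(lower_value)        # position falls exactly on a value
    upper_value = sorted_data[lower_index]
    return lower_value + fraction * (upper_value - lower_value)

# Assumed data set, chosen because it reproduces the printed summary
data = sorted(range(10, 101, 10))

minimum = quantile_linear(data, 0.0)
q1 = quantile_linear(data, 0.25)
median = quantile_linear(data, 0.5)
q3 = quantile_linear(data, 0.75)
maximum = quantile_linear(data, 1.0)
iqr = q3 - q1

print("Five-point summary:", minimum, q1, median, q3, maximum)
print("IQR:", iqr)

# Cross-check against NumPy's default percentile method
numpy_summary = np.percentile(data, [0, 25, 50, 75, 100])
```

The IQR is the spread of the middle 50% of the data, and it is commonly used to flag outliers that fall more than 1.5 × IQR below Q1 or above Q3.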
WEEK 10
UNIVARIATE, BIVARIATE,
MULTIVARIATE DESCRIPTIVE
STATISTIC MEASURES
WEEK 11
NORMAL DISTRIBUTION
24/06/2024, 18:36 Untitled13.ipynb - Colab
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
n = np.arange(0,30)
print(n)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29]
rate = 12
poisson = stats.poisson.pmf(n,rate)
print(poisson)
print(poisson[5])
0.007600390681067
plt.plot(n,poisson, 'o-')
plt.show()
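What `stats.poisson.pmf` evaluates is the formula P(X = k) = e^(-rate) · rate^k / k!. As a hedged sketch, the snippet below recomputes the same probabilities by hand for `rate = 12` and checks them against scipy. The names `k`, `pmf_manual`, and `pmf_scipy` are illustrative and do not appear in the notebook.

```python
import math
import numpy as np
from scipy import stats

rate = 12                   # average number of events per interval
k = np.arange(0, 30)        # event counts to evaluate

# Library result
pmf_scipy = stats.poisson.pmf(k, rate)

# Same probabilities from the closed-form Poisson formula
pmf_manual = np.array([
    math.exp(-rate) * rate ** int(i) / math.factorial(int(i)) for i in k
])

# Over this truncated range the probabilities sum to almost 1,
# and the mean is close to the rate parameter
total_probability = pmf_scipy.sum()
approx_mean = (k * pmf_scipy).sum()

print("max abs difference:", np.abs(pmf_scipy - pmf_manual).max())
print("total probability on 0..29:", total_probability)
print("approximate mean:", approx_mean)
```

The curve drawn by `plt.plot` peaks near the rate, because the mean and the variance of a Poisson distribution both equal `rate`.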
# Parameters
n = 10 # Number of trials
p = 0.5 # Probability of success
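The rest of this cell is cut off in the printout. As a sketch only, and assuming the cell went on to evaluate a binomial distribution with these parameters, the probabilities could be computed with `stats.binom.pmf`. The names `n_trials`, `p_success`, and `k` are hypothetical; they are used here so that the earlier array `n` is not overwritten.

```python
import numpy as np
from scipy import stats

n_trials = 10       # number of trials (n in the cell above)
p_success = 0.5     # probability of success on each trial (p above)

# Possible numbers of successes: 0, 1, ..., n_trials
k = np.arange(0, n_trials + 1)

# P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
binomial = stats.binom.pmf(k, n_trials, p_success)

expected_successes = (k * binomial).sum()   # equals n * p

print(binomial)
print("P(X = 5):", binomial[5])
print("mean:", expected_successes)
```

With p = 0.5 the distribution is symmetric around 5 successes, and P(X = 5) equals 252/1024.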
https://colab.research.google.com/drive/1pgh75Hk4LTUAeQ-9VA_re6McbwW7Xi9s#scrollTo=2DElP7pcWZzL&printMode=true 2/2
WEEK 12
LINEAR REGRESSION
WEEK 13
T-TEST