WORKING WITH PANDAS DATA FRAMES.
AIM
To write a Python program to work with pandas data frames
ALGORITHM
Step 1: Start
Step 2: Import PANDAS and NUMPY libraries
Step 3: Load the Dataset
Step 4: Perform Pandas basic operation
Step 5: Create a Series using numpy array and perform aggregate functions on a Series
Step 6: Display the results of step 4 & step 5
Step 7: Stop
PROGRAM
# Importing libraries
import pandas as pd
import numpy as np
# Loading dataset (the file name is cut off in the source)
df = pd.read_csv('netflix price in different …')
# Displaying type
print('1.')
print(type(df), sep='\n')
# Getting information
print('\n2.')
print(df.info())
# Finding total missing values in columns
print('\n3.')
print(df.isnull().sum())
# Series with dataframe columns as numpy arrays
ser1, ser2 = pd.Series(np.array(df['No. of TV Shows'])), pd.Series(np.array(df['No. of Movies']))
print('\n4.')
print('Series 1:', ser1, sep='\n')
print('Series 2:', ser2, sep='\n')
# Concatenating two series
print('\n5.')
print(pd.concat((ser1, ser2)))
# Mean, Median, Variance, Standard Deviation, Minimum and Maximum element, Sum, Product of a Series
print('\n6.', 'Series 1', sep='\n')
# The mean is taken on ser2 in the source, which is what the sample output reflects
print('Mean:', ser2.mean())
print('Median:', ser1.median())
print('Variance:', ser1.var())
print('Standard Deviation:', ser1.std())
print('Minimum Element:', ser1.min())
print('Maximum Element:', ser1.max())
print('Sum of Elements:', ser1.sum())
print('Product of Elements:', ser1.prod())
SAMPLE OUTPUT
1.
<class 'pandas.core.frame.DataFrame'>
2.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65 entries, 0 to 64
Data columns (total 7 columns):
 #   Column                         Non-Null Count  Dtype
---  ------                         --------------  -----
 0   Country                        65 non-null     object
 1   Total Library Size             65 non-null     int64
 2   No. of TV Shows                65 non-null     int64
 3   No. of Movies                  65 non-null     int64
 4   Cost Per Month - Basic ($)     65 non-null     float64
 5   Cost Per Month - Standard ($)  65 non-null     float64
 6   Cost Per Month - Premium ($)   65 non-null     float64
dtypes: float64(3), int64(3), object(1)
memory usage: 3.7+ KB
None
3.
Country                          0
Total Library Size               0
No. of TV Shows                  0
No. of Movies                    0
Cost Per Month - Basic ($)       0
Cost Per Month - Standard ($)    0
Cost Per Month - Premium ($)     0
dtype: int64
4.
Series 1:
0     3154
1     3779
2     3155
3     4819
4     3156
      ...
60    4515
61    3654
62    4050
63    2978
64    3826
Length: 65, dtype: int64
Series 2:
0     1606
1     1861
2     1836
3     1978
4     1838
      ...
60    1971
61    1852
62    2064
63    1580
64    1992
Length: 65, dtype: int64
5.
0     3154
1     3779
2     3155
3     4819
4     3156
      ...
60    1971
61    1852
62    2064
63    1580
64    1992
Length: 130, dtype: int64
6.
Series 1
Mean: 1795.4615384615386
Median: 3512.0
Variance: 522744.2634615385
Standard Deviation: 723.0105555671636
Minimum Element: 1675
Maximum Element: 5234
Sum of Elements: 228732
Product of Elements: 4323455642275676160
RESULT
Thus the program to work with pandas data frames in Python is Executed and Verified Successfully.
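The product printed in the sample output is far smaller than the true product of 65 values that are each above 1000. One possible explanation, offered as an assumption and not something the source states, is that the int64 product wrapped around silently. A minimal sketch with two made-up values (not from the dataset) shows the effect in NumPy next to Python's exact integers:

```python
import math
import numpy as np

# Hypothetical values chosen so that the true product exceeds the int64 range.
values = [10**10, 10**10]

# NumPy multiplies in fixed-width int64 arithmetic, so the result wraps around.
wrapped = int(np.array(values, dtype=np.int64).prod())

# Python integers have arbitrary precision, so this product is exact.
exact = math.prod(values)

print("int64 product:", wrapped)
print("exact product:", exact)
```

If an exact product is needed, converting the values to Python integers (or to floats, accepting rounding) before multiplying is one way to avoid the wraparound.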
BASIC PLOTS USING MATPLOTLIB
AIM
To write a Python program to perform basic plots using matplotlib
ALGORITHM
Step 1: Start
Step 2: Import MATPLOTLIB and NUMPY libraries
Step 3: Create numpy arrays
Step 4: Get an integer choice
Step 5: Based on choice, plot a basic plot in matplotlib
Step 6: Display the plot using show function
Step 7: Stop
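PROGRAM
import matplotlib.pyplot as plt
import numpy as np
# The tail of the second array is garbled in the source; 16, 32 are assumed
arrnum, arrsquare, arrcube = np.array([1,2,3,4,5]), np.array([2,4,8,16,32]), np.array([3,9,27,81,243])
ch = int(input('\n1.Line Plot\n2.Scatter Plot\n3.Bar Plot\nEnter a choice: '))
if ch == 1:
    # The third argument is garbled in the source; arrcube is assumed
    plt.plot(arrnum, arrsquare, arrcube)
elif ch == 2:
    plt.scatter(arrnum, arrsquare)
elif ch == 3:
    plt.bar(arrnum, arrcube)
else:
    print('Wrong Choice')
plt.show()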
SAMPLE OUTPUT 1
1.Line Plot
2.Scatter Plot
3.Bar Plot
Enter a choice: 1
SAMPLE OUTPUT 2
1.Line Plot
2.Scatter Plot
3.Bar Plot
Enter a choice: 2
SAMPLE OUTPUT 3
1.Line Plot
2.Scatter Plot
3.Bar Plot
Enter a choice: 3
SAMPLE OUTPUT 4
1.Line Plot
2.Scatter Plot
3.Bar Plot
Enter a choice: 4
Wrong Choice
RESULT
Thus the program to perform basic plots using matplotlib in Python is Executed and Verified
Successfully.
FREQUENCY DISTRIBUTIONS, AVERAGES, VARIABILITY
Date:
AIM
To write a Python program to perform Frequency distributions, Averages, Variability
ALGORITHM
Step 1: Start
Step 2: Import PANDAS and NUMPY libraries
Step 3: Load datasets as pandas dataframe
Step 4: Perform Frequency Distribution Tables, Numpy Average, Variance, Standard
deviation and error
Step 5: Display the results
Step 6: Stop
PROGRAM
import numpy as np
import pandas as pd
# FREQUENCY DISTRIBUTION
print('ONE WAY FREQUENCY TABLE\n')
titanic_train = pd.read_csv('train.csv')
# (the one-way table statements and the two-way table heading are not legible in the source)
# Two-way table; the variable name survived_class is assumed, it is cut off in the source
survived_class = pd.crosstab(index=titanic_train["Survived"], columns=titanic_train["Pclass"], margins=True)
survived_class.columns = ["class1", "class2", "class3", "rowtotal"]
survived_class.index = ["died", "survived", "coltotal"]
print(survived_class)
print('\n\nHIGHER WAY FREQUENCY TABLE\n')
titanic_train = pd.read_csv('train.csv')
surv_sex_class = pd.crosstab(index=titanic_train["Survived"], columns=[titanic_train["Pclass"], titanic_train["Sex"]], margins=True)
print(surv_sex_class)
# Creation of NUMPY array after replacing NaN values with 0
titanic_train['Age'] = titanic_train['Age'].replace(np.nan, 0)
array = np.array(titanic_train['Age'])
# AVERAGE
print('\n\nAVERAGE\n')
print('Average Age of Passengers:', format(np.average(array), '.2f'))
# VARIABILITY
print('Mean:', np.mean(array))
print('Median:', np.median(array))
print('Variance:', np.var(array))
print('Standard Deviation:', np.std(array))
print('Standard Error:', np.std(array) / np.sqrt(np.size(array)))
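As a small self-contained illustration of the two ideas used in this program, a frequency table with totals and the standard error of the mean, the sketch below uses a made-up six-row table instead of the Titanic file. The column names mirror the program, but the values are hypothetical. Unlike the program, which replaces missing ages with 0, this sketch drops them, which keeps missing entries from pulling the mean toward zero.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a passenger table.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass":   [3, 1, 2, 3, 1, 2],
    "Age":      [22.0, 38.0, np.nan, 35.0, 54.0, 2.0],
})

# Two-way frequency table with row and column totals (the "All" margins).
two_way = pd.crosstab(index=toy["Survived"], columns=toy["Pclass"], margins=True)
print(two_way)

# Standard error of the mean: standard deviation divided by sqrt(n).
ages = toy["Age"].dropna().to_numpy()
std_error = np.std(ages) / np.sqrt(ages.size)
print("Standard Error:", std_error)
```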
SAMPLE OUTPUT
ONE WAY FREQUENCY TABLE
col_0  count
Cabin
A11
A18
A21
A29
A34
...
F G63
F2
F33
F4
G6
(counts are not legible in the source)
[76 rows x 1 columns]
Type:
dtype: int64
(76, 1)
TWO WAY FREQUENCY TABLE
          class1  class2  class3  rowtotal
died          80      97     372       549
survived     136      87     119       342
coltotal     216     184     491       891
HIGHER WAY FREQUENCY TABLE
Pclass        1          2          3        All
Sex      female male female male female male
Survived
0             3   77      6   91     72  300  549
1            91   45     70   17     72   47  342
All          94  122     76  108    144  347  891
AVERAGE
Average Age of Passengers: 23.80
Mean: 23.79929292929293
Median: 24.0
Variance: 309.2743232935415
Standard Deviation: 17.586196953677664
Standard Error: 0.5891597655014137
RESULT
Thus the program to perform Frequency distributions, Averages, Variability in Python is
Executed and Verified Successfully.
NORMAL CURVES, CORRELATION AND SCATTER PLOTS, CORRELATION COEFFICIENT
Date:
AIM
To write a Python program to perform Normal curves, Correlation and scatter plots,
Correlation coefficient
ALGORITHM
Step 1: Start
Step 2: Import PANDAS, MATPLOTLIB, SCIPY, STATISTICS and NUMPY libraries
Step 3: Create numpy arrays
Step 4: Plot a Line Plot using standard deviation and Mean
Step 5: Display the plot using show function
Step 6: Create Pandas Series and plot the correlation in a scatter plot and plot a fit line
Step 7: Display the plot using show function
Step 8: Create numpy arrays and find the correlation coefficient and display it
Step 9: Stop
PROGRAM
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import norm
import statistics
# NORMAL CURVE
# (the array creation and the arguments of plt.plot are not legible in the source)
# plt.plot(…)
plt.show()
# CORRELATION SCATTER PLOTS
x = pd.Series([1, 2, 3, 4, 3, 5, 4])
y = pd.Series([1, 2, 3, 4, 5, 6, 7])
correlation = y.corr(x)
plt.scatter(x, y)
# Fit a straight line through the points and draw it in red
plt.plot(np.unique(x), np.poly1d(np.polyfit(x, y, 1))(np.unique(x)), color='red')
plt.title('Correlation')
plt.xlabel('x axis')
plt.ylabel('y axis')
plt.show()
# CORRELATION COEFFICIENT
x = np.array([11, 2, 7, 4, 15, 6, 10, 8, 9, 1, 11, 5, 13, 6, 15])
y = np.array([2, 5, 17, 6, 10, 8, 13, 4, 6, 9, 11, 2, 5, 4, 7])
pearsons_coefficient = np.corrcoef(x, y)
print("The pearson's coeffient of the x and y inputs are: \n", pearsons_coefficient)
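To make clear what the off-diagonal entry of the correlation matrix measures, the sketch below computes Pearson's r directly from its definition and compares it with np.corrcoef. The arrays are the ones in the program, reading the first garbled entry of x as 11; the helper name pearson_r is introduced here for illustration only.

```python
import numpy as np

x = np.array([11, 2, 7, 4, 15, 6, 10, 8, 9, 1, 11, 5, 13, 6, 15])
y = np.array([2, 5, 17, 6, 10, 8, 13, 4, 6, 9, 11, 2, 5, 4, 7])

def pearson_r(a, b):
    """Pearson's r: sum of cross-deviations over the product of the spreads."""
    da = a - a.mean()
    db = b - b.mean()
    return float((da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum()))

r_manual = pearson_r(x, y)
r_numpy = float(np.corrcoef(x, y)[0, 1])
print(r_manual, r_numpy)
```

A value this close to zero indicates only a weak linear relationship between the two arrays.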
SAMPLE OUTPUT
[Figure: normal curve]
[Figure: scatter plot titled "Correlation" with a red fit line, x axis against y axis]
The pearson's coeffient of the x and y inputs are:
[[1.         0.11521488]
 [0.11521488 1.        ]]
RESULT
Thus the program to perform Normal curves, Correlation and scatter plots, Correlation
coefficient in Python is Executed and Verified Successfully.
REGRESSION
INTRODUCTION
Linear regression and logistic regression are two types of regression analysis techniques that
are used to solve the regression problem using machine learning. They are the most
prominent techniques of regression. But, there are many types of regression analysis
techniques in machine learning, and their usage varies according to the nature of the data
involved.
REGRESSION ANALYSIS
Regression analysis is a predictive modelling technique that analyzes the relation between the
target (dependent) variable and the independent variables in a dataset. Different
regression techniques are used depending on whether the target and independent variables show a
linear or non-linear relationship, and they apply when the target variable contains
continuous values. Regression is used mainly to determine predictor
strength, forecast trends, model time series, and study cause and effect relations.
Regression analysis is the primary technique for solving regression problems in machine
learning using data modelling. It involves determining the best fit line, which is the line that
runs through the data points in such a way that the total distance of the line from the data points
is minimized.
TYPES OF REGRESSION ANALYSIS TECHNIQUES
There are many types of regression analysis techniques, and the choice of method depends
upon a number of factors. These factors include the type of target variable, the shape of the
regression line, and the number of independent variables.
Below are the different regression techniques:
Linear Regression
Logistic Regression
Ridge Regression
Lasso Regression
Polynomial Regression
Bayesian Linear Regression
The different types of regression techniques in machine learning are explained below in
detail.
LINEAR REGRESSION
Linear regression is one of the most basic types of regression in machine learning. The linear
regression model consists of a predictor variable and a dependent variable related linearly to
each other. When the data involves more than one independent variable, the model is
called a multiple linear regression model.
The equation below denotes the linear regression model:
y = mx + c + e
where m is the slope of the line, c is the intercept, and e represents the error in the model.
LOGISTIC REGRESSION
Logistic regression is a regression analysis technique that is used when
the dependent variable is discrete, for example 0 or 1, true or false. This means the target
variable can have only two values, and a sigmoid curve denotes the relation between the
target variable and the independent variable.
The logit function is used in Logistic Regression to measure the relationship between the target
variable and the independent variables. The equation below denotes logistic regression:
logit(p) = ln(p/(1-p)) = b0 + b1X1 + b2X2 + b3X3 + ... + bkXk
where p is the probability of occurrence of the event, that is, of the target taking the value 1.
[Figure: sigmoid curve]
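The logit equation can be read in both directions: the linear score gives the log-odds, and the sigmoid turns that score back into a probability. A minimal sketch of this round trip follows; the coefficients b0, b1 and the input x1 are hypothetical values picked for illustration.

```python
import math

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))

def sigmoid(z):
    """Inverse of logit: maps a linear score z back to a probability."""
    return 1 / (1 + math.exp(-z))

# Hypothetical coefficients and input, for illustration only.
b0, b1, x1 = -1.5, 0.8, 2.0
score = b0 + b1 * x1      # the linear part b0 + b1*X1
p = sigmoid(score)        # predicted probability of the event
print(p, logit(p))        # logit(p) recovers the linear score
```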
RIDGE REGRESSION
Ridge Regression is another type of regression in machine learning, usually used when
there is a high correlation between the independent variables. With
multicollinear data the least squares estimates remain unbiased, but their variances become
large, so the estimates can lie far from the true values. Ridge Regression therefore introduces
a small amount of bias through a penalty term in its equation. This makes it a powerful regression method where the model is
less susceptible to overfitting.
LASSO REGRESSION
Lasso Regression is a type of regression in machine learning that performs
regularization along with feature selection. It penalizes the absolute size of the regression
coefficients. As a result, some coefficient values become exactly zero, which does not happen in the
case of Ridge Regression.
Because of this, Lasso Regression performs feature selection: it selects a subset of
features from the dataset to build the model. Only the
required features are used, and the coefficients of the others are set to zero. This helps in avoiding
overfitting in the model. When the independent variables are highly collinear, Lasso
regression tends to pick only one of them and shrinks the others to zero.
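The difference between the two penalties is easiest to see in the special case of an orthonormal design (X^T X = I), where both estimators have simple closed forms. The sketch below assumes that special case; the coefficient values and the penalty strength lam are made up for illustration.

```python
import numpy as np

# Hypothetical least squares coefficients for an orthonormal design (X^T X = I).
beta_ols = np.array([3.0, -0.4, 0.2, -2.0])
lam = 0.5  # regularization strength, chosen arbitrarily

# Ridge, minimizing ||y - Xb||^2 + lam * ||b||^2: every coefficient is scaled down.
beta_ridge = beta_ols / (1.0 + lam)

# Lasso, minimizing 0.5 * ||y - Xb||^2 + lam * ||b||_1: soft-thresholding,
# which sets coefficients smaller than lam in absolute value to zero.
beta_lasso = np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)

print("ridge:", beta_ridge)
print("lasso:", beta_lasso)
```

Ridge keeps all four coefficients non-zero, while the lasso removes the two small ones, which is the feature selection effect described above.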
[Figure: Regression Coefficients Progression for Lasso Paths]
POLYNOMIAL REGRESSION
Polynomial Regression is another regression analysis technique in
machine learning, which is the same as Multiple Linear Regression with a little modification.
In Polynomial Regression, the relationship between the independent and dependent variables,
that is X and Y, is modelled as an n-th degree polynomial.
The model is still linear in its coefficients, so the least squares method is used in Polynomial
Regression as well. The best fit line in Polynomial Regression is not a straight line, but a
curve whose shape depends upon the power of X, that is, the value of n.
BAYESIAN LINEAR REGRESSION
Bayesian Regression is a type of regression in machine learning that uses Bayes'
theorem to find the values of the regression coefficients. In this method of regression, the
posterior distribution of the coefficients is determined instead of finding the least squares estimates.
Bayesian Linear Regression is like both Linear Regression and Ridge Regression but more
stable than simple Linear Regression.
RESULT
Thus the study of regression was completed successfully.
Z-TEST
AIM
To write a Python program to implement Z-Test
ALGORITHM
Step 1: Start
Step 2: Import STATSMODEL and NUMPY libraries
Step 3: Determine the level of significance
Step 4: Find the critical value of z in the z-test
Step 5: Calculate the z-test statistics
Step 6: Display the result
Step 7: Stop
PROGRAM
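import math
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest
mean_iq = 110
# The numerator is garbled in the source; 15 is assumed
sd_iq = 15/math.sqrt(50)
alpha = 0.05
null_mean = 100
# Simulated sample of 50 values around mean_iq
data = sd_iq*randn(50) + mean_iq
print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))
# One-sided z-test of the sample mean against null_mean
ztest_Score, p_value = ztest(data, value=null_mean, alternative='larger')
if(p_value < alpha):
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")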
SAMPLE OUTPUT
mean=….74 stdv=2.03
Reject Null Hypothesis
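The statistic behind a one-sample z-test can also be computed by hand, which shows what the decision is based on. The following is a sketch under stated assumptions and not the statsmodels implementation: the sample values and the null mean are made up, and the sample is kept tiny for readability even though a z-test normally assumes a large sample or a known variance.

```python
import math

# Hypothetical sample and null mean, not taken from the source.
data = [102, 98, 105, 110, 99, 104, 107, 101]
null_mean = 100
alpha = 0.05

n = len(data)
mean = sum(data) / n
# Sample standard deviation (divides by n - 1).
sd = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))

# z statistic: distance of the sample mean from the null mean in standard errors.
z = (mean - null_mean) / (sd / math.sqrt(n))

# One-sided p-value P(Z > z) from the standard normal survival function.
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print("z = %.3f, p = %.4f" % (z, p_value))
print("Reject Null Hypothesis" if p_value < alpha else "Fail to Reject Null Hypothesis")
```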
RESULT
Thus the program to implement Z-Test in Python is Executed and Verified Successfully.
T-TEST
AIM
To write a Python program to implement T-Test
ALGORITHM
Step 1: Start
Step 2: Import SCIPY and NUMPY libraries
Step 3: Get data group 1 and 2 of equal variance
Step 4: Using function ttest, get the results
Step 5: Display the results
Step 6: Stop
PROGRAM
import scipy.stats as stats
import numpy as np
# Read two whitespace-separated groups of integers
data_group1 = np.array(list(map(int, input('Enter data group 1: ').split())))
data_group2 = np.array(list(map(int, input('Enter data group 2: ').split())))
# Two-sample t-test assuming equal variances
res = stats.ttest_ind(a=data_group1, b=data_group2, equal_var=True)
print('test statistic: ', res[0])
print('p-value: ', res[1])
SAMPLE OUTPUT
Enter data group 1: 14 15 15 16 13 8 14 17 16 14 19 20 21 15 15 16 16 13 14 12
Enter data group 2: 15 17 14 17 14 8 12 19 19 14 17 22 24 16 13 16 13 18 15 13
test statistic:  -0.6337397070250238
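The test statistic can be cross-checked without SciPy from the pooled-variance formula for two independent samples. The sketch below uses the two groups from the sample output, reading the garbled first entry of group 1 as 14, which is an assumption; under that reading the result should come out close to the printed statistic.

```python
import math
from statistics import mean, variance

group1 = [14, 15, 15, 16, 13, 8, 14, 17, 16, 14, 19, 20, 21, 15, 15, 16, 16, 13, 14, 12]
group2 = [15, 17, 14, 17, 14, 8, 12, 19, 19, 14, 17, 22, 24, 16, 13, 16, 13, 18, 15, 13]

n1, n2 = len(group1), len(group2)
# Pooled variance: weighted average of the two sample variances.
pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
# t statistic for two independent samples with equal variances.
t_stat = (mean(group1) - mean(group2)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
print("test statistic:", t_stat)
```

The negative sign simply reflects that the mean of group 1 is smaller than the mean of group 2.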
RESULT
Thus the program to implement T-Test in Python is Executed and Verified Successfully.
ANOVA
AIM
To write a Python program to implement ANOVA
ALGORITHM
Step 1: Start
Step 2: Import SCIPY, PANDAS, MATPLOTLIB, STATSMODEL and NUMPY libraries
Step 3: Load Dataset
Step 4: Get ANOVA table by fitting the dataset values to the function ols
Step 5: Display the ANOVA table
Step 6: Plot it into line plots to get a visual interaction
Step 7: Display the plot using show function
Step 8: Stop
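PROGRAM
import pandas as pd
from scipy import stats
from matplotlib import pyplot as plt
import numpy as np
from statsmodels.formula.api import ols
# The statements below are cut off in the source and are kept as fragments
# import statsmodels.stats.api …
# from statsmodels.graphics. …
# df_ameshousing = pd.read_…
# df_ameshousing['SeasonofYear'] = … ['Mo Sold'].map({12: 'Winter', 1: 'Winter', …
#     … 'Spring', 6: 'Summer', 7: 'Summer', 8: 'Summer', 9: …
# (the remaining statements are not legible in the source)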
SAMPLE OUTPUT
                                     sum_sq      df           F         PR(>F)
C(SeasonofYear)                1.520669e+11     3.0   10.158870   1.451294e-03
C(Heating_QC)                  5.699845e+12     4.0  285.584708  5.592686e-118
C(SeasonofYear):C(Heating_QC)  8.580682e+10    12.0    1.433087   1.590877e-01
Residual                       1.452979e+13  2912.0         NaN            NaN
RESULT
Thus the program to implement ANOVA in Python is Executed and Verified Successfully
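Each F value in an ANOVA table is a mean square (sum of squares divided by its degrees of freedom) divided by the residual mean square. The sketch below illustrates this for the simpler one-way case, computed by hand with NumPy. It is not the two-way statsmodels analysis of this exercise, and the three groups are hypothetical.

```python
import numpy as np

# Hypothetical groups, e.g. one response measured under three conditions.
groups = [np.array([1.0, 2.0, 3.0]),
          np.array([2.0, 3.0, 4.0]),
          np.array([5.0, 6.0, 7.0])]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Between-group sum of squares: spread of the group means around the grand mean.
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group (residual) sum of squares: spread inside each group.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1
df_within = all_values.size - len(groups)

# F statistic: between-group mean square over within-group mean square.
f_stat = (ss_between / df_between) / (ss_within / df_within)
print("F =", f_stat)
```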
BUILDING AND VALIDATING LINEAR MODELS
Date:
AIM
To write a Python program to build and validate linear models
ALGORITHM
Step 1: Start
Step 2: Import SKLEARN, MATPLOTLIB, MPL TOOLKITS and NUMPY libraries
Step 3: Create numpy arrays
Step 4: Estimate Coefficients and plot regression as scatter and line plot
Step 5: Display the plot using show function
Step 6: Pre-Process the data, Fit Multiple linear regression to the training set
Step 7: Predict the test result and fit the results into a 3D plot
Step 8: Display the plot using show function
Step 9: Stop
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
import matplotlib as mpl
# from mpl_toolkits.mplot3d import …   (cut off in the source)
# SIMPLE LINEAR REGRESSION
# (the data arrays and the start of estimate_coef are not legible in the source;
#  only these fragments of the function survive)
#     … - n*m_y*m_x
#     … np.sum(x*x) - n*m_x*m_x
#     return (b_0, b_1)
def plot_regression_line(x, y, b):
    # Observed points
    plt.scatter(x, y, color="m", marker="o", s=30)
    # Fitted line from intercept b[0] and slope b[1]
    y_pred = b[0] + b[1]*x
    plt.plot(x, y_pred, color="g")
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()
b = estimate_coef(x, y)
print("Estimated coefficients:\nb_0 = {} \nb_1 = {}".format(b[0], b[1]))
plot_regression_line(x, y, b)
# MULTIPLE LINEAR REGRESSION
def generate_dataset(n):
    x = []
    y = []
    random_x1 = np.random.rand()
    random_x2 = np.random.rand()
    for i in range(n):
        x1 = i  # this assignment is missing in the source and is assumed
        x2 = i/2 + np.random.rand()*n
        x.append([1, x1, x2])
        y.append(random_x1 * x1 + random_x2 * x2 + 1)
    return np.array(x), np.array(y)
x, y = generate_dataset(200)
mpl.rcParams['legend.fontsize'] = 12
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
# The label text is cut off in the source; 'y' is a placeholder
ax.scatter(x[:, 1], x[:, 2], y, label='y')
ax.legend()
ax.view_init(45, 0)
plt.show()
SAMPLE OUTPUT
Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
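The coefficient estimate for simple linear regression can be written out in a few lines. In the sketch below the function follows the surviving fragments of the program (the n*m_y*m_x and n*m_x*m_x terms), while the x and y arrays are an assumption: the data is not legible in the source, and these values were chosen because they give coefficients matching the sample output.

```python
import numpy as np

# Assumed data (not legible in the source).
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

def estimate_coef(x, y):
    n = np.size(x)
    m_x, m_y = np.mean(x), np.mean(y)
    ss_xy = np.sum(y * x) - n * m_y * m_x   # cross-deviation about the means
    ss_xx = np.sum(x * x) - n * m_x * m_x   # deviation of x about its mean
    b_1 = ss_xy / ss_xx                     # slope
    b_0 = m_y - b_1 * m_x                   # intercept
    return b_0, b_1

b_0, b_1 = estimate_coef(x, y)
print("b_0 =", b_0, "b_1 =", b_1)
```

The same line can be obtained with np.polyfit(x, y, 1), which returns the slope first and the intercept second.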
BUILDING AND VALIDATING LOGISTIC MODELS
Date:
AIM
To write a Python program to build and validate logistic models
ALGORITHM
Step 1: Start
Step 2: Import SKLEARN, MATPLOTLIB, PANDAS and NUMPY libraries
Step 3: Load dataset
Step 4: Normalize the data and split it into train and test sets
Step 5: Perform logistic regression
Step 6: Get the cost after every iteration and display it
Step 7: Visualize the cost and iteration as a line plot
Step 8: Display the plot using show function
Step 9: Display the accuracy of train and test
Step 10: Stop
PROGRAM
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
data = pd.read_csv("data.csv")
# The data preparation statements are only partly legible in the source:
# data.drop(…, inplace=True)
# data.diagnosis = [… 0 for each in data.diagnosis]
# (normalization and train/test split are not legible in the source)
def initialize_weights_and_bias(dimension):
    w = np.full((dimension, 1), 0.01)
    b = 0.0
    return w, b
def sigmoid(z):
    y_head = 1/(1+np.exp(-z))
    return y_head
def forward_backward_propagation(w, b, x_train, y_train):
    # Forward pass: predictions and mean cross-entropy cost
    z = np.dot(w.T, x_train) + b
    y_head = sigmoid(z)
    loss = -y_train*np.log(y_head)-(1-y_train)*np.log(1-y_head)
    cost = (np.sum(loss))/x_train.shape[1]
    # Backward pass: gradients of the cost
    derivative_weight = (np.dot(x_train, ((y_head-y_train).T)))/x_train.shape[1]
    derivative_bias = np.sum(y_head-y_train)/x_train.shape[1]
    gradients = {"derivative_weight": derivative_weight, "derivative_bias": derivative_bias}
    return cost, gradients
def update(w, b, x_train, y_train, learning_rate, number_of_iterarion):
    cost_list = []
    cost_list2 = []
    index = []
    for i in range(number_of_iterarion):
        cost, gradients = forward_backward_propagation(w, b, x_train, y_train)
        cost_list.append(cost)
        w = w - learning_rate * gradients["derivative_weight"]
        b = b - learning_rate * gradients["derivative_bias"]
        # Record and print the cost every 10 iterations, as in the sample output
        if i % 10 == 0:
            cost_list2.append(cost)
            index.append(i)
            print("Cost after iteration %i: %f" % (i, cost))
    parameters = {"weight": w, "bias": b}
    plt.plot(index, cost_list2)
    plt.xticks(index, rotation='vertical')
    plt.xlabel("Number of Iteration")
    plt.ylabel("Cost")
    plt.show()
    return parameters, gradients, cost_list
def predict(w, b, x_test):
    z = sigmoid(np.dot(w.T, x_test) + b)
    # (the rest of predict is not legible in the source; a fragment refers to x_test.shape[1])
def logistic_regression(x_train, y_train, x_test, y_test, learning_rate, num_iterations):
    dimension = x_train.shape[0]
    w, b = initialize_weights_and_bias(dimension)
    parameters, gradients, cost_list = update(w, b, x_train, y_train, learning_rate, num_iterations)
    y_prediction_test = predict(parameters["weight"], parameters["bias"], x_test)
    y_prediction_train = predict(parameters["weight"], parameters["bias"], x_train)
    print("train accuracy: {} %".format(100 - np.mean(np.abs(y_prediction_train - y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(y_prediction_test - y_test)) * 100))
logistic_regression(x_train, y_train, x_test, y_test, learning_rate=1, num_iterations=100)
SAMPLE OUTPUT
Cost after iteration 0: 0.692836
Cost after iteration 10: 0.498576
Cost after iteration 20: 0.404996
Cost after iteration 30: 0.350059
Cost after iteration 40: 0.313747
Cost after iteration 50: 0.287767
Cost after iteration 60: 0.268114
Cost after iteration 70: 0.252627
Cost after iteration 80: 0.240036
Cost after iteration 90: 0.229543
train accuracy: 94.40993788819875 %
test accuracy: 94.18604651162791 %
RESULT
Thus the program to build and validate logistic models in Python is Executed and Verified
Successfully.
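The training loop of this exercise can be tried end to end on a tiny dataset without the CSV file. The sketch below keeps the same update rule (sigmoid, cross-entropy cost, gradient step) but runs on six made-up points with one feature. The data and the 0.5 decision threshold are assumptions for illustration, and the arrays are shaped (features, samples) as in the program.

```python
import numpy as np

# Hypothetical one-feature data, shaped (features, samples).
x_train = np.array([[-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

w = np.full((x_train.shape[0], 1), 0.01)
b = 0.0
learning_rate = 1.0
costs = []

for i in range(100):
    y_head = sigmoid(np.dot(w.T, x_train) + b)          # forward pass
    loss = -y_train * np.log(y_head) - (1 - y_train) * np.log(1 - y_head)
    costs.append(np.sum(loss) / x_train.shape[1])       # mean cross-entropy
    # Gradients of the cost with respect to w and b.
    dw = np.dot(x_train, (y_head - y_train).T) / x_train.shape[1]
    db = np.sum(y_head - y_train) / x_train.shape[1]
    w = w - learning_rate * dw
    b = b - learning_rate * db

# Classify with a 0.5 threshold and measure accuracy as in the program.
predictions = (sigmoid(np.dot(w.T, x_train) + b) > 0.5).astype(int)
accuracy = 100 - np.mean(np.abs(predictions - y_train)) * 100
print("final cost:", costs[-1], "train accuracy:", accuracy, "%")
```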
TIME SERIES ANALYSIS
AIM
To write a Python program to implement Time series analysis
ALGORITHM
Step 1: Start
Step 2: Import PANDAS, MATPLOTLIB, SEABORN and NUMPY libraries
Step 3: Load dataset and plot the time based column
Step 4: Display the plot using show function
Step 5: Stop
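PROGRAM
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('AirPassengers.csv')
df.columns = ['Date', 'Number of Passengers']
# The xlabel default is garbled in the source; 'Date' is assumed
def plot_df(df, x, y, title="", xlabel='Date', ylabel='Number of Passengers', dpi=100):
    plt.figure(figsize=(15, 4), dpi=dpi)
    # The color name after 'tab:' is cut off in the source; red is assumed
    plt.plot(x, y, color='tab:red')
    plt.title('TIME SERIES ANALYSIS')
    plt.show()
# The x argument is garbled and the title is cut off in the source
plot_df(df, x=df['Date'], y=df['Number of Passengers'], title='Number of US Airline …')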
[Figure: line plot titled TIME SERIES ANALYSIS]
RESULT
Thus the program to implement Time series analysis in Python is Executed and Verified
Successfully.