Ad3411 - Student

WORKING WITH PANDAS DATA FRAMES

AIM
To write a Python program to work with pandas data frames.

ALGORITHM
Step 1: Start
Step 2: Import PANDAS and NUMPY libraries
Step 3: Load the dataset
Step 4: Perform basic pandas operations
Step 5: Create a Series using a numpy array and perform aggregate functions on a Series
Step 6: Display the results of step 4 and step 5
Step 7: Stop

PROGRAM
    # Importing libraries
    import pandas as pd
    import numpy as np

    # Loading dataset (the file name is cut off in the source)
    df = pd.read_csv('netflix price in different ...')

    # Displaying type
    print('1.')
    print(type(df), sep='')

    # Getting information
    print('\n2.')
    print(df.info())

    # Finding total missing values in columns
    print('\n3.')
    print(df.isnull().sum())

    # Series with dataframe columns as numpy arrays
    ser1, ser2 = pd.Series(np.array(df['No. of TV Shows'])), pd.Series(np.array(df['No. of Movies']))
    print('\n4.')
    print('Series 1:', ser1, sep='\n')
    print('Series 2:', ser2, sep='\n')

    # Concatenating two series
    print('\n5.')
    print(pd.concat((ser1, ser2)))

    # Mean, median, variance, standard deviation, minimum and maximum element,
    # sum and product of a Series
    print('\n6.', 'Series 1', sep='\n')
    print('Mean:', ser1.mean())
    print('Median:', ser1.median())
    print('Variance:', ser1.var())
    print('Standard Deviation:', ser1.std())
    print('Minimum Element:', ser1.min())
    print('Maximum Element:', ser1.max())
    print('Sum of Elements:', ser1.sum())
    print('Product of Elements:', ser1.prod())

SAMPLE OUTPUT
    1.
    <class 'pandas.core.frame.DataFrame'>

    2.
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 65 entries, 0 to 64
    Data columns (total 7 columns):
     #   Column                         Non-Null Count  Dtype
    ---  ------                         --------------  -----
     0   Country                        65 non-null     object
     1   Total Library Size             65 non-null     int64
     2   No. of TV Shows                65 non-null     int64
     3   No. of Movies                  65 non-null     int64
     4   Cost Per Month - Basic ($)     65 non-null     float64
     5   Cost Per Month - Standard ($)  65 non-null     float64
     6   Cost Per Month - Premium ($)   65 non-null     float64
    dtypes: float64(3), int64(3), object(1)
    memory usage: 3.7+ KB
    None

    3.
    Country                          0
    Total Library Size               0
    No. of TV Shows                  0
    No. of Movies                    0
    Cost Per Month - Basic ($)       0
    Cost Per Month - Standard ($)    0
    Cost Per Month - Premium ($)     0
    dtype: int64

    4.
    Series 1:
    0     3154
    1     3779
    2     3155
    3     4819
    4     3156
          ...
    60    4515
    61    3654
    62    4050
    63    2978
    64    3826
    Length: 65, dtype: int64
    Series 2:
    0     1606
    1     1861
    2     1836
    3     1978
    4     1838
          ...
    60    1971
    61    1852
    62    2064
    63    1580
    64    1992
    Length: 65, dtype: int64

    5.
    0     3154
    1     3779
    2     3155
    3     4819
    4     3156
          ...
    60    1971
    61    1852
    62    2064
    63    1580
    64    1992
    Length: 130, dtype: int64

    6.
    Series 1
    Mean: 3518.9538461538464
    Median: 3512.0
    Variance: 522744.2634615385
    Standard Deviation: 723.0105555671636
    Minimum Element: 1675
    Maximum Element: 5234
    Sum of Elements: 228732
    Product of Elements: 4323455642275676160

RESULT
Thus the program to work with pandas data frames in Python was executed and verified successfully.

BASIC PLOTS USING MATPLOTLIB

AIM
To write a Python program to perform basic plots using matplotlib.

ALGORITHM
Step 1: Start
Step 2: Import MATPLOTLIB and NUMPY libraries
Step 3: Create numpy arrays
Step 4: Get an integer choice
Step 5: Based on the choice, plot a basic plot in matplotlib
Step 6: Display the plot using show function
Step 7: Stop

PROGRAM
    import matplotlib.pyplot as plt
    import numpy as np

    # The second array is cut off after "2,4,8,1" in the source; the powers of 2 are assumed
    arrnum, arrsquare, arrcube = np.array([1, 2, 3, 4, 5]), np.array([2, 4, 8, 16, 32]), np.array([3, 9, 27, 81, 243])

    ch = int(input('\n1.Line Plot\n2.Scatter Plot\n3.Bar Plot\nEnter a choice: '))
    if ch == 1:
        # Any arguments after arrsquare are illegible in the source
        plt.plot(arrnum, arrsquare)
    elif ch == 2:
        plt.scatter(arrnum, arrsquare)
    elif ch == 3:
        plt.bar(arrnum, arrcube)
    else:
        print('Wrong Choice')
    plt.show()

SAMPLE OUTPUT 1
    1.Line Plot
    2.Scatter Plot
    3.Bar Plot
    Enter a choice: 1
    [Figure: line plot]

SAMPLE OUTPUT 2
    1.Line Plot
    2.Scatter Plot
    3.Bar Plot
    Enter a choice: 2
    [Figure: scatter plot]

SAMPLE OUTPUT 3
    1.Line Plot
    2.Scatter Plot
    3.Bar Plot
    Enter a choice: 3
    [Figure: bar plot]

SAMPLE OUTPUT 4
    1.Line Plot
    2.Scatter Plot
    3.Bar Plot
    Enter a choice: 4
    Wrong Choice

RESULT
Thus the program to perform basic plots using matplotlib in Python was executed and verified successfully.

FREQUENCY DISTRIBUTIONS, AVERAGES, VARIABILITY
Date:

AIM
To write a Python program to perform Frequency distributions, Averages, Variability.

ALGORITHM
Step 1: Start
Step 2: Import PANDAS and NUMPY libraries
Step 3: Load datasets as pandas dataframe
Step 4: Perform frequency distribution tables, numpy average, variance, standard deviation and standard error
Step 5: Display the results
Step 6: Stop

PROGRAM
    import numpy as np
    import pandas as pd

    # FREQUENCY DISTRIBUTION
    print('ONE WAY FREQUENCY TABLE\n')
    titanic_train = pd.read_csv('train.csv')
    # ... (the statements that build and print the one-way table are illegible in the source)

    print('\n\nTWO WAY FREQUENCY TABLE\n')
    # The variable name is illegible in the source; surv_class is assumed
    surv_class = pd.crosstab(index=titanic_train["Survived"], columns=titanic_train["Pclass"], margins=True)
    surv_class.columns = ["class1", "class2", "class3", "rowtotal"]
    surv_class.index = ["died", "survived", "coltotal"]
    print(surv_class)

    print('\n\nHIGHER WAY FREQUENCY TABLE\n')
    titanic_train = pd.read_csv('train.csv')
    surv_sex_class = pd.crosstab(index=titanic_train["Survived"], columns=[titanic_train["Pclass"], titanic_train["Sex"]], margins=True)
    print(surv_sex_class)

    # Creation of a numpy array after replacing NaN values with 0
    titanic_train['Age'] = titanic_train['Age'].replace(np.nan, 0)
    array = np.array(titanic_train['Age'])

    # AVERAGE
    print('\n\nAVERAGE\n')
    print('Average Age of Passengers:', format(np.average(array), '.2f'))

    # VARIABILITY
    print('Mean:', np.mean(array))
    print('Median:', np.median(array))
    print('Variance:', np.var(array))
    print('Standard Deviation:', np.std(array))
    print('Standard Error:', np.std(array) / np.sqrt(np.size(array)))

SAMPLE OUTPUT
    ONE WAY FREQUENCY TABLE

    col_0  count
    Cabin
    ...
    [76 rows x 1 columns]
    ...
    (76, 1)

    TWO WAY FREQUENCY TABLE

              class1  class2  class3  rowtotal
    died          80      97     372       549
    survived     136      87     119       342
    coltotal     216     184     491       891

    HIGHER WAY FREQUENCY TABLE

    Pclass        1           2           3        All
    Sex      female male female male female male
    Survived
    0             3   77      6   91     72  300   549
    1            91   45     70   17     72   47   342
    All          94  122     76  108    144  347   891

    AVERAGE

    Average Age of Passengers: 23.80
    Mean: 23.79929292929293
    Median: 24.0
    Variance: 309.2743232935415
    Standard Deviation: 17.586196953677664
    Standard Error: 0.5891597655014137

RESULT
Thus the program to perform Frequency distributions, Averages, Variability in Python was executed and verified successfully.
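The frequency tables and the variability measures used in this experiment can be tried without the Titanic file. The sketch below is only an illustration under stated assumptions: the small `toy` data frame is made up, and only its column names mirror the ones used in the program. It also drops missing ages with `dropna()` instead of replacing them with 0, because zeros pull the mean and median down.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Titanic columns (values are made up).
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0, 1, 0, 1],
    "Pclass":   [3, 1, 2, 3, 3, 1, 2, 3],
    "Age":      [22.0, 38.0, 26.0, np.nan, 35.0, 54.0, np.nan, 27.0],
})

# One-way frequency table: number of passengers in each class.
one_way = pd.crosstab(index=toy["Pclass"], columns="count")

# Two-way frequency table with row and column totals.
two_way = pd.crosstab(index=toy["Survived"], columns=toy["Pclass"], margins=True)

# Averages and variability computed only on the ages that are present.
ages = toy["Age"].dropna().to_numpy()
mean_age = np.mean(ages)
std_age = np.std(ages)                        # population standard deviation (ddof=0)
std_error = std_age / np.sqrt(np.size(ages))  # standard error of the mean

print(one_way)
print(two_way)
print("Mean:", mean_age)
print("Standard Error:", std_error)
```

Whether to drop missing values or to replace them is a judgment call that depends on the analysis; the point here is only that the two choices give different averages.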
NORMAL CURVES, CORRELATION AND SCATTER PLOTS, CORRELATION COEFFICIENT
Date:

AIM
To write a Python program to perform Normal curves, Correlation and scatter plots, Correlation coefficient.

ALGORITHM
Step 1: Start
Step 2: Import PANDAS, MATPLOTLIB, SCIPY, STATISTICS and NUMPY libraries
Step 3: Create numpy arrays
Step 4: Plot a line plot using the standard deviation and mean from the array
Step 5: Display the plot using show function
Step 6: Create Pandas Series, plot the correlation in a scatter plot and plot a fit line
Step 7: Display the plot using show function
Step 8: Create numpy arrays, find the correlation coefficient and display it
Step 9: Stop

PROGRAM
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from scipy.stats import norm
    import statistics

    # NORMAL CURVE
    # ... (the statements that create the array and call plt.plot are illegible in the source)
    plt.show()

    # CORRELATION SCATTER PLOTS
    # The assignment targets are illegible in the source; y and x are assumed
    y = pd.Series([1, 2, 3, 4, 3, 5, 4])
    x = pd.Series([1, 2, 3, 4, 5, 6, 7])
    correlation = y.corr(x)
    plt.title('Correlation')
    plt.scatter(x, y)
    # Fit line obtained from a first-degree polynomial fit
    plt.plot(np.unique(x), np.poly1d(np.polyfit(x, y, 1))(np.unique(x)), color='red')
    plt.xlabel('x axis')
    plt.ylabel('y axis')
    plt.show()

    # CORRELATION COEFFICIENT
    x = np.array([11, 2, 7, 4, 15, 6, 10, 8, 9, 1, 11, 5, 13, 6, 15])
    y = np.array([2, 5, 17, 6, 10, 8, 13, 4, 6, 9, 11, 2, 5, 4, 7])
    pearsons_coefficient = np.corrcoef(x, y)
    print("The pearson's coefficient of the x and y inputs are: \n", pearsons_coefficient)

SAMPLE OUTPUT
    [Figure: normal curve]
    [Figure: correlation scatter plot with fit line]
    The pearson's coefficient of the x and y inputs are:
     [[1.         0.11521488]
     [0.11521488 1.        ]]

RESULT
Thus the program to perform Normal curves, Correlation and scatter plots, Correlation coefficient in Python was executed and verified successfully.

REGRESSION

INTRODUCTION
Linear regression and logistic regression are two regression analysis techniques used to solve regression problems in machine learning.
They are the most prominent regression techniques. There are, however, many types of regression analysis techniques in machine learning, and their usage varies according to the nature of the data involved.

REGRESSION ANALYSIS
Regression analysis is a predictive modelling technique that analyzes the relation between the target (dependent) variable and the independent variables in a dataset. The different regression techniques are used when the target and independent variables show a linear or non-linear relationship, and the target variable contains continuous values. Regression is used mainly to determine predictor strength, to forecast trends and time series, and to study cause and effect relations. Regression analysis is the primary technique for solving regression problems in machine learning using data modelling. It involves determining the best fit line, which is the line placed so that its overall distance from the data points is minimized.

TYPES OF REGRESSION ANALYSIS TECHNIQUES
There are many types of regression analysis techniques, and the choice of method depends on several factors. These factors include the type of target variable, the shape of the regression line, and the number of independent variables. Below are the different regression techniques:
Linear Regression
Logistic Regression
Ridge Regression
Lasso Regression
Polynomial Regression
Bayesian Linear Regression
The different types of regression in machine learning are explained below in detail.

LINEAR REGRESSION
Linear regression is one of the most basic types of regression in machine learning. The linear regression model consists of a predictor variable and a dependent variable related linearly to each other. When the data involves more than one independent variable, the model is called multiple linear regression.
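As a minimal sketch of the two cases just described (one predictor versus several), the following fits both with NumPy. The data are made up and chosen to lie exactly on a known line and a known plane, so the recovered coefficients are easy to check; the names `slope`, `intercept`, `design` and `coef` are illustrative and do not come from the manual.

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Simple linear regression: closed-form least-squares slope and intercept.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# Multiple linear regression: a second predictor, solved by least squares.
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y_multi = 1.0 + 2.0 * x + 3.0 * x2

# Design matrix with a column of ones for the intercept term.
design = np.column_stack([np.ones_like(x), x, x2])
coef, residuals, rank, singular_values = np.linalg.lstsq(design, y_multi, rcond=None)

print("slope:", slope, "intercept:", intercept)
print("multiple regression coefficients:", coef)
```

With noisy real data the fit would not be exact; the least-squares solution then gives the coefficients that minimize the sum of squared errors.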
The equation below denotes the linear regression model:
y = mx + c + e
where m is the slope of the line, c is the intercept, and e represents the error in the model.

LOGISTIC REGRESSION
Logistic regression is a regression analysis technique that is used when the dependent variable is discrete, for example 0 or 1, true or false. The target variable can then take only two values, and a sigmoid curve describes the relation between the target variable and the independent variables. The logit function is used in Logistic Regression to measure the relationship between the target variable and the independent variables. The equation below denotes logistic regression:
logit(p) = ln(p/(1-p)) = b0 + b1X1 + b2X2 + b3X3 + ... + bkXk
where p is the probability of occurrence of the feature.
[Figure: sigmoid curve]

RIDGE REGRESSION
Ridge Regression is another type of regression in machine learning, usually used when there is a high correlation between the independent variables. With multicollinear data the least-squares estimates are still unbiased, but their variances are large, so the estimates can lie far from the true values. Ridge Regression therefore introduces a small amount of bias into the estimates through a penalty term in its equation, which reduces their variance. This makes it a powerful regression method in which the model is less susceptible to overfitting.

LASSO REGRESSION
Lasso Regression is a type of regression in machine learning that performs regularization along with feature selection. It penalizes the absolute size of the regression coefficients. As a result, some coefficients shrink all the way to zero, which does not happen in Ridge Regression. Lasso Regression therefore performs feature selection: it picks a subset of the features in the dataset to build the model. Only the required features are used, and the coefficients of the others are set to zero. This helps to avoid overfitting in the model.
When the independent variables are highly collinear, Lasso regression picks only one of them and shrinks the others to zero.
[Figure: Regression Coefficients Progression for Lasso Paths]

POLYNOMIAL REGRESSION
Polynomial Regression is another regression analysis technique in machine learning. It is the same as Multiple Linear Regression with a small modification: the relationship between the independent and dependent variables, X and Y, is modelled as an n-th degree polynomial. It is still a linear model as an estimator, because it is linear in the coefficients, and the least mean squared method is used here as well. The best fit line in Polynomial Regression is not a straight line but a curve, whose shape depends on the power of X, that is, the value of n.

BAYESIAN LINEAR REGRESSION
Bayesian Regression is a type of regression in machine learning that uses Bayes' theorem to find the values of the regression coefficients. In this method the posterior distribution of the coefficients is determined instead of a single least-squares estimate. Bayesian Linear Regression resembles both Linear Regression and Ridge Regression but is more stable than simple Linear Regression.

RESULT
Thus the study of regression was completed successfully.

Z-TEST

AIM
To write a Python program to implement Z-Test.

ALGORITHM
Step 1: Start
Step 2: Import STATSMODEL and NUMPY libraries
Step 3: Determine the level of significance
Step 4: Find the critical value of z in the z-test
Step 5: Calculate the z-test statistics
Step 6: Display the result
Step 7: Stop

PROGRAM
    import math
    import numpy as np
    from numpy.random import randn
    from statsmodels.stats.weightstats import ztest

    mean_iq = 110
    # The numerator is garbled in the source; 15 is assumed
    sd_iq = 15 / math.sqrt(50)
    alpha = 0.05
    null_mean = 100

    # Random sample of 50 values centred on mean_iq
    data = sd_iq * randn(50) + mean_iq
    print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))

    # One-sided z-test against the null mean
    ztest_score, p_value = ztest(data, value=null_mean, alternative='larger')
    if p_value < alpha:
        print("Reject Null Hypothesis")
    else:
        print("Fail to Reject Null Hypothesis")

SAMPLE OUTPUT
    mean=....74 stdv=2.03
    Reject Null Hypothesis

RESULT
Thus the program to implement Z-Test in Python was executed and verified successfully.

T-TEST

AIM
To write a Python program to implement T-Test.

ALGORITHM
Step 1: Start
Step 2: Import SCIPY and NUMPY libraries
Step 3: Get data group 1 and 2 of equal variance
Step 4: Using the function ttest, get the results
Step 5: Display the results
Step 6: Stop

PROGRAM
    import scipy.stats as stats
    import numpy as np

    data_group1 = np.array(list(map(int, input('Enter data group 1: ').split())))
    data_group2 = np.array(list(map(int, input('Enter data group 2: ').split())))

    # Two-sample t-test assuming equal variances
    res = stats.ttest_ind(a=data_group1, b=data_group2, equal_var=True)
    print('test statistic:', res[0])
    print('p-value:', res[1])

SAMPLE OUTPUT
    Enter data group 1: 14 15 15 16 13 8 14 17 16 14 19 20 21 15 15 16 16 13 14 12
    Enter data group 2: 15 17 14 17 14 8 12 19 19 14 17 22 24 16 13 16 13 18 15 13
    test statistic: -0.6337397070250238
    p-value: ...

RESULT
Thus the program to implement T-Test in Python was executed and verified successfully.

ANOVA

AIM
To write a Python program to implement ANOVA.

ALGORITHM
Step 1: Start
Step 2: Import SCIPY, PANDAS, MATPLOTLIB, STATSMODEL and NUMPY libraries
Step 3: Load the dataset
Step 4: Get the ANOVA table by fitting the dataset values to the function ols
Step 5: Display the ANOVA table
Step 6: Plot it into line plots to get a visual interaction
Step 7: Display the plot using show function
Step 8: Stop

PROGRAM
    import pandas as pd
    from scipy import stats
    from matplotlib import pyplot as plt
    import numpy as np
    from statsmodels.formula.api import ols
    import statsmodels.stats.api  # the rest of this line, if any, is cut off in the source
    from statsmodels.graphics. ...  # cut off in the source
    df_ameshousing = pd.read_csv(...)  # the file name is cut off in the source
    # Map the month of sale to a season (several map entries are illegible in the source)
    df_ameshousing['SeasonofYear'] = df_ameshousing['Mo Sold'].map({12: 'Winter', 1: 'Winter', ..., 'Spring', 6: 'Summer', 7: 'Summer', 8: 'Summer', 9: ...})
    # ... (the ols model fit, the ANOVA table and the interaction plot statements are illegible in the source)

SAMPLE OUTPUT
                                         sum_sq      df           F         PR(>F)
    C(SeasonofYear)                1.520669e+11     3.0   10.158870   1.451294e-03
    C(Heating_QC)                  5.699845e+12     4.0  285.584708  5.592686e-118
    C(SeasonofYear):C(Heating_QC)  8.580682e+10    12.0    1.433087   1.590877e-01
    Residual                       1.452979e+13  2912.0         NaN            NaN

RESULT
Thus the program to implement ANOVA in Python was executed and verified successfully.

BUILDING AND VALIDATING LINEAR MODELS
Date:

AIM
To write a Python program to build and validate linear models.

ALGORITHM
Step 1: Start
Step 2: Import SKLEARN, MATPLOTLIB, MPL TOOLKITS and NUMPY libraries
Step 3: Create numpy arrays
Step 4: Estimate coefficients and plot the regression as a scatter and line plot
Step 5: Display the plot using show function
Step 6: Pre-process the data and fit multiple linear regression to the training set
Step 7: Predict the test result and fit the results into a 3D plot
Step 8: Display the plot using show function
Step 9: Stop

PROGRAM
    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib as mpl
    from sklearn import datasets, linear_model
    from mpl_toolkits.mplot3d import Axes3D

    # SIMPLE LINEAR REGRESSION
    def estimate_coef(x, y):
        # Number of points and the means of x and y
        n = np.size(x)
        m_x = np.mean(x)
        m_y = np.mean(y)
        # Cross-deviation and deviation about x
        SS_xy = np.sum(y * x) - n * m_y * m_x
        SS_xx = np.sum(x * x) - n * m_x * m_x
        # Regression coefficients
        b_1 = SS_xy / SS_xx
        b_0 = m_y - b_1 * m_x
        return (b_0, b_1)

    def plot_regression_line(x, y, b):
        plt.scatter(x, y, color="m", marker="o", s=30)
        y_pred = b[0] + b[1] * x
        plt.plot(x, y_pred, color="g")
        plt.xlabel('x')
        plt.ylabel('y')
        plt.show()

    # The data values are illegible in the source; these arrays are assumed
    # and reproduce the coefficients shown in the sample output
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {} \nb_1 = {}".format(b[0], b[1]))
    plot_regression_line(x, y, b)

    # MULTIPLE LINEAR REGRESSION
    def generate_dataset(n):
        x = []
        y = []
        random_x1 = np.random.rand()
        random_x2 = np.random.rand()
        for i in range(n):
            x1 = i  # this assignment is illegible in the source; x1 = i is assumed
            x2 = i / 2 + np.random.rand() * n
            x.append([1, x1, x2])
            y.append(random_x1 * x1 + random_x2 * x2 + 1)
        return np.array(x), np.array(y)

    x, y = generate_dataset(200)

    mpl.rcParams['legend.fontsize'] = 12
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    # The label text is cut off in the source; 'y' is assumed
    ax.scatter(x[:, 1], x[:, 2], y, label='y')
    ax.legend()
    ax.view_init(45, 0)
    plt.show()

SAMPLE OUTPUT
    Estimated coefficients:
    b_0 = 1.2363636363636363
    b_1 = 1.1696969696969697

BUILDING AND VALIDATING LOGISTIC MODELS
Date:

AIM
To write a Python program to build and validate logistic models.

ALGORITHM
Step 1: Start
Step 2: Import SKLEARN, MATPLOTLIB, PANDAS and NUMPY libraries
Step 3: Load the dataset
Step 4: Normalize and split into train and test sets
Step 5: Perform logistic regression
Step 6: Get the cost after every iteration and display it
Step 7: Visualize the cost and iteration as a line plot
Step 8: Display the plot using show function
Step 9: Display the accuracy of train and test
Step 10: Stop

PROGRAM
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split

    data = pd.read_csv("data.csv")
    # ... (only fragments of the next statements are legible in the source: a call
    #      ending in "inplace=True)" and a list comprehension ending in
    #      "0 for each in data.diagnosis]"; the normalization and the
    #      train/test split statements are illegible)

    def initialize_weights_and_bias(dimension):
        w = np.full((dimension, 1), 0.01)
        b = 0.0
        return w, b

    def sigmoid(z):
        y_head = 1 / (1 + np.exp(-z))
        return y_head

    def forward_backward_propagation(w, b, x_train, y_train):
        # Forward pass: predictions and cross-entropy cost
        z = np.dot(w.T, x_train) + b
        y_head = sigmoid(z)
        loss = -y_train * np.log(y_head) - (1 - y_train) * np.log(1 - y_head)
        cost = (np.sum(loss)) / x_train.shape[1]
        # Backward pass: gradients of the cost
        derivative_weight = (np.dot(x_train, ((y_head - y_train).T))) / x_train.shape[1]
        derivative_bias = np.sum(y_head - y_train) / x_train.shape[1]
        gradients = {"derivative_weight": derivative_weight, "derivative_bias": derivative_bias}
        return cost, gradients

    def update(w, b, x_train, y_train, learning_rate, number_of_iterarion):
        cost_list = []
        cost_list2 = []
        index = []
        for i in range(number_of_iterarion):
            cost, gradients = forward_backward_propagation(w, b, x_train, y_train)
            cost_list.append(cost)
            # Gradient descent step
            w = w - learning_rate * gradients["derivative_weight"]
            b = b - learning_rate * gradients["derivative_bias"]
            if i % 10 == 0:
                cost_list2.append(cost)
                index.append(i)
                print("Cost after iteration %i: %f" % (i, cost))
        parameters = {"weight": w, "bias": b}
        plt.plot(index, cost_list2)
        plt.xticks(index, rotation='vertical')
        plt.xlabel("Number of Iterarion")
        plt.ylabel("Cost")
        plt.show()
        return parameters, gradients, cost_list

    def predict(w, b, x_test):
        z = sigmoid(np.dot(w.T, x_test) + b)
        # ... (the statements that turn z into 0/1 predictions of width
        #      x_test.shape[1] and return them are illegible in the source)

    def logistic_regression(x_train, y_train, x_test, y_test, learning_rate, num_iterations):
        dimension = x_train.shape[0]
        w, b = initialize_weights_and_bias(dimension)
        parameters, gradients, cost_list = update(w, b, x_train, y_train, learning_rate, num_iterations)
        y_prediction_test = predict(parameters["weight"], parameters["bias"], x_test)
        y_prediction_train = predict(parameters["weight"], parameters["bias"], x_train)
        print("train accuracy: {} %".format(100 - np.mean(np.abs(y_prediction_train - y_train)) * 100))
        print("test accuracy: {} %".format(100 - np.mean(np.abs(y_prediction_test - y_test)) * 100))

    logistic_regression(x_train, y_train, x_test, y_test, learning_rate=1, num_iterations=100)

SAMPLE OUTPUT
    Cost after iteration 0: 0.692836
    Cost after iteration 10: 0.498576
    Cost after iteration 20: 0.404996
    Cost after iteration 30: 0.350059
    Cost after iteration 40: 0.313747
    Cost after iteration 50: 0.287767
    Cost after iteration 60: 0.268114
    Cost after iteration 70: 0.252627
    Cost after iteration 80: 0.240036
    Cost after iteration 90: 0.229543
    train accuracy: 94.40993788819875 %
    test accuracy: 94.18604651162791 %

RESULT
Thus the program to build and validate logistic models in Python was executed and verified successfully.

TIME SERIES ANALYSIS

AIM
To write a Python program to implement Time series analysis.

ALGORITHM
Step 1: Start
Step 2: Import PANDAS, MATPLOTLIB, SEABORN and NUMPY libraries
Step 3: Load the dataset and plot the time based column
Step 4: Display the plot using show function
Step 5: Stop

PROGRAM
    import numpy as np
    import pandas as pd
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    import seaborn as sns

    df = pd.read_csv('AirPassengers.csv')
    df.columns = ['Date', 'Number of Passengers']

    def plot_df(df, x, y, title="", xlabel='Date', ylabel='Number of Passengers', dpi=100):
        plt.figure(figsize=(15, 4), dpi=dpi)
        # The color name is cut off after 'tab:' in the source; 'tab:red' is assumed
        plt.plot(x, y, color='tab:red')
        plt.title('TIME SERIES ANALYSIS')
        plt.show()

    # The title text is partly cut off in the source
    plot_df(df, x=df['Date'], y=df['Number of Passengers'], title='Number of US Airline passengers from 1949 to 1960')

SAMPLE OUTPUT
    [Figure: line plot titled TIME SERIES ANALYSIS]

RESULT
Thus the program to implement Time series analysis in Python was executed and verified successfully.
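A common next step after plotting a raw series is to parse the date strings and smooth the values. The sketch below is an illustration only: it uses made-up monthly values instead of AirPassengers.csv, assumes the dates are written as year-month strings such as 1949-01, and computes a 12-month rolling mean. The column names match the program; the names `series` and `trend` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical two years of monthly data (values are made up, not the real dataset).
dates = [f"{1949 + i // 12}-{i % 12 + 1:02d}" for i in range(24)]
values = 100 + 10 * np.arange(24)
df = pd.DataFrame({"Date": dates, "Number of Passengers": values})

# Parse the strings into real timestamps so pandas treats the column as time.
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m")
series = df.set_index("Date")["Number of Passengers"]

# 12-month rolling mean: each point averages the current month and the 11 before it,
# so the first 11 entries have no value (NaN).
trend = series.rolling(window=12).mean()

print(trend.tail())
```

The smoothed `trend` series can be passed to a plotting function in the same way as the raw column, which makes the long-term movement easier to see than in the month-to-month values.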
