
Correlation and Regression (TP)

This document analyzes the correlation between radio advertising (the independent variable) and sales (the dependent variable) using a dataset containing 199 observations. It calculates the Pearson correlation coefficient, splits the data into training and test sets, builds a linear regression model on the training set to predict sales based on radio spending, makes predictions on the test set, and plots the actual and predicted values to visualize the regression line. It also defines a function to manually calculate the Pearson correlation coefficient.

Uploaded by Hiimay Channel

7/2/2021 Correlation and Regression(TP).ipynb - Colaboratory

import pandas as pd

from google.colab import files
uploaded = files.upload()

Reg_Advertising.csv(application/vnd.ms-excel) - 4732 bytes, last modified: 7/2/2021 - 100% done
Saving Reg_Advertising.csv to Reg_Advertising.csv

filename = next(iter(uploaded))
dataset=pd.read_csv(filename,encoding='latin1')
dataset.head()
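The `files.upload()` widget only works inside Colab. Below is a minimal sketch of reading the file directly, for example with `pd.read_csv('Reg_Advertising.csv', encoding='latin1', index_col=0)`, assuming the CSV sits in the working directory. Passing `index_col=0` is a suggestion based on the `Unnamed: 0` column above. It turns the saved row numbers into the index instead of an extra data column. Here the five rows shown by `head()` stand in for the file through `io.StringIO`.

```python
import io

import pandas as pd

# The five rows displayed by dataset.head(), standing in for Reg_Advertising.csv.
# The first header field is empty, which is why pandas names it "Unnamed: 0".
csv_text = """,TV,Radio,Newspaper,Sales
1,230.1,37.8,69.2,22.1
2,44.5,39.3,45.1,10.4
3,17.2,45.9,69.3,9.3
4,151.5,41.3,58.5,18.5
5,180.8,10.8,58.4,12.9
"""

# Without index_col the saved row numbers become an extra "Unnamed: 0" column.
raw = pd.read_csv(io.StringIO(csv_text))
# With index_col=0 they are used as the row index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(raw.columns))    # ['Unnamed: 0', 'TV', 'Radio', 'Newspaper', 'Sales']
print(list(clean.columns))  # ['TV', 'Radio', 'Newspaper', 'Sales']
```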

   Unnamed: 0     TV  Radio  Newspaper  Sales
0           1  230.1   37.8       69.2   22.1
1           2   44.5   39.3       45.1   10.4
2           3   17.2   45.9       69.3    9.3
3           4  151.5   41.3       58.5   18.5
4           5  180.8   10.8       58.4   12.9

dataset.plot(kind='scatter', x='Radio', y='Sales')

<matplotlib.axes._subplots.AxesSubplot at 0x7f49e3c71410>

variabel = dataset[['Radio', 'Sales']]
korelasi = variabel.corr(method='pearson')
# korelasi = variabel.corr(method='spearman')  # rank-based alternative
korelasi

https://colab.research.google.com/drive/1Tv7K1YA8fw8RJ5RPFZwnc-p03gOcatBe?authuser=1#scrollTo=gvXjr6_eHHO1&uniqifier=4&printMode=true 1/4
7/2/2021 Correlation and Regression(TP).ipynb - Colaboratory

          Radio     Sales
Radio  1.000000  0.577071
Sales  0.577071  1.000000
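Pearson's r is the covariance of the two columns divided by the product of their standard deviations. The following minimal sketch checks `DataFrame.corr` against that formula and against `np.corrcoef`. It also shows that Spearman's coefficient (the commented-out alternative) is Pearson's r computed on ranks. It uses only the five rows printed by `head()`, so its values will not match the 0.577071 obtained on the full data.

```python
import numpy as np
import pandas as pd

# Radio and Sales values from the five rows shown by dataset.head().
df = pd.DataFrame({
    "Radio": [37.8, 39.3, 45.9, 41.3, 10.8],
    "Sales": [22.1, 10.4, 9.3, 18.5, 12.9],
})

# Pearson r = cov(x, y) / (std(x) * std(y)), all with the same ddof.
x, y = df["Radio"].to_numpy(), df["Sales"].to_numpy()
r_manual = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
r_pandas = df.corr(method="pearson").loc["Radio", "Sales"]
r_numpy = np.corrcoef(x, y)[0, 1]

# Spearman rho = Pearson r applied to the ranks of each column.
rho_pandas = df.corr(method="spearman").loc["Radio", "Sales"]
rho_manual = df.rank().corr(method="pearson").loc["Radio", "Sales"]

print(r_manual, r_pandas, r_numpy)
print(rho_pandas, rho_manual)
```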


n=len(dataset)
      
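# splitting dataframe by row index: the first 150 rows are used for training
dt_train = dataset.iloc[:150,:]
# iloc[151:] starts one row late, so row 150 ends up in neither set; iloc[150:] would keep it
dt_test = dataset.iloc[151:,:]
print("Shape of new dataframes - {}, {}".format(dt_train.shape, dt_test.shape))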
 
# # splitting dataframe in a particular size
# df_split = dataset.sample(frac=0.6,random_state=200)
# df_split.reset_index()

Shape of new dataframes - (150, 5), (48, 5)
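The commented-out lines hint at a random split. Below is a minimal sketch of one: sample a training fraction with a fixed `random_state`, then take every remaining row as the test set with `drop`. With this approach the two sets always add up to the full data. The toy frame and the 0.75 fraction are illustrative assumptions, not values from the notebook.

```python
import pandas as pd

# Hypothetical stand-in for `dataset`: 199 rows, like the advertising data.
dataset = pd.DataFrame({"Radio": range(199), "Sales": range(199)})

# Random split: 75% of rows for training, with a fixed seed for reproducibility...
dt_train = dataset.sample(frac=0.75, random_state=200)
# ...and every row that was not sampled goes to the test set.
dt_test = dataset.drop(dt_train.index)
print(dt_train.shape, dt_test.shape)

# The ordered split, written so that no row is skipped between the two sets.
ordered_train, ordered_test = dataset.iloc[:150], dataset.iloc[150:]
print(ordered_train.shape, ordered_test.shape)  # (150, 2) (49, 2)
```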

dt_test.tail()

     Unnamed: 0     TV  Radio  Newspaper  Sales
194         195  149.7   35.6        6.0   17.3
195         196   38.2    3.7       13.8    7.6
196         197   94.2    4.9        8.1    9.7
197         198  177.0    9.3        6.4   12.8
198         199  283.6   42.0       66.2   25.5

from sklearn.linear_model import LinearRegression
 
x=dt_train[['Radio']]
y=dt_train[['Sales']]
 
# Linear Regression Model
linreg=LinearRegression()
# Fit the model on the training data
linreg.fit(x,y)
 
print('Regression coefficient:', linreg.coef_)  # slope of Sales on Radio

Regression coefficient: [[0.19756537]]
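With a single feature, least squares has a closed form. The slope is b = cov(x, y) / var(x), which equals r · s_y / s_x, and the intercept is a = ȳ − b·x̄. The coefficient above is therefore tied directly to the correlation computed earlier. The following sketch checks these identities on hypothetical toy data against NumPy's `polyfit`. In the notebook itself, `linreg.intercept_` holds the fitted intercept.

```python
import numpy as np

# Hypothetical toy data: a noisy linear relationship.
rng = np.random.default_rng(0)
x = rng.uniform(0, 50, size=100)
y = 9.0 + 0.2 * x + rng.normal(0, 2, size=100)

# Closed-form simple linear regression (same ddof in cov and var).
slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()

# The same slope expressed through Pearson's r and the standard deviations.
r = np.corrcoef(x, y)[0, 1]
slope_from_r = r * np.std(y, ddof=1) / np.std(x, ddof=1)

# np.polyfit with deg=1 returns [slope, intercept] for comparison.
poly_slope, poly_intercept = np.polyfit(x, y, 1)

print(slope, slope_from_r, poly_slope)
print(intercept, poly_intercept)
```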

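# make predictions on the test data, using the same Radio feature the model was fitted on
# (the output shown below was produced with dt_test[['Sales']] as the input)
x_test=dt_test[['Radio']]
y_pred=linreg.predict(x_test)
 
# compare the predictions with the test data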
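result = dt_test[['Sales']].copy()  # an explicit copy avoids pandas' SettingWithCopyWarning
result["y_pred"] = y_pred.ravel()  # flatten the (n, 1) prediction array into one column
result.head()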

/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:7: SettingWithCopyWar
A value is trying to be set on a copy of a slice from a DataFrame.

Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable


import sys

     Sales     y_pred
151   11.6  11.586927
152   16.6  12.574754
153   19.0  13.048910
154   15.6  12.377188
155    3.2   9.927378
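The table lets you compare predictions with observed sales only by eye. Below is a minimal NumPy sketch of two standard summary scores: mean squared error and the coefficient of determination R². `sklearn.metrics.mean_squared_error` and `sklearn.metrics.r2_score` compute the same quantities. The arrays reuse the five rows printed above purely to illustrate the arithmetic.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average squared residual."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Sales and y_pred from the five rows shown above, used only as example inputs.
sales = [11.6, 16.6, 19.0, 15.6, 3.2]
pred = [11.586927, 12.574754, 13.048910, 12.377188, 9.927378]

print(mse(sales, pred), r2(sales, pred))
```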

# prepare plot
import matplotlib.pyplot as plt
plt.scatter(x, y, color='red')  # training data
plt.scatter(dt_test['Radio'], y_pred, color='blue')  # predictions for the test set
plt.plot(x, linreg.predict(x), color='green')  # fitted regression line
plt.xlabel('Radio')
plt.ylabel('Sales')
plt.show()

import math
 
def average(x):
  assert len(x) > 0
  return float(sum(x)) / len(x)

 
def pearson_def(x, y):
  assert len(x) == len(y)
  n = len(x)
  assert n > 0
  avg_x = average(x)
  avg_y = average(y)
  diffprod = 0
  xdiff2 = 0
  ydiff2 = 0
  for idx in range(n):
      xdiff = x[idx] - avg_x
      ydiff = y[idx] - avg_y
      diffprod += xdiff * ydiff
      xdiff2 += xdiff * xdiff
      ydiff2 += ydiff * ydiff
  return diffprod / math.sqrt(xdiff2 * ydiff2)

hasil_pearson = pearson_def([1,2,3], [1,5,7])
hasil_pearson  # about 0.982
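The loop in `pearson_def` can also be written with NumPy array operations. The following minimal sketch uses a hypothetical name, `pearson_np`. It computes the same quantity: the sum of the products of the deviations over the square root of the product of the summed squared deviations. The result is checked against `np.corrcoef` on the notebook's example input.

```python
import math

import numpy as np

def pearson_np(x, y):
    """Pearson r via centred vectors: sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if x.shape != y.shape or x.size == 0:
        raise ValueError("x and y must be non-empty and the same length")
    dx, dy = x - x.mean(), y - y.mean()  # deviations from the means
    return float(np.sum(dx * dy) / math.sqrt(np.sum(dx * dx) * np.sum(dy * dy)))

r = pearson_np([1, 2, 3], [1, 5, 7])
print(round(r, 3))  # 0.982
```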
