Assignment 6

This document discusses analyzing avocado pricing data from different regions in the United States. Various data wrangling and visualization exercises are presented to explore trends in average avocado prices over time for different regions. Key findings include that average avocado prices have increased overall from 2015 to 2018, but pricing trends differ between individual regions. Faceted graphs are useful for comparing trends between regions. A linear regression model finds a moderate positive relationship between date and average price for the total US.

Uploaded by

Ray Guo

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

112 views9 pages

Assignment 6

Uploaded by

Ray Guo

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 9

Assignment 6: Avocado outrage

Raymond Guo
2020-02-26

Exercise 1
Each observation covers a span of 7 days (168 hours).
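
One way to check the spacing between observations is to difference the sorted unique dates. The sketch below uses a small hypothetical vector of weekly dates in place of the `Date` column of `avocado`; the values are invented for illustration and are not taken from the dataset.

```r
# Hypothetical weekly dates standing in for avocado$Date (assumed values).
dates <- as.Date(c("2015-01-04", "2015-01-11", "2015-01-18", "2015-01-25"))

# Gap between consecutive observation dates, in days.
gaps <- diff(sort(unique(dates)))
gap_days <- as.numeric(gaps, units = "days")

gap_days        # every gap is 7 days
gap_days * 24   # the same gaps expressed in hours (168)
```

On the real data, the same idea would be applied to the dates of a single region, since every region repeats the same weekly dates.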

Exercise 2

avocado %>%
select(region) %>%
unique()

region
Albany
Atlanta
BaltimoreWashington
Boise
Boston
BuffaloRochester
California
Charlotte
Chicago
CincinnatiDayton
Columbus
DallasFtWorth
Denver
Detroit
GrandRapids
GreatLakes
HarrisburgScranton
HartfordSpringfield
Houston
Indianapolis
Jacksonville
LasVegas
LosAngeles
Louisville
MiamiFtLauderdale
Midsouth
Nashville
NewOrleansMobile
NewYork
Northeast
NorthernNewEngland
Orlando
Philadelphia
PhoenixTucson
Pittsburgh
Plains
Portland
RaleighGreensboro
RichmondNorfolk
Roanoke
Sacramento
SanDiego
SanFrancisco
Seattle
SouthCarolina
SouthCentral
Southeast
Spokane
StLouis
Syracuse
Tampa
TotalUS
West
WestTexNewMexico

i. All of the regions in the dataset are listed above.

ii. BaltimoreWashington
iii. TotalUS

Exercise 3
avocado %>%
filter(region == "TotalUS") %>%
ggplot() +
geom_line(mapping = aes(x = Date, y = AveragePrice)) +
labs(title = "Time series of average avocado price", x = "Date", y = "Average Price/$")

[Figure: Time series of average avocado price. x-axis: Date (2015 to 2018); y-axis: Average Price/$]

Exercise 4

avocado %>%
filter(region == "TotalUS") %>%
ggplot() +
geom_line(mapping = aes(x = Date, y = AveragePrice)) +
labs(title = "Time series of average avocado price", x = "Date", y = "Average Price/$") +
geom_smooth(mapping = aes(x = Date, y = AveragePrice), span = 0.1, se = FALSE, color = "blu

[Figure: Time series of average avocado price with a smoothed trend line. x-axis: Date (2015 to 2018); y-axis: Average Price/$]

Decreasing the span makes the smoothed line follow the original series more closely.
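
The effect of the span can be checked numerically. The sketch below assumes the smoother is loess (the default method of `geom_smooth()` for fewer than 1,000 observations) and uses a hypothetical noisy series rather than the avocado prices. It compares how far a narrow-span and a wide-span fit sit from the raw values.

```r
# Hypothetical noisy series standing in for the weekly TotalUS prices.
set.seed(1)
x <- 1:150
y <- 1.2 + 0.3 * sin(x / 8) + rnorm(length(x), sd = 0.05)

# Fit loess smoothers with a small and a large span.
fit_narrow <- loess(y ~ x, span = 0.1)
fit_wide   <- loess(y ~ x, span = 0.9)

# Mean squared distance between each smooth and the raw series.
mse_narrow <- mean((y - fitted(fit_narrow))^2)
mse_wide   <- mean((y - fitted(fit_wide))^2)

# The smaller span tracks the data more closely.
mse_narrow < mse_wide
```

A very small span also follows the noise, so the choice is a trade-off between smoothness and fidelity to the raw series.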

Exercise 5

avocado_regional <- avocado %>%
filter(region %in% c("TotalUS", "BaltimoreWashington", "Albany"))

Exercise 6

avocado_regional %>%
ggplot() +
geom_smooth(mapping = aes(x = Date, y = AveragePrice, color = region), span = 0.1, method =
labs(title = "Time series of average avocado price", x = "Date", y = "Average Price/$")

[Figure: Time series of average avocado price, smoothed and colored by region (Albany, BaltimoreWashington, TotalUS). x-axis: Date (2015 to 2018); y-axis: Average Price/$]

ggplot(data = avocado_regional) +
geom_smooth(mapping = aes(x = Date, y = AveragePrice, color = region), span = .3, method = "l
facet_grid(. ~ region)

[Figure: Smoothed AveragePrice against Date (2015 to 2018), faceted into one panel each for Albany, BaltimoreWashington, and TotalUS]

Faceting makes it easier to see each region and analyze its trend on its own, whereas in the overlaid graph it is quite difficult to see what each region is doing.
Faceting also makes comparative analysis harder, such as judging which region is performing better than another over a given time frame. That comparison is easy to make when all three series are overlaid on one graph.

Exercise 7

avocado_usa <- avocado %>%
filter(region == "TotalUS")

avocado_usa %>%
ggplot() +
geom_smooth(mapping = aes(x = Date, y = AveragePrice, color = region), span = 0.1, method =
labs(title = "Time series of average avocado price", x = "Date", y = "Average Price/$")

[Figure: Time series of average avocado price, smoothed, for TotalUS. x-axis: Date (2015 to 2018); y-axis: Average Price/$]

avocado_usa_model <- lm(AveragePrice ~ Date, data = avocado_usa)

avocado_usa_model %>%
tidy()

term         estimate    std.error  statistic  p.value
(Intercept)  -3.1886561  0.5822833  -5.476125  2e-07
Date          0.0002514  0.0000342   7.352999  0e+00

avocado_usa_model %>%
glance() %>%
select(r.squared)

r.squared
0.2445715
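
To read these numbers, note that R stores a `Date` as a count of days, so the `Date` estimate is a price change per day. The short sketch below converts it to an approximate yearly change and recovers the correlation from the reported R-squared. It uses only the values printed above; the variable names are chosen for illustration.

```r
# Values copied from the tidy() and glance() output above.
slope_per_day <- 0.0002514
r_squared     <- 0.2445715

# Approximate change in average price per year (365 days).
slope_per_year <- slope_per_day * 365
slope_per_year   # about 0.092 dollars per year

# Correlation between Date and AveragePrice; positive because the slope is positive.
r <- sqrt(r_squared)
r                # about 0.49
```

A correlation near 0.49 fits the description of a moderate positive relationship between date and average price.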

avocado_usa_model %>%
ggplot() +
geom_point(mapping = aes(x = Date, y = AveragePrice)) +
geom_abline(slope = avocado_usa_model$coefficients[2], intercept = avocado_usa_model$coeffici

[Figure: Scatter plot of AveragePrice against Date (2015 to 2018) for TotalUS with the fitted regression line]

Exercise 8
According to information found using Google, “avocado trees is best harvested when immature, green
and hard and ripened off the tree”. Harvesting in September (fall) best meets those conditions.
That is when avocado supply and quality both go up, which pushes the price down.
In other seasons, such as spring and summer, quality and quantity are both lacking, which results
in sharply higher prices.

Exercise 9
avocado %>%
ggplot() +
geom_point(mapping = aes(x = TotalVolume, y = AveragePrice))

[Figure: Scatter plot of AveragePrice (y-axis) against TotalVolume (x-axis, 0e+00 to 6e+07)]

The data points with the lowest volume do not show a clear correlation with price.
There is a slight negative relationship in which lower prices go with higher volume.
The remaining data points, which sit well apart from the rest, suggest that people purchase
avocados during the harvest season.

Exercise 10

avocado_usa_model1 <- lm(AveragePrice ~ TotalVolume, data = avocado_usa)

avocado_usa_model1 %>%
tidy()

term         estimate  std.error  statistic  p.value
(Intercept)  1.581618  0.0649437  24.353671  0
TotalVolume  0.000000  0.0000000  -7.661189  0

avocado_usa_model1 %>%
glance() %>%
select(r.squared)

r.squared
0.2600595

avocado_usa_df <- avocado_usa %>%
add_predictions(avocado_usa_model1) %>%
add_residuals(avocado_usa_model1)

ggplot(avocado_usa_df) +
geom_point(mapping = aes(pred, AveragePrice)) +
geom_abline(slope = 1, intercept = 0, color = "red", size = 1)

[Figure: AveragePrice (y-axis) against pred (x-axis) with a red reference line of slope 1]

ggplot(avocado_usa_df) +
geom_point(aes(pred, resid)) +
geom_ref_line(h = 0)

[Figure: resid (y-axis) against pred (x-axis) with a reference line at 0]

ggplot(data = avocado_usa_df) +
geom_qq(mapping = aes(sample = resid)) +
geom_qq_line(mapping = aes(sample = resid))

[Figure: Q-Q plot of resid. x-axis: theoretical; y-axis: sample]

The observed vs. predicted plot shows a roughly linear relationship, the residuals vs. predicted plot
shows the residuals scattered evenly around zero, and the points in the Q-Q plot stay close to the
reference line, so the residuals are approximately normal. With all 3 conditions satisfied, this model is reliable.

Exercise 11
new <- avocado %>%
filter(region == "Albany" | region == "BaltimoreWashington") %>%
group_by(region) %>%
summarize(mean_avocado_price = mean(AveragePrice, na.rm = TRUE))

The difference between the mean average prices of the two regions is 0.004556.

obs_stat <- avocado %>%
filter(region == "Albany" | region == "BaltimoreWashington") %>%
specify(AveragePrice ~ region) %>%
calculate(stat = "diff in means", order = c("Albany", "BaltimoreWashington"))

Exercise 12
i. The null hypothesis states that there is no difference between the mean average price of the 2
regions, and the alternative hypothesis states that there is a difference between the
mean average price of the 2 regions.
ii.
null <- avocado %>%
filter(region == "Albany" | region == "BaltimoreWashington") %>%
specify(AveragePrice ~ region) %>%
hypothesize(null = "independence") %>%
generate(reps = 10000, type = "permute") %>%
calculate(stat = "diff in means", order = c("Albany", "BaltimoreWashington"))

iii.
null %>%
get_p_value(obs_stat = obs_stat, direction = "right")

p_value
0.4221

iv.
null %>%
visualize() +
shade_p_value(obs_stat = obs_stat, direction = "right")

[Figure: Simulation-Based Null Distribution. Histogram of stat (x-axis, about -0.05 to 0.05) against count (y-axis), with the p-value region shaded]

v. The p-value (0.4221) is greater than 0.05, so we fail to reject the null hypothesis.
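
The permutation logic behind the steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not the `infer` pipeline itself: the prices are invented values rather than the avocado data, and `diff_in_means()` and `decide()` are hypothetical helper names.

```r
# Hypothetical prices for two regions (invented values, not the avocado data).
set.seed(42)
price  <- c(rnorm(50, mean = 1.56, sd = 0.3), rnorm(50, mean = 1.53, sd = 0.3))
region <- rep(c("Albany", "BaltimoreWashington"), each = 50)

# Difference in means, Albany minus BaltimoreWashington.
diff_in_means <- function(p, g) {
  mean(p[g == "Albany"]) - mean(p[g == "BaltimoreWashington"])
}
obs_diff <- diff_in_means(price, region)

# Null distribution: shuffle the region labels and recompute the statistic.
null_diffs <- replicate(2000, diff_in_means(price, sample(region)))

# Right-tailed p-value: share of shuffled statistics at least as large as observed.
p_val <- mean(null_diffs >= obs_diff)

# Decision rule at significance level alpha.
decide <- function(p, alpha = 0.05) {
  if (p < alpha) "reject the null hypothesis" else "fail to reject the null hypothesis"
}
decide(p_val)
decide(0.4221)  # the p-value reported in part iii: "fail to reject the null hypothesis"
```

The null hypothesis is rejected only when the p-value falls below the chosen significance level, so a p-value of 0.4221 does not support rejecting it at the 0.05 level.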
