Milestone 6 Solution Sheet

This document analyzes California traffic collision data to identify trends related to alcohol use and inattention. The data comes from the California Highway Patrol and is stored across two tables, collisions and parties. Several SQL queries are run against the data: 1) About 3.6% of all parties were found at fault while under the influence of alcohol, and about 18,000 at-fault parties had inattention recorded as a factor. 2) Fridays and Thursdays had the highest number of overall collisions, while Sundays had the fewest. 3) Sundays saw the most collisions involving a party at fault and under the influence of alcohol, while Tuesdays saw the fewest.

Uploaded by

api-708555321

Milestone 6 | Traffic Collisions in California

INTRODUCTION: Data is often stored across multiple tables to keep the storage
requirements compact, and to organize different types of data. Knowing how to
use a join is a vital skill when working with data, since bringing tables together can
open the door to additional insights that are cumbersome or impossible looking at
just one table at a time.

In this Milestone, you’ll use your proficiency with joins to help a reporter in California
use data to support an article they’re writing on the causes of motor vehicle
accidents. In particular, they want some information about how many accidents are
caused by the influence of alcohol, or due to inattention (such as using a cell phone
to text or talk to others), and when these types of accidents tend to occur.

HOW IT WORKS: Follow the prompts in the questions below to investigate your
data. Post your answers in the provided boxes: the yellow boxes for the queries you
write, purple boxes for visualizations and blue boxes for text-based answers. When
you're done, export your document as a pdf file and submit it on the Milestone page
– see instructions for creating a PDF at the end of the Milestone.

RESOURCES: If you need hints on the Milestone or are feeling stuck, there are
multiple ways of getting help. Attend Drop-In Hours to work on these problems with
your peers, or reach out to the HelpHub if you have questions. Good luck!

PROMPT: To help the reporters out, you will be making use of data regarding traffic
accidents in the state of California released by the California Highway Patrol.
Certain insights can be found by looking at data on the incident level, while other
insights are possible by looking deeper at the parties involved in an incident. But to
make insights across those two levels, we need a join to be able to relate the unique
information contained in each table.

SQL App: Here’s that link to our specialized SQL app, where you’ll write your SQL
queries and interact with the data.

— Data Set Description

Data for this Milestone comes from the California Highway Patrol’s Statewide
Integrated Traffic Records System (SWITRS). The SWITRS data we’ve provided
(switrs.*) consists of two tables from the 2019 data collection: collisions and
parties. The tables are related hierarchically. At the top level, there is a unique row
and identifier for each incident in the collisions table. Then, in the lower level, each
collision is between one or more parties, which include vehicles, pedestrians, etc.

The original collisions table has 469,664 rows and 76 columns, but we’ll be focusing
on only the following four columns in this Milestone:

● case_id - unique identifier for each collision
● collision_time - time of day when collision occurred, in 24 hour format
● day_of_week - day of week when collision occurred. Note that numbering
starts at 1 = Monday and ends at 7 = Sunday (instead of 0 = Sunday)
● party_count - number of parties involved in the collision

The original parties table has 940,216 rows and 33 columns, with the following five
columns of interest:

● case_id - associated with a collision with matching case_id, may not be
unique
● party_number - numbering of parties involved, always starts from 1 for each
collision
● at_fault - Y/N indicating whether party was at fault for collision
● party_sobriety - encodings for whether or not the party had been drinking
● oaf_1, oaf_2 - encodings for other associated factors

Most of the features in the dataset are coded in some way for efficient data
storage, which can make working with highly detailed data like this tricky. This
includes the party_sobriety, oaf_1, and oaf_2 columns you’ll be investigating in
the Milestone. Don’t sweat that point, though: the instructions will explain the
encoding values relevant to the tasks.

If you’re curious to explore the data further on your own, or want to see what the
parts of the dataset that aren’t provided here look like, you can find a comprehensive
description of the full data on the SWITRS information page.

— Task 1: How frequently does alcohol use or lack of attention feature in accidents?

To start, we should run some queries on the parties table to understand how fault,
alcohol use, and inattention are attributed to accidents.

A. Write a query and answer the following question: How many parties are cited
as being at fault for a collision?

SELECT
    COUNT(case_id),
    at_fault
FROM switrs.parties
GROUP BY at_fault

There are 438,491 parties that are cited as being at fault for a
collision. Out of 940,216 total parties, this is roughly 47%.

B. The party_sobriety field takes on a value of 'B' when the party is known to
have been drinking, and under the influence of alcohol. Modify your query
from part A to answer the following question: How many parties were found
at fault while under the influence of alcohol?

SELECT
    COUNT(case_id),
    party_sobriety
FROM switrs.parties
WHERE at_fault = 'Y'
    AND party_sobriety = 'B'
GROUP BY party_sobriety

33,512 parties were found at fault while under the influence of
alcohol. That is roughly 3.6% of all 940,216 parties, or about 7.6%
of the 438,491 at-fault parties.
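
The percentages in parts A and B depend on which denominator is used. As a quick arithmetic check, here is a small Python sketch that uses only the counts reported in the answers above; the helper name share is just an illustrative choice.

```python
# Counts reported in the answers above (switrs.parties, 2019 collection).
total_parties = 940_216
at_fault_parties = 438_491
at_fault_drinking = 33_512


def share(part: int, whole: int) -> float:
    """Return part / whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# At-fault parties as a share of all parties.
print(share(at_fault_parties, total_parties))       # 46.6
# At-fault, drinking parties as a share of all parties.
print(share(at_fault_drinking, total_parties))      # 3.6
# At-fault, drinking parties as a share of at-fault parties only.
print(share(at_fault_drinking, at_fault_parties))   # 7.6
```

Note that all three figures are shares of parties. Expressing them as shares of collisions would need a count of distinct case_id values instead.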

C. The oaf_1 or oaf_2 feature takes on a value of 'F' if inattention was a factor in
the collision. Modify your query to answer the following question: How many
parties were found at fault while lack of attention was a factor in the collision?

SELECT
    COUNT(case_id),
    at_fault
FROM switrs.parties
WHERE (oaf_1 = 'F'
    OR oaf_2 = 'F')
GROUP BY at_fault

There were 18,311 parties found at fault where inattention was a
factor (the at_fault = 'Y' row of the output).
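
The query above reads the at-fault count off the grouped output. An alternative is to filter on at_fault directly in the WHERE clause, but then the OR needs parentheses, because AND binds more tightly than OR in SQL. The sketch below illustrates this on a tiny in-memory SQLite table; the rows are invented stand-ins for switrs.parties, not real SWITRS data.

```python
import sqlite3

# Toy stand-in for switrs.parties; every row here is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parties (case_id INTEGER, at_fault TEXT, oaf_1 TEXT, oaf_2 TEXT)"
)
conn.executemany(
    "INSERT INTO parties VALUES (?, ?, ?, ?)",
    [
        (1, "Y", "F", "N"),  # at fault, inattention in oaf_1
        (1, "N", "N", "N"),  # not at fault, no inattention
        (2, "Y", "N", "F"),  # at fault, inattention in oaf_2
        (2, "N", "N", "F"),  # NOT at fault, inattention in oaf_2
        (3, "Y", "N", "N"),  # at fault, no inattention
    ],
)

# Parsed as (at_fault = 'Y' AND oaf_1 = 'F') OR oaf_2 = 'F',
# so the not-at-fault party with oaf_2 = 'F' is counted too.
unparenthesized = conn.execute(
    "SELECT COUNT(*) FROM parties "
    "WHERE at_fault = 'Y' AND oaf_1 = 'F' OR oaf_2 = 'F'"
).fetchone()[0]

# Parentheses restrict the count to at-fault parties only.
parenthesized = conn.execute(
    "SELECT COUNT(*) FROM parties "
    "WHERE at_fault = 'Y' AND (oaf_1 = 'F' OR oaf_2 = 'F')"
).fetchone()[0]

print(unparenthesized, parenthesized)  # 3 2
```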

— Task 2: When do accidents occur by day of the week?

Now that we have a way to identify whether or not a collision can be attributed to
alcohol or inattention, let’s add in the collisions table to answer the journalist’s
question of whether or not there are differences between the two accident
sources.

A. Let’s start with the collisions table on its own. Write a query that returns the
number of collisions, grouped by day of the week. Which days have the
highest number of collisions, and which days have the least number? Note:
Day of week is encoded slightly differently than what comes out of the
date_part function: Sunday is indicated by a 7 instead of a 0.

SELECT
    COUNT(case_id),
    day_of_week
FROM switrs.collisions
GROUP BY day_of_week
ORDER BY COUNT(case_id)

Friday (5) has the highest number of reported collisions at
75,654, whereas Sunday (7) has the fewest collisions at
55,159. Not far behind Friday, however, is Thursday, the
runner-up for most collisions.

B. The collisions table and parties tables share values in the case_id column.
Write a new query that inner joins the two tables on that column, returning the
number of rows. How many rows are in the combined output table, and why?

SELECT
    COUNT(parties.case_id)
FROM switrs.collisions AS collisions
INNER JOIN switrs.parties AS parties
    ON collisions.case_id = parties.case_id

There are 940,216 rows in the combined table. The collisions
table alone has only 469,664 rows, but case_id is unique in that
table, so each of the 940,216 rows in the parties table matches
exactly one collision. The inner join therefore returns one row per
party.
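
To see why the joined row count follows the parties table, here is a minimal sketch on toy data using Python's built-in sqlite3 module. The table contents are invented; the only property carried over from the Milestone is that case_id is unique in collisions and repeated in parties.

```python
import sqlite3

# Toy stand-ins for switrs.collisions and switrs.parties (invented rows).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE collisions (case_id INTEGER PRIMARY KEY, day_of_week INTEGER);
    CREATE TABLE parties (case_id INTEGER, party_number INTEGER);
    INSERT INTO collisions VALUES (1, 5), (2, 7), (3, 2);
    INSERT INTO parties VALUES (1, 1), (1, 2), (2, 1), (3, 1), (3, 2), (3, 3);
    """
)


def count(sql: str) -> int:
    """Run a single-value COUNT query and return the integer result."""
    return conn.execute(sql).fetchone()[0]


n_collisions = count("SELECT COUNT(*) FROM collisions")
n_parties = count("SELECT COUNT(*) FROM parties")
n_joined = count(
    "SELECT COUNT(*) FROM collisions AS c "
    "INNER JOIN parties AS p ON c.case_id = p.case_id"
)

print(n_collisions, n_parties, n_joined)  # 3 6 6
```

Each party row finds exactly one matching collision, so the join has as many rows as parties. If some party had a case_id missing from collisions, the inner join would drop that row and the count would be smaller.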

C. Combine the queries from parts A and B to return the number of collisions
grouped by the day of the week. Add a condition for the involved parties so
that we only count accidents where the party was found to be at fault AND
under the influence of alcohol. Which days have the highest number of
collisions, and which days have the smallest number?

SELECT
    COUNT(collisions.case_id),
    collisions.day_of_week
FROM switrs.collisions AS collisions
INNER JOIN switrs.parties AS parties
    ON collisions.case_id = parties.case_id
WHERE parties.party_sobriety = 'B'
    AND parties.at_fault = 'Y'
GROUP BY collisions.day_of_week
ORDER BY COUNT(collisions.case_id) DESC

In this query, we are looking at the number of collisions by day
of the week where a party was deemed at fault and was under the
influence of alcohol. Sunday (7) is the day with the most
accidents under these conditions, and Tuesday (2) is the day
with the fewest. This is surprising because Sunday previously
had the fewest collisions overall, but after joining in the
parties table we can see that Sunday had the most drunk driving
incidents.
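
One caveat about counting after a join: COUNT(collisions.case_id) counts joined rows, that is, matching parties. If a collision has two parties who are both at fault and under the influence, it is counted twice. The sketch below, on invented toy rows in SQLite, contrasts this with COUNT(DISTINCT ...). The sobriety code 'A' is used here only as a placeholder for "any code other than 'B'".

```python
import sqlite3

# Toy tables (invented rows). Collision 1 has TWO at-fault, drinking parties.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE collisions (case_id INTEGER PRIMARY KEY, day_of_week INTEGER);
    CREATE TABLE parties (case_id INTEGER, at_fault TEXT, party_sobriety TEXT);
    INSERT INTO collisions VALUES (1, 7), (2, 7), (3, 2);
    INSERT INTO parties VALUES
        (1, 'Y', 'B'), (1, 'Y', 'B'),
        (2, 'Y', 'B'), (2, 'N', 'A'),
        (3, 'Y', 'B'), (3, 'N', 'A');
    """
)

rows = conn.execute(
    """
    SELECT
        c.day_of_week,
        COUNT(c.case_id) AS party_rows,
        COUNT(DISTINCT c.case_id) AS distinct_collisions
    FROM collisions AS c
    INNER JOIN parties AS p
        ON c.case_id = p.case_id
    WHERE p.party_sobriety = 'B'
        AND p.at_fault = 'Y'
    GROUP BY c.day_of_week
    ORDER BY c.day_of_week
    """
).fetchall()

print(rows)  # [(2, 1, 1), (7, 3, 2)]
```

On day 7 the toy data has three matching party rows but only two distinct collisions. Whether the two counts differ much on the real SWITRS data is not something this sketch shows.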

D. Modify your query to look at the number of accidents by the day of the week
where the party was found to be at fault AND inattention was a factor. Which
days have the highest number of collisions, and which days have the smallest
number?

SELECT
    COUNT(collisions.case_id),
    collisions.day_of_week,
    parties.at_fault
FROM switrs.collisions AS collisions
INNER JOIN switrs.parties AS parties
    ON collisions.case_id = parties.case_id
WHERE (parties.oaf_1 = 'F'
    OR parties.oaf_2 = 'F')
GROUP BY collisions.day_of_week, parties.at_fault
ORDER BY COUNT(collisions.case_id) DESC

Friday has the most collisions in which the party was at
fault and inattention was a factor, whereas Sunday has the
fewest. Looking at the data, you can see that inattentive
driving rises slowly but steadily through the week, peaks on
Friday, and drops off again by Sunday.

— Task 3: When do accidents occur by the time of day?

A data analyst colleague of yours has taken interest in your project with the
journalist and has pitched in their own contribution by providing you a summary of
the dataset with five features:
● alcohol_involved - TRUE/FALSE whether or not the party at fault was under
the influence of alcohol
● inattention_involved - TRUE/FALSE whether or not inattention was a factor
for the party at fault
● day_of_week - day of week when collision occurred. Note that numbering
starts at 1 = Monday and ends at 7 = Sunday (instead of 0 = Sunday)
● hour_of_day - hour of day when collision occurred, in 24 hour format
(0-2300). Values of 2500 indicate an unknown time of day.
● n_collisions - number of collisions matching the conditions of the first four
columns

Let’s use this new data summary to look at how accident patterns change based on
the time of day. Since the data has already been queried, we’ll do this visually within
Tableau! Click this link to navigate to the workbook you’ll use to complete the
remainder of this Milestone. Once you’ve published your Tableau Workbook in the
folder named Upload Workbooks Here, paste the Share Link in the box below.

https://prod-useast-b.online.tableau.com/#/site/globaltech/workbooks/733296?:origin=card_share_link

Continue to post your answers in the provided boxes: purple boxes for your
visualizations, and blue boxes for text-based answers.

A. On Sheet 1, create a bar chart of the number of collisions by the hour of day.
Describe the pattern in the data. Are there times of day where more
accidents occur? Does this fit in with your expectations?

From the visualization, collisions are most likely to occur at
around 1700 (5 pm). This is most likely due to rush hour, when
people commute home from work. There is also a smaller peak at
7-8 am, most likely caused by the morning commute to work.

B. Copy the chart into a new sheet and add a filter so that the bar chart only
shows accidents where the party at fault was found to be under the influence
of alcohol. How does this distribution of accidents by time of day compare
to the overall distribution?

Compared to the first bar chart, this visualization tells a
completely different story. The distribution is roughly the
inverse of the overall one: most collisions that involve alcohol
take place at night and in the very early morning hours, peaking
at 2 am. A quick Google search told me that a majority of
bars in California close at either 1 or 2 am most nights.

C. Copy the chart into one more sheet, but now change the filter to only look at
accidents where inattention was a factor from the party-at-fault. How does
this distribution compare to the overall distribution?

This visualization closely resembles the first one: most
collisions due to inattentive driving occur when the roads are
busiest, again peaking at 5 pm.

— Level Up
Simply because an accident was such that inattention was a factor does not
necessarily mean that a cell phone was the source of the driver’s distraction. In the
parties table, there is a column called sp_info_2. This feature takes on a value of B,
1, or 2 if a cell phone was known to be in use at the time of the accident. If you’re
interested in digging deeper, you might want to try seeing what proportion of
accidents were caused by cell phone distraction, and if they differ from other
‘inattention’ accidents. Keep in mind that the sp_info_2 column is a string data type,
so you’ll need to treat the '1' and '2' codes appropriately!

SELECT
    sp_info_2,
    COUNT(case_id)
FROM switrs.parties
WHERE sp_info_2 = '1'
    OR sp_info_2 = '2'
    OR sp_info_2 = 'B'
GROUP BY sp_info_2;

SELECT
    COUNT(case_id),
    at_fault
FROM switrs.parties
WHERE oaf_1 = 'F'
    OR oaf_2 = 'F'
GROUP BY at_fault;

For this, I ran two separate queries so I could compare the
number of parties flagged for inattention with the number
flagged for phone usage. The first query returns a total of
12,010 parties across the three cell phone codes, and the second
returns 18,311 at-fault parties where inattention was a factor.
The ratio of the two is around 66%, but this is only a rough
comparison: the first query is not restricted to at-fault parties
or to parties with an inattention code, so it does not by itself
show that 66% of inattention accidents were linked to phone usage.
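
A proportion is easier to interpret when the numerator is counted inside the same population as the denominator. As a hedged sketch of one way to do that, the SQLite example below counts phone use only among at-fault parties with an inattention code. The rows are invented, and the sp_info_2 value '3' is just a placeholder for "any code other than '1', '2', or 'B'".

```python
import sqlite3

# Toy stand-in for switrs.parties; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parties "
    "(case_id INTEGER, at_fault TEXT, oaf_1 TEXT, oaf_2 TEXT, sp_info_2 TEXT)"
)
conn.executemany(
    "INSERT INTO parties VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Y", "F", "N", "1"),   # at fault, inattentive, phone in use
        (2, "Y", "N", "F", "B"),   # at fault, inattentive, phone in use
        (3, "Y", "F", "N", "3"),   # at fault, inattentive, no phone code
        (4, "Y", "N", "F", None),  # at fault, inattentive, phone info missing
        (5, "N", "N", "N", "2"),   # phone in use, but not at fault
        (6, "Y", "N", "N", "1"),   # phone in use, but no inattention code
    ],
)

# Denominator and numerator computed over the SAME filtered set of parties.
inattentive, with_phone = conn.execute(
    """
    SELECT
        COUNT(*),
        SUM(CASE WHEN sp_info_2 IN ('1', '2', 'B') THEN 1 ELSE 0 END)
    FROM parties
    WHERE at_fault = 'Y'
        AND (oaf_1 = 'F' OR oaf_2 = 'F')
    """
).fetchone()

# Phone-code count over ALL parties, as in the first query above.
all_phone = conn.execute(
    "SELECT COUNT(*) FROM parties WHERE sp_info_2 IN ('1', '2', 'B')"
).fetchone()[0]

print(inattentive, with_phone, all_phone)  # 4 2 4
```

In this toy table half of the at-fault inattentive parties have a phone code (2 of 4), while dividing the overall phone count by the inattentive count would give 4 / 4. The real SWITRS figures may behave differently; the sketch only shows why the two calculations are not interchangeable.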

— Submission
Great work completing this Milestone! To submit your completed Milestone, you will
need to download / export this document as a PDF and then upload it to the
Milestone submission page. You can find the option to download as a PDF from the
File menu in the upper-left corner of the Google Doc interface.
