IBM Data Science Capstone Report

This document summarizes an IBM Data Science capstone project aimed at preventing avoidable car accidents. The project uses data on past accidents collected by the Seattle police to build machine learning models that predict accident severity from factors such as weather, road, and light conditions. Three models were tested: KNN, decision tree, and linear regression. KNN performed best, with an accuracy of 84%. The results are intended to advise local governments and organizations on reducing accidents and injuries.

Uploaded by

Barakha Agrawal

Business Understanding
The government aims to prevent avoidable car accidents by employing
methods that alert drivers, the health system, and the police,
reminding them to be more careful in critical situations.

In most cases, inattention while driving, drug and alcohol abuse,
or driving at very high speed are the main causes of accidents,
and these can be prevented by enacting stricter regulations.
Besides these causes, weather, visibility, and road conditions are
the major uncontrollable factors. Their impact can be reduced by
revealing hidden patterns in the data and issuing warnings to the
local government, police, and drivers on the targeted roads.

The target audience of the project is the local Seattle government,
police, rescue groups, and, last but not least, car insurance
companies. The model and its results are intended to help the
target audience make informed decisions that reduce the number of
accidents and injuries in the city.

Data
The data was collected by the Seattle Police Department and
Accident Traffic Records Department from 2004 to present.

The data consists of 37 independent variables and 194,673 rows.

The dependent variable, “SEVERITYCODE”, contains the codes 1 and
2, which correspond to different levels of accident severity.

Severity codes are as follows:

1: Property Damage Only Collision

2: Injury Collision

Furthermore, because some records contain null values, the data
needs to be preprocessed before any further analysis.
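The sketch below shows one way to check for null values with pandas. It is a minimal sketch that assumes the nulls are handled by dropping incomplete records in the four columns used later. The report does not say whether rows were dropped or values imputed. Only the column names come from the report; the toy records are invented.

```python
import pandas as pd

# Hypothetical toy records: column names follow the report, values are invented.
df = pd.DataFrame({
    "SEVERITYCODE": [1, 2, 1, 1, 2],
    "WEATHER": ["Clear", "Raining", None, "Overcast", "Clear"],
    "ROADCOND": ["Dry", "Wet", "Dry", None, "Dry"],
    "LIGHTCOND": ["Daylight", "Dark - Street Lights On", "Daylight", "Daylight", None],
})

# Count the missing values in each column.
null_counts = df.isnull().sum()
print(null_counts)

# Assumption: drop every record with a null in any of the selected columns.
selected = ["SEVERITYCODE", "WEATHER", "ROADCOND", "LIGHTCOND"]
clean = df.dropna(subset=selected).reset_index(drop=True)
print(len(clean))  # 2
```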

Data Preprocessing
In its original form, the dataset is not ready for analysis. To
prepare the data, we first drop the irrelevant columns. In
addition, most of the features have the object data type and need
to be converted into numerical data types.
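The two steps can be done in pandas as sketched below. The report does not name its encoding method, so label encoding through pandas category codes is an assumption. The toy values and the `LOCATION` column are also hypothetical; here `LOCATION` stands in for a non-relevant column.

```python
import pandas as pd

# Hypothetical raw records; LOCATION is a hypothetical non-relevant column.
raw = pd.DataFrame({
    "SEVERITYCODE": [1, 2, 1],
    "WEATHER": ["Clear", "Raining", "Clear"],
    "ROADCOND": ["Dry", "Wet", "Dry"],
    "LIGHTCOND": ["Daylight", "Dark - Street Lights On", "Daylight"],
    "LOCATION": ["5th Ave", "Pine St", "Main St"],
})

# Step 1: keep only the relevant columns (drops everything else).
relevant = ["SEVERITYCODE", "WEATHER", "ROADCOND", "LIGHTCOND"]
df = raw[relevant].copy()

# Step 2 (assumed label encoding): replace each non-numeric column with
# integer category codes; categories are sorted, so "Clear" -> 0, "Raining" -> 1.
for col in df.select_dtypes(exclude="number").columns:
    df[col] = df[col].astype("category").cat.codes

print(df.dtypes)
```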

After analyzing the dataset, I decided to focus on only four
features: severity, weather conditions, road conditions, and light
conditions.
To get a good understanding of the dataset, I checked the distinct
values in each feature. The results show that the target feature is
imbalanced, so we use a simple statistical technique to balance it.

As you can see, the number of rows in class 1 is almost three times
the number of rows in class 2. The issue can be solved by
downsampling class 1.
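Downsampling can be sketched in pandas by randomly sampling, without replacement, as many class-1 rows as there are class-2 rows. This is one possible implementation, not necessarily the author's; `sklearn.utils.resample` is a common alternative. The toy data are invented and use the same roughly 3:1 ratio.

```python
import pandas as pd

# Hypothetical imbalanced toy data: 9 rows of class 1, 3 rows of class 2.
df = pd.DataFrame({
    "SEVERITYCODE": [1] * 9 + [2] * 3,
    "WEATHER": [0, 1, 2] * 4,  # placeholder encoded feature values
})

majority = df[df["SEVERITYCODE"] == 1]
minority = df[df["SEVERITYCODE"] == 2]

# Randomly keep only as many class-1 rows as there are class-2 rows.
majority_down = majority.sample(n=len(minority), random_state=42)

# Recombine and shuffle so the two classes are interleaved.
balanced = (
    pd.concat([majority_down, minority])
    .sample(frac=1, random_state=42)
    .reset_index(drop=True)
)
print(balanced["SEVERITYCODE"].value_counts())
```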

Methodology
To implement the solution, I used GitHub as a repository and
Jupyter Notebook to preprocess the data and build machine learning
models. For coding, I used Python and popular packages such as
Pandas, NumPy, and Sklearn.
After loading the data into a Pandas DataFrame, I used the
‘dtypes’ attribute to check the feature names and their data types.
Then I selected the most important features for predicting the
severity of accidents in Seattle. Among all the features, the
following have the most influence on the accuracy of the
predictions:

• “WEATHER”
• “ROADCOND”
• “LIGHTCOND”

Also, as I mentioned earlier, “SEVERITYCODE” is the target variable.

I ran a value count on road condition (‘ROADCOND’) and weather
condition (‘WEATHER’) to get an idea of the different road and
weather conditions. I also ran a value count on light condition
(‘LIGHTCOND’) to see the breakdown of accidents across the
different light conditions. The results can be seen below:
After balancing the SEVERITYCODE feature and standardizing the
input features, the data was ready for building machine learning
models.
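The sketch below shows the value counts and the standardization step. It assumes the three features are already label-encoded as integers and that scikit-learn's `StandardScaler` (zero mean, unit variance) was the scaler; the report names neither. The values are invented.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical label-encoded features for a handful of balanced records.
X = pd.DataFrame({
    "WEATHER": [0, 1, 0, 2, 1, 0],
    "ROADCOND": [0, 1, 0, 1, 1, 0],
    "LIGHTCOND": [1, 0, 1, 1, 0, 1],
})

# Value count for each condition feature (how often each code occurs).
for col in X.columns:
    print(X[col].value_counts())

# Assumed scaler: rescale every feature to zero mean and unit variance.
X_std = StandardScaler().fit_transform(X)
print(X_std.round(3))
```

In a fuller pipeline, you could fit the scaler on the training split only and then reuse it to transform the test split. This keeps test-set statistics out of training.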

I employed three machine learning models:

• K-Nearest Neighbour (KNN)
• Decision Tree
• Linear Regression

After importing the necessary packages and splitting the
preprocessed data into training and test sets, I built and
evaluated each machine learning model. The results are shown as
follows:

KNN
Decision Tree
Linear Regression
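The split-and-train workflow described above can be sketched with scikit-learn as follows. The split ratio, random seeds, and hyperparameters are assumptions, and the data are random stand-ins for the real features. SEVERITYCODE is a binary class label, so this sketch substitutes scikit-learn's `LogisticRegression` for the "Linear Regression" model named in the report. That swap is the sketch's choice, not the author's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical standardized features and 1/2 severity labels (random stand-ins).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=200) > 0, 2, 1)

# Assumed 80/20 split; the report does not state its split ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4
)

# Assumed hyperparameters; the report does not list them.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0),
    # Swapped in for "Linear Regression": a linear classifier for a binary target.
    "Logistic Regression": LogisticRegression(),
}

# Fit each model on the training set and evaluate it on the held-out test set.
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
    print(name, accuracy_score(y_test, predictions[name]))
```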
Results and Evaluations
The final results of the model evaluations are summarized in the
following table:

Based on the table above, KNN is the best of the three models for
predicting car accident severity.
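A comparison table of this kind could be built with a small helper such as the hypothetical `summarize` function below. The report does not list its metrics. Accuracy, F1-score, and Jaccard index (with class 2, injury, as the positive class) are therefore assumptions. The labels and predictions are invented to show the call and are not the report's results.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, jaccard_score


def summarize(y_true, predictions):
    """Hypothetical helper: per-model metrics table plus the name of the most accurate model."""
    rows = []
    for name, y_pred in predictions.items():
        rows.append({
            "Model": name,
            "Accuracy": accuracy_score(y_true, y_pred),
            # Assumption: class 2 (Injury Collision) is the positive class.
            "F1-score": f1_score(y_true, y_pred, pos_label=2),
            "Jaccard": jaccard_score(y_true, y_pred, pos_label=2),
        })
    table = pd.DataFrame(rows).set_index("Model")
    return table, table["Accuracy"].idxmax()


# Invented inputs, only to demonstrate the call.
y_true = [1, 1, 2, 2, 1, 2]
preds = {
    "KNN": [1, 1, 2, 2, 1, 1],
    "Decision Tree": [1, 2, 2, 1, 1, 2],
    "Logistic Regression": [2, 1, 2, 1, 1, 1],
}
table, best = summarize(y_true, preds)
print(table)
print("Best model:", best)
```

With these invented inputs, KNN gets 5 of 6 predictions right. It therefore has the highest accuracy and is returned as the best model.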

Conclusion
Based on the weather, road, and light conditions in the dataset
provided for this capstone, and on how they point to certain
classes, we can conclude that particular conditions have some
impact on whether travel results in property damage (class 1) or
injury (class 2).
