IBM Data Science Capstone Report

This document summarizes an IBM Data Science capstone project aimed at preventing avoidable car accidents. The project uses data on past accidents collected by Seattle police to build machine learning models that can predict accident severity based on factors like weather, road, and light conditions. Three models were tested - KNN, decision tree, and linear regression. KNN performed best with an accuracy of 84%. The results will advise local governments and organizations on reducing accidents and injuries.

Uploaded by Barakha Agrawal

Business Understanding
The government aims to prevent avoidable car accidents by
employing methods that alert drivers, the health system, and the
police, reminding them to be more careful in critical situations.

In most cases, inattentive driving, drug and alcohol abuse, and
driving at very high speed are the main causes of accidents, and
these can be reduced by enacting harsher regulations. Besides
these causes, weather, visibility, and road conditions are major
uncontrollable factors. Their effects can be mitigated by revealing
hidden patterns in the data and issuing warnings to the local
government, police, and drivers on the targeted roads.

The target audience of the project is the local Seattle government,
police, rescue groups, and, last but not least, car insurance
institutes. The model and its results are intended to give the
target audience advice for making informed decisions that reduce
the number of accidents and injuries in the city.

Data
The data was collected by the Seattle Police Department and
Accident Traffic Records Department from 2004 to present.

The data consists of 37 independent variables and 194,673 rows.

The dependent variable, “SEVERITYCODE”, contains a code, either 1
or 2, for the level of severity of each accident.

Severity codes are as follows:

1: Property Damage Only Collision

2: Injury Collision

Furthermore, because some records contain null values, the data
needs to be preprocessed before any further processing.

Data Preprocessing
The dataset in its original form is not ready for analysis. To
prepare the data, we first need to drop the non-relevant columns.
In addition, most of the features are of the object data type and
need to be converted into numerical data types.
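
A minimal sketch of these preparation steps, assuming pandas and a toy frame whose values are invented for illustration (it is not the Seattle data). The `REPORTNO` column stands in for any non-relevant column, and the `_CODE` suffix is a hypothetical naming choice.

```python
import pandas as pd

# Toy stand-in for the collision data; every value here is invented.
df = pd.DataFrame({
    "SEVERITYCODE": [1, 2, 1, 1],
    "WEATHER": ["Clear", "Raining", None, "Clear"],
    "ROADCOND": ["Dry", "Wet", "Wet", None],
    "LIGHTCOND": ["Daylight", "Dark - Street Lights On", "Daylight", "Daylight"],
    "REPORTNO": ["a1", "b2", "c3", "d4"],  # hypothetical non-relevant column
})

# Keep only the columns of interest, then drop rows with null values.
cols = ["SEVERITYCODE", "WEATHER", "ROADCOND", "LIGHTCOND"]
clean = df[cols].dropna().copy()

# Convert each text column to integer category codes.
for col in ["WEATHER", "ROADCOND", "LIGHTCOND"]:
    clean[col + "_CODE"] = clean[col].astype("category").cat.codes

print(clean)
```

Category codes follow the sorted order of the labels, so they carry no real ordering. Whether that is acceptable depends on the model that consumes them.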

After analyzing the data set, I decided to focus on only four
columns: severity (the target), weather conditions, road
conditions, and light conditions.

To get a good understanding of the dataset, I checked the
different values in the features. The results show that the target
feature is imbalanced, so I use a simple statistical technique to
balance it.

The number of rows in class 1 is almost three times bigger than
the number of rows in class 2. This imbalance can be addressed by
downsampling class 1.
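
Downsampling the majority class could look roughly like the sketch below. The toy frame is invented, and the use of `DataFrame.sample` with a fixed `random_state` is an assumption, since the report does not say which routine was used.

```python
import pandas as pd

# Toy imbalanced target: six rows of class 1, two rows of class 2 (invented).
df = pd.DataFrame({
    "SEVERITYCODE": [1, 1, 1, 1, 1, 1, 2, 2],
    "WEATHER": [0, 1, 0, 2, 1, 0, 1, 2],
})

majority = df[df["SEVERITYCODE"] == 1]
minority = df[df["SEVERITYCODE"] == 2]

# Randomly keep as many class 1 rows as there are class 2 rows.
majority_down = majority.sample(n=len(minority), random_state=42)

# Recombine and shuffle so the two classes are interleaved.
balanced = (
    pd.concat([majority_down, minority])
    .sample(frac=1, random_state=42)
    .reset_index(drop=True)
)

print(balanced["SEVERITYCODE"].value_counts())
```

Downsampling discards majority-class rows, which is tolerable when the dataset is large but does reduce the amount of training data.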

Methodology
To implement the solution, I used GitHub as a repository and
Jupyter Notebook to preprocess the data and build machine
learning models. For coding, I used Python and popular packages
such as Pandas, NumPy, and Sklearn.
Once I had loaded the data into a Pandas DataFrame, I used the
‘dtypes’ attribute to check the feature names and their data types.
Then I selected the most important features for predicting the
severity of accidents in Seattle. Among all the features, the
following have the most influence on the accuracy of the
predictions:

• “WEATHER”

• “ROADCOND”

• “LIGHTCOND”

Also, as mentioned earlier, “SEVERITYCODE” is the target variable.
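
As a rough illustration of the inspection and selection steps above, assuming a small invented frame in place of the real data:

```python
import pandas as pd

# Invented rows; only the column names come from the report.
df = pd.DataFrame({
    "SEVERITYCODE": [1, 2, 1],
    "WEATHER": ["Clear", "Raining", "Overcast"],
    "ROADCOND": ["Dry", "Wet", "Dry"],
    "LIGHTCOND": ["Daylight", "Dark", "Daylight"],
})

# List each column name alongside its data type.
print(df.dtypes)

# Split the frame into input features and the target variable.
features = ["WEATHER", "ROADCOND", "LIGHTCOND"]
target = "SEVERITYCODE"
X = df[features]
y = df[target]
```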

I ran a value count on road condition (‘ROADCOND’) and weather
condition (‘WEATHER’) to get an idea of the different road and
weather conditions. I also ran a value count on light condition
(‘LIGHTCOND’) to see the breakdown of accidents occurring under
the different light conditions. The results can be seen below:
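
The value counts described above could be produced along the following lines. The frame is invented for illustration, so its counts say nothing about the real Seattle data.

```python
import pandas as pd

# Invented rows; only the column names come from the report.
df = pd.DataFrame({
    "WEATHER": ["Clear", "Raining", "Clear", "Overcast", "Clear"],
    "ROADCOND": ["Dry", "Wet", "Dry", "Wet", "Dry"],
    "LIGHTCOND": ["Daylight", "Daylight", "Dark", "Daylight", "Dark"],
})

# Count how often each distinct value occurs in every column.
counts = {col: df[col].value_counts() for col in df.columns}

for col, vc in counts.items():
    print(vc, end="\n\n")
```
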

After balancing the SEVERITYCODE feature and standardizing the
input features, the data was ready for building machine learning
models.
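
Standardizing the inputs might be done as in this sketch. The report does not name the tool, so scikit-learn's `StandardScaler` is an assumption, and the small matrix is invented.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented integer-encoded features (columns: weather, road, light).
X = np.array(
    [[0, 0, 1],
     [1, 2, 0],
     [2, 1, 1],
     [1, 1, 2]],
    dtype=float,
)

# Rescale each column to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))
```

Distance-based models such as KNN are sensitive to feature scale, which is one common reason to standardize before fitting.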

I have employed three machine learning models:

• K Nearest Neighbour (KNN)

• Decision Tree

• Linear Regression

After importing the necessary packages and splitting the
preprocessed data into train and test sets, I built and evaluated
each machine learning model, with the results shown as
follows:

[Figures: KNN, Decision Tree, Linear Regression]
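
The split, fit, and evaluate loop described above can be sketched as follows. The data is synthetic, and the split ratio and all hyperparameters are placeholder assumptions rather than the values used in the report. The report names linear regression as the third model. Because the target is a class label, this sketch substitutes scikit-learn's `LogisticRegression` as an assumed classification counterpart.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the standardized features and the 1/2 target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 2, 1)

# Hold out 30% of the rows for testing (placeholder ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=4
)

# Placeholder hyperparameters; the report does not state the ones it used.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "Logistic Regression": LogisticRegression(),
}

# Fit each model on the training set and score it on the test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in scores.items():
    print(f"{name}: accuracy = {acc:.2f}")
```
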
Results and Evaluations
The final results of the model evaluations are summarized in the
following table:

Based on the above table, KNN is the best model to predict car
accident severity.

Conclusion
Based on the weather, road, and light condition data provided for
this capstone, we can conclude that particular conditions have
some impact on whether travel results in property damage
(class 1) or injury (class 2).
