IBM Data Science Capstone Report
Business Understanding
The government aims to prevent avoidable car accidents by
deploying methods that alert drivers, the health system, and the
police, reminding them to be more careful in critical situations.
Data
The data was collected by the Seattle Police Department and
Accident Traffic Records Department from 2004 to present.
Each record carries one of two severity classes:
1: Property Damage Collision
2: Injury Collision
Data Preprocessing
The dataset in its original form is not ready for analysis. To
prepare the data, we first need to drop the non-relevant columns.
In addition, most of the features are of the object data type and
need to be converted into numerical data types.
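The conversion from object to numerical types can be sketched as follows. This is a minimal illustration, not necessarily the encoding used in the notebook: the toy rows and their category values are assumptions, and integer category codes are only one of several possible encodings.

```python
import pandas as pd

# Toy rows standing in for the real dataset; the category values are illustrative.
df = pd.DataFrame({
    "WEATHER": ["Clear", "Raining", "Overcast", "Clear"],
    "ROADCOND": ["Dry", "Wet", "Wet", "Dry"],
    "LIGHTCOND": ["Daylight", "Dark", "Daylight", "Daylight"],
})

# Replace each text column with integer category codes.
# Codes follow the sorted order of the distinct values in that column.
for col in ["WEATHER", "ROADCOND", "LIGHTCOND"]:
    df[col] = df[col].astype("category").cat.codes

print(df.dtypes)
```

Category codes impose an arbitrary ordering on the values, which tree-based models tolerate better than distance-based ones; one-hot encoding would be an alternative.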
After analyzing the dataset, I decided to focus on only four
features: severity, weather conditions, road conditions, and light
conditions.
To get a good understanding of the dataset, I checked the
distinct values in each feature. The results show that the target
feature is imbalanced, so we use a simple statistical technique to
balance it.
The number of rows in class 1 is almost three times the number of
rows in class 2. The issue can be solved by downsampling class 1.
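Downsampling class 1 could look like the sketch below. The report does not say which function was used, so the use of sklearn.utils.resample, the column name "SEVERITY", and the toy 3:1 data are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.utils import resample

# Toy frame with a 3:1 imbalance; "SEVERITY" is a hypothetical column name.
df = pd.DataFrame({
    "SEVERITY": [1] * 9 + [2] * 3,
    "WEATHER": list(range(12)),
})

majority = df[df["SEVERITY"] == 1]
minority = df[df["SEVERITY"] == 2]

# Randomly keep only as many class 1 rows as there are class 2 rows.
majority_down = resample(
    majority,
    replace=False,             # sample without replacement
    n_samples=len(minority),   # match the minority class size
    random_state=0,            # fixed seed so the sketch is repeatable
)

balanced = pd.concat([majority_down, minority])
print(balanced["SEVERITY"].value_counts())
```

Downsampling discards data from the larger class; it keeps the sketch simple, but it trades training volume for balance.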
Methodology
To implement the solution, I used GitHub as a repository and ran
Jupyter Notebook to preprocess the data and build the machine
learning models. For coding, I used Python and its popular
packages such as Pandas, NumPy and Sklearn.
Once I had loaded the data into a Pandas DataFrame, I used the
‘dtypes’ attribute to check the feature names and their data types.
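The loading and inspection step might be sketched like this. The report does not give the file name, so an inline CSV stands in for it, and "SEVERITY" is a hypothetical name for the target column.

```python
import io
import pandas as pd

# Inline CSV standing in for the real file, whose name the report does not give.
csv_text = """SEVERITY,WEATHER,ROADCOND,LIGHTCOND
1,Clear,Dry,Daylight
2,Raining,Wet,Dark
"""
df = pd.read_csv(io.StringIO(csv_text))

# dtypes lists every column name together with its data type,
# which makes the text columns that still need encoding easy to spot.
print(df.dtypes)
```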
Then I selected the most important features for predicting the
severity of accidents in Seattle. Among all the features, the
following have the most influence on the accuracy of the
predictions:
“WEATHER”,
“ROADCOND”,
“LIGHTCOND”
I then built the following machine learning models:
KNN
Decision Tree
Linear Regression
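A minimal sketch of training and comparing these models with Sklearn is given below. Several things here are assumptions rather than the notebook's actual setup: the data is synthetic, the hyperparameters (n_neighbors, max_depth, test_size) are arbitrary, and accuracy is used as the metric. The report names Linear Regression; because the target is a class label, this sketch swaps in LogisticRegression as the linear model, which the report does not confirm.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the three integer-encoded features
# (WEATHER, ROADCOND, LIGHTCOND); not the real Seattle data.
X = rng.integers(0, 5, size=(200, 3))
# Synthetic severity labels (1 or 2), loosely tied to the first feature.
y = np.where(X[:, 0] + rng.integers(0, 3, size=200) > 3, 2, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Hyperparameters are illustrative, not taken from the report.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in scores.items():
    print(f"{name}: accuracy = {acc:.2f}")
```

The scores printed by this sketch come from random data and say nothing about how the models rank on the real dataset.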
Results and Evaluations
The final results of the model evaluations are summarized in the
following table:
Based on the above table, KNN is the best model to predict car
accident severity.
Conclusion
Based on the dataset provided for this capstone, in which weather,
road, and light conditions point to certain classes, we can
conclude that particular conditions have some impact on whether
travel results in property damage (class 1) or injury (class 2).