IBM DS Certificate CapstoneProject SamiAlaruri
All content following this page was uploaded by Sami D. Alaruri ﺳﺎﻣﻲ اﻟﻌﺎروري on 18 October 2024.
• Executive Summary
• Introduction
• Methodology
• Results
• Conclusion
• Appendix
Executive Summary
In this project, SpaceY, a rival company to SpaceX, uses SpaceX Falcon 9
rocket data to predict whether the rocket's first stage will land successfully
and to determine the cost of a launch. A summary of the methodologies and
results described in this report is outlined below.
Summary of methodologies
• Data Collection
• Data wrangling
Methodology
Data used in this project were collected from the SpaceX REST API and from the
Wikipedia launch table. Wrangling of the collected data included cleaning,
preparation for visualization, and information extraction for use in ML
predictive models such as logistic regression, support vector machine (SVM),
decision tree, and K-nearest neighbors (KNN).
In addition, exploratory data analysis (EDA) was performed using visualization
and SQL. The Folium and Plotly Dash Python libraries were used for data
representation and interactive visual analytics.
Finally, predictive analysis was performed with scikit-learn classification
models to predict whether the first stage of the Falcon 9 rocket will land
successfully, and the accuracy of the models was determined.
Data Collection: Overview
Major steps of data collection and visualization:
Step 1: Collect data from the SpaceX API and convert the data to a dataframe and a .json file
Step 2: Scrape and filter the data to include Falcon 9 data, assign the data to a dictionary, and export the data to a csv file
Step 3: Plot and visualize the data
GitHub URL: https://github.com/ocean2024/Ocean2024/blob/main/dataset_part_1.csv
Data Collection: SpaceX API
Request a response from the SpaceX API using a GET request and convert the
data to a .json file
GitHub URL:
https://github.com/ocean2024/Ocean2024/blob/main/IBM%20DS_Capstone%20Project_Data%20Collection.ipynb
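The request-and-convert step can be sketched as follows. This is a minimal illustration, not the notebook's code: the endpoint URL, the helper names `fetch_launches` and `launches_to_frame`, and the sample records are all assumptions made for the example.

```python
import json
import pandas as pd

# Assumed endpoint; the slide does not state which URL was queried.
SPACEX_URL = "https://api.spacexdata.com/v4/launches/past"

def fetch_launches(url=SPACEX_URL):
    """Send a GET request and return the decoded JSON payload."""
    import requests  # imported lazily so the rest of the sketch runs offline
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()

def launches_to_frame(records):
    """Flatten a list of (possibly nested) JSON records into a DataFrame."""
    return pd.json_normalize(records)

# Offline illustration with two made-up records (not real launch data).
sample = [
    {"flight_number": 1, "name": "Demo A", "rocket": {"id": "r1"}},
    {"flight_number": 2, "name": "Demo B", "rocket": {"id": "r2"}},
]
df = launches_to_frame(sample)
json_text = json.dumps(sample)  # the same payload serialized as JSON text
```

With network access, `launches_to_frame(fetch_launches())` would apply the same flattening to the live response.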
Data Collection: Scraping
Step 1: Perform an HTTP GET request for the Falcon 9 HTML page and
create a BeautifulSoup object from the HTML
GitHub URL:
https://github.com/ocean2024/Ocean2024/blob/main/IBM_DS%20Capstone_Project_Web_Data_Scraping_.ipynb
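A minimal sketch of the parsing half of this step, assuming the page contains an HTML table with header cells. The helper name `extract_table_headers` and the sample HTML are hypothetical; on the live page the HTML string would come from the body of the GET response.

```python
from bs4 import BeautifulSoup

def extract_table_headers(html):
    """Parse HTML and return the header cell texts of the first table."""
    soup = BeautifulSoup(html, "html.parser")  # parser from the standard library
    table = soup.find("table")
    if table is None:
        return []
    return [th.get_text(strip=True) for th in table.find_all("th")]

# Made-up HTML standing in for the launch table page.
sample_html = """
<html><body><table>
<tr><th>Flight No.</th><th>Launch site</th><th>Payload</th></tr>
<tr><td>1</td><td>Site X</td><td>Demo</td></tr>
</table></body></html>
"""
headers = extract_table_headers(sample_html)
```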
Data Collection: Data Wrangling
Step 1: Load the data from the dataset_part_1.csv file
and calculate the number of launches at each
site
GitHub URL:
https://github.com/ocean2024/Ocean2024/blob/main/IBM_DS_Capstone_Project_Data%20Visualization_with_EDA.ipynb
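Counting launches per site could look like the sketch below. The column name `LaunchSite` and the toy rows are assumptions used only for illustration; the real data would be loaded from the CSV file instead.

```python
import pandas as pd

# Toy rows standing in for the CSV contents; values are illustrative only.
df = pd.DataFrame({
    "LaunchSite": ["Site X", "Site Y", "Site X", "Site Z"],
    "Outcome": ["True ASDS", "True RTLS", "None None", "False ASDS"],
})
# With the real file (name taken from the repository URL):
# df = pd.read_csv("dataset_part_1.csv")

# Number of launches at each site, most frequent first.
launches_per_site = df["LaunchSite"].value_counts()
```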
EDA with SQL
GitHub URL:
https://github.com/ocean2024/Ocean2024/blob/main/IBM_DAS_Capstone_Project_SQL.ipynb
Build an Interactive Map with Folium
Step 1: Mark all launch sites on a map created using Folium by adding markers* with a circle, popup label, and text label to each site, using its latitude and longitude coordinates to show its geographical location and its proximity to the equator
Step 2: Mark the successful/failed launches for each site on the map using colored markers
Step 3: Calculate the distances between a launch site and its proximities
* Explanation:
Visual analysis of the launch site KSC LC-39A shows that it is:
• relatively close to a railway (15.23 km)
• relatively close to a highway (20.28 km)
• relatively close to the coastline (14.99 km)
• The launch site KSC LC-39A is also relatively close to its nearest city, Titusville (16.32 km).
• A failed rocket, traveling at high speed, can cover 15-20 km in a few seconds, which is potentially dangerous to populated areas.
GitHub URL:
https://github.com/ocean2024/Ocean2024/blob/main/IBM_DA_Capstone_Project_Visual_Analytics_Folium.ipynb
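The distance calculation in Step 3 can be illustrated with the haversine formula, one common way to compute great-circle distances from latitude and longitude. The slides do not state which formula was used, so this is a sketch under that assumption; the coordinates below are hypothetical.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Hypothetical points: a site and a location one degree of latitude to the north.
site = (28.57, -80.65)
point = (29.57, -80.65)
distance = haversine_km(*site, *point)
```

One degree of latitude corresponds to roughly 111 km, which gives a quick sanity check for the function.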
Build a Dashboard with Plotly Dash
GitHub URL:
https://github.com/ocean2024/Ocean2024/blob/main/IBM%20DS%20Capstone%20Project_ML%20Prediction.ipynb
Predictive Analysis (Classification) Steps
Step 1: Load the dataframe
Step 2: Create a NumPy array from the column class in the data and assign it to the variable Y
Step 3: Standardize the data in X
Step 4: Use train_test_split to split the X & Y data into test and training data
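The four preparation steps can be sketched with scikit-learn as follows. The synthetic dataframe, the feature names, and the `test_size` and `random_state` values are assumptions for illustration; only the `Class` column and `train_test_split` are named on the slide.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Step 1: synthetic stand-in for the project dataframe (not real launch data).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "PayloadMass": rng.uniform(500, 10000, size=90),
    "Flights": rng.integers(1, 6, size=90),
    "Class": rng.integers(0, 2, size=90),
})

# Step 2: NumPy array from the "Class" column, assigned to Y.
Y = data["Class"].to_numpy()

# Step 3: standardize the features in X (zero mean, unit variance per column).
X = StandardScaler().fit_transform(data.drop(columns="Class"))

# Step 4: split into training and test data (split parameters are assumed).
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=2
)
```

The sketch follows the slide's order and standardizes before splitting; fitting the scaler on the training split only is a common alternative that keeps test-set statistics out of training.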
Section 2
Flight Number vs. Launch Site
Payload Mass vs. Launch Site
Success Rate vs. Orbit Type
Flight Number vs. Orbit Type
Payload Mass vs. Orbit Type
Launch Success Yearly Trend
The names of the unique launch sites and the query structure
for obtaining these sites are shown below.
Launch Site Names Begin with 'CCA'
Five records for launch sites whose names begin with the string 'CCA' and the
query used to obtain them are shown below.
Total Payload Mass
Average Payload Mass by F9 v1.1
• The average payload mass carried by booster version F9 v1.1 is 2534.7 kg.
• The query used to calculate the average payload mass carried by booster
F9 v1.1 is shown below.
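Queries of the kind described on these slides (unique launch sites, names beginning with 'CCA', and average payload mass for one booster version) can be sketched with an in-memory SQLite database. The table name `SPACEXTBL`, the column names, and every row below are assumptions made for the example, not the project's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SPACEXTBL "
    "(Launch_Site TEXT, Booster_Version TEXT, PAYLOAD_MASS__KG_ REAL)"
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO SPACEXTBL VALUES (?, ?, ?)",
    [
        ("CCAFS LC-40", "F9 v1.0", 500.0),
        ("CCAFS LC-40", "F9 v1.1", 2000.0),
        ("VAFB SLC-4E", "F9 v1.1", 3000.0),
        ("KSC LC-39A", "F9 FT", 5300.0),
    ],
)

# Unique launch site names.
sites = [row[0] for row in conn.execute(
    "SELECT DISTINCT Launch_Site FROM SPACEXTBL ORDER BY Launch_Site")]

# Up to five records whose launch site name begins with 'CCA'.
cca_rows = conn.execute(
    "SELECT * FROM SPACEXTBL WHERE Launch_Site LIKE 'CCA%' LIMIT 5").fetchall()

# Average payload mass for booster version F9 v1.1.
avg_payload = conn.execute(
    "SELECT AVG(PAYLOAD_MASS__KG_) FROM SPACEXTBL "
    "WHERE Booster_Version = 'F9 v1.1'"
).fetchone()[0]
```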
First Successful Ground Landing Date
Successful Drone Ship Landing with Payload between 4000 and 6000
• The list of boosters that have successfully landed on a drone ship with a
payload mass greater than 4000 kg but less than 6000 kg is shown below.
• The query used to obtain this information is shown below.
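A filter of this shape can be sketched as below, using strict inequalities to match "greater than 4000 but less than 6000". The table layout, the outcome label `Success (drone ship)`, and the booster names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SPACEXTBL "
    "(Booster_Version TEXT, PAYLOAD_MASS__KG_ REAL, Landing_Outcome TEXT)"
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO SPACEXTBL VALUES (?, ?, ?)",
    [
        ("Booster A", 4600.0, "Success (drone ship)"),
        ("Booster B", 6000.0, "Success (drone ship)"),  # excluded: not below 6000
        ("Booster C", 5200.0, "Failure (drone ship)"),  # excluded: landing failed
        ("Booster D", 5000.0, "Success (ground pad)"),  # excluded: not a drone ship
    ],
)

query = """
SELECT Booster_Version
FROM SPACEXTBL
WHERE Landing_Outcome = 'Success (drone ship)'
  AND PAYLOAD_MASS__KG_ > 4000
  AND PAYLOAD_MASS__KG_ < 6000
"""
boosters = [row[0] for row in conn.execute(query)]
```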
Total Number of Successful and Failure Mission Outcomes
Boosters Carried Maximum Payload
• The boosters that have carried the maximum payload mass are listed
below.
• The query used to obtain the booster names is shown below.
2015 Launch Records
• The records with a failed "landing_outcomes" value on a drone ship during
2015, together with their booster versions and launch site names, are shown below.
• The query used to obtain this information is shown below.
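A year filter of this kind might be written as in the sketch below. It assumes dates stored as ISO text (`YYYY-MM-DD`), so the year is the first four characters; the table layout, outcome labels, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SPACEXTBL "
    "(Date TEXT, Booster_Version TEXT, Launch_Site TEXT, Landing_Outcome TEXT)"
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO SPACEXTBL VALUES (?, ?, ?, ?)",
    [
        ("2015-01-10", "Booster A", "Site X", "Failure (drone ship)"),
        ("2015-12-22", "Booster B", "Site X", "Success (ground pad)"),
        ("2016-03-04", "Booster C", "Site Y", "Failure (drone ship)"),
    ],
)

query = """
SELECT Landing_Outcome, Booster_Version, Launch_Site
FROM SPACEXTBL
WHERE Landing_Outcome = 'Failure (drone ship)'
  AND substr(Date, 1, 4) = '2015'
"""
records = conn.execute(query).fetchall()
```

If the dates were stored in another text format, the `substr` offsets would need to change accordingly.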
Rank Landing Outcomes Between 2010-06-04 and 2017-03-20
Section 3
USA Launch Sites in California and Florida
• Most of the launch sites considered in this project
are relatively close to the equator. Launch sites
are built as close as possible to the equator
because anything on the surface of the Earth
at the equator is already moving at the maximum
rotational speed (about 1670 kilometers per hour).
For example, launching from the equator gives the
spacecraft almost 500 km/h more speed than
launching from halfway to the North Pole.
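The two figures quoted above can be checked with a short calculation. The sketch treats the Earth as a sphere with an equatorial circumference of about 40,075 km that rotates once every 24 hours (both values are simplifying assumptions), so the surface speed at a given latitude is the equatorial speed scaled by the cosine of the latitude.

```python
from math import cos, radians

EQUATOR_CIRCUMFERENCE_KM = 40075.0  # approximate equatorial circumference
ROTATION_PERIOD_H = 24.0            # assumed: one rotation per 24-hour day

def surface_speed_kmh(latitude_deg):
    """Eastward speed of a point on the surface due to Earth's rotation."""
    equator_speed = EQUATOR_CIRCUMFERENCE_KM / ROTATION_PERIOD_H
    return equator_speed * cos(radians(latitude_deg))

speed_equator = surface_speed_kmh(0.0)    # speed at the equator
speed_45 = surface_speed_kmh(45.0)        # speed halfway to the North Pole
speed_gain = speed_equator - speed_45     # advantage of an equatorial launch
```

Under these assumptions the equatorial speed comes out near 1670 km/h and the gain over 45 degrees latitude is a little under 500 km/h, consistent with the slide.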
Color Labels Showing the Launch Sites on a Map
Section 4
Total Launch Success for All Sites
Section 5
Classification Accuracy
• On the test set, all four models produced the same accuracy.
• The decision tree model provided the best accuracy for the entire data set.
Confusion Matrix
• The confusion matrix analysis suggests that the best performing model is the
Logistic Regression model.
• The confusion matrix shows 13 true
positives, 3 false positives, 3 true negatives,
and 0 false negatives.
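Standard metrics follow directly from the four confusion-matrix counts. The sketch below uses the counts listed on the slide and takes the third count as true negatives, which is an assumption; the helper name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Counts as listed on the slide; tn=3 is an assumption about the third count.
accuracy, precision, recall = classification_metrics(tp=13, fp=3, tn=3, fn=0)
```

With zero false negatives the recall is 1.0, so the false positives are what limit both precision and accuracy here.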
Conclusions
• The success rate of the rocket launches increased
after 2013.
• Orbits GEO, HEO, ES-L1, and SSO have a 100%
launch success rate.
• Launch site KSC LC-39A has the highest success rate.
• The Decision Tree model is the best ML algorithm for
analyzing the SpaceX data set and provided the best
accuracy results.
Appendix
https://github.com/ocean2024/Ocean2024/tree/main
https://www.coursera.org/professional-certificates/ibm-data-science?campaignid=1876641588&adgroupid=70740725700&device=c&keyword=ibm%20data%20science%20professional%20certificate&matchtype=b&network=g&devicemodel=&adposition=&creativeid=347445112274&hide_mobile_promo=&gad_source=1