
IBM DS Certificate Capstone Project, Sami Alaruri

The IBM Data Science Professional Certificate capstone project analyzes SpaceX Falcon 9 rocket data to assess landing success rates and launch costs for a rival company, SpaceY. The project employs various methodologies including data collection, exploratory data analysis, and predictive modeling using machine learning techniques. Key findings indicate that newer flights and certain launch sites have higher success rates, and the project culminates in an interactive dashboard and predictive analytics results.


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/381490795

IBM-data science professional certificate capstone project

Presentation · June 2024


DOI: 10.13140/RG.2.2.17037.35046/1


Author: Sami D. Alaruri (سامي العاروري), Independent Researcher

All content following this page was uploaded by Sami D. Alaruri (سامي العاروري) on 18 October 2024.



Sami D. Alaruri
June 2024
Outline

• Executive Summary
• Introduction
• Methodology
• Results
• Conclusion
• Appendix
Executive Summary
In this project, SpaceY, a fictional rival of SpaceX, uses SpaceX Falcon 9
rocket data to determine whether the first stage lands successfully and to
estimate the cost of a launch. The methodologies and results described in this
report are summarized below.

Summary of methodologies

• Data Collection

• Data wrangling

• Exploratory data analysis with data visualization and SQL

• Building an interactive map with Folium

• Building a dashboard with Plotly Dash

• Predictive analysis (classification)

Summary of all results

• Exploratory data analysis results

• Interactive analytics demo in screenshots

• Predictive analysis results
GitHub URL: https://github.com/ocean2024/Ocean2024/tree/main
Introduction
This capstone project is part of the IBM Data Science Professional Certificate. Its
goal is to demonstrate proficiency in data science and machine learning techniques
on real-world data and to summarize the results in a report.
In this project, SpaceY, a rival company to SpaceX, uses SpaceX Falcon 9 rocket data
to determine whether the first stage lands successfully, and uses that information
to estimate the cost of a launch so that it can bid against SpaceX. SpaceX advertises
a Falcon 9 launch at 62 million dollars, whereas other providers charge more than
165 million dollars per launch. Much of this saving comes from reusing the first
stage, so predicting whether the first stage will land gives an estimate of the
launch cost.
Throughout the project, Python Jupyter notebooks are used for data collection and
analysis. The notebooks and the final PDF report are available in my GitHub
repository.
The major parts of this report are the data collection methodology, data wrangling,
exploratory data analysis (EDA), interactive data visualization, machine learning (ML)
classification model development, and model evaluation. Finally, the accuracies of
different ML algorithms in predicting whether the Falcon 9 first stage will land
are compared.
Section 1

Methodology
Data used in this project were collected from the SpaceX REST API and from the
Wikipedia launch table. Wrangling the collected data included cleaning it,
preparing it for visualization, and extracting the features used by the ML
predictive models: logistic regression, support vector machine (SVM),
decision tree, and k-nearest neighbors (KNN).
In addition, exploratory data analysis (EDA) was performed using visualization
and SQL, and the Folium and Plotly Dash Python libraries were used for data
representation and interactive visual analytics.
Finally, predictive analysis was performed with scikit-learn classification models
to predict whether the Falcon 9 first stage will land successfully, and the
accuracy of each model was determined.
Data Collection: Overview
Data collection and visualization major steps:
Step 1: Collect data from the SpaceX API and convert the .json data to a dataframe.
Step 2: Scrape and filter the data to include Falcon 9 launches, assign the data to a dictionary, and export it to a CSV file.
Step 3: Plot and visualize the data.

Portion of the generated output data file: dataset_part_1.csv

GitHub URL: https://github.com/ocean2024/Ocean2024/blob/main/dataset_part_1.csv
Data Collection: SpaceX API
Step 1: Request the launch data from the SpaceX API with a GET request and decode the response as JSON.
Step 2: Use custom functions to clean the data.
Step 3: Assign the cleaned data to a dictionary and a data frame.
Step 4: Filter the data to include only Falcon 9 launches and export it to a CSV file (dataset_part_1.csv).

Portion of output

GitHub URL:
https://github.com/ocean2024/Ocean2024/blob/main/IBM%20DS_Capstone%20Project_Data%20Collection.ipynb
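The API steps above can be sketched with the Python standard library (urllib, json, csv) standing in for requests and pandas. This is a minimal illustration, not the notebook's code: the endpoint is assumed to be the public SpaceX v4 `launches/past` route, and the records are assumed to be already flattened into one dict per launch with hypothetical keys (in the raw API response the rocket is an ID that needs a second lookup). Dropping Falcon 1 launches, renumbering flights, and filling missing payload masses with the mean are shown as typical cleaning steps.

```python
import csv
import io
import json
import urllib.request

# Assumed endpoint; the course notebook may read a static snapshot instead.
SPACEX_URL = "https://api.spacexdata.com/v4/launches/past"


def fetch_launches(url=SPACEX_URL):
    """GET the launch list and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def clean_falcon9(rows):
    """Keep Falcon 9 launches, renumber flights, fill missing payload masses with the mean."""
    f9 = [dict(r) for r in rows if r["BoosterVersion"] != "Falcon 1"]
    for i, row in enumerate(f9, start=1):
        row["FlightNumber"] = i
    known = [r["PayloadMass"] for r in f9 if r["PayloadMass"] is not None]
    mean_mass = sum(known) / len(known) if known else None
    for row in f9:
        if row["PayloadMass"] is None:
            row["PayloadMass"] = mean_mass
    return f9


def to_csv_text(rows):
    """Serialize rows to CSV text (write it to dataset_part_1.csv in practice)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical, already-flattened records used only to exercise the functions.
sample = [
    {"FlightNumber": 1, "BoosterVersion": "Falcon 1", "PayloadMass": 20.0, "Orbit": "LEO"},
    {"FlightNumber": 6, "BoosterVersion": "Falcon 9", "PayloadMass": 500.0, "Orbit": "LEO"},
    {"FlightNumber": 7, "BoosterVersion": "Falcon 9", "PayloadMass": None, "Orbit": "ISS"},
    {"FlightNumber": 8, "BoosterVersion": "Falcon 9", "PayloadMass": 1500.0, "Orbit": "GTO"},
]
falcon9 = clean_falcon9(sample)
print(to_csv_text(falcon9))
```

Only the cleaning and export functions run offline; `fetch_launches` needs network access. When writing the CSV to disk, open the file with `newline=""` so the csv module controls line endings.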
Data Collection: Scraping
Step 1: Perform an HTTP GET request for the Falcon 9 HTML page and create a Beautiful Soup object from the HTML.
Step 2: Extract all column (variable) names from the HTML table header.
Step 3: Create a data frame by parsing the launch HTML tables.
Step 4: Export the data to a CSV file (spacex_web_scraped.csv).

Portion of output

GitHub URL:
https://github.com/ocean2024/Ocean2024/blob/main/IBM_DS%20Capstone_Project_Web_Data_Scraping_.ipynb
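Step 2 (extracting column names from the table header) can be sketched with the standard library's `html.parser`, used here as a dependency-free stand-in for Beautiful Soup. The HTML snippet is a hypothetical, simplified header row in the style of the Wikipedia launch table, not the real page; the rules for cleaning a header (join text across `<br>`, skip `<sup>` footnote markers, drop empty cells) are assumptions.

```python
from html.parser import HTMLParser


class HeaderNameParser(HTMLParser):
    """Collect the visible text of each <th> cell, skipping <sup> footnote markers."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_th = False
        self._in_sup = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self._in_th, self._buf = True, []
        elif tag == "sup":
            self._in_sup = True
        elif tag == "br" and self._in_th:
            self._buf.append(" ")  # a line break inside a header separates words

    def handle_endtag(self, tag):
        if tag == "sup":
            self._in_sup = False
        elif tag == "th" and self._in_th:
            self._in_th = False
            name = " ".join("".join(self._buf).split())  # collapse whitespace
            if name:  # drop empty header cells
                self.names.append(name)

    def handle_data(self, data):
        if self._in_th and not self._in_sup:
            self._buf.append(data)


def extract_column_names(html):
    parser = HeaderNameParser()
    parser.feed(html)
    return parser.names


# Hypothetical header row in the style of the Wikipedia launch table.
SAMPLE_HTML = """
<table class="wikitable">
<tr><th>Flight No.</th><th>Date and<br/>time (<a href="#">UTC</a>)</th>
<th>Version,<br/>Booster<sup>[b]</sup></th><th>Launch site</th><th></th></tr>
</table>
"""
print(extract_column_names(SAMPLE_HTML))
```

The resulting names would become the keys of the launch dictionary that Step 3 fills row by row.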
Data Collection: Data Wrangling
Step 1: Load the data from the dataset_part_1.csv file and calculate the number of launches at each site.
Step 2: Calculate the number of occurrences of each orbit.
Step 3: Calculate the number of occurrences of each mission outcome across the orbits.
Step 4: Create a landing outcome label from the Outcome column and export the data to the dataset_part2.csv file.
GitHub URL:
https://github.com/ocean2024/Ocean2024/blob/main/IBM_DS_Capstone_Project_Data_Wrangling.ipynb
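Steps 1 to 4 amount to frequency counts plus a binary label. A minimal sketch with `csv` and `collections.Counter` follows; the column names (`LaunchSite`, `Orbit`, `Outcome`) and the rule "an outcome starting with `True` is a successful landing" are assumptions about the dataset's layout, and the three inline rows are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for dataset_part_1.csv.
CSV_TEXT = """LaunchSite,Orbit,Outcome
CCAFS SLC 40,LEO,None None
CCAFS SLC 40,GTO,False ASDS
KSC LC 39A,ISS,True RTLS
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

launches_per_site = Counter(r["LaunchSite"] for r in rows)  # Step 1
orbit_counts = Counter(r["Orbit"] for r in rows)            # Step 2
outcome_counts = Counter(r["Outcome"] for r in rows)        # Step 3

# Step 4: Class = 1 for a successful landing, 0 otherwise (assumed rule).
for r in rows:
    r["Class"] = 1 if r["Outcome"].startswith("True") else 0

print(launches_per_site.most_common())
print([r["Class"] for r in rows])
```

Writing `rows` back out with `csv.DictWriter` would produce the labeled file used in the later steps.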
EDA and Data Visualization
Use Matplotlib and Seaborn for data visualization.
Step 1: Visualize the relationship between flight number and launch site.
Step 2: Visualize the relationship between payload and launch site.
Step 3: Visualize the relationship between success rate and each orbit type.
Step 4: Visualize the relationship between flight number and orbit type.
Step 5: Visualize the relationship between payload and orbit type.
Step 6: Visualize the yearly trend of launch success.
Step 7: Create dummy variables for the categorical columns.
Step 8: Cast all numeric columns to float64.

GitHub URL:
https://github.com/ocean2024/Ocean2024/blob/main/IBM_DS_Capstone_Project_Data%20Visualization_with_EDA.ipynb
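Steps 7 and 8 (dummy variables and casting to float64) are what pandas' `get_dummies` and `astype("float64")` typically do in such a notebook. The dependency-free sketch below shows the same transformation on hypothetical rows; the column names are assumptions.

```python
def one_hot(rows, categorical):
    """Replace each categorical column with 0.0/1.0 indicator columns; cast the rest to float."""
    levels = {col: sorted({r[col] for r in rows}) for col in categorical}
    encoded = []
    for r in rows:
        out = {}
        for key, value in r.items():
            if key in categorical:
                for level in levels[key]:
                    out[f"{key}_{level}"] = 1.0 if value == level else 0.0
            else:
                out[key] = float(value)  # Python floats are IEEE 754 doubles (float64)
        encoded.append(out)
    return encoded


# Hypothetical feature rows.
features = [
    {"FlightNumber": 1, "PayloadMass": 500, "Orbit": "LEO", "LaunchSite": "CCAFS SLC 40"},
    {"FlightNumber": 2, "PayloadMass": 3000, "Orbit": "GTO", "LaunchSite": "KSC LC 39A"},
]
encoded = one_hot(features, {"Orbit", "LaunchSite"})
print(encoded[0])
```

Every row ends up with the same purely numeric columns, which is the shape the classifiers in the predictive analysis expect.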
EDA with SQL

Step 1: Display the names of the unique launch sites in the space mission.
Step 2: Display 5 records where launch sites begin with the string 'CCA'.
Step 3: Display the total payload mass carried by boosters launched by NASA (CRS).
Step 4: Display the average payload mass carried by booster version F9 v1.1.
Step 5: List the date when the first successful landing outcome on a ground pad was achieved.
Step 6: List the names of the boosters which landed successfully on a drone ship and have a payload mass greater than 4000 kg but less than 6000 kg.
Step 7: List the total number of successful and failed mission outcomes.
Step 8: List the names of the booster versions which have carried the maximum payload mass.
Step 9: List the records which display the month, failed landing outcome, booster version, etc.
Step 10: Rank the count of landing outcomes (failure or success).

GitHub URL:
https://github.com/ocean2024/Ocean2024/blob/main/IBM_DAS_Capstone_Project_SQL.ipynb
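Steps 1 to 5 can be reproduced against an in-memory SQLite table. This is a sketch: the table name `SPACEXTBL` and its column names are assumed to follow the course's SpaceX table, and the five rows are hypothetical, so the printed numbers are not the project's results.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    """CREATE TABLE SPACEXTBL (
        Date TEXT, Booster_Version TEXT, Launch_Site TEXT,
        PAYLOAD_MASS__KG_ REAL, Customer TEXT, Landing_Outcome TEXT)"""
)
# Hypothetical rows for illustration only (not the SpaceX data set).
con.executemany(
    "INSERT INTO SPACEXTBL VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2013-01-01", "F9 v1.0", "CCAFS LC-40", 500, "NASA (CRS)", "No attempt"),
        ("2014-01-01", "F9 v1.1", "CCAFS LC-40", 2500, "NASA (CRS)", "Controlled (ocean)"),
        ("2015-01-01", "F9 v1.1 B1000", "CCAFS LC-40", 1500, "Commercial A", "Failure (drone ship)"),
        ("2015-06-01", "F9 FT", "CCAFS LC-40", 2000, "Commercial B", "Success (ground pad)"),
        ("2016-06-01", "F9 FT", "KSC LC-39A", 2600, "NASA (CRS)", "Success (ground pad)"),
    ],
)

queries = {
    # Step 1: unique launch sites
    "sites": "SELECT DISTINCT Launch_Site FROM SPACEXTBL ORDER BY Launch_Site",
    # Step 2: five records whose launch site starts with 'CCA'
    "cca": "SELECT * FROM SPACEXTBL WHERE Launch_Site LIKE 'CCA%' LIMIT 5",
    # Step 3: total payload mass for NASA (CRS)
    "nasa_total": "SELECT SUM(PAYLOAD_MASS__KG_) FROM SPACEXTBL WHERE Customer = 'NASA (CRS)'",
    # Step 4: average payload mass for F9 v1.1 boosters (LIKE also matches sub-versions)
    "f9_v11_avg": "SELECT AVG(PAYLOAD_MASS__KG_) FROM SPACEXTBL WHERE Booster_Version LIKE 'F9 v1.1%'",
    # Step 5: first successful ground-pad landing (ISO dates sort correctly as text)
    "first_ground_pad": "SELECT MIN(Date) FROM SPACEXTBL WHERE Landing_Outcome = 'Success (ground pad)'",
}
results = {name: con.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

Whether Step 4 uses `=` or `LIKE` changes the answer when booster versions carry suffixes, so the exact predicate matters when comparing against the reported average.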
Build an Interactive Map with Folium
Step 1: Mark all launch sites on a map created with Folium, adding a circle, a popup label, and a text label to each site at its latitude and longitude coordinates, to show each site's geographical location relative to the equator.
Step 2: Mark the successful and failed launches for each site on the map using colored markers.
Step 3*: Calculate the distances between a launch site and its proximities.
* Explanation:
From the visual analysis of the launch site KSC LC-39A, we can see that it is:
• relatively close to a railway (15.23 km)
• relatively close to a highway (20.28 km)
• relatively close to the coastline (14.99 km)
• relatively close to its closest city, Titusville (16.32 km).
• A failed rocket traveling at high speed can cover distances of 15 to 20 km in a few seconds, which could be dangerous to populated areas.

GitHub URL:
https://github.com/ocean2024/Ocean2024/blob/main/IBM_DA_Capstone_Project_Visual_Analytics_Folium.ipynb
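Step 3 needs the great-circle distance between two latitude/longitude points. A minimal haversine sketch follows, assuming a spherical Earth with a mean radius of 6371 km; the notebook may use a slightly different radius, which shifts results by a fraction of a percent.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# One degree of longitude along the equator is about 111.19 km.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))
```

In the map workflow, the site coordinates and the coordinates of a clicked proximity (railway, highway, coastline, city) would be passed in, and the returned distance shown as a marker label.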
Build a Dashboard with Plotly Dash

Step 1: Add a dropdown list to enable launch site selection.
Step 2: Add a pie chart to show the total successful launches count for all sites and the success vs. failed counts for a selected site.
Step 3: Add a range slider to select the payload range.
Step 4: Add a scatter chart of payload mass vs. launch success for the different booster versions.

The dashboard is built using the Dash web framework.

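The two Dash callbacks mostly filter data before plotting. The sketch below shows that filtering logic in plain Python, without Dash itself; the function names, the `"ALL"` dropdown value, and the row keys (`LaunchSite`, `PayloadMass`, `Class`, `Booster`) are hypothetical. In the app, the returned values would feed the pie chart and the scatter chart.

```python
from collections import Counter


def pie_data(rows, site):
    """Pie-chart counts: successes per site for 'ALL', else success vs. failure at one site."""
    if site == "ALL":
        return dict(Counter(r["LaunchSite"] for r in rows if r["Class"] == 1))
    at_site = [r["Class"] for r in rows if r["LaunchSite"] == site]
    return {"Success": at_site.count(1), "Failure": at_site.count(0)}


def scatter_data(rows, site, payload_range):
    """Scatter points (payload, class, booster) inside the range-slider bounds."""
    low, high = payload_range
    return [
        (r["PayloadMass"], r["Class"], r["Booster"])
        for r in rows
        if low <= r["PayloadMass"] <= high and (site == "ALL" or r["LaunchSite"] == site)
    ]


# Hypothetical launch rows.
launches = [
    {"LaunchSite": "KSC LC-39A", "PayloadMass": 2500, "Class": 1, "Booster": "FT"},
    {"LaunchSite": "KSC LC-39A", "PayloadMass": 6000, "Class": 0, "Booster": "FT"},
    {"LaunchSite": "CCAFS LC-40", "PayloadMass": 4000, "Class": 1, "Booster": "v1.1"},
]
print(pie_data(launches, "ALL"))
print(pie_data(launches, "KSC LC-39A"))
print(scatter_data(launches, "ALL", (2000, 5500)))
```

Keeping this logic in plain functions makes it testable without starting a server; the callbacks then only wrap the results in figure objects.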
Predictive Analysis (Classification): Overview

Step 1: Build the model.
Step 2: Evaluate the model.
Step 3: Improve the model.
Step 4: Find the best performing model.

GitHub URL:
https://github.com/ocean2024/Ocean2024/blob/main/IBM%20DS%20Capstone%20Project_ML%20Prediction.ipynb
Predictive Analysis (Classification) Steps
Step 1: Load the dataframe.
Step 2: Create a NumPy array from the column Class in the data and assign it to the variable Y.
Step 3: Standardize the data in X.
Step 4: Use train_test_split to split the X and Y data into training and test data.
Step 5: Create a logistic regression object, then create a GridSearchCV object.
Step 6: Calculate the accuracy on the test data using the method score.
Step 7: Create a support vector machine object, then create a GridSearchCV object.
Step 8: Calculate the accuracy on the test data using the method score.
Step 9: Create a decision tree classifier object, then create a GridSearchCV object.
Step 10: Calculate the accuracy of tree_cv on the test data using the method score.
Step 11: Create a k-nearest neighbors object, then create a GridSearchCV object.
Step 12: Calculate the accuracy of knn_cv and find the method that performs best.
GitHub URL:
https://github.com/ocean2024/Ocean2024/blob/main/IBM%20DS%20Capstone%20Project_ML%20Prediction.ipynb
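The notebook uses scikit-learn for these steps. As a dependency-free illustration of what Steps 3, 4, 11 and 12 compute, the sketch below standardizes the features, holds out a test split, and picks k for a k-nearest-neighbors classifier by k-fold cross-validation on the training data. The synthetic data, the five folds, and the candidate k values are assumptions; this is not the author's code.

```python
import random
from statistics import mean, pstdev


def standardize(X):
    """Z-score each column (mean 0, population std 1), as StandardScaler does."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero for constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]


def knn_predict(train_X, train_y, x, k):
    """Majority label among the k training points closest to x (squared Euclidean)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)


def accuracy(X, y, train_X, train_y, k):
    """Fraction of points in (X, y) classified correctly by k-NN on the training set."""
    hits = sum(knn_predict(train_X, train_y, x, k) == t for x, t in zip(X, y))
    return hits / len(X)


def grid_search_k(X, y, ks, folds=5):
    """Pick k by mean k-fold cross-validation accuracy, like a grid search over n_neighbors."""
    scores = {}
    for k in ks:
        fold_scores = []
        for f in range(folds):
            val = [i for i in range(len(X)) if i % folds == f]
            tr = [i for i in range(len(X)) if i % folds != f]
            fold_scores.append(accuracy([X[i] for i in val], [y[i] for i in val],
                                        [X[i] for i in tr], [y[i] for i in tr], k))
        scores[k] = mean(fold_scores)
    best_k = max(scores, key=scores.get)  # the first k wins ties
    return best_k, scores[best_k]


random.seed(0)
# Synthetic data (assumption): two features, class 1 shifted away from class 0.
X, y = [], []
for _ in range(60):
    label = random.randint(0, 1)
    X.append([random.gauss(3.0 * label, 1.0), random.gauss(1000.0 * label, 400.0)])
    y.append(label)

Xs = standardize(X)          # Step 3, applied before the split as listed above
split = int(0.8 * len(Xs))   # Step 4: simple 80/20 hold-out split
X_train, X_test = Xs[:split], Xs[split:]
y_train, y_test = y[:split], y[split:]

best_k, cv_score = grid_search_k(X_train, y_train, ks=[1, 3, 5, 7])  # Step 11
test_acc = accuracy(X_test, y_test, X_train, y_train, best_k)        # Step 12
print(best_k, round(cv_score, 3), round(test_acc, 3))
```

In scikit-learn the search is `GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [...]}, cv=...)` followed by `.score(X_test, Y_test)`. Fitting the scaler on the training split alone would avoid leaking test-set statistics into training.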
Results

• Exploratory data analysis results.
• Interactive analytics demo in screenshots.
• Predictive analysis results.

Section 2
Flight Number vs. Launch Site

• The majority of the flights were launched from the CCAFS SLC-40 site.
• The VAFB SLC-4E and KSC LC-39A sites have higher success rates than the other sites.
• Newer flights have higher success rates than older flights.

Payload Mass vs. Launch Site

• The majority of the flights with a payload mass above 7000 kg were successful.
• At KSC LC-39A, the success rate for payload masses under 5500 kg is 100%.
• For all launch sites, the success rate tends to increase with payload mass.

Success Rate vs. Orbit Type

• The SO orbit has a 0% success rate.
• The ES-L1, GEO, HEO and SSO orbits have a 100% success rate.
• The GTO, ISS, LEO, MEO and PO orbits have success rates higher than 50% and lower than 75%.

Flight Number vs. Orbit Type

• The majority of the flights were launched to the ISS and GTO orbits.
• The data suggest no clear relationship between flight number and orbit type.

Payload Mass vs. Orbit Type

• Payload masses above 10000 kg were placed in the PO, ISS and LEO orbits.
• Payload masses between 4000 and 8000 kg were placed in the GTO orbit.

Launch Success Yearly Trend

• The launch success rate has increased steadily since 2013.
• Between 2013 and 2017 the increase was approximately linear.
• There was a drop in the launch success rate in 2018.
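The yearly trend, like the success rate per orbit type, is a mean of the 0/1 landing label within each group. A stdlib sketch, assuming ISO `YYYY-MM-DD` date strings and hypothetical rows with `Date`, `Orbit` and `Class` keys:

```python
from collections import defaultdict


def success_rate_by(rows, key):
    """Mean of the 0/1 Class label within each group defined by key(row)."""
    groups = defaultdict(list)
    for r in rows:
        groups[key(r)].append(r["Class"])
    return {g: sum(v) / len(v) for g, v in sorted(groups.items())}


# Hypothetical rows.
rows = [
    {"Date": "2013-01-01", "Orbit": "PO", "Class": 0},
    {"Date": "2013-06-01", "Orbit": "GTO", "Class": 0},
    {"Date": "2015-01-01", "Orbit": "ISS", "Class": 0},
    {"Date": "2015-06-01", "Orbit": "LEO", "Class": 1},
    {"Date": "2017-01-01", "Orbit": "PO", "Class": 1},
]
yearly = success_rate_by(rows, key=lambda r: r["Date"][:4])
by_orbit = success_rate_by(rows, key=lambda r: r["Orbit"])
print(yearly)
print(by_orbit)
```

Plotting `yearly` as a line chart gives the trend plot, and `by_orbit` as a bar chart gives the success rate per orbit type.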
All Launch Site Names

The names of the unique launch sites and the query used to obtain them are shown below.

Launch Site Names Begin with 'CCA'
Five records for launch sites whose names begin with the string 'CCA', and the query
used to obtain them, are shown below.

Total Payload Mass

• The calculated total payload mass carried by boosters launched by NASA (CRS) is 45,596 kg.
• The query used to obtain the total payload mass is shown below.

Average Payload Mass by F9 v1.1
• The average payload mass carried by booster version F9 v1.1 is 2534.7 kg.
• The query used to calculate this average is shown below.

First Successful Ground Landing Date

• The first successful landing outcome on a ground pad was achieved on 2015-12-22.
• The query used to obtain this result is shown below.

Successful Drone Ship Landing with Payload between 4000 and 6000 kg

• The boosters that landed successfully on a drone ship with a payload mass greater than 4000 kg but less than 6000 kg are listed below.
• The query used to obtain this information is shown below.

Total Number of Successful and Failure Mission Outcomes

• The total numbers of successful and failed missions are as follows:
• Failure (in flight) = 1
• Successful flights = 98
• The query result is shown below.

Boosters Carried Maximum Payload
• The boosters that have carried the maximum payload mass are listed below.
• The query used to obtain the booster names is shown below.

2015 Launch Records
• The failed "landing_outcomes" on a drone ship during 2015, with their booster versions and launch site names, are listed below.
• The query used to obtain this information is shown below.

Rank Landing Outcomes Between 2010-06-04 and 2017-03-20

• The count of landing outcomes (such as Failure (drone ship) or Success (ground pad)) between 2010-06-04 and 2017-03-20, ranked in descending order, is shown below.
• The query used to obtain the results is shown below.
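The 2015 listing and the ranking are a string filter on the date and a `GROUP BY` with `ORDER BY`. A SQLite sketch follows; the table and column names (`SPACEXTBL`, `Date`, `Landing_Outcome`) are assumed, the rows are hypothetical, and `substr` is used because SQLite has no month-name function.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE SPACEXTBL (Date TEXT, Landing_Outcome TEXT)")
# Hypothetical rows; ISO dates compare correctly as text.
con.executemany(
    "INSERT INTO SPACEXTBL VALUES (?, ?)",
    [
        ("2010-06-04", "Failure (parachute)"),
        ("2015-01-10", "Failure (drone ship)"),
        ("2015-04-14", "Failure (drone ship)"),
        ("2015-12-22", "Success (ground pad)"),
        ("2016-04-08", "Success (drone ship)"),
        ("2018-01-01", "Success (drone ship)"),  # outside the ranking window
    ],
)

# 2015 drone-ship failures with the month extracted from the date string.
failures_2015 = con.execute(
    """SELECT substr(Date, 6, 2) AS month, Landing_Outcome
       FROM SPACEXTBL
       WHERE substr(Date, 1, 4) = '2015' AND Landing_Outcome = 'Failure (drone ship)'
       ORDER BY month"""
).fetchall()

# Rank landing outcomes by count inside the date window (ties broken by name).
ranked = con.execute(
    """SELECT Landing_Outcome, COUNT(*) AS n
       FROM SPACEXTBL
       WHERE Date BETWEEN '2010-06-04' AND '2017-03-20'
       GROUP BY Landing_Outcome
       ORDER BY n DESC, Landing_Outcome"""
).fetchall()
print(failures_2015)
print(ranked)
```

`BETWEEN` is inclusive at both ends, so launches on 2010-06-04 and 2017-03-20 themselves are counted.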

Section 3
USA Launch Sites in California and Florida
• Most of the launch sites considered in this project are relatively close to the
equator. Launch sites are placed as close to the equator as possible because
anything on the Earth's surface at the equator is already moving at the maximum
rotational speed (1670 km/h). For example, launching from the equator gives the
spacecraft almost 500 km/h more speed than launching from halfway to the North
Pole.

• All launch sites considered in this project are very close to the coast.
Launching rockets toward the ocean minimizes the risk of debris falling or
exploding near people.
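The 500 km/h figure follows because the surface speed due to Earth's rotation scales with the cosine of latitude, v(φ) = v_eq·cos φ. A quick check, assuming the equatorial speed of 1670 km/h quoted above; the site latitudes are approximate values assumed for illustration.

```python
from math import cos, radians

V_EQUATOR_KMH = 1670.0  # equatorial surface speed quoted in the text


def surface_speed_kmh(latitude_deg):
    """Eastward surface speed from Earth's rotation at a given latitude."""
    return V_EQUATOR_KMH * cos(radians(latitude_deg))


# Approximate latitudes (assumed): Cape Canaveral ~28.6 N, Vandenberg ~34.6 N.
for name, lat in [("Equator", 0.0), ("Cape Canaveral", 28.6), ("Vandenberg", 34.6), ("45 N", 45.0)]:
    v = surface_speed_kmh(lat)
    print(f"{name:15s} {v:7.1f} km/h  (deficit vs. equator: {V_EQUATOR_KMH - v:5.1f})")
```

At 45° N the deficit is about 489 km/h, which matches the "almost 500 km/h" statement.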

Color Labels Showing the Launch Sites on a Map

Green = successful launch
Red = failed launch
Safe Distance to Launch Site

The obtained results indicate that all launch sites are at a safe distance from
railway lines and cities.

Section 4
Total Launch Success for All Sites

The sites with the largest shares of all successful launches are:

1. KSC LC-39A (41.7%)
2. CCAFS LC-40 (29.2%)
KSC LC-39A Launch Site Success Rate

The success rate at site KSC LC-39A is 76.9%.


Payload vs. Launch Outcome for All Sites

The highest success rate is for payloads between 2000 and 5500 kg.

Section 5
Classification Accuracy

• On the test set, all four models achieved the same accuracy.
• The decision tree model achieved the best accuracy on the entire data set.

Confusion Matrix
• The confusion matrix analysis suggests that the best performing model is the
logistic regression model.
• The confusion matrix shows 13 true positives, 3 false positives, 3 true negatives,
and 0 false negatives. The main source of error is therefore false positives:
unsuccessful landings predicted as successful.
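Taking the slide's counts as 13 true positives, 3 false positives, 3 true negatives and 0 false negatives (positive meaning a successful landing), the usual summary metrics follow directly. The helper below is a generic sketch, not code from the project notebook.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts as read from the slide.
for name, value in confusion_metrics(tp=13, fp=3, tn=3, fn=0).items():
    print(f"{name}: {value:.3f}")
```

A recall of 1.0 with a precision of 0.8125 expresses the same point numerically: the model misses no successful landings but labels some failures as successes.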

Conclusions
• The success rate of rocket launches increased after 2013.
• The GEO, HEO, ES-L1 and SSO orbits have a 100% launch success rate.
• Launch site KSC LC-39A has the highest success rate.
• The decision tree model is the best ML algorithm for this SpaceX data set and
provided the best accuracy results.

Appendix
https://github.com/ocean2024/Ocean2024/tree/main

https://www.coursera.org/professional-certificates/ibm-data-science?campaignid=1876641588&adgroupid=70740725700&device=c&keyword=ibm%20data%20science%20professional%20certificate&matchtype=b&network=g&devicemodel=&adposition=&creativeid=347445112274&hide_mobile_promo=&gad_source=1