Ds Capstone Presentation
Evgeny Zorin
29.08.2021
Outline
• Executive Summary
• Introduction
• Methodology
• Results
• Conclusion
• Appendix
Executive Summary
Summary of methodologies
- Data collection
- Data wrangling
- Exploratory Data Analysis with Data Visualization
- Exploratory Data Analysis with SQL
- Building an interactive map with Folium
- Building a Dashboard with Plotly Dash
- Predictive analysis (Classification)
Introduction
Project background and context
SpaceX is the most successful company of the commercial space
age, making space travel affordable. The company advertises Falcon
9 rocket launches on its website at a cost of 62 million dollars;
other providers cost upward of 165 million dollars each. Much of the
savings is because SpaceX can reuse the first stage. Therefore, if we
can determine whether the first stage will land, we can determine the
cost of a launch. Based on public information and machine learning
models, we are going to predict whether SpaceX will reuse the first stage.
Questions to be answered
- How do variables such as payload mass, launch site, number of
flights, and orbits affect the success of the first stage landing?
- Does the rate of successful landings increase over the years?
- What is the best algorithm that can be used for binary classification
in this case?
Methodology
Data collection methodology:
- Using the SpaceX REST API
- Using web scraping from Wikipedia
Data collection
The data collection process combined requests to the SpaceX REST API with
web scraping of a table in SpaceX’s Wikipedia entry.
Both data collection methods were needed to get complete information
about the launches for a more detailed analysis.
Data collection steps (first flowchart):
- Creating a dataframe from the dictionary
- Filtering the dataframe to only include Falcon 9 launches
- Replacing missing values of the Payload Mass column with the calculated .mean() for this column
- Exporting the data to CSV
Data collection steps (second flowchart):
- Constructing the data we have obtained into a dictionary
- Creating a dataframe from the dictionary
- Exporting the data to CSV
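The filtering, missing-value and export steps named above can be sketched in plain Python. This is only an illustration under stated assumptions: the project works with dataframes, while the sketch uses a list of dictionaries, and the field names (BoosterVersion, PayloadMass) and all values are hypothetical.

```python
import csv
import io
from statistics import mean

# Hypothetical launch records; field names and values are illustrative only.
launches = [
    {"BoosterVersion": "Falcon 1", "PayloadMass": 20.0},
    {"BoosterVersion": "Falcon 9", "PayloadMass": 500.0},
    {"BoosterVersion": "Falcon 9", "PayloadMass": None},
    {"BoosterVersion": "Falcon 9", "PayloadMass": 3500.0},
]


def keep_falcon9(rows):
    """Keep only the rows whose booster version is Falcon 9."""
    return [dict(r) for r in rows if r["BoosterVersion"] == "Falcon 9"]


def fill_missing_payload(rows):
    """Replace missing PayloadMass values with the mean of the known ones."""
    known = [r["PayloadMass"] for r in rows if r["PayloadMass"] is not None]
    avg = mean(known)
    return [
        {**r, "PayloadMass": avg if r["PayloadMass"] is None else r["PayloadMass"]}
        for r in rows
    ]


falcon9 = fill_missing_payload(keep_falcon9(launches))

# Export to CSV; an in-memory buffer stands in for a file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["BoosterVersion", "PayloadMass"])
writer.writeheader()
writer.writerows(falcon9)
csv_text = buffer.getvalue()
```

The mean is computed after filtering, so values from other rockets do not influence the fill value; whether the project applied the steps in exactly this way is an assumption.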
Data wrangling
In the data set, there are several different cases where the booster did not
land successfully. Sometimes a landing was attempted but failed due to an
accident. The outcome values mean the following:
- True Ocean: the booster landed successfully in a specific region of the ocean.
- False Ocean: the booster attempted to land in a specific region of the ocean but failed.
- True RTLS: the booster landed successfully on a ground pad.
- False RTLS: the booster attempted to land on a ground pad but failed.
- True ASDS: the booster landed successfully on a drone ship.
- False ASDS: the booster attempted to land on a drone ship but failed.
We mainly convert those outcomes into Training Labels: “1” means the booster
landed successfully, “0” means it was unsuccessful.
Data wrangling steps:
- Perform exploratory Data Analysis and determine Training Labels
- Calculate the number of launches on each site
- Calculate the number and occurrence of each orbit
- Calculate the number and occurrence of mission outcome per orbit type
- Create a landing outcome label from the Outcome column
- Export the data to CSV
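The conversion of outcomes into training labels can be sketched as below. This is a minimal illustration, assuming that every outcome other than the three "True" values counts as unsuccessful; the "No attempt" value and the sample list are hypothetical.

```python
# Outcomes that count as a successful landing, as described above.
SUCCESSFUL_OUTCOMES = {"True Ocean", "True RTLS", "True ASDS"}


def landing_label(outcome):
    """Return 1 for a successful landing outcome, 0 for anything else."""
    return 1 if outcome in SUCCESSFUL_OUTCOMES else 0


# Hypothetical sample of outcome values; "No attempt" is a made-up
# placeholder for any value outside the six cases listed above.
outcomes = ["True ASDS", "False Ocean", "True RTLS", "False ASDS", "No attempt"]
labels = [landing_label(o) for o in outcomes]
```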
EDA with data visualization
Charts were plotted:
Flight Number vs. Payload Mass, Flight Number vs. Launch Site, Payload Mass
vs. Launch Site, Orbit Type vs. Success Rate, Flight Number vs. Orbit Type,
Payload Mass vs Orbit Type and Success Rate Yearly Trend
Scatter Chart of Payload Mass vs. Success Rate for the different Booster Versions:
- Added a scatter chart to show the correlation between Payload and Launch Success.
Results
• Screenshots
• Predictive analysis results
Explanation:
• The earliest flights all failed while the latest flights all succeeded.
• The CCAFS SLC 40 launch site accounts for about half of all launches.
• VAFB SLC 4E and KSC LC 39A have higher success rates.
• Success appears to become more likely with each new launch.
Explanation:
• For every launch site the higher the payload mass, the higher the success
rate.
• Most of the launches with payload mass over 7000 kg were successful.
• KSC LC 39A has a 100% success rate for payload mass under 5500 kg too.
Explanation:
• In the LEO orbit, success appears related to the number of flights;
on the other hand, there seems to be no relationship between flight
number and success in GTO orbit.
Payload Mass vs. Orbit type
Explanation:
• Heavy payloads have a negative influence on GTO orbits and a positive
influence on Polar LEO (ISS) orbits.
Launch success yearly trend
Explanation:
• The success rate kept increasing from 2013 until 2020.
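A yearly trend of this kind can be computed by grouping landing labels by launch year. The sketch below assumes records of the form (ISO date, label); the dates and labels are invented for illustration and are not the project's data.

```python
from collections import defaultdict

# Hypothetical (launch date, landing label) pairs; not the real data set.
records = [
    ("2013-03-01", 0),
    ("2013-09-29", 0),
    ("2015-01-10", 0),
    ("2015-12-22", 1),
    ("2017-02-19", 1),
    ("2017-05-01", 1),
]


def yearly_success_rate(rows):
    """Map each year to the fraction of launches labeled 1 in that year."""
    totals = defaultdict(lambda: [0, 0])  # year -> [successes, launches]
    for date, label in rows:
        year = date[:4]  # ISO dates start with the four-digit year
        totals[year][0] += label
        totals[year][1] += 1
    return {year: s / n for year, (s, n) in sorted(totals.items())}


rates = yearly_success_rate(records)
```

Plotting the resulting values against the years gives a line chart of the success rate over time.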
Explanation:
• Displaying the names of the unique launch sites in the space mission.
Explanation:
• Displaying 5 records where launch sites begin with the string 'CCA'.
Explanation:
• Displaying the total payload mass carried by boosters launched by
NASA (CRS).
Explanation:
• Displaying average payload mass carried by booster version F9 v1.1.
Explanation:
• Listing the date when the first successful landing outcome on a ground
pad was achieved.
Successful drone ship landing with payload
between 4000 and 6000
Explanation:
• Listing the names of the boosters that landed successfully on a drone ship
and have a payload mass greater than 4000 but less than 6000.
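A query of this shape can be sketched with Python's built-in sqlite3 module. The table name, column names, booster names and rows are all hypothetical, and the outcome string 'Success (drone ship)' is assumed by analogy with the outcome values quoted elsewhere in this deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; the project's actual table and column names may differ.
conn.execute(
    "CREATE TABLE launches ("
    "booster_version TEXT, payload_mass_kg REAL, landing_outcome TEXT)"
)
conn.executemany(
    "INSERT INTO launches VALUES (?, ?, ?)",
    [
        ("Booster A", 4600.0, "Success (drone ship)"),
        ("Booster B", 3100.0, "Success (drone ship)"),
        ("Booster C", 5200.0, "Failure (drone ship)"),
        ("Booster D", 5300.0, "Success (drone ship)"),
        ("Booster E", 6000.0, "Success (drone ship)"),
    ],
)

# Strict inequalities: greater than 4000 but less than 6000.
query = """
SELECT booster_version
FROM launches
WHERE landing_outcome = 'Success (drone ship)'
  AND payload_mass_kg > 4000
  AND payload_mass_kg < 6000
ORDER BY booster_version
"""
boosters = [row[0] for row in conn.execute(query)]
conn.close()
```

Note that BETWEEN would include both boundary values, so explicit comparisons are used here to match the "greater than ... but less than" wording.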
Explanation:
• Listing the total number of successful and failed mission outcomes.
Explanation:
• Listing the names of the booster versions which have carried the maximum
payload mass.
Explanation:
• Listing the failed landing outcomes on a drone ship, with their booster
versions and launch site names, for the months of 2015.
Explanation:
• Ranking the count of landing outcomes (such as Failure (drone ship) or Success
(ground pad)) between the date 2010-06-04 and 2017-03-20 in descending order.
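The ranking can be sketched as a GROUP BY query with a date filter, again using sqlite3. The table, column names and rows are hypothetical; the dates are stored as ISO strings so that BETWEEN compares them correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows, for illustration only.
conn.execute("CREATE TABLE launches (launch_date TEXT, landing_outcome TEXT)")
conn.executemany(
    "INSERT INTO launches VALUES (?, ?)",
    [
        ("2015-01-10", "Failure (drone ship)"),
        ("2015-12-22", "Success (ground pad)"),
        ("2016-03-04", "Failure (drone ship)"),
        ("2016-06-15", "Failure (drone ship)"),
        ("2016-07-18", "Success (ground pad)"),
        ("2017-06-03", "Success (ground pad)"),  # outside the date range
    ],
)

# Count each landing outcome inside the date range, most frequent first.
query = """
SELECT landing_outcome, COUNT(*) AS outcome_count
FROM launches
WHERE launch_date BETWEEN '2010-06-04' AND '2017-03-20'
GROUP BY landing_outcome
ORDER BY outcome_count DESC
"""
ranking = conn.execute(query).fetchall()
conn.close()
```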
Explanation:
• The chart clearly shows that from all the sites, KSC LC-39A has the most
successful launches.
Explanation:
• KSC LC-39A has the highest launch success rate (76.9%) with 10 successful and
only 3 failed landings.
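The 76.9% figure follows directly from the counts quoted above (10 successful out of 13 in total), as this short check shows; the function name is only illustrative.

```python
def success_rate(successes, failures):
    """Return the share of successful outcomes as a percentage."""
    total = successes + failures
    return 100.0 * successes / total


rate = success_rate(10, 3)
print(f"{rate:.1f}%")  # prints "76.9%"
```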
Explanation:
• The charts show that payloads between 2000 and 5500 kg have the highest
success rate.
Predictive analysis (Classification)
Classification Accuracy
Scores and Accuracy of the Test Set
Explanation:
• Based on the scores of the Test Set, we cannot confirm which method
performs best.
• The identical Test Set scores may be due to the small test sample size
(18 samples). Therefore, we tested all methods on the whole Dataset.
Scores and Accuracy of the Entire Data Set
• The scores on the whole Dataset confirm that the best model is the
Decision Tree Model. This model has not only higher scores, but also
the highest accuracy.
Confusion Matrix
Explanation:
• Examining the confusion matrix, we see that logistic regression can
distinguish between the different classes. We see that the major problem
is false positives.
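To make the term concrete: a false positive here is a launch predicted as "landed" that in fact did not land. The sketch below counts the four confusion matrix cells for binary labels; the label vectors are invented for illustration and are not the project's predictions.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary 0/1 labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            counts["tp"] += 1
        elif predicted == 1 and actual == 0:
            counts["fp"] += 1  # predicted a landing that did not happen
        elif predicted == 0 and actual == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1  # missed a landing that did happen
    return counts


# Hypothetical labels: 1 = landed, 0 = did not land.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 1, 1, 0, 1, 0]

cells = confusion_counts(y_true, y_pred)
accuracy = (cells["tp"] + cells["tn"]) / len(y_true)
```

In this made-up example all errors are false positives, which mirrors the pattern described in the bullet above without reproducing its actual numbers.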
Conclusion
• The Decision Tree Model is the best algorithm for this dataset.
• Orbits ES-L1, GEO, HEO and SSO have a 100% success rate.
Appendix