Ds Capstone Template Coursera

William Andreas

July 11, 2022

Outline

• Executive Summary
• Introduction
• Methodology
• Results
• Conclusion
• Appendix

Executive Summary
• Summary of methodologies
  • Data collection with web scraping
  • Data collection via API
  • Data wrangling
  • EDA with SQL
  • EDA with data visualization
  • Build interactive maps with Folium
  • Build interactive dashboards with Plotly Dash
  • Predictive analysis using classification machine learning models
• Summary of all results
  • EDA
  • Interactive analytics
  • Predictive analysis
Introduction
• Project background and context
• SpaceX advertises Falcon 9 rocket launches on its website at a cost of 62 million dollars; other
providers charge upward of 165 million dollars each. Much of the savings comes from SpaceX's
ability to reuse the first stage.

• Problems you want to find answers to

• Which factors determine whether the rocket will land successfully?
• How do different factors interact to affect the likelihood of a successful landing?
• Which operating conditions must be met to ensure a successful landing program?
Section 1

Methodology
Executive Summary
• Data collection methodology:
• SpaceX REST API.
• Web scraping from SpaceX’s Wikipedia page.
• Perform data wrangling:
• Impute missing values, encode categorical data, and keep only the relevant columns.
• Perform exploratory data analysis (EDA) using visualization and SQL.
• Perform interactive visual analytics using Folium and Plotly Dash.
• Perform predictive analysis using classification models:
• Build several models (SVM, classification trees, kNN, and logistic regression).
• Find the best hyperparameters for each model.
• Identify the best-performing model using the test data.

Data Collection
Data was gathered with a GET request to the SpaceX API.
Next, we used the .json() method to decode the response's content as JSON and the json_normalize()
function to convert it into a pandas dataframe.
The data was then cleaned: missing values were checked for and filled in as appropriate.
Additional launch data was scraped from SpaceX’s page on Wikipedia using BeautifulSoup.
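The decode-and-clean steps can be sketched as follows. This is a minimal illustration, not the notebook's code: the three records and their field names are hypothetical stand-ins for the list that `response.json()` would return, and filling the gap with the column mean is one assumed imputation choice.

```python
import pandas as pd

# Hypothetical records standing in for the list returned by response.json()
# after a GET request to the API; the field names are illustrative only.
launches = [
    {"flight_number": 1, "rocket": {"name": "Falcon 9"}, "payload": {"mass_kg": 500.0}},
    {"flight_number": 2, "rocket": {"name": "Falcon 9"}, "payload": {"mass_kg": None}},
    {"flight_number": 3, "rocket": {"name": "Falcon 9"}, "payload": {"mass_kg": 700.0}},
]

# Flatten the nested JSON into columns such as "rocket.name" and "payload.mass_kg".
df = pd.json_normalize(launches)

# Make sure the payload column is numeric, then count the missing values.
df["payload.mass_kg"] = pd.to_numeric(df["payload.mass_kg"])
missing_before = int(df["payload.mass_kg"].isna().sum())

# Fill the gaps with the column mean (an assumed strategy).
df["payload.mass_kg"] = df["payload.mass_kg"].fillna(df["payload.mass_kg"].mean())
```

In a real run, `launches` would come from the API response rather than a literal list.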
Data Collection – SpaceX API

• https://github.com/W1lly-Wonka/IBM-Data-Science-Certification/blob/main/jupyter-labs-spacex-data-collection-api.ipynb
Data Collection - Scraping
• https://github.com/W1lly-Wonka/IBM-Data-Science-Certification/blob/main/jupyter_labs_webscraping.ipynb
Data Wrangling
https://github.com/W1lly-Wonka/IBM-Data-Science-Certification/blob/main/jupyter-spacex-Data%20wrangling.ipynb
EDA with Data Visualization
Scatter plots to visualize the:
1. Relationship between Flight Number and Launch Site.
2. Relationship between Payload and Launch Site.
3. Relationship between Flight Number and Orbit type.
4. Relationship between Payload and Orbit type.
Bar chart to visualize the:
5. Success rate of each orbit type.
Line chart to visualize the:
6. Yearly trend in launch success.

https://github.com/W1lly-Wonka/IBM-Data-Science-Certification/blob/main/jupyter-labs-eda-dataviz.ipynb
EDA with SQL
Display the names of the unique launch sites in the space mission.

Display 5 records where launch sites begin with the string 'CCA'.

Display the total payload mass carried by boosters launched by NASA (CRS).

Display the average payload mass carried by booster version F9 v1.1.

List the date when the first successful landing outcome on a ground pad was achieved.

List the names of the boosters which have success on a drone ship and have a payload mass greater than 4000 but less than 6000.

List the total number of successful and failure mission outcomes.

List the names of the booster_versions which have carried the maximum payload mass. Use a subquery.

List the records which will display the month names, failure landing_outcomes on a drone ship, booster versions, and launch_site for the months in year 2015.

Rank the count of successful landing_outcomes between the dates 04-06-2010 and 20-03-2017 in descending order.

https://github.com/W1lly-Wonka/IBM-Data-Science-Certification/blob/main/jupyter-labs-eda-sql-coursera_sqllite.ipynb
Build an Interactive Map with Folium
Marked all launch sites on a map, using map objects such as circles and color-labeled
markers to show the successful and failed launches at each site.
Drew lines showing the distance from a launch site to the nearest coastline, city,
railway, and highway.
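The distance labelled on such a line has to be computed from coordinates. One common way is the haversine great-circle formula, sketched below. The deck does not say which formula was used, and the function name and sample coordinates here are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Hypothetical coordinates: a launch pad and a nearby point on the coastline.
site = (34.6328, -120.6107)
coast = (34.6350, -120.6250)
distance = haversine_km(*site, *coast)
```

The resulting value can then be shown as the label of the line drawn between the two points.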

https://github.com/W1lly-Wonka/IBM-Data-Science-Certification/blob/main/lab_jupyter_launch_site_location.ipynb
Build a Dashboard with Plotly Dash
Built two plots for the dashboard:
1. Pie charts displaying the overall number of launches by site and each site's
success rate.
2. A scatter plot displaying the relationship between Payload Mass (kg) and Outcome for several
booster versions.

https://github.com/W1lly-Wonka/IBM-Data-Science-Certification/blob/main/spacex_dash_app.py
Predictive Analysis (Classification)
Load the data, transform it using StandardScaler, and split it
into train and test data.
Using GridSearchCV, build several machine learning
models and tune their hyperparameters.
Using the best hyperparameters for each model,
calculate and display its accuracy on the test data.
Identify the most effective classification model by
comparing the models' accuracy scores on the test data.
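A minimal sketch of this workflow with scikit-learn is shown below. It is not the notebook's code: the data is synthetic, only two of the four model families are included, and the hyperparameter grids, split ratio, and random seeds are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the launch features and landing labels.
X, y = make_classification(n_samples=90, n_features=8, random_state=2)

# Standardize, then hold out 20% of the rows for testing. The order follows
# the slide; fitting the scaler on the training split only would avoid leakage.
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

# Candidate models with small, illustrative hyperparameter grids.
candidates = {
    "logreg": (LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1]}),
    "knn": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
}

test_scores = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    # score() evaluates the refit best estimator on the held-out data.
    test_scores[name] = search.score(X_test, y_test)

# The model with the highest test accuracy is taken as the best one.
best_model = max(test_scores, key=test_scores.get)
```

SVM and decision tree candidates would be added to `candidates` in the same way.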

https://github.com/W1lly-Wonka/IBM-Data-Science-Certification/blob/main/SpaceX_Machine_Learning_Prediction_Part_5.ipynb
Results
• Exploratory data analysis results
1. CCAFS LC-40 has a success rate of 60%, while KSC LC-39A and VAFB SLC-4E each have
a success rate of 77%.
2. KSC LC-39A had the highest launch success rate of all the sites.
3. At the VAFB SLC launch site, no rockets were launched with a heavy payload
mass (greater than 10000).
4. The ES-L1, GEO, HEO, and SSO orbits have the best success rates.
5. In LEO, success appears related to the number of flights; in GTO, on the other hand, there
seems to be no relationship between flight number and success.
6. The success rate kept increasing from 2013 until 2020.
Results
Interactive analytics demo in screenshots

Results
• Exploratory data analysis results
• Interactive analytics demo in screenshots
• Predictive analysis results
• Logistic Regression is the best model based on its test accuracy score of
0.833333, and it is also the fastest to run. The difference between its accuracy scores on
the train and test data is the smallest of all the models, suggesting that the model is
stable.
Section 2
Flight Number vs. Launch Site

• Significantly more launches took place from CCAFS SLC 40 than
from other sites.
Payload vs. Launch Site

• At the VAFB SLC launch site, no rockets were launched with a heavy payload mass (greater than 10000).
Success Rate vs. Orbit Type

• The ES-L1, GEO, HEO, and SSO orbits have the best success rates.
Flight Number vs. Orbit Type

• In LEO, success appears related to the number of flights; in GTO, on the other hand, there seems to be no relationship between
flight number and success.
Payload vs. Orbit Type

• With heavy payloads, the successful landing rate is higher for
Polar, LEO, and ISS orbits.
• For GTO, however, we cannot distinguish this well, as both successful and
unsuccessful landings are present.
Launch Success Yearly Trend

• The success rate kept increasing from 2013 until 2020.
All Launch Site Names

Use the DISTINCT keyword on the Launch_Site column to return only the unique values of the
column.
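A minimal sketch of the DISTINCT query, run against an in-memory SQLite database. The table name `launches` and the sample rows are hypothetical; only the `Launch_Site` column name comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE launches (Launch_Site TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO launches VALUES (?)",
    [("CCAFS LC-40",), ("KSC LC-39A",), ("CCAFS LC-40",), ("VAFB SLC-4E",)],
)

# DISTINCT collapses repeated site names into one row each.
sites = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT Launch_Site FROM launches ORDER BY Launch_Site"
    )
]
conn.close()
```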
Launch Site Names Begin with 'CCA'

Use the wildcard pattern LIKE 'CCA%' on the launch site column to filter the data, and
limit the result to the first 5 rows.
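The filter and row limit can be sketched as below. The table name `launches`, the `flight` column, and the rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE launches (flight INTEGER, Launch_Site TEXT)")  # hypothetical
sample = [
    (1, "CCAFS LC-40"), (2, "VAFB SLC-4E"), (3, "CCAFS LC-40"), (4, "CCAFS SLC-40"),
    (5, "KSC LC-39A"), (6, "CCAFS SLC-40"), (7, "CCAFS LC-40"), (8, "CCAFS SLC-40"),
]
conn.executemany("INSERT INTO launches VALUES (?, ?)", sample)

# 'CCA%' matches any site name that starts with CCA; LIMIT keeps 5 rows.
rows = conn.execute(
    "SELECT flight, Launch_Site FROM launches "
    "WHERE Launch_Site LIKE 'CCA%' LIMIT 5"
).fetchall()
conn.close()
```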
Total Payload Mass

Sum the payload mass over the rows selected by filtering the payload column with
the wildcard pattern LIKE '%CRS%'.
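A minimal sketch of the aggregate with its filter. The table name, the column names `Payload` and `payload_mass_kg`, and the rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE launches (Payload TEXT, payload_mass_kg REAL)")  # hypothetical
conn.executemany(
    "INSERT INTO launches VALUES (?, ?)",
    [("Sample CRS-1", 500), ("Sample CRS-2", 677), ("Other payload", 3170)],
)

# WHERE is applied first, so SUM only sees the rows that match '%CRS%'.
total = conn.execute(
    "SELECT SUM(payload_mass_kg) FROM launches WHERE Payload LIKE '%CRS%'"
).fetchone()[0]
conn.close()
```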
Average Payload Mass by F9 v1.1

Average the payload mass over the rows where the booster version is
F9 v1.1.
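The same pattern with AVG, sketched on hypothetical data. The table name, the column names, and the masses are illustrative, not values from the dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE launches (Booster_Version TEXT, payload_mass_kg REAL)")  # hypothetical
conn.executemany(
    "INSERT INTO launches VALUES (?, ?)",
    [("F9 v1.1", 2000), ("F9 v1.1", 3000), ("F9 v1.0", 500)],
)

# Only the two 'F9 v1.1' rows contribute to the average.
average = conn.execute(
    "SELECT AVG(payload_mass_kg) FROM launches WHERE Booster_Version = 'F9 v1.1'"
).fetchone()[0]
conn.close()
```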
First Successful Ground Landing Date

First list the distinct values of landing outcome to find the label for a successful
ground pad landing. Then filter on that outcome and take min(date); alternatively,
limit the result to the first row, since the earliest such landing is on the
top row. The first successful ground pad landing was on December 22, 2015.
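A sketch of the earliest-date query. It assumes `Date` is stored as DD-MM-YYYY text, as the substr offsets used on the 2015 slide suggest. With that layout, MIN on the raw strings compares the day first, so the sketch rebuilds an ISO date before taking the minimum. The table name, outcome labels, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE launches (Date TEXT, Landing_Outcome TEXT)")  # hypothetical
conn.executemany(
    "INSERT INTO launches VALUES (?, ?)",
    [
        ("22-12-2015", "Success (ground pad)"),
        ("18-07-2016", "Success (ground pad)"),
        ("08-04-2016", "Success (drone ship)"),
    ],
)

# Rebuild YYYY-MM-DD so that string order matches chronological order.
iso = "substr(Date, 7, 4) || '-' || substr(Date, 4, 2) || '-' || substr(Date, 1, 2)"
first_iso = conn.execute(
    f"SELECT MIN({iso}) FROM launches WHERE Landing_Outcome = 'Success (ground pad)'"
).fetchone()[0]

# For comparison: MIN on the raw DD-MM-YYYY text picks the smallest day number.
raw_min = conn.execute(
    "SELECT MIN(Date) FROM launches WHERE Landing_Outcome = 'Success (ground pad)'"
).fetchone()[0]
conn.close()
```

If the dates are already stored in ISO form, plain `MIN(Date)` is enough.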
Successful Drone Ship Landing with Payload between 4000 and 6000

First list the distinct values of booster version, then filter for rows
where the payload mass is between 4000 and 6000 and the landing outcome is a
success on the drone ship.
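A minimal sketch of the combined filter. The task asks for a mass greater than 4000 but less than 6000, so strict comparisons are used here; SQL's BETWEEN would also include both endpoints. The table, column names, booster names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE launches (Booster_Version TEXT, payload_mass_kg REAL, Landing_Outcome TEXT)"
)  # hypothetical table
conn.executemany(
    "INSERT INTO launches VALUES (?, ?, ?)",
    [
        ("booster A", 4000, "Success (drone ship)"),  # excluded: not above 4000
        ("booster B", 4600, "Success (drone ship)"),  # included
        ("booster C", 5300, "Failure (drone ship)"),  # excluded: landing failed
        ("booster D", 7000, "Success (drone ship)"),  # excluded: too heavy
    ],
)

boosters = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT Booster_Version FROM launches "
        "WHERE Landing_Outcome = 'Success (drone ship)' "
        "AND payload_mass_kg > 4000 AND payload_mass_kg < 6000"
    )
]
conn.close()
```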
Total Number of Successful and Failure Mission Outcomes

First list the distinct values of mission outcome, then count the rows
for each distinct value.
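Counting per distinct value is a GROUP BY, sketched below. The table name, outcome labels, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE launches (Mission_Outcome TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO launches VALUES (?)",
    [("Success",), ("Success",), ("Failure (in flight)",), ("Success",)],
)

# One output row per distinct outcome, with the number of launches in each.
counts = dict(
    conn.execute(
        "SELECT Mission_Outcome, COUNT(*) FROM launches GROUP BY Mission_Outcome"
    ).fetchall()
)
conn.close()
```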
Boosters Carried Maximum Payload

Select the booster versions whose payload mass equals the
maximum payload mass. A subquery on the payload column supplies the
maximum, and the result is additionally ordered by
booster version.
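A minimal sketch of the subquery pattern. The table name, column names, booster names, and masses are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE launches (Booster_Version TEXT, payload_mass_kg REAL)")  # hypothetical
conn.executemany(
    "INSERT INTO launches VALUES (?, ?)",
    [("booster B", 15600), ("booster A", 15600), ("booster C", 9600)],
)

# The inner query returns the single maximum; the outer query keeps every
# booster that reached it, so ties are all listed.
heaviest = [
    row[0]
    for row in conn.execute(
        "SELECT Booster_Version FROM launches "
        "WHERE payload_mass_kg = (SELECT MAX(payload_mass_kg) FROM launches) "
        "ORDER BY Booster_Version"
    )
]
conn.close()
```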
2015 Launch Records

Because SQLite does not support month names, we use substr(Date, 4, 2) as
month to get the month and substr(Date, 7, 4) = '2015' to select the year. We also select the booster
version and launch site where the landing outcome is a failure on the drone ship. The year
can also be matched with a wildcard pattern such as LIKE '2015%'.
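The substr-based query can be sketched as follows, assuming `Date` holds DD-MM-YYYY text (which is what the offsets 4 and 7 imply). The table name, outcome labels, booster names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE launches (Date TEXT, Landing_Outcome TEXT, Booster_Version TEXT, Launch_Site TEXT)"
)  # hypothetical table
conn.executemany(
    "INSERT INTO launches VALUES (?, ?, ?, ?)",
    [
        ("10-01-2015", "Failure (drone ship)", "booster A", "CCAFS LC-40"),
        ("14-04-2015", "Failure (drone ship)", "booster B", "CCAFS LC-40"),
        ("22-12-2015", "Success (ground pad)", "booster C", "CCAFS LC-40"),
        ("17-01-2016", "Failure (drone ship)", "booster D", "VAFB SLC-4E"),
    ],
)

# Characters 4-5 are the month and characters 7-10 are the year.
records = conn.execute(
    "SELECT substr(Date, 4, 2) AS month, Booster_Version, Launch_Site "
    "FROM launches "
    "WHERE Landing_Outcome = 'Failure (drone ship)' "
    "AND substr(Date, 7, 4) = '2015' "
    "ORDER BY month"
).fetchall()
conn.close()
```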
Rank Landing Outcomes Between 2010-06-04 and 2017-03-20

First filter for dates between '04-06-2010' and '20-03-2017', then count the
landing outcomes, keeping only the successful ones by using a
wildcard pattern. Group by landing outcome and order the table by the count of landing
outcomes in descending order.
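A sketch of the ranking query. It assumes `Date` is DD-MM-YYYY text; because BETWEEN on such strings compares the day first, the sketch rebuilds an ISO date before applying the range. The table name, outcome labels, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE launches (Date TEXT, Landing_Outcome TEXT)")  # hypothetical
conn.executemany(
    "INSERT INTO launches VALUES (?, ?)",
    [
        ("04-06-2010", "Failure (parachute)"),
        ("22-12-2015", "Success (ground pad)"),
        ("08-04-2016", "Success (drone ship)"),
        ("06-05-2016", "Success (drone ship)"),
        ("30-03-2017", "Success (drone ship)"),  # after the end of the range
    ],
)

# Rebuild YYYY-MM-DD so the range comparison is chronological.
iso = "substr(Date, 7, 4) || '-' || substr(Date, 4, 2) || '-' || substr(Date, 1, 2)"
ranking = conn.execute(
    "SELECT Landing_Outcome, COUNT(*) AS n FROM launches "
    "WHERE Landing_Outcome LIKE 'Success%' "
    f"AND {iso} BETWEEN '2010-06-04' AND '2017-03-20' "
    "GROUP BY Landing_Outcome ORDER BY n DESC"
).fetchall()
conn.close()
```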
Section 3
All Launch Sites Location

The map shows that all launch sites are near the US coasts.
Color-labeled launch outcomes on the map
• Florida • California

Green markers indicate successful launches; red markers indicate failed ones.
Selected launch site to its proximity to landmarks
VAFB SLC-4E to coastline = 1.37 km (close enough)
VAFB SLC-4E to railway = 1.27 km (close enough)
VAFB SLC-4E to highway = 12.42 km (far enough)
VAFB SLC-4E to city = 13.99 km (far enough)
Section 4
Launch success count for all sites

The graph shows that the KSC LC-39A launch site has the best success
rate, while CCAFS SLC-40 has the worst.
Launch site with highest launch success ratio

The KSC LC-39A launch site achieved a 79% success rate.
Payload vs. Launch Outcome scatter plot for all sites

Left: payloads of less than 5000 kg

Right: payloads of more than 5000 kg

Heavier payloads (in this case more than 5000 kg) have a lower
success rate.
Section 5
Classification Accuracy

• Logistic Regression and SVM have the highest classification accuracy of all the models,
each with a score of 0.833333.
Confusion Matrix

• The model can distinguish successful landings from unsuccessful ones. The problem is
that there are 3 false positives: launches that the model predicted would land but that
in fact did not.
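How false positives enter the accuracy score can be sketched in plain Python. The labels below are hypothetical: the deck does not give the test-set size, so an 18-launch set with 3 false positives and no false negatives is assumed because it reproduces the reported 0.833333.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means 'landed'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted to land, did not
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

# Hypothetical test set: 12 launches landed, 6 did not.
y_true = [1] * 12 + [0] * 6
# Assumed predictions: every landing is caught, but 3 failures are called landings.
y_pred = [1] * 12 + [1] * 3 + [0] * 3

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
```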
Conclusions
• The KSC LC-39A launch site has the best success rate, while CCAFS SLC-40 has the worst.
• The KSC LC-39A launch site achieved a 79% success rate.
• Heavier payloads (in this case more than 5000 kg) have a lower success rate.
• Logistic Regression and SVM have the highest classification accuracy of all the models,
each with a score of 0.833333.
• The model can distinguish successful landings from unsuccessful ones. The problem is that
there are 3 false positives: launches that the model predicted would land but that in fact
did not.
Appendix

https://github.com/W1lly-Wonka/IBM-Data-Science-Certification
