Data analysis with Python
November 2024
Outline
• Executive Summary
• Introduction
• Methodology
• Results
• Conclusion
• Appendix
Executive Summary
Methodology
Section 1
Methodology
Executive Summary
• Data collection methodology: Describe how data was collected.
• Perform data wrangling: Describe how data was processed.
• Perform exploratory data analysis (EDA) using visualization and SQL.
• Perform interactive visual analytics using Folium and Plotly Dash.
• Perform predictive analysis using classification models.
Objective: Collect and analyze SpaceX past launch data to predict rocket landing outcomes.
1. Understanding the SpaceX REST API
• Step 1: Accessing API Data
• Target API Endpoint: api.spacexdata.com/v4/launches/past
• Use a GET request with the requests library to retrieve data.
• Data includes details on rocket types, payloads, and landing outcomes.
2. Data Normalization and Transformation
• Step 2: Convert JSON Data
• Use json_normalize to flatten JSON data into a table format.
• Step 3: Gather Additional Data
• Make further API calls to collect specific details, such as Booster and Launchpad information, if needed.
3. Data Cleaning and Filtering
• Step 4: Filter by Rocket Type
• Remove Falcon 1 launches, focusing analysis on Falcon 9 only.
• Step 5: Handle NULL Values
• For PayloadMass, replace NULL values with the column's mean.
• Keep NULLs in the LandingPad column unchanged.
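Steps 1, 2, 4 and 5 above can be sketched with pandas as follows. This is a minimal sketch, not the notebook's code: the helper names `fetch_past_launches` and `clean_launches` are hypothetical, `clean_launches` assumes the table already has the `BoosterVersion`, `PayloadMass` and `LandingPad` columns produced in Step 3, and `requests` is imported inside the fetch function so the cleaning step can run offline.

```python
import pandas as pd

SPACEX_URL = "https://api.spacexdata.com/v4/launches/past"


def fetch_past_launches(url: str = SPACEX_URL) -> pd.DataFrame:
    """Steps 1-2: GET the past launches and flatten the JSON into a table."""
    import requests  # imported lazily so the cleaning helper works without network access

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return pd.json_normalize(response.json())


def clean_launches(df: pd.DataFrame) -> pd.DataFrame:
    """Steps 4-5: keep Falcon 9 rows and fill missing PayloadMass with the mean."""
    # Step 4: drop Falcon 1 launches (assumes BoosterVersion holds the rocket name)
    df = df[df["BoosterVersion"] != "Falcon 1"].copy()
    # Step 5: mean is computed on the Falcon 9 rows only, then used to fill NULLs
    df["PayloadMass"] = df["PayloadMass"].fillna(df["PayloadMass"].mean())
    # LandingPad NULLs are deliberately left unchanged, as in Step 5
    return df
```

Filtering before computing the mean keeps Falcon 1 payloads from skewing the value used to fill Falcon 9 gaps.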
Data Collection – Scraping – Wrangling
https://github.com/ViviAhn/Python_Capstone/blob/main/spacex_API.ipynb
https://github.com/ViviAhn/Python_Capstone/blob/main/Webscraping.ipynb
Workflow: Data Collection → Wrangling → Data Filtering and Transformation → Store all data in DataFrame
SpaceX API - Data Collection
https://github.com/ViviAhn/Python_Capstone/blob/main/spacex_API.ipynb
• API URLs: spacex_url, static_json_url
• Data Retrieval: requests.get fetches the launch data, which is then converted to a DataFrame.
• Rocket: Obtain Rocket ID → Retrieve Rocket Info (SpaceX API) → Append `BoosterVersion`
• Launchpad: Obtain Launchpad ID → Retrieve Launchpad Info (SpaceX API) → Append `Longitude`, `Latitude`, `LaunchSite`
• Payload: Obtain Payload ID → Retrieve Payload Info (SpaceX API) → Append `PayloadMass`, `Orbit`
• Core: Obtain Core ID → Retrieve Core Info (SpaceX API) → Append core details
• Data Filtering and Transformation → Store all data in DataFrame
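The ID-resolution flow above (rocket, launchpad, payload and core lookups) can be sketched as one loop. This is a sketch under stated assumptions: the v4 endpoint paths and field names (`name`, `longitude`, `latitude`, `mass_kg`, `orbit`, `serial`, `landing_success`, `landing_type`) reflect our understanding of the SpaceX API schema and should be checked against its documentation; `enrich_launches` is a hypothetical helper; only the first payload and first core per launch are kept; and `fetch_json` is injected so the logic can be tested without network access.

```python
from typing import Any, Callable, Dict, List

BASE_URL = "https://api.spacexdata.com/v4"

# In practice fetch_json could be: lambda url: requests.get(url, timeout=30).json()
FetchJson = Callable[[str], Dict[str, Any]]


def enrich_launches(launches: List[Dict[str, Any]], fetch_json: FetchJson) -> List[Dict[str, Any]]:
    """Resolve rocket, launchpad, payload and core IDs into flat columns."""
    rows = []
    for launch in launches:
        rocket = fetch_json(f"{BASE_URL}/rockets/{launch['rocket']}")
        pad = fetch_json(f"{BASE_URL}/launchpads/{launch['launchpad']}")
        # Assumption: each launch has at least one payload and one core entry
        payload = fetch_json(f"{BASE_URL}/payloads/{launch['payloads'][0]}")
        core_ref = launch["cores"][0]
        # Some core references carry no ID; fall back to an empty record
        core = fetch_json(f"{BASE_URL}/cores/{core_ref['core']}") if core_ref.get("core") else {}
        rows.append({
            "BoosterVersion": rocket["name"],
            "Longitude": pad["longitude"],
            "Latitude": pad["latitude"],
            "LaunchSite": pad["name"],
            "PayloadMass": payload.get("mass_kg"),
            "Orbit": payload.get("orbit"),
            "Serial": core.get("serial"),
            # Outcome combines landing success and landing type, e.g. "True ASDS"
            "Outcome": f"{core_ref.get('landing_success')} {core_ref.get('landing_type')}",
        })
    return rows
```

The resulting list of dicts can be passed straight to `pandas.DataFrame(rows)` to obtain the combined table.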
SpaceX API - Data Wrangling
https://github.com/ViviAhn/Python_Capstone/blob/main/spacex_API.ipynb
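Data wrangling turns each landing outcome into the 0/1 `Class` label that the later analysis averages (for example, `df.groupby('Orbit')['Class'].mean()`). A minimal sketch, assuming outcome strings of the form `'<landing_success> <landing_type>'` such as `'True ASDS'` or `'None None'`; the helper name `landing_class` and the sample outcomes are hypothetical.

```python
from typing import List


def landing_class(outcome: str) -> int:
    """Map an Outcome string to the 0/1 Class label.

    Assumption: outcomes look like '<landing_success> <landing_type>',
    e.g. 'True ASDS', 'False Ocean' or 'None None'; only a leading
    'True' counts as a successful landing.
    """
    return 1 if outcome.split(" ", 1)[0] == "True" else 0


# Illustrative outcomes (not project data)
outcomes: List[str] = ["True ASDS", "False Ocean", "True RTLS", "None None"]
classes = [landing_class(o) for o in outcomes]
print(classes)  # [1, 0, 1, 0]
print(sum(classes) / len(classes))  # 0.5
```

With pandas, the same mapping becomes `df["Class"] = df["Outcome"].apply(landing_class)`.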
SpaceX REST API Calls
Web Scraping Process – Data Wrangling
https://github.com/ViviAhn/Python_Capstone/blob/main/Webscraping.ipynb
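Web scraping here means pulling launch records out of HTML tables. The slides do not show the scraping code, so this is a minimal sketch that swaps in the standard library's `html.parser` (an assumption; the notebook may use a different parser) to collect header cells and data rows. `TableParser` is a hypothetical helper name.

```python
from html.parser import HTMLParser
from typing import List


class TableParser(HTMLParser):
    """Collect <th> header texts and <td> data rows from HTML tables."""

    def __init__(self) -> None:
        super().__init__()
        self.headers: List[str] = []
        self.rows: List[List[str]] = []
        self._row: List[str] = []
        self._cell: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._in_cell = True
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (links, spans) is still part of the cell
        if self._in_cell:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._in_cell:
            # Collapse whitespace and line breaks inside the cell
            text = " ".join("".join(self._cell).split())
            self._in_cell = False
            if tag == "th":
                self.headers.append(text)
            else:
                self._row.append(text)
        elif tag == "tr" and self._row:
            # Only rows with <td> cells count as data rows
            self.rows.append(self._row)
```

Usage: call `parser.feed(html_text)`, then build a table with `pandas.DataFrame(parser.rows, columns=parser.headers)` once the header and row widths are confirmed to match.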
Exploratory Data Analysis
https://github.com/ViviAhn/Python_Capstone/blob/main/EDA%20with%20SQL.ipynb
6. Booster Versions with Successful Drone Ship Landings (4000 kg < Payload Mass < 6000 kg):
• SELECT Booster_Version FROM SPACEXTABLE WHERE Landing_Outcome = 'Success (drone ship)' AND PAYLOAD_MASS__KG_ > 4000 AND PAYLOAD_MASS__KG_ < 6000;
• This query selects Booster_Version for successful drone ship landings (Landing_Outcome = 'Success (drone ship)') with a payload mass strictly between 4000 kg and 6000 kg.
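To try the query above without the project database, it can be run against an in-memory SQLite table. The rows below are hypothetical and exist only to show which records the filter keeps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SPACEXTABLE (Booster_Version TEXT, Landing_Outcome TEXT, PAYLOAD_MASS__KG_ INTEGER)"
)
# Hypothetical rows for illustration only
conn.executemany(
    "INSERT INTO SPACEXTABLE VALUES (?, ?, ?)",
    [
        ("B-A", "Success (drone ship)", 4500),   # kept
        ("B-B", "Success (drone ship)", 6500),   # payload too heavy
        ("B-C", "Failure (drone ship)", 5000),   # landing failed
        ("B-D", "Success (ground pad)", 5200),   # not a drone ship
    ],
)

query = """
SELECT Booster_Version FROM SPACEXTABLE
WHERE Landing_Outcome = 'Success (drone ship)'
  AND PAYLOAD_MASS__KG_ > 4000 AND PAYLOAD_MASS__KG_ < 6000;
"""
boosters = [row[0] for row in conn.execute(query)]
print(boosters)  # ['B-A']
```

`BETWEEN 4000 AND 6000` would be a shorter filter, but it is inclusive, so it would also match payloads of exactly 4000 or 6000 kg.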
EDA with SQL
https://github.com/ViviAhn/Python_Capstone/blob/main/EDA%20with%20SQL.ipynb
Interactive Visual Analytics and Dashboards
Objective: Develop interactive visualizations and dashboards to enable real-time data exploration and enhance data storytelling.
1. Interactive Visual Analytics
• Step 1: Enable User Interactions
• Allow users to interact with data using zooming, panning, filtering, searching, and linking.
• Goal: Help users identify visual patterns more quickly.
• Step 2: Advantages of Interactive Dashboards
• Provide a dynamic way to present findings.
• Offer more engagement than static graphs.
2. Building with Folium
• Step 3: Analyze Launch Site Geolocations
• Use Folium to visualize launch sites on an interactive map.
• Mark locations and examine proximities to reveal patterns.
• Step 4: Determine Optimal Launch Sites
• Use map exploration to identify potential launch site advantages.
3. Creating a Dashboard with Plotly Dash
• Step 5: Set Up Dashboard Components
• Build a dashboard using Plotly Dash.
• Add interactive input components such as dropdown lists and range sliders.
• Step 6: Visualize SpaceX Data
• Create interactive visualizations.
• Allow users to interact with charts to gain deeper insights into SpaceX data.
Build an Interactive Map with Folium
https://nbviewer.org/github/ViviAhn/Python_Capstone/blob/main/Interactive%20Visual%20Analytics%20with%20Folium%20lab.ipynb
Maps shown:
• Folium map with the NASA Johnson Space Center and all launch sites
• Folium map with success/failed launches for each site
• Folium map with the distances between a launch site and its proximities
Markers, circles, and lines are added to the map to enhance visualization and convey specific information about locations, distances, and regions.
• Markers: Represent specific points of interest, such as a launch site. They provide information when clicked, often through pop-ups or custom icons, and are useful for showing precise locations.
• Circles: Highlight an area around a point, such as a radius around a launch site.
• Lines (or PolyLines): Connect two or more locations and show the distance between them. Lines can also indicate relationships between a launch site and a destination or transportation network.
Together, these elements help users interpret spatial data in an intuitive and interactive way, adding context.
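The marker, circle, and line elements described above can be sketched as follows. Distances for the proximity lines use the haversine formula; the 6371 km Earth radius, the 1 km circle radius, the `(name, latitude, longitude)` tuple layout, and the helper names `haversine_km` and `build_proximity_map` are assumptions for illustration. Folium is imported inside the map function so the distance helper runs without it.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

Place = Tuple[str, float, float]  # (name, latitude, longitude), supplied by the caller


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def build_proximity_map(site: Place, proximity: Place):
    """Circle around the site, marker on the proximity, and a line between them."""
    import folium  # lazy import: only needed when actually drawing the map

    site_name, s_lat, s_lon = site
    prox_name, p_lat, p_lon = proximity
    distance = haversine_km(s_lat, s_lon, p_lat, p_lon)

    site_map = folium.Map(location=[s_lat, s_lon], zoom_start=12)
    # Circle: highlight a 1 km radius around the launch site
    folium.Circle(location=[s_lat, s_lon], radius=1000, color="#d35400", fill=True,
                  popup=site_name).add_to(site_map)
    # Marker: precise location of the proximity, with its distance in the pop-up
    folium.Marker(location=[p_lat, p_lon], popup=f"{prox_name}: {distance:.2f} km").add_to(site_map)
    # PolyLine: connect the launch site and the proximity
    folium.PolyLine(locations=[[s_lat, s_lon], [p_lat, p_lon]], weight=1).add_to(site_map)
    return site_map
```

Usage with hypothetical coordinates: `build_proximity_map(("Site A", 28.5, -80.6), ("Coastline", 28.5, -80.5)).save("proximity_map.html")`.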
Build a Dashboard with Plotly Dash
These plots highlight the success rate and payload capabilities of each launch site, giving an overview of launch performance across sites.
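A dashboard like this is typically a dropdown (launch site) and a range slider (payload mass) feeding a pie chart and a scatter plot. A minimal sketch, assuming hypothetical column names and component IDs (adjust to the actual CSV); the data-shaping helpers are plain pandas so they can be tested on their own, and Dash and Plotly are imported only inside `build_app`.

```python
import pandas as pd

# Hypothetical column names; adjust to the dashboard's actual CSV
SITE, CLASS, PAYLOAD, BOOSTER = "Launch Site", "class", "Payload Mass (kg)", "Booster Version Category"


def success_counts(df: pd.DataFrame, site: str) -> pd.DataFrame:
    """Pie data: successes per site for 'ALL', else success/failure counts at one site."""
    if site == "ALL":
        return df.groupby(SITE, as_index=False)[CLASS].sum()
    counts = df.loc[df[SITE] == site, CLASS].value_counts()
    return counts.rename_axis("outcome").reset_index(name="count")


def payload_filter(df: pd.DataFrame, site: str, low: float, high: float) -> pd.DataFrame:
    """Scatter data: launches with payload mass in [low, high], optionally at one site."""
    mask = df[PAYLOAD].between(low, high)
    if site != "ALL":
        mask &= df[SITE] == site
    return df[mask]


def build_app(df: pd.DataFrame):
    """Wire the helpers to a dropdown and a range slider (requires dash and plotly)."""
    from dash import Dash, Input, Output, dcc, html
    import plotly.express as px

    app = Dash(__name__)
    options = [{"label": "All Sites", "value": "ALL"}] + [
        {"label": s, "value": s} for s in sorted(df[SITE].unique())
    ]
    app.layout = html.Div([
        dcc.Dropdown(id="site-dropdown", options=options, value="ALL"),
        dcc.Graph(id="success-pie-chart"),
        dcc.RangeSlider(id="payload-slider", min=0, max=10000, step=1000, value=[0, 10000]),
        dcc.Graph(id="success-payload-scatter-chart"),
    ])

    @app.callback(Output("success-pie-chart", "figure"), Input("site-dropdown", "value"))
    def update_pie(site):
        data = success_counts(df, site)
        if site == "ALL":
            return px.pie(data, values=CLASS, names=SITE, title="Successful launches by site")
        return px.pie(data, values="count", names="outcome", title=f"Launch outcomes at {site}")

    @app.callback(
        Output("success-payload-scatter-chart", "figure"),
        [Input("site-dropdown", "value"), Input("payload-slider", "value")],
    )
    def update_scatter(site, payload_range):
        low, high = payload_range
        return px.scatter(payload_filter(df, site, low, high), x=PAYLOAD, y=CLASS, color=BOOSTER)

    return app
```

Keeping the filtering logic outside the callbacks means it can be checked without starting a server; in recent Dash versions the app is started with `build_app(df).run()`.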
Section 2
Flight Number vs. Launch Site
Payload vs. Launch Site
Success Rate vs. Orbit Type
orbit_success_rate = df.groupby('Orbit')['Class'].mean().reset_index()
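The grouping line above, made self-contained on a small made-up DataFrame (the rows are illustrative, not project data) and sorted so a bar chart reads from highest to lowest success rate. Because `Class` is a 0/1 label, its mean per orbit is the fraction of successful landings.

```python
import pandas as pd

# Made-up rows for illustration; in the project, df holds the wrangled launch data
df = pd.DataFrame({
    "Orbit": ["LEO", "LEO", "GTO", "GTO", "ISS"],
    "Class": [1, 1, 0, 1, 1],
})

# Mean of the 0/1 Class label per orbit = success rate for that orbit
orbit_success_rate = (
    df.groupby("Orbit")["Class"].mean()
    .reset_index()
    .sort_values("Class", ascending=False)
)
print(orbit_success_rate)
# With matplotlib installed: orbit_success_rate.plot.bar(x="Orbit", y="Class")
```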
Payload vs. Orbit Type
Launch Success Yearly Trend
All Launch Site Names
These are the primary launch sites used for different missions, likely chosen based on mission requirements such as orbit and payload type.
Each site is distinguished by its geographic position, infrastructure, and suitability for particular orbits, which allows a diverse range of mission profiles based on payload and destination.
Launch Site Names Begin with 'CCA'
All the records retrieved have the launch site starting with "CCA", indicating that they were launched from the
Cape Canaveral Air Force Station (CCAFS).
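Records like these come from a `LIKE` pattern match. A sketch on an in-memory SQLite table with hypothetical rows (the site names follow the dataset's naming pattern only for illustration); the `LIMIT 5` cap is an assumption about how many records were displayed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SPACEXTABLE (Launch_Site TEXT, Booster_Version TEXT)")
# Hypothetical rows for illustration only
conn.executemany(
    "INSERT INTO SPACEXTABLE VALUES (?, ?)",
    [("CCAFS LC-40", "B-A"), ("KSC LC-39A", "B-B"), ("CCAFS SLC-40", "B-C"), ("VAFB SLC-4E", "B-D")],
)

# 'CCA%' matches any site name that begins with 'CCA'; LIMIT caps the rows returned
rows = conn.execute(
    "SELECT * FROM SPACEXTABLE WHERE Launch_Site LIKE 'CCA%' LIMIT 5"
).fetchall()
print(rows)
```

Note that SQLite's `LIKE` is case-insensitive for ASCII letters by default, so `'cca%'` would match the same rows.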
Total Payload Mass
The result shows the total mass of all payloads that NASA has sent into space on SpaceX rockets.
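The slide does not show the aggregate query. A minimal sketch, assuming NASA launches are identified by a `Customer` value of `'NASA (CRS)'` (an assumption; other NASA customer labels would need a pattern such as `LIKE 'NASA%'`), run on hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SPACEXTABLE (Customer TEXT, PAYLOAD_MASS__KG_ INTEGER)")
# Hypothetical rows for illustration only
conn.executemany(
    "INSERT INTO SPACEXTABLE VALUES (?, ?)",
    [("NASA (CRS)", 2000), ("NASA (CRS)", 2500), ("Customer X", 5000)],
)

# SUM adds up payload mass over the rows that match the customer filter
(total,) = conn.execute(
    "SELECT SUM(PAYLOAD_MASS__KG_) FROM SPACEXTABLE WHERE Customer = 'NASA (CRS)'"
).fetchone()
print(total)  # 4500
```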
Average Payload Mass by F9 v1.1
First Successful Ground Landing Date
This indicates that the first successful ground landing achieved by SpaceX
occurred on December 22, 2015.
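A date like this comes from taking the minimum date among successful ground-pad landings. A sketch on hypothetical rows, assuming `Date` is stored as ISO-8601 text (`YYYY-MM-DD`), so that string order matches chronological order and `MIN` returns the earliest date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SPACEXTABLE (Date TEXT, Landing_Outcome TEXT)")
# Hypothetical rows for illustration only (ISO-8601 dates)
conn.executemany(
    "INSERT INTO SPACEXTABLE VALUES (?, ?)",
    [
        ("2016-01-01", "Success (ground pad)"),
        ("2015-12-22", "Success (ground pad)"),
        ("2014-06-01", "Failure (drone ship)"),
    ],
)

(first_date,) = conn.execute(
    "SELECT MIN(Date) FROM SPACEXTABLE WHERE Landing_Outcome = 'Success (ground pad)'"
).fetchone()
print(first_date)  # 2015-12-22
```

If dates were stored in another text format (for example `DD-MM-YYYY`), `MIN` would compare them as strings and could return the wrong day.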
Successful Drone Ship Landing with Payload between 4000 and 6000 kg
2015 Launch Records
Rank Landing Outcomes Between 2010-06-04 and 2017-03-20
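Ranking landing outcomes in a date window can be done by counting them with `GROUP BY` and sorting in descending order of count. A sketch on hypothetical rows, assuming ISO-8601 text dates; the secondary sort on the outcome name is an added choice that makes ties come out in a deterministic order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SPACEXTABLE (Date TEXT, Landing_Outcome TEXT)")
# Hypothetical rows for illustration only
conn.executemany(
    "INSERT INTO SPACEXTABLE VALUES (?, ?)",
    [
        ("2012-05-22", "No attempt"),
        ("2015-01-10", "Failure (drone ship)"),
        ("2016-04-08", "Success (drone ship)"),
        ("2016-05-06", "Success (drone ship)"),
        ("2018-01-01", "Success (ground pad)"),  # outside the window
    ],
)

query = """
SELECT Landing_Outcome, COUNT(*) AS outcome_count
FROM SPACEXTABLE
WHERE Date BETWEEN '2010-06-04' AND '2017-03-20'
GROUP BY Landing_Outcome
ORDER BY outcome_count DESC, Landing_Outcome
"""
ranking = conn.execute(query).fetchall()
print(ranking)
# [('Success (drone ship)', 2), ('Failure (drone ship)', 1), ('No attempt', 1)]
```

`BETWEEN` is inclusive, so launches on either boundary date are counted.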
The map highlights the launch sites relative to NASA Johnson Space Center. Three launch sites are concentrated on the East Coast of the United States, in Florida. This region benefits from favorable weather conditions and proximity to the Atlantic Ocean; launching over open water is a common safety measure for rocket launches. The inclusion of Vandenberg Air Force Base (VAFB) on the West Coast demonstrates the importance of having launch sites on both coasts. This allows for launches into
Task 2: Mark the success/failed launches for each site on the map
The map highlights the launch sites, visually represented by clusters of markers.
• Green markers represent successful launches. A higher concentration of green markers at a site indicates a higher success rate.
• Red markers represent failed launches. A higher number of red markers suggests a higher failure rate.
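The green/red marker clusters described above can be sketched as follows. The helper names `marker_color` and `add_launch_markers` and the `(latitude, longitude, class)` tuple layout are hypothetical; class 1 is assumed to mean a successful launch. Folium is imported inside the map function so the color rule can be tested without it.

```python
from typing import Iterable, Tuple


def marker_color(launch_class: int) -> str:
    """Green for a successful launch (class 1), red otherwise."""
    return "green" if launch_class == 1 else "red"


def add_launch_markers(site_map, launches: Iterable[Tuple[float, float, int]]):
    """Add one colored marker per launch, grouped in a MarkerCluster.

    launches: (latitude, longitude, class) tuples supplied by the caller.
    """
    import folium
    from folium.plugins import MarkerCluster

    # The cluster merges nearby markers until the user zooms in
    cluster = MarkerCluster().add_to(site_map)
    for lat, lon, launch_class in launches:
        folium.Marker(
            location=[lat, lon],
            icon=folium.Icon(color=marker_color(launch_class)),
        ).add_to(cluster)
    return cluster
```

Clustering keeps launches from the same pad readable: the cluster shows a count, and zooming in reveals the individual green and red markers.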
Launch site with highest launch success ratio
Payload vs. Launch Outcome for all sites
The plot shows high success rates across a wide range of payload masses and booster versions, consistent with SpaceX improving the reliability of its rockets.
Different booster versions have different capabilities and performance characteristics; even so, the plot suggests that SpaceX achieved high success rates across versions.
Section 5
Classification Accuracy
• Recall: 1.0
• F1-Score: 0.8889
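Recall and F1 follow from the confusion-matrix counts: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2PR / (P + R). With recall 1.0 and F1 0.8889, solving F1 = 2P / (P + 1) gives P = F1 / (2 - F1) ≈ 0.8. The counts below are hypothetical, chosen only to reproduce those ratios, and `precision_recall_f1` is an illustrative helper.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from confusion-matrix counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts (not the project's actual confusion matrix)
p, r, f1 = precision_recall_f1(tp=12, fp=3, fn=0)
print(round(p, 4), round(r, 4), round(f1, 4))  # 0.8 1.0 0.8889
```

A recall of 1.0 means no successful landings were missed (FN = 0); the F1 score below 1.0 comes entirely from false positives.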
Conclusions
• CCAFS SLC-40 was the most used launch site.
• Predictive analysis: the best-performing model was the Decision Tree, with an accuracy of 0.9036.
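A decision tree can be tuned and scored with scikit-learn's `GridSearchCV`. In this sketch the synthetic data from `make_classification` stands in for the project's features and `Class` labels; the parameter grid, `cv=10`, and the 80/20 split are assumptions, so the printed numbers will not reproduce the 0.9036 above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the launch feature matrix and Class labels
X, y = make_classification(n_samples=90, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)

# Hypothetical search space
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 4, 6, 8],
    "min_samples_leaf": [1, 2, 4],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=10, scoring="accuracy")
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("cv accuracy:", round(search.best_score_, 4))
print("test accuracy:", round(search.score(X_test, y_test), 4))
```

To compare models, the same search can be repeated for each candidate classifier (for example, logistic regression, SVM, or k-nearest neighbors). Each model's held-out test accuracy is then compared.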