Capstone Story Presentation
Capstone Project
• Executive Summary
• Introduction
• Methodology
• Results
• Visualization – Charts
• Dashboard
• Discussion
• Findings & Implications
• Conclusion
• Appendix
EXECUTIVE SUMMARY
1. Data Collection & Preparation:
Collected data from the public SpaceX API and a Wikipedia page.
Created 'class' column for successful landing classification.
Explored data using SQL, visualization, Folium maps, and
dashboards.
Selected relevant features for machine learning.
2. Data Preprocessing:
Applied one-hot encoding to categorical variables.
Standardized data for uniform scale.
Optimized model parameters using GridSearchCV.
3. Machine Learning Models:
Developed models:
Logistic Regression
Support Vector Machine
Decision Tree Classifier
K Nearest Neighbors
All four models achieved about the same accuracy (~83.33%).
4. Evaluation & Analysis:
Models tended to overpredict successful landings.
Identified need for more data to enhance accuracy.
5. Model Performance Visualization:
Visualized accuracy scores to compare model performance.
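The evaluation points above (a shared accuracy near 83.33% and a tendency to overpredict successful landings) can be illustrated with a small confusion-matrix sketch. The labels below are hypothetical and are not the project's test data; they are chosen only so that the accuracy works out to 15/18, and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = successful landing."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


# Hypothetical labels (assumption, not project data): 12 landings, 6 failures.
y_true = [1] * 12 + [0] * 6
# A model that gets every landing right but calls 3 of the failures "landed".
y_pred = [1] * 12 + [1] * 3 + [0] * 3

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"accuracy={accuracy(y_true, y_pred):.4f}")
```

In this toy case every error is a false positive, which is what overpredicting successful landings looks like in a confusion matrix.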
INTRODUCTION
Background:
The commercial space age is booming.
SpaceX offers competitive pricing (about $62M per launch vs. upward
of $165M USD for other providers), largely due to rocket recovery.
Space Y aims to rival SpaceX.
Problem:
Space Y seeks a machine learning model to predict successful
Stage 1 recovery.
Approach:
Collect data from the SpaceX API and industry sources.
Preprocess data and engineer features.
Train ML models: logistic regression, SVM, decision trees,
kNN.
Evaluate model performance rigorously.
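As a rough illustration of the preprocessing step (one-hot encoding of categorical variables followed by standardization), here is a minimal sketch that uses only the Python standard library rather than the project's own tooling. The feature names and values are hypothetical, and `one_hot` and `standardize` are illustrative helper names.

```python
from statistics import mean, pstdev


def one_hot(values):
    """Return (categories, rows) with one 0/1 column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def standardize(column):
    """Rescale a numeric column to zero mean and unit (population) variance."""
    mu = mean(column)
    sigma = pstdev(column)
    return [(x - mu) / sigma for x in column]


# Hypothetical feature values, for illustration only.
orbits = ["LEO", "GTO", "LEO", "ISS"]
payload_mass = [2000.0, 4000.0, 6000.0, 8000.0]

categories, encoded = one_hot(orbits)
scaled = standardize(payload_mass)

print(categories)   # column order of the encoded rows
print(encoded)      # one row of 0/1 flags per input value
print(scaled)       # standardized payload masses
```

Standardizing puts features with very different ranges on a comparable scale, which matters for distance-based models such as kNN and SVM.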
Potential Impact:
Accurate Stage 1 recovery prediction enhances Space Y's
competitiveness.
Optimizes resources, improves efficiency, mitigates financial
risks.
Contributes to the advancement of the commercial space
industry.
METHODOLOGY
1. Data Collection:
Combined data from SpaceX API and Wikipedia.
2. Data Wrangling:
Cleaned and organized collected data.
3. Classification:
Identified successful and unsuccessful landings.
4. Exploratory Data Analysis (EDA):
Visualized data distributions with charts.
Queried the data with SQL to extract insights.
5. Interactive Visual Analytics:
Employed Folium and Plotly Dash.
6. Predictive Analysis:
Utilized classification models.
7. Model Tuning:
Optimized models using GridSearchCV.
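To show what the model tuning step does conceptually, the sketch below runs a from-scratch grid search with cross-validation over the `k` parameter of a tiny k-nearest-neighbours classifier. It is not scikit-learn's GridSearchCV and not the project's code; the data points, fold scheme, and function names are assumptions made for illustration.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Predict the majority label among the k nearest training points."""
    by_distance = sorted(
        zip(train_X, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, k, folds):
    """Mean k-NN accuracy over `folds` interleaved cross-validation splits."""
    scores = []
    for f in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_idx = [i for i in range(len(X)) if i % folds != f]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return sum(scores) / folds


def grid_search(X, y, k_values, folds=3):
    """Score every candidate k and return (best_k, best_score, all_scores)."""
    results = {k: cv_accuracy(X, y, k, folds) for k in k_values}
    best_k = max(results, key=results.get)
    return best_k, results[best_k], results


# Hypothetical, well-separated toy data (not the project's features).
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.2, 0.4), (0.4, 0.1),
     (1.0, 1.1), (1.2, 0.9), (0.9, 1.2), (1.1, 1.0), (1.3, 1.2), (1.0, 0.8)]
y = [0] * 6 + [1] * 6

best_k, best_score, all_scores = grid_search(X, y, k_values=[1, 3, 5])
print(best_k, best_score, all_scores)
```

The key idea is that each candidate setting is scored only on held-out folds, so the chosen parameter is not simply the one that best memorizes the training data.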
RESULTS
Data Collection – SpaceX API
1. Request data from the SpaceX API.
2. Receive a JSON file plus lists (Launch Site, Booster Version, Payload Data).
3. Use json_normalize to convert the JSON data to a DataFrame.
4. Build a dictionary of the relevant data.
5. Cast the dictionary to a DataFrame.
6. Filter the data to include only Falcon 9 launches.
7. Impute missing PayloadMass values with the mean.
GitHub url:
https://github.com/Sandeepmopidevi/applied-data-science-capstone-edx-tasks/blob/main/jupyter-labs-spacex-data-collection-api.ipynb
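The last two steps of the API flow (keeping only Falcon 9 launches and filling missing PayloadMass values with the mean) can be sketched as follows. This uses plain Python lists of dictionaries instead of a DataFrame; the rows are hypothetical, and the `BoosterVersion` key and the helper names are assumptions for illustration.

```python
from statistics import mean

# Hypothetical rows standing in for the table built from the API response.
launches = [
    {"FlightNumber": 1, "BoosterVersion": "Falcon 1", "PayloadMass": 20.0},
    {"FlightNumber": 2, "BoosterVersion": "Falcon 9", "PayloadMass": None},
    {"FlightNumber": 3, "BoosterVersion": "Falcon 9", "PayloadMass": 500.0},
    {"FlightNumber": 4, "BoosterVersion": "Falcon 9", "PayloadMass": 700.0},
]


def keep_falcon9(rows):
    """Return copies of the rows whose booster is a Falcon 9."""
    return [dict(row) for row in rows if row["BoosterVersion"] == "Falcon 9"]


def impute_payload_mass(rows):
    """Replace missing PayloadMass values with the mean of the known ones."""
    known = [r["PayloadMass"] for r in rows if r["PayloadMass"] is not None]
    fill = mean(known)
    return [
        {**r, "PayloadMass": fill if r["PayloadMass"] is None else r["PayloadMass"]}
        for r in rows
    ]


falcon9 = impute_payload_mass(keep_falcon9(launches))
for row in falcon9:
    print(row)
```

With pandas, the same two steps are usually written as a boolean mask on the booster column followed by `fillna` with the column mean.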
Data Collection – Wikipedia
1. Request the Wikipedia HTML page.
2. Find the launch info HTML table.
3. Create a dictionary.
4. Cast the dictionary to a DataFrame.
GitHub url:
Detailed steps:
1. Requesting Falcon 9 launch data from Wikipedia
2. Creating a BeautifulSoup object from the HTML response
3. Extracting all column names from the HTML table header
4. Constructing the data we have obtained into a dictionary
5. Creating a dataframe from the dictionary
6. Exporting the data to CSV
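The scraping flow above (parse the HTML, extract column names from the table header, build a dictionary, export to CSV) can be sketched roughly as below. To stay self-contained, this sketch swaps in the standard library's `html.parser` for BeautifulSoup and skips the DataFrame step; the HTML snippet is hypothetical and stands in for the Wikipedia response, and `TableParser` is an illustrative name.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect header names from <th> cells and row values from <td> cells."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self.rows = []
        self._cell = None  # text pieces of the cell being read, or None
        self._row = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            text = "".join(self._cell).strip()
            (self.headers if tag == "th" else self._row).append(text)
            self._cell = None
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = []


# Hypothetical HTML (assumption) standing in for the Wikipedia response body.
html = """
<table>
  <tr><th>Flight No.</th><th>Launch site</th><th>Payload</th></tr>
  <tr><td>1</td><td>Site A</td><td>Demo payload</td></tr>
  <tr><td>2</td><td>Site B</td><td>Test payload</td></tr>
</table>
"""

parser = TableParser()
parser.feed(html)

# Dictionary keyed by column name, with one list of values per column.
launch_dict = {name: [] for name in parser.headers}
for row in parser.rows:
    for name, value in zip(parser.headers, row):
        launch_dict[name].append(value)

# Write the dictionary out as CSV (to a string here; a file works the same way).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(launch_dict.keys())
writer.writerows(zip(*launch_dict.values()))
csv_text = buffer.getvalue()
print(csv_text)
```

Real Wikipedia tables contain nested tags, footnote markers, and merged cells, so a production scraper needs more cleanup than this sketch performs.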
Insights:
Identified key factors influencing the target
variable based on feature importance analysis.
[Figure: two charts with a Percentage axis, 0 to 100]
PROGRAMMING LANGUAGE TRENDS: FINDINGS & IMPLICATIONS
Findings Implications
https://github.com/Sandeepmopidevi/applied-data-science-capstone-edx-tasks/blob/main/Cognos%20Dashboard.pdf
OVERALL FINDINGS & IMPLICATIONS
Findings
• Data Complexity: The analysis revealed the increasing complexity of data, with a growing volume, variety, and velocity of information generated across various domains and industries.
• Technology Adoption: There is a notable trend towards the adoption of advanced technologies such as artificial intelligence, machine learning, and big data analytics, driven by the need for data-driven decision-making and competitive advantage.
• Evolving Business Needs: Organizations are facing evolving business needs and challenges, including the demand for real-time insights, personalized customer experiences, and enhanced operational efficiency.
• Talent Gap: The findings indicate a talent gap in the field of data science and analytics, with a shortage of skilled professionals capable of leveraging complex data sets and advanced analytics tools effectively.
Implications
• Data Strategy: Organizations must develop comprehensive data strategies to manage and harness the growing volume and complexity of data, ensuring alignment with business goals and objectives.
• Technology Investment: Investing in advanced technologies such as AI, ML, and big data analytics is essential to gain insights from data, drive innovation, and maintain a competitive edge in the market.
• Agile Decision-Making: Embracing real-time analytics and predictive insights enables organizations to make agile, data-driven decisions, respond quickly to market changes, and capitalize on emerging opportunities.
• Skill Development: Addressing the talent gap through training, upskilling, and talent acquisition initiatives is crucial to build a workforce capable of effectively leveraging data and analytics for business success.
CONCLUSION
• User-friendly interface and intuitive design
enable easy creation and customization of
dashboards, reducing the learning curve for
users.
• Seamless data integration capabilities ensure
access to comprehensive data from diverse
sources, enhancing data analysis and
decision-making.
• Interactive visualization features empower users
to explore data dynamically, uncovering insights
and trends that drive business outcomes.
• Robust collaboration and sharing functionalities
facilitate teamwork and communication,
fostering a data-driven culture within the
organization and driving collective intelligence.
POPULAR LANGUAGES
[Figure: chart with a Percentage axis, 0 to 100]