Capstone Story Presentation

The document describes a data science capstone project that aims to build machine learning models to predict the successful landing of SpaceX rocket stages. It discusses data collection from SpaceX API and Wikipedia, exploratory data analysis using SQL and visualization, developing classification models, and evaluating model performance.

Data Science Capstone Project

M.Durga Sai Sandeep

07/04/2024
https://github.com/Sandeepmopidevi/
OUTLINE

• Executive Summary
• Introduction
• Methodology
• Results
• Visualization – Charts
• Dashboard
• Discussion
• Findings & Implications
• Conclusion
• Appendix
EXECUTIVE SUMMARY
1. Data Collection & Preparation:
• Used the public SpaceX API and a Wikipedia page as data sources.
• Created a 'class' column to label each landing as successful or unsuccessful.
• Explored the data using SQL, visualization, Folium maps, and dashboards.
• Selected relevant features for machine learning.
2. Data Preprocessing:
• Applied one-hot encoding to categorical variables.
• Standardized the data to a uniform scale.
• Optimized model parameters using GridSearchCV.
3. Machine Learning Models:
• Developed models:
   • Logistic Regression
   • Support Vector Machine
   • Decision Tree Classifier
   • K Nearest Neighbors
• Achieved consistent accuracy (~83.33%) across models.
4. Evaluation & Analysis:
• Models tended to overpredict successful landings.
• Identified the need for more data to improve accuracy.
5. Model Performance Visualization:
• Visualized accuracy scores to compare model performance.
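The ~83.33% accuracy and the tendency to overpredict successful landings can both be read off a confusion matrix. The sketch below uses hypothetical counts (an 18-launch test set with 3 false positives and no false negatives), chosen only because 15/18 reproduces the 83.33% figure. They are an assumption for illustration, not results taken from this project.

```python
# Hypothetical confusion-matrix counts for an 18-launch test set.
# These numbers are illustrative assumptions, not the project's results.
counts = {
    "true_positive": 12,   # predicted landed, did land
    "false_positive": 3,   # predicted landed, did not land (overprediction)
    "true_negative": 3,    # predicted not landed, did not land
    "false_negative": 0,   # predicted not landed, did land
}


def accuracy(c):
    """Fraction of all predictions that were correct."""
    correct = c["true_positive"] + c["true_negative"]
    return correct / sum(c.values())


def precision(c):
    """Of the launches predicted to land, the fraction that actually landed."""
    predicted_positive = c["true_positive"] + c["false_positive"]
    return c["true_positive"] / predicted_positive


def recall(c):
    """Of the launches that actually landed, the fraction predicted to land."""
    actual_positive = c["true_positive"] + c["false_negative"]
    return c["true_positive"] / actual_positive


print(f"accuracy:  {accuracy(counts):.4f}")
print(f"precision: {precision(counts):.4f}")
print(f"recall:    {recall(counts):.4f}")
```

With these assumed counts every error is a false positive, so recall is perfect while precision is lower. That is the pattern "overpredicting successful landings" describes.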
INTRODUCTION
Background:
• The commercial space age is booming.
• SpaceX offers competitive pricing ($62M vs. $165M USD) due to rocket recovery.
• Space Y aims to rival SpaceX.
Problem:
• Space Y seeks a machine learning model to predict successful Stage 1 recovery.
Approach:
• Collect data from the SpaceX API and industry sources.
• Preprocess data and engineer features.
• Train ML models: logistic regression, SVM, decision trees, kNN.
• Evaluate model performance rigorously.
Potential Impact:
• Accurate Stage 1 recovery prediction enhances Space Y's competitiveness.
• Optimizes resources, improves efficiency, mitigates financial risks.
• Contributes to the advancement of the commercial space industry.
METHODOLOGY
1. Data Collection:
• Combined data from SpaceX API and Wikipedia.
2. Data Wrangling:
• Cleaned and organized collected data.
3. Classification:
• Identified successful and unsuccessful landings.
4. Exploratory Data Analysis (EDA):
• Used visualization and SQL for insights.
• Visualized data distribution.
• Extracted insights with SQL.
5. Interactive Visual Analytics:
• Employed Folium and Plotly Dash.
6. Predictive Analysis:
• Utilized classification models.
7. Model Tuning:
• Optimized models using GridSearchCV.
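The classification step (identifying successful and unsuccessful landings) amounts to mapping each raw landing outcome to a binary 'class' label. The sketch below is one minimal way to do that in plain Python. The outcome string format, the sample records, and the helper name landing_class are all assumptions made for illustration and are not taken from the project notebooks.

```python
# Assumed outcome format: "<landed?> <landing type>", e.g. "True ASDS".
# Both the format and the sample values are illustrative assumptions.
FAILED_PREFIXES = ("False", "None")


def landing_class(outcome: str) -> int:
    """Return 1 for a successful landing, 0 otherwise."""
    return 0 if outcome.startswith(FAILED_PREFIXES) else 1


# Hypothetical launch records.
launches = [
    {"flight": 1, "outcome": "None None"},
    {"flight": 2, "outcome": "True ASDS"},
    {"flight": 3, "outcome": "False Ocean"},
    {"flight": 4, "outcome": "True RTLS"},
]

# Add the binary 'class' column used as the prediction target.
for launch in launches:
    launch["class"] = landing_class(launch["outcome"])

success_rate = sum(launch["class"] for launch in launches) / len(launches)
print([launch["class"] for launch in launches], success_rate)
```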
RESULTS
Data Collection (SpaceX API)
1. Request data from the SpaceX APIs.
2. Receive the SpaceX API JSON file plus lists (Launch Site, Booster Version, Payload Data).
3. Use json_normalize to convert the data from JSON to a DataFrame.
4. Build a dictionary of the relevant data.
5. Cast the dictionary to a DataFrame.
6. Filter the data to only include Falcon 9 launches.
7. Impute missing PayloadMass values with the mean.

GitHub url:
https://github.com/Sandeepmopidevi/applied-data-science-capstone-edx-tasks/blob/main/jupyter-labs-spacex-data-collection-api.ipynb
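Two steps in this flow, filtering to Falcon 9 launches and imputing missing PayloadMass values with the mean, are easy to show on a toy example. The sketch below uses plain Python lists as a stand-in for the DataFrame operations. The records and the BoosterVersion field name are hypothetical, and the notebook linked above may do this differently.

```python
from statistics import mean

# Hypothetical launch records; None marks a missing payload mass.
launches = [
    {"BoosterVersion": "Falcon 1", "PayloadMass": 20.0},
    {"BoosterVersion": "Falcon 9", "PayloadMass": 500.0},
    {"BoosterVersion": "Falcon 9", "PayloadMass": None},
    {"BoosterVersion": "Falcon 9", "PayloadMass": 700.0},
]

# Filter the data to include only Falcon 9 launches (copying each record
# so the original list is left untouched).
falcon9 = [dict(row) for row in launches if row["BoosterVersion"] == "Falcon 9"]

# Impute missing PayloadMass values with the mean of the known values.
known = [row["PayloadMass"] for row in falcon9 if row["PayloadMass"] is not None]
mean_mass = mean(known)
for row in falcon9:
    if row["PayloadMass"] is None:
        row["PayloadMass"] = mean_mass

print(falcon9)
```

Filtering before imputing matters here: the mean is then computed over Falcon 9 payloads only, so other boosters do not skew the fill value.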
1. Request the Wikipedia HTML page.
2. Parse it with BeautifulSoup (html5lib parser).
3. Find the launch info HTML table.
4. Create a dictionary.
5. Iterate through the table cells to extract data to the dictionary.
6. Cast the dictionary to a DataFrame.

GitHub url:
1. Requesting Falcon 9 launch data from Wikipedia.
2. Creating a BeautifulSoup object from the HTML response.
3. Extracting all column names from the HTML table header.
4. Collecting the data by parsing HTML tables.
5. Constructing the data we have obtained into a dictionary.
6. Creating a dataframe from the dictionary.
7. Exporting the data to CSV.

Github Url Data Collection
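The core of the scraping flow is to read column names from the table header, then walk the table cells and collect them into a dictionary. The sketch below illustrates that idea with Python's standard-library html.parser instead of BeautifulSoup, so that it runs without extra packages. The HTML fragment, the TableParser class, and the column names are hypothetical stand-ins, not the project's code or the real Wikipedia table.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect header names from <th> cells and row values from <td> cells."""

    def __init__(self):
        super().__init__()
        self.columns = []      # column names from the header row
        self.rows = []         # one list of cell strings per data row
        self._cell_tag = None  # "th" or "td" while inside a cell
        self._text = []        # text fragments of the current cell
        self._row = []         # cells of the current row

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell_tag = tag
            self._text = []

    def handle_data(self, data):
        if self._cell_tag:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell_tag == tag:
            value = "".join(self._text).strip()
            if tag == "th":
                self.columns.append(value)
            else:
                self._row.append(value)
            self._cell_tag = None
        elif tag == "tr" and self._row:
            self.rows.append(self._row)


# Hypothetical HTML fragment standing in for the downloaded page.
html = """
<table>
  <tr><th>Flight No.</th><th>Launch site</th><th>Payload</th></tr>
  <tr><td>1</td><td>Site A</td><td>Payload X</td></tr>
  <tr><td>2</td><td>Site B</td><td>Payload Y</td></tr>
</table>
"""

parser = TableParser()
parser.feed(html)

# Build a dictionary of column name -> list of values, ready to be
# turned into a dataframe or written to CSV.
launch_dict = {name: [] for name in parser.columns}
for row in parser.rows:
    for name, value in zip(parser.columns, row):
        launch_dict[name].append(value)

print(launch_dict)
```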

COMPLETE THE EDA WITH SQL
• Utilized SQL queries to perform comprehensive exploratory data analysis (EDA), extracting valuable insights directly from the dataset.
• SQL facilitated efficient querying, aggregation, and manipulation of data, enabling in-depth analysis of various aspects such as distribution, relationships, trends, and outliers.
• The EDA with SQL provided a solid foundation for understanding the dataset's characteristics and informing subsequent analytical decisions.

Data Exploration:
• Leveraged SQL queries to gain insights into the dataset.
Summary Statistics:
• Calculated descriptive statistics such as mean, median, and standard deviation.
Data Distribution:
• Analyzed distribution of key variables using SQL functions.
Relationship Analysis:
• Investigated correlations between variables through SQL joins and aggregations.
Trend Analysis:
• Examined temporal trends using SQL date functions and time-series analysis.
Outlier Detection:
• Identified outliers using SQL queries and visualizations.
Data Quality Assessment:
• Assessed data completeness, accuracy, and consistency through SQL validations.

GitHub Link:- Complete the EDA with SQL
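A few of the query types listed on this slide (summary statistics, aggregation, and a date-function trend) can be sketched with Python's built-in sqlite3 module. The table name, column names, and rows below are hypothetical and exist only to make the queries runnable. The project's actual schema and database engine are not stated in this deck.

```python
import sqlite3

# Hypothetical table and rows, used only to make the queries runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE launches (launch_date TEXT, launch_site TEXT, "
    "payload_mass_kg REAL, landed INTEGER)"
)
conn.executemany(
    "INSERT INTO launches VALUES (?, ?, ?, ?)",
    [
        ("2020-01-10", "Site A", 1000.0, 1),
        ("2020-03-05", "Site A", 3000.0, 0),
        ("2021-02-20", "Site B", 2000.0, 1),
        ("2021-07-15", "Site B", 4000.0, 1),
    ],
)

# Summary statistics: average payload mass over all launches.
avg_mass = conn.execute("SELECT AVG(payload_mass_kg) FROM launches").fetchone()[0]

# Aggregation: number of launches and landing success rate per site.
per_site = conn.execute(
    "SELECT launch_site, COUNT(*), AVG(landed) "
    "FROM launches GROUP BY launch_site ORDER BY launch_site"
).fetchall()

# Trend analysis with a date function: successful landings per year.
per_year = conn.execute(
    "SELECT strftime('%Y', launch_date) AS year, SUM(landed) "
    "FROM launches GROUP BY year ORDER BY year"
).fetchall()

print(avg_mass, per_site, per_year)
conn.close()
```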

COMPLETE THE EDA WITH VISUALIZATION
• EDA with visualization offers insights into data characteristics, aiding in decision-making and hypothesis generation.
• Visualizations help identify patterns, trends, outliers, and dependencies, enhancing data understanding.
• Findings guide subsequent analysis and modeling, ensuring interpretability and robustness of results.

• Data Distribution
• Correlation Analysis
• Temporal Analysis
• Geographic Insights
• Outlier Detection
• Feature Importance

GitHub Link:- COMPLETE THE EDA WITH VISUALIZATION

INTERACTIVE VISUAL ANALYTICS WITH FOLIUM
Utilized Folium, a Python library for creating interactive maps, to perform geospatial analysis and visualization of data. With Folium, interactive maps were generated, allowing users to explore data geographically. Marker clustering was implemented to handle large datasets effectively, providing a clear visualization of data density. Popup information windows were incorporated to display additional details when users interacted with map markers, enhancing data exploration. Custom icons were utilized to represent different categories or attributes, improving map readability. Geospatial analysis techniques were applied to derive insights from spatial data, enabling users to identify spatial patterns and relationships. Interactive features such as zooming, panning, and toggling layers were integrated to provide users with a dynamic and engaging mapping experience, facilitating deeper exploration and analysis of geospatial data.

Findings:-
• Map Generation
• Marker Clustering
• Popup Information
• Custom Icons
• Geospatial Analysis
• Interactive Features

GitHub Link:- Interactive Visual Analytics with Folium

BUILD AN INTERACTIVE DASHBOARD WITH PLOTLY DASH
The Interactive Dashboard built with Plotly Dash offers a dynamic and user-friendly interface for exploring and visualizing data. Leveraging the capabilities of Plotly Dash, the dashboard provides interactive features such as dropdown menus, sliders, and buttons to enable users to interactively control and customize the displayed data. It incorporates various data visualization components, including graphs, charts, and tables, to present insights and trends effectively. The dashboard is designed to be responsive and intuitive, allowing users to navigate through different views and explore data from different perspectives seamlessly.

Data Visualization:
• Implemented interactive charts and graphs using Plotly to visualize key insights and trends.
• Included line charts, bar charts, scatter plots, and heat maps to represent different aspects of the data.
User Interaction:
• Integrated dropdown menus, sliders, and date pickers to enable users to filter and customize the displayed data dynamically.
Data Exploration:
• Enabled users to explore data interactively by selecting specific variables, time periods, or regions of interest.
Dashboard Layout:
• Designed an intuitive and visually appealing layout with clear navigation and organization of dashboard components.
Performance and Scalability:
• Optimized dashboard performance to handle large datasets efficiently and deliver a smooth user experience.

GitHub Link:- BUILD AN INTERACTIVE DASHBOARD WITH PLOTLY DASH

THE MACHINE LEARNING PREDICTION LAB
The Machine Learning Prediction Lab is dedicated to developing and evaluating predictive models using advanced machine learning techniques. It encompasses various stages of the machine learning pipeline, including data preprocessing, feature engineering, model selection, and evaluation. The lab employs a systematic approach to analyze and interpret data, aiming to uncover meaningful insights and patterns that can drive decision-making processes.

Data Preprocessing:
• Identified and handled missing values, outliers, and inconsistencies in the dataset.
• Conducted feature scaling and normalization to ensure uniformity across features.
Feature Engineering:
• Extracted and selected relevant features to improve model performance.
Model Selection:
• Explored a variety of machine learning algorithms, including logistic regression, support vector machines, decision trees, and ensemble methods.
Model Evaluation:
• Employed cross-validation techniques to assess model generalization and robustness.
Insights:
• Identified key factors influencing the target variable based on feature importance analysis.

GitHub Link:- MACHINE LEARNING PREDICTION LAB
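The preprocessing described in this deck (one-hot encoding of categorical variables and standardizing features to a uniform scale) can be illustrated without any libraries. The sketch below is a plain-Python stand-in, not the project's pipeline. The feature names, sample values, and helper functions are hypothetical. Standardization here divides by the population standard deviation, which is also the convention scikit-learn's StandardScaler follows.

```python
from statistics import mean, pstdev

# Hypothetical feature rows: one categorical and one numeric feature.
rows = [
    {"orbit": "LEO", "payload_mass": 1000.0},
    {"orbit": "GTO", "payload_mass": 3000.0},
    {"orbit": "LEO", "payload_mass": 5000.0},
]


def one_hot(values):
    """Map each category to 0/1 indicator columns, in sorted category order."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


categories, orbit_encoded = one_hot([r["orbit"] for r in rows])
mass_scaled = standardize([r["payload_mass"] for r in rows])

# Combine the scaled numeric column with the indicator columns.
features = [[m] + flags for m, flags in zip(mass_scaled, orbit_encoded)]

print(categories)
for feature_row in features:
    print(feature_row)
```

Scaling matters most for distance-based and margin-based models such as kNN and SVM, where a feature measured in thousands of kilograms would otherwise dominate the 0/1 indicator columns.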

PROGRAMMING LANGUAGE TRENDS
[Figure: two bar charts of programming language popularity (percentage), one for 2024 and one for 2025]
PROGRAMMING LANGUAGE TRENDS FINDINGS & IMPLICATIONS
Findings
• Finding 1: Python remains dominant due to versatility and extensive libraries.
• Finding 2: JavaScript maintains prominence for web development.
• Finding 3: TypeScript and Kotlin are emerging as viable options.
Implications
• Prioritize Python skill development for diverse applications.
• Enhance proficiency in JavaScript and frameworks.
• Consider adopting TypeScript and Kotlin for modern projects.
DATABASE TRENDS
[Figure: database trend charts for Current Year 2024 and Next Year 2025]
DATABASE TRENDS FINDINGS & IMPLICATIONS
Findings
• Finding 1: Relational databases such as MySQL and PostgreSQL continue to be widely adopted for traditional data management tasks due to their robustness and stability.
• Finding 2: NoSQL databases like MongoDB and Redis are gaining popularity for handling unstructured and semi-structured data, providing flexibility and scalability for modern applications.
• Finding 3: Cloud-native databases and managed services, including DynamoDB and Google BigQuery, are increasingly favored for their ease of use, scalability, and cost-effectiveness.
Implications
• Organizations should maintain proficiency in relational databases to manage structured data effectively, particularly for legacy systems and traditional applications.
• Consider adopting NoSQL databases for projects with requirements for handling diverse and rapidly changing data types, such as social media analytics and IoT applications.
• Embrace cloud-native databases and managed services to leverage the benefits of scalability, flexibility, and reduced maintenance overhead, enabling faster time-to-market and cost savings.
DASHBOARD

https://github.com/Sandeepmopidevi/applied-data-science-capstone-edx-tasks/blob/main/Cognos%20Dashboard.pdf
OVERALL FINDINGS & IMPLICATIONS
Findings
• Data Complexity: The analysis revealed the increasing complexity of data, with a growing volume, variety, and velocity of information generated across various domains and industries.
• Technology Adoption: There is a notable trend towards the adoption of advanced technologies such as artificial intelligence, machine learning, and big data analytics, driven by the need for data-driven decision-making and competitive advantage.
• Evolving Business Needs: Organizations are facing evolving business needs and challenges, including the demand for real-time insights, personalized customer experiences, and enhanced operational efficiency.
• Talent Gap: The findings indicate a talent gap in the field of data science and analytics, with a shortage of skilled professionals capable of leveraging complex data sets and advanced analytics tools effectively.
Implications
• Data Strategy: Organizations must develop comprehensive data strategies to manage and harness the growing volume and complexity of data, ensuring alignment with business goals and objectives.
• Technology Investment: Investing in advanced technologies such as AI, ML, and big data analytics is essential to gain insights from data, drive innovation, and maintain a competitive edge in the market.
• Agile Decision-Making: Embracing real-time analytics and predictive insights enables organizations to make agile, data-driven decisions, respond quickly to market changes, and capitalize on emerging opportunities.
• Skill Development: Addressing the talent gap through training, upskilling, and talent acquisition initiatives is crucial to build a workforce capable of effectively leveraging data and analytics for business success.
CONCLUSION
• User-friendly interface and intuitive design enable easy creation and customization of dashboards, reducing the learning curve for users.
• Seamless data integration capabilities ensure access to comprehensive data from diverse sources, enhancing data analysis and decision-making.
• Interactive visualization features empower users to explore data dynamically, uncovering insights and trends that drive business outcomes.
• Robust collaboration and sharing functionalities facilitate teamwork and communication, fostering a data-driven culture within the organization and driving collective intelligence.
POPULAR LANGUAGES

[Figure: bar chart of popular languages by percentage]

