Capstone Story Presentation

The document describes a data science capstone project that aims to build machine learning models to predict the successful landing of SpaceX rocket stages. It discusses data collection from SpaceX API and Wikipedia, exploratory data analysis using SQL and visualization, developing classification models, and evaluating model performance.

Uploaded by

asksandeepsd

Data Science Capstone Project

M.Durga Sai Sandeep


07/04/2024
https://github.com/Sandeepmopidevi/
OUTLINE

• Executive Summary
• Introduction
• Methodology
• Results
• Visualization – Charts
• Dashboard
• Discussion
• Findings & Implications
• Conclusion
• Appendix
EXECUTIVE SUMMARY
1. Data Collection & Preparation:
   • Collected data from the public SpaceX API and a Wikipedia page.
   • Created a 'class' column to label successful landings.
   • Explored the data using SQL, visualization, Folium maps, and dashboards.
   • Selected relevant features for machine learning.
2. Data Preprocessing:
   • Applied one-hot encoding to categorical variables.
   • Standardized the data to a uniform scale.
   • Optimized model parameters using GridSearchCV.
3. Machine Learning Models:
   • Developed four models:
      • Logistic Regression
      • Support Vector Machine
      • Decision Tree Classifier
      • K-Nearest Neighbors
   • All models achieved a consistent accuracy (~83.33%).
4. Evaluation & Analysis:
   • Models tended to overpredict successful landings.
   • Identified a need for more data to improve accuracy.
5. Model Performance Visualization:
   • Visualized accuracy scores to compare model performance.
INTRODUCTION
Background:
   • The commercial space age is booming.
   • SpaceX offers competitive launch pricing ($62M vs. $165M USD) thanks to rocket recovery.
   • Space Y aims to rival SpaceX.
Problem:
   • Space Y seeks a machine learning model to predict successful Stage 1 recovery.
Approach:
   • Collect data from the SpaceX API and industry sources.
   • Preprocess the data and engineer features.
   • Train ML models: logistic regression, SVM, decision trees, kNN.
   • Evaluate model performance rigorously.
Potential Impact:
   • Accurate Stage 1 recovery prediction enhances Space Y's competitiveness.
   • Optimizes resources, improves efficiency, and mitigates financial risks.
   • Contributes to the advancement of the commercial space industry.
METHODOLOGY
1. Data Collection:
   • Combined data from SpaceX API and Wikipedia.
2. Data Wrangling:
   • Cleaned and organized collected data.
3. Classification:
   • Identified successful and unsuccessful landings.
4. Exploratory Data Analysis (EDA):
   • Used visualization and SQL for insights.
   • Visualized data distribution.
   • Extracted insights with SQL.
5. Interactive Visual Analytics:
   • Employed Folium and Plotly Dash.
6. Predictive Analysis:
   • Utilized classification models.
7. Model Tuning:
   • Optimized models using GridSearchCV.
RESULTS
Data Collection – SpaceX API

1. Request data from the SpaceX APIs.
2. Receive a JSON file plus lists (Launch Site, Booster Version, Payload Data).
3. Use json_normalize to convert the JSON data to a DataFrame.
4. Build a dictionary of the relevant data.
5. Cast the dictionary to a DataFrame.
6. Filter the data to only include Falcon 9 launches.
7. Impute missing PayloadMass values with the mean.

GitHub url:
https://github.com/Sandeepmopidevi/applied-data-science-capstone-edx-tasks/blob/main/jupyter-labs-spacex-data-collection-api.ipynb
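The API collection steps can be sketched with pandas. This is a minimal sketch, not the notebook's code: the `launches` records and their field names are hypothetical stand-ins for a parsed API response, and the column names `BoosterVersion` and `PayloadMass` are assumed from the slide text.

```python
import pandas as pd

# Hypothetical stand-in for a parsed API response (assumed field names).
launches = [
    {"flight_number": 1, "rocket": {"name": "Falcon 1"}, "payload": {"mass_kg": 20.0}},
    {"flight_number": 2, "rocket": {"name": "Falcon 9"}, "payload": {"mass_kg": 500.0}},
    {"flight_number": 3, "rocket": {"name": "Falcon 9"}, "payload": {"mass_kg": None}},
    {"flight_number": 4, "rocket": {"name": "Falcon 9"}, "payload": {"mass_kg": 1500.0}},
]

# Flatten the nested JSON into columns such as "rocket.name" and "payload.mass_kg".
df = pd.json_normalize(launches)

# Keep only the relevant columns under readable names.
df = df.rename(columns={"rocket.name": "BoosterVersion", "payload.mass_kg": "PayloadMass"})
df = df[["flight_number", "BoosterVersion", "PayloadMass"]]
df["PayloadMass"] = pd.to_numeric(df["PayloadMass"])  # JSON nulls become NaN

# Filter the data to Falcon 9 launches only.
falcon9 = df[df["BoosterVersion"] == "Falcon 9"].copy()

# Impute missing PayloadMass values with the column mean.
falcon9["PayloadMass"] = falcon9["PayloadMass"].fillna(falcon9["PayloadMass"].mean())
```

Filtering before imputing means the mean is computed from Falcon 9 payloads only, which matches the order of the steps on the slide.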
Data Collection – Web scraping (Wikipedia)

1. Request the Wikipedia HTML page.
2. Parse it with BeautifulSoup (html5lib parser).
3. Find the launch info HTML table.
4. Create a dictionary.
5. Iterate through the table cells to extract data into the dictionary.
6. Cast the dictionary to a DataFrame.

GitHub url:

1. Requesting Falcon 9 launch data from Wikipedia.
2. Creating a BeautifulSoup object from the HTML response.
3. Extracting all column names from the HTML table header.
4. Collecting the data by parsing HTML tables.
5. Constructing the data we have obtained into a dictionary.
6. Creating a dataframe from the dictionary.
7. Exporting the data to CSV.

Github Url Data Collection
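The scraping steps above can be sketched as follows. The HTML fragment and its column names are hypothetical, standing in for the Wikipedia response. The slides name the html5lib parser; this sketch swaps in Python's built-in `html.parser` so it runs without that extra dependency.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for the Wikipedia response body.
html = """
<table class="wikitable">
  <tr><th>Flight No.</th><th>Launch site</th><th>Payload mass</th></tr>
  <tr><td>1</td><td>CCAFS</td><td>500 kg</td></tr>
  <tr><td>2</td><td>VAFB</td><td>1,200 kg</td></tr>
</table>
"""

# Create a BeautifulSoup object and locate the launch table.
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# Extract all column names from the table header cells.
columns = [th.get_text(strip=True) for th in table.find_all("th")]

# Iterate through the table cells and collect values into a dictionary.
launch_dict = {name: [] for name in columns}
for row in table.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) != len(columns):
        continue  # skip the header row and any malformed rows
    for name, cell in zip(columns, cells):
        launch_dict[name].append(cell.get_text(strip=True))

# Cast the dictionary to a DataFrame and export it as CSV text.
df = pd.DataFrame(launch_dict)
csv_text = df.to_csv(index=False)
```

Passing a file path to `to_csv` instead of reading its return value would write the CSV to disk.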


COMPLETE THE EDA WITH SQL
• Utilized SQL queries to perform comprehensive exploratory data analysis (EDA), extracting valuable insights directly from the dataset.
• SQL facilitated efficient querying, aggregation, and manipulation of data, enabling in-depth analysis of various aspects such as distribution, relationships, trends, and outliers.
• The EDA with SQL provided a solid foundation for understanding the dataset's characteristics and informing subsequent analytical decisions.

Data Exploration:
   • Leveraged SQL queries to gain insights into the dataset.
Summary Statistics:
   • Calculated descriptive statistics such as mean, median, and standard deviation.
Data Distribution:
   • Analyzed distribution of key variables using SQL functions.
Relationship Analysis:
   • Investigated correlations between variables through SQL joins and aggregations.
Trend Analysis:
   • Examined temporal trends using SQL date functions and time-series analysis.
Outlier Detection:
   • Identified outliers using SQL queries and visualizations.
Data Quality Assessment:
   • Assessed data completeness, accuracy, and consistency through SQL validations.
GitHub Link:- Complete the EDA with SQL
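A few of the query types listed above can be illustrated with Python's built-in `sqlite3` module. The table name, column names, and rows below are hypothetical and chosen only for illustration; they are not taken from the project's database.

```python
import sqlite3

# In-memory database with a small, made-up launches table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE launches (launch_site TEXT, payload_mass_kg REAL, landing_outcome TEXT)"
)
rows = [
    ("CCAFS SLC-40", 500.0, "Success"),
    ("CCAFS SLC-40", 1500.0, "Failure"),
    ("KSC LC-39A", 3000.0, "Success"),
    ("VAFB SLC-4E", 1000.0, "Success"),
]
conn.executemany("INSERT INTO launches VALUES (?, ?, ?)", rows)

# Summary statistic: average payload mass across all launches.
avg_payload = conn.execute("SELECT AVG(payload_mass_kg) FROM launches").fetchone()[0]

# Distribution: number of launches per site.
site_counts = dict(
    conn.execute("SELECT launch_site, COUNT(*) FROM launches GROUP BY launch_site").fetchall()
)

# Data quality: number of rows with a missing payload mass.
missing_payloads = conn.execute(
    "SELECT COUNT(*) FROM launches WHERE payload_mass_kg IS NULL"
).fetchone()[0]

conn.close()
```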


COMPLETE THE EDA WITH VISUALIZATION
• EDA with visualization offers insights into data characteristics, aiding in decision-making and hypothesis generation.
• Visualizations help identify patterns, trends, outliers, and dependencies, enhancing data understanding.
• Findings guide subsequent analysis and modeling, ensuring interpretability and robustness of results.

Areas covered:
   • Data Distribution
   • Correlation Analysis
   • Temporal Analysis
   • Geographic Insights
   • Outlier Detection
   • Feature Importance

GitHub Link:- COMPLETE THE EDA WITH VISUALIZATION


INTERACTIVE VISUAL ANALYTICS WITH FOLIUM
Utilized Folium, a Python library for creating interactive maps, to perform geospatial analysis and visualization of data. With Folium, interactive maps were generated, allowing users to explore data geographically. Marker clustering was implemented to handle large datasets effectively, providing a clear visualization of data density. Popup information windows were incorporated to display additional details when users interacted with map markers, enhancing data exploration. Custom icons were utilized to represent different categories or attributes, improving map readability. Geospatial analysis techniques were applied to derive insights from spatial data, enabling users to identify spatial patterns and relationships. Interactive features such as zooming, panning, and toggling layers were integrated to provide users with a dynamic and engaging mapping experience, facilitating deeper exploration and analysis of geospatial data.

Findings:-
• Map Generation
• Marker Clustering
• Popup Information
• Custom Icons
• Geospatial Analysis
• Interactive Features

GitHub Link:- Interactive Visual Analytics with Folium
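The slides do not say which geospatial techniques were applied. One calculation that commonly accompanies map markers is the great-circle (haversine) distance between two points, sketched below using only the standard library. The coordinates are illustrative assumptions, not values from the project.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical coordinates: a launch site and a nearby point of interest.
site = (28.5623, -80.5774)
nearby_point = (28.5631, -80.5680)
distance_km = haversine_km(*site, *nearby_point)
```

A distance like this could be shown in a marker popup or drawn as a line between the two points on the map.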


BUILD AN INTERACTIVE DASHBOARD WITH PLOTLY DASH
The Interactive Dashboard built with Plotly Dash offers a dynamic and user-friendly interface for exploring and visualizing data. Leveraging the capabilities of Plotly Dash, the dashboard provides interactive features such as dropdown menus, sliders, and buttons that let users control and customize the displayed data. It incorporates various data visualization components, including graphs, charts, and tables, to present insights and trends effectively. The dashboard is designed to be responsive and intuitive, allowing users to navigate through different views and explore data from different perspectives seamlessly.

Data Visualization:
   • Implemented interactive charts and graphs using Plotly to visualize key insights and trends.
   • Included line charts, bar charts, scatter plots, and heat maps to represent different aspects of the data.
User Interaction:
   • Integrated dropdown menus, sliders, and date pickers to enable users to filter and customize the displayed data dynamically.
Data Exploration:
   • Enabled users to explore data interactively by selecting specific variables, time periods, or regions of interest.
Dashboard Layout:
   • Designed an intuitive and visually appealing layout with clear navigation and organization of dashboard components.
Performance and Scalability:
   • Optimized dashboard performance to handle large datasets efficiently and deliver a smooth user experience.

GitHub Link:- BUILD AN INTERACTIVE DASHBOARD WITH PLOTLY DASH
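The filtering behind a dropdown and a range slider could be sketched as plain pandas functions that a dashboard callback would call. This is a sketch under assumptions: the column names `LaunchSite` and `PayloadMass`, the `"ALL"` sentinel, and the sample rows are hypothetical. Only the `class` column (1 for a successful landing) comes from the slides.

```python
import pandas as pd


def filter_launches(df, site="ALL", payload_range=(0, 10000)):
    """Return rows matching the selected site and payload range (inclusive)."""
    low, high = payload_range
    mask = df["PayloadMass"].between(low, high)
    if site != "ALL":
        mask &= df["LaunchSite"] == site
    return df[mask]


def success_rate_by_site(df):
    """Mean of the 'class' column per site, i.e. the share of successful landings."""
    return df.groupby("LaunchSite")["class"].mean()


# Hypothetical sample data.
launches = pd.DataFrame(
    {
        "LaunchSite": ["CCAFS SLC-40", "CCAFS SLC-40", "KSC LC-39A", "KSC LC-39A"],
        "PayloadMass": [500, 4000, 2000, 9000],
        "class": [1, 0, 1, 1],
    }
)

selected = filter_launches(launches, site="ALL", payload_range=(1000, 5000))
rates = success_rate_by_site(launches)
```

Keeping the filter logic outside the callback makes it easy to test without starting the dashboard server.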


THE MACHINE LEARNING PREDICTION LAB
The Machine Learning Prediction Lab is dedicated to developing and evaluating predictive models using advanced machine learning techniques. It encompasses various stages of the machine learning pipeline, including data preprocessing, feature engineering, model selection, and evaluation. The lab employs a systematic approach to analyze and interpret data, aiming to uncover meaningful insights and patterns that can drive decision-making processes.

Data Preprocessing:
   • Identified and handled missing values, outliers, and inconsistencies in the dataset.
   • Conducted feature scaling and normalization to ensure uniformity across features.
Feature Engineering:
   • Extracted and selected relevant features to improve model performance.
Model Selection:
   • Explored a variety of machine learning algorithms, including logistic regression, support vector machines, decision trees, and ensemble methods.
Model Evaluation:
   • Employed cross-validation techniques to assess model generalization and robustness.
Insights:
   • Identified key factors influencing the target variable based on feature importance analysis.

GitHub Link:- MACHINE LEARNING PREDICTION LAB
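The tuning and comparison workflow can be sketched with scikit-learn. This is a minimal sketch on synthetic data from `make_classification`, not the project's dataset. The parameter grids, the 80/20 split, and `cv=5` are assumptions, so the resulting scores say nothing about the accuracy reported in the slides.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared feature matrix and 'class' labels.
X, y = make_classification(n_samples=100, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

# Each entry pairs an estimator with a small, illustrative parameter grid.
models = {
    "logistic_regression": (LogisticRegression(max_iter=1000), {"clf__C": [0.01, 0.1, 1.0]}),
    "svm": (SVC(), {"clf__C": [0.1, 1.0], "clf__kernel": ["linear", "rbf"]}),
    "decision_tree": (DecisionTreeClassifier(random_state=0), {"clf__max_depth": [2, 4, 6]}),
    "knn": (KNeighborsClassifier(), {"clf__n_neighbors": [3, 5, 7]}),
}

scores = {}
for name, (estimator, grid) in models.items():
    # Standardize inside the pipeline so the scaler is fit on training folds only.
    pipe = Pipeline([("scale", StandardScaler()), ("clf", estimator)])
    search = GridSearchCV(pipe, grid, cv=5)
    search.fit(X_train, y_train)
    # Accuracy of the best parameter combination on the held-out test set.
    scores[name] = search.score(X_test, y_test)
```

With a small test set, several models can land on exactly the same accuracy, because each test sample shifts the score by a large step.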


PROGRAMMING LANGUAGE TRENDS
[Figure: two bar charts of programming language usage, one for 2024 and one for 2025. Y-axis: Percentage (0 to 100).]
PROGRAMMING LANGUAGE TRENDS FINDINGS & IMPLICATIONS
Findings
• Finding 1: Python remains dominant due to versatility and extensive libraries.
• Finding 2: JavaScript maintains prominence for web development.
• Finding 3: TypeScript and Kotlin are emerging as viable options.

Implications
• Prioritize Python skill development for diverse applications.
• Enhance proficiency in JavaScript and frameworks.
• Consider adopting TypeScript and Kotlin for modern projects.
DATABASE TRENDS
[Figure: database trend charts for the current year (2024) and the next year (2025).]
DATABASE TRENDS FINDINGS & IMPLICATIONS
Findings
• Finding 1: Relational databases such as MySQL and PostgreSQL continue to be widely adopted for traditional data management tasks due to their robustness and stability.
• Finding 2: NoSQL databases like MongoDB and Redis are gaining popularity for handling unstructured and semi-structured data, providing flexibility and scalability for modern applications.
• Finding 3: Cloud-native databases and managed services, including DynamoDB and Google BigQuery, are increasingly favored for their ease of use, scalability, and cost-effectiveness.

Implications
• Organizations should maintain proficiency in relational databases to manage structured data effectively, particularly for legacy systems and traditional applications.
• Consider adopting NoSQL databases for projects with requirements for handling diverse and rapidly changing data types, such as social media analytics and IoT applications.
• Embrace cloud-native databases and managed services to leverage the benefits of scalability, flexibility, and reduced maintenance overhead, enabling faster time-to-market and cost savings.
DASHBOARD

https://github.com/Sandeepmopidevi/applied-data-science-capstone-edx-tasks/blob/main/Cognos%20Dashboard.pdf
OVERALL FINDINGS & IMPLICATIONS
Findings
• Data Complexity: The analysis revealed the increasing complexity of data, with a growing volume, variety, and velocity of information generated across various domains and industries.
• Technology Adoption: There is a notable trend towards the adoption of advanced technologies such as artificial intelligence, machine learning, and big data analytics, driven by the need for data-driven decision-making and competitive advantage.
• Evolving Business Needs: Organizations are facing evolving business needs and challenges, including the demand for real-time insights, personalized customer experiences, and enhanced operational efficiency.
• Talent Gap: The findings indicate a talent gap in the field of data science and analytics, with a shortage of skilled professionals capable of leveraging complex data sets and advanced analytics tools effectively.

Implications
• Data Strategy: Organizations must develop comprehensive data strategies to manage and harness the growing volume and complexity of data, ensuring alignment with business goals and objectives.
• Technology Investment: Investing in advanced technologies such as AI, ML, and big data analytics is essential to gain insights from data, drive innovation, and maintain a competitive edge in the market.
• Agile Decision-Making: Embracing real-time analytics and predictive insights enables organizations to make agile, data-driven decisions, respond quickly to market changes, and capitalize on emerging opportunities.
• Skill Development: Addressing the talent gap through training, upskilling, and talent acquisition initiatives is crucial to build a workforce capable of effectively leveraging data and analytics for business success.
CONCLUSION
• User-friendly interface and intuitive design enable easy creation and customization of dashboards, reducing the learning curve for users.
• Seamless data integration capabilities ensure access to comprehensive data from diverse sources, enhancing data analysis and decision-making.
• Interactive visualization features empower users to explore data dynamically, uncovering insights and trends that drive business outcomes.
• Robust collaboration and sharing functionalities facilitate teamwork and communication, fostering a data-driven culture within the organization and driving collective intelligence.
POPULAR LANGUAGES

[Figure: bar chart of popular programming languages. Y-axis: Percentage (0 to 100).]
