
Lecture 1 - Introduction To Data Science

This document provides an overview of Module 1 of a data science certification program. The module focuses on ensuring participants understand how to implement data science initiatives in their work by addressing queries and infrastructure problems. It will include class exercises, quizzes, and case studies to enhance learning. The lecture series introduces data science fundamentals and hands-on experience with Python, NumPy, Pandas, matplotlib and seaborn.


Data Science Certification

Module 1 – Data Science & Python


Lecture 1 - Introduction to Data Science

10/5/2019 Frontier Technology Institute 1



Instructor Profile – Module 1

• Name: Abrar Ahmed Agha


• A-Levels: Karachi Grammar School
• BS: Institute of Business Administration - 2009
• MBA: Institute of Business Administration - 2011
• Professional Experience: 6+ years of analytics experience in FMCG, Technology and Data Consultancy
• Email: abrarahmed85@gmail.com
• LinkedIn: https://www.linkedin.com/in/abrar-ahmed-agha-09ba1b2b/


Module 1: Objectives
• Focus:
• Ensure participants understand how to implement Data Science initiatives in their line of work by addressing their queries and proposing solutions for their infrastructure-level problems

• Modalities:
• Class Exercises and Quizzes will be given during the module to reinforce learning
• We will try to keep the discussion as interactive as possible
• Participants are encouraged to ask questions at any time throughout the course (the lecture can be interrupted)
• We will work through several Data Science case studies during this first module and throughout the course


Module 1: Lecture Series


1 Introduction to Data Science
2 Fundamentals of Data Science
3 Hands-on with Python Installation & Basic Data Structures
4 Hands-on with Python Functions
5 Hands-on with NumPy Array
6 Hands-on with Pandas
7 Hands-on with matplotlib and seaborn
8 FINAL EXAM

How will you define Data Science?


Write it down.


What is Data Science?


What is Data Science?

Data Science is the scientific exploration of data to extract meaning or insight, and the construction of software systems to utilise such insight in a business context.

Source: University of Texas, Dallas


What is Data Science - In the news



Buzzwords - You may have heard


• Big Data: Massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques.
• Machine Learning: Discipline geared toward the technological development of human knowledge. Machine learning allows computers to handle new situations via analysis, self-training, observation and experience.
• Data Mining: The practice of examining large pre-existing databases in order to identify patterns, establish relationships and generate new information.
• The Cloud: Delivery of on-demand computing resources – everything from applications to data centres – over the Internet on a pay-for-use basis.
• Pattern Recognition: Branch of machine learning that focuses on the recognition of patterns and regularities in data.
• Algorithm Development: Resource needed to automate procedures in order to identify, analyse and extract specific information from a large volume of data.
• Neural Network: A series of algorithms that attempt to identify underlying relationships in a set of data by using a process that mimics the way the human brain operates.
• Deep Learning: Branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using multiple processing layers with complex structures, composed of multiple non-linear transformations.
• Cognitive Learning: Theory that explains why the brain is the most incredible network of information processing and interpretation in the body as we learn things.


The Profile of a Data Scientist


• A Data Scientist is required to have in-depth knowledge across multiple disciplines.

• A Data Scientist provides an end-to-end solution with a mission to get the insights out of the data to drive business value.


History and Evolution of Data Science


History of Data Science

• In 1974, Peter Naur authored the Concise Survey of Computer Methods, using the term ‘Data Science’
• In 1977, the International Association for Statistical Computing (IASC) was formed
• In 1977, John Tukey published Exploratory Data Analysis, arguing for the importance of using data to select “which” hypotheses to test, and that confirmatory data analysis and exploratory data analysis should work hand-in-hand
• In 1999, Jacob Zahavi pointed out the need for new tools to handle the massive amounts of information available to businesses, in Mining Data for Nuggets of Knowledge


• In 2001, Software-as-a-Service (SaaS) was created. This was the precursor to using Cloud-based applications
• In 2006, Hadoop 0.1.0, an open-source framework for distributed storage and processing of large data sets, was released
• In 2008, the title “Data Scientist” became a buzzword, and eventually a part of the language. DJ Patil and Jeff Hammerbacher, of LinkedIn and Facebook, are given credit for initiating its use as a buzzword
• In 2011, job listings for Data Scientists increased by 15,000%. There was also an increase in seminars and conferences devoted specifically to Data Science and Big Data. Data Science had proven itself to be a source of profits and had become a part of corporate culture


Evolution of Data Science


BREAK


Why Data Science?


Why is it Important?
Through data science, businesses can drive significant value.

• Ever complex business challenges: Business organisations struggle to maintain competitive positioning
• Digital phenomenon: Technological advancement exponentially enables unprecedented connectivity
• Infobesity: Data increases at overwhelming amounts for the human brain to perceive in decision making

Figures shown on the slide:
• Computing capacity: 170 gigaflops (1995); 1 exaflops (2020, ~100,000 times vs 1995)
• Data volume: 3 exabytes (1986); 1000 exabytes (2016); 2 zettabytes (2019)
(Image labels: Analogue era customers; Your shopping cart)


How is it different to…


What is new is the opportunity to fuse advanced analytical techniques with big data to produce impactful analyses for these old problems
(Diagram labels: Data Architecture; Spreadsheets; Sandbox; Data Modelling; Advanced Applied Mathematics; Statistics; BI Reporting; Computer Science; Intuitive Visualisations & Business Acumen)


Write down examples of Data Science from your previous knowledge or in your business


Why is Data Science needed Now?


Data Science – An Enabler

• Timely: Gain instant insights from diverse data sources

• Better analytics: Improvement of business performance through real-time analytics

• Vast amount of data: Big data technologies manage huge amounts of data

• Insights: Can provide better insights with the help of unstructured and semi-structured data

• Decision-making: Helps mitigate risk and make smart decisions through proper risk analysis


Evolution of Data


When Data goes Big


• Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data
• Facebook handles 40 billion photos from its user base
• Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts world-wide
• Large Synoptic Survey Telescope will generate 140 terabytes of data every 5 days
• Biomedical computation, such as decoding the human genome and personalized medicine, generates petabytes of data per month
• 2.5 quintillion bytes of data are created every day
• 90% of all data was created in the last 2 years
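
To put some of these figures on a common scale, the short Python sketch below converts a few of them. It assumes decimal (SI) units, where 1 exabyte is 10^18 bytes; the variable names are illustrative only and the inputs are simply the numbers quoted on this slide.

```python
# Assumption: decimal (SI) units, so 1 exabyte = 10**18 bytes.
BYTES_PER_EXABYTE = 10**18

# "2.5 quintillion bytes of data are created every day"
daily_bytes = 25 * 10**17
daily_exabytes = daily_bytes / BYTES_PER_EXABYTE

# "more than 1 million customer transactions every hour"
transactions_per_second = 1_000_000 / 3600

# "140 terabytes of data every 5 days"
telescope_tb_per_day = 140 / 5

print(f"Data created per day: {daily_exabytes} exabytes")
print(f"Retail transactions per second: {transactions_per_second:.1f}")
print(f"Telescope output per day: {telescope_tb_per_day} terabytes")
```

Expressed this way, 2.5 quintillion bytes per day is 2.5 exabytes per day, which can be compared directly with the exabyte figures quoted earlier in the lecture.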


Data Science Applications


• Sales & Distribution >> Revenue Forecasting / Real-time Distribution Intelligence
• Health Care >> Lab Reports & Diagnostics / Adaptive Health Care
• Banking & Finance >> Loan Prediction / Money Laundering Detection
• Retail >> Consumer Basket Analysis / Footfall Analytics
• E-Commerce >> Automatic Recommendation Engine
• Telecom >> Churn Prediction
• Customer Service >> Target Marketing
• HR Analytics >> Employee Skill Optimization / Talent Retention


The Data Science Lifecycle


Business Context – From Understanding to Approach

What Where How Why When Who


Business Context – Obtain Buy-in on the Problem Statement


Analytics Approach – Master Plan


• What kind of Data is required?
• What is the Sample size?
• Where is the Data stored? How can it be accessed?
• Is the Data in its correct form? – Tidy Data Research Paper
• Does the Data require Cleaning / Normalization?
• What are the steps in the Analytics pipeline?
• What are the challenges specific to the Business Problem? Challenges
with Data? Class imbalance?
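
Two of the questions above, whether the data is in tidy form and whether the classes are imbalanced, can be checked with a few lines of pandas. The sketch below is only an illustration: the `wide` table, the `is_fraud` labels and all column names are hypothetical values made up for this example, not course data.

```python
import pandas as pd

# Hypothetical "wide" table: one column per year, which is not tidy
# because the year is stored in the column names rather than in a column.
wide = pd.DataFrame({
    "store": ["A", "B"],
    "sales_2018": [100, 80],
    "sales_2019": [120, 90],
})

# Reshape so that each row is one observation (store, year, sales).
tidy = wide.melt(id_vars="store", var_name="year", value_name="sales")
tidy["year"] = tidy["year"].str.replace("sales_", "", regex=False).astype(int)
print(tidy)

# Hypothetical target labels: 9 normal records and 1 fraud record.
labels = pd.Series([0, 0, 0, 0, 0, 0, 0, 0, 0, 1], name="is_fraud")

# Share of each class; a very small minority share signals class imbalance.
class_share = labels.value_counts(normalize=True)
print(class_share)
```

Here the minority class makes up 10% of the records, the kind of skew that is worth noting in the master plan before choosing an evaluation approach.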


Analytics Approach – Know your Data


• Research the Data Science Problem: are there any underlying theories?

• Exploratory Data Analysis:
o Find issues with Data such as missing values and how to treat them
o Standardize values such as M/Male/Man
o Find distribution of data and treat Outliers
o Setting data types: Numbers, Text/String, Date time, Categorical, Boolean
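
The exploratory steps above could look roughly like this in pandas. This is a minimal sketch under stated assumptions: the `raw` records are invented for illustration, filling missing values with the median is just one possible treatment, and the 1.5 × IQR rule is a common convention for flagging outliers rather than something prescribed by the lecture.

```python
import pandas as pd

# Hypothetical sample records, invented purely for illustration.
raw = pd.DataFrame({
    "customer_id": ["1", "2", "3", "4", "5"],
    "gender": ["M", "Male", "Man", "F", "Female"],
    "signup_date": ["2019-01-05", "2019-02-11", "2019-03-20",
                    "2019-04-02", "2019-05-17"],
    "monthly_spend": [120.0, None, 95.0, 110.0, 5000.0],
    "is_active": ["True", "False", "True", "True", "False"],
})
df = raw.copy()

# Setting data types: numbers, date time, boolean.
df["customer_id"] = df["customer_id"].astype(int)
df["signup_date"] = pd.to_datetime(df["signup_date"])
df["is_active"] = df["is_active"].map({"True": True, "False": False})

# Standardize values such as M/Male/Man, then store as categorical.
gender_map = {"M": "Male", "Male": "Male", "Man": "Male",
              "F": "Female", "Female": "Female"}
df["gender"] = df["gender"].map(gender_map).astype("category")

# Find missing values, then treat them (here: fill with the median).
missing_counts = df.isna().sum()
median_spend = df["monthly_spend"].median()
df["monthly_spend"] = df["monthly_spend"].fillna(median_spend)

# Look at the distribution and flag outliers with the 1.5 * IQR rule.
q1 = df["monthly_spend"].quantile(0.25)
q3 = df["monthly_spend"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df["is_outlier"] = (df["monthly_spend"] < lower) | (df["monthly_spend"] > upper)

print(missing_counts)
print(df)
```

Whether a flagged record should be dropped, capped or kept depends on the business context, so the sketch only marks it.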


Data Collection & Wrangling


• Identifying the sources of Data
• Accessing the different sources of Data
• Integrating Data sets into a consolidated set
• Creating Data Pipelines
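
As a rough illustration of integrating data sets into a consolidated set, the sketch below joins two small tables with pandas. The `customers` and `orders` tables, their columns and the `build_consolidated` helper are all hypothetical; in practice the sources would be files, databases or APIs.

```python
import pandas as pd

# Hypothetical extracts from two separate sources.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "city": ["Karachi", "Lahore", "Islamabad"],
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12, 13],
    "customer_id": [1, 1, 2, 4],
    "amount": [250.0, 100.0, 75.0, 300.0],
})

def build_consolidated(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    """One small pipeline step: aggregate orders, then join onto customers."""
    totals = (
        orders.groupby("customer_id", as_index=False)["amount"]
        .sum()
        .rename(columns={"amount": "total_amount"})
    )
    # A left join keeps every customer, including those without orders.
    merged = customers.merge(totals, on="customer_id", how="left")
    merged["total_amount"] = merged["total_amount"].fillna(0.0)
    return merged

consolidated = build_consolidated(customers, orders)

# Orders whose customer is missing from the customer source; worth
# investigating before trusting the consolidated set.
unmatched = orders[~orders["customer_id"].isin(customers["customer_id"])]

print(consolidated)
print(unmatched)
```

Wrapping each step in a function like this is one simple way to start a repeatable pipeline, since the same steps can be rerun when new extracts arrive.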


Model Building
• Feature Engineering & Designing KPIs
o Creating KPIs which are most relevant for predictions/classification
o Transforming Data by combining multiple variables through calculations or by applying log transformations

• Feature Selection
• Train and Test Split
• Selecting Algorithms
• Tweak Parameters
• Evaluate Model
• Iterate over several prototypes and select the model that has the best performance
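
The steps above can be sketched end to end with NumPy alone. Everything here is an assumption made for illustration: the data is synthetic, the label is derived directly from the feature so the task is deliberately easy, and the "model" is a single-threshold rule standing in for a real algorithm.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Synthetic, skewed feature and a label derived from it (illustration only).
income = rng.lognormal(mean=10.5, sigma=0.8, size=200)
label = (income > 40_000).astype(int)

# Feature engineering: a log transformation tames the skew.
log_income = np.log(income)

# Train and test split (80/20) on shuffled row indices.
indices = rng.permutation(len(income))
split = int(0.8 * len(indices))
train_idx, test_idx = indices[:split], indices[split:]

def predict(feature: np.ndarray, threshold: float) -> np.ndarray:
    """Toy model: predict class 1 when the feature exceeds the threshold."""
    return (feature > threshold).astype(int)

def accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Share of predictions that match the true labels."""
    return float(np.mean(y_true == y_pred))

# Tweak parameters: try several thresholds using the training data only.
candidates = np.linspace(log_income.min(), log_income.max(), 50)
train_scores = [
    accuracy(label[train_idx], predict(log_income[train_idx], t))
    for t in candidates
]
best_threshold = candidates[int(np.argmax(train_scores))]

# Evaluate the selected prototype on data it has not seen.
test_accuracy = accuracy(label[test_idx], predict(log_income[test_idx], best_threshold))
print(f"Best threshold (log scale): {best_threshold:.3f}")
print(f"Test accuracy: {test_accuracy:.3f}")
```

The key design point is that the threshold is chosen on the training rows and scored on the held-out test rows, so the reported accuracy is not inflated by the tuning itself.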

Closing the Deal


• Communicate Results to close Buy-In
o Present recommendations
o Drive Action
o Consult Business and Become a partner
o Feedback loop
o Documentation

• Deploy the Solution in Production
o Operationalize the model
o Deploy it in production environment (transfer from beta/QA to Production)
o Make the model computationally efficient so that it uses fewer resources without compromising on results
o Retrain / Re-experiment


Up Next….
• Lecture 2: Sunday, October 6th, 2019 at 2pm

• Reading Assignment:
o Tidy Data Research Paper
o CRISP-DM Process Research Paper

Questions?