Data Science

Uploaded by

Tamzid Jayed
Copyright
© All Rights Reserved

Data Science for All

Lecture Outlines
1. Why data science and why now?

2. Meaning, definition, history of Data Science

3. Big Data and Data Science

4. Practical implications and examples from an industry perspective

5. The data in Data Science, ‘Datafication’

6. Data analysis, data analytics, and data visualisation in the context of Data Science

7. Necessary skills for data science/scientist

8. Job opportunities

9. Ethical aspects of Data Science


Death Calculator !!
British funeral planning service survey
(https://www.funeralplans.co.uk/)

Death Prediction??
• Participation = 3000
• Question = Do you want to know the exact date of your death?
• Yes (Agree) = 64%, No (Not Agree) = 36%
• Question = Why do you want to know the exact date?
• Answer = wanting to try to avoid death and wanting to make the most of the
remaining time
• Researchers gathered data from over 500,000 participants in the UK. Aged
between 40 and 69, the subjects participated in the study between 2006 and
2010, with a follow-up in 2016.
• The algorithms analyzed demographic data, biometrics and clinical data, and
even lifestyle and dietary information. This was compared to statistics on
life expectancy and other information on diseases.
• The deep learning algorithm correctly predicted 2,343 individuals who had
died during follow-up, resulting in the highest sensitivity (76.2%). However,
the random forest algorithm improved prediction of 1,625 individuals who were
alive, resulting in the highest specificity of all algorithms (77.5%).
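
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how sensitivity and specificity are computed from a model's correct and incorrect predictions. The counts are hypothetical and chosen only to illustrate the formulas; they are not the study's figures.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of actual positives (here: people who died) that the model flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of actual negatives (here: people still alive) that the model cleared."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, for illustration only (not taken from the study).
tp, fn = 80, 20     # deaths predicted correctly / deaths missed
tn, fp = 900, 100   # survivors predicted correctly / survivors wrongly flagged

print(f"sensitivity = {sensitivity(tp, fn):.1%}")
print(f"specificity = {specificity(tn, fp):.1%}")
```

A model can score well on one metric and poorly on the other, which is why the slide reports a different "best" algorithm for each.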
1. Why Data Science and Big data; and why now?

• In 2016, 90% of the world’s data had been created in the previous two years.
(Source: IBM)
A small glimpse of data we are using today!!
• How much data is generated
every day?
• Over 2.5 quintillion bytes by the
2018 figures. (Source: Domo)
• Google processes over 99,000
search queries every second; 8.5
billion searches per day and 2
trillion searches per
year worldwide.
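
As a quick sanity check on the scale of these figures, the sketch below converts the per-second search rate into a daily total and expresses the daily data volume in exabytes. It assumes the short-scale meaning of quintillion (10^18) and the decimal exabyte (10^18 bytes); the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted on the slide.
queries_per_second = 99_000
bytes_per_day = 2.5e18  # 2.5 quintillion bytes

# Per-second rate scaled up to a full day.
queries_per_day = queries_per_second * SECONDS_PER_DAY
print(f"{queries_per_day / 1e9:.2f} billion searches per day")

# One exabyte is 10**18 bytes (decimal definition, assumed here).
exabytes_per_day = bytes_per_day / 1e18
print(f"{exabytes_per_day:.1f} exabytes of data per day")
```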
Every minute stat !!

• Every minute (Service)
– The Weather Channel receives 18,055,556 forecast requests
– Venmo processes $51,892 peer-to-peer transactions
– Spotify adds 13 new songs
– Uber riders take 45,788 trips!
– There are 600 new page edits to Wikipedia
• Every minute (Communication)
– We send 16 million text messages
– 15,000 GIFs are sent via Facebook Messenger
– 103,447,520 spam emails sent
– 154,200 calls on Skype
• Every minute (Social Media)
– Snapchat users share 527,760 photos
– Users watch 4,146,600 YouTube videos
– 456,000 tweets are sent on Twitter
– Instagram users post 46,740 photos
• On Facebook
– 1.5 billion people are active on Facebook daily
– There are five new Facebook profiles created every second!
– More than 300 million photos get uploaded per day
– Every minute there are 510,000 comments posted and 293,000 statuses
updated (Marr, 2018).
Organisational response to Big Data!!
• The amount of the global datasphere subject to data analysis will grow to
5.2 zettabytes by 2025.
• By 2021, insight-driven businesses are predicted to take $1.8 trillion
annually from their less-informed peers.
• Data-driven organizations are 23 times more likely to acquire customers
than their peers.
• Businesses are spending $187 billion on big data and analytics in 2019.
• 91.6% of companies worldwide confirm an increased pace of investment in
big data in 2019.
2. Meaning, Definition & History of Data Science

• Data Science and big data are probably the two hottest topics in today’s
world (Schutt & O’Neil, 2013). The term data science was first used at a
conference in the mid-nineties and had been around for a long time in
scholarly communities (Press, 2013). Even so, it was only in 2008 that the
term ‘data scientist’ was first coined, gaining instant popularity as well
as media attention (Patil, 2011).
What is Data Science

• “The ability to take data — to be able to understand it, to process it, to
extract value from it, to visualize it, to communicate it — that’s going to
be a hugely important skill in the next decades.” (Hal Varian)
Definition

• “Data science refers to an emerging area of work concerned with the
collection, preparation, analysis, visualisation, management, and
preservation of large collections of information” (Stanton, 2012).
• “I think of Data Science as a flag that was planted at the intersection
of several different disciplines that have not always existed in the
same place,” says Hilary Mason, chief data scientist at bitly.
“Statistics, computer science, domain expertise, and what I usually call
‘hacking’” (Burlingame & Nielsen, 2012).
• “Mason divides data science into two equally important functions.
One half is analytics, or as she describes it ‘counting things.’ The other
half is the invention of new techniques that can draw insights from
data that were not possible before” (Burlingame & Nielsen, 2012).
Data Science process
• Real world consists of activities (e.g., people using Facebook)
• Raw data collected from real world activities (gathering, sampling,
scraping)
– Transactions, logs, tweets, emails, sensor outputs, ...
• Data may be combined with other datasets, processed, cleaned (e.g. remove
outliers, deal with duplicates and missing values) ready for analysis
• May perform exploratory data analysis to understand the data (e.g., plot
distributions of all variables, identify relations)
• Perform statistical analysis to develop statistical models or apply
machine learning / data mining algorithms for classification, prediction or
description (e.g. finding associations)
• Interpret, visualise, report the findings of the statistical analysis or
build data product
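
The cleaning and exploration steps listed above can be sketched in a few lines of Python. This is only an illustration: the records are made up, and the outlier rule (drop anything above five times the median) is an assumption chosen for simplicity, not a recommended standard.

```python
from statistics import mean, median

# Hypothetical raw records: (user_id, minutes_online). None marks a missing value.
raw = [(1, 30), (2, 45), (2, 45), (3, None), (4, 50), (5, 40), (6, 900)]


def clean(rows):
    """Drop duplicates and missing values, then remove outliers."""
    seen, kept = set(), []
    for row in rows:
        if row[1] is None or row in seen:
            continue  # skip missing values and exact duplicates
        seen.add(row)
        kept.append(row)
    # Assumed outlier rule: discard values above 5x the median.
    med = median(minutes for _, minutes in kept)
    return [(user, minutes) for user, minutes in kept if minutes <= 5 * med]


cleaned = clean(raw)
values = [minutes for _, minutes in cleaned]

# Exploratory summary of the cleaned data.
print(len(raw), "raw records ->", len(cleaned), "cleaned records")
print("mean:", mean(values), "min:", min(values), "max:", max(values))
```

In practice each step would be far more involved, but the order is the same: collect, clean, explore, and only then model and report.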
3. Big Data and Data Science

• “Every day of the week, we create 2.5 quintillion bytes of data. This
data comes from everywhere: from sensors used to gather climate
information, posts to social media sites, digital pictures and videos
posted online, transaction records of online purchases, and from cell
phone GPS signals – to name a few. In the 11 years between 2009 and
2020, the size of the ‘Digital Universe’ will increase 44 fold. That’s a
41% increase in capacity every year. In addition, only 5% of this data
being created is structured and the remaining 95% is largely
unstructured, or at best semi-structured. This is Big Data.” (Burlingame
& Nielsen, 2013).
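
The two growth figures in the quote are consistent with each other: a 44-fold increase over 11 years corresponds to a compound annual growth rate of roughly 41%. A minimal check, with illustrative variable names:

```python
fold_increase = 44   # total growth factor quoted for 2009-2020
years = 11

# Compound annual growth rate: the yearly factor g satisfies g ** years == fold_increase.
annual_growth = fold_increase ** (1 / years) - 1
print(f"annual growth is about {annual_growth:.0%}")

# Compounding that rate for 11 years recovers the 44-fold figure.
print(f"check: {(1 + annual_growth) ** years:.1f}-fold")
```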
What is Big Data?

• “Simply put, it’s about data sets so large – in volume, velocity and
variety – that they’re impossible to manage with conventional database
tools.” (Michael Friedenberg, Network World)
• “Big data is data that exceeds the processing capacity of conventional
database systems. The data is too big, moves too fast, or doesn’t fit the
structures of your database architectures. To gain value from this data,
you must choose an alternative way to process it.” (Dumbill, 2012)
4. Practical Implications and examples from an Industry perspective

• In Agriculture: A biotechnology firm uses sensor data to optimize
crop efficiency. It plants test crops and runs simulations to measure
how plants react to various changes in condition. Its data
environment constantly adjusts to changes in the attributes of various
data it collects, including temperature, water levels, soil composition,
growth, output, and gene sequencing of each plant in the test bed.
These simulations allow it to discover the optimal environmental
conditions for specific gene types.
• In Finance: Today the bank performs its own credit score analysis for
existing customers using a wide range of data, including checking,
savings, credit cards, mortgages, and investment data.
• Health and Medicine: Several promising applications of big data in
health care exist: better understanding of disease pathogenesis and
classification of complex diseases; development of predictive
prognostic models; reduction of risks; identification of predictive
events to support prevention initiatives; improvement of health-care
cost-effectiveness; and personalization of therapeutic regimens
(Olivera, Danese, Jay, Natoli, & Peyrin-Biroulet, 2019).
• Space Science: NASA builds large data stores as a basic product of its
missions, and recognizes the value of being able to integrate the data.
For example, to understand the drought in central California, NASA
might need to look at fusing together many types of data from
multiple measurements across satellite, airborne, and ground-based
sensors to better understand the water dynamics. Computational
methods are needed to integrate and reduce the data to understand
what it means (Earley, 2016).
• Biological, biomedical and behavioural science: Fueled by breakthrough
technology developments, the biological, biomedical, and behavioral
sciences are now collecting more data than ever before. There is a critical
need for time- and cost-efficient strategies to analyze and interpret these
data to advance human health. The recent rise of machine learning as a
powerful technique to integrate multimodality, multifidelity data, and
reveal correlations between intertwined phenomena presents a special
opportunity in this regard (Alber et al., 2019).
• In politics: Political campaigns and government agencies have also used
large data sets of information produced by citizens to develop models that
guide successful electoral strategies (Nickerson & Rogers, 2014).
Who uses Big Data

• Amazon: The online retail giant has access to a massive amount of data on
its customers; names, addresses, payments and search histories are all filed
away in its data bank.
• American Express: The American Express Company is using big data to
analyse and predict consumer behavior.
• Capital One: Marketing is one of the most common uses for big data and
Capital One are at the top of the game, utilizing big data management to
help them ensure the success of all customer offerings.
• General Electric (GE): GE is using the data from sensors on machinery like
gas turbines and jet engines to identify ways to improve working processes
and reliability.
• Miniclip: Miniclip, who develop, publish and distribute digital games
globally, use big data to monitor and improve user experience.
• Netflix: The entertainment streaming service has a wealth of data and
analytics providing insight into the viewing habits of millions of
international consumers
• Starbucks: Have you ever wondered how Starbucks can open three
branches on the same street and not have their business suffer?
• T-Mobile: The mobile network, like American Express, is combining
customer transaction and interactions data to predict customer
fluctuations.
• And of course Google and Facebook!!
5. Datafication

• We now live in a world where it seems that everything about us is (or
soon will be) tracked and recorded: what we eat, what we watch, how
we socialize, what we like and dislike, our vital health statistics—and
the list goes on
(https://www.technologyreview.com/s/602300/data-science-and-statistics-opportunities-and-challenges/).
Example

Sometimes I wonder whether the financial sector, especially banks, holds
the most personal data after government agencies. Every time we use our
credit or debit card, they know where we go, what we eat and drink, how
many times a month we watch movies, where we go on vacation, and what
gifts we give our loved ones!
Data analysis, Data analytics, and Data Visualisation in the context of
Data Science

Data Analysis
• Science of examining data with the purpose of drawing conclusions
• Process of inspecting, cleaning, transforming and modelling data to
discover useful information and support decision-making
– Descriptive and inferential statistics
– Exploratory data analysis
– Confirmatory data analysis
– Graphs and plots
Data analytics
• Descriptive analytics
– Identify past successes and failures
• Predictive analytics
– Determine probable outcome for event or likelihood of event occurring
– Examples include forecasting and classification
• Prescriptive analytics
– Investigate what will happen, when and why by evaluating alternatives
– Incorporate business rules and investigate alternatives, e.g. through
simulations or ‘what-if’ analysis
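
To make the three kinds of analytics concrete, here is a small sketch on made-up sales data. Everything in it is an assumption for illustration: the sales figures, the straight-line trend model, and the pricing options with their demand multipliers.

```python
from statistics import mean

# Hypothetical monthly unit sales for months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 130, 140, 150]

# Descriptive: summarise what has already happened.
best_month = months[sales.index(max(sales))]
print("average sales:", mean(sales), "| best month:", best_month)

# Predictive: fit a least-squares straight line and forecast month 7.
mx, my = mean(months), mean(sales)
slope = (sum((x - mx) * (y - my) for x, y in zip(months, sales))
         / sum((x - mx) ** 2 for x in months))
intercept = my - slope * mx
forecast = intercept + slope * 7
print("forecast for month 7:", forecast)

# Prescriptive: a simple 'what-if' comparison of assumed alternatives.
# Each option maps to (assumed demand multiplier, unit price).
options = {"keep price": (1.00, 10.0), "discount": (1.20, 9.0), "premium": (0.85, 12.0)}
revenue = {name: forecast * mult * price for name, (mult, price) in options.items()}
best = max(revenue, key=revenue.get)
print("option with highest expected revenue:", best)
```

The descriptive step looks backwards, the predictive step extrapolates, and the prescriptive step compares alternatives against that prediction to suggest an action.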
Data Visualisation
• Data visualisation is the presentation of data in pictorial or graphical
formats
• Supports the human ability to gain insight (identify patterns or trends)
and form knowledge
– Can help people make sense of data, especially large and complex
datasets (e.g. identifying relationships or trends)
• Commonly used for exploratory data analysis
• The term ‘visual analytics’ is increasingly used


Data Scientist
7. Necessary skills for data science/scientist

“The UK requires a strong skills base, able to manage, analyse,
interpret and communicate data, in order to extract insight and value”
Seizing the Data Opportunity (HM Government, 2013)
• Data Science solutions involve knowledge and understanding of
– Technologies (e.g., data warehousing, Hadoop)
– Data modelling (e.g., representing and aggregating multiple datasets)
– Data standards (e.g., open data and Linked Data)
– Data analysis
– Communication (e.g., data visualisation, generate data reports)
– Wider context (e.g., business processes, governance and ethics)
Required skills for data scientist
Important tools and techniques for Data Science/Scientist
Useful Books for Data Science
Job opportunity (Any guess???)

• Business Intelligence (BI) Developer: Average Salary: $89,333


• Data Architect: Average Salary: $137,630
• Applications Architect: Average Salary: $134,520
• Infrastructure Architect: Average Salary: $126,353
• Data Scientist: Average Salary: $139,840

• Typical Job Requirements: Find, clean, and organize data for companies.
Data scientists will need to be able to analyze large amounts of complex
raw and processed information to find patterns that will benefit an
organization and help drive strategic business decisions.

What we can do and what we should not do!!
Thank you!!!
