
BIG DATA ANALYTICS

Lecture 1 --- Week 1

Prepared by Khalid Mahboob - Sr. Lecturer - SED 1


Content
• Definition and Characteristics of Big Data
• Key computing resources for Big Data
• Big Data – 3Vs (volume, variety, velocity)
• Benefits of Big Data
• Big Data storage and analytics



Definition and Characteristics of Big Data
• "Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making." – Gartner
• This definition was derived from Doug Laney's earlier observation:
  - "While enterprises struggle to consolidate systems and collapse redundant databases to enable greater operational, analytical, and collaborative consistencies, changing economic conditions have made this job more difficult. E-commerce, in particular, has exploded data management challenges along three dimensions: volumes, velocity and variety. In 2001/02, IT organizations must compile a variety of approaches to have at their disposal for dealing with each." – Doug Laney



Big Data Analytics - Overview

• The volume of data that one has to deal with has exploded to unimaginable levels in the past decade, while the price of data storage has fallen steadily. Private companies and research institutions capture terabytes of data about their users' interactions, business and social media, as well as from sensors in devices such as mobile phones and automobiles. The challenge of this era is to make sense of this sea of data. This is where big data analytics comes into the picture.



• Big Data Analytics largely involves collecting data from different sources, munging it so that it can be consumed by analysts, and finally delivering data products that are useful to the organization's business.



• The process of converting large amounts of unstructured raw data, retrieved from different sources, into a data product useful for organizations forms the core of Big Data Analytics.
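As a rough illustration of the collect, munge and deliver steps, here is a minimal Python sketch. It assumes two hypothetical sources held in memory (a CSV export and a JSON feed) with made-up fields `customer` and `amount`; the function names are illustrative and not part of any framework. "Munging" here means normalizing names, converting amounts to numbers and dropping records that cannot be parsed; the "data product" is a per-customer spending total.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical raw inputs standing in for two different data sources.
CSV_SOURCE = "customer,amount\nalice,12.50\nbob,not-a-number\nalice,7.50\n"
JSON_SOURCE = '[{"customer": "Bob", "amount": 20}, {"customer": "carol"}]'


def collect():
    """Collect raw records from every source as dictionaries."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows.extend(json.loads(JSON_SOURCE))
    return rows


def munge(rows):
    """Normalize names and amounts; drop records that cannot be parsed."""
    clean = []
    for row in rows:
        try:
            name = str(row["customer"]).strip().lower()
            amount = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            continue  # malformed or incomplete record
        clean.append({"customer": name, "amount": amount})
    return clean


def deliver(clean):
    """Build a simple data product: total spend per customer."""
    totals = defaultdict(float)
    for rec in clean:
        totals[rec["customer"]] += rec["amount"]
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    print(deliver(munge(collect())))  # {'alice': 20.0, 'bob': 20.0}
```

In a real system each step would typically run on a cluster and read from storage rather than strings, but the three-stage shape stays the same.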



What created the need for Big Data?



Key Computing Resources for Big Data
• Processing capability: CPU, processor, or node
• Memory
• Storage
• Network



Scalability: Scale Up & Scale Out
• Scale out
  - Use more resources to distribute the workload in parallel
  - Higher data access latency is typically incurred
• Scale up
  - Use the available resources efficiently
  - Architecture-aware algorithm design
• For independent data ==> scale up may not have an obvious advantage over scale out
• For linked data ==> use scale up as much as possible before scaling out
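Scaling out, as described above, means splitting the workload across more workers and merging their partial results. The sketch below assumes a hypothetical word-count job over a list of text lines; the function names are illustrative. It uses Python's `ThreadPoolExecutor` only to show the partition, parallel map and merge structure: in CPython, threads do not speed up CPU-bound pure-Python code, so a real deployment would use `ProcessPoolExecutor` or a cluster framework instead.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_parts):
    """Split a list into at most n_parts contiguous chunks, one per worker."""
    if not items:
        return []
    size = -(-len(items) // n_parts)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_words(lines):
    """Map step: count words in one partition, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def scale_out_word_count(lines, workers=4):
    """Run the map step on every partition in parallel, then merge (reduce)."""
    parts = partition(lines, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, parts))
    total = Counter()
    for partial in partials:
        # Merging partial results is extra work a single scaled-up machine avoids.
        total.update(partial)
    return total


lines = ["big data big", "data velocity", "variety volume velocity"]
print(scale_out_word_count(lines, workers=2))
```

Word counting is an "independent data" workload: each line can be processed without looking at the others, which is why it partitions cleanly. Linked data (for example a graph) needs cross-partition communication, which is where the slide's advice to scale up first applies.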


[Figure: Example: resource utilization for a large production cluster at Twitter data center]
[Figure: Contrasting approaches in adopting high-performance capabilities]



Big Data – 3Vs (volume, variety, velocity)



Volume

• Data storage is growing exponentially, because data is now much more than text.
• Data is found in the form of videos, music and large images on our social media channels.
• It is very common for enterprises to have storage systems holding terabytes and petabytes of data.
• As the database grows, the applications and architecture built to support the data need to be re-evaluated quite often.
• Sometimes the same data is re-evaluated from multiple angles, and even though the original data is the same, the newly found intelligence creates an explosion of data.
• This big volume indeed represents Big Data.
Volume - examples

• Facebook processes 500 TB per day
• Walmart handles 1 million customer transactions per hour
• Airbus generates 640 TB in one flight (10 TB per 30 minutes)
• 72 hours of video are uploaded to YouTube every minute
• SMS, e-mail, internet, social media
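To put these volumes in perspective, the figures above can be converted into average rates. This sketch assumes decimal units (1 TB = 10^12 bytes) and a perfectly even rate over the day or hour, which real traffic does not have, so peak rates will be higher than these averages.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR
TB = 10**12  # assumption: decimal (SI) terabyte, in bytes

# Figures taken from the slide; rates assume uniform load.
facebook_bytes_per_s = 500 * TB / SECONDS_PER_DAY
walmart_tx_per_s = 1_000_000 / SECONDS_PER_HOUR
youtube_hours_per_day = 72 * 60 * 24  # 72 h uploaded per minute, 1440 minutes per day

print(f"Facebook: {facebook_bytes_per_s / 10**9:.2f} GB/s")     # Facebook: 5.79 GB/s
print(f"Walmart: {walmart_tx_per_s:.0f} transactions/s")        # Walmart: 278 transactions/s
print(f"YouTube: {youtube_hours_per_day:,} hours of video/day")  # YouTube: 103,680 hours of video/day
```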



Velocity
• Data growth and the social media explosion have changed how we look at data.
• There was a time when we believed that yesterday's data was recent.
• As a matter of fact, newspapers still follow that logic.
• However, news channels and radio have changed how fast we receive the news.
• Today, people rely on social media to keep them updated on the latest happenings. On social media, messages that are only a few seconds old (a tweet, status updates, etc.) sometimes no longer interest users.
• They often discard old messages and pay attention to recent updates. Data movement is now almost real time, and the update window has shrunk to fractions of a second.
• This high-velocity data represents Big Data.
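One way to model "users discard old messages and pay attention to recent updates" is a time-based sliding window over a stream. The sketch below is a hypothetical helper, not a library API; it assumes timestamps are in seconds and arrive in non-decreasing order, so the oldest message is always at the front of the deque and eviction is cheap.

```python
from collections import deque


class RecentMessages:
    """Hypothetical helper: keep only messages from the last `window_s` seconds."""

    def __init__(self, window_s):
        self.window_s = window_s
        self._buf = deque()  # (timestamp, text) pairs in arrival order

    def add(self, timestamp, text):
        self._buf.append((timestamp, text))
        self._evict(timestamp)

    def _evict(self, now):
        # Oldest messages sit at the left; drop those that fell out of the window.
        while self._buf and self._buf[0][0] <= now - self.window_s:
            self._buf.popleft()

    def recent(self, now):
        self._evict(now)
        return [text for _, text in self._buf]


feed = RecentMessages(window_s=5)
feed.add(0.0, "tweet A")
feed.add(3.0, "status B")
feed.add(6.5, "tweet C")
print(feed.recent(6.5))  # ['status B', 'tweet C']
print(feed.recent(9.0))  # ['tweet C']
```

Each message is appended and evicted at most once, so the window keeps up with a fast stream without rescanning old data.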
Velocity - examples

• Traffic data
• Financial market
• Social networks



Variety
• Data can be stored in multiple formats, for example a database, Excel, CSV or Access, or, as a matter of fact, a simple text file.
• Sometimes the data is not even in a traditional format; it may be video, SMS, PDF or something we might not have thought of. The organization needs to arrange it and make it meaningful.
• This would be easy if all the data were in the same format, but that is not the case most of the time. Real-world data comes in many different formats, and that is the challenge we need to overcome with Big Data. This variety of data represents Big Data.
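Handling variety usually means writing one parser per format and mapping every input onto a common schema. A minimal sketch using only Python's standard library (`csv`, `json`, `xml.etree.ElementTree`); the two-field schema (`id`, `value`), the XML layout and the sample inputs are assumptions made for illustration.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


# One parser per format, each returning records in the same (hypothetical) schema.
def from_csv(text):
    return [{"id": r["id"], "value": float(r["value"])}
            for r in csv.DictReader(io.StringIO(text))]


def from_json(text):
    return [{"id": str(r["id"]), "value": float(r["value"])} for r in json.loads(text)]


def from_xml(text):
    root = ET.fromstring(text)
    return [{"id": el.get("id"), "value": float(el.findtext("value"))}
            for el in root.iter("record")]


PARSERS = {"csv": from_csv, "json": from_json, "xml": from_xml}


def normalize(sources):
    """sources: list of (format_name, raw_text) pairs -> one list of uniform records."""
    records = []
    for fmt, text in sources:
        records.extend(PARSERS[fmt](text))
    return records


sources = [
    ("csv", "id,value\na1,1.5\n"),
    ("json", '[{"id": 7, "value": "2"}]'),
    ("xml", '<records><record id="x9"><value>3.25</value></record></records>'),
]
print(normalize(sources))
# [{'id': 'a1', 'value': 1.5}, {'id': '7', 'value': 2.0}, {'id': 'x9', 'value': 3.25}]
```

Adding a new format then only means registering another parser in `PARSERS`; everything downstream keeps working on the common schema.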



Variety - examples

• HTML (101140)
• PDF (45026)
• XML (36564)
• ZIP (22376)
• CSV (21190)
• Originator data format (21023)
• WMS (16531)
• JSON (16196)
• TIFF (14180)
• SID (11874)
• RDF (11798)
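The counts above are heavily skewed: a few formats dominate while a long tail remains. The slide does not say what is being counted, so the sketch below only computes each format's share of the listed total; the shares are relative to these eleven formats, not to all data.

```python
# Counts as listed on the slide (the slide does not state what is being counted).
FORMAT_COUNTS = {
    "HTML": 101140, "PDF": 45026, "XML": 36564, "ZIP": 22376,
    "CSV": 21190, "Originator data format": 21023, "WMS": 16531,
    "JSON": 16196, "TIFF": 14180, "SID": 11874, "RDF": 11798,
}

total = sum(FORMAT_COUNTS.values())
print(f"Total listed: {total}")  # Total listed: 317898
for fmt, count in FORMAT_COUNTS.items():
    # Share of the listed total, as a percentage with one decimal place.
    print(f"{fmt:<24} {count:>7}  {100 * count / total:5.1f}%")
```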
Benefits

• Cost & management
  - Economies of scale, "out-sourced" resource management
• Reduced time to deployment
  - Ease of assembly, works "out of the box"
• Scaling
  - On-demand provisioning, co-locate data and compute
• Reliability
  - Massive, redundant, shared resources
• Sustainability
  - Hardware not owned
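On-demand provisioning can be illustrated with a simple sizing rule: provision just enough nodes for the current load, within configured bounds. This is a hypothetical sketch; `load` and `capacity_per_node` are in whatever unit the workload uses (for example requests per second), and real autoscalers also smooth the signal so they do not add and remove nodes too often.

```python
import math


def nodes_needed(load, capacity_per_node, min_nodes=1, max_nodes=100):
    """Hypothetical provisioning rule: enough nodes for the current load,
    clamped to a configured minimum and maximum."""
    if capacity_per_node <= 0:
        raise ValueError("capacity_per_node must be positive")
    wanted = math.ceil(load / capacity_per_node)
    return max(min_nodes, min(max_nodes, wanted))


print(nodes_needed(1200, 500))  # 3
```

Because the hardware is not owned, the cost follows the number of nodes actually provisioned rather than the peak capacity one would otherwise have to buy.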



Data and Data Storage

• Database / Data source
  - One (of several) ways to store data in electronic format
  - Used in everyday life: bank, hotel reservations, library search, shopping



Databases / Data sources

• Database management system (DBMS): a collection of programs to create and maintain a database
• Database system = database + DBMS

[Figure: Database system, with labels: Information Model; Queries; Answer; Database management system (processing of queries/updates, access to stored data); Physical database]
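The split between a database and the DBMS can be seen with SQLite, which ships with Python as the `sqlite3` module: the DBMS creates and maintains the table, processes queries and returns answers. The hotel-reservation table and its rows are made up for illustration, echoing the everyday examples on the previous slide.

```python
import sqlite3

# SQLite acts as the DBMS; ":memory:" keeps the (hypothetical) database in RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reservation (guest TEXT, hotel TEXT, nights INTEGER)")
conn.executemany(
    "INSERT INTO reservation VALUES (?, ?, ?)",
    [("Ali", "Seaview", 2), ("Sara", "Seaview", 3), ("Omar", "Hilltop", 1)],
)
conn.commit()

# A query goes to the DBMS, which processes it against the stored data and returns an answer.
rows = conn.execute(
    "SELECT hotel, SUM(nights) FROM reservation GROUP BY hotel ORDER BY hotel"
).fetchall()
print(rows)  # [('Hilltop', 1), ('Seaview', 5)]
conn.close()
```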
Analytics
• Discovery, interpretation and communication of meaningful patterns in data
• The Explosive Growth of Data
  - Data collection and data availability
    - Automated data collection tools, database systems, Web, computerized society
  - Major sources of abundant data
    - Business: Web, e-commerce, transactions, stocks, …
    - Science: Remote sensing, bioinformatics, scientific simulation, …
    - Society and everyone: news, digital cameras, YouTube
  - We are drowning in data, but starving for knowledge!
Ex. 1: Market Analysis and Management
Where does the data come from? Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies
• Target marketing
  - Find clusters of "model" customers who share the same characteristics: interest, income level, spending habits, etc.
  - Determine customer purchasing patterns over time
• Customer profiling
  - What types of customers buy what products (clustering or classification)
• Cross-market analysis
  - Find associations/correlations between product sales
  - Predict based on such associations
• Customer requirement analysis
  - Identify the best products for different groups of customers
  - Predict what factors will attract new customers
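Cross-market analysis is often quantified with support and confidence, as in association rule mining. The sketch below is a simplification that only considers pairs of items (algorithms such as Apriori extend the idea to larger itemsets); the baskets, the function name `pair_rules` and the `min_support` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets: each set is one customer transaction.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"milk", "chips"},
]


def pair_rules(baskets, min_support=0.4):
    """Return (A, B, support, confidence) rules for item pairs with support >= min_support.

    support(A, B)    = fraction of baskets containing both A and B
    confidence(A->B) = count(A and B) / count(A)
    """
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    rules = []
    for (a, b), count in sorted(pair_counts.items()):
        support = count / n
        if support < min_support:
            continue  # pair too rare to be worth a rule
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


for a, b, s, c in pair_rules(BASKETS):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

A high-confidence rule such as "butter -> bread" is the kind of association the slide suggests using for prediction, for example placing or promoting the products together.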
Ex. 2: Fraud Detection & Mining Unusual Patterns
• Approaches: Clustering & model construction for frauds, outlier analysis
• Applications:
  - Auto insurance: ring of collisions
  - Money laundering: suspicious monetary transactions
  - Medical insurance
    - Professional patients, ring of doctors, and ring of references
    - Unnecessary or correlated screening tests
  - Anti-terrorism
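For the outlier-analysis approach, one common rule is the modified z-score based on the median and the median absolute deviation (MAD), flagging values whose score exceeds 3.5. This is one choice among many, swapped in here for illustration, and the transaction amounts are invented. Median-based statistics are used because, in a small sample, a single huge value inflates the mean and standard deviation enough to hide itself from an ordinary z-score test.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds `threshold`.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation from the median.
    """
    med = median(values)
    mad = median(abs(x - med) for x in values)
    if mad == 0:
        # Degenerate case: at least half the values equal the median;
        # flag anything that differs from it.
        return [(i, x) for i, x in enumerate(values) if x != med]
    return [(i, x) for i, x in enumerate(values)
            if abs(0.6745 * (x - med) / mad) > threshold]


amounts = [120, 95, 130, 110, 105, 98, 5000, 115]  # hypothetical transactions
print(mad_outliers(amounts))  # [(6, 5000)]
```

Flagged transactions would normally go to a human reviewer or a more detailed model rather than being treated as fraud outright.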