
Big Data Topic 1 (Introduction) (Thanh Binh Nguyen)


Big Data
(Understanding Big Data)

Instructor: Thanh Binh Nguyen
September 1st, 2019

S3Lab
Smart Software System Laboratory
“Without big data, you are blind and deaf and in the middle of a freeway.”
– Geoffrey Moore

What is Big Data

● Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
● Challenges: capture, curation, storage, search, sharing, transfer, analysis, and visualization.

Big Data: 3V’s

Volume (scale)
● Data volume is increasing exponentially:
○ 44x increase from 2009 to 2020, from 0.8 zettabytes (ZB) to 35 ZB
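The growth figures above imply both an overall factor and a yearly rate. The following is a minimal sketch of that arithmetic. It assumes the growth compounds evenly over the 11 years from 2009 to 2020, which is an assumption made here for illustration and is not stated on the slide.

```python
# Figures quoted on the slide.
start_zb = 0.8   # zettabytes in 2009
end_zb = 35.0    # zettabytes in 2020
years = 2020 - 2009

# Overall growth factor: close to the "44x" quoted on the slide.
growth_factor = end_zb / start_zb

# Assumption: growth compounds evenly each year.
annual_factor = growth_factor ** (1 / years)

print(f"overall growth: {growth_factor:.2f}x")
print(f"implied yearly growth: {(annual_factor - 1) * 100:.0f}%")
```

Under this assumption, the data volume grows by roughly two fifths every year, which is what "exponentially" means in practice here.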
● EarthScope: 67 terabytes of data
● CERN’s Large Hadron Collider (LHC) generates 15 PB a year

Variety (Complexity)
● Big data can be of three types:
○ Structured: data that can be stored and processed in a fixed format (fixed schema). Ex. RDBMS tables.
○ Semi-structured: data that does not have a formal data-model structure but nevertheless has some organizational properties, such as tags and other markers that separate semantic elements, which make it easier to analyze. Ex. XML files or JSON documents.
○ Unstructured: text files and multimedia content like images, audio, and video are examples of unstructured data. Unstructured data is growing faster than the other types; experts say that 80 percent of the data in an organization is unstructured.
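The three types above can be illustrated with a small sketch. The sample records, field names, and values below are invented for illustration only; the point is how much structure the program can rely on in each case.

```python
import csv
import io
import json

# Structured: fixed schema, every row has the same columns (like an RDBMS table).
table = io.StringIO("id,name,amount\n1,Alice,30\n2,Bob,45\n")
rows = list(csv.DictReader(table))
total = sum(int(row["amount"]) for row in rows)

# Semi-structured: no fixed schema, but keys (tags) mark the semantic elements.
doc = json.loads('{"user": "Alice", "tags": ["vip"], "address": {"city": "Hanoi"}}')
city = doc.get("address", {}).get("city")

# Unstructured: free text; any structure has to be inferred by the program.
note = "Alice called about her order and asked for faster delivery."
word_count = len(note.split())

print(total, city, word_count)
```

The structured rows can be summed directly, the JSON document has to be navigated defensively because fields may be missing, and the free text offers nothing beyond what the program extracts itself.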
● Semi-structured, NoSQL
● Relational data (tables / transactions / legacy data)
● Text data (web, logs)
● Semi-structured data (XML)
● Graph data: social networks, Semantic Web (RDF, Resource Description Framework), ...
● Streaming data: you can only scan the data once
● A single application can be generating / collecting many types of data
● Big public data (online, weather, finance, etc.)

To extract knowledge ➠ all these types of data need to be linked together
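The "scan the data once" constraint on streaming data shapes how analytics are written. The following is a minimal sketch of a single-pass summary that never stores the stream; the `readings` generator and its values are hypothetical stand-ins for a real data source.

```python
from typing import Iterable, Iterator, Tuple


def readings() -> Iterator[float]:
    """Hypothetical stream: each value can be read only once."""
    yield from [3.0, 5.0, 4.0, 8.0]


def summarize(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, maximum) in one pass with constant memory."""
    count = 0
    mean = 0.0
    maximum = float("-inf")
    for value in stream:
        count += 1
        mean += (value - mean) / count  # incremental mean update
        maximum = max(maximum, value)
    return count, mean, maximum


count, mean, maximum = summarize(readings())
print(count, mean, maximum)
```

Because the mean is updated incrementally, the function works on a stream of any length without keeping earlier values around.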
Velocity (Speed)
● Data is being generated fast and needs to be processed fast
● Online data analytics
● Late decisions ➠ missed opportunities
● Examples
○ E-promotions: based on your current location, your purchase history, and what you like ➠ send promotions right now for the store next to you
○ Healthcare monitoring: sensors monitor your activities and body ➠ any abnormal measurement requires immediate reaction
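The healthcare monitoring example above can be sketched as a check that runs on each measurement as it arrives, rather than on a stored batch. The normal range, the sample data, and the function name are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical normal range for a heart-rate sensor, in beats per minute.
LOW_BPM, HIGH_BPM = 50, 120


def alerts(samples: Iterable[Tuple[int, int]]) -> Iterator[str]:
    """Yield an alert as soon as an abnormal sample arrives."""
    for second, bpm in samples:
        if bpm < LOW_BPM or bpm > HIGH_BPM:
            yield f"t={second}s: abnormal heart rate {bpm} bpm"


# (timestamp in seconds, heart rate) pairs standing in for a live sensor feed.
stream = [(0, 72), (1, 75), (2, 135), (3, 70), (4, 44)]
found = list(alerts(stream))
for message in found:
    print(message)
```

Since `alerts` is a generator, a consumer can react to each alert immediately instead of waiting for the whole stream to finish.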
● Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion.

Big Data: 4V’s

Big Data: 5V’s

Big Data: NV’s

● The previous slide depicts the five V’s of big data, but as the data keeps evolving, so will the V’s. I am listing five more V’s which have developed gradually over time:
○ Validity: correctness of data
○ Variability: dynamic behaviour
○ Volatility: tendency to change over time
○ Vulnerability: vulnerable to breaches or attacks
○ Visualization: visualizing meaningful usage of data

Big Data: Applications

Big Data: Scale

Big Data: Evolution

● The model of generating / consuming data has changed
○ Old model: a few companies generate data, all others consume data
○ New model: all of us generate data, and all of us consume data

OLTP: Online Transaction Processing (DBMSs)
OLAP: Online Analytical Processing (Data Warehousing)
RTAP: Real-Time Analytics Processing (Big Data Architecture & Technology)
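The difference between the first two workload styles above can be sketched with Python's built-in `sqlite3` module. The `orders` table and its rows are hypothetical; SQLite stands in for both a transactional DBMS and a data warehouse purely for illustration, and RTAP is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style work: small transactions, each touching a few rows.
with conn:  # commits on success, rolls back on error
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("north", 20.0))
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("south", 35.0))
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("north", 15.0))

# OLAP-style work: one query that scans and aggregates many rows.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals)
conn.close()
```

In a real deployment the two workloads usually run on separate systems, because row-by-row writes and full-table aggregations favor different storage layouts.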
Big data analytics:
- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More real-time

Traditional analytics:
- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to midsize datasets

● Big data is more real-time in nature than traditional data warehouse (DW) applications
● Traditional DW architectures (e.g. Exadata, Teradata) are not well-suited for big data apps
● Shared-nothing, massively parallel processing, scale-out architectures are well-suited for big data apps
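The shared-nothing idea above can be sketched in a few lines: records are partitioned by key so that each node owns its own shard, every node aggregates locally, and only the small partial results are merged. The node count, the records, and the function names are hypothetical; a real system would run each shard on a separate machine.

```python
import zlib
from collections import Counter
from typing import Dict, List, Tuple

NUM_NODES = 3  # hypothetical cluster size

Record = Tuple[str, int]


def node_for(key: str) -> int:
    """Pick a node from a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % NUM_NODES


def partition(records: List[Record]) -> Dict[int, List[Record]]:
    """Assign every record to exactly one node; nodes share no data."""
    nodes: Dict[int, List[Record]] = {n: [] for n in range(NUM_NODES)}
    for key, value in records:
        nodes[node_for(key)].append((key, value))
    return nodes


def local_sum(shard: List[Record]) -> Counter:
    """Work done independently on one node."""
    totals: Counter = Counter()
    for key, value in shard:
        totals[key] += value
    return totals


records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
shards = partition(records)
partials = [local_sum(shard) for shard in shards.values()]
result = sum(partials, Counter())  # merge step
print(dict(result))
```

Because all records with the same key land on the same node, adding nodes spreads the work without requiring shared disks or shared memory.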
Big Data: Landscape

Big Data: Landscape (Open sources)

Big Data: Cloud Computing

● IT resources provided as a service
○ Compute, storage, databases, queues
● Clouds leverage economies of scale of commodity hardware
○ Cheap storage, high-bandwidth networks, and multi-core processors
○ Geographically distributed data centers
● Offerings from Microsoft, Amazon, Google, ...

Benefits
● Cost & management
○ Economies of scale, “outsourced” resource management
● Reduced time to deployment
○ Ease of assembly, works “out of the box”
● Scaling
○ On-demand provisioning, co-location of data and compute
● Reliability
○ Massive, redundant, shared resources
● Sustainability: hardware not owned

Issues
● Data security
○ Agree with the cloud service provider on how data security is ensured
● Performance
○ The Service-Level Agreement (SLA) should be clear
● Compliance
○ Depends on the service provider
● Legal issues
○ Data is stored in multiple locations
● Cost: pay per usage, so use services in a controlled manner

Deployment Models
● Public: computing infrastructure is hosted at the vendor’s premises
● Private: computing infrastructure is dedicated to one customer and is not shared with other organizations
● Hybrid: critical, secure applications are hosted in a private cloud; less critical applications are hosted in the public cloud
○ Cloud bursting: the organization uses its own infrastructure for normal usage, but the cloud is used for peak loads
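Cloud bursting, as described above, amounts to a simple routing rule: fill private capacity first and send only the overflow to the public cloud. The following sketch assumes a hypothetical private capacity of 100 requests per second; the function name and numbers are illustrative only.

```python
from typing import Tuple

PRIVATE_CAPACITY = 100  # hypothetical requests/second the private cloud can serve


def route(load: int, private_capacity: int = PRIVATE_CAPACITY) -> Tuple[int, int]:
    """Split a load into (private, public) shares."""
    private = min(load, private_capacity)
    public = load - private  # overflow "bursts" to the public cloud
    return private, public


for load in (60, 100, 180):
    print(load, route(load))
```

At normal load nothing leaves the private infrastructure, so public-cloud costs are incurred only during peaks.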
Types of Services
● Infrastructure as a Service (IaaS):
○ Why buy machines when you can rent cycles?
○ Amazon’s EC2, Rackspace
● Platform as a Service (PaaS):
○ Give me a nice API and take care of the maintenance, upgrades, …
○ Google App Engine (GAE), Windows Azure
● Software as a Service (SaaS):
○ Just run it for me
○ Gmail, Salesforce, Dropbox

Big Data: Lambda Architecture

What is Lambda Architecture?
● This is a newer big data architecture.
○ Designed to ingest and process data
○ Queries both fresh and historical (batch) data in a single architecture
○ Solves the problem of computing arbitrary functions; it contains 3 layers:
■ Batch layer (data lake): historical archive, batch queries, and batch processes for general or ad hoc analytics.
■ Serving layer: serves up results by combining the outputs of the speed and batch layers.
■ Speed layer: queuing and stream processing; runs the same analytics as the batch layer, but in real time and on only the most recent data.
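The three layers above can be sketched as three small pieces of code. This is a toy illustration under stated assumptions, not a production design: the event shape `(page, views)`, the class and function names, and the in-memory data are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[str, int]  # (page, views); a hypothetical event shape


def batch_view(master_dataset: Iterable[Event]) -> Counter:
    """Batch layer: recompute totals from the full historical archive."""
    view: Counter = Counter()
    for page, views in master_dataset:
        view[page] += views
    return view


class SpeedView:
    """Speed layer: incrementally track events the last batch run has not seen."""

    def __init__(self) -> None:
        self.view: Counter = Counter()

    def update(self, event: Event) -> None:
        page, views = event
        self.view[page] += views


def query(page: str, batch: Counter, speed: SpeedView) -> int:
    """Serving layer: merge batch and real-time results."""
    return batch[page] + speed.view[page]


historical = [("home", 10), ("docs", 4), ("home", 6)]
batch = batch_view(historical)

speed = SpeedView()
for event in [("home", 2), ("blog", 1)]:
    speed.update(event)

print(query("home", batch, speed))
```

The batch view is slow to rebuild but complete, the speed view is fast but covers only recent events, and the query combines them so that readers see a single, up-to-date answer.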
Q&A

Thank you for your attention.
We hope to reach success together.
