Big Data Architecture Basics
• Big Data is characterized by the five Vs: Volume, Velocity, Variety, Veracity, and Value.
• Managing and extracting meaningful insights from massive and diverse datasets
requires specialized infrastructure and methodologies.
Identifying Big Data symptoms
• Data management is more complex than ever before
• Big Data is everywhere.
• When should I think about employing Big Data?
• Am I ready?
• What should I start with?
• One may choose to start a big data project based on different needs:
• Volume of data
• Variety of data structures the system must handle
• Scalability issues
• Reducing the cost of data processing
Size matters
• Two main areas: the size of the data and the structure of the data
• Handle new data structures with flexible, schema-less technology
• Big Data is also about extracting added-value information
• Near-real-time processing with a real-time architecture
• Execute complex queries with a NoSQL store
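To make the schema-less point concrete, here is a minimal sketch using MongoDB's Java driver as one possible NoSQL store; the "shop" database, "orders" collection, field names, and server address are made up for illustration. Documents with different shapes live in the same collection and can still be queried.

```java
import org.bson.Document;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;

public class SchemalessExample {
    public static void main(String[] args) {
        // Connect to a MongoDB server (placeholder address).
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            // Hypothetical "shop" database and "orders" collection.
            MongoCollection<Document> orders =
                client.getDatabase("shop").getCollection("orders");

            // Schema-less: documents in the same collection can carry
            // different fields, with no upfront schema migration.
            orders.insertOne(new Document("user", "ada").append("total", 42));
            orders.insertOne(new Document("user", "bob")
                    .append("total", 17)
                    .append("coupon", "WELCOME10"));

            // Query the store on whatever fields a document happens to have.
            for (Document doc : orders.find(Filters.gt("total", 20))) {
                System.out.println(doc.toJson());
            }
        }
    }
}
```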
Business Use Cases
• Analyzing application logs, web access logs, server logs, DB logs, and social networks
• Customer Behavior Analytics: used on e-commerce websites
• Sentiment Analysis: how a company's image and reputation are perceived across social networks.
• CRM Onboarding: combine online data sources with offline data sources for better and more accurate customer segmentation (profile-customized offers).
• Prediction: learning from data has been the main Big Data trend for the past two years. For example, in the telecom industry:
• Issue or event prediction based on router logs
• Product catalog selection
• Pricing depending on the user's global behavior
Understanding Big Data Project’s Ecosystem
• Choosing:
• Hadoop distribution
• Distributed file system
• SQL-like processing language
• Machine learning language
• Scheduler
• Message-oriented middleware
• NoSQL data store
• Data visualization
• github.com/zenkay/bigdata-ecosystem#projects-1
• NoSQL - https://www.youtube.com/watch?v=0buKQHokLK8
An Architecture for Big Data – Client-Server Architecture
A conceptual Cluster Architecture for Big Data
Client level Architecture
• The client level architecture consists of NoSQL databases, distributed file
systems and a distributed processing framework.
• NoSQL databases provide distributed, highly scalable data storage for Big Data.
• Oracle described the Oracle NoSQL Database as a distributed key-value database designed to
provide highly reliable, scalable and available data storage across a configurable set of systems
that function as storage nodes.
• A popular example of a NoSQL database is Apache HBase.
• The next layers consist of
• the distributed file system that is scalable and can handle a large volume of data, and
• a distributed processing framework that distributes computations over large server clusters.
• Internet-scale file systems include the Google File System, Amazon Simple Storage Service, and the open-source Hadoop Distributed File System.
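As a rough sketch of the distributed file system layer, the following uses Hadoop's Java FileSystem API to write a file to HDFS and read it back; the NameNode address hdfs://namenode:8020 and the /demo path are placeholders for a real cluster.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        // Point the client at the cluster's NameNode (placeholder address).
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        FileSystem fs = FileSystem.get(conf);

        // Write a small file; HDFS splits large files into blocks
        // and replicates them across DataNodes.
        Path path = new Path("/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.writeBytes("Hello, HDFS\n");
        }

        // Read the file back through the same API.
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(fs.open(path)))) {
            System.out.println(reader.readLine());
        }
        fs.close();
    }
}
```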
Client level Architecture
• A popular platform is Apache Hadoop.
• The two critical components for Hadoop are the Hadoop distributed file
system (HDFS) and MapReduce.
• HDFS is the storage system and distributes data files over large server clusters and
provides high-throughput access to large data sets.
• MapReduce is the distributed processing framework for parallel processing of large data
sets. It distributes computing jobs to each server in the cluster and collects the results.
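The classic illustration of this split is word count. The sketch below uses the standard Hadoop MapReduce API: each mapper emits (word, 1) pairs for its slice of the input, and each reducer sums the counts for one word; input and output paths are taken from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: runs on each server near its slice of the data,
    // emitting (word, 1) for every token in the input split.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: the framework groups values by key across the
    // cluster, so each reducer sums the counts for one word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Reusing the reducer as a combiner pre-aggregates counts on each mapper's node, which cuts down the data shuffled across the cluster.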
Server Level Architecture
• The server level architecture for Big Data consists of parallel computing platforms
that can handle the associated volume and speed.
• There are three prominent parallel computing options:
• Clusters or Grids,
• Massively Parallel Processing (MPP), and
• High Performance Computing (HPC).
• A commonly used architecture for Hadoop consists of client machines and clusters of loosely coupled commodity servers that provide the HDFS distributed data storage and MapReduce distributed data processing.
• There are three major categories of machine roles in a Hadoop deployment: Client machines, Master nodes, and Slave nodes.
• HBase, built on top of HDFS, provides fast record lookups and updates.
• Apache HBase provides random, real-time read/write access to Big Data.
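A minimal sketch of that access pattern, using the standard HBase Java client; the "users" table, "profile" column family, row key, and ZooKeeper address are assumptions for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        // HBaseConfiguration picks up hbase-site.xml; the ZooKeeper
        // quorum address below is a placeholder.
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk-host");

        try (Connection connection = ConnectionFactory.createConnection(conf);
             // Hypothetical "users" table with a "profile" column family.
             Table table = connection.getTable(TableName.valueOf("users"))) {

            // Random real-time write: a Put keyed by row key.
            Put put = new Put(Bytes.toBytes("user-42"));
            put.addColumn(Bytes.toBytes("profile"),
                          Bytes.toBytes("name"),
                          Bytes.toBytes("Ada"));
            table.put(put);

            // Random real-time read: a Get by the same row key,
            // served without scanning the whole data set.
            Result result = table.get(new Get(Bytes.toBytes("user-42")));
            byte[] name = result.getValue(Bytes.toBytes("profile"),
                                          Bytes.toBytes("name"));
            System.out.println(Bytes.toString(name));
        }
    }
}
```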
Server Level Architecture
• HDFS was originally designed for high-latency, high-throughput batch analytics systems such as MapReduce.
• HBase improves its suitability for real-time systems by providing low-latency performance.
• In this architecture
• Hadoop HDFS provides a fault-tolerant and scalable distributed data storage for Big Data
• Hadoop MapReduce provides the fault-tolerant distributed processing over large data sets
across the Hadoop cluster
• HBase provides the real-time random access to Big Data.
[Diagram: Client machines submitting jobs to the cluster's JobTracker]
• Netflix
• The Netflix Suro project is the backbone of Netflix's data pipeline. It has separate processing paths for data, but it does not strictly follow the Lambda architecture, since the paths may serve different purposes and do not necessarily provide the same type of views.
• LinkedIn
• Bridging offline and nearline computations with Apache Calcite.
Thank you