Apache Kafka 101
1. Introduction
1.1. Real-life context
• Businesses often have multiple data sources with varying formats.
• Various target systems use this data for insights; some require immediate processing.
• Engineers must create custom integrations between these sources and targets to achieve a unified view of the business.
• Direct one-to-one integrations can lead to complex systems.
1.2. Apache Kafka to the rescue
• Placing Apache Kafka as an integration layer allows us to decouple data streams and systems:
• Data sources will publish their data to Apache Kafka.
• Target systems will source their required data from Apache Kafka.
• Data sources do not need to know about the target systems, and vice versa.
1.3. “When it comes to data event streaming, Kafka is the de facto standard.”
• At its core, Apache Kafka acts as a data event streaming platform:
• A data stream is a potentially unbounded sequence of data.
• A streaming platform allows us to process the data as soon as it arrives.
• Each application is a potential data stream creator.
• Apache Kafka stores these data streams and allows systems to perform stream processing.
1.4. In conclusion: Why is Apache Kafka so popular?
2. Kafka Architecture
The underlying structure that makes Kafka performant.
Overview
2.1. Storage Layer.
• One Kafka distributed system is called a Cluster:
• A Cluster contains multiple Brokers (Kafka servers).
• Each Broker holds multiple Partitions of a Topic.
• A Topic is:
• A particular data stream in Kafka.
• We can have as many topics as we want, and a Topic can contain any kind of data format.
• A topic is identified by its name, so topics are usually described as data categories.
• Topics cannot be queried; their data can only be written by Kafka Producers or read by Kafka Consumers (see the example below).
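To make this concrete, topics are typically created with the AdminClient API (or the kafka-topics.sh CLI) rather than queried. A minimal Java sketch, assuming a broker at localhost:9092 and a hypothetical topic named "orders" with 3 partitions and a replication factor of 2:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.List;
import java.util.Properties;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // Hypothetical topic: 3 partitions, each replicated on 2 brokers.
            admin.createTopics(List.of(new NewTopic("orders", 3, (short) 2))).all().get();
        }
    }
}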
• One Topic is separated into multiple Partitions:
• Partitions are numbered from `0` to `N-1`, where `N` is the number of partitions.
• Within each partition, every message is assigned an incremental id called its Offset (sketched below).
• Each new piece of data (or Message) can only be appended to the end of a partition.
• Every partition has multiple replicas, so-called `Followers`, spread across different brokers.
• This ensures that if a broker shuts down unexpectedly, another broker holding a replica of the partition can serve as an alternative.
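A minimal sketch of a topic with `N = 3` partitions (bracketed numbers are offsets; the topic name is hypothetical):

Topic `orders`
  Partition 0: [0][1][2][3]        <- new messages are appended here
  Partition 1: [0][1][2]
  Partition 2: [0][1][2][3][4]

Offsets are independent per partition, which is why ordering is only guaranteed within a single partition.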
• The underlying structure of a Partition consists of multiple log segment files.
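For example, on a broker's disk, partition 0 of the hypothetical `orders` topic is a directory of segments, each named after the offset of its first message (file names and offsets below are illustrative):

orders-0/
  00000000000000000000.log      <- messages with offsets 0 .. 170052
  00000000000000000000.index    <- maps offsets to byte positions in the .log file
  00000000000000170053.log      <- next segment, starting at offset 170053
  00000000000000170053.index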
2.2. Compute Layer.
• The Compute Layer is built around four major pillars (or modules):
• Kafka Producer API
• Kafka Consumer API
• Kafka Streams API
• Kafka Connect API
• Kafka Producer:
• Kafka replicates at the partition level; a producer pushes data to the primary (leader) partition only.
• Kafka picks the partition that stores each producer message by a round-robin policy (if no key is provided) or by hashing the key the user provides.
• Each producer has its own buffer.
• A client id is used to distinguish producers running on the same host (see the sketch below).
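A minimal producer sketch in Java, assuming a broker at localhost:9092 and the hypothetical `orders` topic; it sends one keyless and one keyed message and prints where each landed:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class ProducerExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("client.id", "producer-host1-app1");      // distinguishes producers on the same host
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // No key: the partitioner spreads messages across partitions.
            RecordMetadata a = producer.send(new ProducerRecord<>("orders", "no-key message")).get();
            // With a key: "user-42" is hashed, so it always lands in the same partition.
            RecordMetadata b = producer.send(new ProducerRecord<>("orders", "user-42", "keyed message")).get();
            System.out.printf("partition=%d offset=%d | partition=%d offset=%d%n",
                    a.partition(), a.offset(), b.partition(), b.offset());
        }
    }
}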
• Kafka Consumer:
• Consumers can read from both primary (leader) and secondary (replica) partitions.
• Each consumer can read from multiple topics and partitions at the same time.
• Consumers belonging to the same group will not read duplicate data: each partition is assigned to only one consumer in the group.
• A client id is used to distinguish consumers running on the same host.
• Consumers work in a consumer group; this enables parallelism, which increases throughput (see the sketch below).
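A minimal consumer sketch in Java, assuming the same broker; the topic names and group id are hypothetical. Every consumer started with the same group.id splits the partitions among the group members:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("group.id", "order-processors");          // consumers sharing this id form one group
        props.put("client.id", "consumer-host1-1");         // distinguishes consumers on the same host
        props.put("auto.offset.reset", "earliest");         // start from the oldest message if no offset is stored
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // One consumer may subscribe to several topics at once.
            consumer.subscribe(List.of("orders", "payments"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("topic=%s partition=%d offset=%d value=%s%n",
                            r.topic(), r.partition(), r.offset(), r.value());
                }
            }
        }
    }
}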
• Kafka Connect:
• A free, open-source component of Apache Kafka that serves as a centralized data hub for simple data integration between databases, key-value stores, search indexes, and file systems.
• Key concepts of Kafka Connect:
• Connectors: The high-level abstraction that coordinates data streaming by managing tasks
• Tasks: The implementation of how data is copied to or from Kafka
• Workers: The running processes that execute connectors and tasks
• Converters: The code used to translate data between Connect and the end system
• Transforms: Simple logic to alter each message produced by or sent to a connector
• Dead Letter Queue: How Connect handles connector errors
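As an illustration, a connector is registered by POSTing its JSON config to a Connect worker's REST API. A sketch using the FileStreamSource connector that ships with Apache Kafka, assuming a worker at localhost:8083 (connector name, file path, and topic are hypothetical):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterConnector {
    public static void main(String[] args) throws Exception {
        // FileStreamSource tails a file and streams each new line into a topic.
        String config = """
            {
              "name": "local-file-source",
              "config": {
                "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
                "tasks.max": "1",
                "file": "/tmp/input.txt",
                "topic": "file-lines"
              }
            }""";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors")) // assumed Connect worker address
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(config))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}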
• Connectors: The high-level abstraction that coordinates data streaming by managing tasks:
• Source Connectors ingest entire databases and stream table updates to Kafka topics.
• Sink Connectors deliver data from Kafka topics to secondary indexes, such as Elasticsearch, or to batch systems, such as Hadoop.
• Confluent encourages users to leverage existing connectors. However, it is possible to write a new connector plugin from scratch, following the connector development workflow.
• Kafka Streams:
• A client library that exposes a high-level API for processing, transforming, and enriching data in real time (see the sketch below).
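A minimal Kafka Streams sketch in Java: it reads the hypothetical `orders` topic, uppercases each value, and writes the result to another topic (broker address, application id, and topic names are assumptions):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import java.util.Properties;

public class StreamsExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-enricher");    // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read every message from "orders", transform it, write to "orders-uppercased".
        KStream<String, String> orders = builder.stream("orders");
        orders.mapValues(value -> value.toUpperCase())
              .to("orders-uppercased");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}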
3. Use-case and Demo
3.1. Kafka Tiered Storage at Uber.
3.2. Demo