
Different Stages In Data Pipeline

There are primarily 3 different stages in Splunk:

• Data Input stage
• Data Storage stage
• Data Searching stage

Data Input Stage


In this stage, Splunk software consumes the raw data stream from its source, breaks it into 64K blocks, and annotates each block with metadata keys. The metadata keys include the hostname, source, and source type of the data. The keys can also include values that are used internally, such as the character encoding of the data stream, and values that control the processing of data during the indexing stage, such as the index into which the events should be stored.
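To make this concrete, here is a minimal sketch of how these metadata keys are typically assigned on a forwarder in inputs.conf. The monitored path, host, source type, and index name below are illustrative assumptions, not values taken from this document:

    # inputs.conf on a forwarder (hypothetical values, for illustration only)
    [monitor:///var/log/web/access.log]
    # metadata keys annotated at the input stage:
    host = web01
    sourcetype = access_combined
    # the index into which the events should be stored:
    index = web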

Data Storage Stage

Data storage consists of two phases: Parsing and Indexing.

1. In the parsing phase, Splunk software examines, analyzes, and transforms the data to extract only the relevant information. This is also known as event processing. It is during this phase that Splunk software breaks the data stream into individual events. The parsing phase has several sub-phases (a props.conf sketch of these settings follows this list):
   i. Breaking the stream of data into individual lines
   ii. Identifying, parsing, and setting timestamps
   iii. Annotating individual events with metadata copied from the source-wide keys
   iv. Transforming event data and metadata according to regex transform rules
2. In the indexing phase, Splunk software writes parsed events to the index on disk. It writes both the compressed raw data and the corresponding index files. The benefit of indexing is that the data can be easily accessed during searching.
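As a rough sketch of the parsing settings referenced above, the props.conf stanza below controls line breaking, timestamp extraction, and a regex transform for an imagined source type. The source type name, timestamp format, and transform name are assumptions for illustration only:

    # props.conf on the parsing tier (hypothetical source type and formats)
    [my_app_logs]
    # i. break the stream into individual events, one per line:
    SHOULD_LINEMERGE = false
    LINE_BREAKER = ([\r\n]+)
    # ii. identify and parse the timestamp at the start of each event:
    TIME_PREFIX = ^
    TIME_FORMAT = %Y-%m-%d %H:%M:%S
    MAX_TIMESTAMP_LOOKAHEAD = 19
    # iv. apply a regex transform rule defined in transforms.conf:
    TRANSFORMS-example = my_routing_rule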
Data Searching Stage

This stage controls how the user accesses, views, and uses the indexed data. As
part of the search function, Splunk software stores user-created knowledge
objects, such as reports, event types, dashboards, alerts and field extractions.
The search function also manages the search process.
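For illustration, a saved search with an alert, one of the knowledge objects mentioned above, might be defined in savedsearches.conf roughly as follows. The search string, schedule, threshold, and e-mail address are assumptions, not part of the original material:

    # savedsearches.conf (hypothetical report/alert definition)
    [Excessive failed logins]
    search = index=web sourcetype=access_combined status=401 | stats count by clientip
    # run the search on a 15-minute schedule:
    enableSched = 1
    cron_schedule = */15 * * * *
    dispatch.earliest_time = -15m
    dispatch.latest_time = now
    # trigger an alert when more than 10 matching events are found:
    alert_type = number of events
    alert_comparator = greater than
    alert_threshold = 10
    actions = email
    action.email.to = secops@example.com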

Splunk Components
Each of the Splunk components described below falls under one of the data pipeline stages discussed above.
There are 3 main components in Splunk:

• Splunk Forwarder, used for data forwarding
• Splunk Indexer, used for parsing and indexing the data
• Search Head, a GUI used for searching, analyzing and reporting
Splunk Forwarder
Splunk Forwarder is the component you use to collect logs. Suppose you want to collect logs from a remote machine; you can accomplish that with Splunk's remote forwarders, which are independent of the main Splunk instance.
You can install several such forwarders on multiple machines, and they will forward the log data to a Splunk Indexer for processing and storage. What if you want to do real-time analysis of the data? Splunk forwarders can be used for that purpose too: you can configure them to send data to Splunk indexers in real time, collecting data simultaneously from many different machines.
To understand how real-time forwarding of data happens, you can read my blog on how Domino's is using Splunk to gain operational efficiency.
Compared to other traditional monitoring tools, the Splunk Forwarder consumes very little CPU (roughly 1-2%). You can scale up to tens of thousands of remote systems easily and collect terabytes of data with minimal impact on performance.
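As a minimal sketch of how a forwarder is pointed at an indexer, the outputs.conf below sends all data to a single indexer over Splunk's conventional receiving port 9997. The host name is an assumption for illustration:

    # outputs.conf on the forwarder (hypothetical indexer address)
    [tcpout]
    defaultGroup = primary_indexers

    [tcpout:primary_indexers]
    server = splunk-indexer.example.com:9997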
Now, let us understand the different types of Splunk forwarders.
Universal Forwarder – You can opt for a universal forwarder if you want to forward the raw data collected at the source. It is a simple component which performs minimal processing on the incoming data streams before forwarding them to an indexer.
Data transfer is a major problem with almost every tool in the market. Since there is minimal processing on the data before it is forwarded, a lot of unnecessary data is also forwarded to the indexer, resulting in performance overheads.

Why go through the trouble of transferring all the data to the indexers and then filtering out only the relevant data? Wouldn't it be better to send only the relevant data to the indexer and save on bandwidth, time and money? This can be solved by using heavy forwarders, which I have explained below.

Heavy Forwarder – You can use a heavy forwarder and eliminate half your problems, because one level of data processing happens at the source itself before the data is forwarded to the indexer. A heavy forwarder typically performs parsing (and optionally indexing) at the source and also intelligently routes the data to the indexer, saving on bandwidth and storage space. So when a heavy forwarder parses the data, the indexer only needs to handle the indexing segment.
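One common way a heavy forwarder keeps unnecessary events from reaching the indexer is queue routing: send everything for a source type to the null queue, then route only the events you care about back to the index queue. This is a hedged sketch; the source type and the ERROR keyword are purely illustrative:

    # props.conf on the heavy forwarder (hypothetical source type)
    [my_app_logs]
    TRANSFORMS-filter = drop_everything, keep_errors

    # transforms.conf
    [drop_everything]
    # match every event and send it to the null queue (discard)
    REGEX = .
    DEST_KEY = queue
    FORMAT = nullQueue

    [keep_errors]
    # events matching ERROR are routed back to the index queue and forwarded
    REGEX = ERROR
    DEST_KEY = queue
    FORMAT = indexQueue

Because the transforms are applied in order, only the events matching the second rule survive and are forwarded.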

Splunk Architecture
If you have understood the concepts explained above, you can easily relate to the Splunk architecture, which brings together the various components involved in the process and their functionalities:
• You can receive data from various network ports and run scripts to automate data forwarding
• You can monitor incoming files and detect changes in real time
• The forwarder can intelligently route the data, clone the data and load-balance that data before it reaches the indexer. Cloning creates multiple copies of an event right at the data source, whereas load balancing ensures that even if one instance fails, the data can be forwarded to another instance hosting an indexer (an outputs.conf sketch of both follows this list)
• As I mentioned earlier, the deployment server is used for managing the entire deployment, its configurations and policies
• When the data is received, it is stored in an indexer. The indexed data is then broken down into different logical data stores, and at each data store you can set permissions which control what each user views, accesses and uses
• Once the data is in, you can search the indexed data and also distribute searches to other search peers; the results will be merged and sent back to the search head
• Apart from that, you can also schedule searches and create alerts, which will be triggered when certain conditions match saved searches
• You can use saved searches to create reports and perform analysis using visualization dashboards
• Finally, you can use knowledge objects to enrich the existing unstructured data
• Search heads and knowledge objects can be accessed from the Splunk CLI or the Splunk Web interface. This communication happens over a REST API connection
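To make the routing, cloning and load-balancing bullet above concrete, here is a hedged outputs.conf sketch (all host names are assumptions): listing several indexers in one target group load-balances across them, while listing two target groups under defaultGroup clones each event to both groups.

    # outputs.conf on the forwarder (hypothetical host names)
    [tcpout]
    # two target groups => each event is cloned to both groups
    defaultGroup = dc1_indexers, dc2_indexers

    [tcpout:dc1_indexers]
    # multiple servers in one group => automatic load balancing across them
    server = idx1.dc1.example.com:9997, idx2.dc1.example.com:9997
    autoLBFrequency = 30

    [tcpout:dc2_indexers]
    server = idx3.dc2.example.com:9997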

SIEM Foundations: Architecture Primer


The McAfee SIEM solution comprises several appliance-based platforms working in conjunction to deliver unmatched value and performance to enterprise security professionals. A multitude of deployment configurations allows for the most scalable and feature-rich SIEM architecture available, delivering real-time forensics, comprehensive application and database traffic/content monitoring, advanced rule- and risk-based correlation for real-time as well as historical incident detection, and the most complete set of compliance features of any SIEM on the market. All appliances are available in a range of physical and virtual models.

The following list details the entire suite of available SIEM components.

ESM - Enterprise Security Manager (sometimes referred to as ETM)

The McAfee ESM is the ‘brains’ of the McAfee SIEM solution. It hosts the web interface through which all SIEM interaction is performed, as well as the master database of parsed events used for forensics and compliance reporting. It is powered by the industry-leading McAfee EDB proprietary embedded database, which boasts speeds more than 400% faster than any leading commercial or open source database.

All McAfee SIEM deployments must start with at least one ESM (or a combination ESM/REC/ELM appliance).

REC - Event Receiver (sometimes referred to as ERC)

The McAfee REC is used for the collection of all third-party event and flow data.

Event collection is supported via several methodologies:

1. Push – devices forward events or flows using SYSLOG, NetFlow, etc.
2. Pull – event/log data is collected from the data source using SQL, WMI, etc.
3. Agent – data sources are configured to send event/log/flow data using a small-footprint agent such as McAfee SIEM Event Collector, SNARE, Adiscon, Lasso, etc.

The Event Receiver can also be configured to collect scan results from existing vulnerability assessment platforms such as McAfee MVM, Nessus, Qualys, eEye, Rapid7, etc. In addition, the REC supports rule-based event correlation running as an application on the Receiver. Receiver-based correlation has several limitations: risk-based correlation, deviation correlation, and flow correlation are not supported on a Receiver; an ACE (see below) is required for these functions. Also, as a rule of thumb, Receiver-based correlation imposes an approximately 20% performance penalty on your Receiver. For most enterprise environments, McAfee recommends using an ACE to centralize correlation and provide sufficient resources for this function.

McAfee Event Receivers come as physical appliances rated from 6k to 26k events per second (EPS), as well as VM-based models with event collection rates ranging from 500 to 15k EPS.
Multiple REC appliances (or VM platforms) can be deployed centrally to provide a
consolidated collection environment or can be geographically distributed throughout
the enterprise.  Typical deployment scenarios will locate an Event Receiver in each
of several data centers, all of which will feed their collected events back to a
centralized ESM (or to multiple ESM appliances for redundancy and disaster recovery
purposes).

ELM - Enterprise Log Manager

The McAfee ELM stores the raw, litigation-quality event/log data collected from data
sources configured on Event Receivers.  In SIEM environments where compliance is
a success factor, the ELM is used to maintain event chain of custody and ensure full
non-repudiation.

In addition to providing compliance-quality raw event archival, the ELM also maintains the full-text index (FTI) for all event details, which allows the McAfee SIEM to perform ad-hoc searches against the unstructured data maintained in the archive.

ESM/REC/ELM

The ESMRECELM - also called an All-in-One (AIO) or a ‘combo box’ - provides the
combined functions of the McAfee Enterprise Security Manager (ESM), Event
Receiver (REC) and Enterprise Log Manager (ELM) in a single appliance.

As most SIEM POC deployments are intended to showcase functionality rather than
performance, the ESMRECELM is commonly used to demonstrate the features and
ease of use delivered by the McAfee SIEM.  It can be deployed with minimal
disruption (single appliance, minimal rack space and power, single network
connection and IP address).

In larger POC or production SIEM environments, a combo box may be inadequate to handle the sizable EPS requirements of an enterprise. The largest ESMRECELM peaks at 6k EPS and provides no local storage for the ELM archive; instead, it requires supplemental storage by means of a SAN connection, NFS or CIFS share.

ACE - Advanced Correlation Engine

The ACE provides the SIEM with unmatched advanced correlation capabilities that
include both rule- and risk-based options.  In addition to performing real-time
analysis, the ACE can be configured to process historical event/log data against the
current set of rule and risk profiles, as well as deviation correlation and flow-
correlation.  The ACE provides native risk scoring for GTI (for SIEM) and MRA-
enabled customer environments.  It also allows custom risk scoring to be configured
to highlight threats performed against high-value assets, sensitive data and/or by
privileged users.

Typical production SIEM deployments will include two ACE appliances – one
performing real-time rule and risk correlation and another configured for historical
rule and risk correlation of events.

ADM - Application Data Monitor (sometimes referred to as APM)

The ADM provides layer 7 application decode of enterprise traffic via four promiscuous network interfaces. It is used to track the transmission of sensitive data and application usage, as well as to detect malicious or covert traffic, theft or misuse of credentials, and application-layer threats.

The ADM is not to be confused with a true DLP; its integration with the SIEM provides advanced forensics value by preserving full transactional detail for sessions that violate the user-defined policy managed from within the McAfee ESM common user interface. Complex rule correlation can leverage policy violation or suspicious application usage events to identify potential security incidents in real time.

DEM - Database Event Monitor (sometimes referred to as DBM)

The DEM provides a network-based solution for real-time discovery and transactional monitoring of database activity via two or four promiscuous network interfaces. It works either in lieu of or in parallel with the McAfee (Sentrigo) agent-based database activity solution to provide comprehensive, transaction-level monitoring of database usage by users or applications.
