Different Stages in Data Pipeline
This stage, the search stage, controls how the user accesses, views, and uses the indexed data. As
part of the search function, Splunk software stores user-created knowledge
objects such as reports, event types, dashboards, alerts, and field extractions.
The search function also manages the search process.
Splunk Components
If you look at the image below, you will understand the different data pipeline
stages under which the various Splunk components fall.
There are 3 main components in Splunk: the Forwarder, which collects and forwards data;
the Indexer, which parses, indexes, and stores it; and the Search Head, which provides the
interface for searching, analyzing, and visualizing that data.
Why go through the trouble of transferring all the data to the Indexers and then
filtering out only the relevant data? Wouldn't it be better to send only the relevant
data to the Indexer and save on bandwidth, time, and money? This problem can be solved
by using a Heavy Forwarder, which I have explained below.
Heavy Forwarder – You can use a Heavy Forwarder and eliminate half your
problems, because one level of data processing happens at the source itself
before the data is forwarded to the indexer. A Heavy Forwarder typically does the parsing
at the source and also intelligently routes the data to the Indexer,
saving on bandwidth and storage space. So when a Heavy Forwarder parses the
data, the indexer only needs to handle the indexing segment.
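To make the idea of filtering at the source concrete, here is a minimal Python sketch that reads a log file, keeps only the events of interest, and ships them to an indexer over Splunk's HTTP Event Collector (HEC). In a real deployment this filtering is configured on the forwarder itself rather than hand-coded; the host, token, file path, and filter criteria below are made-up placeholder values.

```python
import json
import requests

# Placeholder values -- replace with your own indexer host and HEC token.
HEC_URL = "https://indexer.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"

def is_relevant(line: str) -> bool:
    """Keep only the events we actually care about (here: errors and warnings)."""
    return "ERROR" in line or "WARN" in line

def forward(event: str) -> None:
    """Send one filtered event to the indexer over HTTP Event Collector."""
    payload = {"event": event, "sourcetype": "app:log"}
    requests.post(
        HEC_URL,
        headers={"Authorization": f"Splunk {HEC_TOKEN}"},
        data=json.dumps(payload),
        timeout=5,
    )

# Read a (made-up) log file and ship only the relevant lines,
# so irrelevant data never leaves the source machine.
with open("/var/log/app.log") as log:
    for line in log:
        if is_relevant(line):
            forward(line.rstrip())
```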
Splunk Architecture
If you have understood the concepts explained above, you can easily relate to
the Splunk architecture. Look at the image below to get a consolidated view of
the various components involved in the process and their functionalities.
You can receive data from various network ports by running scripts that
automate data forwarding.
You can monitor incoming files and detect changes in real time.
The forwarder has the capability to intelligently route the data, clone the data,
and do load balancing on that data before it reaches the indexer. Cloning is
done to create multiple copies of an event right at the data source, whereas load
balancing is done so that even if one instance fails, the data can be forwarded to
another instance that is hosting the indexer.
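The difference between cloning and load balancing can be illustrated with a small conceptual Python sketch. This is not how a forwarder is actually implemented or configured; the endpoint names are invented and the network send is replaced by a print statement.

```python
import itertools

# Invented endpoint names; 9997 is the default Splunk forwarding port.
CLONE_TARGETS = ["indexer-a:9997", "indexer-b:9997"]              # every event goes to all of these
LB_POOL = itertools.cycle(["indexer-1:9997", "indexer-2:9997"])   # events rotate across these

def send(endpoint: str, event: str) -> None:
    # Stand-in for the actual network send.
    print(f"-> {endpoint}: {event}")

def forward(event: str) -> None:
    # Cloning: duplicate the event to every clone target right at the data source.
    for endpoint in CLONE_TARGETS:
        send(endpoint, event)
    # Load balancing: each event goes to the next indexer in the pool, so if one
    # instance fails the remaining instances still receive the data.
    send(next(LB_POOL), event)

for e in ["login failed for admin", "disk usage at 91%"]:
    forward(e)
```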
As I mentioned earlier, the deployment server is used for managing the entire
deployment, configurations, and policies.
When this data is received, it is stored in an Indexer. The indexed data is then broken
down into different logical data stores (indexes), and at each data store you can set
permissions which will control what each user views, accesses, and uses.
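As a rough illustration of those per-data-store permissions, the sketch below uses the Splunk Python SDK (splunklib) to create an index and a role that is only allowed to search it. The host, credentials, index name, and role name are assumptions for the example, not values from this article.

```python
import splunklib.client as client

# Assumed local Splunk instance and admin credentials -- adjust to your environment.
service = client.connect(
    host="localhost", port=8089, username="admin", password="changeme"
)

# Each index is a logical data store; "web_logs" is a made-up example name.
if "web_logs" not in service.indexes:
    service.indexes.create("web_logs")

# A role restricted to searching only that index, so users holding this role
# can only view and use the data stored there.
service.roles.create(
    "web_analyst",
    srchIndexesAllowed="web_logs",
    srchIndexesDefault="web_logs",
)
```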
Once the data is in, you can search the indexed data and also distribute
searches to other search peers; the results will be merged and sent back to the
Search Head.
Apart from that, you can also schedule searches and create alerts, which
will be triggered when the results of a saved search match certain conditions.
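For example, a scheduled search with an alert condition could be created programmatically with the Splunk Python SDK, roughly as sketched below. The search string, schedule, threshold, and email address are illustrative assumptions only.

```python
import splunklib.client as client

# Assumed local instance and credentials; search, schedule, threshold and
# recipient are illustrative values only.
service = client.connect(
    host="localhost", port=8089, username="admin", password="changeme"
)

service.saved_searches.create(
    "too_many_errors",                      # hypothetical saved-search name
    "search index=main log_level=ERROR",    # the condition being watched
    **{
        "is_scheduled": "1",
        "cron_schedule": "*/15 * * * *",    # run every 15 minutes
        "alert_type": "number of events",
        "alert_comparator": "greater than",
        "alert_threshold": "50",            # trigger when more than 50 events match
        "actions": "email",
        "action.email.to": "oncall@example.com",
    }
)
```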
You can use saved searches to create reports and perform analysis using
visualization dashboards.
Finally, you can use Knowledge objects to enrich the existing unstructured data.
Search Heads and Knowledge objects can be accessed from the Splunk CLI or
the Splunk Web interface. This communication happens over a REST
API connection.
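To show what that REST API connection looks like in practice, here is a hedged Python sketch that creates a search job over the management port (8089), polls it, and reads the merged results as JSON. The host, credentials, and search string are assumptions for the example.

```python
import time
import requests

# Assumed local Splunk management endpoint and admin credentials.
BASE = "https://localhost:8089"
AUTH = ("admin", "changeme")

# Create a search job (the same REST interface Splunk Web and the CLI use).
job = requests.post(
    f"{BASE}/services/search/jobs",
    auth=AUTH,
    data={"search": "search index=_internal | head 5", "output_mode": "json"},
    verify=False,  # demo only; verify certificates in production
).json()
sid = job["sid"]

# Poll the job until it finishes.
while True:
    status = requests.get(
        f"{BASE}/services/search/jobs/{sid}",
        auth=AUTH,
        params={"output_mode": "json"},
        verify=False,
    ).json()
    if status["entry"][0]["content"]["isDone"]:
        break
    time.sleep(1)

# Fetch the merged results as JSON.
results = requests.get(
    f"{BASE}/services/search/jobs/{sid}/results",
    auth=AUTH,
    params={"output_mode": "json"},
    verify=False,
).json()
for row in results["results"]:
    print(row.get("_raw", row))
```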
The following list details the entire suite of available McAfee SIEM components.
ESM
The McAfee ESM is the ‘brains’ of the McAfee SIEM solution. It hosts the web
interface through which all SIEM interaction is performed, as well as the master
database of parsed events used for forensics and compliance reporting. It is
powered by the industry-leading McAfee EDB proprietary embedded database, which
boasts speeds more than 400% faster than any leading commercial or open source
database.
All McAfee SIEM deployments must start with [at least one] ESM (or a combination
ESM/REC/ELM appliance).
REC
The McAfee REC is used for the collection of all third-party event and flow data.
The Event Receiver can also be configured to collect scan results from existing vulnerability
assessment platforms such as McAfee MVM, Nessus, Qualys, eEye, Rapid7, etc. In addition, the REC
supports the configuration of rule-based event correlation as an application running on the Receiver.
Receiver-based correlation has several limitations. Risk-based correlation, deviation correlation, and flow
correlation are not supported on a Receiver; an ACE (see below) is required for these functions. Also, as a
rule of thumb, Receiver-based correlation imposes a performance penalty of approximately 20% on your
Receiver. For most enterprise environments, McAfee recommends using an ACE to centralize
correlation and provide sufficient resources for this function.
McAfee Event Receivers come as physical appliances with ratings ranging from
6k to 26k events per second (EPS), as well as VM-based models with event collection rates
ranging from 500 to 15k EPS.
Multiple REC appliances (or VM platforms) can be deployed centrally to provide a
consolidated collection environment or can be geographically distributed throughout
the enterprise. Typical deployment scenarios will locate an Event Receiver in each
of several data centers, all of which will feed their collected events back to a
centralized ESM (or to multiple ESM appliances for redundancy and disaster recovery
purposes).
ELM
The McAfee ELM stores the raw, litigation-quality event/log data collected from data
sources configured on Event Receivers. In SIEM environments where compliance is
a success factor, the ELM is used to maintain event chain of custody and ensure full
non-repudiation.
In addition to providing compliant-quality raw event archival, the ELM also supports
the full-text index (FTI) for all event details. The McAfee SIEM supports the ability
to perform ad-hoc searches against the unstructured data maintained in the archive.
ESM/REC/ELM
The ESMRECELM - also called an All-in-One (AIO) or a ‘combo box’ - provides the
combined functions of the McAfee Enterprise Security Manager (ESM), Event
Receiver (REC) and Enterprise Log Manager (ELM) in a single appliance.
As most SIEM POC deployments are intended to showcase functionality rather than
performance, the ESMRECELM is commonly used to demonstrate the features and
ease of use delivered by the McAfee SIEM. It can be deployed with minimal
disruption (single appliance, minimal rack space and power, single network
connection and IP address).
ACE
The ACE provides the SIEM with unmatched advanced correlation capabilities that
include both rule- and risk-based options. In addition to performing real-time
analysis, the ACE can be configured to process historical event/log data against the
current set of rule and risk profiles, and it also supports deviation correlation and
flow correlation. The ACE provides native risk scoring for GTI (for SIEM) and MRA-
enabled customer environments. It also allows custom risk scoring to be configured
to highlight threats against high-value assets or sensitive data, and/or threats
performed by privileged users.
Typical production SIEM deployments will include two ACE appliances – one
performing real-time rule and risk correlation and another configured for historical
rule and risk correlation of events.
ADM
The ADM provides layer 7 application decode of enterprise traffic via four
promiscuous network interfaces. It is used to track the transmission of sensitive data
and application usage, as well as to detect malicious or covert traffic, theft or misuse of
credentials, and application-layer threats.
Although not a true DLP, the ADM's integration with the SIEM provides
advanced forensic value by preserving full transactional detail for sessions that violate
the user-defined policy managed from within the McAfee ESM common user
interface. Complex rule correlation can leverage policy-violation or suspicious
application usage events to identify potential security incidents in real time.
DEM
The DEM provides a network-based solution for real-time discovery and transactional
monitoring of database activity via two or four promiscuous network interfaces. It
works in lieu of, or in parallel with, the McAfee (Sentrigo) agent-based database
activity solution to provide comprehensive, transaction-level monitoring of
user or application DB usage.