Distributed Search
Overview
Use cases
These are some of the key use cases for distributed search:
• Horizontal scaling. Distributing indexing and searching across multiple
indexers greatly increases the amount of data you can index and search.
• Access control. You can use distributed search to control access to
indexed data. For example, security personnel might need to search across all
the data in the enterprise, while users in a department search only that
department's data.
• Managing geo-dispersed data. Distributed search allows local offices to
access their own data, while maintaining centralized access at the corporate
level.
The Splunk Enterprise instance that does the searching is referred to as the
search head. The indexers that participate in a distributed search are called
search peers or indexer nodes.
A search head by default runs its searches across all its search peers. You can
limit a search to one or more search peers by specifying the splunk_server field
in your query. See "Search across one or more distributed servers" in the Search
manual.
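For example, a search like this one runs only against two specific peers (idx1
and idx2 are hypothetical serverName values):
error (splunk_server=idx1 OR splunk_server=idx2)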
You can run multiple search heads across a set of search peers. To coordinate
the activity of multiple search heads (so that they share configuration settings,
search artifacts, and job management), you need to enable search head pooling.
In index replication, clusters use search heads to search across the set of
indexers, or peer nodes. You deploy and configure search heads very differently
when they are part of a cluster. To learn more about search heads and clusters,
read "Configure the search head" in the Managing Indexers and Clusters
Manual.
In this diagram showing a distributed search scenario for access control, a
"security" department search head has visibility into all the indexing search
peers. Each search peer also has the ability to search its own data. In addition,
the department A search peer has access to both its data and the data of
department B:
Finally, this diagram shows load-balanced forwarders inputting data across the
set of indexers. There's a dedicated search head, as well as a search head on
each indexer. All the search heads can search across the entire set of indexers:
For more information on load balancing, see "Set up load balancing" in the
Forwarding Data manual.
What search heads send to search peers
The search peers use the search head's knowledge bundle to execute queries on
its behalf. When executing a distributed search, the peers are ignorant of any
local knowledge objects. They have access only to the objects in the search
head's knowledge bundle.
Bundles typically contain a subset of files (configuration files and assets) from
$SPLUNK_HOME/etc/system, $SPLUNK_HOME/etc/apps, and
$SPLUNK_HOME/etc/users.
The knowledge bundle from the search head gets distributed to the
$SPLUNK_HOME/var/run/searchpeers directory on each search peer. Because the
knowledge bundle resides at a different location on the search peers than on the
search head, search scripts should not hardcode paths to resources.
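For example, a lookup file that lives under an app's directory on the search
head would appear on a search peer under a bundle-specific path along these
lines (the bundle directory name shown here is illustrative):
$SPLUNK_HOME/var/run/searchpeers/<searchhead_name>-<timestamp>/apps/<app_name>/lookups/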
By default, the search head replicates and distributes the knowledge bundle to
each search peer. For greater efficiency, you can instead tell the search peers to
mount the knowledge bundle's directory location, eliminating the need for bundle
replication. When you mount a knowledge bundle, it's referred to as a mounted
bundle. To learn how to mount bundles, read "Mount the knowledge bundle".
User authorization
All authorization for a distributed search originates from the search head. At the
time it sends the search request to its search peers, the search head also
distributes the authorization information. It tells the search peers the name of the
user running the search, the user's role, and the location of the distributed
authorize.conf file containing the authorization information.
Licenses
Search heads performing no indexing or only summary indexing can use the
forwarder license. If the search head performs any other type of indexing, it must
have access to a license pool.
See "Licenses for search heads" in the Installation manual for a detailed
discussion of licensing issues.
Version compatibility
Important: This topic does not apply to clusters. All cluster nodes (master,
peers, and search heads) must run the same version of Splunk Enterprise. See
"Upgrade a cluster" in the Managing Indexers and Clusters manual.
6.x search heads are compatible with 6.x and 5.x search peers, in a
non-clustered environment. The search head must be at the same or higher level
than the search peers.
You can run a 6.x search head against 5.x search peers, but there are a few
compatibility issues to be aware of. To take full advantage of the 6.x feature set,
it is recommended that you upgrade both search head(s) and search peers at the
same time.
When running a 6.x search head against 5.x search peers, note the following:
• You can use data models on the search head but only without report
acceleration.
• You can use Pivot on the search head.
• You can run predictive analytics (the predict command) on the search
head.
These features have been tested both with and without search head pooling.
A 6.x search head by default asks its search peers to generate a remote timeline.
This isn't a problem with 6.x search peers, but 5.x search peers won't know how
to generate the timeline. As a result, searches can slow dramatically. To prevent
this, turn off remote timeline fetching by setting this attribute in limits.conf
on the search head:
[search]
remote_timeline_fetchall = false
After making this change, you must restart the search head.
Important: You should remove this attribute after all search peers have been
upgraded to 6.x.
Configure distributed search
Overview of configuration
The basic configuration to enable distributed search is simple. You just
designate a Splunk Enterprise instance as a search head and establish a
distributed search connection to a set of indexers.
There are a variety of other types of configuration that you can also perform.
The typical distributed search deployment uses a dedicated search head; that is,
a search head dedicated to running searches. A dedicated search head does not
index external data.
You can, however, also designate one or more of your search peers (indexers)
as search heads. These dual-function search heads can be in addition to, or
instead of, a dedicated search head. See "Some search scenarios" for examples
of several distributed search topologies.
1. Designate a Splunk Enterprise instance as the search head.
2. Add search peers to the search head. See "Add search peers".
3. Add data inputs to the search peers. You add inputs the same as you would
for any indexer, either directly on the search peer or through forwarders
connecting to the search peer. See the Getting Data In manual for information
on data inputs.
Beyond this basic configuration, you can also perform other configuration
tasks, such as:
• Limiting the size of the knowledge bundle.
• Managing distributed server names.
• Mounting the knowledge bundle.
• Setting up a search head pool.
• Managing authorization.
Note: Splunk clusters also use search heads to search across their set of
indexers, or peer nodes. You deploy search heads very differently when they
are part of a cluster. To learn about deploying search heads in clusters, read
"Enable the search head" in the Managing Indexers and Clusters Manual.
Create a dedicated search head
In some cases, you might want a single instance to serve as both a search head
and a search peer. In other cases, however, you might need a dedicated search
head. A dedicated search head performs only searching; it does not index any
external data.
3. Add the search head to your Enterprise license group, even though it's a
dedicated search head that's not expected to index any external data. For more
information, see "Types of Splunk Enterprise licenses".
4. Establish distributed search from the search head to all the indexers (search
peers) that you want it to search. See "Add search peers" for how to do this.
5. Log in to the search head and perform a search that runs across all the search
peers, such as a search for *. Examine the splunk_server field in the results.
Verify that all the search peers are listed in that field.
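As a hypothetical example, a search such as the following counts events by
peer, so each search peer's name should appear in the results:
* | stats count by splunk_server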
Important: Do not configure the dedicated search head for indexing of external
data, since that will violate its license.
If you want to use a single instance as both a search head and a search peer
(indexer), just install the search head as a regular Splunk Enterprise instance
with a normal license, as described in "About Splunk Enterprise licenses" in the
Installation manual. With a normal license, the instance can index external data.
You can also configure an existing indexer as a search head.
Once you have identified an instance to double as both search head and search
peer, add the search peers to the indexer. See "Add search peers".
When configuring instances as search heads or search peers, keep this key
distinction in mind:
• A search head must maintain a list of search peers, or it will have nothing
to search on. A dedicated search head has no external data inputs.
• A search peer must have specified external data inputs, or it will have
nothing to index.
These roles are not necessarily distinct. A Splunk Enterprise instance can
function simultaneously as both a search head and a search peer.
Important: Clusters also use search heads to search across the set of indexers,
or peer nodes. You deploy and configure search heads very differently when
they are part of a cluster. To learn more about configuring search heads in
clusters, read "Configure the search head" in the Managing Indexers and
Clusters Manual.
Configuration overview
You can set up distributed search on a search head using any of these
configuration methods:
• Splunk Web
• Splunk CLI
• The distsearch.conf configuration file
You perform the configuration on the designated search head. The main step is
to specify the search head's search peers. The distributed search capability itself
is already enabled by default.
Add search peers
Important: Before an indexer can function as a search peer, you must change its
password from the default "changeme". Otherwise, the search head will not be
able to authenticate against it.
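For example, you can change the password with the CLI on the indexer (the new
password shown is illustrative):
splunk edit user admin -password n3wp4ssw0rd -auth admin:changeme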
Use Splunk Web
1. Log into Splunk Web on the search head and click Settings at the top of the
page.
2. Click Distributed search in the Distributed Environment area.
3. Click Search peers.
4. On the Search peers page, click New.
5. Specify the search peer, along with any authentication settings.
6. Click Save.
7. Repeat for each search peer you want to add.
Use the CLI
1. On the search head, navigate to $SPLUNK_HOME/bin/.
2. Invoke the splunk add search-server command for each search peer you
want to add.
• Use the -host flag to specify the IP address and management port for the
search peer.
• Provide credentials for both the local (search head) and remote (search
peer) instances. Use the -auth flag for the local credentials and the
-remoteUsername and -remotePassword flags for the remote credentials (in
this example, for search peer 10.10.10.10). The remote credentials must
be for an admin-level user on the search peer.
For example:
splunk add search-server -host 10.10.10.10:8089 -auth admin:password
-remoteUsername admin -remotePassword passremote
Edit distsearch.conf
In most cases, the settings available through Splunk Web provide sufficient
options for configuring distributed search environments. Some advanced
configuration settings, however, are only available by directly editing
distsearch.conf. For information on the configuration options, see the
distsearch.conf spec file.
If you add search peers via Splunk Web or the CLI, Splunk Enterprise
automatically handles authentication. However, if you add peers by editing
distsearch.conf, you must distribute the key files manually.
Any number of search heads can have their certificates stored on search peers
for authentication. The search peers can store keys in
$SPLUNK_HOME/etc/auth/distServerKeys/<searchhead_name>/trusted.pem
For example, if you have search heads A and B and they both need to search the
search peer C, do the following:
1. On search peer C, create the directories
$SPLUNK_HOME/etc/auth/distServerKeys/A/ and
$SPLUNK_HOME/etc/auth/distServerKeys/B/.
2. Copy A's trusted.pem file into $SPLUNK_HOME/etc/auth/distServerKeys/A/
and B's trusted.pem into $SPLUNK_HOME/etc/auth/distServerKeys/B/.
3. Restart C.
Remove a search peer
You can remove a search peer from a search head through the Distributed
search page on the search head's Splunk Web.
Note: This only removes the search peer entry from the search head; it does not
remove the search head key from the search peer. In most cases, this is not a
problem and no further action is needed.
On the search head, run the splunk remove search-server command to remove
a search peer from the search head.
• Use the -auth flag to provide credentials for the search head only.
• Use the -url flag to specify the peer's location and splunkd management
port. By default, the management port is 8089, although it might be
different for your deployment.
splunk remove search-server -auth admin:password -url 10.10.10.10:8089
As an additional step, you can disable the trust relationship between the search
peer and the search head. To do this, delete the trusted.pem file from
$SPLUNK_HOME/etc/auth/distServerKeys/<searchhead_name> on the search peer.
Limit the size of the knowledge bundle
The knowledge bundle can grow quite large, because, by default, it includes
nearly the entire contents of all the search head's apps. To limit the size of the
bundle, you can create a replication whitelist. To do this, edit distsearch.conf
and specify a [replicationWhitelist] stanza:
[replicationWhitelist]
<name> = <whitelist_regex>
...
All files that satisfy the whitelist regex will be included in the bundle that the
search head distributes to its search peers. If multiple regexes are specified,
the bundle will include the union of the files they match.
In this example, the knowledge bundle will include all files with extensions of
either ".conf" or ".spec":
[replicationWhitelist]
allConf = *.conf
allSpec = *.spec
The names, such as allConf and allSpec, are used only for layering. That is, if
you have both a global and a local copy of distsearch.conf, the local copy can
be configured so that it overrides only one of the regexes. For instance, assume
that the example shown above is the global copy and that you then specify a
whitelist in your local copy like this:
[replicationWhitelist]
allConf = *.foo.conf
The two conf files will be layered, with the local copy taking precedence. Thus,
the search head will distribute only files that satisfy these two regexes:
allConf = *.foo.conf
allSpec = *.spec
Manage distributed server names
In distributed search, all search heads and search peers in the group must have
unique names. The serverName has three specific uses in distributed search:
• For identifying search peers in search queries. serverName is the value
of the splunk_server field that you specify when you want to query a
specific node. See "Retrieve events from indexes and distributed search
peers" in the Search manual.
• For identifying search peers in search results. serverName gets
reported back in the splunk_server field.
• For identifying search heads on the search peers. For example, a search
peer stores a search head's authentication key in a directory named after
the search head's serverName.
Note: serverName is not used when adding search peers to a search head. In
that case, you identify the search peers through their domain names or IP
addresses.
The only reason to change serverName is if you have multiple instances of Splunk
Enterprise residing on a single machine, and they're participating in the same
distributed search group. In that case, you'll need to change serverName to
distinguish them.
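serverName resides in the [general] stanza of server.conf. For example (the
value shown is illustrative):
[general]
serverName = splunk_idx1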
Forward search head data
The preferred approach is to forward the data directly to the indexers, without
indexing separately on the search head. You do this by configuring the search
head as a forwarder. These are the main steps:
1. Make sure that all necessary indexes exist on the indexers. For example, the
SoS app uses a scripted input that puts data into a custom index. If you install
SoS on the search head, you need to also install the SoS-TA add-on on the
indexers, to provide the indexers with the necessary index settings for the data
the app generates. On the other hand, since _audit and _internal exist on
indexers as well as search heads, you do not need to create separate versions of
those indexes to hold the corresponding search head data.
2. Configure the search head as a forwarder. Create an outputs.conf file on the
search head that configures it for load-balanced forwarding across the set of
search peers (indexers). For example:
[tcpout:my_search_peers]
server=10.10.10.1:9997,10.10.10.2:9997,10.10.10.3:9997
Note: Do not set indexAndForward=true in outputs.conf. If you do, the search
head will both retain the data locally and forward it to the search peers.
A complete outputs.conf file for this kind of setup might look like this:
[tcpout]
defaultGroup = primary_indexers
forwardedindex.filter.disable = true
indexAndForward = false
[tcpout:primary_indexers]
server = <ip of indexer1>:9997,<ip of indexer2>:9997
autoLB = true
Mount the knowledge bundle
By default, the search head replicates and distributes the knowledge bundle to
each search peer. For greater efficiency, you can instead tell the search peers to
mount the knowledge bundle's directory location, eliminating the need for bundle
replication. When you mount a knowledge bundle on shared storage, it's referred
to as a mounted bundle.
Important: Most shared storage solutions don't work well across a WAN. Since
mounted bundles require shared storage, you generally should not implement
them across a WAN.
Mounted bundles are useful if you have large amounts of search-time data,
which could otherwise slow the replication process. One common cause of slow
bundle replication is large lookup tables.
Depending on your search head configuration, there are a number of ways to set
up mounted bundles. These are some of the typical ones:
• For a single search head. Mount the knowledge bundle on shared
storage. All the search peers then access the bundle to process search
requests. This diagram illustrates a single search head with a mounted
bundle on shared storage:
• For multiple pooled search heads. For multiple search heads, you can
combine mounted bundles with search head pooling. The pooled search
heads maintain one bundle on the shared storage, and all search peers
access that bundle. This diagram shows search head pooling with a
mounted bundle:
• For multiple non-pooled search heads. Maintain the knowledge
bundle(s) on each search head's local storage. In this diagram, each
search head maintains its own bundle, which each search peer mounts
and accesses individually:
There are numerous other architectures you can design with mounted bundles.
You could, for example, use shared storage for multiple search heads, but
without search head pooling. On the shared storage, you would maintain
separate bundles for each search head. The search peers would need to access
each bundle individually.
In each case, the search peers need access to each search head's
$SPLUNK_HOME/etc/{apps,users,system} subdirectories. In the case of search
head pooling, the search peers need access to the pool's shared set of
subdirectories.
Important: The search peers use the mounted directories only when fulfilling the
search head's search requests. For indexing and other purposes not directly
related to distributed search, the search peers will use their own, local apps,
users, and system directories, the same as any other indexer.
Note: It's best not to locate mounted bundles in the search head's local
$SPLUNK_HOME path.
Configure mounted bundles
These procedures assume a single search head (no search head pooling).
For details on how to configure mounted bundles with search head pooling, see
"Use mounted bundles with search head pooling" below.
Important: The search head's Splunk user account needs read/write access to
the shared storage location. The search peers need read access to the bundle
subdirectories.
To configure the search head:
1. Make the knowledge bundle content (the search head's
$SPLUNK_HOME/etc/{apps,users,system} subdirectories) available at a shared
location that the search peers can mount.
2. In the distsearch.conf file on the search head, set:
shareBundles=false
This stops the search head from replicating bundles to the search peers.
For each search peer, follow these steps to access the mounted bundle:
1. Mount the bundle directory on the search peer.
2. In the search peer's distsearch.conf file, add a stanza for the search head:
[searchhead:<searchhead-splunk-server-name>]
mounted_bundles=true
bundles_location=<path_to_bundles>
Important: If multiple search heads will be distributing searches to this search
peer, you must create a separate stanza on the search peer for each of them.
This is necessary even if you're using search head pooling.
Note: You can optionally set up symbolic links to the bundle subdirectories
(apps,users,system) to ensure that the search peer has access only to the
necessary subdirectories in the search head's /etc directory. See the following
example for details on how to do this.
Example configuration
Search head
[distributedSearch]
...
shareBundles = false
Search peers
1. Mount the search head's $SPLUNK_HOME/etc directory on the search peer to:
/mnt/searcher01
2. Create a directory that consists of symbolic links to the bundle subdirectories:
/opt/shared_bundles/searcher01
/opt/shared_bundles/searcher01/system -> /mnt/searcher01/system
/opt/shared_bundles/searcher01/users -> /mnt/searcher01/users
/opt/shared_bundles/searcher01/apps -> /mnt/searcher01/apps
Note: This optional step is useful for ensuring that the peer has access only to
the necessary subdirectories.
3. In the search peer's distsearch.conf file, configure the stanza for the
search head:
[searchhead:searcher01]
mounted_bundles = true
bundles_location = /opt/shared_bundles/searcher01
Use mounted bundles with search head pooling
If you combine mounted bundles with search head pooling, note the following:
• Use the same shared storage location for both the search head pool and
the mounted bundles. Search head pooling uses a subset of the
directories required for mounted bundles.
• Search head pooling itself only requires that you mount the
$SPLUNK_HOME/etc/{apps,users} directories. However, when using
mounted bundles, you must also provide a mounted
$SPLUNK_HOME/etc/system directory. This doesn't create any conflict
among the search heads, as they will always use their own versions of the
system directory and ignore the mounted version.
• The search peers must create separate stanzas in distsearch.conf for
each search head in the pool. The bundles_location in each of those
stanzas must be identical.
See "Configure search head pooling" for information on setting up a search head
pool.
Example configuration: Search head pooling with mounted
bundles
This example shows how to combine search head pooling and mounted bundles
in one system. There are two main sections to the example:
1. Set up a search head pool consisting of two search heads. In this part, you
also mount the bundles.
2. Set up the search peers so that they can access bundles from the search head
pool.
The example assumes you're using an NFS mount for the shared storage
location.
For detailed information on these steps, see "Create a pool of search heads".
Set up the search head pool
1. Set up a shared storage location accessible to each search head. This
example assumes an NFS mount at /mnt/search-head-pooling.
2. On each search head, enable search head pooling. In this example, you're
using an NFS mount of /mnt/search-head-pooling as your shared storage
location:
splunk pooling enable /mnt/search-head-pooling
Among other things, this step creates empty /etc/apps and /etc/users
directories under /mnt/search-head-pooling. Step 3 uses those directories.
3. Copy the contents of your apps, users, and system directories into the
shared storage location:
cp -r $SPLUNK_HOME/etc/apps/* /mnt/search-head-pooling/etc/apps
cp -r $SPLUNK_HOME/etc/users/* /mnt/search-head-pooling/etc/users
cp -r $SPLUNK_HOME/etc/system /mnt/search-head-pooling/etc/
4. On each search head, edit distsearch.conf to turn off bundle replication:
[distributedSearch]
...
shareBundles = false
Set up the search peers
1. Mount the shared storage location (the same location that was earlier set to
/mnt/search-head-pooling on the search heads) so that it appears as
/mnt/bundles on the peer.
2. Create a directory that consists of symbolic links to the bundle subdirectories:
/opt/shared_bundles/bundles
/opt/shared_bundles/bundles/system -> /mnt/bundles/etc/system
/opt/shared_bundles/bundles/users -> /mnt/bundles/etc/users
/opt/shared_bundles/bundles/apps -> /mnt/bundles/etc/apps
3. In the search peer's distsearch.conf file, create stanzas for both search
heads:
[searchhead:searcher01]
mounted_bundles = true
bundles_location = /opt/shared_bundles/bundles
[searchhead:searcher02]
mounted_bundles = true
bundles_location = /opt/shared_bundles/bundles
Search head pooling
You enable search head pooling on each search head that you want to be
included in the pool, so that they can share configuration and user data. Once
search head pooling has been enabled, these categories of objects will be
available as common resources across all search heads in the pool:
• configuration settings, such as saved searches and other knowledge
objects.
• search artifacts; that is, the results and associated data from completed
search jobs.
• scheduler state, so that only one search head in the pool runs a
particular scheduled search.
For example, if you create and save a search on one search head, all the other
search heads in the pool will automatically have access to it.
Note these restrictions:
• Most shared storage solutions don't perform well across a WAN. Since
search head pooling requires low-latency shared storage capable of
serving a high number of operations per second, implementing search
head pooling across a WAN is not supported.
• All search heads in a pool must be running the same version of Splunk
Enterprise. Be sure to upgrade all of them at once. See "Upgrade your
distributed deployment" in the Distributed Deployment Manual for details.
The set of data that a search head distributes to its search peers is known as the
knowledge bundle. For details, see "What search heads send to search peers".
By default, only one search head in a search head pool sends the knowledge
bundle to the set of search peers. Also, if search heads in a pool are also search
peers of each other, they will not send bundles to each other, since they can
access the bundles in the pool. This is an optimization introduced in version 4.3.2
but made the default in version 5.0. It is controllable by means of the
useSHPBundleReplication attribute in distsearch.conf.
As a further optimization, you can mount knowledge bundles on shared storage,
as described in "About mounted bundles". By doing so, you eliminate the need to
distribute the bundle to the search peers. For information on how to combine
search head pooling with mounted knowledge bundles, read the section in that
topic called "Use mounted bundles with search head pooling".
See the other topics in this chapter for more information on search head pooling.
Answers
Have questions? Visit Splunk Answers and see what questions and answers the
Splunk community has about search head pooling.
Create a pool of search heads
1. Set up a shared storage location accessible to each search head
So that each search head in a pool can share configurations and artifacts, they
need to access a common set of files via shared storage:
• On *nix platforms, set up an NFS mount.
• On Windows, set up a CIFS (SMB) share.
Important: The Splunk user account needs read/write access to the shared
storage location. When installing a search head on Windows, be sure to install it
as a user with read/write access to shared storage. The Local System user does
not have this access. For more information, see "Choose the user Splunk should
run as" in the Installation manual.
2. Configure each search head
a. Set up each search head individually, specifying the search peers in the usual
fashion. See "Add search peers".
b. Make sure that each search head has a unique serverName attribute,
configured in server.conf. See "Manage distributed server names" for detailed
information on this requirement. If the search head does not have a unique
serverName, a warning will be generated at start-up. See "Warning about unique
serverName attribute" for details.
3. Stop the search heads
Before enabling pooling, you must stop splunkd. Do this for each search head in
the pool.
4. Enable pooling on each search head
Use the CLI command splunk pooling enable to enable pooling on a search
head. The command sets certain values in server.conf. It also creates
subdirectories within the shared storage location and validates that Splunk
Enterprise can create and move files within them.
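For example, assuming the shared storage is the NFS mount /tmp/nfs used later
in this topic:
splunk pooling enable /tmp/nfs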
Note: The command sets values in the [pooling] stanza of the server.conf file in
$SPLUNK_HOME/etc/system/local.
You can also directly edit the [pooling] stanza of server.conf. For detailed
information, see the server.conf spec file.
Important: The [pooling] stanza must be placed in the server.conf file directly
under $SPLUNK_HOME/etc/system/local/. This means that you cannot deploy the
[pooling] stanza via an app, either on local disk or on shared storage. For
details see the server.conf spec file.
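As an illustrative sketch, the resulting stanza looks something like this (the
storage path is an assumption based on the example mount used in this topic):
[pooling]
state = enabled
storage = /tmp/nfs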
5. Copy user and app directories to the shared storage
location
For example, if your NFS mount is at /tmp/nfs, copy the apps subdirectories that
match this pattern:
$SPLUNK_HOME/etc/apps/*
into
/tmp/nfs/etc/apps
This results in a set of app directories like these:
/tmp/nfs/etc/apps/search
/tmp/nfs/etc/apps/launcher
/tmp/nfs/etc/apps/unix
[...]
Similarly, copy the user subdirectories that match this pattern:
$SPLUNK_HOME/etc/users/*
into
/tmp/nfs/etc/users
Important: You can choose to copy over just a subset of apps and user
subdirectories; however, be sure to place them in the precise locations
described above.
6. Restart the search heads
After running the splunk pooling enable command, restart splunkd. Do this for
each search head in the pool.
Use a load balancer with the search head pool
You will probably want to run a load balancer in front of your search heads. That
way, users can access the pool of search heads through a single interface,
without needing to specify a particular one.
Another reason for using a load balancer is to ensure access to search artifacts
and results if one of the search heads goes down. Ordinarily, RSS and email
alerts provide links to the search head where the search originated. If that search
head goes down (and there's no load balancer), the artifacts and results become
inaccessible. However, if you've got a load balancer in front, you can set the
alerts so that they reference the load balancer instead of a particular search
head.
There are a couple of issues to note when selecting and configuring the load
balancer:
• The load balancer must employ layer-7 (application-level) processing.
• Configure the load balancer so that user sessions are "sticky" or
"persistent". This ensures that the user remains on a single search head
throughout their session.
To generate alert links to the load balancer, edit alert_actions.conf and set the
hostname attribute to point to the load balancer:
hostname = <proxyhost>:<port>
The alert links should now point to the load balancer, not the individual search
heads.
Other pooling operations
Besides the splunk pooling enable CLI command, there are several other
commands that are important for managing search head pooling:
• splunk pooling validate
• splunk pooling disable
• splunk pooling display
You must stop splunkd before running splunk pooling enable or splunk
pooling disable. However, you can run splunk pooling validate and splunk
pooling display while splunkd is either stopped or running.
The splunk pooling enable command validates search head access when you
initially set up search head pooling. If you ever need to revalidate the search
head's access to shared resources (for example, if you change the NFS
configuration), you can run the splunk pooling validate CLI command:
splunk pooling validate
You can disable search head pooling with this CLI command:
splunk pooling disable
Run this command for each search head that you need to disable.
Important: Before running the splunk pooling disable command, you must
stop splunkd. After running the command, you should restart splunkd.
You can use the splunk pooling display CLI command to determine whether
pooling is enabled on a search head:
splunk pooling display
This example shows how the system response varies depending on whether
pooling is enabled:
$ splunk pooling enable /foo/bar
$ splunk pooling display
Search head pooling is enabled with shared storage at: /foo/bar
$ splunk pooling disable
$ splunk pooling display
Search head pooling is disabled
Specifically, if you add a stanza to any configuration file in a local directory, you
must run the following command:
Note: This is not necessary if you make changes by means of Splunk Web or the
CLI.
If you want to use the deployment server to manage your search head
configuration, note the following:
• repositoryLocation gets used as the download location.
Select timing for configuration refresh
The default settings have been changed to less frequent intervals starting with
5.0.3. In server.conf, the following settings affect configuration refresh timing:
# 5.0.3 defaults
[pooling]
poll.interval.rebuild = 1m
poll.interval.check = 1m
The previous defaults for these settings were 2s and 5s, respectively.
With the old default values, a change made on one search head would become
available on another search head at most seven seconds later. There is usually
no need for updates to be propagated that quickly. By changing the settings to
values of one minute, the load on the shared storage system is greatly reduced.
Depending on your business needs, you might be able to set these values to
even longer intervals.
Distributed search in action
How authorization works in distributed searches
A search peer uses two sets of authorization settings:
• When processing a distributed search, the search peer uses the settings
contained in the knowledge bundle that the search head distributes to all
the search peers when it sends them a search request. These settings are
created and managed on the search head.
• When performing local activities, the search peer uses the authorization
settings created and stored locally on the search peer itself.
All authorization settings are stored in one or more authorize.conf files. This
includes settings configured through Splunk Web or the CLI. It is these
authorize.conf files that get distributed from the search head to the search
peers. On the knowledge bundle, the files are usually located in either
/etc/system/{local,default} and/or /etc/apps/<app-name>/{local,default}.
Since search peers automatically use the settings in the knowledge bundle,
things normally work fine. You configure roles for your users on the search head,
and the search head automatically distributes those configurations to the search
peers when it distributes the search itself.
With search head pooling, however, you must take care to ensure that the search
heads and the search peers all use the same set of authorize.conf file(s). For
this to happen, you must make sure:
• All search heads in the pool use the same set of authorize.conf files
• The set of authorize.conf files that the search heads use goes into the
knowledge bundle so that they get distributed to the search peers.
This topic describes the four main scenarios, based on whether your deployment
uses search head pooling, mounted bundles, both, or neither. It describes the
scenarios in order from simple to complex.
Four scenarios
What you need to do with the distributed search authorize.conf files depends on
whether your deployment implements search head pooling or mounted bundles.
The four scenarios are:
• No search head pooling, no mounted bundles
• No search head pooling, mounted bundles
• Search head pooling, no mounted bundles
• Search head pooling, mounted bundles
The first two scenarios "just work" but the last two scenarios require careful
planning. For the sake of completeness, this section describes all four scenarios.
Note: These scenarios address authorization settings for distributed search only.
Local authorization settings function the same independent of your distributed
search deployment.
No search head pooling, no mounted bundles
Whatever authorization settings you have on the search head get automatically
distributed to its search peers as part of the replicated knowledge bundle that
they receive with distributed search requests.
No search head pooling, mounted bundles
Whatever authorization settings you have on the search head get automatically
placed in the mounted bundle and used by the search peers during distributed
search processing.
Search head pooling, no mounted bundles
The search heads in the pool share their /apps and /users directories but not
their /etc/system/local directories. Any authorize.conf file in an /apps
subdirectory will be automatically shared by all search heads and included in the
knowledge bundle when any of the search heads distributes a search request to
the search peers.
The problem arises because authorization changes can also get saved to an
authorize.conf file in a search head's /etc/system/local directory (for example,
if you update the search head's authorization settings via Splunk Web). This
directory does not get shared among the search heads in the pool, but it still gets
distributed to the search peers as part of the knowledge bundle. Because of how
the configuration system works, any copy of authorize.conf file in
/etc/system/local will have precedence over a copy in an /apps subdirectory.
(See "Configuration file precedence" in the Admin manual for details.)
To avoid this problem, you need to make sure that any changes made to a
search head's /etc/system/local/authorize.conf file get propagated to all
search heads in the pool. One way to handle this is to move any changed
/etc/system/local/authorize.conf file into an app subdirectory, since all search
heads in the pool share the /apps directory.
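As a hypothetical example, using the pool location from the examples in this
manual and the search app as the destination:
mv $SPLUNK_HOME/etc/system/local/authorize.conf /mnt/search-head-pooling/etc/apps/search/local/authorize.conf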
Search head pooling, mounted bundles
This is similar to the previous scenario. The search heads in the pool share their
/apps and /users directories but not their /etc/system/local directories. Any
authorize.conf file in an /apps subdirectory will be automatically shared by all
search heads. It will also be included in the mounted bundle that the search
peers use when processing a search request from any of the search heads.
The problem again involves any authorize.conf file in a search head's
/etc/system/local directory. That directory is not shared among the search heads
in the pool, and changes to it do not get automatically distributed to the mounted
bundle that the search peers use.
Therefore, you must provide some mechanism that ensures that all the search
heads and all the search peers have access to that version of authorize.conf.
Use distributed search
Users can limit the search peers that participate in a search. They also need to
be aware of the distributed search configuration to troubleshoot.
In general, you specify a distributed search through the same set of commands
as for a local search. However, several additional commands and options are
available specifically to assist with controlling and limiting a distributed search.
A search head by default runs its searches across all its search peers.
You can limit a search to one or more search peers by specifying the
splunk_server field in your query. See "Retrieve events from indexes and
distributed search peers" in the Search manual.
In addition, the lookup command provides a local argument for use with
distributed searches. If set to true, the lookup occurs only on the search head; if
false, the lookup occurs on the search peers as well. This is particularly useful
for scripted lookups, which replicate lookup tables. See the description of lookup
in the Search Reference for details and an example.
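For example, with a hypothetical lookup named usertogroup:
... | lookup local=true usertogroup user OUTPUT group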
Troubleshoot distributed search
It's important to keep the clocks on your search heads and search peers in sync,
via NTP (network time protocol) or some similar means. If the clocks are
out-of-sync by more than a few seconds, you can end up with search failures or
premature expiration of search artifacts.
Configuration changes can take a short time to propagate from search heads to
search peers. As a result, during the time between when configuration changes
are made on the search head and when they're replicated to the search peers
(typically, not more than a few minutes), distributed searches can either fail or
provide results based on the previous configuration.
Types of configuration changes that can cause search failures are those that
involve new apps or changes to authentication.conf or authorize.conf.
Examples include:
• changing the allowed indexes for a role and then running a search as a
user within that role
• creating a new app and then running a search from within that app
Types of changes that can provide results based on the previous configuration
include changing a field extraction or a lookup table file.
Search head pooling configuration issues
When implementing search head pooling, there are a few potential issues you
should be aware of, mainly having to do with coordination among search heads.
It's important to keep the clocks on your search heads and shared storage server
in sync, via NTP (network time protocol) or some similar means. If the clocks are
out-of-sync by more than a few seconds, you can end up with search failures or
premature expiration of search artifacts.
On each search head, the user account Splunk runs as must have read/write
permissions to the files on the shared storage server.
Performance analysis
When analyzing search head pool performance, keep these factors in mind:
• Storage: The storage backing the pool must be able to handle a very high
number of IOPS. IOPS under 1000 will probably never work well.
• Network: The communication path between the backing store and the
search heads must be high bandwidth and extremely low latency. This
probably means your storage system should be on the same switch as
your search heads. WAN links are not going to work.
• Server Parallelism: Because searching results in a large number of
processes requesting a large number of files, the parallelism in the system
must be high. This can require tuning the NFS server to handle a larger
number of requests in parallel.
• Client Parallelism: The client operating system must be able to handle a
significant number of requests at the same time.
To validate the performance of your shared storage system, you can:
• Use a storage benchmarking tool, such as Bonnie++, while the file store is
not in use to validate that the IOPS provided are robust.
• Use network testing methods to determine that the roundtrip time between
search heads and the storage system is on the order of 10ms.
• Perform known simple tasks such as creating a million files and then
deleting them.
• Assuming the above tests have not shown any weaknesses, perform
some IO load generation or run the actual Splunk Enterprise load while
gathering NFS stat data to see what's happening with the NFS requests.
If searches are timing out or running slowly, you might be exhausting the
maximum number of concurrent requests supported by the NFS client. To solve
this problem, increase your client concurrency limit. For example, on a Linux NFS
client, adjust the tcp_slot_table_entries setting.
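For example, on a Linux NFS client, you might raise the limit with a command
along these lines (the value shown is illustrative):
sysctl -w sunrpc.tcp_slot_table_entries=128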
Splunk Enterprise synchronizes the search head pool storage configuration state
with the in-memory state when it detects changes. Essentially, it reads the
configuration into memory when it detects updates. When dealing either with
overloaded search pool storage or with large numbers of users, apps, and
configuration files, this synchronization process can reduce performance. To
mitigate this, the minimum frequency of reading can be increased, as discussed
in "Select timing for configuration refresh".
Warning about unique serverName attribute
Each search head in the pool must have a unique serverName attribute. Splunk
Enterprise validates this condition when each search head starts. If it finds a
problem, it generates an error message at start-up.
The most common cause of this error is that another search head in the pool is
already using the current search head's serverName. To fix the problem, change
the current search head's serverName attribute in
$SPLUNK_HOME/etc/system/local/server.conf.
There are a few other conditions that can also generate this error.
This updates the pooling.ini file with the current search head's
serverName->GUID mapping, overwriting any previous mapping.
When upgrading pooled search heads, you must copy all updated apps - even
those that ship with Splunk Enterprise (such as the Search app and the data
preview feature, which is implemented as an app) - to the search head pool's
shared storage after the upgrade is complete. If you do not, you might see
artifacts or other incorrectly-displayed items in Splunk Web.
To fix the problem, copy all updated apps from an upgraded search head to the
shared storage for the search head pool, taking care to exclude the local
sub-directory of each app.
Important: Excluding the local sub-directory of each app from the copy process
prevents the overwriting of configuration files on the shared storage with local
copies of configuration files.
Once the apps have been copied, restart Splunk Enterprise on all search heads
in the pool.
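One illustrative way to perform such a copy, assuming the pool's shared storage
is mounted at /mnt/search-head-pooling, is with rsync, whose --exclude option
skips each app's local subdirectory:
rsync -av --exclude='local' $SPLUNK_HOME/etc/apps/ /mnt/search-head-pooling/etc/apps/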