
Avoiding the Top 10 NGINX Configuration Mistakes

When we help NGINX users who are having problems, we often see the
same configuration mistakes we’ve seen over and over in other users’
configurations – sometimes even in configurations written by fellow NGINX
engineers! In this blog we look at 10 of the most common errors, explaining
what’s wrong and how to fix it.

1. Not enough file descriptors per worker
2. The error_log off directive
3. Not enabling keepalive connections to upstream servers
4. Forgetting how directive inheritance works
5. The proxy_buffering off directive
6. Improper use of the if directive
7. Excessive health checks
8. Unsecured access to metrics
9. Using ip_hash when all traffic comes from the same /24 CIDR block
10. Not taking advantage of upstream groups

Mistake 1: Not Enough File Descriptors per Worker


The worker_connections directive sets the maximum number of
simultaneous connections that an NGINX worker process can have open (the
default is 512). All types of connections (for example, connections with
proxied servers) count against the maximum, not just client connections. But
it’s important to keep in mind that ultimately there is another limit on the
number of simultaneous connections per worker: the operating system limit
on the maximum number of file descriptors (FDs) allocated to each process.
In modern UNIX distributions, the default limit is 1024.

For all but the smallest NGINX deployments, a limit of 512 connections per
worker is probably too small. Indeed, the default nginx.conf file we
distribute with NGINX Open Source binaries and NGINX Plus increases it
to 1024.

The common configuration mistake is not increasing the limit on FDs to at least twice the value of worker_connections. The fix is to set that value with the worker_rlimit_nofile directive in the main configuration context.

Here’s why more FDs are needed: each connection from an NGINX worker
process to a client or upstream server consumes an FD. When NGINX acts
as a web server, it uses one FD for the client connection and one FD per
served file, for a minimum of two FDs per client (but most web pages are
built from many files). When it acts as a proxy server, NGINX uses one FD
each for the connection to the client and upstream server, and potentially a
third FD for the file used to store the server’s response temporarily. As a
caching server, NGINX behaves like a web server for cached responses and
like a proxy server if the cache is empty or expired.

NGINX also uses an FD per log file and a couple of FDs to communicate with the master process, but usually these numbers are small compared to the number of FDs used for connections and files.

UNIX offers several ways to set the number of FDs per process:

The ulimit command if you start NGINX from a shell
The init script or systemd service manifest variables if you start NGINX as a service
The /etc/security/limits.conf file

However, the method to use depends on how you start NGINX, whereas
worker_rlimit_nofile works no matter how you start NGINX.
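
As a minimal sketch with illustrative values (size them for your own deployment), the two directives might look like this in nginx.conf:

# Main context: allow each worker at least twice as many FDs as connections
worker_rlimit_nofile 2048;

events {
    worker_connections 1024;
}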

There is also a system-wide limit on the number of FDs, which you can set
with the OS’s sysctl fs.file-max command. It is usually large enough, but it
is worth verifying that the maximum number of file descriptors all NGINX
worker processes might use (worker_rlimit_nofile * worker_processes) is
significantly less than fs.file‑max. If NGINX somehow uses all available FDs
(for example, during a DoS attack), it becomes impossible even to log in to
the machine to fix the issue.
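
For example, on Linux you can check the current system-wide limit like this (the reported value varies by system):

% sysctl fs.file-max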

Mistake 2: The error_log off Directive


The common mistake is thinking that the error_log off directive disables
logging. In fact, unlike the access_log directive, error_log does not take an
off parameter. If you include the error_log off directive in the
configuration, NGINX creates an error log file named off in the default
directory for NGINX configuration files (usually /etc/nginx).

We don’t recommend disabling the error log, because it is a vital source of


information when debugging any problems with NGINX. However, if storage
is so limited that it might be possible to log enough data to exhaust the
available disk space, it might make sense to disable error logging. Include
this directive in the main configuration context:

error_log /dev/null emerg;

Note that this directive doesn’t apply until NGINX reads and validates the
configuration. So each time NGINX starts up or the configuration is reloaded,
it might log to the default error log location (usually
/var/log/nginx/error.log) until the configuration is validated. To change the
log directory, include the -e <error_log_location> parameter on the nginx
command.
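
For example (the -e option is available in NGINX 1.19.5 and later; the path here is just an illustration):

% nginx -e /var/log/nginx/startup-error.log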

Mistake 3: Not Enabling Keepalive Connections to Upstream Servers

By default, NGINX opens a new connection to an upstream (backend) server
for every new incoming request. This is safe but inefficient, because NGINX
and the server must exchange three packets to establish a connection and
three or four to terminate it.

At high traffic volumes, opening a new connection for every request can
exhaust system resources and make it impossible to open connections at all.
Here’s why: for each connection the 4-tuple of source address, source port,
destination address, and destination port must be unique. For connections
from NGINX to an upstream server, three of the elements (the first, third, and
fourth) are fixed, leaving only the source port as a variable. When a
connection is closed, the Linux socket sits in the TIME‑WAIT state for two
minutes, which at high traffic volumes increases the possibility of exhausting
the pool of available source ports. If that happens, NGINX cannot open new
connections to upstream servers.

The fix is to enable keepalive connections between NGINX and upstream servers – instead of being closed when a request completes, the connection stays open to be used for additional requests. This both reduces the possibility of running out of source ports and improves performance.

To enable keepalive connections:

Include the keepalive directive in every upstream{} block, to set the number of idle keepalive connections to upstream servers preserved in the cache of each worker process.

Note that the keepalive directive does not limit the total number of
connections to upstream servers that an NGINX worker process can
open – this is a common misconception. So the parameter to keepalive
does not need to be as large as you might think.

We recommend setting the parameter to twice the number of servers listed in the upstream{} block. This is large enough for NGINX to maintain keepalive connections with all the servers, but small enough that upstream servers can process new incoming connections as well.

Note also that when you specify a load-balancing algorithm in the upstream{} block – with the hash, ip_hash, least_conn, least_time, or random directive – the directive must appear above the keepalive directive. This is one of the rare exceptions to the general rule that the order of directives in the NGINX configuration doesn’t matter.

In the location{} block that forwards requests to an upstream group, include the following directives along with the proxy_pass directive:

proxy_http_version 1.1;
proxy_set_header "Connection" "";

By default NGINX uses HTTP/1.0 for connections to upstream servers and accordingly adds the Connection: close header to the requests that it forwards to the servers. The result is that each connection gets closed when the request completes, despite the presence of the keepalive directive in the upstream{} block.

The proxy_http_version directive tells NGINX to use HTTP/1.1 instead, and the proxy_set_header directive removes the close value from the Connection header.
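
Putting the pieces together, here is a minimal sketch; the upstream name and server addresses are hypothetical:

upstream backend {
    least_conn;             # load-balancing directive goes above keepalive
    server 10.0.0.10:8080;
    server 10.0.0.11:8080;
    keepalive 4;            # twice the number of servers listed
}

server {
    listen 80;

    location / {
        proxy_http_version 1.1;
        proxy_set_header "Connection" "";
        proxy_pass http://backend;
    }
}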

Mistake 4: Forgetting How Directive Inheritance Works

NGINX directives are inherited downwards, or “outside-in”: a child context –
one nested within another context (its parent) – inherits the settings of
directives included at the parent level. For example, all server{} and
location{} blocks in the http{} context inherit the value of directives
included at the http level, and a directive in a server{} block is inherited by
all the child location{} blocks in it. However, when the same directive is
included in both a parent context and its child context, the values are not
added together – instead, the value in the child context overrides the parent
value.

The mistake is to forget this “override rule” for array directives, which can be included not only in multiple contexts but also multiple times within a given context. Examples include proxy_set_header and add_header – having “add” in the name of the second makes it particularly easy to forget about the override rule.

We can illustrate how inheritance works with this example for add_header:

http {
    add_header X-HTTP-LEVEL-HEADER 1;
    add_header X-ANOTHER-HTTP-LEVEL-HEADER 1;

    server {
        listen 8080;
        location / {
            return 200 "OK";
        }
    }

    server {
        listen 8081;
        add_header X-SERVER-LEVEL-HEADER 1;

        location / {
            return 200 "OK";
        }

        location /test {
            add_header X-LOCATION-LEVEL-HEADER 1;
            return 200 "OK";
        }

        location /correct {
            add_header X-HTTP-LEVEL-HEADER 1;
            add_header X-ANOTHER-HTTP-LEVEL-HEADER 1;
            add_header X-SERVER-LEVEL-HEADER 1;
            add_header X-LOCATION-LEVEL-HEADER 1;
            return 200 "OK";
        }
    }
}

For the server listening on port 8080, there are no add_header directives in either the server{} or location{} blocks. So inheritance is straightforward and we see the two headers defined in the http{} context:

% curl -is localhost:8080
HTTP/1.1 200 OK
Server: nginx/1.21.5
Date: Mon, 21 Feb 2022 10:12:15 GMT
Content-Type: text/plain
Content-Length: 2
Connection: keep-alive
X-HTTP-LEVEL-HEADER: 1
X-ANOTHER-HTTP-LEVEL-HEADER: 1
OK

For the server listening on port 8081, there is an add_header directive in the server{} block but not in its child location / block. The header defined in the server{} block overrides the two headers defined in the http{} context:

% curl -is localhost:8081
HTTP/1.1 200 OK
Server: nginx/1.21.5
Date: Mon, 21 Feb 2022 10:12:20 GMT
Content-Type: text/plain
Content-Length: 2
Connection: keep-alive
X-SERVER-LEVEL-HEADER: 1
OK

In the child location /test block, there is an add_header directive and it overrides both the header from its parent server{} block and the two headers from the http{} context:

% curl -is localhost:8081/test
HTTP/1.1 200 OK
Server: nginx/1.21.5
Date: Mon, 21 Feb 2022 10:12:25 GMT
Content-Type: text/plain
Content-Length: 2
Connection: keep-alive
X-LOCATION-LEVEL-HEADER: 1
OK

If we want a location{} block to preserve the headers defined in its parent contexts along with any headers defined locally, we must redefine the parent headers within the location{} block. That’s what we’ve done in the location /correct block:

% curl -is localhost:8081/correct
HTTP/1.1 200 OK
Server: nginx/1.21.5
Date: Mon, 21 Feb 2022 10:12:30 GMT
Content-Type: text/plain
Content-Length: 2
Connection: keep-alive
X-HTTP-LEVEL-HEADER: 1
X-ANOTHER-HTTP-LEVEL-HEADER: 1
X-SERVER-LEVEL-HEADER: 1
X-LOCATION-LEVEL-HEADER: 1
OK

Mistake 5: The proxy_buffering off Directive


Proxy buffering is enabled by default in NGINX (the proxy_buffering
directive is set to on). Proxy buffering means that NGINX stores the response
from a server in internal buffers as it comes in, and doesn’t start sending
data to the client until the entire response is buffered. Buffering helps to
optimize performance with slow clients – because NGINX buffers the
response for as long as it takes for the client to retrieve all of it, the proxied
server can return its response as quickly as possible and return to being
available to serve other requests.

When proxy buffering is disabled, NGINX buffers only the first part of a
server’s response before starting to send it to the client, in a buffer that by
default is one memory page in size (4 KB or 8 KB depending on the
operating system). This is usually just enough space for the response
header. NGINX then sends the response to the client synchronously as it
receives it, forcing the server to sit idle as it waits until NGINX can accept the
next response segment.

So we’re surprised by how often we see proxy_buffering off in configurations. Perhaps it is intended to reduce the latency experienced by clients, but the effect is negligible while the side effects are numerous: with proxy buffering disabled, rate limiting and caching don’t work even if configured, performance suffers, and so on.

There are only a small number of use cases where disabling proxy buffering
might make sense (such as long polling), so we strongly discourage
changing the default. For more information, see the NGINX Plus Admin
Guide.
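
If you do have such a use case, here is a minimal sketch that disables buffering only for a hypothetical long-polling endpoint, leaving the default in place everywhere else:

location /long-poll/ {
    # Exception: long polling needs the response forwarded as it arrives
    proxy_buffering off;
    proxy_pass http://backend;    # "backend" is a hypothetical upstream group
}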

Mistake 6: Improper Use of the if Directive


The if directive is tricky to use, especially in location{} blocks. It often
doesn’t do what you expect and can even cause segfaults. In fact, it’s so
tricky that there’s an article titled If is Evil in the NGINX Wiki, and we direct
you there for a detailed discussion of the problems and how to avoid them.

In general, the only directives you can always use safely within an if{} block
are return and rewrite. The following example uses if to detect requests
that include the X‑Test header (but this can be any condition you want to
test for). NGINX returns the 430 (Request Header Fields Too Large) error,
intercepts it at the named location @error_430 and proxies the request to
the upstream group named b.

location / {
    error_page 430 = @error_430;
    if ($http_x_test) {
        return 430;
    }

    proxy_pass http://a;
}

location @error_430 {
    proxy_pass http://b;
}

For this and many other uses of if, it’s often possible to avoid the directive altogether. In the following example, when the request includes the X‑Test header, the map{} block sets the $upstream_name variable to b and the request is proxied to the upstream group with that name.

map $http_x_test $upstream_name {
    default "b";
    ""      "a";
}

# ...

location / {
    proxy_pass http://$upstream_name;
}

Mistake 7: Excessive Health Checks


It is quite common to configure multiple virtual servers to proxy requests to
the same upstream group (in other words, to include the identical
proxy_pass directive in multiple server{} blocks). The mistake in this
situation is to include a health_check directive in every server{} block. This
just creates more load on the upstream servers without yielding any
additional information.

At the risk of being obvious, the fix is to define just one health check per
upstream{} block. Here we define the health check for the upstream group
named b in a special named location, complete with appropriate timeouts
and header settings.

location / {
    proxy_set_header Host $host;
    proxy_set_header "Connection" "";
    proxy_http_version 1.1;
    proxy_pass http://b;
}

location @health_check {
    health_check;
    proxy_connect_timeout 2s;
    proxy_read_timeout 3s;
    proxy_set_header Host example.com;
    proxy_pass http://b;
}

In complex configurations, it can further simplify management to group all health-check locations in a single virtual server along with the NGINX Plus API and dashboard, as in this example.

server {
    listen 8080;

    location / {
        # …
    }

    location @health_check_b {
        health_check;
        proxy_connect_timeout 2s;
        proxy_read_timeout 3s;
        proxy_set_header Host example.com;
        proxy_pass http://b;
    }

    location @health_check_c {
        health_check;
        proxy_connect_timeout 2s;
        proxy_read_timeout 3s;
        proxy_set_header Host api.example.com;
        proxy_pass http://c;
    }

    location /api {
        api write=on;
        # directives limiting access to the API (see 'Mistake 8' below)
    }

    location = /dashboard.html {
        root /usr/share/nginx/html;
    }
}

For more information about health checks for HTTP, TCP, UDP, and gRPC
servers, see the NGINX Plus Admin Guide.

Mistake 8: Unsecured Access to Metrics


Basic metrics about NGINX operation are available from the Stub Status
module. For NGINX Plus, you can also gather a much more extensive set of
metrics with the NGINX Plus API. Enable metrics collection by including the
stub_status or api directive, respectively, in a server{} or location{} block,
which becomes the URL you then access to view the metrics. (For the
NGINX Plus API, you also need to configure shared memory zones for the
NGINX entities – virtual servers, upstream groups, caches, and so on – for
which you want to collect metrics; see the instructions in the NGINX Plus
Admin Guide.)

Some of the metrics are sensitive information that can be used to attack
your website or the apps proxied by NGINX, and the mistake we sometimes
see in user configurations is failure to restrict access to the corresponding
URL. Here we look at some of the ways you can secure the metrics. We’ll use
stub_status in the first examples.

With the following configuration, anyone on the Internet can access the
metrics at http://example.com/basic_status.

server {
    listen 80;
    server_name example.com;

    location = /basic_status {
        stub_status;
    }
}

Protect Metrics with HTTP Basic Authentication

To password-protect the metrics with HTTP Basic Authentication, include the auth_basic and auth_basic_user_file directives. The file (here, .htpasswd) lists the usernames and passwords of clients who can log in to see the metrics:

server {
    listen 80;
    server_name example.com;

    location = /basic_status {
        auth_basic "closed site";
        auth_basic_user_file conf.d/.htpasswd;
        stub_status;
    }
}
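
You can create the password file with, for example, the htpasswd utility from the Apache httpd tools (the username here is hypothetical; -c creates the file and prompts for a password):

% htpasswd -c /etc/nginx/conf.d/.htpasswd metrics_user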

Protect Metrics with the allow and deny Directives

If you don’t want authorized users to have to log in, and you know the IP
addresses from which they will access the metrics, another option is the
allow directive. You can specify individual IPv4 and IPv6 addresses and
CIDR ranges. The deny all directive prevents access from any other
addresses.

server {
    listen 80;
    server_name example.com;

    location = /basic_status {
        allow 192.168.1.0/24;
        allow 10.1.1.0/16;
        allow 2001:0db8::/32;
        allow 96.1.2.23/32;
        deny all;
        stub_status;
    }
}

Combine the Two Methods

What if we want to combine both methods? We can allow clients to access the metrics from specific addresses without a password and still require login for clients coming from different addresses. For this we use the satisfy any directive. It tells NGINX to allow access to clients who either log in with HTTP Basic auth credentials or are using a preapproved IP address. For extra security, you can set satisfy to all to require even people who come from specific addresses to log in.

server {
    listen 80;
    server_name monitor.example.com;

    location = /basic_status {
        satisfy any;

        auth_basic "closed site";
        auth_basic_user_file conf.d/.htpasswd;
        allow 192.168.1.0/24;
        allow 10.1.1.0/16;
        allow 2001:0db8::/32;
        allow 96.1.2.23/32;
        deny all;
        stub_status;
    }
}

With NGINX Plus, you use the same techniques to limit access to the NGINX
Plus API endpoint (http://monitor.example.com:8080/api/ in the following
example) as well as the live activity monitoring dashboard at
http://monitor.example.com/dashboard.html.

This configuration permits access without a password only to clients coming from the 96.1.2.23/32 address or localhost. Because the directives are defined at the server{} level, the same restrictions apply to both the API and the dashboard. As a side note, the write=on parameter to api means these clients can also use the API to make configuration changes.

For more information about configuring the API and dashboard, see the
NGINX Plus Admin Guide.

server {
    listen 8080;
    server_name monitor.example.com;

    satisfy any;
    auth_basic "closed site";
    auth_basic_user_file conf.d/.htpasswd;
    allow 127.0.0.1/32;
    allow 96.1.2.23/32;
    deny all;

    location = /api/ {
        api write=on;
    }
    location = /dashboard.html {
        root /usr/share/nginx/html;
    }
}

Mistake 9: Using ip_hash When All Traffic Comes from the Same /24 CIDR Block

The ip_hash algorithm load balances traffic across the servers in an
upstream{} block, based on a hash of the client IP address. The hashing key
is the first three octets of an IPv4 address or the entire IPv6 address. The
method establishes session persistence, which means that requests from a
client are always passed to the same server except when the server is
unavailable.

Suppose that we have deployed NGINX as a reverse proxy in a virtual private network configured for high availability. We put various firewalls, routers, Layer 4 load balancers, and gateways in front of NGINX to accept traffic from different sources (the internal network, partner networks, the Internet, and so on) and pass it to NGINX for reverse proxying to upstream servers. Here’s the initial NGINX configuration:

http {
    upstream backend {    # an upstream block requires a name; "backend" is illustrative
        ip_hash;
        server 10.10.20.105:8080;
        server 10.10.20.106:8080;
        server 10.10.20.108:8080;
    }

    server {# …}
}

But it turns out there’s a problem: all of the “intercepting” devices are on the
same 10.10.0.0/24 network, so to NGINX it looks like all traffic comes from
addresses in that CIDR range. Remember that the ip_hash algorithm hashes
the first three octets of an IPv4 address. In our deployment, the first three
octets are the same – 10.10.0 – for every client, so the hash is the same for
all of them and there’s no basis for distributing traffic to different servers.

The fix is to use the hash algorithm instead with the $binary_remote_addr
variable as the hash key. That variable captures the complete client address,
converting it into a binary representation that is 4 bytes for an IPv4 address
and 16 bytes for an IPv6 address. Now the hash is different for each
intercepting device and load balancing works as expected.

We also include the consistent parameter to use the ketama hashing method instead of the default. This greatly reduces the number of keys that get remapped to a different upstream server when the set of servers changes, which yields a higher cache hit ratio for caching servers.

http {
    upstream backend {
        hash $binary_remote_addr consistent;
        server 10.10.20.105:8080;
        server 10.10.20.106:8080;
        server 10.10.20.108:8080;
    }

    server {# …}
}

Mistake 10: Not Taking Advantage of Upstream Groups

Suppose you are employing NGINX for one of the simplest use cases, as a
reverse proxy for a single NodeJS-based backend application listening on
port 3000. A common configuration might look like this:

http {

    server {
        listen 80;
        server_name example.com;

        location / {
            proxy_set_header Host $host;
            proxy_pass http://localhost:3000/;
        }
    }
}

Straightforward, right? The proxy_pass directive tells NGINX where to send requests from clients. All NGINX needs to do is resolve the hostname to an IPv4 or IPv6 address. Once the connection is established, NGINX forwards requests to that server.

The mistake here is to assume that because there’s only one server – and
thus no reason to configure load balancing – it’s pointless to create an
upstream{} block. In fact, an upstream{} block unlocks several features that
improve performance, as illustrated by this configuration:

http {

    upstream node_backend {
        zone upstreams 64K;
        server 127.0.0.1:3000 max_fails=1 fail_timeout=2s;
        keepalive 2;
    }

    server {
        listen 80;
        server_name example.com;

        location / {
            proxy_set_header Host $host;
            proxy_pass http://node_backend/;
            proxy_next_upstream error timeout http_500;
        }
    }
}

The zone directive establishes a shared memory zone where all NGINX
worker processes on the host can access configuration and state
information about the upstream servers. Several upstream groups can share
the zone. With NGINX Plus, the zone also enables you to use the NGINX Plus
API to change the servers in an upstream group and the settings for
individual servers without restarting NGINX. For details, see the NGINX Plus
Admin Guide.

The server directive has several parameters you can use to tune server
behavior. In this example we have changed the conditions NGINX uses to
determine that a server is unhealthy and thus ineligible to accept requests.
Here it considers a server unhealthy if a communication attempt fails even
once within each 2-second period (instead of the default of once in a 10-
second period).

We’re combining this setting with the proxy_next_upstream directive to configure what NGINX considers a failed communication attempt, in which case it passes requests to the next server in the upstream group. To the default error and timeout conditions we add http_500 so that NGINX considers an HTTP 500 (Internal Server Error) code from an upstream server to represent a failed attempt.

The keepalive directive sets the number of idle keepalive connections to
upstream servers preserved in the cache of each worker process. We
already discussed the benefits in Mistake 3: Not Enabling Keepalive
Connections to Upstream Servers.

With NGINX Plus you can configure additional features with upstream groups:

We mentioned above that NGINX Open Source resolves server hostnames to IP addresses only once, during startup. The resolve parameter to the server directive enables NGINX Plus to monitor changes to the IP addresses that correspond to an upstream server’s domain name, and automatically modify the upstream configuration without the need to restart (see the sketch after this list).

The service parameter further enables NGINX Plus to use DNS SRV records, which include information about port numbers, weights, and priorities. This is critical in microservices environments where the port numbers of services are often dynamically assigned.

For more information about resolving server addresses, see Using DNS
for Service Discovery with NGINX and NGINX Plus on our blog.

The slow_start parameter to the server directive enables NGINX Plus to gradually increase the volume of requests it sends to a server that is newly considered healthy and available to accept requests. This prevents a sudden flood of requests that might overwhelm the server and cause it to fail again.

The queue directive enables NGINX Plus to place requests in a queue when it’s not possible to select an upstream server to service the request, instead of returning an error to the client immediately.
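
As a sketch of the resolve and service parameters described above (NGINX Plus only; the resolver address and domain name are hypothetical):

resolver 10.0.0.2 valid=10s;

upstream node_backend {
    zone upstreams 64K;
    # With service=http, NGINX Plus queries the _http._tcp SRV record for
    # the name and re-resolves it periodically, updating the group in place
    server backend.example.com service=http resolve;
}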

Resources
Creating NGINX Plus and NGINX Configuration Files in the NGINX Plus
Admin Guide
Gixy, an NGINX configuration analyzer on GitHub
NGINX Amplify, which includes the Analyzer tool

To try NGINX Plus, start your free 30-day trial today or contact us to discuss
your use cases.
