Avoiding the Top 10 NGINX Configuration Mistakes
When we help NGINX users who are having problems, we often see the
same configuration mistakes, over and over – sometimes even in configurations
written by fellow NGINX engineers! In this blog we look at 10 of the most
common errors, explaining what’s wrong and how to fix it.
For all but the smallest NGINX deployments, the worker_connections default
of 512 connections per worker is probably too small. Indeed, the default
nginx.conf file we distribute with NGINX Open Source binaries and NGINX Plus
increases it to 1024.
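A minimal sketch of raising the limit in the events{} context (the value shown matches the distributed default mentioned above):

events {
    worker_connections 1024;
}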
Here’s why more file descriptors (FDs) are needed as the connection count
grows: each connection from an NGINX worker process to a client or upstream
server consumes an FD. When NGINX acts
as a web server, it uses one FD for the client connection and one FD per
served file, for a minimum of two FDs per client (but most web pages are
built from many files). When it acts as a proxy server, NGINX uses one FD
each for the connection to the client and upstream server, and potentially a
third FD for the file used to store the server’s response temporarily. As a
caching server, NGINX behaves like a web server for cached responses and
like a proxy server if the cache is empty or expired.
NGINX also uses an FD per log file and a couple of FDs to communicate with
the master process, but usually these numbers are small compared to the
number of FDs used for connections and files.
UNIX offers several ways to set the number of FDs per process – for example,
the ulimit command if you start NGINX from a shell, or the init script or
systemd service manifest if you start NGINX as a service. However, the
method to use depends on how you start NGINX, whereas the
worker_rlimit_nofile directive works no matter how you start NGINX.
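A minimal sketch of setting the limit in nginx.conf (the value is illustrative and should be sized for your workload):

# Main (top-level) context of nginx.conf
worker_rlimit_nofile 2048;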
There is also a system-wide limit on the number of FDs, which you can set
with the OS’s fs.file-max sysctl parameter. It is usually large enough, but it
is worth verifying that the maximum number of file descriptors all NGINX
worker processes might use (worker_rlimit_nofile * worker_processes) is
significantly less than fs.file‑max. If NGINX somehow uses all available FDs
(for example, during a DoS attack), it becomes impossible even to log in to
the machine to fix the issue.
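As a sketch of that sizing check, with illustrative values in the main context of nginx.conf:

worker_processes     4;
worker_rlimit_nofile 2048;
# Worst case, the workers could use 4 * 2048 = 8192 FDs,
# which should stay well below the system-wide fs.file-max value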
Note that the error_log directive doesn’t apply until NGINX reads and
validates the configuration. So each time NGINX starts up or the configuration
is reloaded, it might log to the default error log location (usually
/var/log/nginx/error.log) until the configuration is validated. To change the
log location, include the -e <error_log_location> parameter on the nginx
command.
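For reference, a minimal sketch of the error_log directive itself, set in the main (top-level) context (the path and severity level shown are illustrative):

# Main context of nginx.conf
error_log /var/log/nginx/error.log warn;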
At high traffic volumes, opening a new connection for every request can
exhaust system resources and make it impossible to open connections at all.
Here’s why: for each connection the 4-tuple of source address, source port,
destination address, and destination port must be unique. For connections
from NGINX to an upstream server, three of the elements (the first, third, and
fourth) are fixed, leaving only the source port as a variable. When a
connection is closed, the Linux socket sits in the TIME‑WAIT state for two
minutes, which at high traffic volumes increases the possibility of exhausting
the pool of available source ports. If that happens, NGINX cannot open new
connections to upstream servers.
Note that the keepalive directive does not limit the total number of
connections to upstream servers that an NGINX worker process can
open – this is a common misconception. So the parameter to keepalive
does not need to be as large as you might think.
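To enable keepalive connections, include the keepalive directive in the upstream{} block; a minimal sketch (the group name, server address, and connection count are illustrative):

upstream backend {
    server 10.0.0.10:8080;
    keepalive 16;
}

When you enable keepalive connections to an upstream group, you must also include the following directives in the location that proxies to it, so that NGINX uses HTTP/1.1 for proxied requests and clears the Connection header it receives from the client: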
proxy_http_version 1.1;
proxy_set_header "Connection" "";
The mistake is to forget the “override rule” for array directives: they are
inherited from parent contexts only when the child context defines no
directives of the same type; if the child defines even one, none of the
parent’s values are inherited. Array directives can be included not only in
multiple contexts but also multiple times within a given context. Examples
include proxy_set_header and add_header – having “add” in the name of the
second makes it particularly easy to forget about the override rule.
We can illustrate how inheritance works with this example for add_header:
http {
    add_header X-HTTP-LEVEL-HEADER 1;
    add_header X-ANOTHER-HTTP-LEVEL-HEADER 1;

    server {
        listen 8080;
        # No add_header directives at the server or location level,
        # so both http-level headers are inherited and returned
        location / {
            return 200 "OK";
        }
    }

    server {
        listen 8081;
        # This add_header overrides both http-level headers
        add_header X-SERVER-LEVEL-HEADER 1;

        location / {
            # Inherits only the server-level header
            return 200 "OK";
        }

        location /test {
            # This add_header overrides the server- and http-level headers,
            # so only the location-level header is returned
            add_header X-LOCATION-LEVEL-HEADER 1;
            return 200 "OK";
        }

        location /correct {
            # To return all of the headers, repeat them all here
            add_header X-HTTP-LEVEL-HEADER 1;
            add_header X-ANOTHER-HTTP-LEVEL-HEADER 1;
            add_header X-SERVER-LEVEL-HEADER 1;
            add_header X-LOCATION-LEVEL-HEADER 1;
            return 200 "OK";
        }
    }
}
When proxy buffering is disabled, NGINX buffers only the first part of a
server’s response before starting to send it to the client, in a buffer that by
default is one memory page in size (4 KB or 8 KB depending on the
operating system). This is usually just enough space for the response
header. NGINX then sends the response to the client synchronously as it
receives it, forcing the server to sit idle as it waits until NGINX can accept the
next response segment.
There are only a small number of use cases where disabling proxy buffering
might make sense (such as long polling), so we strongly discourage
changing the default. For more information, see the NGINX Plus Admin
Guide.
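If you do have such a use case, scope the change as narrowly as possible; a minimal sketch that disables buffering only for a hypothetical long-polling endpoint (the location path and upstream name are illustrative):

location /poll {
    proxy_buffering off;
    proxy_pass http://backend;
}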
In general, the only directives you can always use safely within an if{} block
are return and rewrite. The following example uses if to detect requests
that include the X‑Test header (but this can be any condition you want to
test for). NGINX returns the 430 (Request Header Fields Too Large) error,
intercepts it at the named location @error_430 and proxies the request to
the upstream group named b.
location / {
    error_page 430 = @error_430;

    if ($http_x_test) {
        return 430;
    }

    proxy_pass http://a;
}

location @error_430 {
    proxy_pass http://b;
}
For this and many other uses of if, it’s often possible to avoid the directive
altogether. In the following example, when the request includes the X‑Test
header the map{} block sets the $upstream_name variable to b and the
request is proxied to the upstream group with that name.
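The map{} block itself, which the excerpt below omits, belongs in the http{} context; a sketch based on the description above (the exact values are assumptions):

map $http_x_test $upstream_name {
    ""      "a";   # X-Test header absent: proxy to upstream group a
    default "b";   # X-Test header present: proxy to upstream group b
}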
# ...

location / {
    proxy_pass http://$upstream_name;
}
At the risk of being obvious, the fix is to define just one health check per
upstream{} block. Here we define the health check for the upstream group
named b in a special named location, complete with appropriate timeouts
and header settings.
location / {
    proxy_set_header Host $host;
    proxy_set_header "Connection" "";
    proxy_http_version 1.1;
    proxy_pass http://b;
}

location @health_check {
    health_check;
    proxy_connect_timeout 2s;
    proxy_read_timeout 3s;
    proxy_set_header Host example.com;
    proxy_pass http://b;
}
server {
    listen 8080;

    location / {
        # …
    }

    location @health_check_b {
        health_check;
        proxy_connect_timeout 2s;
        proxy_read_timeout 3s;
        proxy_set_header Host example.com;
        proxy_pass http://b;
    }

    location @health_check_c {
        health_check;
        proxy_connect_timeout 2s;
        proxy_read_timeout 3s;
        proxy_set_header Host api.example.com;
        proxy_pass http://c;
    }

    location /api {
        api write=on;
        # directives limiting access to the API (see 'Mistake 8' below)
    }

    location = /dashboard.html {
        root /usr/share/nginx/html;
    }
}
For more information about health checks for HTTP, TCP, UDP, and gRPC
servers, see the NGINX Plus Admin Guide.
Some of the metrics expose sensitive information that can be used to attack
your website or the apps proxied by NGINX, and the mistake we sometimes
see in user configurations is failure to restrict access to the corresponding
URL. Here we look at some of the ways you can secure the metrics, using
stub_status in the first examples.
With the following configuration, anyone on the Internet can access the
metrics at http://example.com/basic_status.
server {
    listen 80;
    server_name example.com;

    location = /basic_status {
        stub_status;
    }
}
To password-protect the metrics with HTTP Basic authentication, add the
auth_basic and auth_basic_user_file directives:

server {
    listen 80;
    server_name example.com;

    location = /basic_status {
        auth_basic "closed site";
        auth_basic_user_file conf.d/.htpasswd;
        stub_status;
    }
}
If you don’t want authorized users to have to log in, and you know the IP
addresses from which they will access the metrics, another option is the
allow directive. You can specify individual IPv4 and IPv6 addresses and
CIDR ranges. The deny all directive prevents access from any other
addresses.
server {
    listen 80;
    server_name example.com;

    location = /basic_status {
        allow 192.168.1.0/24;
        allow 10.1.1.0/16;
        allow 2001:0db8::/32;
        allow 96.1.2.23/32;
        deny all;
        stub_status;
    }
}
You can also combine the two methods with the satisfy any directive, which
grants access to clients that either log in with valid credentials or connect
from an allowed address:

server {
    listen 80;
    server_name monitor.example.com;

    location = /basic_status {
        satisfy any;
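        # The remainder of this block is a sketch completing the truncated
        # example; it follows the same pattern as the NGINX Plus example below
        auth_basic "closed site";
        auth_basic_user_file conf.d/.htpasswd;
        allow 127.0.0.1;
        deny all;
        stub_status;
    }
}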
With NGINX Plus, you use the same techniques to limit access to the NGINX
Plus API endpoint (http://monitor.example.com:8080/api/ in the following
example) as well as the live activity monitoring dashboard at
http://monitor.example.com/dashboard.html.
For more information about configuring the API and dashboard, see the
NGINX Plus Admin Guide.
server {
    listen 8080;
    server_name monitor.example.com;

    satisfy any;
    auth_basic "closed site";
    auth_basic_user_file conf.d/.htpasswd;
    allow 127.0.0.1/32;
    allow 96.1.2.23/32;
    deny all;

    location = /api/ {
        api write=on;
    }

    location = /dashboard.html {
        root /usr/share/nginx/html;
    }
}
http {
    # The upstream group needs a name; "web_backend" here is illustrative
    upstream web_backend {
        ip_hash;
        server 10.10.20.105:8080;
        server 10.10.20.106:8080;
        server 10.10.20.108:8080;
    }

    server {# …}
}
But it turns out there’s a problem: all of the “intercepting” devices are on the
same 10.10.0.0/24 network, so to NGINX it looks like all traffic comes from
addresses in that CIDR range. Remember that the ip_hash algorithm hashes
the first three octets of an IPv4 address. In our deployment, the first three
octets are the same – 10.10.0 – for every client, so the hash is the same for
all of them and there’s no basis for distributing traffic to different servers.
The fix is to use the hash algorithm instead with the $binary_remote_addr
variable as the hash key. That variable captures the complete client address,
converting it into a binary representation that is 4 bytes for an IPv4 address
and 16 bytes for an IPv6 address. Now the hash is different for each
intercepting device and load balancing works as expected.
http {
    # "web_backend" is an illustrative name for the upstream group
    upstream web_backend {
        hash $binary_remote_addr consistent;
        server 10.10.20.105:8080;
        server 10.10.20.106:8080;
        server 10.10.20.108:8080;
    }

    server {# …}
}
http {
    server {
        listen 80;
        server_name example.com;

        location / {
            proxy_set_header Host $host;
            proxy_pass http://localhost:3000/;
        }
    }
}
The mistake here is to assume that because there’s only one server – and
thus no reason to configure load balancing – it’s pointless to create an
upstream{} block. In fact, an upstream{} block unlocks several features that
improve performance, as illustrated by this configuration:
http {
    upstream node_backend {
        zone upstreams 64K;
        server 127.0.0.1:3000 max_fails=1 fail_timeout=2s;
        keepalive 2;
    }

    server {
        listen 80;
        server_name example.com;

        location / {
            proxy_set_header Host $host;
            proxy_pass http://node_backend/;
            proxy_next_upstream error timeout http_500;
        }
    }
}
The zone directive establishes a shared memory zone where all NGINX
worker processes on the host can access configuration and state
information about the upstream servers. Several upstream groups can share
the zone. With NGINX Plus, the zone also enables you to use the NGINX Plus
API to change the servers in an upstream group and the settings for
individual servers without restarting NGINX. For details, see the NGINX Plus
Admin Guide.
The server directive has several parameters you can use to tune server
behavior. In this example we have changed the conditions NGINX uses to
determine that a server is unhealthy and thus ineligible to accept requests.
Here it considers a server unhealthy if a communication attempt fails even
once within each 2-second period (instead of the default of once in a 10-
second period).
For more information about resolving server addresses, see Using DNS
for Service Discovery with NGINX and NGINX Plus on our blog.
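As a sketch of what DNS-based server resolution can look like, using the NGINX Plus resolve parameter (the resolver address and hostname are illustrative, and the resolve parameter requires a shared memory zone):

# In the http{} context
resolver 10.0.0.2 valid=10s;

upstream node_backend {
    zone upstreams 64K;                       # required for the resolve parameter
    server backend.example.com:3000 resolve;  # re-resolves the hostname periodically
}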
Resources
Creating NGINX Plus and NGINX Configuration Files in the NGINX Plus
Admin Guide
Gixy, an NGINX configuration analyzer on GitHub
NGINX Amplify, which includes the Analyzer tool
To try NGINX Plus, start your free 30-day trial today or contact us to discuss
your use cases.