Performance Decline When Using Nginx Reverse Proxy


I’m just starting to learn the Sanic framework because of its fast benchmarks. I made a simple hello-world API and served it with Gunicorn. The performance was quite good, but when I put Nginx in front of it, throughput dropped sharply. I noticed that with Nginx each Gunicorn process was limited to 1% - 4% CPU, whereas without Nginx each process could reach up to 10%. I suspect my Nginx configuration is wrong. Can anyone give me some suggestions?

Server information:

OS: Ubuntu 18.04    
Python version: 3.7.2    
Sanic version: 18.12.0    
Processor: i3-4130

Sanic + Gunicorn performance:

wrk -t8 -c1000 -d60s --timeout 2s
Running 1m test @
  8 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    29.54ms   15.13ms 175.77ms   71.23%
    Req/Sec     4.32k     1.29k   19.46k    64.77%
  2060010 requests in 1.00m, 249.50MB read
Requests/sec:  34281.64
Transfer/sec:      4.15MB

Sanic + Gunicorn + Nginx performance:

wrk -t8 -c1000 -d60s --timeout 2s
Running 1m test @
  8 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   364.78ms  271.20ms   1.39s    67.53%
    Req/Sec   370.88    251.66     3.52k    87.12%
  177223 requests in 1.00m, 30.42MB read
Requests/sec:   2948.79
Transfer/sec:    518.25KB

Sanic app:

from sanic import Sanic
from sanic.response import json

app = Sanic()
app.config.ACCESS_LOG = False

@app.route("/")  # route path is an assumption
async def test(request):
    return json({"hello": "world"})

Gunicorn command:

gunicorn --bind --workers 8 --threads 4 app:app --worker-class sanic.worker.GunicornWorker --name SanicHelloWorld

Global Nginx configuration:

worker_processes 8;
worker_rlimit_nofile 400000;
thread_pool sanic_thread_pool threads=32 max_queue=65536;
pid /run/;
include /etc/nginx/modules-enabled/*.conf;

events {
	multi_accept on;
	worker_connections 25000;
	use epoll;
	accept_mutex off;
}

http {
	access_log off;
	sendfile on;
	sendfile_max_chunk 512k;
	tcp_nopush on;
	tcp_nodelay on;
	server_names_hash_bucket_size 64;
	proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
	proxy_set_header X-Forwarded-Proto $scheme;
	proxy_set_header Host $http_host;
	proxy_set_header X-Real-IP $remote_addr;
	proxy_redirect off;

	include /etc/nginx/mime.types;
	default_type application/octet-stream;

	include /etc/nginx/conf.d/*.conf;
	include /etc/nginx/sites-enabled/*;

	upstream sanic-test {

Nginx configuration for Sanic + Gunicorn:

server {
	listen 8081;
	listen [::]:8081;


	location / {
		aio threads=sanic_thread_pool;


I ran into the same problem a few days ago and tried both Sanic and Starlette, also on Ubuntu 18.04 and Python 3.7.2, with the latest pip release of each. Running just a single bare worker gets me 12k req/s, and virtually the same with gunicorn. Adding nginx drops it to 4k req/s for both frameworks.

The only thing I can think of is that I’m using ppa:nginx/mainline. Maybe that has something to do with it?


I found a follow-up answer on this issue (link below). It suggests that nginx’s proxy_buffering limits the active connections to the real backend stack, hence the reduced performance. Try setting proxy_buffering off; for more realistic performance results.
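
For anyone who wants to try this, here is a minimal sketch of where the directive could go. It assumes the location block proxies to the sanic-test upstream named in the global configuration. The posted location block is cut off, so the proxy_pass line is a guess rather than the original setup.

```nginx
# Sketch only: meant to replace the location block inside the
# server that listens on 8081.
location / {
    # Relay upstream responses to the client as they arrive
    # instead of buffering them in nginx first.
    proxy_buffering off;

    # Assumption: "sanic-test" is the upstream from the global config.
    proxy_pass http://sanic-test;
}
```

After changing it, check the configuration with nginx -t, reload nginx, and rerun the same wrk command so the numbers are comparable.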