I tried your code above to see what I can find.
I’m using Ubuntu 20.04 and running these tests on my low-power laptop, which has a 4-core Intel Core i5 mobile CPU @ 1.7 GHz, so the results are quite a bit lower than what you’re seeing on your Ryzen.
I’m testing on Sanic 20.12.1, and master (21.03a). The quoted results are for 20.12, with 21.03a results in parentheses after it. I’m using just one Sanic worker.
First, using your Python (urllib3) based method, I ran 1 million requests spread across 20 processes using Python multiprocessing. It took 252.9 seconds (master: 287.9), which equates to just 3954 requests per second (master: 3484/sec). (We’ll need to look into why master is slower in this test.)
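For anyone following along, the setup I’m describing looks roughly like the sketch below. This is an assumption-laden reconstruction, not the original tester: the host, port, function names, and request split are all mine; only the urllib3 usage, the 20-process fan-out, and the per-request close come from the description above.

```python
import time
import urllib3
from multiprocessing import Pool

# Assumed values -- the original tester isn't shown in this thread.
HOST, PORT = "127.0.0.1", 8000   # 8000 is Sanic's default port (assumption)
TOTAL_REQUESTS = 1_000_000
PROCESSES = 20

def worker(n: int) -> None:
    """Issue n GET requests, closing the pool after every request so
    each one pays for a fresh TCP handshake (the http.close() behaviour
    described above)."""
    for _ in range(n):
        http = urllib3.HTTPConnectionPool(HOST, port=PORT)
        http.request("GET", "/")
        http.close()  # drop the TCP connection between requests

def run() -> float:
    """Spread TOTAL_REQUESTS evenly across PROCESSES processes and
    return the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    with Pool(PROCESSES) as pool:
        pool.map(worker, [TOTAL_REQUESTS // PROCESSES] * PROCESSES)
    return time.monotonic() - start

def throughput(total: int, seconds: float) -> float:
    """Requests per second for a completed run."""
    return total / seconds
```

With this shape, 1 million requests in 252.9 seconds works out to the ~3954 req/sec figure quoted above.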
Next, I tried exactly the same example using wrk for 60 seconds:

wrk -c 20 -d 60 -t 20 http://127.0.0.1

With this test I got 9434 requests/sec (master 9648/sec).
That’s almost 3 times faster. I think one of the differences is that each wrk worker holds its TCP connection open between requests. It looks to me like you deliberately wrote your Python tester to close the TCP connection between requests (with http.close()). This is sensible, because it simulates real-world conditions where each connection might come from a different client, so Sanic cannot reuse any existing connections.
Putting that aside, you can push the Python benchmark numbers a bit higher by reusing the connection, i.e. taking out the http.close() line. On my laptop this completes 1 million requests in 168.1 seconds (master 152.8), which comes out to 5948 requests/sec (master 6544/sec).
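The keep-alive variant of the worker would look something like this sketch (again, the names and the host/port are my assumptions; the only change from the description above is that the pool is created once and closed after the whole batch):

```python
import urllib3

def worker_keepalive(n: int) -> None:
    """Issue n GET requests over one persistent connection pool.
    maxsize=1 keeps a single socket alive, so requests after the
    first skip the TCP handshake entirely."""
    # "127.0.0.1" and port 8000 are assumptions, not from the thread.
    http = urllib3.HTTPConnectionPool("127.0.0.1", port=8000, maxsize=1)
    for _ in range(n):
        http.request("GET", "/")  # reuses the same socket each time
    http.close()  # close once, after the whole batch

def reqs_per_sec(total: int, seconds: float) -> float:
    """Requests per second for a completed run."""
    return total / seconds
```

One million requests in 168.1 seconds comes out to the ~5948 req/sec figure above, so just skipping the per-request handshake buys roughly a 50% improvement on this hardware.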
Unfortunately you can’t do the opposite test and force wrk to close connections between requests, because wrk doesn’t work that way.
All of this is a far cry from the roughly 20k requests/sec I’d expect to see from Sanic with a single worker, so I might look further into this testing methodology to see where I can find some more gains.
Testing the same code on my work computer (8-core Intel Core i9 @ 3.4 GHz):
Python urllib3 test 1 (with connection close), 1 million requests across 20 workers:
133.3 seconds total, 7.5k requests/sec (master: 164.6 sec, 6k/sec)
Python urllib3 test 2 (no connection close):
97.3 seconds, 10.2k requests/sec (master: 97.2 sec, 10.3k/sec)
wrk, 20 threads for 60 seconds:
12.5k requests/sec (master: 12.6k req/sec)
wrk 20 client threads and 8 server workers: