Sanic server timeout issue

Hi, I am using Sanic (v0.8.3) with websockets and running the web server with:

app.run(host=args.host, port=args.port, workers=args.workers, debug=args.debug, protocol=WebSocketProtocol)

But I am getting ServiceUnavailable('Response Timeout').
Attached is the stack trace of the error:

ERROR:root:Traceback (most recent call last):
File "/home/deploy/.local/share/virtualenvs/computron-n7i6-y73/lib/python3.6/site-packages/sanic/server.py", line 174, in response_timeout_callback
raise ServiceUnavailable('Response Timeout')
sanic.exceptions.ServiceUnavailable: Response Timeout

The server works at first, but this issue comes up abruptly after some time.
Please help.

How is your Sanic app deployed?

It's deployed using supervisorctl on a Linux machine. The timeout mostly appears after prolonged usage or when I load-test the APIs/websockets.
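
The supervisor entry is along these lines (the program name, script name, and arguments below are placeholders, not the real config):

[program:computron]
; Placeholder paths and arguments -- the real config differs
command=/home/deploy/.local/share/virtualenvs/computron-n7i6-y73/bin/python server.py --host 0.0.0.0 --port 8000 --workers 2
directory=/home/deploy/computron
autostart=true
autorestart=true
stderr_logfile=/var/log/computron.err.log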

Are you making any modifications to the Protocol? Are you using the latest version of Sanic (0.8.3)?

I ran pip freeze in the virtual environment. Here is the relevant output:

sanic==0.8.3
Sanic-Cors==0.9.7
Sanic-Plugins-Framework==0.6.5
sanic-prometheus==0.1.4
sanic-sentry==0.1.7

And I have implemented the websockets using a Blueprint, running the app server the way I mentioned above.
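
Roughly, the setup looks like this (a simplified sketch; the blueprint name, route, and handler body are placeholders rather than my actual code):

from sanic import Sanic, Blueprint
from sanic.websocket import WebSocketProtocol

# Placeholder blueprint/route names -- the real app differs
ws_bp = Blueprint('ws')

@ws_bp.websocket('/feed')
async def feed(request, ws):
    # Echo frames back to the client
    while True:
        msg = await ws.recv()
        await ws.send(msg)

app = Sanic()
app.blueprint(ws_bp)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000, workers=2, protocol=WebSocketProtocol)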

Thanks @kingshukb. I really would like to help you figure this one out.

I need to check this for sure, but I believe the problem lies in using WebSocketProtocol inside the run method. Basically, _last_request_time gets set by execute_request_handler upon response, but the request_timeout_callback only executes when there is no websocket connection made.

Therefore, it looks to me like there is some sort of failure to create a handshake and upgrade the connection to a websocket.
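
In the meantime, as a diagnostic (not a fix), you could try raising the relevant timeouts to confirm it really is the response timer firing on long-lived connections. A minimal sketch using Sanic's standard config keys; the values below are arbitrary:

from sanic import Sanic
from sanic.websocket import WebSocketProtocol

app = Sanic()

# Arbitrary values, for diagnosis only
app.config.REQUEST_TIMEOUT = 120     # seconds allowed to receive the full request
app.config.RESPONSE_TIMEOUT = 300    # seconds a handler may take before ServiceUnavailable is raised
app.config.KEEP_ALIVE_TIMEOUT = 10   # seconds an idle keep-alive connection is held open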

Are you able to establish a connection? What type of client are you using? Are you passing any Sec-Websocket-Protocol headers?

Hi @ahopkins, here are sample headers from the websocket connection:

<CIMultiDict('upgrade': 'websocket', 'connection': 'upgrade', 'x-forwarded-proto': 'http', 'pragma': 'no-cache', 'cache-control': 'no-cache', 'sec-websocket-key': '22zWPg1YfvhFKStmRygSrA==', 'sec-websocket-version': '13', 'sec-websocket-extensions': 'x-webkit-deflate-frame', 'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.1.2 Safari/605.1.15')>

As I said earlier, the websocket connection works fine for the initial hits. The timeout issue only begins after some time: it happens very quickly when I flood the server with websocket connections during load testing, and otherwise after a while once new websocket connections are being made at a normal rate.

I am using a JavaScript client for this.
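
For what it's worth, the connection can also be reproduced outside the browser with a small Python client based on the websockets library (a minimal sketch; the URL is a placeholder):

import asyncio
import websockets  # the same library Sanic's websocket support is built on

async def probe(uri):
    # Open a connection, send one frame, and wait for the reply
    async with websockets.connect(uri) as ws:
        await ws.send('hello')
        print(await ws.recv())

asyncio.get_event_loop().run_until_complete(probe('ws://localhost:8000/feed'))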

Can you provide your load testing script?

It's a basic websocket load-testing script using Artillery:

# https://artillery.io/docs/ws-reference/

config:
  target: "ws://XXXX/"
  phases:
    - duration: 20
      arrivalRate: 2000
  ws:
    # Ignore SSL certificate errors
    # - useful in development with self-signed certs
    rejectUnauthorized: false
scenarios:
  - engine: "ws"
    flow:
      - send: "hello"
      - think: 20
      - send: "world"

Normal connections are created using the JavaScript client.

Thanks. I want to play around a little on this tomorrow. I’ll get back to you then.

Hi, any progress regarding the issue? I am having a difficult time figuring this out.

I am trying this on a very vanilla install of Sanic websockets.

from sanic import Sanic
from sanic.websocket import WebSocketProtocol

app = Sanic()


@app.websocket('/feed')
async def feed(request, ws):
    while True:
        data = 'hello!'
        print('Sending: ' + data)
        await ws.send(data)
        data = await ws.recv()
        print('Received: ' + data)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000, protocol=WebSocketProtocol)

Running this on my 4-core i7-7700 CPU @ 3.60GHz with 48 GB of memory.

artillery run sample.yml

# sample.yml
config:
  target: "ws://localhost:8000/feed"
  phases:
    - duration: 20
      arrivalRate: 10
  ws:
    rejectUnauthorized: false
scenarios:
  - engine: "ws"
    flow:
      - send: "hello"
      - think: 20
      - send: "world"

The results seem okay.

Started phase 0, duration: 20s @ 00:13:24(+0200) 2018-12-31
Report @ 00:13:34(+0200) 2018-12-31
  Scenarios launched:  99
  Scenarios completed: 0
  Requests completed:  99
  RPS sent: 10.02
  Request latency:
    min: 0.1
    max: 0.8
    median: 0.2
    p95: 0.2
    p99: 0.5
  Codes:
    0: 99

Report @ 00:13:44(+0200) 2018-12-31
  Scenarios launched:  101
  Scenarios completed: 0
  Requests completed:  101
  RPS sent: 10.1
  Request latency:
    min: 0.1
    max: 1.6
    median: 0.1
    p95: 0.2
    p99: 1.5
  Codes:
    0: 101

Report @ 00:13:54(+0200) 2018-12-31
  Scenarios launched:  0
  Scenarios completed: 100
  Requests completed:  100
  RPS sent: 10.11
  Request latency:
    min: 0.1
    max: 0.3
    median: 0.2
    p95: 0.3
    p99: 0.3
  Codes:
    0: 100

Report @ 00:14:04(+0200) 2018-12-31
  Scenarios launched:  0
  Scenarios completed: 100
  Requests completed:  100
  RPS sent: 10.09
  Request latency:
    min: 0.1
    max: 0.3
    median: 0.2
    p95: 0.3
    p99: 0.3
  Codes:
    0: 100

All virtual users finished
Summary report @ 00:14:04(+0200) 2018-12-31
  Scenarios launched:  200
  Scenarios completed: 200
  Requests completed:  400
  RPS sent: 10.03
  Request latency:
    min: 0.1
    max: 1.6
    median: 0.2
    p95: 0.3
    p99: 0.3
  Scenario counts:
    0: 200 (100%)
  Codes:
    0: 400

When I bump up that arrivalRate to the 2000 that you have in your config, I get:

Warning: 
CPU usage of Artillery seems to be very high (pids: 24059)
which may severely affect its performance.
See https://artillery.io/docs/faq/#high-cpu-warnings for details.

Which leads me to my next question: what kind of hardware are you running it on? Creating 2000 new connections every second for 20 seconds is a lot of simultaneous connections, requiring more resources than my beefy laptop can handle.

It seems like you may be running into hardware limits and need a strategy to achieve greater horizontal scaling.
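
As a starting point (a sketch, not a definitive tuning recommendation), make sure each instance runs one worker per core, since you are already passing workers into run; beyond that, scaling generally means more instances behind a load balancer:

import multiprocessing

from sanic import Sanic
from sanic.websocket import WebSocketProtocol

app = Sanic()

if __name__ == '__main__':
    # One worker process per CPU core on this box; further capacity
    # comes from adding instances horizontally.
    app.run(host='0.0.0.0', port=8000,
            workers=multiprocessing.cpu_count(),
            protocol=WebSocketProtocol)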


Questions:

  1. What hardware are you running?
  2. What target level of throughput/traffic are you aiming for?

Hi @ahopkins,
Thanks immensely for your time. Regarding the questions that you had:

  1. Hardware config: 8 GB RAM, 4 GB swap, 2-core CPU (Intel Xeon Platinum 8000 series (Skylake-SP) processor; refer: https://aws.amazon.com/ec2/instance-types/m5/). Basically I am using m5.large EC2 instances from AWS.
  2. Target traffic: I am aiming to support a final throughput of 10k user requests/sec, and have started with 2k users initially.

Let me try to upgrade my system, do another round of load testing, and get back to you then.

I just have one doubt:



Are these related issues?

I’ll get back with a more thorough response to this later when I have more time.

Any results? What type of system did you use?

Hey, any news related to this issue? I had the same problem with the WebSocket protocol, but running behind Nginx.

@ahopkins do I need to provide some additional data?

What version of Sanic?