Sanic server timeout issue

Hi, I am using Sanic (v0.8.3) with websockets and running the web server with:

app.run(host=args.host, port=args.port, workers=args.workers, debug=args.debug, protocol=WebSocketProtocol)

But I am getting ServiceUnavailable('Response Timeout').
Attached is the stack trace of the error:

ERROR:root:Traceback (most recent call last):
File "/home/deploy/.local/share/virtualenvs/computron-n7i6-y73/lib/python3.6/site-packages/sanic/server.py", line 174, in response_timeout_callback
raise ServiceUnavailable('Response Timeout')
sanic.exceptions.ServiceUnavailable: Response Timeout

The server works at first, but this issue comes up abruptly after some time.
Please help.

How is your Sanic app deployed?

It's deployed using supervisorctl on a Linux machine. The timeout mostly appears after prolonged usage or when I load-test the APIs/websockets.
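
The supervisor entry is along these lines (the program name, script name, and arguments below are placeholders, not the real config):

[program:computron]
; Placeholder paths and arguments -- the real config differs
command=/home/deploy/.local/share/virtualenvs/computron-n7i6-y73/bin/python server.py --host 0.0.0.0 --port 8000 --workers 2
directory=/home/deploy/computron
autostart=true
autorestart=true
stderr_logfile=/var/log/computron.err.log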

Are you making any modifications to the Protocol? Are you using the latest version of Sanic (0.8.3)?

I ran pip freeze in the virtual environment. Here is the relevant output:

sanic==0.8.3
Sanic-Cors==0.9.7
Sanic-Plugins-Framework==0.6.5
sanic-prometheus==0.1.4
sanic-sentry==0.1.7

And I have implemented the websockets using a Blueprint, running the app server the way I mentioned above.
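
Roughly, the setup looks like this (a simplified sketch; the blueprint name, route, and handler body are placeholders rather than my actual code):

from sanic import Sanic, Blueprint
from sanic.websocket import WebSocketProtocol

# Placeholder blueprint/route names -- the real app differs
ws_bp = Blueprint('ws')

@ws_bp.websocket('/feed')
async def feed(request, ws):
    # Echo frames back to the client
    while True:
        msg = await ws.recv()
        await ws.send(msg)

app = Sanic()
app.blueprint(ws_bp)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000, workers=2, protocol=WebSocketProtocol)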

Thanks @kingshukb. I really would like to help you figure this one out.

I need to check this for sure, but I believe the problem lies in using WebSocketProtocol inside the run method. Basically, _last_request_time gets set by execute_request_handler upon response, but the request_timeout_callback only executes when there is no websocket connection made.

Therefore, it looks to me like there is some sort of failure to create a handshake and upgrade the connection to a websocket.
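
In the meantime, as a diagnostic (not a fix), you could try raising the relevant timeouts to confirm it really is the response timer firing on long-lived connections. A minimal sketch using Sanic's standard config keys; the values below are arbitrary:

from sanic import Sanic
from sanic.websocket import WebSocketProtocol

app = Sanic()

# Arbitrary values, for diagnosis only
app.config.REQUEST_TIMEOUT = 120     # seconds allowed to receive the full request
app.config.RESPONSE_TIMEOUT = 300    # seconds a handler may take before ServiceUnavailable is raised
app.config.KEEP_ALIVE_TIMEOUT = 10   # seconds an idle keep-alive connection is held open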

Are you able to establish a connection? What type of client are you using? Are you passing any Sec-Websocket-Protocol headers?

Hi @ahopkins, here are sample headers from the websocket connection:

<CIMultiDict('upgrade': 'websocket', 'connection': 'upgrade', 'x-forwarded-proto': 'http', 'pragma': 'no-cache', 'cache-control': 'no-cache', 'sec-websocket-key': '22zWPg1YfvhFKStmRygSrA==', 'sec-websocket-version': '13', 'sec-websocket-extensions': 'x-webkit-deflate-frame', 'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.1.2 Safari/605.1.15')>

As I said earlier, the websocket connection works fine for the initial hits. The timeout issue only begins after some time: it happens very quickly when I flood the server with websocket connections during load testing, and otherwise after a while once new websocket connections are being made at a normal rate.

I am using a JavaScript client for this.
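
For what it's worth, the connection can also be reproduced outside the browser with a small Python client based on the websockets library (a minimal sketch; the URL is a placeholder):

import asyncio
import websockets  # the same library Sanic's websocket support is built on

async def probe(uri):
    # Open a connection, send one frame, and wait for the reply
    async with websockets.connect(uri) as ws:
        await ws.send('hello')
        print(await ws.recv())

asyncio.get_event_loop().run_until_complete(probe('ws://localhost:8000/feed'))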

Can you provide your load testing script?

It's a basic websocket load-testing script using Artillery:

# https://artillery.io/docs/ws-reference/

config:
  target: "ws://XXXX/"
  phases:
    - duration: 20
      arrivalRate: 2000
  ws:
    # Ignore SSL certificate errors
    # - useful in development with self-signed certs
    rejectUnauthorized: false
scenarios:
  - engine: "ws"
    flow:
      - send: "hello"
      - think: 20
      - send: "world"

Normal connections are created using the JavaScript client.

Thanks. I want to play around a little on this tomorrow. I’ll get back to you then.

Hi, any progress regarding the issue? I am having a difficult time figuring this out.

I am trying this on a very vanilla install of Sanic websockets.

from sanic import Sanic
from sanic.websocket import WebSocketProtocol

app = Sanic()


@app.websocket('/feed')
async def feed(request, ws):
    while True:
        data = 'hello!'
        print('Sending: ' + data)
        await ws.send(data)
        data = await ws.recv()
        print('Received: ' + data)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000, protocol=WebSocketProtocol)

Running this on my 4-core i7-7700 CPU @ 3.60GHz with 48 GB of memory.

artillery run sample.yml

# sample.yml
config:
  target: "ws://localhost:8000/feed"
  phases:
    - duration: 20
      arrivalRate: 10
  ws:
    rejectUnauthorized: false
scenarios:
  - engine: "ws"
    flow:
      - send: "hello"
      - think: 20
      - send: "world"

The results seem okay.

Started phase 0, duration: 20s @ 00:13:24(+0200) 2018-12-31
Report @ 00:13:34(+0200) 2018-12-31
  Scenarios launched:  99
  Scenarios completed: 0
  Requests completed:  99
  RPS sent: 10.02
  Request latency:
    min: 0.1
    max: 0.8
    median: 0.2
    p95: 0.2
    p99: 0.5
  Codes:
    0: 99

Report @ 00:13:44(+0200) 2018-12-31
  Scenarios launched:  101
  Scenarios completed: 0
  Requests completed:  101
  RPS sent: 10.1
  Request latency:
    min: 0.1
    max: 1.6
    median: 0.1
    p95: 0.2
    p99: 1.5
  Codes:
    0: 101

Report @ 00:13:54(+0200) 2018-12-31
  Scenarios launched:  0
  Scenarios completed: 100
  Requests completed:  100
  RPS sent: 10.11
  Request latency:
    min: 0.1
    max: 0.3
    median: 0.2
    p95: 0.3
    p99: 0.3
  Codes:
    0: 100

Report @ 00:14:04(+0200) 2018-12-31
  Scenarios launched:  0
  Scenarios completed: 100
  Requests completed:  100
  RPS sent: 10.09
  Request latency:
    min: 0.1
    max: 0.3
    median: 0.2
    p95: 0.3
    p99: 0.3
  Codes:
    0: 100

All virtual users finished
Summary report @ 00:14:04(+0200) 2018-12-31
  Scenarios launched:  200
  Scenarios completed: 200
  Requests completed:  400
  RPS sent: 10.03
  Request latency:
    min: 0.1
    max: 1.6
    median: 0.2
    p95: 0.3
    p99: 0.3
  Scenario counts:
    0: 200 (100%)
  Codes:
    0: 400

When I bump up that arrivalRate to the 2000 that you have in your config, I get:

Warning: 
CPU usage of Artillery seems to be very high (pids: 24059)
which may severely affect its performance.
See https://artillery.io/docs/faq/#high-cpu-warnings for details.

Which leads me to my next question: what kind of hardware are you running it on? Creating 2000 new connections every second for 20 seconds is a lot of simultaneous connections, requiring more resources than my beefy laptop can handle.

It seems like you may be running into hardware limits and need a strategy to achieve greater horizontal scaling.
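
As a starting point (a sketch, not a definitive tuning recommendation), make sure each instance runs one worker per core, since you are already passing workers into run; beyond that, scaling generally means more instances behind a load balancer:

import multiprocessing

from sanic import Sanic
from sanic.websocket import WebSocketProtocol

app = Sanic()

if __name__ == '__main__':
    # One worker process per CPU core on this box; further capacity
    # comes from adding instances horizontally.
    app.run(host='0.0.0.0', port=8000,
            workers=multiprocessing.cpu_count(),
            protocol=WebSocketProtocol)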


Questions:

  1. What hardware are you running?
  2. What target level of throughput/traffic are you aiming for?

Hi @ahopkins,
Thanks immensely for your time. Regarding the questions that you had:

  1. Hardware config: 8 GB RAM, 4 GB swap, 2-core CPU (Intel Xeon Platinum 8000 series (Skylake-SP) processor; refer: https://aws.amazon.com/ec2/instance-types/m5/). Basically I am using m5.large EC2 instances from AWS.
  2. Target traffic: I am aiming to support a final throughput of 10k user requests/sec, and have started with 2k users initially.

Let me try to upgrade my system, do another round of load testing, and get back to you then.

I just have one doubt:



Are these related issues?

I’ll get back with a more thorough response to this later when I have more time.

Any results? What type of system did you use?

Hey, any news related to this issue? I had the same problem with the WebSocket protocol, but running behind Nginx.

@ahopkins do I need to provide some additional data?

What version of Sanic?