Making Sanic even faster

There are a number of ideas floating around on where and how to make Sanic faster. I want to start compiling them here.

I did some pretty crude initial testing to see where time is being spent (I will try to do something more sophisticated later). Here are the results:

Timing: _run_request_middleware
	 6.67572021484375
Timing: self.router.get
	 47.92213439941406
Timing: handler
	 22.411346435546875
Timing: _run_response_middleware
	 2.86102294921875
Timing: write_callback
	 97.99003601074219

Timer results:
	_run_request_middleware: 3.75%
	self.router.get: 26.94%
	handler: 12.6%
	_run_response_middleware: 1.61%
	write_callback: 55.09%
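
For anyone who wants to reproduce this kind of crude measurement, here is a minimal sketch of how per-stage timings could be collected and turned into percentages. The `timed` and `report` helpers are hypothetical names (they are not part of Sanic), and the stage inside the `with` block is only a stand-in for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated elapsed time per label (seconds, as measured by perf_counter).
timings = defaultdict(float)


@contextmanager
def timed(label):
    """Add the wall-clock time spent inside the block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] += time.perf_counter() - start


def report(results):
    """Return each label's share of the total, as a percentage."""
    total = sum(results.values())
    return {label: round(100 * spent / total, 2) for label, spent in results.items()}


# Stand-in for a real stage such as the router lookup (assumption).
with timed("self.router.get"):
    sum(range(1000))

# The raw numbers from the post above, fed through the same helper.
raw = {
    "_run_request_middleware": 6.67572021484375,
    "self.router.get": 47.92213439941406,
    "handler": 22.411346435546875,
    "_run_response_middleware": 2.86102294921875,
    "write_callback": 97.99003601074219,
}
shares = report(raw)
```

Feeding the raw numbers through `report` gives the same percentage breakdown as the timer results above.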

This is on a cold start of a basic hello world app, so there is no LRU cache on the router and nothing complex at all. As you can see, write_callback takes up more than half of the time. This comment in the source seems to imply we may have maxed out speed in “pure python”:

This is all returned in a kind-of funky way
We tried to make this as fast as possible in pure python

Maybe we need to start looking at Cython or other utilities to get some gains in the output and also the router? Initial thoughts?

What is more important? Speed or a pure Python package?

  • speed
  • pure Python

By “pure python” I mean using the Standard Library as used in the versions of Python that are supported by Sanic (CPython 3.5, 3.6 and 3.7 for 18.12LTS, and 3.6, 3.7, 3.8 for 19.12LTS) as opposed to using extensions and pre-compiled code.

Should we compile all of Sanic with Cython, like Falcon does?

Hmmm, “pure python” … does that mean

  • CPython 3.5?
  • CPython 3.5 with special compiler flags?
  • CPython 3.8? (IIRC, 3.7 has a big speed boost)
  • CPython 3.8 with special compiler flags?
  • PyPy 3.5?
  • PyPy 3.8?
  • Cython 3.5?

I’m just thinking that since it is a slippery slope, someone desperate for speed may want to run PyPy or Cython, while another person may be constrained to whatever homebrew is serving.
(Disclosure: I just exhausted my knowledge of performance testing).

Maybe we can make a Cython version of the router and the write callback, since it’s compatible with CPython, and uvloop doesn’t work on PyPy anyway…

@vltr has a router (xrtr) that he built in cython that we could use (or with his permission, adopt). It is based on a radix tree which is very efficient in certain circumstances.
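
To make the tree-based idea concrete, here is a minimal pure Python sketch of a trie router. It is not xrtr's implementation: it branches on whole path segments instead of compressing characters the way a radix tree does, and the `:name` parameter syntax, `Node`, and `TrieRouter` are all assumptions made for illustration.

```python
class Node:
    def __init__(self):
        self.children = {}        # literal segment -> Node
        self.param_child = None   # Node reached through a ":name" segment
        self.param_name = None    # name captured when taking param_child
        self.handler = None


class TrieRouter:
    def __init__(self):
        self.root = Node()

    def add(self, path, handler):
        node = self.root
        for segment in path.strip("/").split("/"):
            if segment.startswith(":"):
                # First parameter name registered at this position wins.
                if node.param_child is None:
                    node.param_child = Node()
                    node.param_name = segment[1:]
                node = node.param_child
            else:
                node = node.children.setdefault(segment, Node())
        node.handler = handler

    def get(self, path):
        node, params = self.root, {}
        for segment in path.strip("/").split("/"):
            child = node.children.get(segment)
            if child is None:
                # Fall back to the parameter branch; no backtracking is done.
                if node.param_child is None:
                    return None, {}
                params[node.param_name] = segment
                child = node.param_child
            node = child
        return node.handler, params


router = TrieRouter()
router.add("/users/:id", "user_detail")
router.add("/users/me", "current_user")
```

Lookup cost depends on the number of segments in the path rather than on the number of registered routes, which is roughly why this family of routers scales well.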

[four benchmark diagrams; images not preserved]

I removed the routes package from the last two diagrams because it was so far off the curve that it made the rest hard to see.

As you can see, @vltr’s package appears to scale and operate faster than Sanic and is on par with (or better than) Falcon. I did some initial work to see what it would look like as a drop-in replacement. I can push that to my fork if anyone would like to see it.

Also, as a side note and reminder … uvloop and httptools (both being Sanic dependencies) are also each written with Cython.

Yeah, I remember that routing speed is one of the things Elixir’s Phoenix is very proud of. Great to see better routing in Sanic :smiley:

====
About the write callback, maybe as a first step we should convert the string variables in this block https://github.com/huge-success/sanic/blob/master/sanic/response.py#L170 to cdef str or char* types? Something like this: https://gist.github.com/greentornado/69df26f6fbe044232398137833d8e41c

I cast my vote for speed (hell, it’s the reason we’re after a fast async framework, right?), with the caveat that there should be reasonable fallbacks that keep the framework usable, if slower, in other environments. We should probably note that up front in the documentation.

Err … I know I almost “disappeared” in the last couple of months (all the problems you can imagine). Since I made xrtr and also a proof-of-concept project linking xrtr and Sanic (called sanic-boom), I did a thorough examination of a lot of routing techniques, along with some other proof-of-concepts, benchmark scripts and the pros and cons of many of these routing systems.

Here are some thoughts:

  • xrtr is based on a trie, which means it iterates over characters to find a matching route, so it can’t match against RegEx, for example, as we do today with Sanic;
  • Having a C or Cython boosted solution might be good for speed, but it can limit your plans for distribution a lot: not only CPython vs PyPy/etc, but also development dependencies or even (the same old story) having a different libc on your host machine (Alpine?). So I’m always wary when dealing with a non-pure Python solution for something;
  • Falcon’s router, which almost matches xrtr’s performance, is, incredibly enough, pure Python, with RegEx support for routes and everything else you can imagine. It just generates the source of a single, “almost” static method and evaluates it as Python, which I consider a very good strategy (AST can do wonders in this case);
  • I started working last year on a pure-C version of xrtr, but it’s in very, very early stages (the kind where you may be too embarrassed to show any source code), with bits of a trie router here and some RegEx there (where needed). I even thought about giving it just enough of a Python interface that projects other than Sanic could use it in the future;
    – I even considered integrating parts of this router with httptools, to have everything working at the C level and then just providing the interface to Python, but I’m unsure if embedding more than one functionality is good for a library (hence the UNIX philosophy);
  • From what I could see recently, there are developers still concerned about the “removing routes” feature, something that is impossible in xrtr;
  • There are some loose ends in Sanic that are strictly related to the router that we should also take a look at and figure out how to make them work without making the same design choices of the past.
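
The "generate a single function and evaluate it" strategy mentioned in the list above can be sketched roughly as follows. This is only an illustration of the general technique, not Falcon's actual code: `compile_router`, the `{name}` template syntax, and the generated `find` function are all assumptions.

```python
def compile_router(routes):
    """Build a find(path) function from a {path_template: handler} mapping.

    Templates use "{name}" for a dynamic segment (hypothetical syntax).
    """
    lines = [
        "def find(path):",
        "    parts = path.strip('/').split('/')",
        "    n = len(parts)",
    ]
    handlers = []
    for template, handler in routes.items():
        segments = template.strip("/").split("/")
        conditions = [f"n == {len(segments)}"]
        params = []
        for i, seg in enumerate(segments):
            if seg.startswith("{") and seg.endswith("}"):
                # Dynamic segment: capture it under its name.
                params.append(f"{seg[1:-1]!r}: parts[{i}]")
            else:
                # Static segment: must match exactly.
                conditions.append(f"parts[{i}] == {seg!r}")
        handlers.append(handler)
        index = len(handlers) - 1
        test = " and ".join(conditions)
        captured = ", ".join(params)
        lines.append(f"    if {test}:")
        lines.append(f"        return handlers[{index}], {{{captured}}}")
    lines.append("    return None, {}")

    # Evaluate the generated source once; handlers are looked up by index.
    namespace = {"handlers": handlers}
    exec("\n".join(lines), namespace)
    return namespace["find"]


find = compile_router({"/users/{id}": "user_detail", "/health": "health_check"})
```

All the work of interpreting the route templates happens once, at compile time, so each request only runs a flat series of comparisons. A real implementation would presumably nest the branches by shared prefix instead of testing every route in turn.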

Well, I think I’m out of thoughts but it’s also enough to heat up the conversation I think :wink:

Cheers!

Okay, so I’ve been going back and forth on this in my head and trying to dig into various options. I think I’ve got a potential solution in the works leveraging hyperscan.

I’m building a POC router that will match the core of the Sanic router to keep compatibility. Assuming it all goes well, I’ll publish it to get your feedback. I’m hesitant for now just in case I make a flop of it, but initial testing looks very promising.

After I finish ASGI on #1475 this week, I’ll put some more attention on this and solicit feedback. But, I wanted to share with you all and let you know I’m working on another option.

In my benchmarks the routing (at least with static paths) is so fast that it doesn’t really even show up on profiling reports. There are no particularly big things slowing it down, but rather it is a long chain of checks and calls to get from request to handler, and then back with a response, and all those small parts have some cost, and it is not easy to rewrite all of that in C++. However, a pure Python implementation can indeed be quite fast if done correctly, for example by keeping special cases out of the fast path and making sure that most “inner loops” run within the CPython implementation and not in Python code.
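
As a rough illustration of keeping special cases out of the fast path, the sketch below serves static paths with a single dict lookup (which runs inside CPython's C code) and only falls back to regular expressions for dynamic routes. `FastPathRouter` is a hypothetical class, not Sanic's router, and it only understands bare `<name>` parameters.

```python
import re


class FastPathRouter:
    def __init__(self):
        self.static = {}    # exact path -> handler (fast path)
        self.dynamic = []   # (compiled pattern, handler) pairs (slow path)

    def add(self, path, handler):
        if "<" in path:
            # Simplification: literal parts are not escaped for the regex.
            pattern = re.sub(r"<(\w+)>", r"(?P<\1>[^/]+)", path)
            self.dynamic.append((re.compile(pattern), handler))
        else:
            self.static[path] = handler

    def get(self, path):
        # Fast path: one hash lookup, no Python-level loop.
        handler = self.static.get(path)
        if handler is not None:
            return handler, {}
        # Slow path: only reached for dynamic or unknown paths.
        for pattern, handler in self.dynamic:
            match = pattern.fullmatch(path)
            if match:
                return handler, match.groupdict()
        return None, {}


router = FastPathRouter()
router.add("/", "index")
router.add("/users/<id>", "user_detail")
```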

There are a number of calls to router.get that are unnecessary in my opinion, and also prevent things like modifying the request (see #1373 for an example). In the course of a single request, the router is activated–unnecessarily–when checking for a stream and every time accessing URL parameters.

Agreed. I don’t think there is a reason to take this all outside of Python. Reduce the number of calls, for sure. I have, however, put in A LOT of hours trying to figure out different strategies for routing. What Sanic is doing today is not terrible, but it is not great either. When I was working on the solution mentioned above using hyperscan, I was able to get benchmarks close to that of Falcon. There is performance to be had here.

:+1: on this. The biggest gains here are to identify the 99%, and pull the edge cases out. This is sort of where I left it last I was working on this issue.

Since routing is always done right after a request object is created, couldn’t we just store the results within that object? This would allow for request middleware to re-route before the handler is called. Right after router.get there is a separate call to router.is_stream_handler which could probably be combined with the former function without any performance loss even for the majority of requests which don’t have content (GET requests don’t need to check if the handler is streaming).
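
A minimal sketch of that idea, assuming toy `Router` and `Request` classes rather than Sanic's real ones: the router returns the handler, the parameters and the streaming flag in one pass, the request remembers the result, and middleware can force a new lookup by re-routing. All names here are hypothetical.

```python
from collections import namedtuple

# Everything later stages need from one routing pass (hypothetical shape).
Route = namedtuple("Route", ["handler", "params", "is_stream"])


class Router:
    """Toy router that counts how often it is consulted."""

    def __init__(self, routes):
        self.routes = routes  # path -> (handler, is_stream)
        self.lookups = 0

    def get(self, path):
        self.lookups += 1
        handler, is_stream = self.routes[path]
        return Route(handler, {}, is_stream)


class Request:
    """Toy request that resolves its route once and stores the result."""

    def __init__(self, path, router):
        self.path = path
        self._router = router
        self._route = None

    @property
    def route(self):
        if self._route is None:
            self._route = self._router.get(self.path)
        return self._route

    def reroute(self, path):
        """Let request middleware point the request at another path."""
        self.path = path
        self._route = None


router = Router({"/old": ("old_handler", False), "/new": ("new_handler", True)})
request = Request("/old", router)
```

Reading `request.route` several times costs one router lookup, and the streaming check comes along for free instead of needing a separate call.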

It should be interesting to benchmark against Falcon, especially with a realistic-sized app with dynamic routes. Has anyone got an app implemented in Sanic and Falcon that I could use?

@vltr put together a nice tool for benchmarking routing. It will generate decent-sized apps with both static and dynamic routes on the fly for testing.

I suggest you turn off kua and routes.