xrtr
would be simply “X router”, since I just made it as a proof of concept. No final name yet as no final verdict I came across on the best solution to the matter. I’ll add more thoughts replying another comment.
@ahopkins: when I pointed you towards AST, I already knew (from my benchmarks) that this could be an awesome feature for Sanic, not only because it would be pure Python (requiring no extra dependencies) but also would be in our reach to maintain.
@5onic I also like the trie
model, I just don’t know if having a pure Python (or regex mixed version) would be as fast as having code written on the fly. Moreover, perhaps we could try and mix trie
with the AST router and see if we can benefit from it - but this would make the router code pretty exclusive to maintain …
As of xrtr
, I’m playing a little bit (when possible) with pybind11, and I really believe an interesting router could be made using plain C++ with direct bindings to Python (instead of using Cython
), but, again, that would be a binary dependency for Sanic … Which I believe is a downside.
Oh, just saw that (regarding trie ideas with AST) … anding
seems like a good alternative, but the speed up is almost unnoticeable
So … A trie regex perhaps? Argh, that would be hell on earth to debug (and even harder to find a good way to write it down inside the router).
That was this test: A fAST new router
There is an ever so sight benefit of you can evaluate a few levels of the hierarchy at once. Given the scope of complexity it adds for the almost irrelevant difference…
That’s basically what hyprtr is.
At the end of the day, the AST has a lot of benefits. As much as I looked into it, @vltr, you were right.
I am working on another base version so that it can be used for both routing and also the signaling RFC.
Ah, I did not dig myself completely inside the hyprtr
code to check on that … Sorry.
Well, I’m glad all this effort we put into xrtr
, hyprtr
, the Sanic router and the benchmark suite came to some good usage inside Sanic after all
I know I’m a bit over my head with things to do but I would love to work on the new router.
So, I redid the methodology for parsing the tree. But I realized an obstacle that I am not sure how to overcome.
First of all, one thing I am trying very hard to achieve is to make this a drop in replacement for the Sanic router. Sanic will allow you to do something like this:
srouter = sanic.router.Router()
srouter.add("/foo/<id:int>", ["get"], 1)
srouter.add("/foo/<id:string>", ["get"], 2)
You can run this and see it works as expected.
class Request:
def __init__(self, path, method):
self.path = path
self.method = method
print(srouter.get(Request("/foo/1", "get")))
print(srouter.get(Request("/foo/abc", "get")))
Two routes have an identical path, but they have different regex. Therefore the Sanic router allows this.
The equivalent in Falcon does not work. This is not acceptable
frouter = falcon.routing.CompiledRouter()
frouter.add_route("/foo/{id:int}", method_map=["GET"], resource=1)
frouter.add_route("/foo/{id:uuid}", method_map=["GET"], resource=1)
The way the AST style routers work is that all dynamic variables are sort of the same. There is nothing to check or compare against. Later, after the route is found it can be cast to type.
Which seems like a roadblock for full compatibility.
Or is it?
Perhaps we can merge them with two optional casts, and it can try both. Of course the idea popped into my head after I started writing this…
I am going to leave the post anyway. Will report back.
Okay, ignore that last post. It is feasible. Not thrilled about adding a regex.match
, but it works. This is definitely something to optimize at a further point.
def find_route(path, router, basket):
parts = path.split(router.delimiter)
num = len(parts)
if num > 0:
if parts[0] == "":
if num > 1 and REGEX_TYPES["int"][1].match(parts[1]):
basket[1] = parts[1]
return router.dynamic_routes["/<foo:int>"], basket
elif num > 1 and REGEX_TYPES["string"][1].match(parts[1]):
basket[1] = parts[1]
return router.dynamic_routes["/<foo>"], basket
Here is most recent version: https://gist.github.com/ahopkins/9b91b950c94c1bc3c24df741b7fb02ef#file-v2-py
So, quick explanation.
The new version is generating a tree that looks like this:
Node(level=0)
Node(part=a, level=1)
Node(part=much, level=2)
Node(part=longer, level=3)
Node(part=item, level=4)
Node(part=<id>, level=5, route=<Route: /a/much/longer/item/<id>>, dynamic=True)
Node(part=<longer>, level=3, dynamic=True)
Node(part=item, level=4)
Node(part=<id>, level=5, route=<Route: /a/much/<longer>/item/<id>>, dynamic=True)
Node(part=thing, level=2)
Node(part=<id>, level=3, route=<Route: /a/thing/<id>>, dynamic=True)
Node(part=and, level=4)
Node(part=doit, level=5, route=<Route: /a/thing/<id>/and/doit>)
Node(part=<banana>, level=2, route=<Route: /a/<banana>>, dynamic=True)
Node(part=<foo>, level=1, route=<Route: /<foo>>, dynamic=True)
Node(part=bar, level=2, route=<Route: /<foo>/bar>)
Node(part=buzz, level=2, route=<Route: /<foo>/buzz>)
Node(part=fuzz, level=2, route=<Route: /<foo>/fuzz>)
Node(part=<foo:int>, level=1, route=<Route: /<foo:int>>, dynamic=True)
It is strictly ordered, so it does not matter at all the order of inserting new routes.
As you can see from this example, there does exist the potential to collapse some layers together where you have multiple consecutive static path parts.
Wow, nice stuff @ahopkins could you please attach examples how to use the v2 router? I’ll also play with it.
For now… this API will get cleaned up.
from functools import lru_cache
import random
# This would be the router used by Sanic still need to add some cleanup here
class Router(BaseRouter):
@lru_cache
def get(self, path, *args):
return self.resolve(path, *args)
router = Router()
routes = [
"/hello/world",
"/<foo>",
"/a/<banana>",
"/a/thing/<id>/and/doit",
"/a/thing/<id>",
"/<foo>/fuzz",
"/<foo>/buzz",
"/<foo>/bar",
"/a/much/longer/item/<id>",
"/a/much/<longer>/item/<id>",
"/<foo:int>",
]
# To prove that input order does not matter
random.shuffle(routes)
for i, r in enumerate(routes):
router.add(r, i)
# These will probably be abstracted away
router._generate_tree()
router._render()
# This is a debug feature to show the tree
router.tree.display()
print("===")
# Output the generated source code
print(router.find_route_src)
print("===")
# See the routes that were attached
print(router.static_routes)
print(router.dynamic_routes)
# Do some sample fetches
print(router.get("/hello/world"))
print(router.get("/hello"))
print(router.get("/hello/bar"))
print(router.get("/a/bar"))
print(router.get("/a/thing/bar"))
print(router.get("/a/thing/bar/and/doit"))
I tried the new version of ast router and add two route like this:
routes = ["/foo/<some:int>/bar", "/foo/<some:ymd>/bar"]
the find_route_src
shows that the def find_route
looks like this
def find_route(path, router, basket):
parts = path.split(router.delimiter)
num = len(parts)
if num > 0:
if parts[0] == "":
if num > 1:
if parts[1] == "foo":
if num > 2:
basket[2] = parts[2]
if num > 3:
if parts[3] == "bar":
return router.dynamic_routes["/foo/<some:int>/bar"], basket
basket[2] = parts[2]
if num > 3:
if parts[3] == "bar":
return router.dynamic_routes["/foo/<some:ymd>/bar"], basket
raise NotFound
So if the request url is /foo/2020-08-29/bar
, the method won’t return the right handler and basket.
The method can match the dynamic part earlier but it looks very inefficient
def find_route(path, router, basket):
parts = path.split(router.delimiter)
num = len(parts)
if num > 0:
if parts[0] == "":
if num > 1:
if parts[1] == "foo":
if num > 2:
basket[2] = parts[2]
if num > 3:
if parts[3] == "bar":
if num > 4:
if router.dynamic_routes["/foo/<some:int>/bar"].params[4].pattern.search(parts[4]):
return router.dynamic_routes["/foo/<some:int>/bar"], basket
basket[2] = parts[2]
if num > 3:
if parts[3] == "bar":
if num > 4:
if router.dynamic_routes["/foo/<some:ymd>/bar"].params[4].pattern.search(parts[4]):
return router.dynamic_routes["/foo/<some:ymd>/bar"], basket
raise NotFound
Falcon
didn’t support the routes like the these , But some framwork did.
xrtr
seems did’t support type match for dynamic part.
So will the ast route drop support for this scenario?
A few things:
- I have another version that does handle this scenario, however…
- Doing so has its own set of disadvantages, namely that it adds some regex into
find_route
- I am preparing and RFC right now. This is one of the big questions that needs to be answered because as I see it there are two directions to go. One favoring performance, the other compatability. Since this is a valid route scheme in Sanic today, dropping this support will break some applications for sure.
Great. this should be seriously considered
I have pushed the repo that I have been working on.
The goal is to get this integrated into master
branch in the next week or two to try and make it for v21.3. Let me know your thoughts.
100% test coverage
Lots of cleanup to still do, but we are getting closer. Also, this was a nice side-by-side comparison of the router change.
In Version 20.12
-----------------------------------------------------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers OPS (Mops/s) Rounds Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_resolve_route_no_arg_string_path 183.5220 (1.0) 260.4150 (1.0) 193.0517 (1.0) 6.9385 (1.0) 191.8610 (1.0) 3.9795 (1.0) 134;97 5.1800 (1.0) 1000 1000
test_resolve_route_with_typed_args 184.8240 (1.01) 590.3360 (2.27) 201.4882 (1.04) 30.8262 (4.44) 194.9180 (1.02) 4.0070 (1.01) 40;171 4.9631 (0.96) 1000 1000
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
In new router branch
----------------------------------------------------------------------------------------------- benchmark: 2 tests -----------------------------------------------------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers OPS (Mops/s) Rounds Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_resolve_route_no_arg_string_path 104.7200 (1.0) 192.7400 (1.0) 109.5443 (1.0) 7.1217 (2.22) 108.8200 (1.0) 0.3000 (1.0) 20;202 9.1287 (1.0) 1000 1000
test_resolve_route_with_typed_args 117.8700 (1.13) 193.0490 (1.00) 126.3849 (1.15) 3.2035 (1.0) 127.1450 (1.17) 3.0255 (10.08) 156;16 7.9123 (0.87) 1000 1000
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Here are some overall benchmarks.
Version 20.12
Running 30s test @ http://127.0.0.1:3333
12 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 4.48ms 5.95ms 131.35ms 91.06%
Req/Sec 10.21k 1.94k 56.63k 76.23%
3659132 requests in 30.09s, 411.78MB read
Requests/sec: 121614.26
Transfer/sec: 13.69MB
Current master
Running 30s test @ http://127.0.0.1:3333
12 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 3.81ms 4.55ms 107.27ms 90.82%
Req/Sec 11.22k 2.29k 36.57k 74.27%
4014399 requests in 30.09s, 394.33MB read
Requests/sec: 133394.06
Transfer/sec: 13.10MB
New router branch
Running 30s test @ http://127.0.0.1:3333
12 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 3.74ms 4.60ms 102.41ms 90.73%
Req/Sec 11.60k 2.61k 46.51k 75.23%
4148337 requests in 30.08s, 407.48MB read
Requests/sec: 137923.77
Transfer/sec: 13.55MB
For the most part, we will see little change in the new routing branch because of route caching. The most noticeable place that we will see the increase is when matching on dynamic routes that are not cacheable.