A fAST new router

First of all nice stuff @ahopkins but I don’t have much time to fully check your POC, but in the first glance it looks nice, also as a contributor of Falcon I can confirm that Falcon uses AST stuff for routing.

Am I wondering whether we did the research for that case? for example to find some Data Structure which will satisfy our needs. I have been thinking about Prefix-Tree but right now, I’m not sure that suitable for us, need to do some research. Becase in our case, we also need to handle url parameters, segments etc.

1 Like

Thanks for the heads up. I agree that the data structure here needs some consideration. One of the things that both htprtr and xrtr do for optimization is exactly that. I agree that it should be worked in here. Also, I am using a simple structure, as opposed to what it looks like Falcon does with some custom classes. Something more robust is probably needed since there is a lot of meta-data per node level that needs to be considered, and the current implementation does not take some of this into consideration to optimize the structure (which is the errors being experienced above).

and the current implementation does not take some of this into consideration to optimize the structure (which is the errors being experienced above).

No problems at all here, just want to say that for the future implementation we should keep in mind the performance/easy scaling of the router, etc.

Also, what htprtr and xrtr means?

@vltr will need to explain xrtr. I imagine “extreme router”?

hyprtr is “hyperscan router”

I have some more tests to check performance on different ways to potentially handle tries to see if it is indeed faster than nested if/else statements.

A python data struct would be much more clear for me.The AST is also great but with less readable. Although the idea of AST is blowing my mind. Great work again.

I tried a few types of simple tests that could be used with a trie.

  • control - just continue if/elif with no trie
  • anding - if parts[2] == "bar" and parts[3] == "baz"
  • concat - if parts[2] + "/" + parts[3] == "bar/baz"
  • concat_s - if "%s/%s" % (parts[2], parts[3]) == "bar/baz"
  • concat_f - if f"{parts[2]}/{parts[3]}" == "bar/baz"

Results:

Timy executed (control) for 1000000 times in 0.995399 seconds
Timy best time was 0.000001 seconds
Timy executed (anding) for 1000000 times in 0.977669 seconds
Timy best time was 0.000001 seconds
Timy executed (concat) for 1000000 times in 1.208060 seconds
Timy best time was 0.000001 seconds
Timy executed (concat_s) for 1000000 times in 1.320606 seconds
Timy best time was 0.000001 seconds
Timy executed (concat_f) for 1000000 times in 1.247579 seconds
Timy best time was 0.000001 seconds

The take away is that using a trie may ever so incredibly slightly impact performance. I am debating whether that is even worth the extra effort. 18/1000 of a second over a million runs.

Granted, this was not a super nested structure. As the complexity increases, so should the theoretical difference.

Hm, good tests, also after yesterday discussing here, I think probably AST will be slightly faster than Trie, but the readability of Trie will be fairly better.

One more important point - AST will be harder to debug.

In general - hard choice :confounded:

The question on the last one was whether to build out a trie inside the AST. :woozy_face:

I do not think the AST-based router will be too hard to debug. Clearly we need a heavy test suite against it to cover as many edge cases as possible. But, in the end it is just Python code that can be output, viewed, and run like anything else.

Yes, agree! also would be cool to hear @vltr @ashleysommer thoughts.

1 Like

xrtr would be simply “X router”, since I just made it as a proof of concept. No final name yet as no final verdict I came across on the best solution to the matter. I’ll add more thoughts replying another comment.

@ahopkins: when I pointed you towards AST, I already knew (from my benchmarks) that this could be an awesome feature for Sanic, not only because it would be pure Python (requiring no extra dependencies) but also would be in our reach to maintain.

@5onic I also like the trie model, I just don’t know if having a pure Python (or regex mixed version) would be as fast as having code written on the fly. Moreover, perhaps we could try and mix trie with the AST router and see if we can benefit from it - but this would make the router code pretty exclusive to maintain …

As of xrtr, I’m playing a little bit (when possible) with pybind11, and I really believe an interesting router could be made using plain C++ with direct bindings to Python (instead of using Cython), but, again, that would be a binary dependency for Sanic … Which I believe is a downside.

Oh, just saw that (regarding trie ideas with AST) … anding seems like a good alternative, but the speed up is almost unnoticeable :man_shrugging:

So … A trie regex perhaps? Argh, that would be hell on earth to debug (and even harder to find a good way to write it down inside the router).

That was this test: A fAST new router

There is an ever so sight benefit of you can evaluate a few levels of the hierarchy at once. Given the scope of complexity it adds for the almost irrelevant difference… :balance_scale:

That’s basically what hyprtr is.

At the end of the day, the AST has a lot of benefits. As much as I looked into it, @vltr, you were right.

I am working on another base version so that it can be used for both routing and also the signaling RFC.

1 Like

Ah, I did not dig myself completely inside the hyprtr code to check on that … Sorry.

Well, I’m glad all this effort we put into xrtr, hyprtr, the Sanic router and the benchmark suite came to some good usage inside Sanic after all :muscle:

I know I’m a bit over my head with things to do but I would love to work on the new router.

3 Likes

So, I redid the methodology for parsing the tree. But I realized an obstacle that I am not sure how to overcome.

First of all, one thing I am trying very hard to achieve is to make this a drop in replacement for the Sanic router. Sanic will allow you to do something like this:

srouter = sanic.router.Router()
srouter.add("/foo/<id:int>", ["get"], 1)
srouter.add("/foo/<id:string>", ["get"], 2)

You can run this and see it works as expected.

class Request:
    def __init__(self, path, method):
        self.path = path
        self.method = method

print(srouter.get(Request("/foo/1", "get")))
print(srouter.get(Request("/foo/abc", "get")))

Two routes have an identical path, but they have different regex. Therefore the Sanic router allows this.

The equivalent in Falcon does not work. This is not acceptable

frouter = falcon.routing.CompiledRouter()
frouter.add_route("/foo/{id:int}", method_map=["GET"], resource=1)
frouter.add_route("/foo/{id:uuid}", method_map=["GET"], resource=1)

The way the AST style routers work is that all dynamic variables are sort of the same. There is nothing to check or compare against. Later, after the route is found it can be cast to type.

Which seems like a roadblock for full compatibility.


Or is it?

Perhaps we can merge them with two optional casts, and it can try both. Of course the idea popped into my head after I started writing this…

I am going to leave the post anyway. Will report back.

Okay, ignore that last post. It is feasible. Not thrilled about adding a regex.match, but it works. This is definitely something to optimize at a further point.

def find_route(path, router, basket):
    parts = path.split(router.delimiter)
    num = len(parts)
    if num > 0:
        if parts[0] == "":
            if num > 1 and REGEX_TYPES["int"][1].match(parts[1]):
                basket[1] = parts[1]
                return router.dynamic_routes["/<foo:int>"], basket
            elif num > 1 and REGEX_TYPES["string"][1].match(parts[1]):
                basket[1] = parts[1]
                return router.dynamic_routes["/<foo>"], basket

Here is most recent version: https://gist.github.com/ahopkins/9b91b950c94c1bc3c24df741b7fb02ef#file-v2-py

2 Likes

So, quick explanation.

The new version is generating a tree that looks like this:

 Node(level=0)
     Node(part=a, level=1)
         Node(part=much, level=2)
             Node(part=longer, level=3)
                 Node(part=item, level=4)
                     Node(part=<id>, level=5, route=<Route: /a/much/longer/item/<id>>, dynamic=True)
             Node(part=<longer>, level=3, dynamic=True)
                 Node(part=item, level=4)
                     Node(part=<id>, level=5, route=<Route: /a/much/<longer>/item/<id>>, dynamic=True)
         Node(part=thing, level=2)
             Node(part=<id>, level=3, route=<Route: /a/thing/<id>>, dynamic=True)
                 Node(part=and, level=4)
                     Node(part=doit, level=5, route=<Route: /a/thing/<id>/and/doit>)
         Node(part=<banana>, level=2, route=<Route: /a/<banana>>, dynamic=True)
     Node(part=<foo>, level=1, route=<Route: /<foo>>, dynamic=True)
         Node(part=bar, level=2, route=<Route: /<foo>/bar>)
         Node(part=buzz, level=2, route=<Route: /<foo>/buzz>)
         Node(part=fuzz, level=2, route=<Route: /<foo>/fuzz>)
     Node(part=<foo:int>, level=1, route=<Route: /<foo:int>>, dynamic=True)

It is strictly ordered, so it does not matter at all the order of inserting new routes.

As you can see from this example, there does exist the potential to collapse some layers together where you have multiple consecutive static path parts.

2 Likes

Wow, nice stuff @ahopkins could you please attach examples how to use the v2 router? I’ll also play with it.