Shutdown Sanic server on KeepAlive Timeout (Saving Cloud Run minutes)

I want to deploy a Docker container to Google Cloud Run and use Sanic to accept GET and POST requests. Since active containers cost money, I want each container to shut down as soon as it has finished processing a request.

My use-case is a PWA (Progressive Web App) that sends a request to my Cloud Run container, where some processing takes place, and the results are returned. If my understanding is correct, when no Cloud Run container is active, Cloud Run (fully managed) will start one automatically (or more, depending on how many requests arrive) and route the request to a specific container. You pay for the time the container is active, so it is important to shut down as soon as the user disconnects.

I’m still in the testing phase, so I’m the only user. I don’t want to run out of free monthly up-time minutes, so I was thinking of setting app.config.KEEP_ALIVE_TIMEOUT to 30 or 60 seconds.
Is it possible to trigger a function when Sanic invokes a KeepAlive timeout? If so, I could call Sanic’s stop() in that function.

Or are there any other ideas for shutting down Sanic after x seconds have passed without a request?

If you don’t mind diving into the Sanic internals, there is a keep_alive_timeout_callback in Sanic’s HttpProtocol here. Evidently you can define your own HttpProtocol subclass with that callback overridden and run the server with the custom protocol, as documented here. Also, don’t forget to call the superclass’s keep_alive_timeout_callback inside your own callback so the default functionality keeps running properly as well.

The keepalive timeout will help you with individual HTTP sessions, but it’s not going to handle the non-session lifecycle, which I think is what you’re after.

Effectively, as I understand it, you are trying to shut down the app when there have been no new connections for X period of time (30 seconds, for example), correct?

Thank you for the pointer to the code and the suggestion! Will look into it in more detail.

Effectively, as I understand it, you are trying to shut down the app when there have been no new connections for X period of time (30 seconds, for example), correct?

You’re right. At the moment I’m the only user, so an individual HTTP session == the connection; however, at a later stage there will be more users.

Computation is scoped to a request

You should only expect to be able to do computation within the scope of a request: a container instance does not have any CPU available if it is not processing a request.

Stateless services

Your service should be stateless: you should not rely on the state of a service across requests because a container instance can be started or stopped at any time.

I’m not sure what you mean by “non-session lifecycle”, but I don’t have plans to do any processing unrelated to the user’s request (because Cloud Run can suddenly send a kill signal).

In my current case, I’m returning data to the user and updating a NoSQL database on Firebase. Therefore, it is possible that the database update request hasn’t finished yet when the reply is sent to the user, but that update should not take longer than a few seconds.

Cloud Run has a Docker container request timeout, but that is unaware of the internal state of the container, so it would be a hard kill. It seems safer for Sanic to decide when it should stop (a soft kill).

Basically, all the loop housekeeping that Sanic does to stay alive when not serving requests. Effectively, this is the wait period.

If you open an HTTP session, and a request is served, the session then closes. What you are proposing is that at the end of that session, the container is stopped. This may or may not be the desired effect, because from the outside, what would happen is:

user -> request (HTTP GET /)
google -> spin up container
container -> start app
container -> receive request
container -> serve response
user -> response received
container -> app exit
container -> terminate on exit

This would be the lifecycle for the entire app, ad aeternum. The HTTP session is normally relatively short-lived; however, the keepalive may need to be longer for a number of reasons, such as serving large files or waiting on a backend request to be fulfilled.

By the same token, someone may want to shorten the keepalive because they don’t want the user to wait too long. This can be handled gracefully on the front end, especially if API calls are being made.

So the keepalive timeout is not really what you want, per se. If you are okay with handling one request at a time, forever, then you probably want to simply call app.stop() after the response is complete. Unfortunately, right now that would mean hooking into the response lifecycle in a way that is not supported by default: middleware currently only supports modifying the response before it is sent, not after.

Possibly (and this is where the “how” gets hazy) you could estimate how long you want to wait for the request to complete, then fire off an async call that waits that long and calls app.stop(). But that’s still effectively a hard kill; I don’t believe app.stop() gracefully handles open requests. It isn’t much better than Google’s request timeout, other than that you can fire it off as part of your response.
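A rough sketch of that “fire off an async call” idea with plain asyncio. The stop() function here is a stand-in for Sanic’s app.stop(), and the grace period is a guess at how long the response takes; both are assumptions for illustration:

```python
import asyncio

stopped = False


def stop():
    # Stand-in for Sanic's app.stop(); just records that a stop was requested.
    global stopped
    stopped = True


async def delayed_stop(grace_period):
    # Wait roughly as long as the response should take, then stop.
    # Anything still running after the grace period gets cut off,
    # so this is still effectively a hard kill.
    await asyncio.sleep(grace_period)
    stop()


async def handler():
    # Fire off the shutdown timer as part of serving the response.
    timer = asyncio.ensure_future(delayed_stop(0.05))
    response = "response body"  # the real work would happen here
    await timer                 # only so this demo observes the stop
    return response


result = asyncio.run(handler())
```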

Hopefully this helps? You might also want to check out Cloud Endpoints as a different option: effectively you run a request and there’s no container you have to manage, though that would mean leaving Sanic behind, I think.

This should get the job done:

from asyncio import sleep
from time import monotonic as current_time

last_request_time = current_time()

def monitor_time(request):
    """Register as request middleware: record the time of the latest request."""
    global last_request_time
    last_request_time = current_time()

async def monitor_idle(app):
    """Run as a background task: shut down the server when there have been
    no requests for 30 seconds."""
    while current_time() - last_request_time < 30:
        await sleep(5)
    app.stop()

However, I would recommend hosting at Hetzner, where you can get a good virtual server for 3 euros a month and don’t need to worry about shutting down your services to save costs; they also have native IPv6, which Google doesn’t. I am a customer but not otherwise affiliated with the company. Amazon EC2 also offers tiny virtual servers completely free, but if your service is heavy, you might run out of RAM, CPU or disk space, and Amazon’s services are damn difficult to set up.

Your solution worked great for my simple case!
The only thing that took me a while to figure out is that with debug=True, you get the message that the server stopped, but the script does not exit. I tried all kinds of kill commands such as sys.exit() to get the script to stop, but none of them worked. debug=False did the trick, however.

To stay truer to my original question (have the timer start at the end of the request), I modified your example (using the info from @sjsadowski that middleware can call a function prior to sending) as follows:

request_finish_time = current_time()
request_served = False

def monitor_time(request):
    print("Request is made to the server")
    global last_request_time
    last_request_time = current_time()
    global request_served
    request_served = False

async def print_on_response(request, response):
    print("Response is returned by the server")
    global request_finish_time
    request_finish_time = current_time()
    global request_served
    request_served = True

async def monitor_idle(app):
    """Shut down the server when the user connection is cut."""
    while not request_served or current_time() - request_finish_time <= app.config.KEEP_ALIVE_TIMEOUT:
        await sleep(2)
    print("!!!!!! Stopping !!!!!")
    app.stop()

Now my server will serve at least one request before quitting, and it stops when the user times out. If the same user sends another request, request_served = False, so it does not quit while a task is being processed.

I have a feeling this is not multiple-request proof, however. Maybe an even nicer way would be to not use await sleep(2), but instead build an async timer coroutine: start this shutdown coroutine when a response is returned, but cancel it when a new request comes in.
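That cancellable-timer idea could look roughly like this in plain asyncio. The class and method names are made up for illustration, stop stands in for app.stop(), and the two hooks would be called from request and response middleware respectively:

```python
import asyncio


class IdleShutdown:
    """Restartable shutdown timer: armed when a response goes out,
    disarmed when a new request arrives."""

    def __init__(self, timeout, stop):
        self.timeout = timeout
        self.stop = stop      # callback that actually stops the server
        self._task = None

    def on_request(self):
        # A new request arrived: cancel any pending shutdown.
        if self._task is not None:
            self._task.cancel()
            self._task = None

    def on_response(self):
        # Response finished: start (or restart) the shutdown countdown.
        self.on_request()
        self._task = asyncio.ensure_future(self._countdown())

    async def _countdown(self):
        await asyncio.sleep(self.timeout)
        self.stop()


async def demo():
    stopped = asyncio.Event()
    idle = IdleShutdown(timeout=0.05, stop=stopped.set)
    idle.on_response()          # first response: timer armed
    await asyncio.sleep(0.01)
    idle.on_request()           # new request arrives in time: timer cancelled
    assert not stopped.is_set()
    idle.on_response()          # second response: timer re-armed
    await asyncio.wait_for(stopped.wait(), timeout=1)
    return stopped.is_set()


result = asyncio.run(demo())
```

Because each new request cancels the pending countdown, the server only stops after a full quiet period, which should also behave sensibly with multiple overlapping users.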

I would recommend hosting at Hetzner

Thank you for the suggestion. While their service seems great, here are some reasons a cloud service supporting Docker has my preference for this use-case:

  • The processing of the request is not heavy and there is nothing to do outside the request.
  • I already had my Python code, and now I only had to throw it in a Docker container with an HTTP interface (their website talks about images, but doesn’t mention Docker).
  • Duplicate containers:
    • Scales from 0 - any user base I will ever get.
    • Latency is very important for a PWA, and with Cloud Run it seems I can host it in any region (although currently only Japan).
  • Network ingress (free and low-latency) access to Firebase.

Although Hetzner doesn’t directly run containers, you can install Docker on your virtual servers.