Using Sanic to serve a TensorFlow model with multiple workers

I am having some issues running a TensorFlow model with Sanic, even though it works fine with Flask. I think it has something to do with the async nature of Sanic? The specific error I get is:

2019-10-22 18:05:39.062833: E tensorflow/core/grappler/clusters/utils.cc:87] Failed to get device properties, error code: 3
2019-10-22 18:05:39.731232: F tensorflow/stream_executor/cuda/cuda_driver.cc:175] Check failed: err == cudaSuccess || err == cudaErrorInvalidValue Unexpected CUDA error: initialization error

and I suspect it’s at least partially related to https://github.com/pytorch/pytorch/issues/2517

One other thing: the Sanic app works if I set the number of workers to 1, but that isn’t ideal and I’d like to be able to use more than one worker. I am using Sanic’s built-in web server.

I know it’s a pretty general description of the issue, but at this point I just need some ideas to try. Maybe isolating the code that serves as the entry point to TensorFlow and sharing that resource (i.e., the TensorFlow model) between workers/processes?
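
To make that last idea concrete, I’m imagining something like a single process that owns the TensorFlow model, with the web workers forwarding requests to it over a queue. A rough sketch of the shape I mean (model_fn here is just a stand-in for whatever builds the model):

import multiprocessing as mp


def model_process(requests, responses, model_fn):
    # The only process that ever touches TensorFlow/CUDA.
    model = model_fn()
    while True:
        req_id, payload = requests.get()
        responses.put((req_id, model(payload)))


# The parent would create the queues and start the model process,
# and each web worker would put (req_id, payload) on `requests`
# and wait for the matching req_id on `responses`:
#
#   requests, responses = mp.Queue(), mp.Queue()
#   mp.Process(target=model_process,
#              args=(requests, responses, my_model_fn)).start()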

I’m guessing you’re initializing something pre-fork that the workers then can’t safely share, but I don’t have a really great answer for you otherwise.

Do you have a code snippet you could post that might help us troubleshoot with you?
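
In the meantime, one thing worth trying is deferring the model creation into a listener, so each worker builds its own session after the fork rather than inheriting one from the parent. A minimal sketch, where load_model() and predict() are stand-ins for your own model code:

from sanic import Sanic
from sanic.response import json

app = Sanic(__name__)


@app.listener('before_server_start')
async def setup_model(app, loop):
    # Runs once per worker process, after the fork, so CUDA is
    # initialized post-fork instead of being inherited from the
    # parent. load_model() stands in for whatever builds your
    # TensorFlow graph/session.
    app.model = load_model()


@app.route('/')
async def index(request):
    return json({'result': app.model.predict()})

The caveat is that each worker then holds its own copy of the model, so GPU memory use scales with the number of workers.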

I came up with a minimal example that reproduces a similar error. The example works when workers is set to 1 in the sanic_config dict, but not when it is set to more than 1.

from sanic import Sanic
from sanic.response import json
import tensorflow as tf
import random

tf.compat.v1.disable_eager_execution()
tf.compat.v1.disable_resource_variables()

app = Sanic(__name__)
my_test = None


class Test:
    def __init__(self):
        # Let TensorFlow grow GPU memory as needed instead of
        # grabbing it all up front.
        config = tf.compat.v1.ConfigProto()
        config.gpu_options.allow_growth = True
        self.a = tf.compat.v1.constant(5.0)
        self.b = tf.compat.v1.constant(6.0)
        self.c = self.a * self.b

        self.sess = tf.compat.v1.Session(graph=self.c.graph, config=config)
        self.sess.run(tf.compat.v1.global_variables_initializer())

    def test_it(self, rand_num=1.0):
        with self.sess.as_default():
            return float(self.sess.run(self.c * tf.compat.v1.to_float(rand_num)))


@app.route('/')
async def index(request):
    return json({'tf result': my_test.test_it(random.randint(1, 100))})


if __name__ == "__main__":
    # Note that the model (and its CUDA context) is created here in
    # the parent process, before Sanic forks the workers.
    my_test = Test()

    sanic_config = {
        "host": "127.0.0.1",
        "port": 5000,
        "workers": 3,
        "debug": False,
        "access_log": False
    }

    app.run(**sanic_config)

In my own code I’ve also tried instantiating the class (Test in the minimal example) on the first request, but that doesn’t work either. I’ve tried the “before_server_start” and “after_server_start” listeners as well, but those instantiate the model once per worker, and since the model I am working with is rather large, the multiple copies cause the GPU to run out of memory.

I read somewhere that setting the process start method to “spawn” or “forkserver” might help, but when I set it right before app.run with multiprocessing.set_start_method("forkserver"), I just get AttributeError: 'NoneType' object has no attribute 'test_it', presumably because the spawned worker re-imports the module but never executes the __main__ block, so the my_test global is still None in the worker.
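
For reference, the on-first-request variant I mentioned looks roughly like this (a sketch against the Test class from the minimal example above):

from sanic import Sanic
from sanic.response import json
import random

app = Sanic(__name__)
my_test = None


@app.route('/')
async def index(request):
    global my_test
    if my_test is None:
        # Build the model lazily, inside whichever worker process
        # handles the first request, rather than in the parent.
        my_test = Test()  # Test as defined in the minimal example
    return json({'tf result': my_test.test_it(random.randint(1, 100))})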