Share a model object among multiple workers

Hi!
I’m writing a websocket server in Sanic that handles a frame sent from a client, runs an object-detection model on it, and sends the results back. With workers==1 it works well. Now I’m trying to run all of this across multiple workers, in order to handle more requests from different clients.
Inspired by this issue Sanic_issues_2359, I was able to load my model at the start of each individual worker, using app.ctx, as shown below.

import base64
import json

import cv2
import numpy as np
import torch
from sanic import Sanic

# NewDarknet and do_detect are my own model helpers, defined elsewhere

app = Sanic("server")

@app.listener("before_server_start")
async def worker_start(app, loop):
    # This listener runs once per worker process, so every
    # worker loads its own copy of the model
    app.ctx.use_cuda = True
    use_tiny = False
    app.ctx.darknet, app.ctx.class_names = NewDarknet(use_tiny, app.ctx.use_cuda)

async def handler(request, ws):
    while True:
        str_json = await ws.recv()

        data = json.loads(str_json)
        str_encode = data['frame']

        # json to frame
        data_encode_64 = str_encode.encode('utf-8')
        data_encode = np.frombuffer(base64.b64decode(data_encode_64), np.uint8)
        current_frame = cv2.imdecode(data_encode, cv2.IMREAD_COLOR)

        # detect
        sized = cv2.resize(current_frame, (app.ctx.darknet.width, app.ctx.darknet.height))
        sized = cv2.cvtColor(sized, cv2.COLOR_BGR2RGB)

        boxes = do_detect(app.ctx.darknet, sized, 0.4, 0.6, app.ctx.use_cuda)
        boxes_zero = np.array(boxes[0]).tolist()

        width = current_frame.shape[1]
        height = current_frame.shape[0]

        # release cached GPU memory between frames
        torch.cuda.empty_cache()

        response_boxes = []
        for box in boxes_zero:
            response_boxes.append(
                {
                    'x1': int(box[0] * width),
                    'y1': int(box[1] * height),
                    'x2': int(box[2] * width),
                    'y2': int(box[3] * height),
                    'conf': box[5],
                    'name': app.ctx.class_names[int(box[6])]
                }
            )
        
        response = {
            'boxes': response_boxes
        }

        await ws.send(json.dumps(response))

app.add_websocket_route(handler, "/")

if __name__ == '__main__':
    app.run(host="0.0.0.0", port=12345, workers=3)

But that’s not really what I want: a new model is loaded every time a worker starts, so GPU memory is consumed once per worker (loading just one model already takes nearly 1 GB).
What I’m hoping for: load the model only once at startup (probably at some single starting point), and have the workers share that one model object so that multiple detection tasks can still run concurrently in different workers.
Sadly, I have no idea how to implement this yet. I know things are different here from the multi-threading case, since the workers run in separate processes. Perhaps some sharing mechanism could work.
Any suggestions?

Honestly, if you can’t afford for GPU memory to be consumed that way, you need to decouple the object detection from the endpoint. Accept the data, push it onto a queue, have a model processor pick the data up from the queue, process it, then push the results onto a second queue that the web worker picks up and returns to the websocket client.

By decoupling the two, you can scale the websocket service independently from your object recognition service.

Thanks, decoupling sounds good. Are there any examples or documentation about this scheme in Sanic, such as adding tasks to workers, using queues between workers, and so on? I would like to learn more about it.

You can just search using your search engine of choice, but here’s something to get you started:
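Below is a minimal, framework-agnostic sketch of the pattern, untested and with your model stubbed out by a fake_detect function so it runs standalone. A single model process owns the GPU; the web workers push jobs onto one shared request queue and each waits on its own response queue for the routed result. I’ve left out the Sanic wiring on purpose: recent Sanic releases (22.9+, if I remember right) let you create the queues in a main_process_start listener and hand them to workers via app.shared_ctx, so check what your version supports.

import multiprocessing as mp

NUM_WORKERS = 3

def model_process(request_q, response_qs):
    # Load the model exactly once, in this process only.
    # darknet, class_names = NewDarknet(use_tiny, use_cuda)  # your real loader

    def fake_detect(frame):
        # Stand-in for do_detect(...) so this sketch runs on its own
        return [{'x1': 0, 'y1': 0, 'x2': 10, 'y2': 10, 'conf': 0.9, 'name': 'demo'}]

    while True:
        job = request_q.get()
        if job is None:  # shutdown sentinel
            break
        worker_id, job_id, frame = job
        boxes = fake_detect(frame)
        # Route the result back to the worker that submitted the job
        response_qs[worker_id].put((job_id, boxes))

def web_worker(worker_id, request_q, response_q):
    # In the real server this is the websocket handler: it only
    # decodes/encodes frames and talks to the queues; it never
    # touches the model or the GPU.
    for job_id in range(2):
        frame = b'...raw frame bytes...'
        request_q.put((worker_id, job_id, frame))
        got_id, boxes = response_q.get()  # blocks until the model replies
        print(f'worker {worker_id} job {got_id}: {boxes}')

if __name__ == '__main__':
    request_q = mp.Queue()
    response_qs = [mp.Queue() for _ in range(NUM_WORKERS)]

    detector = mp.Process(target=model_process, args=(request_q, response_qs))
    detector.start()

    workers = [
        mp.Process(target=web_worker, args=(i, request_q, response_qs[i]))
        for i in range(NUM_WORKERS)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    request_q.put(None)  # stop the model process
    detector.join()

One thing the sketch glosses over: inside an async websocket handler you would want to wrap the blocking queue calls in loop.run_in_executor (or keep a job-id-to-future map) so that one in-flight detection doesn’t stall the worker’s event loop.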