Hi!
I’m writing a WebSocket server in Sanic that handles frames sent from a client, runs an object-detection model on each frame, and sends the results back. With workers==1 it works well. Now I’m trying to run it with multiple workers, in order to handle more requests from different clients.
Inspired by this issue Sanic_issues_2359, I was able to load my model at the start of every individual worker using app.ctx, as below.
```python
from sanic import Sanic

import base64
import json

import cv2
import numpy as np
import torch

# NewDarknet and do_detect come from my own detection module
app = Sanic("server")


@app.listener("before_server_start")
async def worker_start(app, loop):
    app.ctx.use_cuda = True
    use_tiny = False
    # each worker process loads its own copy of the model
    app.ctx.darknet, app.ctx.class_names = NewDarknet(use_tiny, app.ctx.use_cuda)


async def handler(request, ws):
    while True:
        str_json = await ws.recv()
        data = json.loads(str_json)
        str_encode = data['frame']
        # decode the base64-encoded frame back into an OpenCV image
        data_encode_64 = str_encode.encode('utf-8')
        data_encode = np.frombuffer(base64.b64decode(data_encode_64), np.uint8)
        current_frame = cv2.imdecode(data_encode, cv2.IMREAD_COLOR)
        # resize to the network input size and run detection
        sized = cv2.resize(current_frame, (app.ctx.darknet.width, app.ctx.darknet.height))
        sized = cv2.cvtColor(sized, cv2.COLOR_BGR2RGB)
        boxes = do_detect(app.ctx.darknet, sized, 0.4, 0.6, app.ctx.use_cuda)
        boxes_zero = np.array(boxes[0]).tolist()
        width = current_frame.shape[1]
        height = current_frame.shape[0]
        torch.cuda.empty_cache()
        # scale the normalized box coordinates back to pixels
        response_boxes = []
        for box in boxes_zero:
            response_boxes.append(
                {
                    'x1': int(box[0] * width),
                    'y1': int(box[1] * height),
                    'x2': int(box[2] * width),
                    'y2': int(box[3] * height),
                    'conf': box[5],
                    'name': app.ctx.class_names[int(box[6])],
                }
            )
        response = {'boxes': response_boxes}
        await ws.send(json.dumps(response))


app.add_websocket_route(handler, "/")

if __name__ == '__main__':
    app.run(host="0.0.0.0", port=12345, workers=3)
```
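For context, the client sends each frame as a base64-encoded JPEG inside a JSON message. A minimal sketch of that side (hypothetical; my real client is different, and the third-party `websockets` package here is just for illustration):

```python
import asyncio
import base64
import json

import cv2
import websockets  # third-party client library, used here only for illustration


async def send_one_frame(path):
    frame = cv2.imread(path)
    ok, jpeg = cv2.imencode('.jpg', frame)  # compress the frame to JPEG
    payload = json.dumps({'frame': base64.b64encode(jpeg.tobytes()).decode('utf-8')})
    async with websockets.connect('ws://localhost:12345/') as ws:
        await ws.send(payload)
        print(json.loads(await ws.recv()))  # {'boxes': [...]}


asyncio.run(send_one_frame('test.jpg'))
```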
But that’s not really what I want: a new model is loaded every time a worker starts, so the GPU memory cost is multiplied by the number of workers (loading just one model already consumes nearly 1 GB of GPU memory).
What I’m hoping for is to load the model only once at startup (probably in the main process), and have the workers share that one model object, so that multiple detection tasks can still run concurrently across workers.
Sadly, I have no idea how to implement this yet. I know things are different here from the multi-threading case, since the workers run in separate processes. Perhaps some sharing mechanism may work.
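To make "some sharing mechanism" concrete, the rough shape I have in mind is a single dedicated process that owns the one model copy, with the Sanic workers handing frames to it over multiprocessing queues. A minimal sketch, with everything hypothetical (NewDarknet and do_detect are my functions from above, and the wiring into Sanic's worker startup is omitted):

```python
import multiprocessing as mp

import cv2


def inference_process(task_q, result_qs):
    """Owns the only model copy; serves detection requests from all workers."""
    darknet, class_names = NewDarknet(False, True)  # loaded exactly once
    while True:
        worker_id, frame = task_q.get()  # a worker submitted a frame
        sized = cv2.resize(frame, (darknet.width, darknet.height))
        sized = cv2.cvtColor(sized, cv2.COLOR_BGR2RGB)
        boxes = do_detect(darknet, sized, 0.4, 0.6, True)
        result_qs[worker_id].put(boxes)  # hand the result back to that worker
```

I’m not sure this is the right direction, though: it funnels every frame through one process, so detections would run one at a time instead of concurrently in each worker as I’d like.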
Any suggestions about it?