Hi, dear friends!
I am testing Llama with Sanic, with workers = number of CPUs = 2.
The code is as follows:
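from sanic import Sanic
from sanic.response import json
from llama_cpp import Llama  # assuming llama-cpp-python
from asgiref.sync import sync_to_async  # assuming asgiref's helper

app = Sanic("llama_test")  # app name is a placeholder

@app.listener("before_server_start")
async def load_model(app, loop):
    app.ctx.llam = Llama(model_path="./model")

@app.get("/get")
async def handler(request):
    question = request.args.get("question")
    # res = app.ctx.llam(question)  # takes about 4 seconds and blocks everything
    res = await sync_to_async(func=app.ctx.llam)(question)  # awaiting gives no improvement here
    return json({"answer": res["choices"]})  # Sanic handlers must return a response object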
I sent 100 requests to the server with JMeter.
The answers always come back two at a time, with a block of about 4 seconds per round,
and both CPUs stay at 100% the whole time.
As far as I understand, Llama inference is a CPU-bound task, so async/await does not help.
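To put rough numbers on what I am seeing, here is a back-of-the-envelope sketch. It assumes each inference takes about 4 seconds and that each worker can only run one inference at a time; the function name is just something I made up for illustration.

```python
import math

def estimate_total_seconds(num_requests: int, workers: int, seconds_per_call: float) -> float:
    """Rough wall-clock estimate when each worker runs one CPU-bound call at a time."""
    # Requests finish in batches of `workers`, so count the batches (rounds).
    rounds = math.ceil(num_requests / workers)
    return rounds * seconds_per_call

# 100 requests, 2 workers, about 4 seconds per inference (my observed numbers)
print(estimate_total_seconds(100, 2, 4.0))
```

Under these assumptions the whole test takes roughly 50 rounds, no matter how the handler is awaited, because the time is spent on CPU work rather than on waiting.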
But I would still like to ask: how can I improve the performance?
