Hi,
I have a web API, using Sanic, that basically serves files from a directory on disk.
The twists are:
- The list of files changes over time. New files are added, old ones are deleted.
- For some reason the file system is very slow (%$#$ M$…) and blocks the whole thing for about half a minute on each call.
The simple approach was to start a forever-running task which modifies the app context:
```python
import asyncio
import glob
from pathlib import Path

async def ten_second_job(app):
    while True:
        file_list = glob.glob(f"{app.config.INCOMING_DIR}/{FILE_GLOB}")
        app.ctx.file_list = sorted(Path(f).stem for f in file_list)
        await asyncio.sleep(11)

app.add_task(ten_second_job)
```
Bad idea, as the glob blocks the whole app for a really long time because the file system is so slow.
So the idea was to start a thread or process in the background so it does not interfere with the main web service.
First try
It was something like this:
```python
async def ten_second_job(app):
    loop = asyncio.get_running_loop()
    while True:
        app.ctx.file_list = await loop.run_in_executor(None, read_files_thread, app)
        await asyncio.sleep(11)
```
where read_files_thread now does the glob from above.
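For reference, a minimal sketch of read_files_thread, assuming it is just the blocking glob moved into a plain function:

```python
def read_files_thread(app):
    # Still a blocking call, but it now runs in an executor thread,
    # so it no longer stalls the event loop.
    file_list = glob.glob(f"{app.config.INCOMING_DIR}/{FILE_GLOB}")
    return sorted(Path(f).stem for f in file_list)
```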
This works somewhat better, but if I have n workers I also have n background tasks running, which can't be good for file system performance, especially as they all fire at basically the same time.
This could be amended by adding a random number to the sleep() call (sketched below).
Or by adding a lock to the call (there is a Redis-based lock for another task already implemented). But then the workers would have different lists…
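The jitter variant would look roughly like this (a sketch; the 0–10 s spread is an arbitrary choice):

```python
import random

async def ten_second_job(app):
    loop = asyncio.get_running_loop()
    while True:
        app.ctx.file_list = await loop.run_in_executor(None, read_files_thread, app)
        # Random offset so the n workers don't all hit the slow disk at once.
        await asyncio.sleep(11 + random.uniform(0, 10))
```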
Second try
The next idea was starting one empty main process, one child process for read_files_thread(), and another for Sanic (there is a question on how to run Sanic in a subprocess here). But I can't figure out how to update app.ctx from the parent process…
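Roughly what I had in mind (a sketch only; scanner and run_sanic are placeholder names, and the part that would push the queue's contents into app.ctx is exactly what is missing):

```python
import glob
import multiprocessing as mp
import time
from pathlib import Path

from sanic import Sanic

FILE_GLOB = "*.*"  # placeholder

def scanner(queue, incoming_dir):
    # Child process 1: does the slow glob forever and ships each
    # result over the queue. Only one of these ever runs.
    while True:
        files = glob.glob(f"{incoming_dir}/{FILE_GLOB}")
        queue.put(sorted(Path(f).stem for f in files))
        time.sleep(11)

def run_sanic(queue):
    # Child process 2: the web service itself. The open question is
    # how to get what the scanner puts on the queue into app.ctx.
    app = Sanic("files")
    app.ctx.file_list = []
    # single_process (Sanic 22.9+) avoids Sanic spawning its own
    # worker processes inside an already-forked child.
    app.run(host="0.0.0.0", port=8000, single_process=True)

if __name__ == "__main__":
    q = mp.Queue()
    mp.Process(target=scanner, args=(q, "/incoming")).start()
    mp.Process(target=run_sanic, args=(q,)).start()
```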
Question:
So back to the drawing board:
What is the best way to update `app.ctx` with the results from a long-running, blocking background subtask that should only run in a single instance?