Multiple workers - beginner question

seanPH · March 29, 2023, 3:12am

OK - loving Sanic more and more every day!

Latest problem is this:

At the top of my py file (which runs when the server starts) I load a dict into sql_lines from a saved pickle. It is a global dict used far and wide across my code. I notice in the Sanic console that it actually runs (i.e. loads) multiple times, seemingly number_of_workers + 1 times.

I see 2 possible solutions to this. I need some sample code and some advice about which of the 2 is the better approach for me.

Assume I have 5 workers.

Option 1: if one of the workers updates the dict sql_lines and saves it into the pickle, that worker should somehow signal the other workers to refresh (i.e. reload) their copy. How?

OR

Option 2: Somehow tell Sanic: "only load sql_lines once, not (number_of_workers + 1) times".

Obviously, if it is only loaded once as per Option 2 (which is my preferred approach), I need to be SURE that Sanic will manage it and will make sure sql_lines can still be accessed across all workers and all functions.

Please advise me - and try not to be too harsh about my (growing, but still beginner level) depth of understanding of Sanic.

ahopkins · March 29, 2023, 10:51am

PS - me too

I’d like to think I generally am not. If I ever do come across as harsh, then it is likely either a misunderstanding or I am having a rough day. I personally feel an obligation to respond to questions like these and to be helpful. Being a self-taught programmer, I learned by asking questions like this and seeking advice from others willing to give it.

To answer the question, first, I would suggest looking at this page: Worker Manager | Sanic Framework.

num_workers + 1 is the main process plus n workers.

When you start up (particularly with DEBUG turned on), you should see each of the processes.

When you have code in the global scope, then it will be executed in EVERY worker. If you do not want this, you have a few options.

You could run it in the global scope of just the main process:
```
if __name__ == "__main__":
    # here
```
You could run it in the global scope of all subprocesses, except the main:
```
if __name__ == "__mp_main__":
    # here
```

You could run the code in just the main process using a listener:

```
@app.main_process_start
async def main_process_start(app):
    # here

@app.main_process_stop
async def main_process_stop(app):
    # here
```

You could run the code in its own process using a custom process:

```
@app.main_process_ready
async def ready(app: Sanic, _):
    # app.manager.manage(<name>, <callable>, <kwargs>)
    # my_process is a callable defined elsewhere; it receives foo="bar"
    app.manager.manage("MyProcess", my_process, {"foo": "bar"})
```

Take a look at the docs for more information on this.

You could store the state and share it across workers using shared_ctx.
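To illustrate why a shared object is needed at all: every worker process holds its own copy of an ordinary dict, so a mutation made in one worker is invisible to the others. The standard-library sketch below contrasts a plain dict with a `multiprocessing.Manager().dict()` proxy. The names `worker_update` and `demo` are made up for this example, and whether a manager proxy attached to `app.shared_ctx` in a `main_process_start` listener fits this use case is an assumption, not something established in this thread.

```python
import multiprocessing as mp


def worker_update(shared, plain, key, value):
    """Runs in a child process and writes the same key to both dicts."""
    shared[key] = value  # forwarded to the manager process through the proxy
    plain[key] = value   # only changes this child's private copy


def demo(start_method=None):
    """Return (plain, shared) as seen by the parent after a child mutated both.

    start_method=None uses the platform default; "spawn" and "fork" are
    the usual explicit choices.
    """
    ctx = mp.get_context(start_method)
    plain = {"q1": "SELECT 1"}
    with ctx.Manager() as manager:
        shared = manager.dict(plain)  # proxy backed by the manager process
        child = ctx.Process(
            target=worker_update,
            args=(shared, plain, "q2", "SELECT 2"),
        )
        child.start()
        child.join()
        # Copy the proxy's contents out before the manager shuts down.
        return plain, shared.copy()
```

Called as `demo()` under an `if __name__ == "__main__":` guard, this should return the plain dict unchanged next to a shared copy that contains the new key. Every access to the proxy is a round trip to the manager process, so it trades some speed for consistency.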

Without knowing more about your use case and seeing some code of what you are trying to do I cannot say which I think is best for you. There are potentially some other choices as well.

seanPH · March 29, 2023, 11:30am

Hi Adam, and thanks for the reply.
I didn’t mean to imply that you are harsh on newbies - far from it, actually. You are incredibly helpful.

In fact, your great answers were a significant factor in my (small) team choosing Sanic.

I think we will go for this one: "1. You could run it in the global scope of just the main process:"

```
if __name__ == "__main__":
    # here
```

Please confirm that by doing that (in the “main process”, not in any of the workers) our dict, called sql_lines, will be accessible from every worker process and all other parts of the code…

Yes - we had already read those doc pages (and will re-read them), but the clarity of your precise answer helps enormously!

ahopkins · March 30, 2023, 3:59am

Actually the opposite is true. Putting it there makes it inaccessible to any workers.

So I understand:

all workers need access to read this value
all workers must be able to mutate the values
each worker should “see” the mutations of everyone else

I guess the next question: does each worker just need to be confident that when it reads it has the updates available, or does it need to receive an event that an update was made?
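If having fresh data at read time is enough (the first case in that question), one possible approach is for each worker to check the pickle file's metadata before using its cached copy, and to reload only when the file has changed. The sketch below uses only the standard library; `PickleBackedDict`, `get`, and `save` are hypothetical names for illustration and are not part of Sanic.

```python
import os
import pickle


class PickleBackedDict:
    """Per-process cache of a pickled dict, reloaded when the file changes."""

    def __init__(self, path):
        self.path = path
        self._signature = None  # (inode, mtime) of the version last loaded
        self._data = {}

    def get(self):
        """Return the current dict, reloading it if the file was replaced."""
        st = os.stat(self.path)
        signature = (st.st_ino, st.st_mtime_ns)
        if signature != self._signature:
            with open(self.path, "rb") as fh:
                self._data = pickle.load(fh)
            self._signature = signature
        return self._data

    def save(self, data):
        """Write to a temp file, then swap it in so readers never see a partial file."""
        tmp_path = f"{self.path}.tmp.{os.getpid()}"
        with open(tmp_path, "wb") as fh:
            pickle.dump(data, fh)
        os.replace(tmp_path, self.path)
```

Each worker would keep one instance and call `get()` wherever it currently reads `sql_lines`. This does not coordinate writers: if two workers call `save()` at about the same time, the last one wins, so a lock or a single writer process would still be needed if concurrent updates matter.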