Gradio apps can now have master/worker dynamics natively. A gradio app can be launched with the role "master", "worker", or "hybrid" (the default). A master app receives events, stores them in a queue, and communicates updates to the client. A worker app communicates with the master to request tasks, processes them, and reports the results back to the master. A hybrid app does everything: it receives tasks from the client, processes them, and returns the results to the client. A hybrid app can still have extra workers bind to it to help process tasks.
An example app:
The `role` kwarg can be "master", "worker", or "hybrid". Launch this as `python run.py` in one terminal to start the master, and `python run.py -w` to start a worker. You can attach multiple workers. Without any workers, the master does not process any tasks and just shows the queue size. You can then add or remove workers to see how they handle tasks.

The `master_url` kwarg points a worker to the master it should bind to, and an `app_key` shared between masters and workers authenticates requests between them.

The purpose of this PR is to create a more mature autoscaling approach for gradio. Previously, scaling was only possible with nginx configs that required sticky sessions, so that the same IP address was always served by the same gradio server in the cluster and state was maintained. However, if a worker went offline, we would lose the tasks it was processing. Now, if a worker goes down, the master reassigns its tasks to a new worker.
With this approach, no extra nginx config is needed for routing, and extra workers can be added without each one needing a separate IP address for nginx to distinguish between them.
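To make the reassignment behavior concrete, here is a minimal in-memory sketch of a pull-based master queue. It is not the code in this PR: the `Master` class and its `submit`, `request_task`, `report_result`, and `worker_lost` methods are hypothetical names chosen for illustration, and how the real master detects a lost worker is not shown.

```python
from collections import deque


class Master:
    """Toy master: queues tasks and tracks which worker holds each one.

    Hypothetical illustration only; not gradio's actual implementation.
    """

    def __init__(self):
        self.pending = deque()  # tasks waiting for a worker
        self.in_flight = {}     # task_id -> (worker_id, payload)
        self.results = {}       # task_id -> result

    def submit(self, task_id, payload):
        """Called when the client sends an event."""
        self.pending.append((task_id, payload))

    def request_task(self, worker_id):
        """Called by a worker to pull the next task; None if the queue is empty."""
        if not self.pending:
            return None
        task_id, payload = self.pending.popleft()
        self.in_flight[task_id] = (worker_id, payload)
        return task_id, payload

    def report_result(self, task_id, result):
        """Called by a worker when it finishes a task."""
        self.in_flight.pop(task_id, None)
        self.results[task_id] = result

    def worker_lost(self, worker_id):
        """Return every task held by a lost worker to the front of the queue."""
        lost = [
            (task_id, payload)
            for task_id, (holder, payload) in self.in_flight.items()
            if holder == worker_id
        ]
        for task_id, payload in reversed(lost):
            del self.in_flight[task_id]
            self.pending.appendleft((task_id, payload))


master = Master()
master.submit("t1", 2)
master.submit("t2", 3)

master.request_task("worker-a")  # worker-a takes t1, then goes offline
master.worker_lost("worker-a")   # t1 goes back to the queue

# worker-b drains the queue, including the reassigned task
while (task := master.request_task("worker-b")) is not None:
    task_id, payload = task
    master.report_result(task_id, payload * payload)
```

Because the master keeps the payload of every in-flight task, a lost worker costs only the work it had already done, not the task itself.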
Some parts of an app are not stateless (`gr.render` and any use of `gr.State`), and the data they hold is not JSON-serializable. In that case, gradio internally uses the session_id to replicate "sticky sessions": the same worker is assigned all requests for a given session_id. If that worker goes down, this data is lost.
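The session affinity described above could look roughly like the following sketch. The `SessionRouter` class, its round-robin assignment of new sessions, and its method names are all assumptions made for illustration; the PR may pick workers differently.

```python
class SessionRouter:
    """Toy router that pins each session_id to one worker.

    Hypothetical illustration only; not gradio's actual implementation.
    """

    def __init__(self, workers):
        self.workers = list(workers)
        self.assignments = {}  # session_id -> worker_id
        self._next = 0         # round-robin cursor for new sessions

    def worker_for(self, session_id):
        """Return the worker pinned to this session, assigning one if needed."""
        if session_id not in self.assignments:
            worker = self.workers[self._next % len(self.workers)]
            self._next += 1
            self.assignments[session_id] = worker
        return self.assignments[session_id]

    def remove_worker(self, worker_id):
        """Drop a worker and return the sessions whose state was lost with it."""
        self.workers.remove(worker_id)
        dropped = [s for s, w in self.assignments.items() if w == worker_id]
        for session_id in dropped:
            del self.assignments[session_id]
        return dropped


router = SessionRouter(["worker-a", "worker-b"])
first = router.worker_for("session-1")
second = router.worker_for("session-2")
again = router.worker_for("session-1")      # same worker as `first`

lost_sessions = router.remove_worker(first)  # session-1 loses its state
fallback = router.worker_for("session-1")    # re-pinned to a remaining worker
```

Unlike queued tasks, the state pinned to a worker cannot be replayed by the master, which is why `remove_worker` can only report the affected sessions rather than restore them.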