Motivation
Task-level parallelization for multi-host multi-process optimization.
Batch-level parallelization can be implemented easily by wrapping the network (nn.Module) with one of the following (a minimal sketch follows this list):
- torch.nn.DataParallel (single-host multi-GPU) (SPMD)
- torch.nn.parallel.DistributedDataParallel (multi-host multi-GPU)
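For reference, a minimal batch-level (module-level) wrapping sketch; the NCCL backend, toy linear model, and one-process-per-GPU launch (e.g. via torchrun) are assumptions for illustration only:

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumed setup: one process per GPU, launched e.g. with torchrun.
dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
model = nn.Linear(4, 2).to(local_rank)
ddp_model = DDP(model, device_ids=[local_rank])
# ddp_model behaves like the wrapped nn.Module; every rank sees its own
# batch slice and gradients are all-reduced across ranks during backward.

Every replica shares the same parameters, which is exactly what makes this unsuitable for the task-level setting described next.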
However, for algorithms that require task-level parallelization, none of the above solutions work. torch.nn.DataParallel and torch.nn.parallel.DistributedDataParallel provide module-level parallelization: the wrapper replicates the user's module into multiple copies and then runs the forward pass in parallel on split batches. For task-level parallelization, each task needs to maintain its own model parameters and (optionally) its own training data, and the module parameters may differ across tasks.
Solution
functorch.vmap + distributed data parallel optimization.
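A minimal single-process sketch of the vmap half of this idea; the linear model, task count, and use of functorch.make_functional are assumptions for illustration, not part of the proposal. Each task keeps an independent copy of the parameters, and all tasks are evaluated in one vectorized forward pass:

import torch
import torch.nn as nn
import functorch

num_tasks = 8
model = nn.Linear(4, 2)

# Turn the module into a pure function of (params, input).
fmodel, params = functorch.make_functional(model)

# Stack an independent copy of every parameter along a leading "task" axis,
# so each task can be updated separately later.
stacked_params = tuple(
    torch.stack([p.clone() for _ in range(num_tasks)]) for p in params
)

# One mini-batch per task.
inputs = torch.randn(num_tasks, 16, 4)

# Vectorize the forward pass over the task axis of both parameters and data.
outputs = functorch.vmap(fmodel)(stacked_params, inputs)
print(outputs.shape)  # torch.Size([8, 16, 2])

The missing piece is distributing these per-task parameters and their optimizer states across hosts and processes, as in the example below.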
Example
import torch
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc
from torch import optim
from torch.distributed.optim import DistributedOptimizer

with dist_autograd.context() as context_id:
    # Forward pass.
    rref1 = rpc.remote("worker1", torch.add, args=(torch.ones(2), 3))
    rref2 = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))
    loss = rref1.to_here() + rref2.to_here()

    # Backward pass.
    dist_autograd.backward(context_id, [loss.sum()])

    # Optimizer.
    dist_optim = DistributedOptimizer(
        optim.SGD,
        [rref1, rref2],
        lr=0.05,
    )
    dist_optim.step(context_id)
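The snippet above assumes the RPC framework is already initialized and that a peer named "worker1" exists. A hypothetical two-process launch could look like the following (the worker names, world size, and environment-based rendezvous are illustrative assumptions):

import os
import torch.distributed.rpc as rpc

def run(rank, world_size=2):
    # MASTER_ADDR and MASTER_PORT must be set in the environment
    # (e.g. by torchrun or by hand) before init_rpc is called.
    rpc.init_rpc(f"worker{rank}", rank=rank, world_size=world_size)
    if rank == 0:
        ...  # the dist_autograd / DistributedOptimizer block shown above
    rpc.shutdown()  # blocks until all outstanding RPC work is done

if __name__ == "__main__":
    run(int(os.environ["RANK"]))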
Additional context
Resources:
PyTorch:
- Tutorial: PyTorch Distributed Overview
- Tutorial: Distributed Data Parallel
- API: Module-level Data Parallel: torch.nn.DataParallel (SPMD)
- API: Module-level Distributed Data Parallel: torch.nn.parallel.DistributedDataParallel
- API: PyTorch Distributed Optimizers: torch.distributed.optim
- API: Vectorization map: functorch.vmap
- API: NVIDIA apex.parallel
JAX:
- Tutorial: Named axes and easy-to-revise parallelism
- API: Vectorization map: jax.vmap
- API: Parallel map: jax.pmap (SPMD)
- API (Experimental): jax.experimental.maps.xmap
- Tutorial: Using JAX in multi-host and multi-process environments
Checklist
- I have checked that there is no similar issue in the repo (required)