chore(setup): rewrite packaging #24

Merged
merged 23 commits into from
Jul 10, 2022
Changes from 1 commit
Commits
23 commits
a70220e
chore(setup): rewrite packaging
XuehaiPan Jul 7, 2022
0a1ccb8
chore: fix file modes
XuehaiPan Jul 7, 2022
01df51e
docs(README): update README
XuehaiPan Jul 7, 2022
c5d6b80
style: format code with yapf
XuehaiPan Jul 7, 2022
fbe3b12
style: reindent license
XuehaiPan Jul 7, 2022
fdf5077
chore(workflow): install CUDA Toolkit in GitHub Action
XuehaiPan Jul 7, 2022
857338a
feat(workflow): build PyPI wheels for Python 3.7 / 3.8 / 3.9 / 3.10
XuehaiPan Jul 7, 2022
b9f9ebe
refactor(setup): use pybind11 PyPI package rather than git submodule
XuehaiPan Jul 7, 2022
ca92f9f
chore(setup): add dependency `typing-extensions`
XuehaiPan Jul 7, 2022
294f7ed
chore(workflow): update workflow trigger
XuehaiPan Jul 7, 2022
bf14292
docs: use HTTPS URL for git clone
XuehaiPan Jul 7, 2022
2e80b67
fix(tests): update requirements.txt
XuehaiPan Jul 7, 2022
4ab4efa
chore(setup): remove deprecated dependency `distutils`
XuehaiPan Jul 7, 2022
9580a96
test: disable CUDA tests if no GPU available
XuehaiPan Jul 8, 2022
8640b81
chore(workflow): use Python 3.7 in tests
XuehaiPan Jul 8, 2022
9730748
refactor(Makefile): update Makefile
XuehaiPan Jul 8, 2022
3dcd29c
style: format CXX code
XuehaiPan Jul 8, 2022
2f4e44f
chore(tests): test with CUDA Toolkit 11.3
XuehaiPan Jul 8, 2022
ad069a4
feat(CMakeLists.txt): auto detect nvcc arch flags
XuehaiPan Jul 8, 2022
bce754b
chore(workflow): set timeout
XuehaiPan Jul 8, 2022
2dba710
docs: update sphinx docs
XuehaiPan Jul 8, 2022
de1bb87
chore: use CamelCased URL paths
XuehaiPan Jul 9, 2022
5597da7
chore: add conda recipe
XuehaiPan Jul 9, 2022
docs(README): update README
XuehaiPan committed Jul 7, 2022
commit 01df51e13169f9955ce88dbc79e425d70ed7c862
88 changes: 69 additions & 19 deletions README.md
@@ -1,14 +1,19 @@
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->

<div align="center">
<img src=image/logod-07.png width=75% />
<img src="image/logod-07.png" width="75%" />
</div>

**TorchOpt** is a high-performance optimizer library built upon [PyTorch](https://pytorch.org/) for easy implementation of functional optimization and gradient-based meta-learning. It offers two main features:

- TorchOpt provides a functional optimizer that enables [JAX-like](https://github.com/google/jax) composable functional optimizers for PyTorch. With TorchOpt, one can easily conduct neural network optimization in PyTorch with a functional-style optimizer, similar to [Optax](https://github.com/deepmind/optax) in JAX.
- With the desgin of functional programing, TorchOpt provides efficient, flexible, and easy-to-implement differentiable optimizer for gradient-based meta-learning research. It largely reduces the efforts required to implement sophisticated meta-learning algorithms.
- With the design of functional programming, TorchOpt provides efficient, flexible, and easy-to-implement differentiable optimizers for gradient-based meta-learning research. It greatly reduces the effort required to implement sophisticated meta-learning algorithms.

--------------------------------------------------------------------------------

The README is organized as follows:

- [TorchOpt as Functional Optimizer](#torchopt-as-functional-optimizer)
- [Optax-Like API](#optax-like-api)
- [PyTorch-Like API](#pytorch-like-api)
@@ -23,11 +28,16 @@ The README is organized as follows:
- [The Team](#the-team)
- [Citing TorchOpt](#citing-torchopt)

--------------------------------------------------------------------------------

## TorchOpt as Functional Optimizer
The desgin of TorchOpt follows the philosophy of functional programming. Aligned with [functorch](https://github.com/pytorch/functorch), users can conduct functional style programing with models, optimizers and training in PyTorch. We use the Adam optimizer as an example in the following illustration. You can also check out the tutorial notebook [Functional Optimizer](./tutorials/1_Functional_Optimizer.ipynb) for more details.

The design of TorchOpt follows the philosophy of functional programming. Aligned with [`functorch`](https://github.com/pytorch/functorch), users can conduct functional-style programming with models, optimizers, and training in PyTorch. We use the Adam optimizer as an example in the following illustration. You can also check out the tutorial notebook [Functional Optimizer](./tutorials/1_Functional_Optimizer.ipynb) for more details.

### Optax-Like API
For those users who prefer fully functional programing, we offer Optax-Like API by passing gradients and optimizers states to the optimizer function. We design base class `torchopt.Optimizer` that has the same interface as `torch.optim.Optimizer`. Here is an example coupled with functorch:

For users who prefer fully functional programming, we offer an Optax-like API that passes gradients and optimizer states to the optimizer function. We also design a base class `torchopt.Optimizer` with the same interface as `torch.optim.Optimizer`. Here is an example coupled with `functorch`:

```python
import functorch
import torch
@@ -52,9 +62,12 @@ grad = torch.autograd.grad(loss, params) # compute gradients
updates, opt_state = optimizer.update(grad, opt_state) # get updates
params = torchopt.apply_updates(params, updates) # update network parameters
```
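
The middle of this code block is collapsed in the diff view above. As a minimal end-to-end sketch of the Optax-like workflow — the toy model, data, and learning rate below are illustrative placeholders rather than part of this commit:

```python
import functorch
import torch
import torch.nn.functional as F
import torchopt

net = torch.nn.Linear(4, 2)                    # toy model (placeholder)
func, params = functorch.make_functional(net)  # functional model + parameter tuple

optimizer = torchopt.adam(lr=1e-3)             # functional Adam, Optax-like
opt_state = optimizer.init(params)             # initialize optimizer states

xs = torch.randn(8, 4)                         # toy batch (placeholder)
ys = torch.randint(0, 2, (8,))

loss = F.cross_entropy(func(params, xs), ys)   # forward pass with explicit parameters
grad = torch.autograd.grad(loss, params)       # compute gradients
updates, opt_state = optimizer.update(grad, opt_state)  # get updates
params = torchopt.apply_updates(params, updates)        # update network parameters
```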

### PyTorch-Like API

We also offer the original PyTorch APIs (e.g. `zero_grad()` or `step()`) by wrapping our Optax-like API for traditional PyTorch users:
<!-- The functional programming style can easily be disguised as the original PyTorch APIs (e.g. `zero_grad()` or `step()`); all we need is to build a new class that contains both the optimizer function and the optimizer states. -->

```python
net = Net() # init
loader = Loader()
@@ -66,31 +79,40 @@ optimizer.zero_grad() # zero gradients
loss.backward() # backward
optimizer.step() # step updates
```
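
The body of the training loop is likewise collapsed above; a minimal sketch of the PyTorch-like workflow (the toy model and the stand-in data loader are placeholders) might look like:

```python
import torch
import torch.nn.functional as F
import torchopt

net = torch.nn.Linear(4, 2)                                 # toy model (placeholder)
optimizer = torchopt.Adam(net.parameters(), lr=1e-3)        # same interface as torch.optim.Adam

loader = [(torch.randn(8, 4), torch.randint(0, 2, (8,)))]   # stand-in for a real DataLoader

for xs, ys in loader:
    loss = F.cross_entropy(net(xs), ys)

    optimizer.zero_grad()   # zero gradients
    loss.backward()         # backward
    optimizer.step()        # step updates
```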

### Differentiable

On top of the same optimization functions as `torch.optim`, an important benefit of a functional optimizer is that one can implement differentiable optimization easily. This is particularly helpful when the algorithm requires differentiating through the optimization update (as in meta-learning practice). We take the gradients and optimizer states as inputs and use non-in-place operators to compute and output the updates. The process is handled automatically; users only need to pass the argument `inplace=False` to the functions:

```python
# get updates
updates, opt_state = optimizer.update(grad, opt_state, inplace=False)
# update network parameters
params = torchopt.apply_updates(params, updates, inplace=False)
```
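
To make the `inplace=False` point concrete, here is a small sketch (the toy model and random data are assumptions, not part of this commit) that differentiates an outer loss through one non-in-place inner update:

```python
import functorch
import torch
import torch.nn.functional as F
import torchopt

net = torch.nn.Linear(4, 2)                                  # toy model (placeholder)
func, meta_params = functorch.make_functional(net)

optimizer = torchopt.adam(lr=1e-1)
opt_state = optimizer.init(meta_params)

xs, ys = torch.randn(8, 4), torch.randint(0, 2, (8,))        # support batch (toy data)
xq, yq = torch.randn(8, 4), torch.randint(0, 2, (8,))        # query batch (toy data)

# one differentiable inner step: non-in-place updates keep the graph alive
inner_loss = F.cross_entropy(func(meta_params, xs), ys)
grads = torch.autograd.grad(inner_loss, meta_params, create_graph=True)
updates, opt_state = optimizer.update(grads, opt_state, inplace=False)
adapted_params = torchopt.apply_updates(meta_params, updates, inplace=False)

# the outer loss differentiates through the inner update back to the meta parameters
outer_loss = F.cross_entropy(func(adapted_params, xq), yq)
meta_grads = torch.autograd.grad(outer_loss, meta_params)
```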

--------------------------------------------------------------------------------

## TorchOpt as Differentiable Optimizer for Meta-Learning

Meta-Learning has gained enormous attention in both Supervised Learning and Reinforcement Learning. Meta-Learning algorithms often contain a bi-level optimization process, with an *inner loop* updating the network parameters and an *outer loop* updating the meta parameters. The figure below illustrates the basic formulation for meta-optimization in Meta-Learning. The main feature is that the gradients of the *outer loss* back-propagate through all `inner.step` operations.

<div align="center">
<img src=/image/TorchOpt.png width=85% />
<img src="/image/TorchOpt.png" width="85%" />
</div>

Since network parameters become a node of computation graph, a flexible Meta-Learning library should enable users manually control the gradient graph connection which means that users should have access to the network parameters and optimizer states for manually detaching or connecting the computation graph. In PyTorch designing, the network parameters or optimizer states are members of network (a.k.a. `nn.Module`) or optimizer (a.k.a. `optim.Optimizer`), this design significantly introducing difficulty for user control network parameters or optimizer states. Previous differentiable optimizer Repo [higher](https://github.com/facebookresearch/higher), [learn2learn](https://github.com/learnables/learn2learn) follows the PyTorch designing which leads to inflexible API.
Since network parameters become nodes of the computation graph, a flexible Meta-Learning library should enable users to manually control the gradient graph connections, which means that users should have access to the network parameters and optimizer states to manually detach or connect the computation graph. In the PyTorch design, the network parameters and optimizer states are members of the network (a.k.a. `nn.Module`) or the optimizer (a.k.a. `optim.Optimizer`); this design makes it significantly harder for users to control network parameters or optimizer states. Previous differentiable optimizer repositories [`higher`](https://github.com/facebookresearch/higher) and [`learn2learn`](https://github.com/learnables/learn2learn) follow the PyTorch design, which leads to inflexible APIs.

In contrast, TorchOpt realizes differentiable optimizers with functional programming, where Meta-Learning researchers can control the network parameters or optimizer states as normal variables (a.k.a. `torch.Tensor`). This functional optimizer design of TorchOpt is beneficial for implementing complex gradient-flow Meta-Learning algorithms and allows us to improve computational efficiency by using techniques like operator fusion.

<!-- The biggest difference of implementing Meta-Learning algorithms between others is that the network parameters are not [leaf variables](https://pytorch.org/docs/stable/generated/torch.Tensor.is_leaf.html?highlight=leaf#torch.Tensor.is_leaf) for backpropagating the *outer loss*. This difference requires a differentiable optimizer that updates the network parameters using a non-[in-place](https://discuss.pytorch.org/t/what-is-in-place-operation/16244) manner for preserving *inner loop*'s computation graph.

Since network parameters become a node of computation graph, a flexible meta-learning library should enable users manually control the gradient graph connection which means that users should have access to the network parameters and optimizer states for manually detaching or connecting the computation graph. In the PyTorch design, the network parameters or optimizer states are members of network (a.k.a. `nn.Module`) or optimizer (a.k.a. `optim.Optimizer`), this design incurs difficulties for user to control network parameters or optimizer states.

We hope meta-learning researchers could control the network parameters or optimizer states as normal variables (a.k.a. `torch.Tensor`). Inspired by [Optax](https://github.com/deepmind/optax), we think designing a functional style optimizer that treat network parameters or optimizer states as variables instead of class members, which mathces our demond of making network parameters or optimizer states. This design would be beneficial for implementing complex gradient flow meta-learning algorithms and allow us to dig potential performance by using techniques like operator fusion. -->
We hope meta-learning researchers could control the network parameters or optimizer states as normal variables (a.k.a. `torch.Tensor`). Inspired by [Optax](https://github.com/deepmind/optax), we design a functional-style optimizer that treats network parameters and optimizer states as variables instead of class members, which matches our demand for direct access to them. This design would be beneficial for implementing complex gradient-flow meta-learning algorithms and allows us to unlock potential performance by using techniques like operator fusion. -->

### Meta-Learning API

<!-- Meta-Learning algorithms often use *inner loop* to update network parameters and compute an *outer loss* then back-propagate the *outer loss*. So the optimizer used in the *inner loop* should be differentiable. Thanks to the functional design, we can easily realize this requirement. -->
- We design a base class `torchopt.MetaOptimizer` for managing network updates in Meta-Learning. The constructor of `MetaOptimizer` takes the network as input rather than the network parameters. `MetaOptimizer` exposes the interface `step(loss)`, which takes the loss as input and steps the network parameters. Refer to the tutorial notebook [Meta Optimizer](./tutorials/2_Meta_Optimizer.ipynb) for more details.
- We offer `torchopt.chain`, which can apply a list of chainable update transformations. Combined with `MetaOptimizer`, it can help you conduct gradient transformations such as gradient clipping before the meta-optimizer steps. Refer to the tutorial notebook [Meta Optimizer](./tutorials/2_Meta_Optimizer.ipynb) for more details.
@@ -100,7 +122,8 @@ We hope meta-learning researchers could control the network parameters or optimi
We give an example of [MAML](https://arxiv.org/abs/1703.03400) with an inner-loop Adam optimizer to illustrate the TorchOpt APIs:

```python
net = Net() # init
net = Net() # init

# the constructor `MetaOptimizer` takes as input the network
inner_optim = torchopt.MetaAdam(net)
outer_optim = torchopt.Adam(net.parameters())
@@ -137,60 +160,87 @@ for train_iter in range(train_iters):
torchopt.stop_gradient(net)
torchopt.stop_gradient(inner_optim)
```
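
Large parts of the loop body are collapsed in the diff above. A condensed sketch of the inner/outer loop — toy data and loop sizes are placeholders, and the `torchopt.extract_state_dict`/`torchopt.recover_state_dict` helpers are assumed from the Meta Optimizer tutorial — could read:

```python
import torch
import torch.nn.functional as F
import torchopt

net = torch.nn.Linear(4, 2)                               # toy model (placeholder)
inner_optim = torchopt.MetaAdam(net, lr=1e-1)             # differentiable inner-loop optimizer
outer_optim = torchopt.Adam(net.parameters(), lr=1e-3)    # ordinary outer-loop optimizer

for train_iter in range(10):                              # outer iterations (toy)
    outer_loss = 0.0
    for task in range(4):                                 # tasks per iteration (toy)
        xs, ys = torch.randn(8, 4), torch.randint(0, 2, (8,))   # support set (toy data)
        xq, yq = torch.randn(8, 4), torch.randint(0, 2, (8,))   # query set (toy data)

        # remember the meta parameters and optimizer state for this task
        net_state = torchopt.extract_state_dict(net)
        optim_state = torchopt.extract_state_dict(inner_optim)

        for inner_iter in range(3):                       # inner-loop adaptation
            inner_loss = F.cross_entropy(net(xs), ys)
            inner_optim.step(inner_loss)                  # differentiable update

        outer_loss = outer_loss + F.cross_entropy(net(xq), yq)

        # restore the meta parameters and optimizer state before the next task
        torchopt.recover_state_dict(net, net_state)
        torchopt.recover_state_dict(inner_optim, optim_state)

    outer_loss = outer_loss / 4
    outer_optim.zero_grad()
    outer_loss.backward()                                 # back-propagates through all inner steps
    outer_optim.step()

    # detach any remaining graph before the next outer iteration
    torchopt.stop_gradient(net)
    torchopt.stop_gradient(inner_optim)
```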

--------------------------------------------------------------------------------

## Examples
In *examples/*, we offer serveral examples of functional optimizer and 5 light-weight meta-learning examples with TorchOpt. The meta-learning examples covers 2 Supervised Learning and 3 Reinforcement Learning algorithms.

In [`examples`](examples), we offer several examples of the functional optimizer and 5 lightweight meta-learning examples with TorchOpt. The meta-learning examples cover 2 Supervised Learning and 3 Reinforcement Learning algorithms.

- [Model Agnostic Meta Learning (MAML)-Supervised Learning](https://arxiv.org/abs/1703.03400) (ICML2017)
- [Learning to Reweight Examples for Robust Deep Learning](https://arxiv.org/pdf/1803.09050.pdf) (ICML2018)
- [Model Agnostic Meta Learning (MAML)-Reinforcement Learning](https://arxiv.org/abs/1703.03400) (ICML2017)
- [Meta Gradient Reinforcement Learning (MGRL)](https://proceedings.neurips.cc/paper/2018/file/2715518c875999308842e3455eda2fe3-Paper.pdf) (NeurIPS 2018)
- [Learning through opponent learning process (LOLA)](https://arxiv.org/abs/1709.04326) (AAMAS 2018)

--------------------------------------------------------------------------------

## High-Performance

One can think of the gradient-scaling procedures of optimizer algorithms as a combination of several operations. For example, the implementation of the Adam algorithm often includes addition, multiplication, power, and square operations; one can fuse these operations into several compound functions. Such operator fusion greatly simplifies the computation graph and reduces GPU kernel-launch stalls. In addition, one can also implement the optimizer backward function and manually reuse some intermediate tensors to improve the backward performance. Users can pass the argument `use_accelerated_op=True` to `adam`, `Adam`, and `MetaAdam` to enable the fused accelerated operator. The arguments are the same between the two kinds of implementations.
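
A short sketch of how the flag is passed (the toy model and learning rates are placeholders):

```python
import torch
import torchopt

net = torch.nn.Linear(4, 2)  # toy model (placeholder)

# the same flag is accepted by the functional, PyTorch-like, and meta-optimizer APIs
func_optim = torchopt.adam(lr=1e-3, use_accelerated_op=True)
torch_like = torchopt.Adam(net.parameters(), lr=1e-3, use_accelerated_op=True)
meta_optim = torchopt.MetaAdam(net, lr=1e-1, use_accelerated_op=True)
```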

Here we evaluate the performance using the maml-omniglot code with the inner-loop Adam optimizer on GPU. We comparble the run time of the overall algorithm and the meta-optimization (outer-loop optimization) under different network architecture/inner-step numbers. We choose [higher](https://github.com/facebookresearch/higher) as our baseline. The figure below illustrate that our accelerated Adam can achieve at least 1/3 efficiency improvement over the baseline.
Here we evaluate the performance using the maml-omniglot code with the inner-loop Adam optimizer on GPU. We compare the run time of the overall algorithm and the meta-optimization (outer-loop optimization) under different network architectures and numbers of inner steps. We choose [`higher`](https://github.com/facebookresearch/higher) as our baseline. The figure below illustrates that our accelerated Adam achieves at least a 1/3 efficiency improvement over the baseline.

<div align="center">
<img src=image/time.png width=80% />
<img src="image/time.png" width="80%" />
</div>

Notably, the operator fusion not only increases performance but also helps simplify the computation graph, which will be discussed in the next section.

--------------------------------------------------------------------------------

## Visualization
Complex gradient flow in meta-learning brings in a great challenge for managing the gradient flow and verifying the correctness of it. TorchOpt provides a visualization tool that draw variable (e.g. network parameters or meta parameters) names on the gradient graph for better analyzing. The visualization tool is modified from [torchviz](https://github.com/szagoruyko/pytorchviz). We provide an example using the [visualization code](./examples/visualize.py). Also refer to the notebook [Visualization](./tutorials/3_Visualization.ipynb) for more details.

The figure below show the visulization result. Compared with torchviz, TorchOpt fuses the operations within the Adam together (orange) to reduce the complexity and provide simpler visualization.
Complex gradient flow in meta-learning poses a great challenge for managing the gradient flow and verifying its correctness. TorchOpt provides a visualization tool that draws variable names (e.g. network parameters or meta parameters) on the gradient graph for better analysis. The visualization tool is modified from [`torchviz`](https://github.com/szagoruyko/pytorchviz). We provide an example using the [visualization code](./examples/visualize.py). Also refer to the notebook [Visualization](./tutorials/3_Visualization.ipynb) for more details.

The figure below shows the visualization result. Compared with [`torchviz`](https://github.com/szagoruyko/pytorchviz), TorchOpt fuses the operations within Adam together (orange) to reduce the complexity and provide a simpler visualization.
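
As a rough usage sketch — the `torchopt.visual.make_dot` entry point below is an assumption modelled on the `torchviz` convention; check the Visualization tutorial for the exact API:

```python
import torch
import torchopt
from torchopt import visual  # assumed module name; see tutorials/3_Visualization.ipynb

net = torch.nn.Linear(4, 2)                       # toy model (placeholder)
loss = net(torch.randn(8, 4)).mean()

# label network parameters by name on the gradient graph (torchviz-style call)
graph = visual.make_dot(loss, params=dict(net.named_parameters()))
graph.render('meta_graph', format='svg')          # requires Graphviz installed
```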

<div align="center">
<img src=image/torchviz_torchopt.jpg width=80% />
<img src="image/torchviz_torchopt.jpg" width="80%" />
</div>

--------------------------------------------------------------------------------

## Installation

Requirements
- (Optional) For visualizing computation graphs
- [Graphviz](https://graphviz.org/download/) (for Linux users use `apt/yum install graphviz` or `conda install -c anaconda python-graphviz`)

- PyTorch
- JAX
- (Optional) For visualizing computation graphs
- [Graphviz](https://graphviz.org/download/) (for Linux users use `apt/yum install graphviz` or `conda install -c anaconda python-graphviz`)

```bash
pip install torchopt
pip3 install torchopt
```

You can also build the shared libraries from source:

```bash
git clone git@github.com:metaopt/torchopt.git
cd torchopt
python setup.py build_from_source
pip3 install .
```

--------------------------------------------------------------------------------

## Future Plan

- [ ] Support general implicit differentiation with functional programming.
- [ ] Support more optimizers such as AdamW and RMSProp
- [ ] CPU-accelerated optimizer

--------------------------------------------------------------------------------

## The Team

TorchOpt is a work by Jie Ren, Xidong Feng, [Bo Liu](https://github.com/Benjamin-eecs/), [Luo Mai](https://luomai.github.io/) and [Yaodong Yang](https://www.yangyaodong.com/).

## Citing TorchOpt

If you find TorchOpt useful, please cite it in your publications.

```
```bibtex
@software{TorchOpt,
author = {Jie Ren and Xidong Feng and Bo Liu and Luo Mai and Yaodong Yang},
title = {TorchOpt},