From c0be269f6d7bd4e99a0f83591170e75a5d5e20ca Mon Sep 17 00:00:00 2001 From: Ailing Zhang Date: Wed, 17 Mar 2021 07:06:54 +0000 Subject: [PATCH 01/10] RFC-0011-InferenceMode --- RFC-0011-InferenceMode.md | 138 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 138 insertions(+) create mode 100644 RFC-0011-InferenceMode.md diff --git a/RFC-0011-InferenceMode.md b/RFC-0011-InferenceMode.md new file mode 100644 index 00000000..53673ed4 --- /dev/null +++ b/RFC-0011-InferenceMode.md @@ -0,0 +1,138 @@ +Note: a large part of this RFC will become "InferenceMode" documentation once it's finalized. + +## Goals: +- Provide a RAII in C++ and a context manager in Python frontend to switch between inference mode and normal mode, with the following constraints: + - correctness is always guaranteed. (compared to `AutoNonVariableType` which has risks producing silent wrong result.) + - performance of infenrence mode should match current existing `AutoNonVariableTypeMode` which is widely used in prod. + - switching between normal mode and inference mode should be really easy with minimal code change. +- Make `AutoNonVariableTypeMode` an internal only API, replace all callsites of `AutoNonVariableTypeMode` outside pytorch codebase with the new `InferenceMode`. + +## Non-goals: +- Match the theorectical best inference performance which can be achieved by striping all autograd related stuff at build time (not flexible). +- Allowing the most flexible interaction between normal mode and inference mode. Current main use case for inference mode is "either inference or normal" without mixing, so we ban a lot of interactions between two modes to keep the implementation simple. + +# Different levels of control over autograd (copied from @Alban) +The following modes are ranked from slowest to fastest in speed, and from the most flexible to the most restrictive in what users can do. + +* Normal Mode: we create the graph for all Tensors that require gradients, always track view and inplace even they don't require gradients. +* GradMode disabled: we never create the graph, still track all views and inplace. User code always succeeds to properly track gradients. +* InferenceMode: we never create the graph, only track view and inplace if that could lead to silent error, skip that logic otherwise (we can potentially skip the allocation of the version counter for these tensors). Raise errors if users try to mix inference mode and autograd. (this one will have the same perf as AutoNonVariableTypeMode used today, but it will be safe!). +* (Not available yet) Compile time no grad: all autograd related code is completely removed for the best perf. This requires the users to change their code to make sure they don't use any autograd construct or they will see errors. + +# New concepts + +In this RFC we introduces the following new concepts: +- **InplaceOrView** is a new dipsatch key in dispatcher. It's fallthrough kernel by default, but it does `increment_version` for inplace ops and `as_view` setup for view ops. Here's some genernated InplaceOrView kernels: +``` + Tensor & add__Tensor(c10::DispatchKeySet ks, Tensor & self, const Tensor & other, Scalar alpha) { + + TORCH_CHECK(c10::impl::is_all_dispatch_keyset_excluded(c10::autograd_dispatch_keyset), + "Calling inplace/view ops on inference tensor outside InferenceMode is not allowed, ", + "consider making a clone first. 
", + "If you have a valid use case, please make a feature request to PyTorch."); + { + at::AutoDispatchBelowInplaceOrView guard; + at::redispatch::add_(ks & c10::after_InplaceOrView_keyset, self, other, alpha); + } + increment_version(self); + return self; + } + + + Tensor expand(c10::DispatchKeySet ks, const Tensor & self, IntArrayRef size, bool implicit) { + + TORCH_CHECK(c10::impl::is_all_dispatch_keyset_excluded(c10::autograd_dispatch_keyset), + "Calling inplace/view ops on inference tensor outside InferenceMode is not allowed, ", + "consider making a clone first. ", + "If you have a valid use case, please make a feature request to PyTorch."); + auto _tmp = ([&]() { + at::AutoDispatchBelowInplaceOrView guard; + return at::redispatch::expand(ks & c10::after_InplaceOrView_keyset, self, size, implicit); + })(); + std::function func=nullptr; + if (false || !self.unsafeGetTensorImpl()->support_as_strided()) { + auto size_vec = size.vec(); + func = [=](const at::Tensor& input_base) { + return input_base.expand(size_vec, implicit); + }; + } + auto result = as_view(/* base */ self, /* output */ _tmp, /* is_bw_differentiable */ true, /* is_fw_differentiable */ true, /* view_func */ func, /* creatio + return result; + } + ``` + - **Inference mode** can be turned on when you are sure you don't need any autograd computation. This saves the cost of creating autograd graph and as_view/version_counter setup compared to the normal mode. + - **Inference tensor** is defined as a tensor without Autograd **and** InplaceOrView keys on it. + - **Normal tensor** has both Autograd & InplaceOrView keys. This includes both `requires_grad=true` and `requires_grad=false` tensors. (see [Ideal end state] section for more details). + - Additional notes: + - All Inference tensors are created in inference mode, but not all of the tensors created in inference mode are inference tensors. For example, a view of normal tensor created in inference mode is still a normal tensor (but with special creation_meta!). + - (Autograd & !InplaceOrView) and (!Autogad & InplaceOrView) are invalid states, we don't have such tensors. + +# Expected Behavior +## Implementation: +1. Inference Mode: InplaceOrView not in included, Autograd in excluded +2. Normal Mode: InplaceOrView in included, Autograd not in excluded +3. In VariableType kernel, throw an error if input is inference tensor. +4. In InplaceOrView kernel, throw an error if Autograd keyset is not in excluded set already. +5. In VariableType kernel, throw an error if input is a view with `NO_VARIABLE_TYPE_VIEW` creation_meta. 
+## Behavior +| Mode | Input | Op | Go through Kernels | Produced Output | | | +|---------------|------------------------------------------|------------|-------------------------------------------|------------------------------------------------------------|---|---| +| InferenceMode | All inference tensors | functional | CPU | inference tensor | | | +| InferenceMode | All inference tensors | view | CPU | inference tensor | | | +| InferenceMode | All inference tensors | inplace | CPU | inference tensor | | | +| InferenceMode | Contains normal tensor | functional | InplaceOrView(fallthrough), CPU | inference tensor | | | +| InferenceMode | Contains normal tensor | view | InplaceOrView, CPU | normal tensor (with creation_meta=NO_VARIABLE_TYPE_VIEW) | | | +| InferenceMode | Contains normal tensor | inplace | InplaceOrView, CPU | normal tensor (which is input itself with updated version) | | | +| NormalMode | All inference tensors | functional | InplaceOrView(fallthrough), CPU | normal tensor (see note*) | | | +| NormalMode | All inference tensors | view | InplaceOrView(ERROR4!), CPU | | | | +| NormalMode | All inference tensors | inplace | InplaceOrView(ERROR4!), CPU | | | | +| NormalMode | Mixed normal tensor and inference tensor | functional | VariableType(ERROR3!), InplaceOrView, CPU | | | | +| NormalMode | Mixed normal tensor and inference tensor | view | VariableType(ERROR3!), InplaceOrView, CPU | | | | +| NormalMode | Mixed normal tensor and inference tensor | inplace | VariableType(ERROR3!), InplaceOrView, CPU | | | | +| | | | | | | | +| | | | | | | | +## additional notes: +1. ERROR3 means it hits (3) described in implementation section and ERROR4 means it hits (4) in implementation section. +2. Functional ops on inference tensors might run slower outside InferenceMode than inside. + But it's fine that we don't care about perf of this case that much. + +## Alternative implementations we've considered and why they don't work: +1. For NormalMode + All inference tensors + functional op, an alternative behavior we perfer but didn't implement is throwing an error by forcing this op go through VariableType kernel and hit the assert_no_inference_tensor check. But to do that we'll have to add c10::autograd_dispatch_keyset to the globally enabled set, but doing that might accidentally call autograd kernel from a backend that doesn't match tensor input. Thus we allow functional ops run without throwing an error. +2. Why implementation (1) and (2)? +``` + // 1. When InferenceMode is enabled, Autograd dispatch keys are excluded + // but not InplaceOrView key. + // + // For example: + // torch::Tensor a = torch::ones({1, 2, 3}).set_requires_grad(true); + // torch::Tensor k = a + 2; + // { + // c10::InferenceMode guard(true); + // k.add_(2); + // } + // `k.add_(2)` still need to go through InplaceOrView kernel so that it's + // prepared for future autograd. + // 2. When InferenceMode is disabled, InplaceOrView must be added + // to included set. + // + // For example: + // torch::Tensor a; + // { + // c10::InferenceMode guard(true); + // torch::Tensor in = torch::ones({2, 2}); + // a = in.view({1, 4}); + // } + // torch::Tensor c = a.view({4, 1}); // (*) + // If we don't add InplaceOrView to included set, (*) will skip its as_view + // setup entirely, `c` will be a Tensor that is not from Inference mode + // but has potentially wrong view metadata which should be forbidden.. 
+ // By going through InplaceOrView kernel, we can throw an error since it + // broke our invariant: "Autograd keys must be in excluded set before + // reaching InplaceOrView kernel". +``` +3. +# Ideal end state +Ideal end state is that we can link skip VariableType kernel when requires_grad=False which means we don't always go through VariableType kernel in normal mode. +But this work is currently blocked for the following reason: +- If requires_grad=False skips VariableType kernel, functional ops won't be able to go through `AutoDispatchBelowInplaceOrView` guard which suppresses both autograd and InplaceOrView keys in TLS excluded. Not suppressing InplaceOrView key means unnecessary calls to `as_view/increment_version` if any view/inplace ops are used in the kernel implementation which adds a lot of overhead. To avoid overhead, instead of fallthrough kerenl being backend fallback, we'll want to use a real kernel that suppresses InplaceOrView key. But compared to the current implementation which only adds an extra dispatch for view/inplace ops, it forces all functional ops to have an extra dispatch as well. That's why it's blocked. +- To unblock it requires some fixes like identifying at:: callsites in backend-specific kernels (static analysis? ) , replacing these with at::native:: should unblock us from linking requires_grad with VariableType kernel. From 23fa81d1cab195e3b283eab7d5ca26e019e37e32 Mon Sep 17 00:00:00 2001 From: "Edward Z. Yang" Date: Wed, 17 Mar 2021 11:32:13 -0400 Subject: [PATCH 02/10] minor copyedit Signed-off-by: Edward Z. Yang --- RFC-0011-InferenceMode.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/RFC-0011-InferenceMode.md b/RFC-0011-InferenceMode.md index 53673ed4..0fef26db 100644 --- a/RFC-0011-InferenceMode.md +++ b/RFC-0011-InferenceMode.md @@ -3,12 +3,12 @@ Note: a large part of this RFC will become "InferenceMode" documentation once it ## Goals: - Provide a RAII in C++ and a context manager in Python frontend to switch between inference mode and normal mode, with the following constraints: - correctness is always guaranteed. (compared to `AutoNonVariableType` which has risks producing silent wrong result.) - - performance of infenrence mode should match current existing `AutoNonVariableTypeMode` which is widely used in prod. + - performance of inference mode should match current existing `AutoNonVariableTypeMode` which is widely used in prod. - switching between normal mode and inference mode should be really easy with minimal code change. - Make `AutoNonVariableTypeMode` an internal only API, replace all callsites of `AutoNonVariableTypeMode` outside pytorch codebase with the new `InferenceMode`. ## Non-goals: -- Match the theorectical best inference performance which can be achieved by striping all autograd related stuff at build time (not flexible). +- Match the theoretical best inference performance which can be achieved by stripping all autograd related stuff at build time (not flexible). - Allowing the most flexible interaction between normal mode and inference mode. Current main use case for inference mode is "either inference or normal" without mixing, so we ban a lot of interactions between two modes to keep the implementation simple. # Different levels of control over autograd (copied from @Alban) @@ -60,11 +60,11 @@ In this RFC we introduces the following new concepts: return result; } ``` - - **Inference mode** can be turned on when you are sure you don't need any autograd computation. 
This saves the cost of creating autograd graph and as_view/version_counter setup compared to the normal mode. + - **Inference mode** can be turned on when you are sure you don't need any autograd computation. This saves the cost of creating autograd graph and `as_view` / `version_counter` setup compared to the normal mode. - **Inference tensor** is defined as a tensor without Autograd **and** InplaceOrView keys on it. - **Normal tensor** has both Autograd & InplaceOrView keys. This includes both `requires_grad=true` and `requires_grad=false` tensors. (see [Ideal end state] section for more details). - Additional notes: - - All Inference tensors are created in inference mode, but not all of the tensors created in inference mode are inference tensors. For example, a view of normal tensor created in inference mode is still a normal tensor (but with special creation_meta!). + - All Inference tensors are created in inference mode, but not all of the tensors created in inference mode are inference tensors. For example, a view of normal tensor created in inference mode is still a normal tensor (but with special `creation_meta`!). - (Autograd & !InplaceOrView) and (!Autogad & InplaceOrView) are invalid states, we don't have such tensors. # Expected Behavior From 5cdeaa765d3bc48ddf651b813e7a3b2097b255af Mon Sep 17 00:00:00 2001 From: "Edward Z. Yang" Date: Wed, 17 Mar 2021 11:39:08 -0400 Subject: [PATCH 03/10] copyedit Signed-off-by: Edward Z. Yang --- RFC-0011-InferenceMode.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/RFC-0011-InferenceMode.md b/RFC-0011-InferenceMode.md index 0fef26db..c2fda8a9 100644 --- a/RFC-0011-InferenceMode.md +++ b/RFC-0011-InferenceMode.md @@ -22,7 +22,7 @@ The following modes are ranked from slowest to fastest in speed, and from the mo # New concepts In this RFC we introduces the following new concepts: -- **InplaceOrView** is a new dipsatch key in dispatcher. It's fallthrough kernel by default, but it does `increment_version` for inplace ops and `as_view` setup for view ops. Here's some genernated InplaceOrView kernels: +- **InplaceOrView** is a new dispatch key in dispatcher. It's fallthrough kernel by default, but it does `increment_version` for inplace ops and `as_view` setup for view ops. Here's some genernated InplaceOrView kernels: ``` Tensor & add__Tensor(c10::DispatchKeySet ks, Tensor & self, const Tensor & other, Scalar alpha) { From 7746b41dc6b37494f61501e4a2c42e521fb56fb5 Mon Sep 17 00:00:00 2001 From: "Edward Z. Yang" Date: Wed, 17 Mar 2021 11:40:05 -0400 Subject: [PATCH 04/10] copyedit Signed-off-by: Edward Z. Yang --- RFC-0011-InferenceMode.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/RFC-0011-InferenceMode.md b/RFC-0011-InferenceMode.md index c2fda8a9..a56a4d33 100644 --- a/RFC-0011-InferenceMode.md +++ b/RFC-0011-InferenceMode.md @@ -22,7 +22,7 @@ The following modes are ranked from slowest to fastest in speed, and from the mo # New concepts In this RFC we introduces the following new concepts: -- **InplaceOrView** is a new dispatch key in dispatcher. It's fallthrough kernel by default, but it does `increment_version` for inplace ops and `as_view` setup for view ops. Here's some genernated InplaceOrView kernels: +- **InplaceOrView** is a new dispatch key in dispatcher. It's fallthrough kernel by default, but it does `increment_version` for inplace ops and `as_view` setup for view ops. 
Here's some generated InplaceOrView kernels: ``` Tensor & add__Tensor(c10::DispatchKeySet ks, Tensor & self, const Tensor & other, Scalar alpha) { From 3a4edd677307f5c265a0be718b4f0bc514138f34 Mon Sep 17 00:00:00 2001 From: "Edward Z. Yang" Date: Wed, 17 Mar 2021 11:43:56 -0400 Subject: [PATCH 05/10] copyedit Signed-off-by: Edward Z. Yang --- RFC-0011-InferenceMode.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/RFC-0011-InferenceMode.md b/RFC-0011-InferenceMode.md index a56a4d33..cb5ecab1 100644 --- a/RFC-0011-InferenceMode.md +++ b/RFC-0011-InferenceMode.md @@ -97,7 +97,7 @@ In this RFC we introduces the following new concepts: But it's fine that we don't care about perf of this case that much. ## Alternative implementations we've considered and why they don't work: -1. For NormalMode + All inference tensors + functional op, an alternative behavior we perfer but didn't implement is throwing an error by forcing this op go through VariableType kernel and hit the assert_no_inference_tensor check. But to do that we'll have to add c10::autograd_dispatch_keyset to the globally enabled set, but doing that might accidentally call autograd kernel from a backend that doesn't match tensor input. Thus we allow functional ops run without throwing an error. +1. For NormalMode + All inference tensors + functional op, an alternative behavior we prefer but didn't implement is throwing an error by forcing this op go through VariableType kernel and hit the assert_no_inference_tensor check. But to do that we'll have to add c10::autograd_dispatch_keyset to the globally enabled set, but doing that might accidentally call autograd kernel from a backend that doesn't match tensor input. Thus we allow functional ops run without throwing an error. 2. Why implementation (1) and (2)? ``` // 1. When InferenceMode is enabled, Autograd dispatch keys are excluded From 575861b26d3137ff248462243fe5c5109456a94f Mon Sep 17 00:00:00 2001 From: Ailing Date: Fri, 19 Mar 2021 13:35:44 -0700 Subject: [PATCH 06/10] Update RFC-0011-InferenceMode.md --- RFC-0011-InferenceMode.md | 28 ++++++++++++++++++++++++++-- 1 file changed, 26 insertions(+), 2 deletions(-) diff --git a/RFC-0011-InferenceMode.md b/RFC-0011-InferenceMode.md index cb5ecab1..e8167644 100644 --- a/RFC-0011-InferenceMode.md +++ b/RFC-0011-InferenceMode.md @@ -60,8 +60,32 @@ In this RFC we introduces the following new concepts: return result; } ``` - - **Inference mode** can be turned on when you are sure you don't need any autograd computation. This saves the cost of creating autograd graph and `as_view` / `version_counter` setup compared to the normal mode. - - **Inference tensor** is defined as a tensor without Autograd **and** InplaceOrView keys on it. + - **Inference mode** a thread local state that can be turned on via RAII guard/context manager. (Either you are in inference mode, or you are not.) Intuitively, inference mode lets you do inference only operation with better performance than normal mode. + - All operations do not create autograd graph, even if the inputs require_grad=True + - Setting requires_grad in inference mode will update requires_grad field on tensors, but behavior won't change. 
+ - Things that continue to work: + - Inplace operations on both normal/inference tensors are OK + - Inplace operation on inference tensor is guaranteed not to VC bump + - NB: if you do an inplace operation on a normal tensor, you WILL get a version counter bump + - View operations on both normal/inference tensors are OK + - View operation on inference tensor is guaranteed not to allocate view metadata + - View operation on normal tensor produces a "bad" normal tensor (for safety reasons). Bad normal tensors (impl: CreationMeta) cannot be inplace modified outside inference mode. These normal tensors behave identically to no_grad, except that they always raise error (rather than give a warning). + +* **Inference tensor** are tensors that are constructed if and only if inference mode is enabled, with the exception of views on normal tensors. Non-inference tensors are called **normal tensors**. + * Q: Why not views on normal tensors? A: Because we guarantee performance on inference tensors, but views on normal tensors require additional safety checks (e.g. normal tensor ----(view)---> ----(inplace)----> this should properly bump version on base which requires view produce a normal tensor). + * Setting requires_grad on an inference tensors outside of inference mode raises an error. + * NB: Inference tensors and bad normal tensors are leaf tensors. + * Outside of inference mode, the following operations on inference tensors is forbidden: + * Inplace/view operations (functional operations produce normal tensors), if at least one input tensor is inference mode. + * Why? In principle, these are safe if they produce inference tensors, but we are trying to maintain the invariant that inference tensors are ONLY created in inference mode. + * Impl: Functional on normal tensors is allowed because we cannot conveniently ban it (VariableType/InplaceOrView kernel are all skipped) + * Mixing inference and normal tensors, even for functional operations, is forbidden. + * Why? For simplicity of implementation. In particular, if you save the inference tensor in backwards, you’re likely to hit an error in a weird place (better to error early). By forbidding mixed operations, it is impossible for this situation to occur. + * Impl: inference tensors are guaranteed not to have AutogradMeta + + + + - **Normal tensor** has both Autograd & InplaceOrView keys. This includes both `requires_grad=true` and `requires_grad=false` tensors. (see [Ideal end state] section for more details). - Additional notes: - All Inference tensors are created in inference mode, but not all of the tensors created in inference mode are inference tensors. For example, a view of normal tensor created in inference mode is still a normal tensor (but with special `creation_meta`!). From 0fee4b7e9476fc0c1717db0a9a14b2b577e52a7d Mon Sep 17 00:00:00 2001 From: Ailing Date: Mon, 22 Mar 2021 00:04:30 -0700 Subject: [PATCH 07/10] Update RFC-0011-InferenceMode.md --- RFC-0011-InferenceMode.md | 43 +++++++-------------------------------- 1 file changed, 7 insertions(+), 36 deletions(-) diff --git a/RFC-0011-InferenceMode.md b/RFC-0011-InferenceMode.md index e8167644..6bd15131 100644 --- a/RFC-0011-InferenceMode.md +++ b/RFC-0011-InferenceMode.md @@ -62,14 +62,14 @@ In this RFC we introduces the following new concepts: ``` - **Inference mode** a thread local state that can be turned on via RAII guard/context manager. (Either you are in inference mode, or you are not.) 
Intuitively, inference mode lets you do inference only operation with better performance than normal mode. - All operations do not create autograd graph, even if the inputs require_grad=True - - Setting requires_grad in inference mode will update requires_grad field on tensors, but behavior won't change. + - Setting requires_grad in inference mode will update requires_grad field on tensors, but it doesn't affect any behavior inside InferenceMode. - Things that continue to work: - Inplace operations on both normal/inference tensors are OK - Inplace operation on inference tensor is guaranteed not to VC bump - NB: if you do an inplace operation on a normal tensor, you WILL get a version counter bump - View operations on both normal/inference tensors are OK - View operation on inference tensor is guaranteed not to allocate view metadata - - View operation on normal tensor produces a "bad" normal tensor (for safety reasons). Bad normal tensors (impl: CreationMeta) cannot be inplace modified outside inference mode. These normal tensors behave identically to no_grad, except that they always raise error (rather than give a warning). + - View operation on normal tensor produces a normal tensor(NO_GRAD_FN), behavior is the same as creating a view inside NoGrad mode. * **Inference tensor** are tensors that are constructed if and only if inference mode is enabled, with the exception of views on normal tensors. Non-inference tensors are called **normal tensors**. * Q: Why not views on normal tensors? A: Because we guarantee performance on inference tensors, but views on normal tensors require additional safety checks (e.g. normal tensor ----(view)---> ----(inplace)----> this should properly bump version on base which requires view produce a normal tensor). @@ -81,48 +81,19 @@ In this RFC we introduces the following new concepts: * Impl: Functional on normal tensors is allowed because we cannot conveniently ban it (VariableType/InplaceOrView kernel are all skipped) * Mixing inference and normal tensors, even for functional operations, is forbidden. * Why? For simplicity of implementation. In particular, if you save the inference tensor in backwards, you’re likely to hit an error in a weird place (better to error early). By forbidding mixed operations, it is impossible for this situation to occur. - * Impl: inference tensors are guaranteed not to have AutogradMeta - - + * Impl: inference tensors are guaranteed to have is_leaf=True. - **Normal tensor** has both Autograd & InplaceOrView keys. This includes both `requires_grad=true` and `requires_grad=false` tensors. (see [Ideal end state] section for more details). - Additional notes: - - All Inference tensors are created in inference mode, but not all of the tensors created in inference mode are inference tensors. For example, a view of normal tensor created in inference mode is still a normal tensor (but with special `creation_meta`!). + - All Inference tensors are created in inference mode, but not all of the tensors created in inference mode are inference tensors. For example, a view of normal tensor created in inference mode is still a normal tensor (but with special `creation_meta=NO_GRAD_FN`!). - (Autograd & !InplaceOrView) and (!Autogad & InplaceOrView) are invalid states, we don't have such tensors. -# Expected Behavior -## Implementation: -1. Inference Mode: InplaceOrView not in included, Autograd in excluded -2. Normal Mode: InplaceOrView in included, Autograd not in excluded -3. 
In VariableType kernel, throw an error if input is inference tensor. -4. In InplaceOrView kernel, throw an error if Autograd keyset is not in excluded set already. -5. In VariableType kernel, throw an error if input is a view with `NO_VARIABLE_TYPE_VIEW` creation_meta. -## Behavior -| Mode | Input | Op | Go through Kernels | Produced Output | | | -|---------------|------------------------------------------|------------|-------------------------------------------|------------------------------------------------------------|---|---| -| InferenceMode | All inference tensors | functional | CPU | inference tensor | | | -| InferenceMode | All inference tensors | view | CPU | inference tensor | | | -| InferenceMode | All inference tensors | inplace | CPU | inference tensor | | | -| InferenceMode | Contains normal tensor | functional | InplaceOrView(fallthrough), CPU | inference tensor | | | -| InferenceMode | Contains normal tensor | view | InplaceOrView, CPU | normal tensor (with creation_meta=NO_VARIABLE_TYPE_VIEW) | | | -| InferenceMode | Contains normal tensor | inplace | InplaceOrView, CPU | normal tensor (which is input itself with updated version) | | | -| NormalMode | All inference tensors | functional | InplaceOrView(fallthrough), CPU | normal tensor (see note*) | | | -| NormalMode | All inference tensors | view | InplaceOrView(ERROR4!), CPU | | | | -| NormalMode | All inference tensors | inplace | InplaceOrView(ERROR4!), CPU | | | | -| NormalMode | Mixed normal tensor and inference tensor | functional | VariableType(ERROR3!), InplaceOrView, CPU | | | | -| NormalMode | Mixed normal tensor and inference tensor | view | VariableType(ERROR3!), InplaceOrView, CPU | | | | -| NormalMode | Mixed normal tensor and inference tensor | inplace | VariableType(ERROR3!), InplaceOrView, CPU | | | | -| | | | | | | | -| | | | | | | | -## additional notes: -1. ERROR3 means it hits (3) described in implementation section and ERROR4 means it hits (4) in implementation section. -2. Functional ops on inference tensors might run slower outside InferenceMode than inside. - But it's fine that we don't care about perf of this case that much. + ## Alternative implementations we've considered and why they don't work: 1. For NormalMode + All inference tensors + functional op, an alternative behavior we prefer but didn't implement is throwing an error by forcing this op go through VariableType kernel and hit the assert_no_inference_tensor check. But to do that we'll have to add c10::autograd_dispatch_keyset to the globally enabled set, but doing that might accidentally call autograd kernel from a backend that doesn't match tensor input. Thus we allow functional ops run without throwing an error. -2. Why implementation (1) and (2)? +2. ``` // 1. When InferenceMode is enabled, Autograd dispatch keys are excluded // but not InplaceOrView key. @@ -154,7 +125,7 @@ In this RFC we introduces the following new concepts: // broke our invariant: "Autograd keys must be in excluded set before // reaching InplaceOrView kernel". ``` -3. + # Ideal end state Ideal end state is that we can link skip VariableType kernel when requires_grad=False which means we don't always go through VariableType kernel in normal mode. 
But this work is currently blocked for the following reason: From 25378036a5f9a6cc760fd1dbd2ef4831ee2052b2 Mon Sep 17 00:00:00 2001 From: Ailing Date: Tue, 23 Mar 2021 10:00:51 -0700 Subject: [PATCH 08/10] Update RFC-0011-InferenceMode.md --- RFC-0011-InferenceMode.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/RFC-0011-InferenceMode.md b/RFC-0011-InferenceMode.md index 6bd15131..7ca1bd98 100644 --- a/RFC-0011-InferenceMode.md +++ b/RFC-0011-InferenceMode.md @@ -73,8 +73,7 @@ In this RFC we introduces the following new concepts: * **Inference tensor** are tensors that are constructed if and only if inference mode is enabled, with the exception of views on normal tensors. Non-inference tensors are called **normal tensors**. * Q: Why not views on normal tensors? A: Because we guarantee performance on inference tensors, but views on normal tensors require additional safety checks (e.g. normal tensor ----(view)---> ----(inplace)----> this should properly bump version on base which requires view produce a normal tensor). - * Setting requires_grad on an inference tensors outside of inference mode raises an error. - * NB: Inference tensors and bad normal tensors are leaf tensors. + * NB: Inference tensors and bad normal tensors are leaf tensors. * Outside of inference mode, the following operations on inference tensors is forbidden: * Inplace/view operations (functional operations produce normal tensors), if at least one input tensor is inference mode. * Why? In principle, these are safe if they produce inference tensors, but we are trying to maintain the invariant that inference tensors are ONLY created in inference mode. From 1effdab8c2463fcc14f0d3249930ed625c3d9046 Mon Sep 17 00:00:00 2001 From: "Edward Z. Yang" Date: Thu, 25 Mar 2021 14:43:30 -0400 Subject: [PATCH 09/10] rfc rewrite Signed-off-by: Edward Z. Yang --- RFC-0011-InferenceMode.md | 272 +++++++++++++++++++++++++++++--------- 1 file changed, 213 insertions(+), 59 deletions(-) diff --git a/RFC-0011-InferenceMode.md b/RFC-0011-InferenceMode.md index 7ca1bd98..9704c619 100644 --- a/RFC-0011-InferenceMode.md +++ b/RFC-0011-InferenceMode.md @@ -1,69 +1,223 @@ -Note: a large part of this RFC will become "InferenceMode" documentation once it's finalized. +## Summary -## Goals: -- Provide a RAII in C++ and a context manager in Python frontend to switch between inference mode and normal mode, with the following constraints: - - correctness is always guaranteed. (compared to `AutoNonVariableType` which has risks producing silent wrong result.) - - performance of inference mode should match current existing `AutoNonVariableTypeMode` which is widely used in prod. - - switching between normal mode and inference mode should be really easy with minimal code change. -- Make `AutoNonVariableTypeMode` an internal only API, replace all callsites of `AutoNonVariableTypeMode` outside pytorch codebase with the new `InferenceMode`. +`InferenceMode` is a new context manager / RAII guard analogous to +`NoGradMode` to be used when you are certain your operations will have no +interactions with autograd (e.g., model training). Code run under this +mode gets better performance by disabling view tracking and version +counter bumps. -## Non-goals: -- Match the theoretical best inference performance which can be achieved by stripping all autograd related stuff at build time (not flexible). -- Allowing the most flexible interaction between normal mode and inference mode. 
Current main use case for inference mode is "either inference or normal" without mixing, so we ban a lot of interactions between two modes to keep the implementation simple. +## Motivation -# Different levels of control over autograd (copied from @Alban) -The following modes are ranked from slowest to fastest in speed, and from the most flexible to the most restrictive in what users can do. +In production use of PyTorch for inference, we have seen a proliferation +of uses of the C++ guard `AutoNonVariableTypeMode`, which disables +autograd, view tracking and version counter bumps. Unfortunately, +current colloquial use of this guard is unsafe: it is possible to use +`AutoNonVariableTypeMode` to bypass PyTorch's safety checks for, e.g., +ensuring tensors saved for backwards are not subsequently mutated. -* Normal Mode: we create the graph for all Tensors that require gradients, always track view and inplace even they don't require gradients. -* GradMode disabled: we never create the graph, still track all views and inplace. User code always succeeds to properly track gradients. -* InferenceMode: we never create the graph, only track view and inplace if that could lead to silent error, skip that logic otherwise (we can potentially skip the allocation of the version counter for these tensors). Raise errors if users try to mix inference mode and autograd. (this one will have the same perf as AutoNonVariableTypeMode used today, but it will be safe!). -* (Not available yet) Compile time no grad: all autograd related code is completely removed for the best perf. This requires the users to change their code to make sure they don't use any autograd construct or they will see errors. +`InferenceMode` offers a drop in replacement for +`AutoNonVariableTypeMode` which: -# New concepts +1. Preserves the performance characteristics of + `AutoNonVariableTypeMode` (Autograd, view tracking and version + counter bumps are skipped for all tensors allocated within the + inference mode region), but + +2. Is safe, in the sense that it is not possible to bypass version + counter updates on tensors which may alias with tensors which + have been saved for backwards. + +For now, this guard is to only be made available inside C++, although +we could also introduce a Python guard `torch.inference_mode` as well. + +Some goals and non-goals: + +* Goal: `InferenceMode` is semantically equivalent to `NoGradMode`, + except some operations may not be supported. (In other words, this is + a partial equivalence: *if* inference mode does not throw an error, + then it behaves the same way as no grad mode). + +* Goal: Don't be a global or compile time flag. This makes + `InferenceMode` widely applicable as it can still be used in processes + where there may be training going on in another thread (e.g., + federated learning on mobile). + +* Non-goal: `InferenceMode` doesn't affect computation beyond its scope. + Indeed, the capacity for tensors allocated in `InferenceMode` (so + called "inference tensors") to behave differently even outside of + `InferenceMode` is one of the key implementation tools to ensuring + that `InferenceMode` is safe. + +* Non-goal: Make operations on inference tensors fast outside of + `InferenceMode` + +* Non-goal: Avoid performance slowdown for view/inplace operations + outside of `InferenceMode`. Benchmarking on popular models reveal + that a slight slowdown on these operations is acceptable; in our + case, this slowdown will be due to an extra redispatch in these cases. 
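To make the intended migration concrete before the user-facing description below, here is a minimal usage sketch. It assumes the guard is exposed as `c10::InferenceMode` with the constructor used in the examples later in this RFC (`c10::InferenceMode guard(true)`); the `run_inference` helper, its parameters, and the choice of `torch::matmul`/`torch::relu` are illustrative placeholders rather than part of the proposal.

```
#include <torch/torch.h>

// Hypothetical inference entry point; weight/bias stand in for real model
// parameters, which may well have been created with requires_grad=true.
torch::Tensor run_inference(const torch::Tensor& weight,
                            const torch::Tensor& bias,
                            const torch::Tensor& input) {
  // Previously: at::AutoNonVariableTypeMode guard(true);  // unsafe, to become internal-only
  c10::InferenceMode guard(true);  // proposed drop-in replacement
  // Inside the guard no grad_fn is recorded and the outputs are inference
  // tensors, even if weight.requires_grad() is true.
  return torch::relu(torch::matmul(input, weight) + bias);
}
```

Because the guard is RAII, leaving the scope restores the caller's previous thread-local state, so such a call site composes with training code running elsewhere in the process.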
+ +# User description + +`InferenceMode` is an RAII guard which can be enabled for a given block +of code. Inside inference mode, all newly allocated (non-view) tensors +are marked as **inference tensors**; these tensors are guaranteed not to +alias with tensors that may have been saved for backwards (or are +otherwise making use of version counters--perhaps more accurately, +you could call these "non version counter tracked tensors"). Inference +tensors do not have a version counter, and raise an error if you try +to read their version (e.g., because you saved this tensor for +backwards.) + +A non-view tensor is an inference tensor if and only if it was +allocated during inference mode. A view tensor is an inference +tensor if and only if the tensor it is a view of is an inference tensor. + +Inside an `InferenceMode` block, we make the following performance +guarantees: + +* All operations do not record `grad_fn`, even if their `requires_grad=True` + (like `NoGradMode`). This applies for both inference tensors and + normal tensors (also like `NoGradMode`). +* View operations do not do view tracking (like `NoGradMode` after + PyTorch 1.9). This applies for both inference tensors and normal + tensors. +* Inplace operations on inference tensors are guaranteed not to do + a version counter bump (which is equivalent to an atomic increment). + Inplace operations on normal tensors still do version counter bumps. + +# Implementation description + +**Dispatcher.** The dispatcher decides what implementation of a kernel +to call when an operator is invoked. The set of possible options is +controlled by several sources: + +* Tensor inputs (keys are unioned from all inputs) +* TLS included set +* TLS excluded set (which removes keys from the above two sources) + +**Autograd.** This is a preexisting dispatch key which is responsible +for recording `grad_fn` on output tensors when any of their inputs +`require_grad`. + +Autograd dispatch key is associated with tensors. Prior to this +proposal, all tensors unconditionally have an autograd key. +(Technically, the autograd dispatch key is not a single key, +but a set of keys per backend; for the purposes of this proposal, +this doesn't matter.) + +**InplaceOrView.** This is a new dispatch key which is responsible for +doing version counter bumps on inplace operations, and view metadata +tracking for view ops. Previously, this functionality was also done +as part of the Autograd kernel. For all other operators, it is a fallthrough +kernel. Here is an example kernel for an inplace op and a view op prior +to this proposal: -In this RFC we introduces the following new concepts: -- **InplaceOrView** is a new dispatch key in dispatcher. It's fallthrough kernel by default, but it does `increment_version` for inplace ops and `as_view` setup for view ops. Here's some generated InplaceOrView kernels: ``` - Tensor & add__Tensor(c10::DispatchKeySet ks, Tensor & self, const Tensor & other, Scalar alpha) { - - TORCH_CHECK(c10::impl::is_all_dispatch_keyset_excluded(c10::autograd_dispatch_keyset), - "Calling inplace/view ops on inference tensor outside InferenceMode is not allowed, ", - "consider making a clone first. 
", - "If you have a valid use case, please make a feature request to PyTorch."); - { - at::AutoDispatchBelowInplaceOrView guard; - at::redispatch::add_(ks & c10::after_InplaceOrView_keyset, self, other, alpha); - } - increment_version(self); - return self; - } - - - Tensor expand(c10::DispatchKeySet ks, const Tensor & self, IntArrayRef size, bool implicit) { - - TORCH_CHECK(c10::impl::is_all_dispatch_keyset_excluded(c10::autograd_dispatch_keyset), - "Calling inplace/view ops on inference tensor outside InferenceMode is not allowed, ", - "consider making a clone first. ", - "If you have a valid use case, please make a feature request to PyTorch."); - auto _tmp = ([&]() { - at::AutoDispatchBelowInplaceOrView guard; - return at::redispatch::expand(ks & c10::after_InplaceOrView_keyset, self, size, implicit); - })(); - std::function func=nullptr; - if (false || !self.unsafeGetTensorImpl()->support_as_strided()) { - auto size_vec = size.vec(); - func = [=](const at::Tensor& input_base) { - return input_base.expand(size_vec, implicit); - }; - } - auto result = as_view(/* base */ self, /* output */ _tmp, /* is_bw_differentiable */ true, /* is_fw_differentiable */ true, /* view_func */ func, /* creatio - return result; - } - ``` - - **Inference mode** a thread local state that can be turned on via RAII guard/context manager. (Either you are in inference mode, or you are not.) Intuitively, inference mode lets you do inference only operation with better performance than normal mode. - - All operations do not create autograd graph, even if the inputs require_grad=True - - Setting requires_grad in inference mode will update requires_grad field on tensors, but it doesn't affect any behavior inside InferenceMode. - - Things that continue to work: +Tensor & add__Tensor(c10::DispatchKeySet ks, Tensor & self, const Tensor & other, Scalar alpha) { + { + at::AutoDispatchBelowInplaceOrView guard; + at::redispatch::add_(ks & c10::after_InplaceOrView_keyset, self, other, alpha); + } + increment_version(self); + return self; +} + +Tensor expand(c10::DispatchKeySet ks, const Tensor & self, IntArrayRef size, bool implicit) { + auto _tmp = ([&]() { + at::AutoDispatchBelowInplaceOrView guard; + return at::redispatch::expand(ks & c10::after_InplaceOrView_keyset, self, size, implicit); + })(); + std::function func=nullptr; + if (false || !self.unsafeGetTensorImpl()->support_as_strided()) { + auto size_vec = size.vec(); + func = [=](const at::Tensor& input_base) { + return input_base.expand(size_vec, implicit); + }; + } + auto result = as_view( + /* base */ self, /* output */ _tmp, /* is_bw_differentiable */ true, + /* is_fw_differentiable */ true, /* view_func */ func, + /* creation_meta */ at::GradMode::is_enabled() ? CreationMeta::DEFAULT : CreationMeta::NO_GRAD_MODE + return result; +} +``` + +InplaceOrView is considered part of the default TLS included set; i.e., +it is always run. It is also associated with normal tensors (like Autograd), +so that these kernels get run even if InplaceOrView is not in the +default TLS included set. + +**The algorithm.** At a high level, we would like to skip both the +Autograd and InplaceOrView kernels while in inference mode, whenever +it is safe to do so. Whether or not this is safe is maintained by +the invariant: + + **The invariant:** Any inference tensor (tensor whose dispatch key + set does not include Autograd nor InplaceOrView) is guaranteed + not to alias with any tensor which is saved for backwards (or + otherwise depends on accurate version counter tracking). 
+ +This invariant guarantees that it safe skip version counter bumps on +inference tensors. + +**Inference mode** is defined to be the state when: + +* Autograd is added to the TLS excluded set +* InplaceOrView is removed from the TLS included set (recall that by + default, InplaceOrView is part of the TLS included set) +* View metadata is not recorded (similar to NoGradMode) + +It is legal for only Autograd to be excluded (this happens during normal +processing of Autograd kernels), but it is illegal for InplaceOrView to +be removed from the TLS included set if Autograd is not also excluded. + +An **inference tensor** is a tensor that does not have the Autograd or +InplaceOrView dispatch keys and has no version counter. Whether or not +the result of a functional/view operation is an inference tensor (e.g., +that omit these keys) is the result of the following rules: + +* If a functional operation, the output tensor is an inference + tensor if and only if we are running in inference mode. In practice, + this is implemented by only adding the Autograd+InplaceOrView keys + in the TensorImpl constructor if inference mode is off. +* If a view operation, the output tensor is an inference tensor + if and only if the input tensor is an inference tensor. In practice, + this is implemented by propagating the dispatch key set from the + base tensor to the view tensor. + +These rules satisfy the invariant: functional operations are guaranteed +to have non-aliasing outputs and are safe to mark as inference tensors; +view operations introducing aliasing relationships, and it is only +safe for inference tensors to alias other inference tensors. + +Finally, we must ensure that inference tensors are not saved for +backwards. **TODO FINISH ME** + +**Examples.** Given the rules above, we can describe the behavior +for each combination of possibilities: + +* In inference mode... + * Inplace operation... + * On a normal tensor - version counter will increment (due + to InplaceOrView key on the normal tensor) + * On an inference tensor - no increment + * View operation... + * On a normal tensor - view metadata is not recorded, + version counter is propagated, result is a normal tensor + * On an inference tensor - view metadata is not recorded, + result is an inference tensor + * Functional operation... + * On a normal tensor - produces an inference tensor + * On an inference tensor - produces an inference tensor +* Outside of inference mode... + * Inplace operation... + * On an inference tensor - allowed, no increment + * View operation... + * On an inference tensor - allowed, view metadata + + +**TODO EVERYTHING ELSE** + - Inplace operations on both normal/inference tensors are OK - Inplace operation on inference tensor is guaranteed not to VC bump - NB: if you do an inplace operation on a normal tensor, you WILL get a version counter bump From 02f6eeb830777a7d74f191522be685d781285aab Mon Sep 17 00:00:00 2001 From: "Edward Z. Yang" Date: Mon, 29 Mar 2021 11:51:46 -0400 Subject: [PATCH 10/10] rewrite Signed-off-by: Edward Z. Yang --- RFC-0011-InferenceMode.md | 199 +++++++++++++++++++++----------------- 1 file changed, 111 insertions(+), 88 deletions(-) diff --git a/RFC-0011-InferenceMode.md b/RFC-0011-InferenceMode.md index 9704c619..346cc1cf 100644 --- a/RFC-0011-InferenceMode.md +++ b/RFC-0011-InferenceMode.md @@ -35,7 +35,13 @@ Some goals and non-goals: * Goal: `InferenceMode` is semantically equivalent to `NoGradMode`, except some operations may not be supported. 
(In other words, this is a partial equivalence: *if* inference mode does not throw an error, - then it behaves the same way as no grad mode). + then it behaves the same way as no grad mode). Caveat: this + equivalence does not extend to methods that expose private + implementation details; esp., `Tensor._is_view` and `Tensor._base`. + +* Goal: It should be possible to run code that allocates parameters + (tensors with `requires_grad=True`) unchanged inside of an inference + mode block. * Goal: Don't be a global or compile time flag. This makes `InferenceMode` widely applicable as it can still be used in processes @@ -49,14 +55,15 @@ Some goals and non-goals: that `InferenceMode` is safe. * Non-goal: Make operations on inference tensors fast outside of - `InferenceMode` + `InferenceMode`; nor, be maximally expressive with inference + tensor outside of `InferenceMode`. * Non-goal: Avoid performance slowdown for view/inplace operations outside of `InferenceMode`. Benchmarking on popular models reveal that a slight slowdown on these operations is acceptable; in our case, this slowdown will be due to an extra redispatch in these cases. -# User description +## User description `InferenceMode` is an RAII guard which can be enabled for a given block of code. Inside inference mode, all newly allocated (non-view) tensors @@ -64,9 +71,14 @@ are marked as **inference tensors**; these tensors are guaranteed not to alias with tensors that may have been saved for backwards (or are otherwise making use of version counters--perhaps more accurately, you could call these "non version counter tracked tensors"). Inference -tensors do not have a version counter, and raise an error if you try -to read their version (e.g., because you saved this tensor for -backwards.) +tensors: + +* Do not have a version counter. +* Raise an error if you try to read their version (e.g., because you + saved this tensor for backwards.) +* Raise an error if you try to mutate them into requiring gradients + (e.g., directly set `requires_grad=True` or mutate them with a tensor + that `requires_grad=True`.) A non-view tensor is an inference tensor if and only if it was allocated during inference mode. A view tensor is an inference @@ -78,14 +90,13 @@ guarantees: * All operations do not record `grad_fn`, even if their `requires_grad=True` (like `NoGradMode`). This applies for both inference tensors and normal tensors (also like `NoGradMode`). -* View operations do not do view tracking (like `NoGradMode` after - PyTorch 1.9). This applies for both inference tensors and normal - tensors. +* View operations on inference tensors do not do view tracking; views + and base inference tensors are indistinguishable. * Inplace operations on inference tensors are guaranteed not to do a version counter bump (which is equivalent to an atomic increment). Inplace operations on normal tensors still do version counter bumps. -# Implementation description +## Implementation description **Dispatcher.** The dispatcher decides what implementation of a kernel to call when an operator is invoked. The set of possible options is @@ -150,22 +161,31 @@ default TLS included set. **The algorithm.** At a high level, we would like to skip both the Autograd and InplaceOrView kernels while in inference mode, whenever it is safe to do so. 
Whether or not this is safe is maintained by -the invariant: +a pair of invariants: - **The invariant:** Any inference tensor (tensor whose dispatch key - set does not include Autograd nor InplaceOrView) is guaranteed + **The no-aliasing invariant:** Inference tensors are guaranteed not to alias with any tensor which is saved for backwards (or - otherwise depends on accurate version counter tracking). + otherwise depends on accurate version counter tracking) -This invariant guarantees that it safe skip version counter bumps on -inference tensors. + **The immutable invariant:** Inference tensors are immutable outside of + inference mode. + +The no-aliasing invariant guarantees it is safe to skip version counter +bumps when mutating inference tensors, as the set of tensors affected by +mutation is precisely the set of aliases to that tensor. The immutable +invariant guarantees it is safe to skip view metadata, as view metadata +is only used to enable inplace updates on tensors that require +gradients. **Inference mode** is defined to be the state when: * Autograd is added to the TLS excluded set * InplaceOrView is removed from the TLS included set (recall that by default, InplaceOrView is part of the TLS included set) -* View metadata is not recorded (similar to NoGradMode) +* If view metadata is recorded (e.g., because a tensor has InplaceOrView + directly recorded on it), the creation metadata of the view is + set to forbid subsequent inplace modification with + `requires_grad=True` tensors (`CreationMeta::NO_GRAD_MODE`) It is legal for only Autograd to be excluded (this happens during normal processing of Autograd kernels), but it is illegal for InplaceOrView to @@ -185,13 +205,23 @@ that omit these keys) is the result of the following rules: this is implemented by propagating the dispatch key set from the base tensor to the view tensor. -These rules satisfy the invariant: functional operations are guaranteed -to have non-aliasing outputs and are safe to mark as inference tensors; -view operations introducing aliasing relationships, and it is only -safe for inference tensors to alias other inference tensors. +These rules guarantee half of the no-aliasing invariant: functional +operations are guaranteed to have non-aliasing outputs and are safe to +mark as inference tensors; view operations introducing aliasing +relationships, and it is only safe for inference tensors to alias other +inference tensors. + +Furthermore, the following operations on inference tensors are disabled: -Finally, we must ensure that inference tensors are not saved for -backwards. **TODO FINISH ME** +* Inplace modifications on inference tensors outside of inference mode + (tested at the point we do version counter increments; this code is + guaranteed to run outside of inference mode because InplaceOrView is + part of default included TLS). This guarantees the immutability + invariant. (TODO: Also need to prevent `requires_grad` from being + explicitly toggled) +* Saving an inference tensor for backwards (tested in the constructor + of SavedVariable). This guarantees the other half of the no-aliasing + invariant. **Examples.** Given the rules above, we can describe the behavior for each combination of possibilities: @@ -202,8 +232,9 @@ for each combination of possibilities: to InplaceOrView key on the normal tensor) * On an inference tensor - no increment * View operation... 
- * On a normal tensor - view metadata is not recorded, - version counter is propagated, result is a normal tensor + * On a normal tensor - view metadata is recorded, creation + meta is set to `INFERENCE_MODE`, version counter is propagated, + result is a normal tensor * On an inference tensor - view metadata is not recorded, result is an inference tensor * Functional operation... @@ -211,76 +242,68 @@ for each combination of possibilities: * On an inference tensor - produces an inference tensor * Outside of inference mode... * Inplace operation... - * On an inference tensor - allowed, no increment + * On an inference tensor - forbidden * View operation... - * On an inference tensor - allowed, view metadata - - -**TODO EVERYTHING ELSE** + * On an inference tensor - allowed, view metadata is not + recorded, result is an inference tensor - - Inplace operations on both normal/inference tensors are OK - - Inplace operation on inference tensor is guaranteed not to VC bump - - NB: if you do an inplace operation on a normal tensor, you WILL get a version counter bump - - View operations on both normal/inference tensors are OK - - View operation on inference tensor is guaranteed not to allocate view metadata - - View operation on normal tensor produces a normal tensor(NO_GRAD_FN), behavior is the same as creating a view inside NoGrad mode. +**Edge case: explicit `requires_grad` setting.** One might expect that in +no grad mode that it is impossible to allocate a tensor with +`requires_grad=True`. However, this is not true: any tensor that +is explicitly allocated with `requires_grad=True` preserves this +property outside of no grad mode: -* **Inference tensor** are tensors that are constructed if and only if inference mode is enabled, with the exception of views on normal tensors. Non-inference tensors are called **normal tensors**. - * Q: Why not views on normal tensors? A: Because we guarantee performance on inference tensors, but views on normal tensors require additional safety checks (e.g. normal tensor ----(view)---> ----(inplace)----> this should properly bump version on base which requires view produce a normal tensor). - * NB: Inference tensors and bad normal tensors are leaf tensors. - * Outside of inference mode, the following operations on inference tensors is forbidden: - * Inplace/view operations (functional operations produce normal tensors), if at least one input tensor is inference mode. - * Why? In principle, these are safe if they produce inference tensors, but we are trying to maintain the invariant that inference tensors are ONLY created in inference mode. - * Impl: Functional on normal tensors is allowed because we cannot conveniently ban it (VariableType/InplaceOrView kernel are all skipped) - * Mixing inference and normal tensors, even for functional operations, is forbidden. - * Why? For simplicity of implementation. In particular, if you save the inference tensor in backwards, you’re likely to hit an error in a weird place (better to error early). By forbidding mixed operations, it is impossible for this situation to occur. - * Impl: inference tensors are guaranteed to have is_leaf=True. +``` +>>> with torch.no_grad(): +... x = torch.empty(2, requires_grad=True) +... +>>> x +tensor([-1.3667e-17, 4.5801e-41], requires_grad=True) +``` +This can also be achieved by explicitly setting +`x.requires_grad = True`. Furthermore, in no grad mode, this requires +grad setting propagates to views - - **Normal tensor** has both Autograd & InplaceOrView keys. 
This includes both `requires_grad=true` and `requires_grad=false` tensors. (see [Ideal end state] section for more details). - - Additional notes: - - All Inference tensors are created in inference mode, but not all of the tensors created in inference mode are inference tensors. For example, a view of normal tensor created in inference mode is still a normal tensor (but with special `creation_meta=NO_GRAD_FN`!). - - (Autograd & !InplaceOrView) and (!Autogad & InplaceOrView) are invalid states, we don't have such tensors. +``` +>>> with torch.no_grad(): +... x = torch.empty(2) +... y = x.view(2) +... x.requires_grad = True +... +>>> y.requires_grad +True +``` +This poses a problem for inference mode, which doesn't track view +metadata and cannot implement this propagation. Our proposed solution +is to forbid setting `requires_grad` (but permit tensors to be directly +constructed with `requires_grad=True`). This cannot be easily +implemented today as internally `requires_grad=True` factory is +implemented by first constructing a tensor, and then setting its +`requires_grad=True`. +## Future work: skipping Autograd kernels when `requires_grad=False` -## Alternative implementations we've considered and why they don't work: -1. For NormalMode + All inference tensors + functional op, an alternative behavior we prefer but didn't implement is throwing an error by forcing this op go through VariableType kernel and hit the assert_no_inference_tensor check. But to do that we'll have to add c10::autograd_dispatch_keyset to the globally enabled set, but doing that might accidentally call autograd kernel from a backend that doesn't match tensor input. Thus we allow functional ops run without throwing an error. -2. -``` - // 1. When InferenceMode is enabled, Autograd dispatch keys are excluded - // but not InplaceOrView key. - // - // For example: - // torch::Tensor a = torch::ones({1, 2, 3}).set_requires_grad(true); - // torch::Tensor k = a + 2; - // { - // c10::InferenceMode guard(true); - // k.add_(2); - // } - // `k.add_(2)` still need to go through InplaceOrView kernel so that it's - // prepared for future autograd. - // 2. When InferenceMode is disabled, InplaceOrView must be added - // to included set. - // - // For example: - // torch::Tensor a; - // { - // c10::InferenceMode guard(true); - // torch::Tensor in = torch::ones({2, 2}); - // a = in.view({1, 4}); - // } - // torch::Tensor c = a.view({4, 1}); // (*) - // If we don't add InplaceOrView to included set, (*) will skip its as_view - // setup entirely, `c` will be a Tensor that is not from Inference mode - // but has potentially wrong view metadata which should be forbidden.. - // By going through InplaceOrView kernel, we can throw an error since it - // broke our invariant: "Autograd keys must be in excluded set before - // reaching InplaceOrView kernel". -``` +As view and inplace handling has been moved out of Autograd kernels, a +tantalizing possibility is to remove the Autograd dispatch keys from +tensors with `requires_grad=False`, thus skipping this kernel entirely. -# Ideal end state -Ideal end state is that we can link skip VariableType kernel when requires_grad=False which means we don't always go through VariableType kernel in normal mode. But this work is currently blocked for the following reason: -- If requires_grad=False skips VariableType kernel, functional ops won't be able to go through `AutoDispatchBelowInplaceOrView` guard which suppresses both autograd and InplaceOrView keys in TLS excluded. 
  Not suppressing the InplaceOrView key means unnecessary calls to
  `as_view/increment_version` whenever view/inplace ops are used inside a
  kernel implementation, which adds a lot of overhead. To avoid that
  overhead, instead of the fallthrough kernel being a backend fallback, we
  would want a real kernel that suppresses the InplaceOrView key. But
  compared to the current implementation, which only adds an extra dispatch
  for view/inplace ops, that forces all functional ops to take an extra
  dispatch as well. That is why this work is blocked.
- Unblocking it requires some fixes, such as identifying `at::` callsites in
  backend-specific kernels (via static analysis?) and replacing them with
  `at::native::`; that should unblock linking `requires_grad` with the
  VariableType kernel. Alternately, do
  https://github.com/pytorch/pytorch/issues/54614
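## Appendix: worked example of the proposed semantics

A non-normative sketch tying together the behavior described above, assuming the C++ API lands as `c10::InferenceMode guard(true)`. The commented-out line reflects the expected error for inplace updates to inference tensors outside inference mode; it is an expectation based on the rules in the implementation description, not a tested guarantee.

```
#include <torch/torch.h>
#include <iostream>

int main() {
  torch::Tensor normal = torch::ones({2, 2});   // normal tensor
  torch::Tensor out, view_of_normal;
  {
    c10::InferenceMode guard(true);
    out = normal + 1;                   // functional op in inference mode -> inference tensor
    out.add_(1);                        // inplace on an inference tensor: no version counter bump
    view_of_normal = normal.view({4});  // view of a normal tensor stays a normal tensor
  }
  // Outside inference mode:
  // out.add_(1);                       // expected to error: inplace update of an inference tensor
  torch::Tensor y = out + 1;            // functional ops on inference tensors remain allowed
  std::cout << y.sizes() << std::endl;
  return 0;
}
```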
