Add a functools.cache variant for methods to avoid keeping instances alive #102618
---
I would like to work on it, if no one else has started yet :)

---
See: https://stackoverflow.com/a/68052994/424499

If something like this does get added, it should come with clear guidance on when it would be beneficial. There is a narrow set of circumstances where this would be the best solution as compared to the existing alternatives.

The implementation is the easy part. The OP hasn't even decided whether he wants `self` to be weakly referenced or skipped entirely.

---
I would basically never counsel someone to use the existing `lru_cache` on instance methods.

---
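For readers skimming this thread, a minimal sketch of the behavior being objected to (the `Worker` class is hypothetical): the cache lives on the class, so every cached call stores a strong reference to `self`, and instances stay alive until their entries age out or the cache is cleared.

```python
import gc
from functools import lru_cache

class Worker:
    @lru_cache(maxsize=None)
    def compute(self, x):
        return x * 2

w = Worker()
w.compute(10)   # the cache entry now holds a strong reference to w
del w
gc.collect()
# The instance is still reachable through the cache on the class:
print(any(isinstance(o, Worker) for o in gc.get_objects()))  # True
```

---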
Assuming I am the OP, I'm not sure why you're saying that. I feel like the desired behavior is clear from the original post. I'm not sure exactly what you mean by those terms in this context, though. If "skipped" means a global-level cache shared by all instances of the class, keyed only on the rest of the arguments but excluding `self`, that is not the behavior I'm after.

FWIW, I think an implementation that most closely resembles the following is what I had in mind:

```python
class Baz:
    def meth(self, key):
        return key + key

    def wrapped_meth(self, key):
        if not hasattr(self, '_cache'):
            setattr(self, '_cache', {})
        if key not in self._cache:
            self._cache[key] = self.meth(key)
        return self._cache[key]
```

---

@sobolevn An implementation (or multiple, if different approaches are considered) would be most welcome for discussion purposes IMO. Some of the things that need to be thought out are what to call the decorator and whether there should be parallel variations (for example, a bounded LRU variant). I think it might also be worth discussing whether the existing `functools.cache` documentation should warn about this pattern.

---
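As a quick check of the per-instance pattern sketched above (reusing the `Baz` class from the previous comment, which is assumed to be in scope), the cache dictionary lives in the instance's `__dict__`, so it disappears together with the instance:

```python
import weakref

b = Baz()
b.wrapped_meth(3)   # computes and stores the result in b._cache
b.wrapped_meth(3)   # cache hit, served from b._cache
r = weakref.ref(b)
del b               # the per-instance cache dies with the instance
print(r() is None)  # True: nothing else kept the instance alive
```

---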
I'm going to close this for the time being. Shortly (not right now), I will kick off a new discussion with an outline of the perceived problem, various use cases, and what things can or should be done about it. I've been thinking about this for months and have been putting together an analysis.

Note the python-help post was not useful. The example was for something that didn't use `self` at all.

Note, there is no rush here — we can take the time to deliberate. The lru_cache has been around for 13 years. No new problem has suddenly arisen.
There are plenty of valid uses in the wild. AFAICT, the only downside is needing to wait for an entry to age out of the cache rather than having it cleared immediately when the instance is no longer needed. That mostly only matters when the instances are very large or hold resources that should be released promptly.

Note that adding weakrefs also has its problems. There is a speed penalty. Not all instances are weak-referenceable, and many users don't fully understand the dance they would have to do to add weak reference support. Also, if there are two instances that are equal but not identical, you lose the benefit of caching and force a potentially expensive recomputation.
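To illustrate the weak-referenceability caveat, a minimal sketch: a class that defines `__slots__` without including `'__weakref__'` cannot be weakly referenced at all, so a weakref-based cache simply fails for it.

```python
import weakref

class Slotted:
    __slots__ = ('x',)   # no '__weakref__' slot

s = Slotted()
try:
    weakref.ref(s)
except TypeError as e:
    print(e)   # cannot create weak reference to 'Slotted' object
```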
No, that is @stevendaprano, the one who opened the issue and said, "we should consider adding a cachemethod decorator that either avoids caching self, or uses a weakref for it".
IIRC there is already a lint tool that does this. That said, the tool has no ability to differentiate a legitimate use case from a problematic case.

---
Thank you Raymond for looking into this. I'm not a heavy user of `lru_cache` and I completely forgot it existed!

I see that you have closed this issue, but you also mention taking time to deliberate. I take it you consider that there is something to deliberate, then.

---
Yes, absolutely. At a minimum, there needs to be a recommended best practice for common cases.

---
Right now, we have the FAQ entry: How do I cache method calls? Already in progress was an update to show how to stack `@staticmethod` and `@classmethod` with `@cache`:

```python
@staticmethod
@cache
def sm(x, y): ...

@classmethod
@cache
def cm(cls, x, y): ...
```

We could add entries to show how to do per-instance caching and weakref caching. If we do, then we should list out the advantages and disadvantages of the two different ways of freeing objects immediately rather than waiting for them to age out of a cache or for the cache to be cleared: storing the cache on the instance itself, or holding `self` by weak reference.

If anyone is interested, here's the code that I've been experimenting with to evaluate the various approaches:

```python
from dataclasses import dataclass
from weakref import ref
from functools import wraps, lru_cache


def PIPM_MVC(func):
    """Per instance, per method, multiple variable cache

    Advantages:
      * Lifetime of the cache exactly matches the lifetime of an instance.
      * Does not keep an instance alive.
      * Works on non-hashable instances.

    Disadvantages:
      * High overhead for space and time.
      * No ability to manage total memory consumption.
      * Hard to find and clear all the caches.
      * No hit/miss instrumentation.
      * Loses caching of equal-but-not-identical (EBNI) instances.
      * Does not detect changes to instance attributes.

    When to use:
      * You want to cache ALL living instances, not just recently used ones.
      * The underlying method is slower than the cost of the cache wrapper.
      * When instances are large relative to the size of the cache entries.
      * When you don't need or want cache hits for equal but distinct instances.
    """
    cachename = f'_cached_{func.__qualname__}_'

    @wraps(func)
    def wrapper(self, *args):
        try:
            cachedict = getattr(self, cachename)
        except AttributeError:
            cachedict = {}
            setattr(self, cachename, cachedict)
        try:
            return cachedict[args]
        except KeyError:
            pass
        ans = func(self, *args)
        cachedict[args] = ans
        return ans

    return wrapper


def weak_lru(maxsize=128, typed=False):
    'LRU cache decorator that keeps a weak reference to "self"'
    def wrapper(func):

        @lru_cache(maxsize, typed)
        def _func(_self, *args, **kwargs):
            return func(_self(), *args, **kwargs)

        @wraps(func)
        def inner(self, *args, **kwargs):
            return _func(ref(self), *args, **kwargs)

        return inner
    return wrapper


@dataclass(unsafe_hash=True)
class A:
    x: int

    def __del__(self):
        print(f'Died: A({self.x!r}) {hex(id(self))=!r}')

    @PIPM_MVC
    def m1(self, y):
        ans = self.x + y
        print(f'm1({y!r}) adds {self.x!r} giving {ans!r}')
        return ans

    @lru_cache
    def m2(self, y):
        ans = self.x * y
        print(f'm2({y!r}) multiplies {self.x!r} giving {ans!r}')
        return ans

    @weak_lru()
    def m3(self, y):
        ans = self.x ** y
        print(f'm3({y!r}) raises {self.x!r} giving {ans!r}')
        return ans
```

---
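A quick way to exercise the decorators above (only `m1` and `m3` are called here, since `m2`'s `lru_cache` would keep the instance alive):

```python
a = A(10)
a.m1(2)   # computed and cached per instance
a.m1(2)   # cache hit: no print from m1
a.m3(2)   # computed and cached through the weakref wrapper
del a     # neither cache holds a strong reference, so
          # 'Died: A(10) ...' prints immediately
```

---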
From a core workflow perspective: per discussion of this issue with the other core devs on the Discord, there was a unanimous consensus that an issue should be left open if the relevant core dev has acknowledged it as valid enough to be worth further exploration and discussion (as has indeed been the case here). Therefore, re-opening accordingly. (To be clear, this is not expressing any particular opinion with regard to the merits of this proposal's content, just ensuring the status is set appropriately per the statements of those who are.)

Also, just a quick tip: if an issue is in fact invalid, wontfix, duplicate, etc., it can (and should) be closed as "not planned" rather than closed as "completed". You can click the drop-down arrow next to the close button to change the close reason.

---
Thanks @CAM-Gerlach, I appreciate the follow-up.

---
Are there any updates on this? Even if there are no updates, I think the documentation could/should be more explicit in stating the implications of use with instance methods.

---
To me, the fact that `functools.cache` silently keeps instances alive when applied to a method is surprising enough that the documentation should call it out explicitly. Something along the lines of a warning note in the `functools.cache` docs would help.

---
I encountered this behavior too; it results in a memory leak.

---
Encountered it too. I wrote a module for it until Python supports it officially; PRs are more than welcome.

---
There are existing implementations in the kids.cache and cachetools (more manual) packages.

---
@nmoreaud kids.cache is not maintained and I don't see such a feature. And cachetools, as mentioned, is more manual. @dsal3389 Ideally something could be made to work just like `functools.cached_property`, caching on the instance:

```python
from functools import update_wrapper, wraps

class cached_method:
    def __init__(self, method):
        self._method = method
        update_wrapper(self, method)

    def __get__(self, instance, objtype=None):
        if instance is None:
            return self
        cache = {}

        @wraps(self._method)
        def lookup(*args, **kwargs):
            key = (args, tuple(kwargs.items()))
            try:
                return cache[key]
            except KeyError:
                res = self._method(instance, *args, **kwargs)
                cache[key] = res
                return res

        # Shadow the descriptor with the per-instance closure so later
        # lookups skip __get__ entirely.
        setattr(instance, self._method.__name__, lookup)
        return lookup

    def __call__(self, instance, *args, **kwargs):
        func = getattr(instance, self._method.__name__)
        return func(*args, **kwargs)
```

---
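A usage sketch for the descriptor above (the `Circle` class is hypothetical). Note that the per-instance closure strongly references the instance through the attribute set on the instance itself, which creates a self-referencing cycle that the garbage collector can still reclaim:

```python
class Circle:
    def __init__(self, r):
        self.r = r

    @cached_method
    def area(self):
        print('computing')
        return 3.14159 * self.r ** 2

c = Circle(2)
c.area()   # prints 'computing'; c.area is replaced by a caching closure
c.area()   # cache hit: the instance attribute now shadows the descriptor
```

---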
Has anyone filed a report against linting tools to warn when `lru_cache` or `cache` is applied to a method?

---
@ktbarrett, both flake8-bugbear and ruff already have this rule. The rule is B019 in both tools.

---
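For reference, the pattern that B019 flags looks like this (a minimal sketch; the diagnostic wording is paraphrased):

```python
from functools import lru_cache

class Model:
    @lru_cache   # B019: caching on a method can keep `self` alive indefinitely
    def predict(self, x):
        return x + 1
```

---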
For those like me who landed on this issue, here's a standalone implementation of a weakref-based `cachemethod`. Notably, it:

* holds only a weak reference to its first argument, so it does not keep instances alive;
* keys instances by identity (via an id-based weak dictionary), so unhashable instances work;
* computes each argument key's hash only once;
* normalises positional/keyword arguments through the signature to encourage cache hits.

```python
import functools as ft
import inspect
import weakref
from collections.abc import Callable
from typing import Concatenate, ParamSpec, TypeVar

_Self = TypeVar("_Self")
_Params = ParamSpec("_Params")
_Return = TypeVar("_Return")


# Like `weakref.WeakKeyDictionary`, but uses identity-based hashing and equality.
class _WeakIdDict:
    __slots__ = ("_dict",)

    def __init__(self):
        self._dict = {}

    def __getitem__(self, key):
        item, _ = self._dict[id(key)]
        return item

    def __setitem__(self, key, value):
        id_key = id(key)
        # The callback removes the entry as soon as `key` dies, before its id
        # can be reused by another object.
        ref = weakref.ref(key, lambda _: self._dict.pop(id_key))
        self._dict[id_key] = (value, ref)


# Used to compute an object's hash just once.
class _CacheKey:
    __slots__ = ("hashvalue", "value")

    def __init__(self, value):
        self.hashvalue = hash(value)
        self.value = value

    def __hash__(self) -> int:
        return self.hashvalue

    def __eq__(self, other) -> bool:
        # Assume `type(other) is _CacheKey`.
        return self.value == other.value


def cachemethod(fn: Callable[Concatenate[_Self, _Params], _Return]) -> Callable[Concatenate[_Self, _Params], _Return]:
    """Like `functools.cache`, except that it only holds a weak reference to its first argument.

    Note that `functools.cached_property` (which stores its result on the instance itself) can often be used
    for similar purposes. The differences are that (a) a `cached_property` value will be pickled whilst a
    `cachemethod` cache will not, (b) `cachemethod` can be used on functions with additional arguments, and
    (c) `cachemethod` requires brackets to call, helping to visually emphasise that computational work may be
    being performed.
    """
    cache1 = _WeakIdDict()
    sig = inspect.signature(fn)
    parameters = list(sig.parameters.values())
    if len(parameters) == 0:
        raise ValueError("Cannot use `cachemethod` on a function without a `self` argument.")
    if parameters[0].kind not in {inspect.Parameter.POSITIONAL_ONLY, inspect.Parameter.POSITIONAL_OR_KEYWORD}:
        raise ValueError("Cannot use `cachemethod` on a function without a positional argument.")
    parameters = parameters[1:]
    sig = sig.replace(parameters=parameters)

    @ft.wraps(fn)
    def fn_wrapped(self: _Self, *args: _Params.args, **kwargs: _Params.kwargs) -> _Return:
        # Standardise arguments to a single form to encourage cache hits.
        # Not binding `self` (and instead doing the signature adjustment above) in order to avoid
        # keeping a strong reference to `self` via `argkey`.
        bound = sig.bind(*args, **kwargs)
        del args, kwargs
        argkey = _CacheKey((bound.args, tuple(bound.kwargs.items())))
        try:
            out = cache1[self][argkey]
        except KeyError:
            try:
                cache2 = cache1[self]
            except KeyError:
                cache2 = cache1[self] = {}
            out = cache2[argkey] = fn(self, *bound.args, **bound.kwargs)
        return out

    return fn_wrapped
```

---
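A usage sketch for the `cachemethod` above (the `Vec` class is hypothetical):

```python
import weakref

class Vec:
    def __init__(self, x):
        self.x = x

    @cachemethod
    def scaled(self, k):
        print('computing')
        return self.x * k

v = Vec(3)
v.scaled(2)         # prints 'computing'
v.scaled(k=2)       # cache hit: arguments are normalised via the signature
r = weakref.ref(v)
del v
print(r() is None)  # True: only a weak reference to the instance was kept
```

---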
You might not keep references to the instance so it can be garbage collected, but you use the builtin `id` (https://docs.python.org/3/library/functions.html#id). This will cause false cache hits if 2 different objects are allocated at the same memory location; this is also the reason the default object hash, which is derived from `id()`, is only meaningful while the object is alive.

I wrote a quick and dirty POC for this:

```python
class Foo:
    def __init__(self, name: str) -> None:
        self.name = name

seen = set()
i = 0
while True:
    x = Foo("bob" + str(i))
    i += 1
    if id(x) in seen:
        print("duplicate", id(x))
        break
    seen.add(id(x))
    del x
    print(".")
```

The only solution is to bind the cache to the running instance (as a private property or something) and make the cache not hold references to `self`.

---
Is it possible for an object to be GC'd, then a new one to be reallocated at the same address, before the weakref callback removes the stale entry? From the `weakref` docs, the callback is called when the referent is about to be finalized, so the stale entry should be gone before its id can be reused.

---
Totally missed that lambda callback while glancing over the code. This looks like a good approach; I wonder why the stdlib doesn't really use it like you suggested. Maybe a good question/suggestion for the Python Discourse?

---
@dsal3389 I think caching the self object, weakref or not, is unnecessary. It's doing more than it needs to.

---
@ktbarrett But if you don't cache the instance, how can you distinguish between cache hits that are coming from different instances? Most of the time you don't want to mix them. Here is a very simple example (in theory) of how it can hit you (assume the cache here is keyed without `self`):

```python
from functools import cache

class Foo:
    def __init__(self, n: int) -> None:
        self._n = n

    @cache
    def foo(self) -> str:
        if self._n % 5 == 0:
            return "yes"
        else:
            return "no"

x = Foo(5)
y = Foo(7)
print(x.foo())  # will print "yes"
print(y.foo())  # will also print "yes", expected to be "no"
```

---
You place the cache on the instance rather than keeping it in the class. This is how `functools.cached_property` works.

---
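A minimal sketch of that on-instance placement, mirroring how `functools.cached_property` stores its result in the instance's `__dict__` rather than in any class-level structure:

```python
from functools import cached_property

class Data:
    @cached_property
    def expensive(self):
        print('computing')
        return 42

d = Data()
d.expensive         # prints 'computing'
d.expensive         # cache hit: the value now lives in d.__dict__
print(d.__dict__)   # {'expensive': 42} -- the class holds no reference to d
```

---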
This has the disadvantage of requiring that the instance be mutable in this way, which is the kind of sneakiness that can lead to unexpected behavior! For example, classes that define `__slots__` can't have new attributes set on them at all.

---
Neither issue bothered the core devs when `functools.cached_property` was added.

---
`functools.cache`, when applied to a method, caches the `self` argument. This can also keep the instance alive long after it is no longer needed. See for example this discussion.

Following the same API as `singledispatch` and `singledispatchmethod`, we should consider adding a `cachemethod` decorator that either avoids caching `self`, or uses a weakref for it.