Commit a77ca97

Merge pull request #12174 from shoyer/nep-16-abstract-array
NEP 16 abstract arrays: rebased and marked as "Withdrawn"
2 parents ced77c1 + 869e68d commit a77ca97

3 files changed

+386
-0
lines changed

doc/neps/index.rst.tmpl

Lines changed: 10 additions & 0 deletions
@@ -81,3 +81,13 @@ Rejected NEPs
81 81
{% for nep, tags in neps.items() if tags['Status'] == 'Rejected' %}
82 82
{{ tags['Title'] }} <{{ tags['Filename'] }}>
83 83
{% endfor %}
84+
85+
Withdrawn NEPs
86+
--------------
87+
88+
.. toctree::
89+
:maxdepth: 1
90+
91+
{% for nep, tags in neps.items() if tags['Status'] == 'Withdrawn' %}
92+
{{ tags['Title'] }} <{{ tags['Filename'] }}>
93+
{% endfor %}

doc/neps/nep-0016-abstract-array.rst

Lines changed: 328 additions & 0 deletions
@@ -0,0 +1,328 @@
1+
=============================================================
2+
NEP 16 — An abstract base class for identifying "duck arrays"
3+
=============================================================
4+
5+
:Author: Nathaniel J. Smith <njs@pobox.com>
6+
:Status: Withdrawn
7+
:Type: Standards Track
8+
:Created: 2018-03-06
9+
:Resolution: https://github.com/numpy/numpy/pull/12174
10+
11+
.. note::
12+
13+
This NEP has been withdrawn in favor of the protocol-based approach
14+
described in
15+
`NEP 22 <http://www.numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html>`__.
16+
17+
Abstract
18+
--------
19+
20+
We propose to add an abstract base class ``AbstractArray`` so that
21+
third-party classes can declare their ability to "quack like" an
22+
``ndarray``, and an ``asabstractarray`` function that performs
23+
similarly to ``asarray`` except that it passes through
24+
``AbstractArray`` instances unchanged.
25+
26+
27+
Detailed description
28+
--------------------
29+
30+
Many functions, in NumPy and in third-party packages, start with some
31+
code like::
32+
33+
def myfunc(a, b):
34+
    a = np.asarray(a)
35+
    b = np.asarray(b)
36+
    ...
37+
38+
This ensures that ``a`` and ``b`` are ``np.ndarray`` objects, so
39+
``myfunc`` can carry on assuming that they'll act like ndarrays both
40+
semantically (at the Python level), and also in terms of how they're
41+
stored in memory (at the C level). But many of these functions only
42+
work with arrays at the Python level, which means that they don't
43+
actually need ``ndarray`` objects *per se*: they could work just as
44+
well with any Python object that "quacks like" an ndarray, such as
45+
sparse arrays, dask's lazy arrays, or xarray's labeled arrays.
46+
47+
However, currently, there's no way for these libraries to express that
48+
their objects can quack like an ndarray, and there's no way for
49+
functions like ``myfunc`` to express that they'd be happy with
50+
anything that quacks like an ndarray. The purpose of this NEP is to
51+
provide those two features.
52+
53+
Sometimes people suggest using ``np.asanyarray`` for this purpose, but
54+
unfortunately its semantics are exactly backwards: it guarantees that
55+
the object it returns uses the same memory layout as an ``ndarray``,
56+
but tells you nothing at all about its semantics, which makes it
57+
essentially impossible to use safely in practice. Indeed, the two
58+
``ndarray`` subclasses distributed with NumPy – ``np.matrix`` and
59+
``np.ma.masked_array`` – do have incompatible semantics, and if they
60+
were passed to a function like ``myfunc`` that doesn't check for them
61+
as a special case, then it may silently return incorrect results.
62+
63+
64+
Declaring that an object can quack like an array
65+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
66+
67+
There are two basic approaches we could use for checking whether an
68+
object quacks like an array. We could check for a special attribute on
69+
the class::
70+
71+
def quacks_like_array(obj):
72+
    return bool(getattr(type(obj), "__quacks_like_array__", False))
73+
74+
Or, we could define an `abstract base class (ABC)
75+
<https://docs.python.org/3/library/collections.abc.html>`__::
76+
77+
def quacks_like_array(obj):
78+
    return isinstance(obj, AbstractArray)
79+
80+
If you look at how ABCs work, this is essentially equivalent to
81+
keeping a global set of types that have been declared to implement the
82+
``AbstractArray`` interface, and then checking it for membership.
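As a minimal sketch of that membership check, the following uses only the standard ``abc`` module. The ``AbstractArray`` class here is a hypothetical stand-in for the proposed ABC (it is not an existing NumPy class), and ``InheritingArray`` and ``RegisteredArray`` are illustrative names.

```python
import abc


class AbstractArray(abc.ABC):
    """Hypothetical stand-in for the proposed ABC; not a real NumPy class."""


class InheritingArray(AbstractArray):
    """Declares the interface by subclassing the ABC."""


class RegisteredArray:
    """Declares the interface without inheriting from the ABC."""


# register() records the class as a "virtual subclass" of the ABC.
AbstractArray.register(RegisteredArray)


def quacks_like_array(obj):
    return isinstance(obj, AbstractArray)


print(quacks_like_array(InheritingArray()))      # True
print(quacks_like_array(RegisteredArray()))      # True
print(quacks_like_array([1, 2, 3]))              # False
print(AbstractArray in RegisteredArray.__mro__)  # False: the MRO is untouched
```

Registration affects only ``isinstance`` and ``issubclass``; a registered class inherits nothing from the ABC, so any mixin methods would be available only to classes that subclass it directly.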
83+
84+
Between these, the ABC approach seems to have a number of advantages:
85+
86+
* It's Python's standard, "one obvious way" of doing this.
87+
88+
* ABCs can be introspected (e.g. ``help(np.AbstractArray)`` does
89+
something useful).
90+
91+
* ABCs can provide useful mixin methods.
92+
93+
* ABCs integrate with other features like mypy type-checking,
94+
``functools.singledispatch``, etc.
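To illustrate the last point, here is a small sketch of ``functools.singledispatch`` dispatching on an ABC. ``AbstractArray``, ``DuckArray``, ``RegisteredDuck``, and ``describe`` are hypothetical names used only for this example.

```python
import abc
from functools import singledispatch


class AbstractArray(abc.ABC):
    """Hypothetical stand-in for the proposed ABC."""


class DuckArray(AbstractArray):
    """Opts in by subclassing."""


class RegisteredDuck:
    """Opts in through register() instead of inheritance."""


AbstractArray.register(RegisteredDuck)


@singledispatch
def describe(obj):
    # Fallback for objects that have not declared the interface.
    return "not an array"


@describe.register(AbstractArray)
def _(obj):
    # Chosen for real and virtual subclasses of the ABC alike.
    return "duck array"


print(describe(DuckArray()))       # duck array
print(describe(RegisteredDuck()))  # duck array
print(describe([1, 2, 3]))         # not an array
```

An attribute-based check would need hand-written dispatch logic to get the same effect.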
95+
96+
One obvious thing to check is whether this choice affects speed. Using
97+
the attached benchmark script on a CPython 3.7 prerelease (revision
98+
c4d77a661138d, self-compiled, no PGO), on a Thinkpad T450s running
99+
Linux, we find::
100+
101+
np.asarray(ndarray_obj) 330 ns
102+
np.asarray([]) 1400 ns
103+
104+
Attribute check, success 80 ns
105+
Attribute check, failure 80 ns
106+
107+
ABC, success via subclass 340 ns
108+
ABC, success via register() 700 ns
109+
ABC, failure 370 ns
110+
111+
Notes:
112+
113+
* The first two lines are included to put the other lines in context.
114+
115+
* This used 3.7 because both ``getattr`` and ABCs are receiving
116+
substantial optimizations in this release, and it's more
117+
representative of the long-term future of Python. (Failed
118+
``getattr`` doesn't necessarily construct an exception object
119+
anymore, and ABCs were reimplemented in C.)
120+
121+
* The "success" lines refer to cases where ``quacks_like_array`` would
122+
return True. The "failure" lines are cases where it would return
123+
False.
124+
125+
* The first measurement for ABCs is subclasses defined like::
126+
127+
class MyArray(AbstractArray):
128+
    ...
129+
130+
The second is for subclasses defined like::
131+
132+
class MyArray:
133+
    ...
134+
135+
AbstractArray.register(MyArray)
136+
137+
I don't know why there's such a large difference between these.
138+
139+
In practice, either way we'd only do the full test after first
140+
checking for well-known types like ``ndarray``, ``list``, etc. `This
141+
is how NumPy currently checks for other double-underscore attributes
142+
<https://github.com/numpy/numpy/blob/master/numpy/core/src/private/get_attr_string.h>`__
143+
and the same idea applies here to either approach. So these numbers
144+
won't affect the common case, just the case where we actually have an
145+
``AbstractArray``, or else another third-party object that will end up
146+
going through ``__array__`` or ``__array_interface__`` or end up as an
147+
object array.
148+
149+
So in summary, using an ABC will be slightly slower than using an
150+
attribute, but this doesn't affect the most common paths, and the
151+
magnitude of slowdown is fairly small (~250 ns on an operation that
152+
already takes longer than that). Furthermore, we can potentially
153+
optimize this further (e.g. by keeping a tiny LRU cache of types that
154+
are known to be AbstractArray subclasses, on the assumption that most
155+
code will only use one or two of these types at a time), and it's very
156+
unclear that this even matters – if the speed of ``asarray`` no-op
157+
pass-throughs were a bottleneck that showed up in profiles, then
158+
probably we would have made them faster already! (It would be trivial
159+
to fast-path this, but we don't.)
160+
161+
Given the semantic and usability advantages of ABCs, this seems like
162+
an acceptable trade-off.
163+
164+
..
165+
CPython 3.6 (from Debian)::
166+
167+
Attribute check, success 110 ns
168+
Attribute check, failure 370 ns
169+
170+
ABC, success via subclass 690 ns
171+
ABC, success via register() 690 ns
172+
ABC, failure 1220 ns
173+
174+
175+
Specification of ``asabstractarray``
176+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
177+
178+
Given ``AbstractArray``, the definition of ``asabstractarray`` is simple::
179+
180+
def asabstractarray(a, dtype=None):
181+
    if isinstance(a, AbstractArray):
182+
        if dtype is not None and dtype != a.dtype:
183+
            return a.astype(dtype)
184+
        return a
185+
    return asarray(a, dtype=dtype)
186+
187+
Things to note:
188+
189+
* ``asarray`` also accepts an ``order=`` argument, but we don't
190+
include that here because it's about details of memory
191+
representation, and the whole point of this function is that you use
192+
it to declare that you don't care about details of memory
193+
representation.
194+
195+
* Using the ``astype`` method allows the ``a`` object to decide how to
196+
implement casting for its particular type.
197+
198+
* For strict compatibility with ``asarray``, we skip calling
199+
``astype`` when the dtype is already correct. Compare::
200+
201+
>>> a = np.arange(10)
202+
203+
# astype() returns a new array (a copy) by default:
204+
>>> a.astype(a.dtype) is a
205+
False
206+
207+
# asarray() returns the original object if possible:
208+
>>> np.asarray(a, dtype=a.dtype) is a
209+
True
210+
211+
212+
What exactly are you promising if you inherit from ``AbstractArray``?
213+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
214+
215+
This will presumably be refined over time. The ideal of course is that
216+
your class should be indistinguishable from a real ``ndarray``, but
217+
nothing enforces that except the expectations of users. In practice,
218+
declaring that your class implements the ``AbstractArray`` interface
219+
simply means that it will start passing through ``asabstractarray``,
220+
and so by subclassing it you're saying that if some code works for
221+
``ndarray``\s but breaks for your class, then you're willing to accept
222+
bug reports on that.
223+
224+
To start with, we should declare ``__array_ufunc__`` to be an abstract
225+
method, and add the ``NDArrayOperatorsMixin`` methods as mixin
226+
methods.
227+
228+
Declaring ``astype`` as an ``@abstractmethod`` probably makes sense as
229+
well, since it's used by ``asabstractarray``. We might also want to go
230+
ahead and add some basic attributes like ``ndim``, ``shape``,
231+
``dtype``.
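A rough sketch of what such an ABC could look like follows. It is not the proposed implementation: the ``NDArrayOperatorsMixin`` methods are omitted, and the choice to make ``shape`` and ``dtype`` abstract properties with ``ndim`` derived from ``shape`` is an assumption made for illustration. ``ListArray`` and ``Incomplete`` are hypothetical classes.

```python
import abc


class AbstractArray(abc.ABC):
    """Hypothetical sketch; the NDArrayOperatorsMixin methods are omitted."""

    @abc.abstractmethod
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        """Handle NumPy ufunc calls involving this object."""

    @abc.abstractmethod
    def astype(self, dtype):
        """Return an array of the requested dtype."""

    @property
    @abc.abstractmethod
    def shape(self):
        """Tuple of dimension lengths."""

    @property
    @abc.abstractmethod
    def dtype(self):
        """Element type."""

    @property
    def ndim(self):
        # Mixin attribute derived from the abstract ``shape``.
        return len(self.shape)


class ListArray(AbstractArray):
    """Toy 1-D duck array backed by a Python list (illustration only)."""

    def __init__(self, data, dtype="float64"):
        self._data = list(data)
        self._dtype = dtype

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return NotImplemented  # this toy class supports no ufuncs

    def astype(self, dtype):
        return ListArray(self._data, dtype)

    @property
    def shape(self):
        return (len(self._data),)

    @property
    def dtype(self):
        return self._dtype


class Incomplete(AbstractArray):
    """Implements only part of the interface."""

    def astype(self, dtype):
        return self


arr = ListArray([1, 2, 3])
print(arr.ndim, arr.shape, arr.astype("int64").dtype)

try:
    Incomplete()  # abstract methods are still missing
except TypeError as exc:
    print("rejected:", exc)
```

Python's ``abc`` machinery raises the ``TypeError`` when the incomplete class is instantiated; defining ``Incomplete`` by itself succeeds.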
232+
233+
Adding new abstract methods will be a bit tricky, because ABCs enforce
234+
these when a subclass is instantiated; therefore, simply adding a new
235+
``@abstractmethod`` will be a backwards compatibility break. If this
236+
becomes a problem then we can use some hacks to implement an
237+
``@upcoming_abstractmethod`` decorator that only issues a warning if the
238+
method is missing, and treat it like a regular deprecation cycle. (In
239+
this case, the thing we'd be deprecating is "support for abstract
240+
arrays that are missing feature X".)
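One possible shape for such a decorator is sketched below. The NEP does not specify an implementation, so everything here is an assumption: the marker attribute ``__upcoming_abstract__``, the use of ``__init_subclass__`` to run the check when a subclass is defined, and the choice of ``DeprecationWarning``.

```python
import abc
import warnings


def upcoming_abstractmethod(func):
    """Mark ``func`` as a method that will become abstract later (hypothetical)."""
    func.__upcoming_abstract__ = True
    return func


class AbstractArray(abc.ABC):
    """Hypothetical stand-in for the proposed ABC."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for name, attr in vars(AbstractArray).items():
            if not getattr(attr, "__upcoming_abstract__", False):
                continue
            # The subclass still resolves to the base placeholder, so it
            # has not provided its own implementation.
            if getattr(cls, name, None) is attr:
                warnings.warn(
                    f"{cls.__name__} does not implement {name}(); "
                    "this will become an error in a future release",
                    DeprecationWarning,
                )

    @upcoming_abstractmethod
    def astype(self, dtype):
        raise NotImplementedError


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    class Incomplete(AbstractArray):  # warns: astype is missing
        pass

    class Complete(AbstractArray):  # no warning
        def astype(self, dtype):
            return self

for w in caught:
    print(w.category.__name__, w.message)
```

This sketch covers only classes that inherit from the ABC; classes added with ``register()`` never trigger ``__init_subclass__`` and would need a separate check.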
241+
242+
243+
Naming
244+
~~~~~~
245+
246+
The name of the ABC doesn't matter too much, because it will only be
247+
referenced rarely and in relatively specialized situations. The name
248+
of the function matters a lot, because most existing instances of
249+
``asarray`` should be replaced by this, and in the future it's what
250+
everyone should be reaching for by default unless they have a specific
251+
reason to use ``asarray`` instead. This suggests that its name really
252+
should be *shorter* and *more memorable* than ``asarray``... which
253+
is difficult. I've used ``asabstractarray`` in this draft, but I'm not
254+
really happy with it, because it's too long and people are unlikely to
255+
start using it by habit without endless exhortations.
256+
257+
One option would be to actually change ``asarray``\'s semantics so
258+
that *it* passes through ``AbstractArray`` objects unchanged. But I'm
259+
worried that there may be a lot of code out there that calls
260+
``asarray`` and then passes the result into some C function that
261+
doesn't do any further type checking (because it knows that its caller
262+
has already used ``asarray``). If we allow ``asarray`` to return
263+
``AbstractArray`` objects, and then someone calls one of these C
264+
wrappers and passes it an ``AbstractArray`` object like a sparse
265+
array, then they'll get a segfault. Right now, in the same situation,
266+
``asarray`` will instead invoke the object's ``__array__`` method, or
267+
use the buffer interface to make a view, or pass through an array with
268+
object dtype, or raise an error, or similar. Probably none of these
269+
outcomes are actually desirable in most cases, so maybe making it a
270+
segfault instead would be OK? But it's dangerous given that we don't
271+
know how common such code is. OTOH, if we were starting from scratch
272+
then this would probably be the ideal solution.
273+
274+
We can't use ``asanyarray`` or ``array``, since those are already
275+
taken.
276+
277+
Any other ideas? ``np.cast``, ``np.coerce``?
278+
279+
280+
Implementation
281+
--------------
282+
283+
1. Rename ``NDArrayOperatorsMixin`` to ``AbstractArray`` (leaving
284+
behind an alias for backwards compatibility) and make it an ABC.
285+
286+
2. Add ``asabstractarray`` (or whatever we end up calling it), and
287+
probably a C API equivalent.
288+
289+
3. Begin migrating NumPy internal functions to using
290+
``asabstractarray`` where appropriate.
291+
292+
293+
Backward compatibility
294+
----------------------
295+
296+
This is purely a new feature, so there are no compatibility issues.
297+
(Unless we decide to change the semantics of ``asarray`` itself.)
298+
299+
300+
Rejected alternatives
301+
---------------------
302+
303+
One suggestion that has come up is to define multiple abstract classes
304+
for different subsets of the array interface. Nothing in this proposal
305+
stops either NumPy or third-parties from doing this in the future, but
306+
it's very difficult to guess ahead of time which subsets would be
307+
useful. Also, "the full ndarray interface" is something that existing
308+
libraries are written to expect (because they work with actual
309+
ndarrays) and test (because they test with actual ndarrays), so it's
310+
by far the easiest place to start.
311+
312+
313+
Links to discussion
314+
-------------------
315+
316+
* https://mail.python.org/pipermail/numpy-discussion/2018-March/077767.html
317+
318+
319+
Appendix: Benchmark script
320+
--------------------------
321+
322+
.. literalinclude:: nep-0016-benchmark.py
323+
324+
325+
Copyright
326+
---------
327+
328+
This document has been placed in the public domain.

doc/neps/nep-0016-benchmark.py

Lines changed: 48 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,48 @@
1+
import perf  # third-party benchmarking package, since renamed to "pyperf"
2+
import abc
3+
import numpy as np
4+
5+
class NotArray:
6+
    pass
7+
8+
class AttrArray:
9+
    __array_implementer__ = True
10+
11+
class ArrayBase(abc.ABC):
12+
    pass
13+
14+
class ABCArray1(ArrayBase):
15+
    pass
16+
17+
class ABCArray2:
18+
    pass
19+
20+
ArrayBase.register(ABCArray2)
21+
22+
not_array = NotArray()
23+
attr_array = AttrArray()
24+
abc_array_1 = ABCArray1()
25+
abc_array_2 = ABCArray2()
26+
27+
# Make sure ABC cache is primed
28+
isinstance(not_array, ArrayBase)
29+
isinstance(abc_array_1, ArrayBase)
30+
isinstance(abc_array_2, ArrayBase)
31+
32+
runner = perf.Runner()
33+
def t(name, statement):
34+
    runner.timeit(name, statement, globals=globals())
35+
36+
t("np.asarray([])", "np.asarray([])")
37+
arrobj = np.array([])
38+
t("np.asarray(arrobj)", "np.asarray(arrobj)")
39+
40+
t("attr, False",
41+
  "getattr(not_array, '__array_implementer__', False)")
42+
t("attr, True",
43+
  "getattr(attr_array, '__array_implementer__', False)")
44+
45+
t("ABC, False", "isinstance(not_array, ArrayBase)")
46+
t("ABC, True, via inheritance", "isinstance(abc_array_1, ArrayBase)")
47+
t("ABC, True, via register", "isinstance(abc_array_2, ArrayBase)")
48+
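For readers without the third-party benchmarking package, the attribute and ABC checks can be approximated with the standard library's ``timeit``. This is a rough sketch, not the script used for the numbers quoted in the NEP: the helper ``time_ns``, the iteration counts, and the ``NAMESPACE`` and ``STATEMENTS`` dictionaries are assumptions, and plain ``timeit`` lacks the calibration the original tool performs, so absolute figures will differ.

```python
import abc
import timeit


class NotArray:
    pass


class AttrArray:
    __array_implementer__ = True


class ArrayBase(abc.ABC):
    pass


class ABCArray1(ArrayBase):
    pass


class ABCArray2:
    pass


ArrayBase.register(ABCArray2)

# Names made visible to the timed statements.
NAMESPACE = {
    "ArrayBase": ArrayBase,
    "not_array": NotArray(),
    "attr_array": AttrArray(),
    "abc_array_1": ABCArray1(),
    "abc_array_2": ABCArray2(),
}

STATEMENTS = {
    "attr, False": "getattr(not_array, '__array_implementer__', False)",
    "attr, True": "getattr(attr_array, '__array_implementer__', False)",
    "ABC, False": "isinstance(not_array, ArrayBase)",
    "ABC, True, via inheritance": "isinstance(abc_array_1, ArrayBase)",
    "ABC, True, via register": "isinstance(abc_array_2, ArrayBase)",
}


def time_ns(statement, number=20_000, repeat=5):
    """Best-of-``repeat`` time per execution, in nanoseconds."""
    timings = timeit.repeat(
        statement, globals=NAMESPACE, number=number, repeat=repeat
    )
    return min(timings) / number * 1e9


results = {name: time_ns(stmt) for name, stmt in STATEMENTS.items()}
for name, ns in results.items():
    print(f"{name:30s} {ns:8.0f} ns")
```

Taking the minimum over several repeats reduces noise from other processes, but the result still includes ``timeit``'s loop overhead.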
