Provide an 'out' parameter for numpy.fft.fft #25399
Note: this is just to test the waters. In case of positive feedback, I'll provide the same parameter for other |
0b6d5be to 73a1d1c
cc @stefanv, but really that could be anyone :-)
I think it would be very useful to have an `out` argument! However, one needs to take care that it has the right dtype and shape - see in-line comments.
p.s. Looking at the actual code, I'm somewhat surprised it does not use the iterator, since then that kind of stuff could be dealt with by it (as well as possibly the axis). Indeed, the `fft` routines would seem easily implemented as a `gufunc`. Though that may be better done as follow-up!
      else:
          a = swapaxes(a, axis, -1)
    -     r = pfi.execute(a, is_real, is_forward, fct)
    +     r = pfi.execute(a, is_real, is_forward, fct, out)
This may be risky - here, the axes of `a` have been altered but those of `out` have not. For complex-to-complex, it is possible to copy beforehand, though generally I think it is better to swap the axis for `out` as well - for real-to-complex, this will be required.
I think you have to do exactly the same operation on `out` as you do on `a`. `.resize` definitely is not right as that can allocate new memory.
The only thing that would seem safe is the following:

    if out is not None:
        out_swapped = swapaxes(out, axis, -1)
        pfi.execute(a, is_real, is_forward, fct, out_swapped)
        return out
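To illustrate why swapping the axes of `out` is safe, here is a minimal sketch. It assumes a hypothetical helper `fft_into` (not part of NumPy or this PR) and uses `np.fft.fft` along the last axis as a stand-in for `pfi.execute`. Because `np.swapaxes` returns a view, writing into the swapped view fills the memory of the original `out`, and `out` itself keeps its shape and strides.

```python
import numpy as np


def fft_into(a, axis, out):
    """Hypothetical helper: FFT of `a` along `axis`, written into `out`."""
    # Apply the same axis swap to the input and the output; both are views.
    a_swapped = np.swapaxes(a, axis, -1)
    out_swapped = np.swapaxes(out, axis, -1)
    # Stand-in for the last-axis kernel: assigning into the view writes
    # directly into the memory owned by `out`.
    out_swapped[...] = np.fft.fft(a_swapped, axis=-1)
    # Return the caller's object, untouched in shape and strides.
    return out


rng = np.random.default_rng(0)
a = rng.random((30, 20)) + 1j * rng.random((30, 20))
out = np.empty_like(a)
result = fft_into(a, axis=0, out=out)
```

The returned object is the array the caller passed in, so no extra copy back into `out` is needed in this sketch.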
numpy/fft/_pocketfft.c
Outdated
        if (!data) return NULL;
    }
    else {
        data = (PyArrayObject*)PyArray_EnsureArray(out);
In this branch, one needs to be sure the dtype and shape are both correct. Does `PyArray_CopyObject` take care of that?
That's my understanding, yes.
37f057e to b11f2ae
++ extra test cases
(the macos issue seems unrelated)
@mhvk gentle ping :-)
I fear the code you have is wrong: `CopyObject` converts data types, so, e.g., someone could have passed in a `float32` array as `out`, to which the input gets copied correctly, but the data is interpreted incorrectly in the calculation below, where it is assumed to be `float64`.
It also looks like the code below assumes the data array is in C order, which does not have to be the case for an arbitrary `out` array.
My own sense is that one should write this as a `gufunc` so that iteration and input/output is done correctly automatically.
I also had a quick look at |
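The dtype concern above can be sketched in Python. The `check_out` function below is a hypothetical validation helper, not the check that was eventually added to the PR; the second half shows how bytes written as `float32` look different when read back as `float64`, which is the kind of misinterpretation described in the comment.

```python
import numpy as np


def check_out(out, expected_shape, expected_dtype=np.complex128):
    """Hypothetical validation of a user-supplied `out` array."""
    if not isinstance(out, np.ndarray):
        raise TypeError("out must be a numpy array")
    if out.dtype != expected_dtype:
        raise TypeError(f"out has dtype {out.dtype}, expected {expected_dtype}")
    if out.shape != tuple(expected_shape):
        raise ValueError(f"out has shape {out.shape}, expected {expected_shape}")
    return out


# Copying converts the values to float32, which is fine on its own ...
buf = np.zeros(4, dtype=np.float32)
buf[...] = [1.0, 2.0, 3.0, 4.0]
# ... but reading the same bytes as float64 yields different numbers
# (and half as many of them).
reinterpreted = buf.view(np.float64)
```

Rejecting a mismatched `out` up front, rather than converting silently, avoids handing wrongly typed memory to the computation.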
b11f2ae to 04f8293
@mhvk: I've kept the 'out' name to prepare the (future) move to ufunc. I also added the proper checks before copying.
04f8293 to b67dcc7
Extra checks and tests added, thanks @mhvk for the hints
b67dcc7 to 1fd1ce4
As the first parameter is always copied to the output, it doesn't have much impact performance-wise. It is useful, however, for those who need fine-grained control over memory allocation and cannot afford the cost of a temporary allocation.
1fd1ce4 to 6a17489
@mhvk looks good now?
As you'll see from the in-line comments, I still think you have a problem here... The main problem is that in every other case in numpy, `out` will be used to store the result only, with no change to shape or strides, but here for all but `axis=-1`, you need to swap axes, which means that if you pass in something C-contiguous, it will not be C-contiguous afterwards. It would still work, though, if one is re-using a previous result, since that is swapped just the same way already. Since this is arguably one of the more important use cases, you could still just go for that (see in-line comment).
Otherwise, I think you are sort-of stuck actually rewriting the current simple loop using the iterator. Though at that point, writing it as a `gufunc` is almost certainly less work and it would avoid all problems... The one tricky thing there would be to precalculate the fft plan and pass that to the inner loop (via `*data`).
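The contiguity point can be checked directly. This is a small sketch with illustrative shapes (chosen here, not taken from the PR): swapping the axes of a C-contiguous array gives a view that is no longer C-contiguous, while an array that was itself produced by a swap becomes C-contiguous again when swapped the same way.

```python
import numpy as np

# A freshly allocated output buffer is C-contiguous.
out = np.empty((30, 20), dtype=complex)
swapped = np.swapaxes(out, 0, -1)

print(out.flags.c_contiguous, out.strides)
print(swapped.flags.c_contiguous, swapped.strides)

# Simulate a "previous result": computed on a swapped (20, 30) array and
# then swapped back, so it already has the swapped memory layout.
previous = np.swapaxes(np.empty((20, 30), dtype=complex), 0, -1)
previous_swapped = np.swapaxes(previous, 0, -1)

print(previous.flags.c_contiguous, previous_swapped.flags.c_contiguous)
```

This is why re-using an earlier result as `out` lines up with a kernel that expects C order along the last axis, whereas an arbitrary C-contiguous `out` does not.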
    # tests below only test the out parameter
    y = random((30, 20)) + 1j*random((30, 20))

    out = np.zeros_like(x, dtype=complex)
Why pass in `dtype=complex` here?
    y = random((30, 20)) + 1j*random((30, 20))

    out = np.zeros_like(x, dtype=complex)
    assert_allclose(fft1(x), np.fft.fft(x, out=out), atol=1e-6)
For all these tests, you need to check too that `out` is actually returned, i.e., have something like

    result = np.fft.fft(x, out=out)
    assert result is out
    assert_array_equal(fft1(x), result)

I also replaced `assert_allclose` with `assert_array_equal` since, hopefully, fft code is reproducible on a given machine!
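The test pattern suggested above can be wrapped in a reusable check. The sketch below uses two hypothetical names, `fft_with_out` and `check_out_is_returned`; `fft_with_out` is only a stand-in so that the snippet does not depend on a NumPy version whose `np.fft.fft` accepts `out`.

```python
import numpy as np
from numpy.testing import assert_array_equal


def fft_with_out(x, out=None):
    """Hypothetical stand-in for an FFT that accepts `out`."""
    result = np.fft.fft(x)
    if out is None:
        return result
    out[...] = result  # fill the caller's buffer in place
    return out


def check_out_is_returned(func, x):
    """Check that `func` returns the very object passed as `out`."""
    expected = func(x)
    out = np.zeros_like(expected)
    result = func(x, out=out)
    # Identity check: catches implementations that ignore or replace `out`.
    assert result is out
    # Exact comparison: the same computation on the same machine.
    assert_array_equal(result, expected)


rng = np.random.default_rng(0)
x = rng.random((30, 20)) + 1j * rng.random((30, 20))
check_out_is_returned(fft_with_out, x)
```

The identity assertion is the part that a pure value comparison would miss: a function that silently allocates a new array still returns correct values.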
    # This extra copy is unfortunately needed if we want `out`
    # to retain its original shape while having the correct values.
    copyto(out, r)
    r = out
This is contrary to the regular behaviour of `out` - you must ensure `out` stays the same object with the same memory layout. Note that in principle, that will be automatic -- if you swapped the axis above, then the swapped case is a view of the original `out`, so data will just be written in there. I.e., it should be possible to remove this stanza (as in the suggestion I gave above).
@serge-sans-paille - As it seemed hard to get it right without the iterator, I went ahead and tried calling pocketfft from ufuncs. See #25536. I hope to add your tests soon.
Just a small comment: pocketfft could even deal with the situation where the input array and |
Sorry, my last comment was confusing, since I was thinking about Still, re-using the input array as output should be doable with not too much effort. Not sure whether this is worth it ... it probably depends on the long-term plans for |
@mreineck - since the |