MAINT: Einsum argument parsing cleanup #11095
Off topic. Has there ever been discussion about unifying the subscript parsing in einsum with that of ufunc signatures? They seem similar but different.
I don't recall any. There are a few differences that I can think of off the top of my head that may make a unified approach not that easy, e.g. einsum can have an ellipsis, ufuncs can't; einsum labels are single letters, ufunc labels can be any valid Python variable name; different delimiters... But if those differences could be abstracted and the resulting code was simpler, I would be all for it.
It is something to keep in mind as I work through #9028; one of the alternatives is to extend the generalized ufunc signature.
To me at least, this really clarifies what was rather opaque code. Only nitpicks by way of comments, really.
/* Search for the next matching label */ | ||
next = (char *)memchr(out_labels+idim+1, label, | ||
ndim-idim-1); | ||
/* Search for the next ,atching label. */ |
typo
if (memchr(subscripts+i+1, label, length-i-1) == NULL) { | ||
/* Check that it was used in the inputs */ | ||
/* Check that it doesn't occur again. */ | ||
if (memchr(subscripts + i + 1, label, length - i - 1) == NULL) { |
I'd raise an error on != NULL here, to match the docstring and for ease of comprehension
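A minimal sketch of the "error path first" shape suggested here, assuming a hypothetical helper `check_labels_unique` (not a function from this PR) that returns -1 as soon as a label repeats:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper for illustration, not NumPy's actual code.
 * Returns 0 if every label in subscripts[0..length) occurs exactly once,
 * -1 as soon as a repeated label is found.
 */
static int
check_labels_unique(const char *subscripts, size_t length)
{
    size_t i;

    assert(subscripts != NULL);
    for (i = 0; i < length; ++i) {
        char label = subscripts[i];

        /* Error path first: the label must not occur again. */
        if (memchr(subscripts + i + 1, label, length - i - 1) != NULL) {
            return -1;
        }
        /* The normal path continues here without extra nesting. */
    }
    return 0;
}
```

Testing for the failure condition and returning immediately keeps the success path at the outer indentation level, which is the readability gain the comment is after.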
return -1; | ||
} | ||
/* Check that there is room in out_labels for this label. */ | ||
if (ndim < NPY_MAXDIMS) { |
Same here, just >= NPY_MAXDIMS -> error.
} | ||
} | ||
/* Check there is room in out_labels for broadcast dims. */ | ||
if (ndim + ndim_broadcast <= NPY_MAXDIMS) { |
I'd again reverse the logic of the test, but, really, I'm nitpicking!
I have added a new commit following @mhvk's suggestions, which indeed make the code look nicer. Have extended the same idea to 'parse_operand_subscript' as well.
This looks good. Aside: isn't it odd that one of the Anyway, I think this is a great improvement as is, but probably good to give it another day or so for further comments. |
/* A label for an axis */ | ||
/* Process all labels for this operand */ | ||
for (i = 0; i < length; ++i) { | ||
int label = subscripts[i]; |
Why the cast from char to int?
It seems unnecessary here. Elsewhere I guess label is made an int because negative values are relevant and it's shorter than signed char. I guess it makes sense to keep a consistent typing, even if it is not strictly needed?
The behavior of signed char and int is very different here - one will correctly store signed values whatever the system char is, whereas the other may end up storing an int in [0, 256)
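To make the difference concrete, here is a small sketch with two hypothetical helpers (the names are illustrative, not from the PR) that read the same byte once as `int label = subscripts[i]` and once through `signed char`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: read a label the way `int label = subscripts[i]` does. */
static int
read_label_as_int(const char *subscripts, size_t i)
{
    int label;

    assert(subscripts != NULL);
    /* Plain char promotes to int; the sign follows the platform's char. */
    label = subscripts[i];
    return label;
}

/* Hypothetical helper: read the same byte through signed char instead. */
static int
read_label_as_signed_char(const char *subscripts, size_t i)
{
    signed char label;

    assert(subscripts != NULL);
    label = (signed char)subscripts[i];
    return label;
}
```

For a byte with the high bit set, the first helper yields a negative value where plain char is signed and a value in [0, 256) where it is unsigned. The second yields a negative value on the usual two's-complement implementations regardless of the signedness of plain char; strictly speaking, that out-of-range conversion is implementation-defined.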
for (i = 0; i < length; ++i) { | ||
int label = subscripts[i]; | ||
|
||
/* A proper label for an axis. */ | ||
if (label > 0 && isalpha(label)) { |
label > 0 is always true on platforms with an unsigned char
Even if the char is signed, all the values for which isalpha would return true are < 128, so it seems like a redundant check indeed...
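One caveat worth keeping in mind: the C standard requires the argument of isalpha to be representable as an unsigned char or equal to EOF, so passing a negative label is undefined behavior rather than simply false. A common alternative to a sign check is to convert through unsigned char first. The sketch below assumes a hypothetical helper name, `is_label_at`, that does not appear in the PR:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper for illustration: nonzero if subscripts[i] is an
 * alphabetic label character.
 */
static int
is_label_at(const char *subscripts, size_t i)
{
    assert(subscripts != NULL);
    /*
     * Converting through unsigned char guarantees isalpha receives a
     * value in [0, UCHAR_MAX], whatever the signedness of plain char.
     */
    return isalpha((unsigned char)subscripts[i]) != 0;
}
```

With this conversion the result no longer depends on whether plain char is signed, and no separate `label > 0` guard is needed for the isalpha call itself.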
"operand %d", iop); | ||
"einstein sum subscripts string contains a " | ||
"'.' that is not part of an ellipsis ('...') " | ||
"in operand %d", iop); | ||
return 0; |
I agree with @mhvk here: these should return -1, but that can come in a later PR if needed
Actually, one returns 0 on error and 1 on success; the other returns -1 on error and the number of dimensions of the output array on success.
Ah, yes, I stand corrected. Though it was the one that returns 0 on error that I thought made most sense to change to -1 on error, 0 on success.
I really don't care either way, but we have What we should probably do is make the output dimensions be a reference passed into the |
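As a rough sketch of the convention being discussed (-1 on error, 0 on success, with the dimension count handed back through a pointer), assuming a hypothetical function `count_output_labels` and a stand-in limit `MAX_LABELS` in place of NPY_MAXDIMS, neither of which is taken from the PR:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Stand-in for NPY_MAXDIMS; the value here is only illustrative. */
#define MAX_LABELS 32

/*
 * Hypothetical parser for illustration, not NumPy's actual code.
 * Counts the alphabetic labels in subscripts[0..length).
 * Returns 0 on success and stores the count in *out_ndim;
 * returns -1 on error and leaves *out_ndim untouched.
 */
static int
count_output_labels(const char *subscripts, size_t length, int *out_ndim)
{
    size_t i;
    int ndim = 0;

    assert(subscripts != NULL && out_ndim != NULL);
    for (i = 0; i < length; ++i) {
        if (!isalpha((unsigned char)subscripts[i])) {
            return -1;  /* invalid character in the subscripts */
        }
        if (ndim >= MAX_LABELS) {
            return -1;  /* no room for another label */
        }
        ++ndim;
    }
    *out_ndim = ndim;
    return 0;
}
```

Separating the status code from the result means every parsing function can share one error convention, at the cost of an extra pointer argument at each call site.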
I'd favor deferring to the kernel coding standard here:
Python pretty much unilaterally returns |
Are you guys expecting me to do something else here, or are you just letting it sit for others to get a chance to take a look? Since this is reproducing the current behavior, I'd rather leave any changes to the signed/unsigned char logic for a different PR.
I'm happy with merging as is. @eric-wieser?
50a6970 to c2d5925
I'd like to see the return code changed, but I'm not going to insist on it, since what you have is already a strict improvement. Small nit: the last commit is missing the |
I'm going to assume you don't have the bandwidth to go back and change the return code - so let's get this in
This matches most of the CPython API. Follows on from comments in numpygh-11095.
@mhvk Everything in master at the time of the branch will be in 1.15.
Several refactorings and simplifications of einsum's argument parsing. Makes some progress towards #10801.