MAINT: Einsum argument parsing cleanup #11095

Merged
merged 5 commits into from
May 29, 2018

Conversation

jaimefrio
Member

Several refactorings and simplifications of einsum's argument parsing. Makes some progress towards #10801.

@jaimefrio jaimefrio changed the title MAINT: Einsum argument parisng cleanup MAINT: Einsum argument parsing cleanup May 14, 2018
@mattip
Member

mattip commented May 14, 2018

Off topic. Has there ever been discussion about unifying the subscript parsing in einsum with that of ufunc signatures? They seem similar but different.

@jaimefrio
Member Author

I don't recall any. There are a few differences that I can think of off the top of my head that may make a unified approach not that easy, e.g. einsum can have an ellipsis, ufuncs can't; einsum labels are single letters, ufunc labels can be any valid Python variable name; different delimiters... But if those differences could be abstracted and the resulting code was simpler, I would be all for it.

@mattip
Member

mattip commented May 14, 2018

It is something to keep in mind as I work through #9028; one of the alternatives there is to extend the generalized ufunc signature.

Contributor

@mhvk mhvk left a comment

To me at least, this really clarifies what was rather opaque code. Only nitpicks by way of comments, really.

/* Search for the next matching label */
next = (char *)memchr(out_labels+idim+1, label,
ndim-idim-1);
/* Search for the next ,atching label. */
Contributor

typo

if (memchr(subscripts+i+1, label, length-i-1) == NULL) {
/* Check that it was used in the inputs */
/* Check that it doesn't occur again. */
if (memchr(subscripts + i + 1, label, length - i - 1) == NULL) {
Contributor

I'd raise an error on != NULL here, to match the docstring and for ease of comprehension
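
As a rough illustration of the suggested shape, here is a minimal sketch that tests for the error condition first and returns early, so the normal path is not nested inside the `if`. The helper name `check_unique_labels` and its 0 / -1 return convention are assumptions made for this example, not the actual NumPy code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not NumPy's function): returns 0 if every label in
 * `subscripts` occurs exactly once, and -1 at the first repeated label.
 */
static int
check_unique_labels(const char *subscripts, size_t length)
{
    size_t i;

    assert(subscripts != NULL);
    for (i = 0; i < length; ++i) {
        char label = subscripts[i];

        /* Error path first: the same label occurs again later on. */
        if (memchr(subscripts + i + 1, label, length - i - 1) != NULL) {
            return -1;
        }
        /* The normal path continues here without extra nesting. */
    }
    return 0;
}
```

With this layout the condition reads the same way as a docstring that says "error if the label is repeated".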

return -1;
}
/* Check that there is room in out_labels for this label. */
if (ndim < NPY_MAXDIMS) {
Contributor

Same here, just >= NPY_MAXDIMS -> error.

}
}
/* Check there is room in out_labels for broadcast dims. */
if (ndim + ndim_broadcast <= NPY_MAXDIMS) {
Contributor

I'd again reverse the logic of the test, but, really, I'm nitpicking!

@jaimefrio
Member Author

I have added a new commit following @mhvk's suggestions, which indeed make the code look nicer. Have extended the same idea to 'parse_operand_subscript' as well.

@mhvk
Contributor

mhvk commented May 15, 2018

This looks good. Aside: isn't it odd that one of the parse functions returns 0 for error and 1 for success, and the other -1 for error and 0 for success...

Anyway, I think this is a great improvement as is, but probably good to give it another day or so for further comments.

/* A label for an axis */
/* Process all labels for this operand */
for (i = 0; i < length; ++i) {
int label = subscripts[i];
Member

Why the cast from char to int?

Member Author

It seems unnecessary here. Elsewhere I guess label is made an int because negative values are relevant and it's shorter than signed char. I guess it makes sense to keep a consistent typing, even if it is not strictly needed?

Member

The behavior of signed char and int is very different here - one will correctly store signed values whatever the system char is, whereas the other may end up storing an int in [0, 256).
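
To make the difference concrete, here is a minimal sketch that reads the same stored byte through both types. The helper names are hypothetical, and the expected values assume 8-bit bytes and two's-complement representation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * What `int label = labels[i]` yields on a platform where plain char is
 * unsigned: the value always lands in [0, 256).
 */
static int
read_as_unsigned_char(const void *labels, size_t i)
{
    assert(labels != NULL);
    return ((const unsigned char *)labels)[i];
}

/*
 * Reading through signed char keeps negative values negative, whatever
 * the signedness of plain char on the platform.
 */
static int
read_as_signed_char(const void *labels, size_t i)
{
    assert(labels != NULL);
    return ((const signed char *)labels)[i];
}
```

For a byte with bit pattern 0xFF, the first helper gives 255 and the second gives -1, so code that relies on negative marker values only sees them in the second case.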

for (i = 0; i < length; ++i) {
int label = subscripts[i];

/* A proper label for an axis. */
if (label > 0 && isalpha(label)) {
Member

label > 0 is always true on platforms with an unsigned char

Member Author

Even if the char is signed, all the values for which isalpha would return true are < 128, so it seems like a redundant check indeed...
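
One related point: the C standard only defines `isalpha` for arguments representable as `unsigned char` (or equal to `EOF`), so a sketch that does not depend on a `label > 0` guard at all could cast before calling it. The helper `count_axis_labels` below is hypothetical and assumes the default "C" locale, where only the ASCII letters are alphabetic.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of NumPy): counts the characters of
 * `subscripts` that are valid single-letter axis labels.
 */
static int
count_axis_labels(const char *subscripts, size_t length)
{
    size_t i;
    int count = 0;

    assert(subscripts != NULL);
    for (i = 0; i < length; ++i) {
        /* The cast keeps the argument in the range isalpha() accepts. */
        if (isalpha((unsigned char)subscripts[i])) {
            ++count;
        }
    }
    return count;
}
```

Bytes at or above 128 are then simply rejected as non-letters rather than being passed to `isalpha` as negative values.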

"operand %d", iop);
"einstein sum subscripts string contains a "
"'.' that is not part of an ellipsis ('...') "
"in operand %d", iop);
return 0;
Member

I agree with @mhvk here: these should return -1, but that can come in a later PR if needed

@jaimefrio
Member Author

Actually, one returns 0 on error and 1 on success; the other returns -1 on error and the number of dimensions of the output array on success.

@mhvk
Contributor

mhvk commented May 15, 2018

Ah, yes, I stand corrected. Though it was the one that returns 0 on error that I thought made most sense to change to -1 on error, 0 on success.

@jaimefrio
Member Author

I really don't care either way, but we have NPY_FAIL defined to be 0 and NPY_SUCCEED defined to be 1 somewhere in our code base. Do we have a policy on whether 0 / 1 or -1 / 0 should be used when and where?

What we should probably do is make the output dimensions a reference passed into parse_output_subscripts and have the return value be a status with the same meaning as in parse_operand_subscripts. I will do that in a follow-up PR, and try to unify the status for all helper functions in the file.
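
A minimal sketch of that shape, assuming a -1 / 0 status convention: the count is written through an out-parameter and the return value carries only success or failure. `parse_output_sketch` is a hypothetical stand-in, far simpler than the real parser (it accepts letters only and ignores ellipses).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for an output-subscript parser. On success it
 * stores the number of labels in *out_ndim and returns 0. On an invalid
 * character it returns -1 and leaves *out_ndim untouched.
 */
static int
parse_output_sketch(const char *subscripts, size_t length, int *out_ndim)
{
    size_t i;
    int ndim = 0;

    assert(subscripts != NULL && out_ndim != NULL);
    for (i = 0; i < length; ++i) {
        if (!isalpha((unsigned char)subscripts[i])) {
            return -1;
        }
        ++ndim;
    }
    *out_ndim = ndim;
    return 0;
}
```

A caller can then check `if (parse_output_sketch(s, n, &ndim) < 0)` without having to remember that a non-negative return value doubles as a dimension count.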

@eric-wieser
Member

eric-wieser commented May 15, 2018

Do we have a policy on whether 0 / 1 or -1 / 0 should be used when and where?

I'd favor deferring to the kernel coding standard here:

Functions can return values of many different kinds, and one of the most common is a value indicating whether the function succeeded or failed. Such a value can be represented as an error-code integer (-Exxx = failure, 0 = success) or a succeeded boolean (0 = failure, non-zero = success).

Mixing up these two sorts of representations is a fertile source of difficult-to-find bugs. If the C language included a strong distinction between integers and booleans then the compiler would find these mistakes for us... but it doesn’t. To help prevent such bugs, always follow this convention:

If the name of a function is an action or an imperative command, the function should return an error-code integer. If the name is a predicate, the function should return a "succeeded" boolean.

Python pretty much unilaterally returns -1 when an exception is set in a function that doesn't return PyObject - I think we should be doing the same
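
For illustration, here is a minimal sketch of the two naming styles side by side. Both function names are hypothetical and are not taken from einsum's source: the predicate returns a boolean, while the action returns 0 on success and -1 on failure.

```c
#include <assert.h>
#include <stddef.h>

/* Predicate name: returns a boolean, non-zero meaning "yes". */
static int
is_ellipsis_at(const char *s, size_t length, size_t i)
{
    assert(s != NULL);
    return i + 2 < length &&
           s[i] == '.' && s[i + 1] == '.' && s[i + 2] == '.';
}

/* Action name: returns an error code, 0 on success and -1 on failure. */
static int
skip_ellipsis(const char *s, size_t length, size_t *pos)
{
    assert(pos != NULL);
    if (!is_ellipsis_at(s, length, *pos)) {
        return -1;
    }
    *pos += 3;
    return 0;
}
```

Call sites then read naturally in both cases: `if (is_ellipsis_at(...))` for the predicate and `if (skip_ellipsis(...) < 0)` for the action.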

@jaimefrio
Member Author

Are you guys expecting me to do something else here, or are you just letting it sit for others to get a chance to take a look?

Since this is reproducing the current behavior, I'd rather leave any changes to the signed/unsigned char logic for a different PR.

@mhvk
Contributor

mhvk commented May 18, 2018

I'm happy with merging as is. @eric-wieser?

@eric-wieser
Member

eric-wieser commented May 18, 2018

char signedness is fine to leave for another time.

I'd like to see the return code changed, but I'm not going to insist on it, since what you have is already a strict improvement.

Small nit: the last commit is missing the MAINT: prefix - I've gone ahead and force-pushed to fix that. Feel free to merge when that commit finishes CI.

@eric-wieser
Member

I'm going to assume you don't have the bandwidth to go back and change the return code - so let's get this in

@eric-wieser eric-wieser merged commit 5075933 into numpy:master May 29, 2018
eric-wieser added a commit to eric-wieser/numpy that referenced this pull request May 29, 2018
This matches most of the CPython API.

Follows on from comments in gh-11095.
@mhvk
Contributor

mhvk commented May 29, 2018

@charris - if you still put this in 1.15, then also do #11187.

@charris
Member

charris commented May 29, 2018

@mhvk Everything in master at the time of the branch will be in 1.15.
