MAINT: Einsum argument parsing cleanup #11095

jaimefrio · 2018-05-14T14:17:04Z

Several refactorings and simplifications of einsum's argument parsing. Makes some progress towards #10801.

mattip · 2018-05-14T15:03:53Z

Off topic. Has there ever been discussion about unifying the subscript parsing in einsum with that of ufunc signatures? They seem similar but different.

jaimefrio · 2018-05-14T15:12:51Z

I don't recall any. There are a few differences that I can think of off the top of my head that may make a unified approach not that easy, e.g. einsum can have an ellipsis, ufuncs can't; einsum labels are single letters, ufunc labels can be any valid Python variable name; different delimiters... But if those differences could be abstracted and the resulting code was simpler, I would be all for it.

mattip · 2018-05-14T15:18:06Z

It is something to keep in mind as I work through #9028; one of the alternatives is to extend the generalized ufunc signature.

mhvk

To me at least, this really clarifies what was rather opaque code. Only nitpicks by way of comments, really.

mhvk · 2018-05-14T17:57:06Z

numpy/core/src/multiarray/einsum.c.src

-            /* Search for the next matching label */
-            next = (char *)memchr(out_labels+idim+1, label,
-                                    ndim-idim-1);
+            /* Search for the next ,atching label. */


mhvk · 2018-05-14T18:00:53Z

numpy/core/src/multiarray/einsum.c.src

-            if (memchr(subscripts+i+1, label, length-i-1) == NULL) {
-                /* Check that it was used in the inputs */
+            /* Check that it doesn't occur again. */
+            if (memchr(subscripts + i + 1, label, length - i - 1) == NULL) {


I'd raise an error on != NULL here, to match the docstring and for ease of comprehension
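A minimal sketch of the shape being suggested here, with the error path tested first so the success path is not nested. The helper name `check_output_labels_unique` and its -1 / 0 return values are assumptions made for illustration; the real code in einsum.c.src also sets a Python exception before returning.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not NumPy's actual code: returns -1 if any label
 * occurs more than once in `subscripts`, 0 otherwise.
 */
static int
check_output_labels_unique(const char *subscripts, size_t length)
{
    size_t i;

    for (i = 0; i < length; ++i) {
        char label = subscripts[i];

        /* Error path first: the label must not occur again. */
        if (memchr(subscripts + i + 1, label, length - i - 1) != NULL) {
            return -1;
        }
        /* Past this point the label is known to be unique so far. */
    }
    return 0;
}
```

The same inversion applies to the capacity checks further down the review: test for the failing condition, return, and let the normal case continue at the outer indentation level.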

mhvk · 2018-05-14T18:01:26Z

numpy/core/src/multiarray/einsum.c.src

+                    return -1;
+                }
+                /* Check that there is room in out_labels for this label. */
+                if (ndim < NPY_MAXDIMS) {


Same here, just >= NPY_MAXDIMS -> error.

mhvk · 2018-05-14T19:52:56Z

numpy/core/src/multiarray/einsum.c.src

-        }
-    }
+                /* Check there is room in out_labels for broadcast dims. */
+                if (ndim + ndim_broadcast <= NPY_MAXDIMS) {


I'd again reverse the logic of the test, but, really, I'm nitpicking!

jaimefrio · 2018-05-14T23:52:24Z

I have added a new commit following @mhvk's suggestions, which indeed make the code look nicer. I have extended the same idea to 'parse_operand_subscripts' as well.

mhvk · 2018-05-15T00:40:03Z

This looks good. Aside: isn't it odd that one of the parse functions returns 0 for error and 1 for success, and the other -1 for error and 0 for success...

Anyway, I think this is a great improvement as is, but probably good to give it another day or so for further comments.

eric-wieser · 2018-05-15T00:44:24Z

numpy/core/src/multiarray/einsum.c.src

-        /* A label for an axis */
+    /* Process all labels for this operand */
+    for (i = 0; i < length; ++i) {
+        int label = subscripts[i];


Why the cast from char to int?

It seems unnecessary here. Elsewhere I guess label is made an int because negative values are relevant and it's shorter than signed char. I guess it makes sense to keep a consistent typing, even if it is not strictly needed?

The behavior of signed char and int is very different here - one will correctly store signed values whatever the system char is, whereas the other may end up storing an int in [0, 256).
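The difference can be illustrated with two small conversions. This is only a sketch: the helper names are hypothetical, and the result of converting an out-of-range value to `signed char` is implementation-defined, although common two's complement platforms wrap it as noted in the comments.

```c
#include <limits.h>

/* Hypothetical helpers illustrating the two conversions. */
static int
label_from_signed(char c)
{
    /* Always in [SCHAR_MIN, SCHAR_MAX], whatever the platform's char is. */
    return (signed char)c;
}

static int
label_from_plain(char c)
{
    /* In [CHAR_MIN, CHAR_MAX]: never negative where char is unsigned. */
    return c;
}
```

For an ordinary letter such as 'i' both helpers agree. For a byte such as 0xC8, the first is negative on typical platforms, while the second is negative only where plain `char` is signed and is 200 where it is unsigned.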

eric-wieser · 2018-05-15T00:44:50Z

numpy/core/src/multiarray/einsum.c.src

+    for (i = 0; i < length; ++i) {
+        int label = subscripts[i];
+
+        /* A proper label for an axis. */
        if (label > 0 && isalpha(label)) {


label > 0 is always true on platforms with an unsigned char

Even if the char is signed, all the values for which isalpha would return true are < 128, so it seems like a redundant check indeed...
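One related detail from the C standard: `isalpha` is only defined for arguments representable as `unsigned char` or equal to `EOF`, so on a platform with a signed `char` a negative value should not reach it directly. A possible way to make the test independent of char signedness is sketched below; `is_axis_label` is a hypothetical name, not something in einsum.c.src, and the sketch assumes the default "C" locale, where only A-Z and a-z count as letters.

```c
#include <ctype.h>

/* Hypothetical helper: is `c` a letter usable as an axis label? */
static int
is_axis_label(char c)
{
    /* Casting to unsigned char keeps the argument inside isalpha's domain. */
    return isalpha((unsigned char)c) != 0;
}
```

With the cast in place no separate sign check is needed before the call, whichever signedness the platform's `char` has.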

eric-wieser · 2018-05-15T00:49:05Z

numpy/core/src/multiarray/einsum.c.src

-                            "operand %d", iop);
+                             "einstein sum subscripts string contains a "
+                             "'.' that is not part of an ellipsis ('...') "
+                             "in operand %d", iop);
                return 0;


I agree with @mhvk here: these should return -1, but that can come in a later PR if needed

jaimefrio · 2018-05-15T05:46:00Z

Actually, one returns 0 on error and 1 on success; the other returns -1 on error and the number of dimensions of the output array on success.

mhvk · 2018-05-15T13:32:02Z

Ah, yes, I stand corrected. Though it was the one that returns 0 on error that I thought made most sense to change to -1 on error, 0 on success.

jaimefrio · 2018-05-15T16:50:50Z

I really don't care either way, but we have NPY_FAIL defined to be 0 and NPY_SUCCEED defined to be 1 somewhere in our code base. Do we have a policy on whether 0 / 1 or -1 / 0 should be used when and where?

What we should probably do is make the output dimensions be a reference passed into the parse_output_subscripts and have the return be a status with the same meaning as parse_operand_subscripts. Will do that in a follow up PR, and try to unify status for all helper functions in the file.
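A rough sketch of what such a signature could look like, with the dimension count moved to an out-parameter and the return value reduced to a status. Everything here is an assumption for illustration: the name `parse_output_sketch`, the simplified parsing (letters only, no ellipsis handling), and the -1 / 0 status values, which follow the convention discussed later in this thread rather than the code as it stood in this PR.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical sketch, not the real parse_output_subscripts: counts the
 * letter labels in `subscripts`, writes the count to *out_ndim, and
 * returns 0 on success or -1 on error (any non-letter character).
 */
static int
parse_output_sketch(const char *subscripts, size_t length, int *out_ndim)
{
    size_t i;
    int ndim = 0;

    for (i = 0; i < length; ++i) {
        if (!isalpha((unsigned char)subscripts[i])) {
            return -1;      /* status only; *out_ndim is left untouched */
        }
        ++ndim;
    }
    *out_ndim = ndim;
    return 0;
}
```

Separating the two means a zero-dimensional output (count 0) and a successful status no longer share the same return channel as the error value.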

eric-wieser · 2018-05-15T16:55:26Z

Do we have a policy on whether 0 / 1 or -1 / 0 should be used when and where?

I'd favor deferring to the kernel coding standard here:

Functions can return values of many different kinds, and one of the most common is a value indicating whether the function succeeded or failed. Such a value can be represented as an error-code integer (-Exxx = failure, 0 = success) or a succeeded boolean (0 = failure, non-zero = success).

Mixing up these two sorts of representations is a fertile source of difficult-to-find bugs. If the C language included a strong distinction between integers and booleans then the compiler would find these mistakes for us... but it doesn’t. To help prevent such bugs, always follow this convention:

If the name of a function is an action or an imperative command,
the function should return an error-code integer. If the name
is a predicate, the function should return a "succeeded" boolean.

Python pretty much universally returns -1 when an exception is set in a function that doesn't return PyObject - I think we should be doing the same.
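To make the quoted convention concrete, here is a small sketch with one predicate and one action. Both function names are hypothetical and are not taken from einsum.c.src.

```c
#include <stddef.h>

/* Predicate name: returns a boolean (non-zero means "yes"). */
static int
label_is_ellipsis_dot(char c)
{
    return c == '.';
}

/* Action name: returns an error code (0 = success, -1 = failure). */
static int
append_label(char *out_labels, size_t capacity, size_t *count, char label)
{
    /* Fail before writing if there is no room left. */
    if (*count >= capacity) {
        return -1;
    }
    out_labels[(*count)++] = label;
    return 0;
}
```

Callers then read naturally in both cases: `if (label_is_ellipsis_dot(c)) { ... }` for the predicate and `if (append_label(buf, cap, &n, c) < 0) { goto fail; }` for the action, which is the same pattern used with CPython functions that return -1 with an exception set.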

jaimefrio · 2018-05-17T18:36:06Z

Are you guys expecting me to do something else here, or are you just letting it sit for others to get a chance to take a look?

Since this is reproducing the current behavior, I'd rather leave any changes to the signed/unsigned char logic for a different PR.

mhvk · 2018-05-18T01:26:28Z

I'm happy with merging as is. @eric-wieser?

eric-wieser · 2018-05-18T07:17:19Z

char signedness is fine to leave for another time.

I'd like to see the return code changed, but I'm not going to insist on it, since what you have is already a strict improvement.

Small nit: the last commit is missing the MAINT: prefix - I've gone ahead and force-pushed to fix that. Feel free to merge when that commit finishes CI.

eric-wieser · 2018-05-29T07:57:14Z

I'm going to assume you don't have the bandwidth to go back and change the return code - so let's get this in.

This matches most of the CPython API. Follows on from comments in numpygh-11095.

mhvk · 2018-05-29T13:32:28Z

@charris - if you still put this in 1.15, then also do #11187.

charris · 2018-05-29T15:06:50Z

@mhvk Everything in master at the time of the branch will be in 1.15.

jaimefrio added 4 commits May 14, 2018 04:19

MAINT: Remove unused variable from einsum.

ed0815f

MAINT: Refactor parse_operand_subscripts to avoid repetition.

a12174d

MAINT: Avoid creating fake output subscripts.

56b2454

MAINT: Refactor parse_output_subscripts.

1153dbf

jaimefrio changed the title ~~MAINT: Einsum argument parisng cleanup~~ MAINT: Einsum argument parsing cleanup May 14, 2018

jaimefrio mentioned this pull request May 14, 2018

BUG: np.einsum accepts some subscripts only when optimize=True #10926

Open

mhvk reviewed May 14, 2018

eric-wieser reviewed May 15, 2018

charris added 03 - Maintenance component: numpy._core labels May 15, 2018

MAINT: Change order of error checking for more code clarity.

c2d5925

eric-wieser force-pushed the einsum_cleanup branch from 50a6970 to c2d5925 Compare May 18, 2018 07:16

eric-wieser merged commit 5075933 into numpy:master May 29, 2018

eric-wieser added a commit to eric-wieser/numpy that referenced this pull request May 29, 2018

MAINT: Use the more common -1 / 0 to indicate error / success

50e9786

This matches most of the CPython API. Follows on from comments in numpygh-11095.

eric-wieser mentioned this pull request May 29, 2018

MAINT: Use the more common -1 / 0 to indicate error / success #11187

Merged

jaimefrio deleted the einsum_cleanup branch May 31, 2018 05:35

Strilanc mentioned this pull request Jul 16, 2018

Fix to_unitary_matrix failing on circuits with 13 or more qubits quantumlib/Cirq#683

Closed

charris mentioned this pull request Oct 19, 2019

BUG: Fix np.einsum errors on Power9 Linux and z/Linux. #14692

Closed
