ENH: add a casting option 'same_value' and use it in np.astype #29129

Open
wants to merge 29 commits into
base: main
Changes from 1 commit
Commits
cb8ea97
start implementing same_value casting
mattip Feb 17, 2025
3859a73
work through more places that check 'cast', add a TODO
Feb 17, 2025
9dc63a5
add a test, percolate casting closer to inner loops
mattip May 15, 2025
7cad6af
use SAME_VALUE_CAST flag for one inner loop variant
mattip May 29, 2025
03e9ceb
aligned test of same_value passes. Need more tests
mattip Jun 1, 2025
93b8bce
handle unaligned casting with 'same_value'
mattip Jun 1, 2025
87e2487
extend tests to use source-is-complex
mattip Jun 1, 2025
9aa1e5a
fix more interfaces to pass casting around, disallow using 'same_valu…
mattip Jun 2, 2025
953714e
raise in places that have a kwarg casting, besides np.astype
mattip Jun 2, 2025
cd3e144
refactor based on review comments
mattip Jun 9, 2025
6d6b045
CHAR_MAX,MIN -> SCHAR_MAX,MIN
mattip Jul 21, 2025
1293657
copy context flags
mattip Jul 22, 2025
d151c91
add 'same_value' to typing stubs
mattip Jul 23, 2025
6254ae5
document new feature
mattip Jul 24, 2025
6922b35
test, check exact float->int casting: refactor same_value check into …
mattip Jul 24, 2025
a323a4b
enable astype same_value casting for scalars
mattip Jul 24, 2025
aec8ea1
typo
mattip Jul 24, 2025
26c0fa1
fix ptr-to-src_value -> value casting errors
mattip Jul 25, 2025
0846081
fix linting and docs, ignore warning better
mattip Jul 25, 2025
76e01c1
gcc warning is different
mattip Jul 25, 2025
963ea05
fixes from review, typos
mattip Jul 26, 2025
4a9a498
fix compile warning ignore and make filter in tests more specific, di…
mattip Jul 27, 2025
10c4493
fix warning filters
mattip Jul 28, 2025
58a0a09
emit PyErr inside the loop
mattip Jul 31, 2025
64b8747
macOS can emit FPEs when touching NAN
mattip Jul 31, 2025
9d55847
Fix can-cast logic everywhere for same-value casts (only allow numeric)
seberg Jun 13, 2025
a800154
reorder and simplify, from review
mattip Aug 5, 2025
70d92bc
revert last commit and remove redundant checks
mattip Aug 5, 2025
3549d66
gate and document SAME_VALUE_CASTING for v2.4
mattip Aug 6, 2025
use SAME_VALUE_CAST flag for one inner loop variant
mattip committed Jul 21, 2025
commit 7cad6af78867261aad42928492a446026698f888
10 changes: 4 additions & 6 deletions TODO_same_value
@@ -1,14 +1,12 @@
- Check where PyArray_CopyObject, PyArray_NewCopy, PyArray_CopyInto, array_datetime_as_string, PyArray_Concatenate, PyArray_where are used, do we need a 'same_value' equivalents?
- Is the comment in multiarray/common.c about NPY_DEFAULT_ASSIGN_CASTING warning still correct?
- In PyArray_FromArray(arr, newtype, flags) shoule there be a SAME_VALUE flag?
- In PyArray_FromArray(arr, newtype, flags) should there be a SAME_VALUE flag?
- Examine places where PyArray_CastingConverter is used and add SAME_VALUE handling
- array_astype: now errors, need to fix
- array_datetime_as_string:
- array_copyto:
- PyArray_AssignArray (called with a cast arg)
- PyArray_AssignArray with wheremask (called with a cast arg)
- PyArray_AssignRawScalar with/without wheremask
- PyArray_ConcatenateInto (called with a cast arg)
- PyArray_EinsteinSum (called with a cast arg)
- NpyIter_AdvancedNew (called with a cast arg)

In CanCast, make sure user defined and datetime dtypes will fail with SAME_VALUE
----
latest commit: `git grep UNSAFE_CASTING` up to `numpy/_core/src/multiarray/multiarraymodule.c`
11 changes: 11 additions & 0 deletions numpy/_core/include/numpy/dtype_api.h
@@ -107,6 +107,12 @@ typedef struct PyArrayMethod_Context_tag {

/* Operand descriptors, filled in by resolve_descriptors */
PyArray_Descr *const *descriptors;
void * padding;
/*
* Optional flag to pass information into the inner loop
* If set, it will be NPY_CASTING
*/
uint64_t flags;
/* Structure may grow (this is harmless for DType authors) */
} PyArrayMethod_Context;

@@ -144,6 +150,11 @@ typedef struct {
#define NPY_METH_contiguous_indexed_loop 9
#define _NPY_METH_static_data 10

/*
* Constants for same_value casting
Member

This would be a constant for special array method return values, I think (i.e. not just applicable to casts).

Is it really not worth just keeping this inside the inner loop? Yes, you need to grab the GIL there, but on the plus side, you could actually consider reporting the error.

Member Author

I thought the point of having a return value from the inner loop function is so we can use it to report errors. In general I prefer a code style that tries as much as possible to separate programming metaphors: keep the Python C-API calls separate from the "pure" C functions.

Member

@seberg seberg Jun 6, 2025

I can be convinced to add this. But it is public API and we call these loops from a lot of places so we need machinery to make sure that all of these places give the same errors.

That may be a great refactor! E.g. we could have a single function that does HandleArrayMethodError("name", method_result, method_flags).
But I guess without such a refactor it feels very bolted on to me, because whether or not we check for this return depends on where we call/expect it.

EDIT: I.e. basically, wherever we have PyUFunc_GiveFloatingpointErrors() we would also pass the actual return value and do the needed logic there.

Member Author

I will try to do this as a separate PR.
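
The single entry point floated above, `HandleArrayMethodError("name", method_result, method_flags)`, might look roughly like the sketch below. This is a hypothetical illustration, not NumPy code: `sketch_error`, `sketch_handle_array_method_error`, and the message format are assumptions, and the error struct stands in for the Python error indicator so the example compiles without Python headers. Only the `-31` sentinel comes from the diff.

```c
#include <assert.h>
#include <stdio.h>

/* Sentinel from the diff: returned by a strided loop when a value changes. */
#define NPY_SAME_VALUE_OVERFLOW -31

/* Hypothetical stand-in for the Python error indicator. */
typedef struct {
    int is_set;
    char message[96];
} sketch_error;

/*
 * Hypothetical central handler: every call site passes the raw loop result
 * here so that all of them report the same error. Returns 0 on success and
 * -1 on failure. method_flags is accepted only to mirror the proposed
 * signature; this sketch does not interpret it.
 */
static int
sketch_handle_array_method_error(const char *name, int method_result,
                                 unsigned int method_flags, sketch_error *err)
{
    assert(name != NULL && err != NULL);
    (void)method_flags;
    if (method_result >= 0) {
        return 0;
    }
    if (method_result == NPY_SAME_VALUE_OVERFLOW && !err->is_set) {
        err->is_set = 1;
        snprintf(err->message, sizeof(err->message),
                 "%s: overflow in 'same_value' cast", name);
    }
    /* Any other negative result is assumed to have set its own error. */
    return -1;
}
```

A call site would then reduce to `if (sketch_handle_array_method_error("astype", result, flags, &err) < 0) goto fail;`, which is the uniformity the comment asks for.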

*/
#define NPY_SAME_VALUE_OVERFLOW -31


/*
* The resolve descriptors function, must be able to handle NULL values for
Expand Down
9 changes: 9 additions & 0 deletions numpy/_core/src/multiarray/array_assign_array.c
Original file line number Diff line number Diff line change
Expand Up @@ -131,6 +131,10 @@ raw_array_assign_array(int ndim, npy_intp const *shape,
npy_clear_floatstatus_barrier((char*)&src_data);
}

if (same_value_cast) {
cast_info.context.flags |= NPY_SAME_VALUE_CASTING;
Member

If this wasn't effectively private API right now, I would be opposed to re-using the NPY_SAME_VALUE_CASTING flag in a context where all the other cast levels don't make any sense (but other flags probably do in the future).

(I would like it if we could note that somewhere. Probably here, since this is where the error for unsupported casts happens.)

}

/* Ensure number of elements exceeds threshold for threading */
if (!(method_flags & NPY_METH_REQUIRES_PYAPI)) {
npy_intp nitems = 1, i;
@@ -149,6 +153,9 @@ raw_array_assign_array(int ndim, npy_intp const *shape,
args, &shape_it[0], strides,
cast_info.auxdata);
if (result < 0) {
if (result == NPY_SAME_VALUE_OVERFLOW) {
goto same_value_overflow;
}
goto fail;
}
} NPY_RAW_ITER_TWO_NEXT(idim, ndim, coord, shape_it,
@@ -166,6 +173,8 @@ raw_array_assign_array(int ndim, npy_intp const *shape,
}

return 0;
same_value_overflow:
PyErr_SetString(PyExc_ValueError, "overflow in 'same_value' cast");
fail:
NPY_END_THREADS;
NPY_cast_info_xfree(&cast_info);
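
The control flow added in this hunk, where the `same_value_overflow` label sets the error and then falls through into `fail` so both paths share the cleanup, can be reduced to a small standalone sketch. `sketch_state` and `sketch_assign` are hypothetical; the string field stands in for `PyErr_SetString` and the counter for the `NPY_END_THREADS` / `NPY_cast_info_xfree` cleanup.

```c
#include <assert.h>
#include <stddef.h>

/* Sentinel from the diff. */
#define NPY_SAME_VALUE_OVERFLOW -31

/* Hypothetical stand-ins for the error indicator and the cleanup calls. */
typedef struct {
    const char *error;   /* set where the real code calls PyErr_SetString */
    int fail_cleanups;   /* counts runs of the shared failure cleanup */
} sketch_state;

static int
sketch_assign(int loop_result, sketch_state *st)
{
    assert(st != NULL);
    if (loop_result < 0) {
        if (loop_result == NPY_SAME_VALUE_OVERFLOW) {
            goto same_value_overflow;
        }
        goto fail;
    }
    return 0;

same_value_overflow:
    st->error = "overflow in 'same_value' cast";
    /* fall through: the overflow path reuses the generic cleanup */
fail:
    st->fail_cleanups++;
    return -1;
}
```

Placing the more specific label directly above `fail` keeps the cleanup in one place while still letting only the sentinel path set the message.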
1 change: 1 addition & 0 deletions numpy/_core/src/multiarray/array_method.c
@@ -797,6 +797,7 @@ boundarraymethod__simple_strided_call(
.caller = NULL,
.method = self->method,
.descriptors = descrs,
.flags = 0,
};
PyArrayMethod_StridedLoop *strided_loop = NULL;
NpyAuxData *loop_data = NULL;
2 changes: 0 additions & 2 deletions numpy/_core/src/multiarray/dtype_transfer.c
@@ -2910,8 +2910,6 @@ _clear_cast_info_after_get_loop_failure(NPY_cast_info *cast_info)
* TODO: Expand the view functionality for general offsets, not just 0:
* Partial casts could be skipped also for `view_offset != 0`.
*
* The `out_needs_api` flag must be initialized.
*
* NOTE: In theory casting errors here could be slightly misleading in case
* of a multi-step casting scenario. It should be possible to improve
* this in the future.
3 changes: 3 additions & 0 deletions numpy/_core/src/multiarray/dtype_transfer.h
@@ -44,6 +44,9 @@ NPY_cast_info_init(NPY_cast_info *cast_info)

// TODO: Delete this again probably maybe create a new minimal init macro
cast_info->context.caller = NULL;

cast_info->context.padding = NULL;
cast_info->context.flags = 0;
}


8 changes: 7 additions & 1 deletion numpy/_core/src/multiarray/lowlevel_strided_loops.c.src
@@ -899,6 +899,7 @@ static GCC_CAST_OPT_LEVEL int
#endif

/*printf("@prefix@_cast_@name1@_to_@name2@\n");*/
int same_value_casting = ((context->flags & NPY_SAME_VALUE_CASTING) == NPY_SAME_VALUE_CASTING);

while (N--) {
#if @aligned@
@@ -939,7 +940,12 @@ static GCC_CAST_OPT_LEVEL int
# elif !@aligned@
dst_value = _CONVERT_FN(src_value);
# else
*(_TYPE2 *)dst = _CONVERT_FN(*(_TYPE1 *)src);
*(_TYPE2 *)dst = _CONVERT_FN(*(_TYPE1 *)src);
if (same_value_casting) {
if (*(_TYPE2 *)dst != *(_TYPE1 *)src) {
return NPY_SAME_VALUE_OVERFLOW;
}
}
# endif
#endif
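
The check added to the aligned branch is a round-trip comparison: store the converted value, compare it with the source, and return the sentinel on the first mismatch. Below is a minimal sketch for one concrete type pair, assuming `int32_t` to `int8_t` and a hypothetical function name (the real loop is generated from a template for many type pairs). It does not attempt to cover cases where a plain `!=` comparison needs extra care, such as NaN values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel from the diff. */
#define NPY_SAME_VALUE_OVERFLOW -31

/*
 * Hypothetical simplified loop: casts n contiguous int32 values to int8.
 * When same_value is non-zero, stops at the first element whose converted
 * value differs from the source and returns the sentinel.
 */
static int
sketch_cast_int32_to_int8(const int32_t *src, int8_t *dst, size_t n,
                          int same_value)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = (int8_t)src[i];
        /* Both operands are promoted to int, so this compares values. */
        if (same_value && dst[i] != src[i]) {
            return NPY_SAME_VALUE_OVERFLOW;
        }
    }
    return 0;
}
```

With `same_value` set to 0 the same loop accepts out-of-range inputs without complaint, so the extra cost is confined to callers that asked for the check.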

7 changes: 4 additions & 3 deletions numpy/_core/src/umath/legacy_array_method.c
@@ -440,9 +440,10 @@ PyArray_NewLegacyWrappingArrayMethod(PyUFuncObject *ufunc,
}

PyArrayMethod_Context context = {
(PyObject *)ufunc,
bound_res->method,
descrs,
.caller = (PyObject *)ufunc,
.method = bound_res->method,
.descriptors = descrs,
.flags = 0,
};

int ret = get_initial_from_ufunc(&context, 0, context.method->legacy_initial);
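
The switch from positional to designated initializers in this hunk matters once the struct grows: positional values bind by member order, while designated ones bind by name, and any member left out is zero-initialized as the C standard requires. Below is a small sketch with a hypothetical struct that mirrors the member names in this PR (the member types are simplified to `void *`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified version of the context struct. */
typedef struct {
    void *caller;
    void *method;
    void *const *descriptors;
    void *padding;
    uint64_t flags;
} sketch_context;

static sketch_context
sketch_make_context(void *caller, void *method, void *const *descriptors)
{
    sketch_context ctx = {
        .caller = caller,
        .method = method,
        .descriptors = descriptors,
        .flags = 0,
    };
    /* padding was not named above, so it is implicitly a null pointer. */
    assert(ctx.padding == NULL);
    return ctx;
}
```

Writing `.flags = 0` is therefore a statement of intent rather than a requirement; omitting it would yield the same value.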
12 changes: 9 additions & 3 deletions numpy/_core/src/umath/ufunc_object.c
@@ -2088,6 +2088,7 @@ PyUFunc_GeneralizedFunctionInternal(PyUFuncObject *ufunc,
.caller = (PyObject *)ufunc,
.method = ufuncimpl,
.descriptors = operation_descrs,
.flags = 0,
};
PyArrayMethod_StridedLoop *strided_loop;
NPY_ARRAYMETHOD_FLAGS flags = 0;
@@ -2207,6 +2208,7 @@ PyUFunc_GenericFunctionInternal(PyUFuncObject *ufunc,
.caller = (PyObject *)ufunc,
.method = ufuncimpl,
.descriptors = operation_descrs,
.flags = 0,
};

/* Do the ufunc loop */
@@ -2557,6 +2559,7 @@ PyUFunc_Reduce(PyUFuncObject *ufunc,
.caller = (PyObject *)ufunc,
.method = ufuncimpl,
.descriptors = descrs,
.flags = 0,
};

PyArrayObject *result = PyUFunc_ReduceWrapper(&context,
@@ -2633,6 +2636,7 @@ PyUFunc_Accumulate(PyUFuncObject *ufunc, PyArrayObject *arr, PyArrayObject *out,
.caller = (PyObject *)ufunc,
.method = ufuncimpl,
.descriptors = descrs,
.flags = 0,
};

ndim = PyArray_NDIM(arr);
@@ -3065,6 +3069,7 @@ PyUFunc_Reduceat(PyUFuncObject *ufunc, PyArrayObject *arr, PyArrayObject *ind,
.caller = (PyObject *)ufunc,
.method = ufuncimpl,
.descriptors = descrs,
.flags = 0,
};

ndim = PyArray_NDIM(arr);
@@ -5903,9 +5908,10 @@ ufunc_at(PyUFuncObject *ufunc, PyObject *args)
}

PyArrayMethod_Context context = {
.caller = (PyObject *)ufunc,
.method = ufuncimpl,
.descriptors = operation_descrs,
.caller = (PyObject *)ufunc,
.method = ufuncimpl,
.descriptors = operation_descrs,
.flags = 0,
};

/* Use contiguous strides; if there is such a loop it may be faster */
3 changes: 1 addition & 2 deletions numpy/_core/tests/test_casting_unittests.py
@@ -848,8 +848,7 @@ def test_same_value(self, from_dtype, to_dtype):
arr2 = np.array([0] * 10, dtype=to_dtype)
assert_equal(arr1.astype(to_dtype, casting='same_value'), arr2, strict=True)
arr1[0] = top1
if 1:
# with pytest.raises(ValueError):
with pytest.raises(ValueError):
# Casting float to float with overflow should raise RuntimeWarning (fperror)
# Casting float to int with overflow sometimes raises RuntimeWarning (fperror)
# Casting with overflow and 'same_value', should raise ValueError