Open
Labels: TF 2.19, awaiting PR merge, comp:ops (OPs related issues), type:bug (Bug)
Description
Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
Yes
Source
source
TensorFlow version
2.20.0-dev20250715
Custom code
Yes
OS platform and distribution
Linux Ubuntu 20.04
Mobile device
No response
Python version
3.12
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
Running tf.experimental.numpy.cumsum with large values causes an overflow if the dtype is set to int16, which is expected. However, the behavior differs between CPU and GPU, with the CPU version matching the numpy version.
Colab with 2.19.0: colab
Standalone code to reproduce the issue
import tensorflow as tf
import numpy as np

rng = np.random.default_rng(215)
a = tf.constant(rng.uniform(16211., 1312848., size=(2, 2, 2, 1, 1, 3)), dtype=tf.float32)
axis = -4
dtype = tf.int16

with tf.device("/CPU:0"):
    output_cpu = tf.experimental.numpy.cumsum(a, axis=axis, dtype=dtype)
with tf.device("/GPU:0"):
    output_gpu = tf.experimental.numpy.cumsum(a, axis=axis, dtype=dtype)
output_np = np.cumsum(a.numpy(), axis=axis, dtype=np.int16)

print(tf.__version__)               # 2.20.0-dev20250715
print(output_cpu[0, 0, 0, 0, 0, 0])  # tf.Tensor(17745, shape=(), dtype=int16)
print(output_gpu[0, 0, 0, 0, 0, 0])  # tf.Tensor(32767, shape=(), dtype=int16)
print(output_np[0, 0, 0, 0, 0, 0])   # 17745
Relevant log output
2.20.0-dev20250715
tf.Tensor(17745, shape=(), dtype=int16)
tf.Tensor(32767, shape=(), dtype=int16)
17745
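For reference, a minimal sketch (using a hypothetical value, not one taken from the repro) of the two integer-conversion behaviors that appear to be in play: numpy, and apparently the CPU kernel, wrap out-of-range sums modulo 2**16, while the GPU result lands exactly on np.iinfo(np.int16).max, which is consistent with a saturating conversion.

import numpy as np

# Hypothetical cumulative sum that exceeds the int16 range (assumption: any
# value above 32767 shows the same contrast).
true_sum = 83_000

# Wraparound (what numpy cumsum and, apparently, the CPU kernel produce):
# the value is reduced modulo 2**16 into the int16 range.
wrapped = np.int64(true_sum).astype(np.int16)
print(wrapped)  # 17464  (83000 - 65536)

# Saturation (what the GPU output above is consistent with): the value is
# clamped to the int16 maximum.
info = np.iinfo(np.int16)
saturated = np.int16(np.clip(true_sum, info.min, info.max))
print(saturated)  # 32767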