Segmentation fault in _ctypes when _type_ can't be converted to UTF-8

# Bug report

### Bug description:

## Summary

The initial goal was to **confirm whether a segmentation fault could occur** in the following code path from `_ctypes.c`:

```c
if (dict != NULL && dict->proto != NULL) {
    if (PyUnicode_Check(dict->proto)
        && (strchr("sPzUZXO", PyUnicode_AsUTF8(dict->proto)[0]))) {
        return 1;
    }
}
```

The hypothesis: if `dict->proto` is a malformed `PyUnicodeObject` (e.g. one that bypassed `PyUnicode_READY()`), then `PyUnicode_AsUTF8()` may either return `NULL` or crash while accessing invalid memory. A `NULL` result would be dereferenced by the `[0]` subscript before `strchr()` even runs; corrupted internal state can instead crash inside the Unicode conversion itself.

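This behavior was confirmed by crafting an invalid Unicode object in C and assigning it to `_type_` in a `ctypes.POINTER` subclass. The PoC below reduces this to its core: it builds such an object and calls `PyUnicode_AsUTF8()` on it directly, without involving `ctypes`.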
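
For reference, the membership test in the snippet above can be modelled as plain C without `Python.h`. In this sketch the `char` argument stands in for `PyUnicode_AsUTF8(dict->proto)[0]`, and the helper name `is_simple_pointer_code` is hypothetical, not part of CPython. One detail worth noting: `strchr()` treats the terminating `'\0'` as part of the string it searches, so a `'\0'` first character also counts as a match.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the check in cast_check_pointertype().
 * `c` stands in for PyUnicode_AsUTF8(dict->proto)[0]. */
bool is_simple_pointer_code(char c)
{
    /* strchr() returns a non-NULL pointer when `c` occurs in the set.
     * The terminating '\0' is searched too, so c == '\0' also matches. */
    return strchr("sPzUZXO", c) != NULL;
}
```

With this model, `'P'` and `'z'` are accepted, `'i'` is rejected, and `'\0'` is (perhaps surprisingly) accepted.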

---

## PoC

### `badproto.c`

```c
#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <stdio.h>

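static PyObject* crash_on_utf8(PyObject *self, PyObject *args) {
    /* A 5-character, ASCII-only (maxchar 127) compact string. */
    PyObject *u = PyUnicode_New(5, 127);

    if (!u) {
        PyErr_SetString(PyExc_RuntimeError, "PyUnicode_New failed");
        return NULL;
    }

    /* Corrupt the object by marking it as not "ready". The state.ready
     * bit exists in the 3.9 headers used here; it was removed in 3.12. */
    ((PyASCIIObject *)u)->state.ready = 0;

    /* PyUnicode_AsUTF8() now tries to "ready" the string from its NULL
     * wstr buffer and crashes (see the GDB trace below). */
    const char *utf8 = PyUnicode_AsUTF8(u);

    /* Mirrors the unchecked PyUnicode_AsUTF8(...)[0] in _ctypes.c:
     * deliberately no NULL check here. */
    char c = utf8[0];

    return PyLong_FromLong((long)c);
}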

static PyMethodDef Methods[] = {
    {"crash_on_utf8", crash_on_utf8, METH_NOARGS, "Force PyUnicode_AsUTF8 to segfault."},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef mod = {
    PyModuleDef_HEAD_INIT,
    "badproto",
    NULL,
    -1,
    Methods
};

PyMODINIT_FUNC PyInit_badproto(void) {
    return PyModule_Create(&mod);
}
```

### `test.py`

```python
import badproto
badproto.crash_on_utf8()
```

---

## Analysis

- The crash path begins in `cast_check_pointertype()` in `_ctypes.c`.
- That code assumes `_type_` is a valid Unicode object and that `PyUnicode_AsUTF8()` is safe to call.
- However, if the object was created via C and is in an invalid state (e.g., `ready = 0`), this assumption may be broken.
- Consequences:
  - `PyUnicode_AsUTF8()` may return `NULL`, which the `[0]` subscript dereferences before `strchr()` is even called (undefined behavior).
  - The conversion itself may crash deeper inside the Unicode implementation, as in `find_maxchar_surrogates()` in the trace below.

---

## GDB Trace

```
Program terminated with signal SIGSEGV, Segmentation fault.
#0  find_maxchar_surrogates (begin=0x0, end=0x70, ...) at Objects/unicodeobject.c:1790
```

---

## Expected behavior

CPython should defensively reject `_type_` values that are not fully initialized Unicode objects, or at least guard against `NULL` from `PyUnicode_AsUTF8()`.

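Adding an explicit `PyUnicode_READY()` call in this path might be appropriate. Note, however, that on 3.9 `PyUnicode_AsUTF8()` already calls `PyUnicode_READY()` internally, and the frame in the trace above (`find_maxchar_surrogates()`) belongs to that readiness step, so an extra explicit call would likely fail the same way. A `NULL` check still covers the case where `PyUnicode_AsUTF8()` fails cleanly and returns `NULL` with an exception set (for example, for a string containing a lone surrogate).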
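
A minimal sketch of such a guard, written as plain C so it compiles without `Python.h`. The `const char *` argument stands in for the return value of `PyUnicode_AsUTF8(dict->proto)`; the helper name `proto_is_simple_pointer` and its -1/0/1 return convention are assumptions for illustration, not CPython's code. It only protects against a clean `NULL` return; it cannot prevent a crash that happens inside `PyUnicode_AsUTF8()` itself, like the one in the trace.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hardened variant of the check in cast_check_pointertype().
 * `utf8` stands in for the return value of PyUnicode_AsUTF8(dict->proto).
 * Returns 1 for a simple pointer type code, 0 otherwise, and -1 if the
 * conversion failed (a real caller would then propagate the exception). */
int proto_is_simple_pointer(const char *utf8)
{
    if (utf8 == NULL) {
        return -1;              /* never subscript a NULL result */
    }
    if (utf8[0] == '\0') {
        return 0;               /* strchr() would match the terminator */
    }
    return strchr("sPzUZXO", utf8[0]) != NULL;
}
```

The explicit empty-string check is a design choice: without it, `strchr()` would report the terminating `'\0'` as a member of the set.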

---

## Notes

This is not reachable from pure Python — it depends on constructing an invalid `PyUnicodeObject` in C. However, it reveals a potentially unsafe assumption in `_ctypes` that may be worth hardening.

[core.dmp](https://github.com/user-attachments/files/21387007/core.dmp)


### CPython versions tested on:

3.9

### Operating systems tested on:

Linux

Segmentation fault in _ctypes when _type_ can't be converted to UTF-8 #137037
