Description
Bug report
Bug description:
Summary
The initial goal was to confirm whether a segmentation fault could occur in the following code path from _ctypes.c:
    if (dict != NULL && dict->proto != NULL) {
        if (PyUnicode_Check(dict->proto)
            && (strchr("sPzUZXO", PyUnicode_AsUTF8(dict->proto)[0]))) {
            return 1;
        }
    }
The hypothesis: if dict->proto is a malformed PyUnicodeObject (e.g. one that bypassed PyUnicode_READY()), then PyUnicode_AsUTF8() may return NULL or point to invalid memory, causing a crash during the strchr() call or later Unicode processing.
This behavior was confirmed by crafting an invalid Unicode object in C and assigning it to _type_ in a ctypes.POINTER subclass.
PoC
badproto.c
#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <stdio.h>

static PyObject* crash_on_utf8(PyObject *self, PyObject *args) {
    /* Allocate a compact ASCII string of length 5 (maxchar 127). */
    PyObject *u = PyUnicode_New(5, 127);
    if (!u) {
        PyErr_SetString(PyExc_RuntimeError, "PyUnicode_New failed");
        return NULL;
    }
    /* Deliberately corrupt the object by clearing its ready flag. */
    ((PyASCIIObject *)u)->state.ready = 0;
    /* Intentionally unchecked: the PoC exists to trigger the crash. */
    const char *utf8 = PyUnicode_AsUTF8(u);
    char c = utf8[0];
    return PyLong_FromLong((long)c);
}

static PyMethodDef Methods[] = {
    {"crash_on_utf8", crash_on_utf8, METH_NOARGS, "Force PyUnicode_AsUTF8 to segfault."},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef mod = {
    PyModuleDef_HEAD_INIT,
    "badproto",
    NULL,
    -1,
    Methods
};

PyMODINIT_FUNC PyInit_badproto(void) {
    return PyModule_Create(&mod);
}
test.py
import badproto
badproto.crash_on_utf8()
Analysis
- The crash path begins in cast_check_pointertype() in _ctypes.c.
- That code assumes _type_ is a valid Unicode object and that PyUnicode_AsUTF8() is safe to call.
- However, if the object was created via C and is in an invalid state (e.g., ready = 0), this assumption may be broken.
- Consequences:
  - PyUnicode_AsUTF8() may return NULL, in which case reading its first byte for strchr(...) is undefined behavior.
  - Or it may cause a deep crash elsewhere (e.g. find_maxchar_surrogates()).
GDB Trace
Program terminated with signal SIGSEGV, Segmentation fault.
#0 find_maxchar_surrogates (begin=0x0, end=0x70, ...) at Objects/unicodeobject.c:1790
Expected behavior
CPython should defensively reject _type_ values that are not fully initialized Unicode objects, or at least guard against NULL from PyUnicode_AsUTF8().
Adding an explicit PyUnicode_READY() call in this path might be appropriate.
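A minimal sketch of the NULL guard suggested above, written as standalone C so it compiles without the Python headers. The helper name is_pointer_type_code is hypothetical and is not part of _ctypes.c; in the real code path its argument would be the result of PyUnicode_AsUTF8(dict->proto). This is only an illustration of the pattern, not a proposed patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not in CPython): returns 1 if the first byte of
 * `code` is one of the type codes checked in the snippet from _ctypes.c,
 * and 0 otherwise. */
static int is_pointer_type_code(const char *code)
{
    /* Guard against a NULL result before reading code[0]. */
    if (code == NULL) {
        return 0;
    }
    /* strchr() also matches the terminating '\0' of the search string,
     * so an empty `code` has to be rejected explicitly. */
    if (code[0] == '\0') {
        return 0;
    }
    return strchr("sPzUZXO", code[0]) != NULL;
}
```

With a helper like this, the caller would store the PyUnicode_AsUTF8() result in a local variable, pass it to the helper, and handle the error case separately instead of indexing the return value directly.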
Notes
This is not reachable from pure Python; it depends on constructing an invalid PyUnicodeObject in C. However, it reveals a potentially unsafe assumption in _ctypes that may be worth hardening.
CPython versions tested on:
3.9
Operating systems tested on:
Linux