Segmentation fault in _ctypes when _type_ can't be converted to UTF-8 #137037

@3xt3r

Bug report

Bug description:

Summary

The initial goal was to confirm whether a segmentation fault could occur in the following code path from _ctypes.c:

if (dict != NULL && dict->proto != NULL) {
    if (PyUnicode_Check(dict->proto)
        && (strchr("sPzUZXO", PyUnicode_AsUTF8(dict->proto)[0]))) {
        return 1;
    }
}

The hypothesis: if dict->proto is a malformed PyUnicodeObject (e.g. one that bypassed PyUnicode_READY()), then PyUnicode_AsUTF8() may return NULL or touch invalid memory. Because the result is indexed with [0] before strchr() is called, a NULL return is dereferenced immediately; alternatively, the crash can happen earlier, inside the Unicode processing itself.

This behavior was confirmed by crafting an invalid Unicode object in C and assigning it to _type_ in a ctypes.POINTER subclass.


PoC

badproto.c

#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <stdio.h>

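static PyObject* crash_on_utf8(PyObject *self, PyObject *args) {
    /* Allocate a 5-character ASCII string (maxchar 127). */
    PyObject *u = PyUnicode_New(5, 127);

    if (!u) {
        PyErr_SetString(PyExc_RuntimeError, "PyUnicode_New failed");
        return NULL;
    }

    /* Deliberately corrupt the object: clear the "ready" flag so it looks
       like a string that never went through PyUnicode_READY(). The
       state.ready field is present in the 3.9 headers tested here. */
    ((PyASCIIObject *)u)->state.ready = 0;

    /* Intentionally unchecked: this is the call expected to crash. */
    const char *utf8 = PyUnicode_AsUTF8(u);
    char c = utf8[0];

    /* Only reached if no crash occurred; release the string. */
    Py_DECREF(u);
    return PyLong_FromLong((long)c);
}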

static PyMethodDef Methods[] = {
    {"crash_on_utf8", crash_on_utf8, METH_NOARGS, "Force PyUnicode_AsUTF8 to segfault."},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef mod = {
    PyModuleDef_HEAD_INIT,
    "badproto",
    NULL,
    -1,
    Methods
};

PyMODINIT_FUNC PyInit_badproto(void) {
    return PyModule_Create(&mod);
}

test.py

import badproto
badproto.crash_on_utf8()
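
Since the PoC is expected to kill the interpreter, it can be convenient to run it in a child process and look at the exit status instead. The following is only a sketch: it assumes the badproto extension has already been built and is importable, and run_isolated is a hypothetical helper name that is not part of the original PoC.

```python
import subprocess
import sys


def run_isolated(code: str) -> int:
    """Run `code` in a fresh interpreter and return its exit status.

    On POSIX, a negative value -N means the child was killed by signal N
    (a segmentation fault shows up as the negated SIGSEGV number).
    """
    completed = subprocess.run([sys.executable, "-c", code])
    return completed.returncode


if __name__ == "__main__":
    # Assumes badproto is importable; otherwise the child just fails
    # with an ImportError and a non-zero exit status.
    status = run_isolated("import badproto; badproto.crash_on_utf8()")
    print("child exit status:", status)
```

Running the crash in a subprocess keeps the parent alive, so the same script can report the status or be reused in a regression test.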

Analysis

  • The crash path begins in cast_check_pointertype() in _ctypes.c.
  • That code assumes _type_ is a valid Unicode object and that PyUnicode_AsUTF8() is safe to call.
  • However, if the object was created via C and is in an invalid state (e.g., ready = 0), this assumption may be broken.
  • Consequences:
    • PyUnicode_AsUTF8() may return NULL, which the [0] index then dereferences before strchr(...) even runs (undefined behavior).
    • Or it may crash deeper inside the Unicode code (e.g. in find_maxchar_surrogates()), as in the GDB trace below.
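
As a point of comparison for the NULL case above: a well-formed str that contains a lone surrogate also has no UTF-8 encoding, and the documented behavior of PyUnicode_AsUTF8() on an encoding failure is to return NULL with an exception set. The sketch below only illustrates that failure mode at the Python level. It does not go through _ctypes and does not show that such a string can reach cast_check_pointertype().

```python
# A lone surrogate is a valid str element but has no UTF-8 encoding.
lone_surrogate = "\udc80"

try:
    lone_surrogate.encode("utf-8")
except UnicodeEncodeError as exc:
    encodable = False
    print("cannot encode:", exc.reason)
else:
    encodable = True

# An ordinary ctypes type code such as "P" encodes without trouble.
assert "P".encode("utf-8") == b"P"
```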

GDB Trace

Program terminated with signal SIGSEGV, Segmentation fault.
#0  find_maxchar_surrogates (begin=0x0, end=0x70, ...) at Objects/unicodeobject.c:1790

Expected behavior

CPython should defensively reject _type_ values that are not fully initialized Unicode objects, or at least guard against NULL from PyUnicode_AsUTF8().

Adding an explicit PyUnicode_READY() call in this path might be appropriate.


Notes

This is not reachable from pure Python; it depends on constructing an invalid PyUnicodeObject in C. However, it reveals a potentially unsafe assumption in _ctypes that may be worth hardening.
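
To illustrate the "not reachable from pure Python" point: ctypes appears to validate _type_ when a simple type is defined, so a bad value supplied from Python is rejected with an ordinary exception at class creation. A hedged sketch follows. try_define is a hypothetical helper, and because the exact exception type may differ by value and CPython version, it catches Exception broadly.

```python
import ctypes


def try_define(type_code):
    """Try to define a simple ctypes type with the given _type_ value.

    Returns None on success, or the exception raised during class creation.
    """
    try:
        class Probe(ctypes._SimpleCData):
            _type_ = type_code
    except Exception as exc:  # exception type varies by value and version
        return exc
    return None


for code in ("d", "??", "\udc80"):
    result = try_define(code)
    outcome = "accepted" if result is None else type(result).__name__
    print(repr(code), "->", outcome)
```

Only the valid code "d" should be accepted; the two-character string and the lone surrogate should both be refused before any cast is attempted.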

core.dmp

CPython versions tested on:

3.9

Operating systems tested on:

Linux

Labels

  • extension-modules: C modules in the Modules dir
  • pending: The issue will be closed if no feedback is provided
  • topic-ctypes
  • type-crash: A hard crash of the interpreter, possibly with a core dump
