Commit 3ff1588

Perform conversion from Python unicode to string/bytes object via UTF-8.
We used to convert the unicode object directly to a string in the server encoding by calling Python's PyUnicode_AsEncodedString function. In other words, we used Python's routines to do the encoding. However, that has a few problems. First of all, it required keeping a mapping table of Python encoding names and PostgreSQL encodings. But the real killer was that Python doesn't support the EUC_TW and MULE_INTERNAL encodings at all.

Instead, convert the Python unicode object to UTF-8, and use PostgreSQL's encoding conversion functions to convert from UTF-8 to the server encoding. We were already doing the same in the other direction in PLyUnicode_FromString, so this is more consistent, too.

Note: This makes SQL_ASCII behave more leniently. We used to map SQL_ASCII to Python's 'ascii', which in Python means strict 7-bit ASCII only, so you got an error if the Python string contained anything but pure ASCII. You no longer get an error; you get the UTF-8 representation of the string instead.

Backpatch to 9.0, where these conversions were introduced.

Jan Urbański
1 parent 149ac7d commit 3ff1588
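The strategy in the message above (encode to UTF-8 first, convert only when the server encoding differs, and free the result only when the converter allocated a new buffer) can be sketched in plain C. This is a standalone illustration, not PL/Python code: `enum server_enc`, `utf8_to_server()` and `build_result()` are hypothetical stand-ins for `GetDatabaseEncoding()`, `pg_do_encoding_conversion()` and the `PyBytes_FromStringAndSize()` step, and only a UTF-8 to LATIN1 conversion is implemented. The sketch assumes the converter hands back its input pointer when no work is needed and passes bytes through untouched for SQL_ASCII, which is what produces the more lenient SQL_ASCII behaviour; unlike the real function, it reports an untranslatable character by returning NULL rather than raising an error.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a few PostgreSQL server encodings. */
enum server_enc { ENC_UTF8, ENC_SQL_ASCII, ENC_LATIN1 };

/*
 * Hypothetical stand-in for pg_do_encoding_conversion(): converts a
 * NUL-terminated UTF-8 string to `enc`.  Returns src itself when no
 * conversion is needed, a newly malloc'd string otherwise, or NULL if a
 * character cannot be represented in the target encoding.
 */
static char *
utf8_to_server(const char *src, enum server_enc enc)
{
	const unsigned char *p = (const unsigned char *) src;
	char	   *out;
	char	   *q;

	/* UTF-8 needs no work; SQL_ASCII accepts any bytes as they are. */
	if (enc == ENC_UTF8 || enc == ENC_SQL_ASCII)
		return (char *) src;

	assert(enc == ENC_LATIN1);
	out = malloc(strlen(src) + 1);	/* LATIN1 is never longer than UTF-8 */
	if (out == NULL)
		return NULL;
	q = out;
	while (*p)
	{
		if (*p < 0x80)
			*q++ = (char) *p++;	/* 7-bit ASCII is identical in both */
		else if ((p[0] == 0xC2 || p[0] == 0xC3) && (p[1] & 0xC0) == 0x80)
		{
			/* two-byte UTF-8 sequence for U+0080..U+00FF */
			*q++ = (char) (((p[0] & 0x1F) << 6) | (p[1] & 0x3F));
			p += 2;
		}
		else
		{
			free(out);			/* character not representable in LATIN1 */
			return NULL;
		}
	}
	*q = '\0';
	return out;
}

/*
 * Caller pattern from the diff: convert, copy the bytes into the result
 * (a malloc'd copy stands in for PyBytes_FromStringAndSize), then free the
 * converted string only if the converter allocated one.
 */
static char *
build_result(const char *utf8, enum server_enc enc)
{
	char	   *encoded = utf8_to_server(utf8, enc);
	char	   *rv;
	size_t		len;

	if (encoded == NULL)
		return NULL;
	len = strlen(encoded);
	rv = malloc(len + 1);
	if (rv != NULL)
		memcpy(rv, encoded, len + 1);
	if (encoded != utf8)
		free(encoded);
	return rv;
}
```

Returning the input pointer on the no-op path lets the caller skip a copy when the server already uses UTF-8 (or SQL_ASCII), at the price of the pointer comparison before freeing, which is the same `utf8string != encoded` check the new code performs before `pfree()`.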

2 files changed, 44 additions, 108 deletions


src/pl/plpython/expected/plpython_unicode_3.out

Lines changed: 0 additions & 54 deletions
This file was deleted.

src/pl/plpython/plpy_util.c

Lines changed: 44 additions & 54 deletions
@@ -61,66 +61,56 @@ PLy_free(void *ptr)
 PyObject *
 PLyUnicode_Bytes(PyObject *unicode)
 {
-	PyObject *rv;
-	const char *serverenc;
+	PyObject *bytes, *rv;
+	char *utf8string, *encoded;
+
+	/* First encode the Python unicode object with UTF-8. */
+	bytes = PyUnicode_AsUTF8String(unicode);
+	if (bytes == NULL)
+		PLy_elog(ERROR, "could not convert Python Unicode object to bytes");
+
+	utf8string = PyBytes_AsString(bytes);
+	if (utf8string == NULL) {
+		Py_DECREF(bytes);
+		PLy_elog(ERROR, "could not extract bytes from encoded string");
+	}
 
 	/*
-	 * Map PostgreSQL encoding to a Python encoding name.
+	 * Then convert to server encoding if necessary.
+	 *
+	 * PyUnicode_AsEncodedString could be used to encode the object directly
+	 * in the server encoding, but Python doesn't support all the encodings
+	 * that PostgreSQL does (EUC_TW and MULE_INTERNAL). UTF-8 is used as an
+	 * intermediary in PLyUnicode_FromString as well.
 	 */
-	switch (GetDatabaseEncoding())
+	if (GetDatabaseEncoding() != PG_UTF8)
 	{
-		case PG_SQL_ASCII:
-			/*
-			 * Mapping SQL_ASCII to Python's 'ascii' is a bit bogus. Python's
-			 * 'ascii' means true 7-bit only ASCII, while PostgreSQL's
-			 * SQL_ASCII means that anything is allowed, and the system doesn't
-			 * try to interpret the bytes in any way. But not sure what else
-			 * to do, and we haven't heard any complaints...
-			 */
-			serverenc = "ascii";
-			break;
-		case PG_WIN1250:
-			serverenc = "cp1250";
-			break;
-		case PG_WIN1251:
-			serverenc = "cp1251";
-			break;
-		case PG_WIN1252:
-			serverenc = "cp1252";
-			break;
-		case PG_WIN1253:
-			serverenc = "cp1253";
-			break;
-		case PG_WIN1254:
-			serverenc = "cp1254";
-			break;
-		case PG_WIN1255:
-			serverenc = "cp1255";
-			break;
-		case PG_WIN1256:
-			serverenc = "cp1256";
-			break;
-		case PG_WIN1257:
-			serverenc = "cp1257";
-			break;
-		case PG_WIN1258:
-			serverenc = "cp1258";
-			break;
-		case PG_WIN866:
-			serverenc = "cp866";
-			break;
-		case PG_WIN874:
-			serverenc = "cp874";
-			break;
-		default:
-			/* Other encodings have the same name in Python. */
-			serverenc = GetDatabaseEncodingName();
-			break;
+		PG_TRY();
+		{
+			encoded = (char *) pg_do_encoding_conversion(
+				(unsigned char *) utf8string,
+				strlen(utf8string),
+				PG_UTF8,
+				GetDatabaseEncoding());
+		}
+		PG_CATCH();
+		{
+			Py_DECREF(bytes);
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
 	}
+	else
+		encoded = utf8string;
+
+	/* finally, build a bytes object in the server encoding */
+	rv = PyBytes_FromStringAndSize(encoded, strlen(encoded));
+
+	/* if pg_do_encoding_conversion allocated memory, free it now */
+	if (utf8string != encoded)
+		pfree(encoded);
 
-	rv = PyUnicode_AsEncodedString(unicode, serverenc, "strict");
-	if (rv == NULL)
-		PLy_elog(ERROR, "could not convert Python Unicode object to PostgreSQL server encoding");
+	Py_DECREF(bytes);
 	return rv;
 }

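The PG_TRY/PG_CATCH block in the new code exists so that the reference held in `bytes` is dropped even when pg_do_encoding_conversion() raises an error: PostgreSQL's error handling unwinds with siglongjmp(), which would otherwise jump straight past the final Py_DECREF. The standalone sketch below imitates that control flow with plain setjmp()/longjmp(). It is an illustration only: `struct obj`, `exception_stack`, `convert_or_throw()`, `convert_with_cleanup()` and `run()` are hypothetical stand-ins for a PyObject, PG_exception_stack, the conversion call, PLyUnicode_Bytes and the outer error recovery, not PostgreSQL's actual macros.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a PyObject. */
struct obj { int refcnt; };

static void obj_decref(struct obj *o) { o->refcnt--; }

/* Innermost active error handler, playing the role of PG_exception_stack. */
static jmp_buf *exception_stack = NULL;

/* Stand-in for a conversion that reports errors by unwinding. */
static int
convert_or_throw(int fail)
{
	if (fail)
	{
		assert(exception_stack != NULL);
		longjmp(*exception_stack, 1);	/* like ereport(ERROR) */
	}
	return 42;
}

/* Mirrors the TRY/CATCH shape of the diff: release `bytes` on both paths. */
static int
convert_with_cleanup(struct obj *bytes, int fail)
{
	jmp_buf    *saved = exception_stack;
	jmp_buf		local;
	volatile int result = 0;

	if (setjmp(local) == 0)
	{
		exception_stack = &local;		/* PG_TRY */
		result = convert_or_throw(fail);
	}
	else
	{
		exception_stack = saved;		/* PG_CATCH */
		obj_decref(bytes);				/* Py_DECREF(bytes) */
		longjmp(*exception_stack, 1);	/* PG_RE_THROW */
	}
	exception_stack = saved;			/* PG_END_TRY */
	obj_decref(bytes);					/* normal-path Py_DECREF(bytes) */
	return result;
}

/* Outer handler; returns 1 if an error propagated out, 0 otherwise. */
static int
run(struct obj *bytes, int fail, int *out)
{
	jmp_buf		top;

	exception_stack = &top;
	if (setjmp(top) != 0)
	{
		exception_stack = NULL;
		return 1;
	}
	*out = convert_with_cleanup(bytes, fail);
	exception_stack = NULL;
	return 0;
}
```

On both the success path and the error path the object's count ends at zero, which is the invariant the diff maintains by pairing Py_DECREF with PG_RE_THROW in the catch block and calling it again after the conversion on the normal path.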