| 1 | +.. _text-bytes: |
| 2 | + |
| 3 | +Bytes/text management |
| 4 | +===================== |
| 5 | + |
| 6 | +Python 3 introduces a hard distinction between *text* (``str``) – sequences of |
| 7 | +characters (formally, *Unicode codepoints*) – and ``bytes`` – sequences of |
| 8 | +8-bit values used to encode *any* kind of data for storage or transmission. |
| 9 | + |
| 10 | +Python 2 has the same distinction between ``str`` (bytes) and |
| 11 | +``unicode`` (text). |
| 12 | +However, values can be implicitly converted between these types as needed, |
| 13 | +e.g. when comparing or writing to disk or the network. |
| 14 | +The implicit encoding and decoding can be a source of subtle bugs when not |
| 15 | +designed and tested adequately. |
| 16 | + |
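As a small illustration of the distinction described above (a sketch for
Python 3 only, not part of python-ldap), text has to be encoded explicitly to
obtain bytes and decoded explicitly to get text back. The sample name and the
choice of UTF-8 are assumptions made for the example:

.. code-block:: python

    # Illustration for Python 3; the sample value is arbitrary.
    text = 'Rapha\u00ebl'                # str: a sequence of Unicode codepoints
    data = text.encode('utf-8')          # bytes: the UTF-8 encoding of that text

    print(len(text), len(data))          # one codepoint can take several bytes
    print(data.decode('utf-8') == text)  # decoding restores the original text

    # Python 3 never converts implicitly: text and bytes compare unequal,
    # and concatenating them raises TypeError.
    print(text == data)
    try:
        text + data
    except TypeError as exc:
        print('TypeError:', exc)

On Python 2, the equivalent comparison between ``unicode`` and ``str`` would
attempt an implicit conversion, which is the kind of behavior that hides
encoding bugs.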
| 17 | +In python-ldap 2.x (for Python 2), bytes were used for all fields, |
| 18 | +including those guaranteed to be text. |
| 19 | + |
| 20 | +From version 3.0, python-ldap uses text where appropriate. |
| 21 | +On Python 2, the :ref:`bytes mode <bytes_mode>` setting influences how text is |
| 22 | +handled. |
| 23 | + |
| 24 | + |
| 25 | +What's text, and what's bytes |
| 26 | +----------------------------- |
| 27 | + |
| 28 | +The LDAP protocol states that some fields (distinguished names, relative |
| 29 | +distinguished names, attribute names, queries) be encoded in UTF-8. |
| 30 | +In python-ldap, these are represented as text (``str`` on Python 3, |
| 31 | +``unicode`` on Python 2). |
| 32 | + |
| 33 | +Attribute *values*, on the other hand, **MAY** |
| 34 | +contain any type of data, including text. |
| 35 | +To know what type of data is represented, python-ldap would need access to the |
| 36 | +schema, which is not always available (nor always correct). |
| 37 | +Thus, attribute values are *always* treated as ``bytes``. |
| 38 | +Encoding/decoding to other formats – text, images, etc. – is left to the caller. |
| 39 | + |
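Because decoding is left to the caller, application code typically decodes
only the attributes it knows to hold text. The helper below is a minimal
sketch of that idea: ``decode_text_attrs`` is a hypothetical name, not a
python-ldap function, and the sample entry (including the ``jpegPhoto``
bytes) is made up for illustration. It assumes the text attributes are
encoded as UTF-8.

.. code-block:: python

    def decode_text_attrs(entry, text_attrs, encoding='utf-8'):
        """Return a copy of an attribute dict with selected values decoded.

        ``entry`` maps attribute names (text) to lists of ``bytes`` values.
        Only attributes listed in ``text_attrs`` are decoded; everything else
        is passed through unchanged as bytes.
        """
        decoded = {}
        for name, values in entry.items():
            # Names are compared case-sensitively here for simplicity, although
            # LDAP attribute names are case-insensitive.
            if name in text_attrs:
                decoded[name] = [value.decode(encoding) for value in values]
            else:
                decoded[name] = list(values)
        return decoded

    # Hypothetical entry: two text attributes and one binary attribute.
    entry = {
        'cn': [b'Rapha\xc3\xabl'],
        'sn': [b'Barrois'],
        'jpegPhoto': [b'\xff\xd8\xff\xe0'],
    }
    decoded = decode_text_attrs(entry, text_attrs={'cn', 'sn'})
    print(decoded['cn'], decoded['jpegPhoto'])

Keeping the list of text attributes explicit means binary values such as
images are never passed through a text codec by accident.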
| 40 | + |
| 41 | +.. _bytes_mode: |
| 42 | + |
| 43 | +The bytes mode |
| 44 | +-------------- |
| 45 | + |
| 46 | +The behavior of python-ldap 3.0 on Python 2 is influenced by a ``bytes_mode`` |
| 47 | +argument to :func:`ldap.initialize`. |
| 48 | +The argument can take these values: |
| 49 | + |
| 50 | +``bytes_mode=True``: backwards-compatible |
| 51 | + |
| 52 | + Text values returned from python-ldap are always bytes (``str``). |
| 53 | + Text values supplied to python-ldap may be either bytes or Unicode. |
| 54 | + The encoding for bytes is always assumed to be UTF-8. |
| 55 | + |
| 56 | + Not available in Python 3. |
| 57 | + |
| 58 | +``bytes_mode=False``: strictly future-compatible |
| 59 | + |
| 60 | + Text values must be represented as ``unicode``. |
| 61 | + An error is raised if python-ldap receives a text value as bytes (``str``). |
| 62 | + |
| 63 | +Unspecified: relaxed mode with warnings |
| 64 | + |
| 65 | + Causes a warning on Python 2. |
| 66 | + |
| 67 | + Text values returned from python-ldap are always ``unicode``. |
| 68 | + Text values supplied to python-ldap should be ``unicode``; |
| 69 | + warnings are emitted when they are not. |
| 70 | + |
| 71 | +Backwards-compatible behavior is not scheduled for removal until Python 2 |
| 72 | +itself reaches end of life. |
| 73 | + |
| 74 | + |
| 75 | +Porting recommendations |
| 76 | +----------------------- |
| 77 | + |
| 78 | +Since the end of life of Python 2 is coming in a few years, |
| 79 | +projects are strongly urged to make their code compatible with Python 3. |
| 80 | +General instructions for this are provided in the `Python documentation`_ and in |
| 81 | +the `Conservative porting guide`_. |
| 82 | + |
| 83 | +.. _Python documentation: https://docs.python.org/3/howto/pyporting.html |
| 84 | +.. _Conservative porting guide: http://portingguide.readthedocs.io/en/latest/ |
| 85 | + |
| 86 | + |
| 87 | +When porting from python-ldap 2.x, users are advised to update their code |
| 88 | +to set ``bytes_mode=False``, and fix any resulting failures. |
| 89 | + |
| 90 | +The typical usage is as follows. |
| 91 | +Note that only the result's *values* are of the ``bytes`` type: |
| 92 | + |
| 93 | +.. code-block:: pycon |
| 94 | + |
| 95 | + >>> import ldap |
| 96 | + >>> con = ldap.initialize('ldap://localhost:389', bytes_mode=False) |
| 97 | + >>> con.simple_bind_s(u'login', u'secret_password') |
| 98 | + >>> results = con.search_s(u'ou=people,dc=example,dc=org', ldap.SCOPE_SUBTREE, u"(cn=Raphaël)") |
| 99 | + >>> results |
| 100 | + [ |
| 101 | + ("cn=Raphaël,ou=people,dc=example,dc=org", { |
| 102 | + 'cn': [b'Rapha\xc3\xabl'], |
| 103 | + 'sn': [b'Barrois'], |
| 104 | + }), |
| 105 | + ] |
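To make the shape of such a result concrete, the following sketch walks over
a hand-written copy of the result above (no LDAP server is contacted) and
decodes the one attribute it relies on as text. The variable names are
illustrative only, and UTF-8 is assumed for the ``cn`` values:

.. code-block:: python

    # Hand-written stand-in for the value returned by search_s() above.
    results = [
        ('cn=Rapha\u00ebl,ou=people,dc=example,dc=org', {
            'cn': [b'Rapha\xc3\xabl'],
            'sn': [b'Barrois'],
        }),
    ]

    common_names = {}
    for dn, attrs in results:
        # The DN arrives as text; every attribute value arrives as bytes.
        assert not isinstance(dn, bytes)
        assert all(isinstance(v, bytes)
                   for values in attrs.values() for v in values)
        # 'cn' is treated as text in this example, so it is decoded as UTF-8.
        common_names[dn] = [value.decode('utf-8') for value in attrs['cn']]

    print(common_names)

The DN can be used directly as a dictionary key or in string formatting,
while each value needs an explicit ``decode()`` before it is handled as text.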