gh-64612: Update error handlers list under open() #137304

Open: wants to merge 1 commit into base: main

2 changes: 2 additions & 0 deletions Doc/library/codecs.rst
@@ -350,6 +350,8 @@ error handling schemes by accepting the *errors* string argument:
The following error handlers can be used with all Python
:ref:`standard-encodings` codecs:

.. The following tables are reproduced on the library/functions page under open.

.. tabularcolumns:: |l|L|

+-------------------------+-----------------------------------------------+
78 changes: 50 additions & 28 deletions Doc/library/functions.rst
@@ -1423,37 +1423,59 @@ are always available. They are listed here in alphabetical order.
*errors* is an optional string that specifies how encoding and decoding
errors are to be handled—this cannot be used in binary mode.
A variety of standard error handlers are available
(listed under :ref:`error-handlers`), though any
error handling name that has been registered with
(listed under :ref:`error-handlers`, and reproduced below for convenience),
though any error handling name that has been registered with
:func:`codecs.register_error` is also valid. The standard names
include:

* ``'strict'`` to raise a :exc:`ValueError` exception if there is
an encoding error. The default value of ``None`` has the same
effect.

* ``'ignore'`` ignores errors. Note that ignoring encoding errors
can lead to data loss.

* ``'replace'`` causes a replacement marker (such as ``'?'``) to be inserted
where there is malformed data.

* ``'surrogateescape'`` will represent any incorrect bytes as low
surrogate code units ranging from U+DC80 to U+DCFF.
These surrogate code units will then be turned back into
the same bytes when the ``surrogateescape`` error handler is used
when writing data. This is useful for processing files in an
unknown encoding.

* ``'xmlcharrefreplace'`` is only supported when writing to a file.
Characters not supported by the encoding are replaced with the
appropriate XML character reference :samp:`&#{nnn};`.

* ``'backslashreplace'`` replaces malformed data by Python's backslashed
escape sequences.

* ``'namereplace'`` (also only supported when writing)
replaces unsupported characters with ``\N{...}`` escape sequences.
.. list-table::
:header-rows: 1

* - Error handler
- Description
* - ``'strict'``
- Raise a :exc:`UnicodeError` (or a subclass) exception if there is
an error. The default value of ``None`` has the same effect.
* - ``'ignore'``
- Ignore the malformed data and continue without further notice.
Note that ignoring encoding errors can lead to data loss.
* - ``'replace'``
- Replace malformed data with a replacement marker.
On encoding, use ``?`` (ASCII character).
On decoding, use ``�`` (U+FFFD, the official REPLACEMENT CHARACTER)
Member review comment, suggested change:
On decoding, use ``�`` (U+FFFD, the official REPLACEMENT CHARACTER)
On decoding, use ``�`` (U+FFFD, the official REPLACEMENT CHARACTER).

* - ``'backslashreplace'``
- Replace malformed data with backslashed escape sequences.
On encoding, use hexadecimal form of Unicode code point with formats
:samp:`\\x{hh}` :samp:`\\u{xxxx}` :samp:`\\U{xxxxxxxx}`.
On decoding, use hexadecimal form of byte value with format :samp:`\\x{hh}`.
* - ``'surrogateescape'``
- Will represent any incorrect bytes as low
surrogate code units ranging from ``U+DC80`` to ``U+DCFF``.
These surrogate code units will then be turned back into
the same bytes when the ``'surrogateescape'`` error handler is used
when writing data. This is useful for processing files in an
unknown encoding.
* - ``'surrogatepass'``
- Only available for Unicode codecs.
Allow encoding and decoding surrogate code point
(``U+D800`` - ``U+DFFF``) as normal code point. Otherwise these codecs
treat the presence of surrogate code point in :class:`str` as an error.

The following error handlers are only applicable to encoding (within
:term:`text encodings <text encoding>`):

.. list-table::
:header-rows: 1

* - Error handler
- Description
* - ``'xmlcharrefreplace'``
- Only supported when writing to a file.
Characters not supported by the encoding are replaced with the
appropriate XML character reference :samp:`&#{nnn};`.
* - ``'namereplace'``
- Only supported when writing. Replaces unsupported characters with
``\N{...}`` escape sequences.

.. index::
single: universal newlines; open() built-in function
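
To make the handler tables above concrete, here is a minimal sketch of how each *errors* value behaves when passed to :func:`open`. It is an illustration only and not part of the proposed change. The sample bytes, the sample text, the temporary file names, and the helpers ``read_with`` and ``write_with`` are all hypothetical names chosen for this example.

```python
import os
import tempfile

# Hypothetical sample data: 0xFF is never valid in UTF-8, so decoding fails there.
RAW = b"caf\xff"
# Hypothetical sample text: two characters that ASCII cannot encode.
TEXT = "\u00e9\u20ac"  # LATIN SMALL LETTER E WITH ACUTE, EURO SIGN

decoded = {}
encoded = {}
strict_raised = False

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.bin")
    dst = os.path.join(tmp, "output.bin")
    with open(src, "wb") as f:
        f.write(RAW)

    def read_with(errors):
        """Decode the sample file as UTF-8 using the given error handler."""
        with open(src, encoding="utf-8", errors=errors) as f:
            return f.read()

    def write_with(errors, text=TEXT, encoding="ascii"):
        """Encode text to a file with the given handler; return the raw bytes."""
        with open(dst, "w", encoding=encoding, errors=errors) as f:
            f.write(text)
        with open(dst, "rb") as f:
            return f.read()

    # 'strict' raises a UnicodeError subclass (UnicodeDecodeError here).
    try:
        read_with("strict")
    except UnicodeError:
        strict_raised = True

    # Handlers that apply when reading (decoding).
    for name in ("ignore", "replace", "backslashreplace", "surrogateescape"):
        decoded[name] = read_with(name)

    # Handlers that apply when writing (encoding), including the two
    # encoding-only handlers 'xmlcharrefreplace' and 'namereplace'.
    for name in ("ignore", "replace", "backslashreplace",
                 "xmlcharrefreplace", "namereplace"):
        encoded[name] = write_with(name)

    # Writing the surrogate-escaped text back restores the original bytes.
    round_trip = write_with("surrogateescape", decoded["surrogateescape"], "utf-8")

for name, value in decoded.items():
    print(f"decode {name:18} {ascii(value)}")
for name, value in encoded.items():
    print(f"encode {name:18} {value!r}")
print("round trip matches:", round_trip == RAW)
```

The sketch uses ``ascii()`` when printing decoded strings so that the replacement character and lone surrogates are shown as escapes regardless of the console encoding.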