From 4e6c76a8deb5aa651327f7cb76ee96dd3b73586b Mon Sep 17 00:00:00 2001 From: Harry Date: Mon, 21 Jun 2021 22:16:23 +0100 Subject: [PATCH 001/722] Fix capitalization of PCbuild (GH-719) --- setup.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/setup.rst b/setup.rst index 1a6dde4df..535cdcd2f 100644 --- a/setup.rst +++ b/setup.rst @@ -258,9 +258,9 @@ are downloaded: .. code-block:: dosbatch - PCBuild\build.bat + PCbuild\build.bat -After this build succeeds, you can open the ``PCBuild\pcbuild.sln`` solution in +After this build succeeds, you can open the ``PCbuild\pcbuild.sln`` solution in Visual Studio to continue development. See the `readme`_ for more details on what other software is necessary and how From dd6dba2ca552f2324345cac3350b9ccbf915d524 Mon Sep 17 00:00:00 2001 From: Marco Ippolito Date: Wed, 23 Jun 2021 16:08:25 -0300 Subject: [PATCH 002/722] Add advice on building all modules on Debian-like (GH-673) --- setup.rst | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/setup.rst b/setup.rst index 535cdcd2f..24b7c97d2 100644 --- a/setup.rst +++ b/setup.rst @@ -325,10 +325,14 @@ Then you should update the packages index:: Now you can install the build dependencies via ``apt``:: - $ sudo apt-get build-dep python3.6 + $ sudo apt-get build-dep python3 -If that package is not available for your system, try reducing the minor -version until you find a package that is available. +If you want to build all optional modules, install the following packages and +their dependencies:: + + $ sudo apt-get install build-essential gdb lcov libbz2-dev libffi-dev \ + libgdbm-dev liblzma-dev libncurses5-dev libreadline6-dev \ + libsqlite3-dev libssl-dev lzma lzma-dev tk-dev uuid-dev zlib1g-dev .. _MacOS: From 91893b0246ded3e599f4f876d149d44fe3bb58c6 Mon Sep 17 00:00:00 2001 From: Will Schlitzer Date: Thu, 24 Jun 2021 10:46:42 +0100 Subject: [PATCH 003/722] Fix minor typo in coredev.rst (GH-720) This pull request changed "requied" to "required" in the "Gaining Commit Privileges" section in coredev.rst. --- coredev.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/coredev.rst b/coredev.rst index 195b5f559..1066ae2a0 100644 --- a/coredev.rst +++ b/coredev.rst @@ -70,7 +70,7 @@ The steps to gaining commit privileges are: 2. The poll is announced on python-committers 3. Wait for the poll to close and see if the results confirm your membership - as per the voting results requied by PEP 13 + as per the voting results required by PEP 13 4. The person who nominated you emails the steering council with your email address and a request that the council either accept or reject the proposed membership From 0ae9c851419d5eba1af2623129ffd415d75be73c Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Wed, 14 Jul 2021 18:22:42 -0700 Subject: [PATCH 004/722] Bump sphinx from 3.5.4 to 4.1.1 (#725) Bumps [sphinx](https://github.com/sphinx-doc/sphinx) from 3.5.4 to 4.1.1. - [Release notes](https://github.com/sphinx-doc/sphinx/releases) - [Changelog](https://github.com/sphinx-doc/sphinx/blob/4.x/CHANGES) - [Commits](https://github.com/sphinx-doc/sphinx/compare/v3.5.4...v4.1.1) --- updated-dependencies: - dependency-name: sphinx dependency-type: direct:production update-type: version-update:semver-major ... 
Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/requirements.txt b/requirements.txt index 75e94998f..60f42feeb 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,3 +1,3 @@ -Sphinx==3.5.4 +Sphinx==4.1.1 furo sphinx_copybutton>=0.3.3 From f8ca097b0eeb1f85863a631044ac2765aea670d2 Mon Sep 17 00:00:00 2001 From: Mariatta Wijaya Date: Thu, 15 Jul 2021 12:11:33 -0700 Subject: [PATCH 005/722] Delete travis.yml file (#726) GitHub Actions seem sufficient for our needs, we don't need both running at the same time. --- .travis.yml | 16 ---------------- 1 file changed, 16 deletions(-) delete mode 100644 .travis.yml diff --git a/.travis.yml b/.travis.yml deleted file mode 100644 index 5608206df..000000000 --- a/.travis.yml +++ /dev/null @@ -1,16 +0,0 @@ -language: python -python: 3.6 -cache: pip - -install: python3 -m pip install -U pip -r requirements.txt - -jobs: - include: - - stage: build-docs - script: sphinx-build -n -W -q -b html -d _build/doctrees . _build/html - - - stage: link-check - script: sphinx-build -b linkcheck -d _build/doctrees . _build/linkcheck - - allow_failures: - - stage: link-check From 0e6f02833f0adb4a229bacf425b56a3620a50474 Mon Sep 17 00:00:00 2001 From: Edison J Abahurire <20975616+SimiCode@users.noreply.github.com> Date: Fri, 16 Jul 2021 00:20:56 +0300 Subject: [PATCH 006/722] Fixes #717: Example for Running a single test case fails (#718) The class seems to be reloaded using a different name at the bottom of `https://github.com/python/cpython/blob/main/Lib/test/test_abc.py` --- runtests.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/runtests.rst b/runtests.rst index 74e38ff8b..2d6c3eb48 100644 --- a/runtests.rst +++ b/runtests.rst @@ -48,7 +48,7 @@ verbose mode (using ``-v``), so that individual failures are detailed:: To run a single test case, use the ``unittest`` module, providing the import path to the test case:: - ./python -m unittest -v test.test_abc.TestABC + ./python -m unittest -v test.test_abc.TestABC_Py If you have a multi-core or multi-CPU machine, you can enable parallel testing using several Python processes so as to speed up things:: From af52f4be7c9fab2ec2b658b130fc1686c3586b04 Mon Sep 17 00:00:00 2001 From: Dennis Sweeney <36520290+sweeneyde@users.noreply.github.com> Date: Fri, 16 Jul 2021 18:03:18 -0400 Subject: [PATCH 007/722] bpo-42349: Clarify basicblocks and give some examples (#714) Co-authored-by: Mariatta Wijaya --- compiler.rst | 56 +++++++++++++++++++++++++++++++++------------------- 1 file changed, 36 insertions(+), 20 deletions(-) diff --git a/compiler.rst b/compiler.rst index baa0ae825..91fe254d2 100644 --- a/compiler.rst +++ b/compiler.rst @@ -288,26 +288,42 @@ number is passed as the last parameter to each ``stmt_ty`` function. Control Flow Graphs ------------------- -A control flow graph (often referenced by its acronym, CFG) is a -directed graph that models the flow of a program using basic blocks that -contain the intermediate representation (abbreviated "IR", and in this -case is Python bytecode) within the blocks. Basic blocks themselves are -a block of IR that has a single entry point but possibly multiple exit -points. The single entry point is the key to basic blocks; it all has -to do with jumps. 
An entry point is the target of something that -changes control flow (such as a function call or a jump) while exit -points are instructions that would change the flow of the program (such -as jumps and 'return' statements). What this means is that a basic -block is a chunk of code that starts at the entry point and runs to an -exit point or the end of the block. - -As an example, consider an 'if' statement with an 'else' block. The -guard on the 'if' is a basic block which is pointed to by the basic -block containing the code leading to the 'if' statement. The 'if' -statement block contains jumps (which are exit points) to the true body -of the 'if' and the 'else' body (which may be ``NULL``), each of which are -their own basic blocks. Both of those blocks in turn point to the -basic block representing the code following the entire 'if' statement. +A *control flow graph* (often referenced by its acronym, CFG) is a +directed graph that models the flow of a program. A node of a CFG is +not an individual bytecode instruction, but instead represents a +sequence of bytecode instructions that always execute sequentially. +Each node is called a *basic block* and must always execute from +start to finish, with a single entry point at the beginning and a +single exit point at the end. If some bytecode instruction *a* needs +to jump to some other bytecode instruction *b*, then *a* must occur at +the end of its basic block, and *b* must occur at the start of its +basic block. + +As an example, consider the following code snippet: + +.. code-block:: Python + + if x < 10: + f1() + f2() + else: + g() + end() + +The ``x < 10`` guard is represented by its own basic block that +compares ``x`` with ``10`` and then ends in a conditional jump based on +the result of the comparison. This conditional jump allows the block +to point to both the body of the ``if`` and the body of the ``else``. The +``if`` basic block contains the ``f1()`` and ``f2()`` calls and points to +the ``end()`` basic block. The ``else`` basic block contains the ``g()`` +call and similarly points to the ``end()`` block. + +Note that more complex code in the guard, the ``if`` body, or the ``else`` +body may be represented by multiple basic blocks. For instance, +short-circuiting boolean logic in a guard like ``if x or y:`` +will produce one basic block that tests the truth value of ``x`` +and then points both (1) to the start of the ``if`` body and (2) to +a different basic block that tests the truth value of y. CFGs are usually one step away from final code output. 
Code is directly generated from the basic blocks (with jump targets adjusted based on the From 37c53440bc0b504dc8a54486ca2c5fb0daed338f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=81ukasz=20Langa?= Date: Sat, 17 Jul 2021 10:26:28 +0200 Subject: [PATCH 008/722] Replace @ilevkivskyi with @Fidget-Spinner as typing expert (#728) --- experts.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/experts.rst b/experts.rst index 001ff642a..dc84bdfb9 100644 --- a/experts.rst +++ b/experts.rst @@ -241,7 +241,7 @@ tracemalloc vstinner tty twouters* turtle gregorlingl, willingc types yselivanov -typing gvanrossum, levkivskyi* +typing gvanrossum, kj unicodedata lemburg, ezio.melotti unittest michael.foord*, ezio.melotti, rbcollins unittest.mock michael.foord* From 69c49e1b8d17c2f3e6fe3f4c22a0d78e6702a7f6 Mon Sep 17 00:00:00 2001 From: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com> Date: Fri, 30 Jul 2021 23:15:13 +0800 Subject: [PATCH 009/722] Add build.bat counterparts to to checklist (GH-729) --- grammar.rst | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/grammar.rst b/grammar.rst index 73226e115..571d6869c 100644 --- a/grammar.rst +++ b/grammar.rst @@ -23,14 +23,16 @@ Checklist Note: sometimes things mysteriously don't work. Before giving up, try ``make clean``. * :file:`Grammar/python.gram`: The grammar, with actions that build AST nodes. After changing - it, run ``make regen-pegen``, to regenerate :file:`Parser/parser.c`. + it, run ``make regen-pegen`` (or ``build.bat --regen`` on Windows), to + regenerate :file:`Parser/parser.c`. (This runs Python's parser generator, ``Tools/peg_generator``). * :file:`Grammar/Tokens` is a place for adding new token types. After changing it, run ``make regen-token`` to regenerate :file:`Include/token.h`, :file:`Parser/token.c`, :file:`Lib/token.py` and :file:`Doc/library/token-list.inc`. If you change both ``python.gram`` and ``Tokens``, - run ``make regen-token`` before ``make regen-pegen``. + run ``make regen-token`` before ``make regen-pegen``. On Windows, + ``build.bat --regen`` will regenerate both at the same time. * :file:`Parser/Python.asdl` may need changes to match the grammar. Then run ``make regen-ast`` to regenerate :file:`Include/Python-ast.h` and :file:`Python/Python-ast.c`. From 8d97b43d87477a9c7fba439a9a0916e18be97fd8 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Fri, 30 Jul 2021 17:16:12 +0200 Subject: [PATCH 010/722] Bump sphinx from 4.1.1 to 4.1.2 (GH-732) Bumps [sphinx](https://github.com/sphinx-doc/sphinx) from 4.1.1 to 4.1.2. 
- [Release notes](https://github.com/sphinx-doc/sphinx/releases) - [Changelog](https://github.com/sphinx-doc/sphinx/blob/4.x/CHANGES) - [Commits](https://github.com/sphinx-doc/sphinx/compare/v4.1.1...v4.1.2) Signed-off-by: dependabot[bot] --- requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/requirements.txt b/requirements.txt index 60f42feeb..c90e30088 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,3 +1,3 @@ -Sphinx==4.1.1 +Sphinx==4.1.2 furo sphinx_copybutton>=0.3.3 From 187740f715c036de61e4330a47f2ee406d06a335 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=81ukasz=20Langa?= Date: Fri, 30 Jul 2021 17:19:06 +0200 Subject: [PATCH 011/722] bpo-44777: Explicitly list the python-buildbots mailing list as a contact method (GH-733) --- buildbots.rst | 22 ++++++++++++++++++++++ pullrequest.rst | 3 +++ 2 files changed, 25 insertions(+) diff --git a/buildbots.rst b/buildbots.rst index ca342938d..47f319c30 100644 --- a/buildbots.rst +++ b/buildbots.rst @@ -25,6 +25,28 @@ build results after you push a change to the repository. It is therefore important that you get acquainted with the way these results are presented, and how various kinds of failures can be explained and diagnosed. +In case of trouble +------------------ + +Please read this page in full. If your questions aren't answered here and you +need assistance with the buildbots, a good way to get help is to either: + +* contact the ``python-buildbots@python.org`` mailing list where all buildbot + worker owners are subscribed; or +* contact the release manager of the branch you have issues with. + +Buildbot failures on Pull Requests +---------------------------------- + +The ``bedevere-bot`` on GitHub will put a message on your merged Pull Request +if building your commit on a stable buildbot worker fails. Take care to +evaluate the failure, even if it looks unrelated at first glance. + +Not all failures will generate a notification since not all builds are executed +after each commit. In particular, reference leaks builds take several hours to +complete so they are done periodically. This is why it's important for you to +be able to check the results yourself, too. + Checking results of automatic builds ------------------------------------ diff --git a/pullrequest.rst b/pullrequest.rst index 7cb194490..e6b6b767c 100644 --- a/pullrequest.rst +++ b/pullrequest.rst @@ -37,6 +37,9 @@ Here is a quick overview of how you can contribute to CPython: #. `Create Pull Request`_ on GitHub to merge a branch from your fork +#. Make sure the continuous integration checks on your Pull Request + are green (i.e. successful) + #. Review and address `comments on your Pull Request`_ #. When your changes are merged, you can :ref:`delete the PR branch From 4edf354b82aece8c6b7bc0c5fb2b082a06071f5d Mon Sep 17 00:00:00 2001 From: Pablo Galindo Salgado Date: Wed, 4 Aug 2021 17:54:48 +0100 Subject: [PATCH 012/722] Add a new guide on how to understand and use the PEG parser (#734) --- compiler.rst | 1 + grammar.rst | 16 +- index.rst | 3 +- parser.rst | 915 +++++++++++++++++++++++++++++++++++++++++++++++ requirements.txt | 2 +- 5 files changed, 928 insertions(+), 9 deletions(-) create mode 100644 parser.rst diff --git a/compiler.rst b/compiler.rst index 91fe254d2..85943118b 100644 --- a/compiler.rst +++ b/compiler.rst @@ -40,6 +40,7 @@ these (see :doc:`grammar`). Abstract Syntax Trees (AST) --------------------------- +.. _compiler-ast-trees: .. 
sidebar:: Green Tree Snakes diff --git a/grammar.rst b/grammar.rst index 571d6869c..471d4e827 100644 --- a/grammar.rst +++ b/grammar.rst @@ -9,13 +9,15 @@ Abstract There's more to changing Python's grammar than editing :file:`Grammar/python.gram`. Here's a checklist. -NOTE: These instructions are for Python 3.9 and beyond. Earlier -versions use a different parser technology. You probably shouldn't -try to change the grammar of earlier Python versions, but if you -really want to, use GitHub to track down the earlier version of this -file in the devguide. (Python 3.9 itself actually supports both -parsers; the old parser can be invoked by passing ``-X oldparser``.) - +.. note:: + These instructions are for Python 3.9 and beyond. Earlier + versions use a different parser technology. You probably shouldn't + try to change the grammar of earlier Python versions, but if you + really want to, use GitHub to track down the earlier version of this + file in the devguide. + +For more information on how to use the new parser, check the +:ref:`section on how to use CPython's parser `. Checklist --------- diff --git a/index.rst b/index.rst index 7c7810ca8..37afb6e85 100644 --- a/index.rst +++ b/index.rst @@ -262,6 +262,7 @@ Additional Resources * Help with ... * :doc:`exploring` * :doc:`grammar` + * :doc:`parser` * :doc:`compiler` * :doc:`garbage_collector` * Tool support @@ -293,7 +294,6 @@ Full Table of Contents ---------------------- .. toctree:: - :numbered: :maxdepth: 3 setup @@ -320,6 +320,7 @@ Full Table of Contents gdb exploring grammar + parser compiler garbage_collector extensions diff --git a/parser.rst b/parser.rst new file mode 100644 index 000000000..71387b639 --- /dev/null +++ b/parser.rst @@ -0,0 +1,915 @@ +.. _parser: + +Guide of CPython's Parser +========================= + +:Author: Pablo Galindo Salgado + +.. highlight:: none + +Abstract +-------- + +The Parser in CPython is currently a `PEG (Parser Expression Grammar) +`_ parser. The first +version of the parser used to be an `LL(1) +`_ based parser that was one of the +oldest parts of CPython implemented before it was replaced by :pep:`617`. In +particular, both the current parser and the old LL(1) parser are the output of a +`parser generator `_. This +means that the way the parser is written is by feeding a description of the +Grammar of the Python language to a special program (the parser generator) which +outputs the parser. The way the Python language is changed is therefore by +modifying the grammar file and developers rarely need to interact with the +parser generator itself other than use it to generate the parser. + +How PEG Parsers Work +-------------------- + +.. _how-peg-parsers-work: + +A PEG (Parsing Expression Grammar) grammar (like the current one) differs from a +context-free grammar in that the way it is written more closely +reflects how the parser will operate when parsing it. The fundamental technical +difference is that the choice operator is ordered. This means that when writing:: + + rule: A | B | C + +a context-free-grammar parser (like an LL(1) parser) will generate constructions +that given an input string will *deduce* which alternative (``A``, ``B`` or ``C``) +must be expanded, while a PEG parser will check if the first alternative succeeds +and only if it fails, will it continue with the second or the third one in the +order in which they are written. This makes the choice operator not commutative. 
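+
+To make this more concrete, here is a rough, self-contained Python sketch of how
+such an ordered choice behaves (for illustration only; this is not how the
+generated C parser is implemented):
+
+.. code-block:: python
+
+    # Tiny sketch of a PEG-style ordered choice for ``rule: 'aa' | 'a'``:
+    # alternatives are tried strictly in the order they are written and the
+    # first one that succeeds wins.
+
+    def match_literal(text, pos, literal):
+        """Return the new position on success, or None on failure."""
+        if text.startswith(literal, pos):
+            return pos + len(literal)
+        return None
+
+    def parse_rule(text, pos):
+        # Ordered choice: 'aa' is attempted first, 'a' only if 'aa' fails.
+        for literal in ("aa", "a"):
+            new_pos = match_literal(text, pos, literal)
+            if new_pos is not None:
+                return literal, new_pos  # first success wins
+        return None, pos                 # failure: no input is consumed
+
+    print(parse_rule("aaa", 0))  # ('aa', 2) -- the 'a' alternative is never tried
+    print(parse_rule("abc", 0))  # ('a', 1)  -- 'aa' fails, so 'a' is attempted next
+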
+
+Unlike LL(1) parsers, PEG-based parsers cannot be ambiguous: if a string parses,
+it has exactly one valid parse tree. This means that a PEG-based parser cannot
+suffer from the ambiguity problems that can arise with LL(1) parsers and with
+context-free grammars in general.
+
+PEG parsers are usually constructed as a recursive descent parser in which every
+rule in the grammar corresponds to a function in the program implementing the
+parser and the parsing expression (the "expansion" or "definition" of the rule)
+represents the "code" in said function. Each parsing function conceptually takes
+an input string as its argument, and yields one of the following results:
+
+* A "success" result. This result indicates that the expression can be parsed by
+  that rule and the function may optionally move forward or consume one or more
+  characters of the input string supplied to it.
+* A "failure" result, in which case no input is consumed.
+
+Notice that "failure" results do not imply that the program is incorrect, nor do
+they necessarily mean that the parsing has failed. Since the choice operator is
+ordered, a failure very often merely indicates "try the following option". A
+direct implementation of a PEG parser as a recursive descent parser will present
+exponential time performance in the worst case, because PEG parsers have
+infinite lookahead (this means that they can consider an arbitrary number of
+tokens before deciding on a rule). Usually, PEG parsers avoid this exponential
+time complexity with a technique called "packrat parsing" [1]_ which not only
+loads the entire program in memory before parsing it but also allows the parser
+to backtrack arbitrarily. This is made efficient by memoizing the rules already
+matched for each position. The cost of the memoization cache is that the parser
+will naturally use more memory than a simple LL(1) parser, which is normally
+table-based.
+
+
+Key ideas
+~~~~~~~~~
+
+.. important::
+    Don't try to reason about a PEG grammar in the same way you would with an EBNF
+    or context free grammar. PEG is optimized to describe **how** input strings will
+    be parsed, while context-free grammars are optimized to generate strings of the
+    language they describe (in EBNF, to know if a given string is in the language, you need
+    to do work to find out as it is not immediately obvious from the grammar).
+
+* Alternatives are ordered ( ``A | B`` is not the same as ``B | A`` ).
+* If a rule returns a failure, it doesn't mean that the parsing has failed,
+  it just means "try something else".
+* By default PEG parsers run in exponential time, which can be optimized to linear by
+  using memoization.
+* If parsing fails completely (no rule succeeds in parsing all the input text), the
+  PEG parser doesn't have a concept of "where the :exc:`SyntaxError` is".
+
+
+Consequences of the ordered choice operator
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. _consequences-of-ordered-choice:
+
+Although PEG may look like EBNF, its meaning is quite different. The fact
+that in PEG parsers alternatives are ordered (which is at the core of how PEG
+parsers work) has deep consequences, other than removing ambiguity.
+
+If a rule has two alternatives and the first of them succeeds, the second one is
+**not** attempted even if the caller rule fails to parse the rest of the input.
+Thus the parser is said to be "eager".
To illustrate this, consider +the following two rules (in these examples, a token is an individual character): :: + + first_rule: ( 'a' | 'aa' ) 'a' + second_rule: ('aa' | 'a' ) 'a' + +In a regular EBNF grammar, both rules specify the language ``{aa, aaa}`` but +in PEG, one of these two rules accepts the string ``aaa`` but not the string +``aa``. The other does the opposite -- it accepts the string the string ``aa`` +but not the string ``aaa``. The rule ``('a'|'aa')'a'`` does +not accept ``aaa`` because ``'a'|'aa'`` consumes the first ``a``, letting the +final ``a`` in the rule consume the second, and leaving out the third ``a``. +As the rule has succeeded, no attempt is ever made to go back and let +``'a'|'aa'`` try the second alternative. The expression ``('aa'|'a')'a'`` does +not accept ``aa`` because ``'aa'|'a'`` accepts all of ``aa``, leaving nothing +for the final ``a``. Again, the second alternative of ``'aa'|'a'`` is not +tried. + +.. caution:: + + The effects of ordered choice, such as the ones illustrated above, may be hidden by many levels of rules. + +For this reason, writing rules where an alternative is contained in the next one is in almost all cases a mistake, +for example: :: + + my_rule: + | 'if' expression 'then' block + | 'if' expression 'then' block 'else' block + +In this example, the second alternative will never be tried because the first one will +succeed first (even if the input string has an ``'else' block`` that follows). To correctly +write this rule you can simply alter the order: :: + + my_rule: + | 'if' expression 'then' block 'else' block + | 'if' expression 'then' block + +In this case, if the input string doesn't have an ``'else' block``, the first alternative +will fail and the second will be attempted without said part. + +Syntax +------ + +The grammar consists of a sequence of rules of the form: :: + + rule_name: expression + +Optionally, a type can be included right after the rule name, which +specifies the return type of the C or Python function corresponding to +the rule: :: + + rule_name[return_type]: expression + +If the return type is omitted, then a ``void *`` is returned in C and an +``Any`` in Python. + +Grammar Expressions +~~~~~~~~~~~~~~~~~~~ + +``# comment`` +''''''''''''' + +Python-style comments. + +``e1 e2`` +''''''''' + +Match e1, then match e2. + +:: + + rule_name: first_rule second_rule + +``e1 | e2`` +''''''''''' + +Match e1 or e2. + +The first alternative can also appear on the line after the rule name +for formatting purposes. In that case, a \| must be used before the +first alternative, like so: + +:: + + rule_name[return_type]: + | first_alt + | second_alt + +``( e )`` +''''''''' + +Match e. + +:: + + rule_name: (e) + +A slightly more complex and useful example includes using the grouping +operator together with the repeat operators: + +:: + + rule_name: (e1 e2)* + +``[ e ] or e?`` +''''''''''''''' + +Optionally match e. + +:: + + rule_name: [e] + +A more useful example includes defining that a trailing comma is +optional: + +:: + + rule_name: e (',' e)* [','] + +``e*`` +'''''' + +Match zero or more occurrences of e. + +:: + + rule_name: (e1 e2)* + +``e+`` +'''''' + +Match one or more occurrences of e. + +:: + + rule_name: (e1 e2)+ + +``s.e+`` +'''''''' + +Match one or more occurrences of e, separated by s. The generated parse +tree does not include the separator. This is otherwise identical to +``(e (s e)*)``. + +:: + + rule_name: ','.e+ + +``&e`` +'''''' + +.. 
_peg-positive-lookahead: + +Succeed if e can be parsed, without consuming any input. + +``!e`` +'''''' + +.. _peg-negative-lookahead: + +Fail if e can be parsed, without consuming any input. + +An example taken from the Python grammar specifies that a primary +consists of an atom, which is not followed by a ``.`` or a ``(`` or a +``[``: + +:: + + primary: atom !'.' !'(' !'[' + +``~`` +'''''' + +Commit to the current alternative, even if it fails to parse. + +:: + + rule_name: '(' ~ some_rule ')' | some_alt + +In this example, if a left parenthesis is parsed, then the other +alternative won’t be considered, even if some_rule or ‘)’ fail to be +parsed. + +Left recursion +~~~~~~~~~~~~~~ + +PEG parsers normally do not support left recursion but CPython's parser +generator implements a technique similar to the one described in Medeiros et al. +[2]_ but using the memoization cache instead of static variables. This approach +is closer to the one described in Warth et al. [3]_. This allows us to write not +only simple left-recursive rules but also more complicated rules that involve +indirect left-recursion like:: + + rule1: rule2 | 'a' + rule2: rule3 | 'b' + rule3: rule1 | 'c' + +and "hidden left-recursion" like:: + + rule: 'optional'? rule '@' some_other_rule + +Variables in the Grammar +~~~~~~~~~~~~~~~~~~~~~~~~ + +A sub-expression can be named by preceding it with an identifier and an +``=`` sign. The name can then be used in the action (see below), like this: :: + + rule_name[return_type]: '(' a=some_other_rule ')' { a } + +Grammar actions +~~~~~~~~~~~~~~~ + +.. _peg-grammar-actions: + +To avoid the intermediate steps that obscure the relationship between the +grammar and the AST generation the PEG parser allows directly generating AST +nodes for a rule via grammar actions. Grammar actions are language-specific +expressions that are evaluated when a grammar rule is successfully parsed. These +expressions can be written in Python or C depending on the desired output of the +parser generator. This means that if one would want to generate a parser in +Python and another in C, two grammar files should be written, each one with a +different set of actions, keeping everything else apart from said actions +identical in both files. As an example of a grammar with Python actions, the +piece of the parser generator that parses grammar files is bootstrapped from a +meta-grammar file with Python actions that generate the grammar tree as a result +of the parsing. + +In the specific case of the PEG grammar for Python, having actions allows +directly describing how the AST is composed in the grammar itself, making it +more clear and maintainable. This AST generation process is supported by the use +of some helper functions that factor out common AST object manipulations and +some other required operations that are not directly related to the grammar. + +To indicate these actions each alternative can be followed by the action code +inside curly-braces, which specifies the return value of the alternative:: + + rule_name[return_type]: + | first_alt1 first_alt2 { first_alt1 } + | second_alt1 second_alt2 { second_alt1 } + +If the action is omitted and C code is being generated, then there are two +different possibilities: + +1. If there’s a single name in the alternative, this gets returned. +2. If not, a dummy name object gets returned (this case should be avoided). + +If the action is ommited, a default action is generated: + +* If there's a single name in the rule in the rule, it gets returned. 
+ +* If there is more than one name in the rule, a collection with all parsed + expressions gets returned (the type of the collection will be different + in C and Python). + +This default behaviour is primarily made for very simple situations and for +debugging pourposes. + +The full meta-grammar for the grammars supported by the PEG generator is: + +:: + + start[Grammar]: grammar ENDMARKER { grammar } + + grammar[Grammar]: + | metas rules { Grammar(rules, metas) } + | rules { Grammar(rules, []) } + + metas[MetaList]: + | meta metas { [meta] + metas } + | meta { [meta] } + + meta[MetaTuple]: + | "@" NAME NEWLINE { (name.string, None) } + | "@" a=NAME b=NAME NEWLINE { (a.string, b.string) } + | "@" NAME STRING NEWLINE { (name.string, literal_eval(string.string)) } + + rules[RuleList]: + | rule rules { [rule] + rules } + | rule { [rule] } + + rule[Rule]: + | rulename ":" alts NEWLINE INDENT more_alts DEDENT { + Rule(rulename[0], rulename[1], Rhs(alts.alts + more_alts.alts)) } + | rulename ":" NEWLINE INDENT more_alts DEDENT { Rule(rulename[0], rulename[1], more_alts) } + | rulename ":" alts NEWLINE { Rule(rulename[0], rulename[1], alts) } + + rulename[RuleName]: + | NAME '[' type=NAME '*' ']' {(name.string, type.string+"*")} + | NAME '[' type=NAME ']' {(name.string, type.string)} + | NAME {(name.string, None)} + + alts[Rhs]: + | alt "|" alts { Rhs([alt] + alts.alts)} + | alt { Rhs([alt]) } + + more_alts[Rhs]: + | "|" alts NEWLINE more_alts { Rhs(alts.alts + more_alts.alts) } + | "|" alts NEWLINE { Rhs(alts.alts) } + + alt[Alt]: + | items '$' action { Alt(items + [NamedItem(None, NameLeaf('ENDMARKER'))], action=action) } + | items '$' { Alt(items + [NamedItem(None, NameLeaf('ENDMARKER'))], action=None) } + | items action { Alt(items, action=action) } + | items { Alt(items, action=None) } + + items[NamedItemList]: + | named_item items { [named_item] + items } + | named_item { [named_item] } + + named_item[NamedItem]: + | NAME '=' ~ item {NamedItem(name.string, item)} + | item {NamedItem(None, item)} + | it=lookahead {NamedItem(None, it)} + + lookahead[LookaheadOrCut]: + | '&' ~ atom {PositiveLookahead(atom)} + | '!' ~ atom {NegativeLookahead(atom)} + | '~' {Cut()} + + item[Item]: + | '[' ~ alts ']' {Opt(alts)} + | atom '?' {Opt(atom)} + | atom '*' {Repeat0(atom)} + | atom '+' {Repeat1(atom)} + | sep=atom '.' node=atom '+' {Gather(sep, node)} + | atom {atom} + + atom[Plain]: + | '(' ~ alts ')' {Group(alts)} + | NAME {NameLeaf(name.string) } + | STRING {StringLeaf(string.string)} + + # Mini-grammar for the actions + + action[str]: "{" ~ target_atoms "}" { target_atoms } + + target_atoms[str]: + | target_atom target_atoms { target_atom + " " + target_atoms } + | target_atom { target_atom } + + target_atom[str]: + | "{" ~ target_atoms "}" { "{" + target_atoms + "}" } + | NAME { name.string } + | NUMBER { number.string } + | STRING { string.string } + | "?" { "?" 
} + | ":" { ":" } + +As an illustrative example this simple grammar file allows directly +generating a full parser that can parse simple arithmetic expressions and that +returns a valid C-based Python AST: + +:: + + start[mod_ty]: a=expr_stmt* ENDMARKER { _PyAST_Module(a, NULL, p->arena) } + expr_stmt[stmt_ty]: a=expr NEWLINE { _PyAST_Expr(a, EXTRA) } + + expr[expr_ty]: + | l=expr '+' r=term { _PyAST_BinOp(l, Add, r, EXTRA) } + | l=expr '-' r=term { _PyAST_BinOp(l, Sub, r, EXTRA) } + | term + + term[expr_ty]: + | l=term '*' r=factor { _PyAST_BinOp(l, Mult, r, EXTRA) } + | l=term '/' r=factor { _PyAST_BinOp(l, Div, r, EXTRA) } + | factor + + factor[expr_ty]: + | '(' e=expr ')' { e } + | atom + + atom[expr_ty]: + | NAME + | NUMBER + +Here ``EXTRA`` is a macro that expands to ``start_lineno, start_col_offset, +end_lineno, end_col_offset, p->arena``, those being variables automatically +injected by the parser; ``p`` points to an object that holds on to all state +for the parser. + +A similar grammar written to target Python AST objects: + +:: + + start[ast.Module]: a=expr_stmt* ENDMARKER { ast.Module(body=a or [] } + expr_stmt: a=expr NEWLINE { ast.Expr(value=a, EXTRA) } + + expr: + | l=expr '+' r=term { ast.BinOp(left=l, op=ast.Add(), right=r, EXTRA) } + | l=expr '-' r=term { ast.BinOp(left=l, op=ast.Sub(), right=r, EXTRA) } + | term + + term: + | l=term '*' r=factor { ast.BinOp(left=l, op=ast.Mult(), right=r, EXTRA) } + | l=term '/' r=factor { ast.BinOp(left=l, op=ast.Div(), right=r, EXTRA) } + | factor + + factor: + | '(' e=expr ')' { e } + | atom + + atom: + | NAME + | NUMBER + + +Pegen +----- + +Pegen is the parser generator used in CPython to produce the final PEG parser used by the interpreter. It is the +program that can be used to read the python grammar located in :file:`Grammar/Python.gram` and produce the final C +parser. It contains the following pieces: + +* A parser generator that can read a grammar file and produce a PEG parser written in Python or C that can parse + said grammar. The generator is located at :file:`Tools/peg_generator/pegen`. +* A PEG meta-grammar that automatically generates a Python parser that is used for the parser generator itself + (this means that there are no manually-written parsers). The meta-grammar is + located at :file:`Tools/peg_generator/pegen/metagrammar.gram`. +* A generated parser (using the parser generator) that can directly produce C and Python AST objects. + +The source code for Pegen lives at :file:`Tools/peg_generator/pegen` but normally all typical commands to interact +with the parser generator are executed from the main makefile. + +How to regenerate the parser +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Once you have made the changes to the grammar files, to regenerate the ``C`` +parser (the one used by the interpreter) just execute: :: + + make regen-pegen + +using the :file:`Makefile` in the main directory. If you are on Windows you can +use the Visual Studio project files to regenerate the parser or to execute: :: + + ./PCbuild/build.bat --regen + +The generated parser file is located at :file:`Parser/parser.c`. + +How to regenerate the meta-parser +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The meta-grammar (the grammar that describes the grammar for the grammar files +themselves) is located at :file:`Tools/peg_generator/pegen/metagrammar.gram`. 
+Although it is very unlikely that you will ever need to modify it, if you make any modifications +to this file (in order to implement new Pegen features) you will need to regenerate +the meta-parser (the parser that parses the grammar files). To do so just execute: :: + + make regen-pegen-metaparser + +If you are on Windows you can use the Visual Studio project files +to regenerate the parser or to execute: :: + + ./PCbuild/build.bat --regen + + +Grammatical elements and rules +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Pegen has some special grammatical elements and rules: + +* Strings with single quotes (') (e.g. ``'class'``) denote KEYWORDS. +* Strings with double quotes (") (e.g. ``"match"``) denote SOFT KEYWORDS. +* Upper case names (e.g. ``NAME``) denote tokens in the :file:`Grammar/Tokens` file. +* Rule names starting with `invalid_` are used for specialized syntax errors. + + - These rules are NOT used in the first pass of the parser. + - Only if the first pass fails to parse, a second pass including the invalid + rules will be executed. + - If the parser fails in the second phase with a generic syntax error, the + location of the generic failure of the first pass will be used (this avoids + reporting incorrect locations due to the invalid rules). + - The order of the alternatives involving invalid rules matter + (like any rule in PEG). + +Tokenization +~~~~~~~~~~~~ + +It is common among PEG parser frameworks that the parser does both the parsing and the tokenization, +but this does not happen in Pegen. The reason is that the Python language needs a custom tokenizer +to handle things like indentation boundaries, some special keywords like ``ASYNC`` and ``AWAIT`` +(for compatibility purposes), backtracking errors (such as unclosed parenthesis), dealing with encoding, +interactive mode and much more. Some of these reasons are also there for historical purposes, and some +others are useful even today. + +The list of tokens (all uppercase names in the grammar) that you can use can be found in the :file:`Grammar/Tokens` +file. If you change this file to add new tokens, make sure to regenerate the files by executing: :: + + make regen-token + +If you are on Windows you can use the Visual Studio project files to regenerate the tokens or to execute: :: + + ./PCbuild/build.bat --regen + +How tokens are generated and the rules governing this is completely up to the tokenizer (:file:`Parser/tokenizer.c`) +and the parser just receives tokens from it. + +Memoization +~~~~~~~~~~~ + +As described previously, to avoid exponential time complexity in the parser, memoization is used. + +The C parser used by Python is highly optimized and memoization can be expensive both in memory and time. Although +the memory cost is obvious (the parser needs memory for storing previous results in the cache) the execution time +cost comes for continuously checking if the given rule has a cache hit or not. In many situations, just parsing it +again can be faster. Pegen **disables memoization by default** except for rules with the special marker `memo` after +the rule name (and type, if present): :: + + rule_name[typr] (memo): + ... + +By selectively turning on memoization for a handful of rules, the parser becomes faster and uses less memory. + +.. note:: + Left-recursive rules always use memoization, since the implementation of left-recursion depends on it. 
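+
+The packrat idea itself can be sketched in a few lines of Python (for
+illustration only; the real cache used by the generated C parser is more
+involved -- see :file:`Parser/pegen.c`):
+
+.. code-block:: python
+
+    import functools
+
+    def memoize(parse_method):
+        """Cache (rule, position) -> (result, new position) pairs."""
+        cache = {}
+
+        @functools.wraps(parse_method)
+        def wrapper(text, pos):
+            key = (parse_method.__name__, pos)
+            if key not in cache:
+                # Failures are cached too: re-trying them would only repeat
+                # work that is already known to fail.
+                cache[key] = parse_method(text, pos)
+            return cache[key]
+        return wrapper
+
+    @memoize
+    def parse_number(text, pos):
+        end = pos
+        while end < len(text) and text[end].isdigit():
+            end += 1
+        return (text[pos:end], end) if end > pos else (None, pos)
+
+    print(parse_number("123+456", 0))  # ('123', 3) -- computed once
+    print(parse_number("123+456", 0))  # ('123', 3) -- served from the cache
+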
+ +To know if a new rule needs memoization or not, benchmarking is required +(comparing execution times and memory usage of some considerably big files with +and without memoization). There is a very simple instrumentation API available +in the generated C parse code that allows to measure how much each rule uses +memoization (check the :file:`Parser/pegen.c` file for more information) but it +needs to be manually activated. + +Automatic variables +~~~~~~~~~~~~~~~~~~~ + +To make writing actions easier, Pegen injects some automatic variables in the namespace available +when writing actions. In the C parser, some of these automatic variable names are: + +* ``p``: The parser structure. +* ``EXTRA``: This is a macro that expands to ``(_start_lineno, _start_col_offset, _end_lineno, _end_col_offset, p->arena)``, + which is normally used to create AST nodes as almost all constructors need these attributes to be provided. All of the + location variables are taken from the location information of the current token. + +Hard and Soft keywords +~~~~~~~~~~~~~~~~~~~~~~ + +.. note:: + In the grammar files, keywords are defined using **single quotes** (e.g. `'class'`) while soft + keywords are defined using **double quotes** (e.g. `"match"`). + +There are two kinds of keywords allowed in pegen grammars: *hard* and *soft* +keywords. The difference between hard and soft keywords is that hard keywords +are always reserved words, even in positions where they make no sense (e.g. ``x = class + 1``), +while soft keywords only get a special meaning in context. Trying to use a hard +keyword as a variable will always fail: + +.. code-block:: + + >>> class = 3 + File "", line 1 + class = 3 + ^ + SyntaxError: invalid syntax + >>> foo(class=3) + File "", line 1 + foo(class=3) + ^^^^^ + SyntaxError: invalid syntax + +While soft keywords don't have this limitation if used in a context other the one where they +are defined as keywords: + +.. code-block:: python + + >>> match = 45 + >>> foo(match="Yeah!") + +The ``match`` and ``case`` keywords are soft keywords, so that they are recognized as +keywords at the beginning of a match statement or case block respectively, but are +allowed to be used in other places as variable or argument names. + +You can get a list of all keywords defined in the grammar from Python: + +.. code-block:: python + + >>> import keyword + >>> keyword.kwlist + ['False', 'None', 'True', 'and', 'as', 'assert', 'async', 'await', 'break', + 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'finally', 'for', + 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal', 'not', 'or', + 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield'] + +as well as soft keywords: + +.. code-block:: python + + >>> import keyword + >>> keyword.softkwlist + ['_', 'case', 'match'] + +.. caution:: + Soft keywords can be a bit challenging to manage as they can be accepted in + places you don't intend to, given how the order alternatives behave in PEG + parsers (see :ref:`consequences of ordered choice section + ` for some background on this). In general, + try to define them in places where there is not a lot of alternatives. + +Error handling +~~~~~~~~~~~~~~ + +When a pegen-generated parser detects that an exception is raised, it will +**automatically stop parsing**, no matter what the current state of the parser +is and it will unwind the stack and report the exception. This means that if a +:ref:`rule action ` raises an exception all parsing will +stop at that exact point. 
This is done to allow the correct propagation of any
+exception set by calling Python C-API functions. This also includes :exc:`SyntaxError`
+exceptions and this is the main mechanism the parser uses to report custom syntax
+error messages.
+
+.. note::
+    Tokenizer errors are normally reported by raising exceptions but some special
+    tokenizer errors such as unclosed parenthesis will be reported only after the
+    parser finishes without returning anything.
+
+How Syntax errors are reported
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As described previously in the :ref:`how PEG parsers work section
+<how-peg-parsers-work>`, PEG parsers don't have a defined concept of where
+errors happened in the grammar, because a rule failure doesn't imply a
+parsing failure like in context free grammars. This means that some heuristic
+has to be used to report generic errors unless something is explicitly declared
+as an error in the grammar.
+
+To report generic syntax errors, pegen uses a common heuristic in PEG parsers:
+the location of *generic* syntax errors is reported in the furthest token that
+was attempted to be matched but failed. This is only done if parsing has failed
+(the parser returns ``NULL`` in C or ``None`` in Python) but no exception has
+been raised.
+
+.. caution::
+    Positive and negative lookaheads will try to match a token so they will affect
+    the location of generic syntax errors. Use them carefully at boundaries
+    between rules.
+
+As the Python grammar was originally written as an LL(1) grammar, this heuristic
+has an extremely high success rate, but some PEG features can have small effects,
+such as :ref:`positive lookaheads <peg-positive-lookahead>` and
+:ref:`negative lookaheads <peg-negative-lookahead>`.
+
+To generate more precise syntax errors, custom rules are used. This is a common practice
+also in context free grammars: the parser will try to accept some construct that is known
+to be incorrect just to report a specific syntax error for that construct. In pegen grammars,
+these rules start with the ``invalid_`` prefix. This is because trying to match these rules
+normally has a performance impact on parsing (and can also affect the 'correct' grammar itself
+in some tricky cases, depending on the ordering of the rules) so the generated parser acts in
+two phases:
+
+1. The first phase will try to parse the input stream without taking into account rules that
+   start with the ``invalid_`` prefix. If the parsing succeeds it will return the generated AST
+   and the second phase will not be attempted.
+
+2. If the first phase failed, a second parsing attempt is done including the rules that start
+   with an ``invalid_`` prefix. By design this attempt **cannot succeed** and is only executed
+   to give the invalid rules a chance to detect specific situations where custom, more precise,
+   syntax errors can be raised. This also allows trading a bit of performance for more precise
+   error reporting: given that we know that the input text is invalid, there is no need to be
+   fast because the interpreter is going to stop anyway.
+
+.. important::
+    When defining invalid rules:
+
+    * Make sure all custom invalid rules raise :exc:`SyntaxError` exceptions (or a subclass of it).
+    * Make sure **all** invalid rules start with the ``invalid_`` prefix to not
+      impact performance of parsing correct Python code.
+    * Make sure the parser doesn't behave differently for regular rules when you introduce invalid rules
+      (see the :ref:`how PEG parsers work section <how-peg-parsers-work>` for more information).
+
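+
+For illustration only, a minimal invalid rule might look like the following
+sketch (the real rules in :file:`Grammar/python.gram` are more elaborate, and
+``RAISE_SYNTAX_ERROR`` is assumed here to be one of the helper macros
+mentioned below): ::
+
+    invalid_example_del_literal:
+        | 'del' NUMBER { RAISE_SYNTAX_ERROR("cannot delete a literal") }
+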
+ +You can find a collection of macros to raise specialized syntax errors in the +:file:`Parser/pegen.h` header file. These macros allow also to report ranges for +the custom errors that will be highlighted in the tracebacks that will be +displayed when the error is reported. + +.. tip:: + A good way to test if an invalid rule will be triggered when you expect is to test if introducing + a syntax error **after** valid code triggers the rule or not. For example: :: + + $ 42 + + Should trigger the syntax error in the ``$`` character. If your rule is not correctly defined this + won't happen. For example, if you try to define a rule to match Python 2 style ``print`` statements + to make a better error message and you define it as: :: + + invalid_print: "print" expression + + This will **seem** to work because the parser will correctly parse ``print(something)`` because it is valid + code and the second phase will never execute but if you try to parse ``print(something) $ 3`` the first pass + of the parser will fail (because of the ``$``) and in the second phase, the rule will match the + ``print(something)`` as ``print`` followed by the variable ``something`` between parentheses and the error + will be reported there instead of the ``$`` character. + +Generating AST objects +~~~~~~~~~~~~~~~~~~~~~~ + +The output of the C parser used by CPython that is generated by the +:file:`Grammar/Python.gram` grammar file is a Python AST object (using C +structures). This means that the actions in the grammar file generate AST objects +when they succeed. Constructing these objects can be quite cumbersome (see +the :ref:`AST compiler section ` for more information +on how these objects are constructed and how they are used by the compiler) so +special helper functions are used. These functions are declared in the +:file:`Parser/pegen.h` header file and defined in the :file:`Parser/pegen.c` +file. These functions allow you to join AST sequences, get specific elements +from them or to do extra processing on the generated tree. + +.. caution:: + Actions must **never** be used to accept or reject rules. It may be tempting + in some situations to write a very generic rule and then check the generated + AST to decide if is valid or not but this will render the `official grammar + `_ partially incorrect + (because actions are not included) and will make it more difficult for other + Python implementations to adapt the grammar to their own needs. + +As a general rule, if an action spawns multiple lines or requires something more +complicated than a single expression of C code, is normally better to create a +custom helper in :file:`Parser/pegen.c` and expose it in the +:file:`Parser/pegen.h` header file so it can be used from the grammar. + +If the parsing succeeds, the parser **must** return a **valid** AST object. + +Testing +------- + +There are three files that contain tests for the grammar and the parser: + +* `Lib/test/test_grammar.py`. +* `Lib/test/test_syntax.py`. +* `Lib/test/test_exceptions.py`. + +Check the contents of these files to know which is the best place to place new tests depending +on the nature of the new feature you are adding. + +Tests for the parser generator itself can be found in the :file:`Lib/test/test_peg_generator` directory. 
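+
+Whichever file you pick, a new parser test usually boils down to checking that a
+piece of source code either compiles or raises the expected :exc:`SyntaxError`.
+As a rough, self-contained sketch (the existing test files have their own
+helpers and conventions):
+
+.. code-block:: python
+
+    import unittest
+
+    class ExampleParserTests(unittest.TestCase):
+        def test_valid_code_compiles(self):
+            # compile() drives the same parser used for regular Python code.
+            compile("x = (1 + 2) * 3", "<example>", "exec")
+
+        def test_invalid_code_raises_syntax_error(self):
+            with self.assertRaises(SyntaxError):
+                compile("x = (1 +", "<example>", "exec")
+
+    if __name__ == "__main__":
+        unittest.main()
+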
+
+
+Debugging generated parsers
+---------------------------
+
+Making experiments
+~~~~~~~~~~~~~~~~~~
+
+As the generated C parser is the one used by Python, this means that if something goes wrong when adding some
+new rules to the grammar you cannot correctly compile and execute Python anymore. This makes it a bit challenging
+to debug when something goes wrong, especially when making experiments.
+
+For this reason it is a good idea to experiment first by generating a Python parser. To do this, you can go to the
+:file:`Tools/peg_generator/` directory on the CPython repository and manually call the parser generator by executing:
+
+.. code-block:: shell
+
+    $ python -m pegen python ~/github/pegen/data/expr.gram
+
+This will generate a file called :file:`parse.py` in the same directory that you can use to parse some input:
+
+.. code-block:: shell
+
+    $ python parse.py file_with_source_code_to_test.py
+
+As the generated :file:`parse.py` file is just Python code, you can modify it and add breakpoints to debug or
+better understand some complex situations.
+
+
+Verbose mode
+~~~~~~~~~~~~
+
+When Python is compiled in debug mode (by adding ``--with-pydebug`` when running the configure step in Linux or by
+adding ``-d`` when calling the :file:`PCbuild/python.bat` script in Windows), it is possible to activate a **very** verbose
+mode in the generated parser. This is very useful to debug the generated parser and to understand how it works, but it
+can be a bit hard to understand at first.
+
+.. note::
+
+    When activating verbose mode in the Python parser, it is better to not use interactive mode as it can be much harder to
+    understand, because interactive mode involves some special steps compared to regular parsing.
+
+To activate verbose mode you can add the ``-d`` flag when executing Python:
+
+.. code-block:: shell
+
+    $ python -d file_to_test.py
+
+This will print **a lot** of output to ``stderr`` so it is probably better to dump it to a file for further analysis. The output
+consists of trace lines with the following structure::
+
+    <indentation> ('>'|'-'|'+'|'!') <rule_name>[<token_location>]: <alternative> ...
+
+Every line is indented by a different amount (``<indentation>``) depending on how deep the call stack is. The next
+character marks the type of the trace:
+
+* ``>`` indicates that a rule is going to be attempted to be parsed.
+* ``-`` indicates that a rule has failed to be parsed.
+* ``+`` indicates that a rule has been parsed correctly.
+* ``!`` indicates that an exception or an error has been detected and the parser is unwinding.
+
+The ``<token_location>`` part indicates the current index in the token array, the
+``<rule_name>`` part indicates what rule is being parsed and the ``<alternative>`` part
+indicates what alternative within that rule is being attempted.
+
+
+References
+----------
+
+.. [1] Ford, Bryan
+   http://pdos.csail.mit.edu/~baford/packrat/thesis
+
+.. [2] Medeiros et al.
+   https://arxiv.org/pdf/1207.0443.pdf
+
+.. [3] Warth et al.
+ http://web.cs.ucla.edu/~todd/research/pepm08.pdf diff --git a/requirements.txt b/requirements.txt index c90e30088..829318d20 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,3 +1,3 @@ Sphinx==4.1.2 -furo +furo<=2021.7.5b38 sphinx_copybutton>=0.3.3 From d9c1441301d494c2e751479fb45043b139733cc6 Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Wed, 4 Aug 2021 21:54:27 +0100 Subject: [PATCH 013/722] Minor correction on the parser document --- parser.rst | 6 ------ 1 file changed, 6 deletions(-) diff --git a/parser.rst b/parser.rst index 71387b639..6364584b1 100644 --- a/parser.rst +++ b/parser.rst @@ -343,12 +343,6 @@ inside curly-braces, which specifies the return value of the alternative:: | first_alt1 first_alt2 { first_alt1 } | second_alt1 second_alt2 { second_alt1 } -If the action is omitted and C code is being generated, then there are two -different possibilities: - -1. If there’s a single name in the alternative, this gets returned. -2. If not, a dummy name object gets returned (this case should be avoided). - If the action is ommited, a default action is generated: * If there's a single name in the rule in the rule, it gets returned. From 14e6f3ea0c2731ddebf38224566d138b6ecd2d1f Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Wed, 25 Aug 2021 11:52:23 +0100 Subject: [PATCH 014/722] Correct invocation command for testing the parser --- parser.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/parser.rst b/parser.rst index 6364584b1..27ce46d62 100644 --- a/parser.rst +++ b/parser.rst @@ -847,7 +847,7 @@ For this reason it is a good idea to experiment first by generating a Python par .. code-block:: shell - $ python -m pegen python ~/github/pegen/data/expr.gram + $ python -m pegen python This will generate a file called :file:`parse.py` in the same directory that you can use to parse some input: From 03d8f1bd581319a63aeedde5654485342a572e2c Mon Sep 17 00:00:00 2001 From: Brett Cannon Date: Thu, 26 Aug 2021 11:42:02 -0700 Subject: [PATCH 015/722] Add Ammar Askar and Ken Jin to the developer log --- developers.csv | 2 ++ 1 file changed, 2 insertions(+) diff --git a/developers.csv b/developers.csv index b37e5a15e..5b4551629 100644 --- a/developers.csv +++ b/developers.csv @@ -1,3 +1,5 @@ +Ken Jin,Fidget-Spinner,2021-08-26,, +Ammar Askar,ammaraskar,2021-07-30,, Irit Katriel,iritkatriel,2021-05-10,, Batuhan Taskaya,isidentical,2020-11-08,, Brandt Bucher,brandtbucher,2020-09-14,, From dd55ff4527b8211c01131ed11db28cabe450991d Mon Sep 17 00:00:00 2001 From: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com> Date: Fri, 27 Aug 2021 17:23:46 +0800 Subject: [PATCH 016/722] Describe test-with-buildbots label for triagers (#740) --- triaging.rst | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/triaging.rst b/triaging.rst index 6c7468b18..7419aaa05 100644 --- a/triaging.rst +++ b/triaging.rst @@ -170,6 +170,12 @@ type-security type-tests Used for PRs that exclusively involve changes to the tests. +test-with-buildbots + Used on PRs to test the latest commit with the buildbot fleet. Generally for + PRs with large code changes requiring more testing before merging. This + may take multiple hours to complete. Triagers can also stop a stuck build + using the web interface. 
+ Fields in the Issue Tracker --------------------------- From 1e0275e3b2477cefbf6aed9920b7d8e9e9b85654 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=81ukasz=20Langa?= Date: Wed, 8 Sep 2021 17:44:09 +0200 Subject: [PATCH 017/722] Mention DiR role in devcycle.rst --- devcycle.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/devcycle.rst b/devcycle.rst index e8bb60934..cfb18430a 100644 --- a/devcycle.rst +++ b/devcycle.rst @@ -321,6 +321,7 @@ Current Administrators | | Maintainer of buildbot.python.org | | +-------------------+----------------------------------------------------------+-----------------+ | Łukasz Langa | Python 3.8 and 3.9 Release Manager | ambv | +| | PSF CPython Developer in Residence 2021-2022 | | +-------------------+----------------------------------------------------------+-----------------+ | Ned Deily | Python 3.6 and 3.7 Release Manager | ned-deily | +-------------------+----------------------------------------------------------+-----------------+ From eeadb62751fc9011830a637e492debc18a214ce3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=81ukasz=20Langa?= Date: Wed, 8 Sep 2021 17:44:34 +0200 Subject: [PATCH 018/722] Update planned EOL for all feature branches, add 3.11 release PEP --- index.rst | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/index.rst b/index.rst index 37afb6e85..5f7d82632 100644 --- a/index.rst +++ b/index.rst @@ -97,11 +97,11 @@ Status of Python branches +------------------+--------------+-------------+----------------+----------------+-----------------------+ | Branch | Schedule | Status | First release | End-of-life | Release manager | +==================+==============+=============+================+================+=======================+ -| main | *TBD* | features | *TBD* | *TBD* | Pablo Galindo Salgado | +| main | :pep:`664` | features | *2022-10-03* | *2027-10* | Pablo Galindo Salgado | +------------------+--------------+-------------+----------------+----------------+-----------------------+ -| 3.10 | :pep:`619` | prerelease | *2021-10-04* | *TBD* | Pablo Galindo Salgado | +| 3.10 | :pep:`619` | prerelease | *2021-10-04* | *2026-10* | Pablo Galindo Salgado | +------------------+--------------+-------------+----------------+----------------+-----------------------+ -| 3.9 | :pep:`596` | bugfix | 2020-10-05 | *TBD* | Łukasz Langa | +| 3.9 | :pep:`596` | bugfix | 2020-10-05 | *2025-10* | Łukasz Langa | +------------------+--------------+-------------+----------------+----------------+-----------------------+ | 3.8 | :pep:`569` | security | 2019-10-14 | *2024-10* | Łukasz Langa | +------------------+--------------+-------------+----------------+----------------+-----------------------+ @@ -112,6 +112,8 @@ Status of Python branches .. Remember to update the end-of-life table in devcycle.rst. +Dates in *italic* are scheduled and can be adjusted. + The main branch is currently the future Python 3.11, and is the only branch that accepts new features. The latest release for each Python version can be found on the `download page `_. @@ -127,13 +129,12 @@ Status: but new source-only versions can be released :end-of-life: release cycle is frozen; no further changes can be pushed to it. -Dates in *italic* are scheduled and can be adjusted. +See also the :ref:`devcycle` page for more information about branches. By default, the end-of-life is scheduled 5 years after the first release, but can be adjusted by the release manager of each branch. All Python 2 versions have reached end-of-life. 
-See also the :ref:`devcycle` page for more information about branches. .. _contributing: From f5c76b915e04bdbe246dfece7033ff02e1a1304c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=81ukasz=20Langa?= Date: Wed, 8 Sep 2021 17:51:07 +0200 Subject: [PATCH 019/722] Add missing comma --- devcycle.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/devcycle.rst b/devcycle.rst index cfb18430a..35bd96ab9 100644 --- a/devcycle.rst +++ b/devcycle.rst @@ -320,7 +320,7 @@ Current Administrators | Pablo Galindo | Python 3.10 and 3.11 Release Manager, | pablogsal | | | Maintainer of buildbot.python.org | | +-------------------+----------------------------------------------------------+-----------------+ -| Łukasz Langa | Python 3.8 and 3.9 Release Manager | ambv | +| Łukasz Langa | Python 3.8 and 3.9 Release Manager, | ambv | | | PSF CPython Developer in Residence 2021-2022 | | +-------------------+----------------------------------------------------------+-----------------+ | Ned Deily | Python 3.6 and 3.7 Release Manager | ned-deily | From 54cd7c1dcd9ddc082ded55651696052fc2700c59 Mon Sep 17 00:00:00 2001 From: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com> Date: Fri, 10 Sep 2021 23:41:33 +0800 Subject: [PATCH 020/722] Link to configure docs in compilation guide (#743) --- setup.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/setup.rst b/setup.rst index 24b7c97d2..0c69af695 100644 --- a/setup.rst +++ b/setup.rst @@ -128,6 +128,8 @@ when you shouldn't is if you are taking performance measurements). Even when working only on pure Python code the pydebug build provides several useful checks that one should not skip. +.. seealso:: The effects of various configure and build flags are documented in + the `Python configure docs `_. .. _unix-compiling: From 7a7a663e7f0d1ad6360bb35b58dd2357d3b7bf17 Mon Sep 17 00:00:00 2001 From: Dmitriy Fishman Date: Mon, 27 Sep 2021 14:55:18 +0300 Subject: [PATCH 021/722] Typo fix in committing.rst (GH-749) --- committing.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/committing.rst b/committing.rst index 213a2292e..9f1951ba7 100644 --- a/committing.rst +++ b/committing.rst @@ -102,7 +102,7 @@ These are the two exceptions: change and the original** ``NEWS`` **entry remains valid**, then no additional entry is needed. -If a change needs an entry in ``What's New in Python``, then it very +If a change needs an entry in ``What's New in Python``, then it is very likely not suitable for including in a maintenance release. ``NEWS`` entries go into the ``Misc/NEWS.d`` directory as individual files. The From b1efd6a9ee74fad66d853ad13856f815a2822f8f Mon Sep 17 00:00:00 2001 From: Pablo Galindo Salgado Date: Mon, 4 Oct 2021 01:05:39 +0100 Subject: [PATCH 022/722] Add instructions on running autoreconf with pkg-config (#750) --- setup.rst | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/setup.rst b/setup.rst index 0c69af695..42cbe1758 100644 --- a/setup.rst +++ b/setup.rst @@ -447,7 +447,14 @@ example, ``autoconf`` by itself will not regenerate ``pyconfig.h.in``. appropriate. Python's ``configure.ac`` script typically requires a specific version of -Autoconf. At the moment, this reads: ``AC_PREREQ(2.69)``. +Autoconf. At the moment, this reads: ``AC_PREREQ(2.69)``. It also requires +to have the ``autoconf-archive`` and ``pkg-config`` utilities installed in +the system and the ``pkg.m4`` macro file located in the appropriate ``alocal`` +location. 
You can easily check if this is correctly configured by running: + +.. code-block:: bash + + ls $(aclocal --print-ac-dir) | grep pkg.m4 If the system copy of Autoconf does not match this version, you will need to install your own copy of Autoconf. From ebc5777ac4332f4507f73331057f670715e44f47 Mon Sep 17 00:00:00 2001 From: Ned Deily Date: Mon, 4 Oct 2021 16:09:38 -0400 Subject: [PATCH 023/722] update for 3.10 release --- index.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/index.rst b/index.rst index 5f7d82632..3e311ae4a 100644 --- a/index.rst +++ b/index.rst @@ -99,7 +99,7 @@ Status of Python branches +==================+==============+=============+================+================+=======================+ | main | :pep:`664` | features | *2022-10-03* | *2027-10* | Pablo Galindo Salgado | +------------------+--------------+-------------+----------------+----------------+-----------------------+ -| 3.10 | :pep:`619` | prerelease | *2021-10-04* | *2026-10* | Pablo Galindo Salgado | +| 3.10 | :pep:`619` | bugfix | 2021-10-04 | *2026-10* | Pablo Galindo Salgado | +------------------+--------------+-------------+----------------+----------------+-----------------------+ | 3.9 | :pep:`596` | bugfix | 2020-10-05 | *2025-10* | Łukasz Langa | +------------------+--------------+-------------+----------------+----------------+-----------------------+ From 0dc9c92bafb58e61cde43bf57b3f31b65d78fd46 Mon Sep 17 00:00:00 2001 From: Ee Durbin Date: Tue, 19 Oct 2021 09:20:33 -0400 Subject: [PATCH 024/722] update Python org owners (#753) --- devcycle.rst | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/devcycle.rst b/devcycle.rst index 35bd96ab9..e6e20b4fe 100644 --- a/devcycle.rst +++ b/devcycle.rst @@ -260,7 +260,7 @@ This role is paramount to the security of the Python Language, Community, and Infrastructure. The Executive Director of the Python Software Foundation delegates authority on -GitHub Organization Owner Status to Ee W. Durbin III - Python Software +GitHub Organization Owner Status to Ee Durbin - Python Software Foundation Director of Infrastructure. Common reasons for this role are: Infrastructure Staff Membership, Python Software Foundation General Counsel, and Python Software Foundation Staff as fallback. @@ -285,10 +285,14 @@ Current Owners +----------------------+--------------------------------+-----------------+ | Ewa Jodlowska | PSF Executive Director | ejodlowska | +----------------------+--------------------------------+-----------------+ -| Ee W. 
Durbin III | PSF Director of Infrastructure | ewdurbin | +| Ee Durbin | PSF Director of Infrastructure | ewdurbin | +----------------------+--------------------------------+-----------------+ | Van Lindberg | PSF General Counsel | VanL | +----------------------+--------------------------------+-----------------+ +| Ezio Melotti | roundup -> github migration | ezio-melotti | ++----------------------+--------------------------------+-----------------+ +| Łukasz Langa | CPython Developr in Residence | ambv | ++----------------------+--------------------------------+-----------------+ Repository Administrator Role Policy ------------------------------------ From f34e3442c3e80683df4c738352e8139992fb08b0 Mon Sep 17 00:00:00 2001 From: Zachary Ware Date: Tue, 19 Oct 2021 11:08:03 -0500 Subject: [PATCH 025/722] Typo fix --- devcycle.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/devcycle.rst b/devcycle.rst index e6e20b4fe..5c1b4b53e 100644 --- a/devcycle.rst +++ b/devcycle.rst @@ -291,7 +291,7 @@ Current Owners +----------------------+--------------------------------+-----------------+ | Ezio Melotti | roundup -> github migration | ezio-melotti | +----------------------+--------------------------------+-----------------+ -| Łukasz Langa | CPython Developr in Residence | ambv | +| Łukasz Langa | CPython Developer in Residence | ambv | +----------------------+--------------------------------+-----------------+ Repository Administrator Role Policy From c4f3c7300c59b19a91cc66f3363b8f5985635f12 Mon Sep 17 00:00:00 2001 From: Wey-Han Liaw Date: Sun, 24 Oct 2021 02:44:19 -0700 Subject: [PATCH 026/722] Change the coordinator of the zh-tw translation (#752) --- documenting.rst | 7 ++++--- experts.rst | 2 +- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/documenting.rst b/documenting.rst index a0521ce9e..6b91652ca 100644 --- a/documenting.rst +++ b/documenting.rst @@ -1641,9 +1641,9 @@ in production, other are work in progress: +-----------------+-------------------------------+----------------------------+ | Spanish (es) | Raúl Cumplido | `GitHub `_ | +-----------------+-------------------------------+----------------------------+ -| Traditional | 廖偉涵 Adrian Liaw | `GitHub `_ | -| Chinese | | `Transifex `_ | -| (zh-tw) | | `Doc `_ | +| Traditional | `王威翔 Matt Wang | `GitHub `_ | +| Chinese | `_, | `Transifex `_ | +| (zh-tw) | Josix Wang | `Doc `_ | +-----------------+-------------------------------+----------------------------+ | Turkish (tr) | | `GitHub `_ | +-----------------+-------------------------------+----------------------------+ @@ -1655,6 +1655,7 @@ in production, other are work in progress: .. _bpo_mdk: https://bugs.python.org/user23063 .. _bpo_oonid: https://bugs.python.org/user32660 .. _bpo_zhsj: https://bugs.python.org/user24811 +.. _bpo_mattwang44: https://bugs.python.org/user39654 .. _chat_pt_br: https://t.me/pybr_i18n .. _doc_ja: https://docs.python.org/ja/ .. 
_doc_ko: https://docs.python.org/ko/ diff --git a/experts.rst b/experts.rst index dc84bdfb9..f1495ca6c 100644 --- a/experts.rst +++ b/experts.rst @@ -368,6 +368,6 @@ Korean flowdas Bengali India kushal.das Hungarian gbtami Portuguese rougeth -Chinese (TW) adrianliaw +Chinese (TW) mattwang44, josix Chinese (CN) zhsj ============= ============ From 2e6b3279711d4d528ee75533c46acb87519d62c0 Mon Sep 17 00:00:00 2001 From: Christian Heimes Date: Thu, 28 Oct 2021 12:24:45 +0300 Subject: [PATCH 027/722] Update Debian build requirements (GH-755) --- setup.rst | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/setup.rst b/setup.rst index 42cbe1758..782a498a5 100644 --- a/setup.rst +++ b/setup.rst @@ -328,13 +328,15 @@ Then you should update the packages index:: Now you can install the build dependencies via ``apt``:: $ sudo apt-get build-dep python3 + $ sudo apt-get install pkg-config If you want to build all optional modules, install the following packages and their dependencies:: - $ sudo apt-get install build-essential gdb lcov libbz2-dev libffi-dev \ - libgdbm-dev liblzma-dev libncurses5-dev libreadline6-dev \ - libsqlite3-dev libssl-dev lzma lzma-dev tk-dev uuid-dev zlib1g-dev + $ sudo apt-get install build-essential gdb lcov pkg-config \ + libbz2-dev libffi-dev libgdbm-dev libgdbm-compat-dev liblzma-dev \ + libncurses5-dev libreadline6-dev libsqlite3-dev libssl-dev \ + lzma lzma-dev tk-dev uuid-dev zlib1g-dev .. _MacOS: From 8d13a975939b9d9b93f1cff28b0ade3be7744695 Mon Sep 17 00:00:00 2001 From: Itamar Ostricher Date: Tue, 2 Nov 2021 22:14:35 -0700 Subject: [PATCH 028/722] Minor typo fix in Issue Tracking page (#758) --- tracker.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tracker.rst b/tracker.rst index 9f28aa3c6..367306dee 100644 --- a/tracker.rst +++ b/tracker.rst @@ -30,7 +30,7 @@ already been reported. Checking if the problem is an existing issue will: * determine if additional information, such as how to replicate the issue, is needed -To do see if the issue already exists, search the bug database using the +To see if an issue already exists, search the bug database using the search box on the top of the issue tracker page. An `advanced search`_ is also available by clicking on "Search" in the sidebar. From 5db50fc080e102b194a8e7ca46bc6f94cff3bd0d Mon Sep 17 00:00:00 2001 From: Abdur-Rahmaan Janhangeer Date: Mon, 8 Nov 2021 00:08:40 +0400 Subject: [PATCH 029/722] Update Arabic coordinator (#760) --- documenting.rst | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/documenting.rst b/documenting.rst index 6b91652ca..4649bab3d 100644 --- a/documenting.rst +++ b/documenting.rst @@ -1595,7 +1595,8 @@ in production, other are work in progress: +-----------------+-------------------------------+----------------------------+ | Language | Contact | Links | +=================+===============================+============================+ -| Arabic (ar) | Ibrahim Elbouhissi | `GitHub `_ | +| Arabic (ar) | `Abdur-Rahmaan Janhangeer | `GitHub `_ | +| | `_ | | +-----------------+-------------------------------+----------------------------+ | Bengali as | `Kushal Das `_ | `GitHub `_ | | spoken in | | | @@ -1654,6 +1655,7 @@ in production, other are work in progress: .. _bpo_kushal: https://bugs.python.org/user16382 .. _bpo_mdk: https://bugs.python.org/user23063 .. _bpo_oonid: https://bugs.python.org/user32660 +.. _bpo_osdotsystem: https://bugs.python.org/user28057 .. _bpo_zhsj: https://bugs.python.org/user24811 .. 
_bpo_mattwang44: https://bugs.python.org/user39654 .. _chat_pt_br: https://t.me/pybr_i18n From 75e0e7ece4e967cac88d4a707a5c52c98d9b33c6 Mon Sep 17 00:00:00 2001 From: Pablo Galindo Salgado Date: Sun, 7 Nov 2021 23:21:36 +0000 Subject: [PATCH 030/722] Fix grammar in the title of the Parser guide (#762) --- parser.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/parser.rst b/parser.rst index 27ce46d62..491074022 100644 --- a/parser.rst +++ b/parser.rst @@ -1,6 +1,6 @@ .. _parser: -Guide of CPython's Parser +Guide to CPython's Parser ========================= :Author: Pablo Galindo Salgado From 46590a51d1e1224e9b987cb4c33b313937fd5ecd Mon Sep 17 00:00:00 2001 From: Itamar Ostricher Date: Sun, 14 Nov 2021 10:29:40 -0800 Subject: [PATCH 031/722] Fix typo in garbage_collector (#768) Minor fix / nit --- garbage_collector.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/garbage_collector.rst b/garbage_collector.rst index 0dbac59fc..7e15a4ea2 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -303,7 +303,7 @@ In order to limit the time each garbage collection takes, the GC uses a popular optimization: generations. The main idea behind this concept is the assumption that most objects have a very short lifespan and can thus be collected shortly after their creation. This has proven to be very close to the reality of many Python programs as -many temporarily objects are created and destroyed very fast. The older an object is +many temporary objects are created and destroyed very fast. The older an object is the less likely it is that it will become unreachable. To take advantage of this fact, all container objects are segregated into From 3036184234b4821b380bec0dd4e2bbf81ae812b1 Mon Sep 17 00:00:00 2001 From: Itamar Ostricher Date: Sun, 14 Nov 2021 10:29:58 -0800 Subject: [PATCH 032/722] Update reference to the parser from LL(1) to PEG (#767) --- langchanges.rst | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/langchanges.rst b/langchanges.rst index f7984d608..3dbaf6d47 100644 --- a/langchanges.rst +++ b/langchanges.rst @@ -46,9 +46,8 @@ the change. This process is the Python Enhancement Proposal (PEP) process. You will first need a PEP that you will present to python-ideas. You may be a little hazy on the technical details as various core developers can help with that, but do realize that if you do not present your idea to python-ideas or -python-list ahead of time you may find out it is technically not possible -(e.g., Python's parser will not support the grammar change as it is an LL(1) -parser). Expect extensive comments on the PEP, some of which will be negative. +python-list ahead of time you may find out it is technically not possible. +Expect extensive comments on the PEP, some of which will be negative. Once your PEP has been modified to be of proper quality and to take into account comments made on python-ideas, it may proceed to python-dev. There it From 90bd4308657ea61ef740c648ac87e49dde819fa3 Mon Sep 17 00:00:00 2001 From: Dino Viehland Date: Mon, 15 Nov 2021 13:29:39 -0800 Subject: [PATCH 033/722] Update motivations.rst for Dino Way out of date... --- motivations.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/motivations.rst b/motivations.rst index f1a1a41cc..27361311d 100644 --- a/motivations.rst +++ b/motivations.rst @@ -244,16 +244,16 @@ participating in the CPython core development process: .. 
topic:: Dino Viehland (United States) - * Microsoft: ``_ (Software Engineer) - * Email address: dinov@microsoft.com + * Meta (Software Engineer) + * Email address: dinoviehland@gmail.com Dino started working with Python in 2005 by working on IronPython, an implementation of Python running on .NET. He was one of the primary developers on the project for 6 years. After that he started the Python Tools for Visual Studio project focusing on providing advanced code completion and debugging features for Python. Today he works on - `Azure Notebooks `_ bringing the Python based - Jupyter notebook as a hosted on-line service. + `Cinder `_ improving Python + performance for Instagram. .. topic:: Carol Willing (United States) From 3f372f9a1c5416df0f3efaa529c23c0a2fc4aac3 Mon Sep 17 00:00:00 2001 From: Arthur Milchior Date: Tue, 16 Nov 2021 16:06:47 +0100 Subject: [PATCH 034/722] Replacing "country" by "translation" (GH-757) While it is easy to understand what was meant by "country", this is inaccurate. Some countries have multiple languages, each having their translation (e.g. Canada with English and French) and some languages are spoken in many countries, and the regional variation are small enough that there will probably never be multiple translations. (While Canadian French and France French are so different that it's sometimes hard for French native to understand French Canadian speaking, I doubt anybody will want to create a ca-fr translation, given that the technical language will probably be almost the same.) --- documenting.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/documenting.rst b/documenting.rst index 4649bab3d..852b16237 100644 --- a/documenting.rst +++ b/documenting.rst @@ -1771,7 +1771,7 @@ Here's what's we're using: How a coordinator is elected? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -There is no election, each country have to sort this out. Here are some suggestions. +There is no election, each translation have to sort this out. Here are some suggestions. - Coordinator requests are to be public on doc-sig mailing list. - If the given language have a native core dev, the core dev have its From 266f1c0eb60cc2165f833da0d6d1d6f4772ea2fd Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Tue, 16 Nov 2021 09:56:22 -0800 Subject: [PATCH 035/722] Update furo requirement from <=2021.7.5b38 to <2021.11.16 (#770) Updates the requirements on [furo](https://github.com/pradyunsg/furo) to permit the latest version. - [Release notes](https://github.com/pradyunsg/furo/releases) - [Changelog](https://github.com/pradyunsg/furo/blob/main/docs/changelog.md) - [Commits](https://github.com/pradyunsg/furo/compare/2020.08.14.beta5...2021.11.15) --- updated-dependencies: - dependency-name: furo dependency-type: direct:production ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/requirements.txt b/requirements.txt index 829318d20..bd66e003c 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,3 +1,3 @@ Sphinx==4.1.2 -furo<=2021.7.5b38 +furo<2021.11.16 sphinx_copybutton>=0.3.3 From 1ff3dc740e946aace6b62ff85821eafd4b98d0a4 Mon Sep 17 00:00:00 2001 From: Itamar Ostricher Date: Fri, 19 Nov 2021 00:10:26 -0800 Subject: [PATCH 036/722] Four fixes in parser page (#772) Two duplications, a typo, and minor grammar fix. 
--- parser.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/parser.rst b/parser.rst index 491074022..06434fc57 100644 --- a/parser.rst +++ b/parser.rst @@ -110,7 +110,7 @@ the following two rules (in these examples, a token is an individual character): In a regular EBNF grammar, both rules specify the language ``{aa, aaa}`` but in PEG, one of these two rules accepts the string ``aaa`` but not the string -``aa``. The other does the opposite -- it accepts the string the string ``aa`` +``aa``. The other does the opposite -- it accepts the string ``aa`` but not the string ``aaa``. The rule ``('a'|'aa')'a'`` does not accept ``aaa`` because ``'a'|'aa'`` consumes the first ``a``, letting the final ``a`` in the rule consume the second, and leaving out the third ``a``. @@ -345,14 +345,14 @@ inside curly-braces, which specifies the return value of the alternative:: If the action is ommited, a default action is generated: -* If there's a single name in the rule in the rule, it gets returned. +* If there's a single name in the rule, it gets returned. * If there is more than one name in the rule, a collection with all parsed expressions gets returned (the type of the collection will be different in C and Python). This default behaviour is primarily made for very simple situations and for -debugging pourposes. +debugging purposes. The full meta-grammar for the grammars supported by the PEG generator is: @@ -863,7 +863,7 @@ Verbose mode ~~~~~~~~~~~~ When Python is compiled in debug mode (by adding ``--with-pydebug`` when running the configure step in Linux or by -adding ``-d`` when calling the :file:`PCbuild/python.bat` script in Windows), is possible to activate a **very** verbose +adding ``-d`` when calling the :file:`PCbuild/python.bat` script in Windows), it is possible to activate a **very** verbose mode in the generated parser. This is very useful to debug the generated parser and to understand how it works, but it can be a bit hard to understand at first. From 9d880b6f54aa2ee193fccdb1242a52923df8c8aa Mon Sep 17 00:00:00 2001 From: Carl Friedrich Bolz-Tereick Date: Fri, 19 Nov 2021 22:16:43 +0100 Subject: [PATCH 037/722] =?UTF-8?q?Improvements=20to=20"Guide=20to=20CPyth?= =?UTF-8?q?on=E2=80=99s=20Parser"=20(#763)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * put references to names from rules into ``backticks`` * two typos * mention that ~ is called "the cut" (The grammar gives that name, and I as a prolog programmer was looking for the terminology ;-)) --- parser.rst | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/parser.rst b/parser.rst index 06434fc57..6001051a6 100644 --- a/parser.rst +++ b/parser.rst @@ -169,7 +169,7 @@ Python-style comments. ``e1 e2`` ''''''''' -Match e1, then match e2. +Match ``e1``, then match ``e2``. :: @@ -178,7 +178,7 @@ Match e1, then match e2. ``e1 | e2`` ''''''''''' -Match e1 or e2. +Match ``e1`` or ``e2``. The first alternative can also appear on the line after the rule name for formatting purposes. In that case, a \| must be used before the @@ -193,7 +193,7 @@ first alternative, like so: ``( e )`` ''''''''' -Match e. +Match ``e``. :: @@ -209,7 +209,7 @@ operator together with the repeat operators: ``[ e ] or e?`` ''''''''''''''' -Optionally match e. +Optionally match ``e``. :: @@ -225,7 +225,7 @@ optional: ``e*`` '''''' -Match zero or more occurrences of e. +Match zero or more occurrences of ``e``. 
:: @@ -234,7 +234,7 @@ Match zero or more occurrences of e. ``e+`` '''''' -Match one or more occurrences of e. +Match one or more occurrences of ``e``. :: @@ -243,7 +243,7 @@ Match one or more occurrences of e. ``s.e+`` '''''''' -Match one or more occurrences of e, separated by s. The generated parse +Match one or more occurrences of ``e``, separated by ``s``. The generated parse tree does not include the separator. This is otherwise identical to ``(e (s e)*)``. @@ -256,14 +256,14 @@ tree does not include the separator. This is otherwise identical to .. _peg-positive-lookahead: -Succeed if e can be parsed, without consuming any input. +Succeed if ``e`` can be parsed, without consuming any input. ``!e`` '''''' .. _peg-negative-lookahead: -Fail if e can be parsed, without consuming any input. +Fail if ``e`` can be parsed, without consuming any input. An example taken from the Python grammar specifies that a primary consists of an atom, which is not followed by a ``.`` or a ``(`` or a @@ -276,14 +276,15 @@ consists of an atom, which is not followed by a ``.`` or a ``(`` or a ``~`` '''''' -Commit to the current alternative, even if it fails to parse. +Commit to the current alternative, even if it fails to parse (this is called +the "cut"). :: rule_name: '(' ~ some_rule ')' | some_alt In this example, if a left parenthesis is parsed, then the other -alternative won’t be considered, even if some_rule or ‘)’ fail to be +alternative won’t be considered, even if some_rule or ``)`` fail to be parsed. Left recursion @@ -343,7 +344,7 @@ inside curly-braces, which specifies the return value of the alternative:: | first_alt1 first_alt2 { first_alt1 } | second_alt1 second_alt2 { second_alt1 } -If the action is ommited, a default action is generated: +If the action is omitted, a default action is generated: * If there's a single name in the rule, it gets returned. From ed2af4311a8db23b0d6267fea45090c22a3d8ce7 Mon Sep 17 00:00:00 2001 From: Carl Friedrich Bolz-Tereick Date: Fri, 19 Nov 2021 22:16:59 +0100 Subject: [PATCH 038/722] add warning about actions mutating things (#766) --- parser.rst | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/parser.rst b/parser.rst index 6001051a6..c773b00c7 100644 --- a/parser.rst +++ b/parser.rst @@ -355,6 +355,15 @@ If the action is omitted, a default action is generated: This default behaviour is primarily made for very simple situations and for debugging purposes. +.. warning:: + + It's important that the actions don't mutate any AST nodes that are passed + into them via variables referring to other rules. The reason for mutation + being not allowed is that the AST nodes are cached by memoization and could + potentially be reused in a different context, where the mutation would be + invalid. If an action needs to change an AST node, it should instead make a + new copy of the node and change that. + The full meta-grammar for the grammars supported by the PEG generator is: :: From f3ecfb8a525a971662413c1ecdc0c8d8fc540393 Mon Sep 17 00:00:00 2001 From: Pablo Galindo Salgado Date: Sun, 21 Nov 2021 20:41:08 +0000 Subject: [PATCH 039/722] Update AST actions helper location in the PEG parser (#773) --- parser.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/parser.rst b/parser.rst index c773b00c7..1bc795dec 100644 --- a/parser.rst +++ b/parser.rst @@ -808,7 +808,7 @@ when they succeed. 
Constructing these objects can be quite cumbersome (see the :ref:`AST compiler section ` for more information on how these objects are constructed and how they are used by the compiler) so special helper functions are used. These functions are declared in the -:file:`Parser/pegen.h` header file and defined in the :file:`Parser/pegen.c` +:file:`Parser/pegen.h` header file and defined in the :file:`Parser/action_helpers.c` file. These functions allow you to join AST sequences, get specific elements from them or to do extra processing on the generated tree. @@ -822,7 +822,7 @@ from them or to do extra processing on the generated tree. As a general rule, if an action spawns multiple lines or requires something more complicated than a single expression of C code, is normally better to create a -custom helper in :file:`Parser/pegen.c` and expose it in the +custom helper in :file:`Parser/action_helpers.c` and expose it in the :file:`Parser/pegen.h` header file so it can be used from the grammar. If the parsing succeeds, the parser **must** return a **valid** AST object. From 0b95382d21c77eed3f496585844d683f43603aa9 Mon Sep 17 00:00:00 2001 From: Ee Durbin Date: Thu, 2 Dec 2021 13:40:16 -0500 Subject: [PATCH 040/722] rename default branch to main (#776) also covers a handful of external references who have similarly made this change --- .github/workflows/ci.yml | 2 +- runtests.rst | 2 +- setup.rst | 2 +- tools/templates/customsourcelink.html | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 09cd927e7..a01ee9f37 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -3,7 +3,7 @@ name: Tests on: pull_request: push: - branches: master + branches: main jobs: test: diff --git a/runtests.rst b/runtests.rst index 2d6c3eb48..e018faa25 100644 --- a/runtests.rst +++ b/runtests.rst @@ -132,4 +132,4 @@ Benchmarking is useful to test that a change does not degrade performance. `The Python Benchmark Suite `_ has a collection of benchmarks for all Python implementations. Documentation about running the benchmarks is in the `README.txt -`_ of the repo. +`_ of the repo. diff --git a/setup.rst b/setup.rst index 782a498a5..7ef2bc460 100644 --- a/setup.rst +++ b/setup.rst @@ -13,7 +13,7 @@ directory structure of the CPython source code. Alternatively, if you have `Docker `_ installed you might want to use `our official images -`_. These +`_. These contain the latest releases of several Python versions, along with git head, and are provided for development and testing purposes only. diff --git a/tools/templates/customsourcelink.html b/tools/templates/customsourcelink.html index 2487a0b04..a50734f18 100644 --- a/tools/templates/customsourcelink.html +++ b/tools/templates/customsourcelink.html @@ -3,7 +3,7 @@

{{ _('This Page') }}