f-string debug expressions containing hash '#' are malformed #137182

@kcdodd

Bug report

Bug description:

There is a bug in the f-string implementation, starting around version 3.12: when a debug expression (the "=" specifier) contains a "#", everything from the "#" onward is dropped from the echoed expression text. For example:

f"{'#'=}"

gives

"''#'"

but should give

"'#'='#'".

Note: The following explanation was found by asking https://chatgpt.com/codex to locate the problem. This appears to me to be a correct explanation, but please use with caution.

The bug comes from the change that started stripping text after a “#” when capturing the expression text for an f-string debug expression (f'{expr=}'). This logic was introduced in commit d59feb5 (“gh-112243: Don’t include comments in f-string debug expressions”) dated 2023‑11‑20.

Inside Parser/lexer/lexer.c, set_fstring_expr() now scans the expression buffer for “#” and removes everything from that point until the next newline. The relevant lines introduced in that commit are:

 // Check if there is a # character in the expression
 int hash_detected = 0;
 for (Py_ssize_t i = 0; i < tok_mode->last_expr_size - tok_mode->last_expr_end; i++) {
     if (tok_mode->last_expr_buffer[i] == '#') {
         hash_detected = 1;
         break;
     }
 }

 if (hash_detected) {
     Py_ssize_t input_length = tok_mode->last_expr_size - tok_mode->last_expr_end;
     char *result = (char *)PyMem_Malloc((input_length + 1) * sizeof(char));
     ...
     for (i = 0, j = 0; i < input_length; i++) {
         if (tok_mode->last_expr_buffer[i] == '#') {
             // Skip characters until newline or end of string
             while (tok_mode->last_expr_buffer[i] != '\0' && i < input_length) {
                 if (tok_mode->last_expr_buffer[i] == '\n') {
                     result[j++] = tok_mode->last_expr_buffer[i];
                     break;
                 }
                 i++;
             }
         } else {
             result[j++] = tok_mode->last_expr_buffer[i];
         }
     }
     result[j] = '\0';
     res = PyUnicode_DecodeUTF8(result, j, NULL);
     PyMem_Free(result);
 } else {
     res = PyUnicode_DecodeUTF8(
         tok_mode->last_expr_buffer,
         tok_mode->last_expr_size - tok_mode->last_expr_end,
         NULL
     );
 }
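To make the effect concrete, here is a rough Python model of the loop above. It is a sketch written for this report, not CPython's code; the function name strip_hash_naive is made up for illustration.

```python
def strip_hash_naive(expr_text: str) -> str:
    """Rough model of the quoted C loop: drop text from '#' up to the next newline."""
    out = []
    i, n = 0, len(expr_text)
    while i < n:
        ch = expr_text[i]
        if ch == "#":
            # Skip everything up to the next newline or the end of the text.
            while i < n and expr_text[i] != "\n":
                i += 1
            if i < n:
                # The newline itself is kept, as in the C code.
                out.append("\n")
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# The expression text captured for f"{'#'=}" is '#'= (quotes included).
captured = "'#'="
print(repr(strip_hash_naive(captured) + repr("#")))
```

Under this model the captured text '#'= is reduced to a single quote character, and appending the repr of the value reproduces the malformed string shown at the top of the report.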

Because this heuristic doesn’t check whether “#” is inside a quoted string, an expression such as '#' is mistakenly treated as starting a comment, leading to the debug string being truncated. This code was added in commit d59feb5, visible in the repository’s history:

commit d59feb5dbe5395615d06c30a95e6a6a9b7681d4d
Author: Pablo Galindo Salgado <Pablogsal@gmail.com>
Date:   Mon Nov 20 15:18:24 2023 +0000

    gh-112243: Don't include comments in f-string debug expressions (#112284)

The likely cause of the bug is therefore commit d59feb5, which modified Parser/lexer/lexer.c. That commit landed in the 3.12 development cycle and introduced the faulty handling of “#” inside f-string debug expressions.
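For comparison, a quote-aware version of the same idea might look like the sketch below. This is only an illustration of the missing check, not a proposed patch and not how CPython handles it; the name strip_comments_quote_aware is hypothetical, and the sketch deliberately ignores triple-quoted strings and nested f-strings.

```python
def strip_comments_quote_aware(expr_text: str) -> str:
    """Drop '#' comments, but leave '#' alone inside simple quoted strings."""
    out = []
    quote = None  # quote character of the literal we are inside, if any
    i, n = 0, len(expr_text)
    while i < n:
        ch = expr_text[i]
        if quote is not None:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character so an escaped quote does not end the literal.
                out.append(expr_text[i + 1])
                i += 1
            elif ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
            out.append(ch)
        elif ch == "#":
            # A real comment: skip up to (not including) the newline.
            while i < n and expr_text[i] != "\n":
                i += 1
            continue  # the newline, if any, is copied on the next iteration
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(repr(strip_comments_quote_aware("'#'=")))
print(repr(strip_comments_quote_aware("x # note\n+ 1=")))
```

With this check in place the text '#'= passes through unchanged, while a genuine trailing comment is still removed up to the newline.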

CPython versions tested on:

3.12

Operating systems tested on:

Linux
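To check other interpreter versions, a small script along these lines could be used. It is a sketch: the helper name debug_hash_output is made up here, and the f-string is evaluated from source text on the assumption that interpreters older than 3.12 may reject a "#" inside an f-string expression outright.

```python
import sys

EXPECTED = "'#'='#'"  # expression text, "=", then the repr of the value
MALFORMED = "''#'"    # output reported in this issue


def debug_hash_output():
    """Evaluate f"{'#'=}" from source text; return None if it does not parse."""
    source = "f\"{'#'=}\""
    try:
        return eval(source)
    except SyntaxError:
        # Assumption: this interpreter does not accept '#' in f-string expressions.
        return None


result = debug_hash_output()
if result is None:
    status = "not parsed"
elif result == EXPECTED:
    status = "ok"
elif result == MALFORMED:
    status = "malformed (matches this report)"
else:
    status = "unexpected"
print(sys.version.split()[0], repr(result), status)
```

The script only reports what the running interpreter produces; it does not assume which versions are affected.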

Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs), topic-parser, type-bug (An unexpected behavior, bug, or error)
