Bug report
Bug description:
There is a bug in the f-string implementation, starting around version 3.12, where a "#" inside a self-documenting expression (one using the "=" specifier) causes part of the echoed expression text to be dropped. For example:
f"{'#'=}"
gives
"''#'"
but should give
"'#'='#'".
Note: The following explanation was found by asking https://chatgpt.com/codex to locate the problem. This appears to me to be a correct explanation, but please use with caution.
The bug comes from the change that started stripping text after a “#” when capturing the expression text for an f-string debug expression (f'{expr=}'). This logic was introduced in commit d59feb5 (“gh-112243: Don’t include comments in f-string debug expressions”) dated 2023‑11‑20.
Inside Parser/lexer/lexer.c, set_fstring_expr() now scans the expression buffer for “#” and removes everything from that point until the next newline. The relevant lines introduced in that commit are:
// Check if there is a # character in the expression
int hash_detected = 0;
for (Py_ssize_t i = 0; i < tok_mode->last_expr_size - tok_mode->last_expr_end; i++) {
    if (tok_mode->last_expr_buffer[i] == '#') {
        hash_detected = 1;
        break;
    }
}
if (hash_detected) {
    Py_ssize_t input_length = tok_mode->last_expr_size - tok_mode->last_expr_end;
    char *result = (char *)PyMem_Malloc((input_length + 1) * sizeof(char));
    ...
    for (i = 0, j = 0; i < input_length; i++) {
        if (tok_mode->last_expr_buffer[i] == '#') {
            // Skip characters until newline or end of string
            while (tok_mode->last_expr_buffer[i] != '\0' && i < input_length) {
                if (tok_mode->last_expr_buffer[i] == '\n') {
                    result[j++] = tok_mode->last_expr_buffer[i];
                    break;
                }
                i++;
            }
        } else {
            result[j++] = tok_mode->last_expr_buffer[i];
        }
    }
    result[j] = '\0';
    res = PyUnicode_DecodeUTF8(result, j, NULL);
    PyMem_Free(result);
} else {
    res = PyUnicode_DecodeUTF8(
        tok_mode->last_expr_buffer,
        tok_mode->last_expr_size - tok_mode->last_expr_end,
        NULL
    );
}
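To make the failure concrete, here is a minimal standalone sketch of the stripping loop in set_fstring_expr(), rewritten to work on an ordinary NUL-terminated C string instead of the lexer's tok_mode buffer. The function name strip_hash_naive is hypothetical. The test input '#'= is also an assumption: it guesses, from the observed output rather than from reading the lexer, that the captured debug text for f"{'#'=}" is the expression followed by the "=".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical standalone copy of the loop in set_fstring_expr():
 * every '#' starts a "comment" that runs to the next newline (which is
 * kept) or to the end of the input. Quotes are ignored, which is the bug.
 * `out` must have room for strlen(in) + 1 bytes.
 * Returns the length of the stripped text. */
size_t strip_hash_naive(const char *in, char *out)
{
    size_t n = strlen(in);
    size_t i, j;
    for (i = 0, j = 0; i < n; i++) {
        if (in[i] == '#') {
            /* Skip until newline or end of input; keep the newline. */
            while (i < n) {
                if (in[i] == '\n') {
                    out[j++] = in[i];
                    break;
                }
                i++;
            }
        } else {
            out[j++] = in[i];
        }
    }
    out[j] = '\0';
    return j;
}
```

Applied to the assumed captured text '#'=, the sketch keeps only the opening quote ('), because skipping starts at the "#" inside the string literal. That surviving quote followed by repr('#') would be exactly the reported output ''#'.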
Because this heuristic doesn’t check whether “#” is inside a quoted string, an expression such as '#' is mistakenly treated as starting a comment, leading to the debug string being truncated. This code was added in commit d59feb5, visible in the repository’s history:
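One way to make such a scan respect string literals is to track whether it is currently inside a quoted literal and to treat "#" as a comment start only outside one. The sketch below is a hypothetical illustration of that idea (strip_hash_quote_aware is an invented name), not necessarily how CPython fixed the issue. It deliberately ignores string prefixes, nested f-string replacement fields, and triple-quoted strings that contain a lone quote of the same kind.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical quote-aware variant (not CPython's code): '#' starts a
 * comment only when the scanner is outside a '...' or "..." literal.
 * Backslash escapes inside a literal are copied verbatim. Simplification:
 * each quote toggles the literal state, so string prefixes, nested
 * f-string fields and tricky triple-quoted strings are not modelled.
 * `out` must have room for strlen(in) + 1 bytes. */
size_t strip_hash_quote_aware(const char *in, char *out)
{
    size_t n = strlen(in);
    size_t j = 0;
    char quote = 0;  /* 0 outside a literal, else the opening quote char */
    for (size_t i = 0; i < n; i++) {
        char c = in[i];
        if (quote) {
            out[j++] = c;
            if (c == '\\' && i + 1 < n) {
                out[j++] = in[++i];  /* copy the escaped character */
            } else if (c == quote) {
                quote = 0;           /* literal closed */
            }
        } else if (c == '\'' || c == '"') {
            quote = c;               /* literal opened */
            out[j++] = c;
        } else if (c == '#') {
            /* Real comment: drop text up to (not including) the newline. */
            while (i + 1 < n && in[i + 1] != '\n') {
                i++;
            }
        } else {
            out[j++] = c;
        }
    }
    out[j] = '\0';
    return j;
}
```

With this variant, '#'= comes back unchanged, while a genuine comment is still removed: "a # c\nb" becomes "a \nb", the same result the original loop gives for that input.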
commit d59feb5dbe5395615d06c30a95e6a6a9b7681d4d
Author: Pablo Galindo Salgado <Pablogsal@gmail.com>
Date: Mon Nov 20 15:18:24 2023 +0000
gh-112243: Don't include comments in f-string debug expressions (#112284)
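Therefore the bug was most likely introduced by commit d59feb5, which modified Parser/lexer/lexer.c and added the faulty handling of “#” in f-string debug expressions. Since the commit is dated after the 3.12.0 release (October 2023), 3.12 presumably picked it up through a backport to a later 3.12 bugfix release rather than during the 3.12 development cycle.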
CPython versions tested on:
3.12
Operating systems tested on:
Linux