gh-137353: Add t-string support to gettext + pygettext #137354
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Please split this into separate PRs for pygettext and gettext. They also need blurbs.
There would be some overlap between the two PRs, since parts of the code are required both for gettext and pygettext. Do you still prefer having two separate PRs even in this early stage? I can do it of course, but my preference would be splitting it later, once any discussions that may come up are resolved and I made whatever changes may be necessary...
Force-pushed from 0d75d3c to 2b7fa77.
# utils for t-string handling in gettext translation + pygettext extraction
# TBD where they should go, and whether this should be a public API or internal,
I think we should limit what is exposed in gettext apart from the core API.
Yeah, that's why I underscored all of them for now.
FWIW, I think exposing at least the utils to convert a template string to a format string makes sense, because tools like Babel would need to use the exact same logic, or risk inconsistencies between implementations.
# beneficial to have in stdlib so any implementation can re-use it without
# risking diverging behavior for the same expression between implementations

class _NameTooComplexError(ValueError):
This should IMO be documented, since it is “public”. However, I don’t like it; I think a general (new) gettext error (or, much simpler, a ValueError) would be clearer. Thoughts, Tomas?
Initially I just used this to avoid catching some other (unexpected) ValueError that may or may not come out of the ast visitor. A custom exception just for this may indeed be overkill.
But I like the idea of a GettextError :)
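To illustrate the trade-off discussed here, a minimal sketch of a dedicated error type follows. The GettextError class is hypothetical (the name comes from the comment above, and it is assumed not to exist in the stdlib), and check_placeholder is an illustrative stand-in for the real validation, not the PR's code.

```python
class GettextError(ValueError):
    """Hypothetical gettext-specific error; assumed, not an existing stdlib class."""


def check_placeholder(expr: str) -> str:
    # Illustrative rule only: accept dotted names, reject anything else.
    if not all(part.isidentifier() for part in expr.split(".")):
        raise GettextError(f"expression too complex: {expr!r}")
    return expr


def safe_check(expr: str):
    try:
        return check_placeholder(expr)
    except GettextError:
        # Only the expected failure is handled here; an unrelated
        # ValueError raised elsewhere would still propagate.
        return None
```

Because the sketch subclasses ValueError, callers that already catch ValueError keep working, while callers that want to be strict can catch only the narrower type.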
Lib/gettext.py
Outdated
def _template_node_to_format(node: ast.TemplateStr) -> str:
    """Generate a format string from a template string AST node.

    This fails with a :exc:`_NameTooComplexError` in case the expression is
Docstrings are not restructured text in CPython.
updated, lmk in case you don't like backticks around class names etc
Lib/test/test_gettext.py
Outdated
@@ -38,6 +38,27 @@
bmsgd2luayAoaW4gIm15IG90aGVyIGNvbnRleHQiKQB3aW5rIHdpbmsA
'''

GNU_TMO_DATA = b'''\
It’s not a particularly clear name IMO.
Agreed, I just stayed consistent w/ the existing ones. Updated to something more meaningful.
Force-pushed from 2b7fa77 to 665ae49.
We generally tend to avoid mixing large changes to different modules (a tool in this case), as it makes the PR much larger. Also, please avoid force-pushing; gh is unable to distinguish the differences between the pushes.
Force-pushed from 665ae49 to 23ad71a.
Force-pushed from 23ad71a to d603887.
_(t'Weird {meow[69j]}')
_(t'Weird {meow[...]}')
FWIW I think these two are stupid and will never be used, but I didn't see much value in adding an extra check for certain types of Constant values just to avoid those.
@property
def name(self) -> str:
    name = '__'.join(self._name_parts)
I used __ as a separator between parts since initially I thought that it might be nice to make it clearer that the placeholder for {user.name} isn't just something named user_name.
However, maybe using a single underscore would be fine here: t'{user.name} {user_name}' would simply fail due to the check that a name doesn't map to different expressions. I can't come up with a good example where you would use foo.bar and foo_bar in the same string.
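The separator question can be made concrete with a small sketch. This is not the PR's implementation: placeholder_name and collect_placeholders are hypothetical helpers, and only plain names and attribute access are handled, as an assumption to keep the example short.

```python
import ast


def placeholder_name(expr: str, sep: str = "__") -> str:
    """Derive a placeholder name from a simple expression (hypothetical helper)."""
    node = ast.parse(expr, mode="eval").body
    parts = []
    # Walk from the outermost attribute down to the base name.
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if not isinstance(node, ast.Name):
        raise ValueError(f"expression too complex: {expr!r}")
    parts.append(node.id)
    return sep.join(reversed(parts))


def collect_placeholders(exprs, sep: str = "__") -> dict:
    """Map placeholder names to expressions, rejecting ambiguous names."""
    seen = {}
    for expr in exprs:
        name = placeholder_name(expr, sep)
        # The same name must never stand for two different expressions.
        if seen.setdefault(name, expr) != expr:
            raise ValueError(f"{name!r} maps to both {seen[name]!r} and {expr!r}")
    return seen
```

With the double underscore, user.name and user_name get distinct names; with a single underscore they collide and the check rejects the string, which is the behavior described in the comment above.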
# We use this weird naming of the gettext functions here to allow
# easy extraction of the .po file using pygettext; see the comment
# next to the po file content near the bottom of this file on how
# to regenerate it.
self.gettexT = self.t.gettext
self.ngettexT = self.t.ngettext
self.pgettexT = self.t.pgettext
self.npgettexT = self.t.npgettext
I don't expect to keep these weird names, but for the sake of making it easier to update the tests in the future, I'd probably go for something like t_gettext etc. in here.
OK, will keep this in mind for future changes.
Please see #137353 for details; TL;DR is that with this PR you can use t-strings for i18n, instead of having to call _(...).format(...)
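For context, a minimal sketch of the existing pattern that the PR aims to replace. NullTranslations stands in for a real catalog here (an assumption for the sake of a runnable example), so the message comes back untranslated.

```python
import gettext

# NullTranslations returns the message unchanged, standing in for a real catalog.
_ = gettext.NullTranslations().gettext

name = "Ada"
# The format string is looked up first, then the values are filled in.
greeting = _("Hello, {name}!").format(name=name)
print(greeting)
```

With the proposed change, the call would presumably be written in the form used in the PR's tests, e.g. _(t'Hello, {name}!'); that form is not run here because it depends on t-string support and on the PR itself.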