[DOC] Tweaks for String#dump #13883
doc/string/dump.rdoc
Outdated
s # => "\a\b\t\n\v\f\r"
s.dump # => "\"\\a\\b\\t\\n\\v\\f\\r\""
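A minimal sketch, assuming Ruby 2.5 or later (where String#undump is available), showing that the escaped form produced by String#dump for these control characters can be turned back into the original string:

```ruby
# Control characters become backslash escapes in the dumped form.
original = "\a\b\t\n\v\f\r"
dumped = original.dump

puts dumped                     # the escaped text, wrapped in double quotes
puts dumped.length              # 16: two quotes plus seven two-character escapes
puts dumped.undump == original  # String#undump reverses String#dump
```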
Multi-byte characters are rendered in Unicode notation:
It is only for Unicode encodings.
Does this mean that there are multi-byte characters that are not in Unicode encodings? If so, I'll need examples.
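One possible example of multi-byte characters outside a Unicode encoding is EUC-JP, a Japanese multi-byte encoding (the choice of EUC-JP and of the sample string here is illustrative). In the sketch below each character occupies two bytes, and the dump shows those bytes as \xHH escapes rather than \u notation. As far as I can tell from CRuby's string.c, \u notation is emitted only for UTF-8 receivers, and because EUC-JP is ASCII-compatible, no .dup.force_encoding suffix is appended.

```ruby
# 'こんにちは' encoded as EUC-JP: two bytes per character, no Unicode encoding involved.
euc = 'こんにちは'.encode('EUC-JP')
dumped = euc.dump

puts euc.bytesize                       # 10 bytes for 5 characters
puts dumped[0, 9]                       # the opening quote plus the \x escapes for "こ"
puts dumped.include?('\u')              # no \uXXXX notation for a non-UTF-8 string
puts dumped.include?('force_encoding')  # no suffix: EUC-JP is ASCII-compatible
puts dumped.encoding                    # the dump keeps the receiver's encoding
puts dumped.undump == euc               # round-trips back to the EUC-JP string
```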
Fixed.
@peterzhu2118, I'll need help with this:
For example: 'тест'.dump # => "\"\\u0442\\u0435\\u0441\\u0442\""
'тест'.encode('utf-16le').dump # => "\"B\\x045\\x04A\\x04B\\x04\".dup.force_encoding(\"UTF-16LE\")"
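The .dup.force_encoding(...) suffix in the UTF-16LE example makes the dumped text valid Ruby source that rebuilds both the bytes and the encoding. The following small sketch uses eval only for illustration; on untrusted input, String#undump (available since Ruby 2.5) is the safer inverse because it does not execute code.

```ruby
utf16 = 'тест'.encode('UTF-16LE')
dumped = utf16.dump

# The dump is a double-quoted literal of the raw bytes plus an encoding suffix.
puts dumped.end_with?('.dup.force_encoding("UTF-16LE")')

# Evaluating the dump as Ruby code rebuilds the bytes and restores the encoding.
rebuilt = eval(dumped)
puts rebuilt.encoding  # UTF-16LE
puts rebuilt == utf16

# String#undump performs the same reconstruction without executing code.
puts dumped.undump == utf16
```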
Sorry, I don't understand what you mean by this.
Thanks, @peterzhu2118. What you've written above answers both questions (however poorly they're posed).
s = 'hello'
s.encoding # => #<Encoding:UTF-8>
s.dump # => "\"hello\""
s.encode('utf-16').dump # => "\"\\xFE\\xFF\\x00h\\x00e\\x00l\\x00l\\x00o\".dup.force_encoding(\"UTF-16\")"
s.encode('utf-16le').dump # => "\"h\\x00e\\x00l\\x00l\\x00o\\x00\".dup.force_encoding(\"UTF-16LE\")"

s = 'тест'
s.encoding # => #<Encoding:UTF-8>
s.dump # => "\"\\u0442\\u0435\\u0441\\u0442\""
s.encode('utf-16').dump # => "\"\\xFE\\xFF\\x04B\\x045\\x04A\\x04B\".dup.force_encoding(\"UTF-16\")"
s.encode('utf-16le').dump # => "\"B\\x045\\x04A\\x04B\\x04\".dup.force_encoding(\"UTF-16LE\")"

s = 'こんにちは'
s.encoding # => #<Encoding:UTF-8>
s.dump # => "\"\\u3053\\u3093\\u306B\\u3061\\u306F\""
s.encode('utf-16').dump # => "\"\\xFE\\xFF0S0\\x930k0a0o\".dup.force_encoding(\"UTF-16\")"
s.encode('utf-16le').dump # => "\"S0\\x930k0a0o0\".dup.force_encoding(\"UTF-16LE\")"
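The examples above can also be checked programmatically. The sketch below reuses their sample strings and encoding names (the hash keys are made up for illustration) and records, for each combination, whether the dump uses \u notation, whether it ends with the .dup.force_encoding suffix, and whether String#undump restores the original. Under current CRuby I would expect \u notation only in the UTF-8 rows with non-ASCII text, and the suffix only in the UTF-16 and UTF-16LE rows; that is the distinction a separate section on non-UTF-8 encodings would need to explain.

```ruby
samples   = ['hello', 'тест', 'こんにちは']
encodings = %w[UTF-8 UTF-16 UTF-16LE]

results = samples.product(encodings).map do |text, name|
  str    = text.encode(name)
  dumped = str.dump
  {
    text: text,
    encoding: name,
    # \uXXXX escapes appear in the dump (checked as a plain substring).
    unicode_escapes: dumped.include?('\u'),
    # Non-ASCII-compatible encodings get an encoding-restoring suffix.
    suffix: dumped.end_with?(%(.dup.force_encoding("#{str.encoding}"))),
    # String#undump (Ruby 2.5+) should give back the encoded string.
    round_trip: dumped.undump == str
  }
end

results.each { |row| p row }
```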
I think it would be better to move the examples of non-UTF-8 encodings to a separate section with some text describing them (e.g., explaining that the bytes are shown in hexadecimal format and that .dup.force_encoding(<encoding name>) is appended). Non-UTF-8 encodings are more of an edge case than a commonly used one.
I've moved the cited lines to the end. I think you want other changes, but I'm not sure what exactly is needed. Can you fix up one, as a guide for me?
@peterzhu2118, I'll take another shot at this; marking as Draft in the interim.
@peterzhu2118, I take it back. I don't know what to do with this.
I opened #13965
No description provided.