Commit 3ba88d8

Fixed &nbsp; for XHTML in Shift_JIS
Fixes jhy#523
1 parent ddf4c1b commit 3ba88d8

3 files changed, +31 -1 lines changed

CHANGES

Lines changed: 5 additions & 0 deletions
@@ -18,6 +18,11 @@ jsoup changelog
 * Fixed an issue where a table nested within a TH cell would parse to an incorrect tree.
   <https://github.com/jhy/jsoup/issues/575>
 
+* When serializing a document using the XHTML encoding entities, if the character set did not support &nbsp; chars
+  (such as Shift_JIS), the character would be skipped. For visibility, will now always output &#xa0; when using XHTML
+  encoding entities (as &nbsp; is not defined), regardless of the output character set.
+  <https://github.com/jhy/jsoup/issues/523>
+
 *** Release 1.8.2 [2015-Apr-13]
 * Performance improvements for parsing HTML on Android, of 1.5x to 1.9x, with larger parses getting a bigger
   speed increase. For non-Android JREs, around 1.1x to 1.2x.
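The changelog's remark that `&nbsp;` "is not defined" refers to XML: XML 1.0 predefines only `&amp;`, `&lt;`, `&gt;`, `&apos;` and `&quot;`. An XML parser reading XHTML output without the XHTML DTD therefore rejects `&nbsp;`, while the numeric reference `&#xa0;` always decodes to U+00A0. The sketch below checks this with the JDK's built-in DOM parser; `XhtmlNbspCheck` and `parseText` are hypothetical names used only for illustration and are not part of jsoup.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class XhtmlNbspCheck {
    // Parses a small XML document that has no DTD and returns the root element's text,
    // or null if the parser rejects it as not well-formed.
    static String parseText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing "[Fatal Error]" to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (SAXException e) {
            // e.g. a reference to an undeclared entity such as &nbsp;
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // &nbsp; is not one of XML's five predefined entities, so parsing fails.
        System.out.println(parseText("<p>before&nbsp;after</p>"));  // prints "null"
        // The numeric reference is always defined and decodes to U+00A0.
        System.out.println("before\u00a0after".equals(parseText("<p>before&#xa0;after</p>")));  // prints "true"
    }
}
```

Because `&#xa0;` is spelled with plain ASCII characters (`&`, `#`, `x`, `a`, `0`, `;`), Shift_JIS and other ASCII-compatible output encodings can represent it, so the serialized form no longer depends on whether the output charset can encode U+00A0 itself.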

src/main/java/org/jsoup/nodes/Entities.java

Lines changed: 1 addition & 1 deletion
@@ -115,7 +115,7 @@ static void escape(StringBuilder accum, String string, Document.OutputSettings o
         if (escapeMode != EscapeMode.xhtml)
             accum.append("&nbsp;");
         else
-            accum.append(c);
+            accum.append("&#xa0;");
         break;
     case '<':
         if (!inAttribute)
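Before this change, XHTML mode appended the raw U+00A0 character, leaving its fate to the output encoder; afterwards it always writes the numeric reference. The following is a simplified, standalone model of that branch, assuming a hypothetical `NbspEscapeSketch` class with a two-value `Mode` enum standing in for `EscapeMode`. It handles only a handful of characters and is not jsoup's actual `Entities.escape`.

```java
final class NbspEscapeSketch {
    // Hypothetical stand-in for jsoup's Entities.EscapeMode (only the distinction that matters here).
    enum Mode { BASE, XHTML }

    // Simplified escaper modelled on the patched switch; it handles only '&', '<', '>' and U+00A0.
    static String escape(String text, Mode mode) {
        StringBuilder accum = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':
                    accum.append("&amp;");
                    break;
                case '\u00a0':
                    if (mode != Mode.XHTML)
                        accum.append("&nbsp;");   // HTML defines this named entity
                    else
                        accum.append("&#xa0;");   // after the fix: numeric reference, never the raw char
                    break;
                case '<':
                    accum.append("&lt;");
                    break;
                case '>':
                    accum.append("&gt;");
                    break;
                default:
                    accum.append(c);
            }
        }
        return accum.toString();
    }

    public static void main(String[] args) {
        String text = "before\u00a0after";
        System.out.println(escape(text, Mode.BASE));   // prints "before&nbsp;after"
        System.out.println(escape(text, Mode.XHTML));  // prints "before&#xa0;after"
    }
}
```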

src/test/java/org/jsoup/nodes/DocumentTest.java

Lines changed: 25 additions & 0 deletions
@@ -1,7 +1,9 @@
 package org.jsoup.nodes;
 
+import java.io.ByteArrayInputStream;
 import java.io.File;
 import java.io.IOException;
+import java.io.InputStream;
 import java.nio.charset.Charset;
 import org.jsoup.Jsoup;
 import org.jsoup.TextUtil;
@@ -382,4 +384,27 @@ private Document createXmlDocument(String version, String charset, boolean addDe
 
         return doc;
     }
+
+    @Test
+    public void testShiftJisRoundtrip() throws Exception {
+        String input =
+            "<html>"
+            + "<head>"
+            + "<meta http-equiv=\"content-type\" content=\"text/html; charset=Shift_JIS\" />"
+            + "</head>"
+            + "<body>"
+            + "before&nbsp;after"
+            + "</body>"
+            + "</html>";
+        InputStream is = new ByteArrayInputStream(input.getBytes(Charset.forName("ASCII")));
+
+        Document doc = Jsoup.parse(is, null, "http://example.com");
+        doc.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
+
+        String output = new String(doc.html().getBytes(doc.outputSettings().charset()), doc.outputSettings().charset());
+
+        assertFalse("Should not have contained a '?'.", output.contains("?"));
+        assertTrue("Should have contained a '&#xa0;' or a '&nbsp;'.",
+                output.contains("&#xa0;") || output.contains("&nbsp;"));
+    }
 }
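The test detects unencodable output by round-tripping through the output charset and searching for `'?'`, since `String.getBytes(Charset)` silently substitutes the charset's replacement bytes (typically `'?'`) for characters it cannot map. That works for this fixed input, but it would misfire on documents that legitimately contain `'?'`. One possible stricter check, sketched here as an alternative and not what the commit does, asks a `CharsetEncoder` to report unmappable characters instead of replacing them; `EncodabilityCheck` and `encodesCleanly` are hypothetical names.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EncodabilityCheck {
    // Returns true if every character in s can be encoded in cs without substitution.
    static boolean encodesCleanly(String s, Charset cs) {
        CharsetEncoder encoder = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            encoder.encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Charset ascii = StandardCharsets.US_ASCII;
        System.out.println(encodesCleanly("before&#xa0;after", ascii)); // prints "true"
        System.out.println(encodesCleanly("before\u00a0after", ascii)); // prints "false"
        System.out.println(encodesCleanly("a=1?b=2", ascii));           // prints "true": a literal '?' is not an error
        if (Charset.isSupported("Shift_JIS")) {
            // The result for a raw U+00A0 depends on the JDK's Shift_JIS mapping tables.
            System.out.println(encodesCleanly("before\u00a0after", Charset.forName("Shift_JIS")));
        }
    }
}
```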
