Commit 641a5b7

doc: improve build for non-Latin1 characters

Add README.non-ASCII to explain non-ASCII doc behavior; some text moved from release.sgml.

Change UTF8 SGML characters to use HTML entities.

Remove unnecessary UTF8 spaces.

Add SVG file check for check-nbsp target.

Add dummy 'pdf' Makefile target.

Reported-by: Yugo Nagata
Discussion: https://postgr.es/m/20241011114122.c90f8a871462da36f2e2afeb@sraoss.co.jp
Backpatch-through: master

1 parent fc7dded commit 641a5b7

6 files changed: +56 -36 lines changed

doc/src/sgml/Makefile

Lines changed: 6 additions & 5 deletions
@@ -59,7 +59,7 @@ GENERATED_SGML = version.sgml \
 features-supported.sgml features-unsupported.sgml errcodes-table.sgml \
 keywords-table.sgml targets-meson.sgml wait_event_types.sgml
 
-ALLSGML := $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml) $(GENERATED_SGML)
+ALL_SGML := $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml) $(GENERATED_SGML)
 
 ALL_IMAGES := $(wildcard $(srcdir)/images/*.svg)
 
@@ -68,7 +68,7 @@ ALL_IMAGES := $(wildcard $(srcdir)/images/*.svg)
 # we're at it, also resolve all entities (that is, copy all included
 # files into one big file). This helps tools that don't understand
 # vpath builds (such as dbtoepub).
-postgres-full.xml: postgres.sgml $(ALLSGML)
+postgres-full.xml: postgres.sgml $(ALL_SGML)
 $(XMLLINT) $(XMLINCLUDE) --output $@ --noent --valid $<
 
 
@@ -143,11 +143,12 @@ postgres.txt: postgres.html
 ## Print
 ##
 
-postgres.pdf:
+postgres.pdf pdf:
 $(error Invalid target; use postgres-A4.pdf or postgres-US.pdf as targets)
 
 XSLTPROC_FO_FLAGS += --stringparam img.src.path '$(srcdir)/'
 
+# XSL Formatting Objects (FO), https://en.wikipedia.org/wiki/XSL_Formatting_Objects
 %-A4.fo: stylesheet-fo.xsl %-full.xml
 $(XSLTPROC) $(XMLINCLUDE) $(XSLTPROCFLAGS) $(XSLTPROC_FO_FLAGS) --stringparam paper.type A4 -o $@ $^
 
@@ -194,7 +195,7 @@ MAKEINFO = makeinfo
 ##
 
 # Quick syntax check without style processing
-check: postgres.sgml $(ALLSGML) check-tabs check-nbsp
+check: postgres.sgml $(ALL_SGML) check-tabs check-nbsp
 $(XMLLINT) $(XMLINCLUDE) --noout --valid $<
 
 
@@ -264,7 +265,7 @@ check-tabs:
 # Use perl command because non-GNU grep or sed could not have hex escape sequence.
 check-nbsp:
 @ ( $(PERL) -ne '/\xC2\xA0/ and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
-$(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
+$(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/images/*.svg $(srcdir)/*.xsl $(srcdir)/images/*.xsl) ) || \
 (echo "Non-breaking spaces appear in SGML/XML files" 1>&2; exit 1)
 
 ##
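
For readers less familiar with Perl one-liners, the check-nbsp recipe can be approximated in Python. This is only a sketch of the same idea (scan each file for the UTF-8 byte pair 0xC2 0xA0 and fail if any line contains it), not part of the commit; the names find_nbsp and main are illustrative, and reporting a line number is an addition the Perl version does not make.

```python
from pathlib import Path

# UTF-8 encoding of U+00A0 (NO-BREAK SPACE), the same bytes the Perl regex matches.
NBSP_UTF8 = b"\xc2\xa0"


def find_nbsp(paths):
    """Return (path, line_number, raw_line) for every line containing a UTF-8 NBSP."""
    hits = []
    for path in paths:
        data = Path(path).read_bytes()
        for lineno, line in enumerate(data.splitlines(), start=1):
            if NBSP_UTF8 in line:
                hits.append((str(path), lineno, line))
    return hits


def main(paths):
    """Print offending lines and return a shell-style exit status (1 on any hit)."""
    hits = find_nbsp(paths)
    for path, lineno, line in hits:
        print(f"{path}:{lineno}:{line!r}")
    if hits:
        print("Non-breaking spaces appear in SGML/XML files")
        return 1
    return 0
```

A caller would pass the same file list the Makefile builds with $(wildcard ...), for example sys.exit(main(sys.argv[1:])).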

doc/src/sgml/README.non-ASCII

Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,37 @@
1+
<!-- doc/src/sgml/README.non-ASCII -->
2+
3+
Representation of non-ASCII characters
4+
--------------------------------------
5+
6+
Find non-ASCII characters using:
7+
8+
grep --recursive --color='auto' -P '[\x80-\xFF]' .
9+
10+
Convert to HTML4 named entity (&) escapes
11+
-----------------------------------------
12+
13+
We support several output formats:
14+
15+
* html (supports all Unicode characters)
16+
* man (supports all Unicode characters)
17+
* pdf (supports only Latin-1 characters)
18+
* info
19+
20+
While some output formatting tools support all Unicode characters,
21+
others only support Latin-1 characters. Specifically, the PDF rendering
22+
engine can only display Latin-1 characters; non-Latin-1 Unicode
23+
characters are displayed as "###".
24+
25+
Therefore, in the SGML files, we only use Latin-1 characters. We
26+
typically encode these characters as HTML entities, e.g., &Aacute;lvaro.
27+
It is also possible to safely represent Latin-1 characters in UTF8
28+
encoding for all output formats.
29+
30+
Do not use UTF numeric character escapes (&#nnn;).
31+
32+
HTML entities
33+
official: http://www.w3.org/TR/html4/sgml/entities.html
34+
one page: http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
35+
other lists: http://www.zipcon.net/~swhite/docs/computers/browsers/entities.html
36+
http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
37+
https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
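
The conversion the README asks for (Latin-1 characters written as HTML4 named entities, anything beyond Latin-1 rejected) could be sketched in Python as below. The helper to_sgml_entities is hypothetical and not part of the PostgreSQL tree; it relies on the standard library's html.entities.codepoint2name table, and it assumes the input is precomposed (a base letter followed by a combining accent would be rejected).

```python
from html.entities import codepoint2name


def to_sgml_entities(text):
    """Rewrite non-ASCII Latin-1 characters as HTML4 named entities.

    Raises ValueError for any character that has no Latin-1 named entity,
    which includes every code point above U+00FF.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        elif cp <= 0xFF and cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            raise ValueError(f"no Latin-1 named entity for U+{cp:04X}")
    return "".join(out)
```

With this sketch, to_sgml_entities("Álvaro") gives the README's own example spelling, &Aacute;lvaro.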

doc/src/sgml/charset.sgml

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1225,7 +1225,7 @@ CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-tr
12251225
<programlisting>
12261226
-- ignore differences in accents and case
12271227
CREATE COLLATION ignore_accent_case (provider = icu, deterministic = false, locale = 'und-u-ks-level1');
1228-
SELECT 'Å' = 'A' COLLATE ignore_accent_case; -- true
1228+
SELECT '&Aring;' = 'A' COLLATE ignore_accent_case; -- true
12291229
SELECT 'z' = 'Z' COLLATE ignore_accent_case; -- true
12301230

12311231
-- upper case letters sort before lower case.
@@ -1282,7 +1282,7 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
12821282
<entry><literal>'ab' = U&amp;'a\2063b'</literal></entry>
12831283
<entry><literal>'x-y' = 'x_y'</literal></entry>
12841284
<entry><literal>'g' = 'G'</literal></entry>
1285-
<entry><literal>'n' = 'ñ'</literal></entry>
1285+
<entry><literal>'n' = '&ntilde;'</literal></entry>
12861286
<entry><literal>'y' = 'z'</literal></entry>
12871287
</row>
12881288
</thead>
@@ -1346,7 +1346,7 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
13461346

13471347
<para>
13481348
At every level, even with full normalization off, basic normalization is
1349-
performed. For example, <literal>'á'</literal> may be composed of the
1349+
performed. For example, <literal>'&aacute;'</literal> may be composed of the
13501350
code points <literal>U&amp;'\0061\0301'</literal> or the single code
13511351
point <literal>U&amp;'\00E1'</literal>, and those sequences will be
13521352
considered equal even at the <literal>identic</literal> level. To treat
@@ -1430,8 +1430,8 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
14301430
<entry><literal>false</literal></entry>
14311431
<entry>
14321432
Backwards comparison for the level 2 differences. For example,
1433-
locale <literal>und-u-kb</literal> sorts <literal>'àe'</literal>
1434-
before <literal>''</literal>.
1433+
locale <literal>und-u-kb</literal> sorts <literal>'&agrave;e'</literal>
1434+
before <literal>'a&eacute;'</literal>.
14351435
</entry>
14361436
</row>
14371437
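
The two spellings of the accented letter mentioned in the third hunk (U+0061 followed by U+0301, versus the single code point U+00E1) can be inspected with Python's unicodedata module. This sketch only illustrates the code-point sequences and the entity that replaces the literal character; it does not reproduce ICU collation, and same_after_nfc is an illustrative name, not a PostgreSQL or ICU function.

```python
import html
import unicodedata

DECOMPOSED = "\u0061\u0301"   # 'a' followed by COMBINING ACUTE ACCENT
PRECOMPOSED = "\u00e1"        # LATIN SMALL LETTER A WITH ACUTE


def same_after_nfc(left, right):
    """Compare two strings after composing both with Unicode NFC normalization."""
    return unicodedata.normalize("NFC", left) == unicodedata.normalize("NFC", right)


# As raw code-point sequences the two spellings differ ...
raw_equal = DECOMPOSED == PRECOMPOSED
# ... but they are equal once both are normalized.
normalized_equal = same_after_nfc(DECOMPOSED, PRECOMPOSED)
# The named entity used in the new SGML text resolves to the precomposed form.
entity_char = html.unescape("&aacute;")
```

Here raw_equal is False, normalized_equal is True, and entity_char equals PRECOMPOSED, which is why swapping the literal character for the entity leaves the rendered documentation text unchanged.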

doc/src/sgml/images/genetic-algorithm.svg

Lines changed: 2 additions & 2 deletions

doc/src/sgml/release.sgml

Lines changed: 0 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -16,24 +16,6 @@ pg_[A-Za-z0-9_]+ <application>, <structname>
1616
\<[a-z]+_[a-z_]+\> <varname>, <structfield>
1717
<systemitem class="osname">
1818

19-
non-ASCII characters find using grep -P '[\x80-\xFF]' or
20-
(remove 'X') grep -X-color='auto' -P -n "[\x80-\xFF]"
21-
convert to HTML4 named entity (&) escapes
22-
23-
official: http://www.w3.org/TR/html4/sgml/entities.html
24-
one page: http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
25-
other lists: http://www.zipcon.net/~swhite/docs/computers/browsers/entities.html
26-
http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
27-
https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
28-
29-
We cannot use UTF8 because rendering engines have to
30-
support the referenced characters.
31-
32-
Do not use numeric _UTF_ numeric character escapes (&#nnn;),
33-
we can only use Latin1.
34-
35-
Example: Alvaro Herrera is &Aacute;lvaro Herrera
36-
3719
wrap long lines
3820

3921
For new features, add links to the documentation sections.

doc/src/sgml/stylesheet-man.xsl

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -213,12 +213,12 @@
213213
<!-- Slight rephrasing to indicate that missing sections are found
214214
in the documentation. -->
215215
<l:context name="xref-number-and-title">
216-
<l:template name="chapter" text="Chapter %n, %t, in the documentation"/>
217-
<l:template name="sect1" text="Section %n, “%t”, in the documentation"/>
218-
<l:template name="sect2" text="Section %n, “%t”, in the documentation"/>
219-
<l:template name="sect3" text="Section %n, “%t”, in the documentation"/>
220-
<l:template name="sect4" text="Section %n, “%t”, in the documentation"/>
221-
<l:template name="sect5" text="Section %n, “%t”, in the documentation"/>
216+
<l:template name="chapter" text="Chapter %n, &quot;%t&quot;, in the documentation"/>
217+
<l:template name="sect1" text="Section %n, &quot;%t&quot;, in the documentation"/>
218+
<l:template name="sect2" text="Section %n, &quot;%t&quot;, in the documentation"/>
219+
<l:template name="sect3" text="Section %n, &quot;%t&quot;, in the documentation"/>
220+
<l:template name="sect4" text="Section %n, &quot;%t&quot;, in the documentation"/>
221+
<l:template name="sect5" text="Section %n, &quot;%t&quot;, in the documentation"/>
222222
</l:context>
223223
</l:l10n>
224224
</l:i18n>
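
The removed sect1 to sect5 templates contain typographic quotation marks, which lie outside Latin-1. A small Python sketch can list such characters in any string; non_latin1 is an illustrative helper, not something in the PostgreSQL tree, and the two sample strings are copied from the hunk above with the quotes written as escapes.

```python
import unicodedata


def non_latin1(text):
    """Return (char, code point label, Unicode name) for each char above U+00FF."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 0xFF
    ]


# Template text before the change (curly quotes) and after it (&quot; resolves to '"').
OLD_TEXT = "Section %n, \u201c%t\u201d, in the documentation"
NEW_TEXT = 'Section %n, "%t", in the documentation'

old_hits = non_latin1(OLD_TEXT)
new_hits = non_latin1(NEW_TEXT)
```

For the old text the helper reports U+201C and U+201D; for the new text it reports nothing, since the ASCII double quote is within Latin-1.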
