[HTML] Update comment highlighting to match WHATWG HTML standard #232

@zufuliu

See python/cpython#102555. In the following snippet, browsers treat all text before the number on each line as a comment:

<!---->1 normal comment
<!--<!--->2 nested-comment
<!-->3 abrupt-closing-of-empty-comment
<!--->4 abrupt-closing-of-empty-comment
<!----!>5 incorrectly-closed-comment

Currently only lines 1 and 2 are handled correctly.

https://html.spec.whatwg.org/multipage/parsing.html#parse-error-nested-comment

This error occurs if the parser encounters a nested comment (e.g., <!-- <!-- nested --> -->). Such a comment will be closed by the first occurring "-->" code point sequence and everything that follows will be treated as markup.

https://html.spec.whatwg.org/multipage/parsing.html#parse-error-abrupt-closing-of-empty-comment

This error occurs if the parser encounters an empty comment that is abruptly closed by a U+003E (>) code point (i.e., <!--> or <!--->). The parser behaves as if the comment is closed correctly.

https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment

This error occurs if the parser encounters a comment that is closed by the "--!>" code point sequence. The parser treats such comments as if they are correctly closed by the "-->" code point sequence.
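The three rules quoted above can be folded into one small function for experimenting with test inputs. This is a minimal sketch and not the lexer's code: `CommentEnd` is a hypothetical helper that works on a plain `std::string_view` rather than Scintilla's styler, and it assumes `open` is the offset of a `<!--` sequence.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Hypothetical helper, not part of LexHTML.cxx: given the offset of "<!--"
// in text, return the offset one past the end of the comment, following the
// parse-error rules quoted from the WHATWG HTML standard.
std::size_t CommentEnd(std::string_view text, std::size_t open) {
    const std::size_t start = open + 4; // first character after "<!--"
    const std::string_view rest = text.substr(std::min(start, text.size()));

    // abrupt-closing-of-empty-comment: "<!-->" and "<!--->"
    if (rest.substr(0, 1) == ">") return start + 1;
    if (rest.substr(0, 2) == "->") return start + 2;

    // Otherwise the first "-->" (normal or nested comment) or
    // "--!>" (incorrectly-closed-comment) ends the comment.
    std::size_t end = text.size(); // unterminated: comment runs to end of input
    const std::size_t normal = rest.find("-->");
    const std::size_t bang = rest.find("--!>");
    if (normal != std::string_view::npos) end = std::min(end, start + normal + 3);
    if (bang != std::string_view::npos) end = std::min(end, start + bang + 4);
    return end;
}
```

For each of the five snippet lines, `text[CommentEnd(text, 0)]` is the digit that follows the comment, which makes the function convenient as an oracle when checking the lexer's styling.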

Patch: html-comment-0401.zip

diff --git a/lexers/LexHTML.cxx b/lexers/LexHTML.cxx
index 0ed9f90b..942c6a56 100644
--- a/lexers/LexHTML.cxx
+++ b/lexers/LexHTML.cxx
@@ -1572,6 +1572,13 @@ void SCI_METHOD LexerHTML::Lex(Sci_PositionU startPos, Sci_Position length, int
 				state = SCE_H_COMMENT; // wait for a pending command
 				styler.ColourTo(i + 2, SCE_H_COMMENT);
 				i += 2; // follow styling after the --
+				chNext = SafeGetUnsignedCharAt(styler, i + 1);
+				if ((chNext == '>') || (chNext == '-' && SafeGetUnsignedCharAt(styler, i + 2) == '>')) {
+					// https://html.spec.whatwg.org/multipage/parsing.html#parse-error-abrupt-closing-of-empty-comment
+					i -= (ch == '>') ? 2 : 1;
+					chPrev = '-';
+					ch = '-';
+				}
 			} else if (isWordCdata(i + 1, i + 7, styler)) {
 				state = SCE_H_CDATA;
 			} else {
@@ -1843,7 +1850,11 @@ void SCI_METHOD LexerHTML::Lex(Sci_PositionU startPos, Sci_Position length, int
 			}
 			break;
 		case SCE_H_COMMENT:
-			if ((scriptLanguage != eScriptComment) && (chPrev2 == '-') && (chPrev == '-') && (ch == '>')) {
+			if ((scriptLanguage != eScriptComment) && (chPrev2 == '-') && (chPrev == '-') && (ch == '>' || (ch == '!' && chNext == '>'))) {
+				// https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment
+				if (ch == '!') {
+					i += 1;
+				}
 				styler.ColourTo(i, StateToPrint);
 				state = SCE_H_DEFAULT;
 				levelCurrent--;

More tests may be needed, as there are restrictions listed at https://html.spec.whatwg.org/multipage/syntax.html#comments

Comments must have the following format:

The string "<!--".
Optionally, text, with the additional restriction that the text must not start with the string ">", nor start with the string "->", nor contain the strings "<!--", "-->", or "--!>", nor end with the string "<!-".
The string "-->".
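The restriction on comment text translates directly into a predicate, which may help when generating further test cases. The following is a sketch under the assumption that the caller has already stripped the leading `<!--` and trailing `-->`; `IsValidCommentText` is a hypothetical name and does not exist in Lexilla.

```cpp
#include <string_view>

// Hypothetical helper: returns true if text satisfies the restrictions the
// HTML syntax section places on the text between "<!--" and "-->".
bool IsValidCommentText(std::string_view text) {
    const auto startsWith = [&](std::string_view s) {
        return text.substr(0, s.size()) == s;
    };
    const auto endsWith = [&](std::string_view s) {
        return text.size() >= s.size() && text.substr(text.size() - s.size()) == s;
    };
    const auto contains = [&](std::string_view s) {
        return text.find(s) != std::string_view::npos;
    };
    return !startsWith(">") && !startsWith("->")        // must not start with ">" or "->"
        && !contains("<!--") && !contains("-->")        // must not contain these strings
        && !contains("--!>")
        && !endsWith("<!-");                            // must not end with "<!-"
}
```

Under this reading, line 2 of the snippet (`<!--<!--->`) is a comment whose text `<!-` is rejected by the last rule, while text such as `a--b` is accepted.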

Labels: committed (Issue fixed in repository but not in release), html (Caused by the hypertext lexer)