See python/cpython#102555. In the following snippet, browsers treat all text before the number on each line as a comment:
<!---->1 normal comment
<!--<!--->2 nested-comment
<!-->3 abrupt-closing-of-empty-comment
<!--->4 abrupt-closing-of-empty-comment
<!----!>5 incorrectly-closed-comment
Currently only lines 1 and 2 are handled correctly.
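As a cross-check for the five cases above, here is a minimal sketch of where an HTML comment should end according to the WHATWG parse-error rules this issue cites. It is not Lexilla code: `FindCommentEnd` and `kNotFound` are hypothetical names introduced only for illustration, and the function works on a plain string rather than on the lexer's styler.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical sentinel for "no comment here" or "comment never closed".
constexpr std::size_t kNotFound = std::string_view::npos;

// Hypothetical helper (not part of Lexilla): returns the index one past the
// end of the comment that starts with "<!--" at `start`, or kNotFound.
std::size_t FindCommentEnd(std::string_view text, std::size_t start = 0) {
    if (start > text.size() || text.substr(start, 4) != "<!--") {
        return kNotFound;
    }
    const std::size_t body = start + 4;  // first character after "<!--"
    const std::string_view rest = text.substr(body);

    // abrupt-closing-of-empty-comment: "<!-->" and "<!--->"
    if (rest.substr(0, 1) == ">") {
        return body + 1;
    }
    if (rest.substr(0, 2) == "->") {
        return body + 2;
    }

    // Otherwise the first "-->" or "--!>" closes the comment. A nested
    // "<!--" does not open a new comment, so no extra handling is needed.
    const std::size_t normal = text.find("-->", body);
    const std::size_t bang = text.find("--!>", body);
    if (normal == kNotFound && bang == kNotFound) {
        return kNotFound;  // unterminated comment
    }
    // npos is the largest size_t value, so a missing match never wins here.
    return (bang < normal) ? bang + 4 : normal + 3;
}
```

For the five snippet lines this returns 7, 10, 5, 6 and 8 respectively, so in each case the character at the returned index is the line's number.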
https://html.spec.whatwg.org/multipage/parsing.html#parse-error-nested-comment
This error occurs if the parser encounters a nested comment (e.g., <!-- <!-- nested --> -->). Such a comment will be closed by the first occurring "-->" code point sequence and everything that follows will be treated as markup.
https://html.spec.whatwg.org/multipage/parsing.html#parse-error-abrupt-closing-of-empty-comment
This error occurs if the parser encounters an empty comment that is abruptly closed by a U+003E (>) code point (i.e., <!--> or <!--->). The parser behaves as if the comment is closed correctly.
https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment
This error occurs if the parser encounters a comment that is closed by the "--!>" code point sequence. The parser treats such comments as if they are correctly closed by the "-->" code point sequence.
Patch html-comment-0401.zip
diff --git a/lexers/LexHTML.cxx b/lexers/LexHTML.cxx
index 0ed9f90b..942c6a56 100644
--- a/lexers/LexHTML.cxx
+++ b/lexers/LexHTML.cxx
@@ -1572,6 +1572,13 @@ void SCI_METHOD LexerHTML::Lex(Sci_PositionU startPos, Sci_Position length, int
state = SCE_H_COMMENT; // wait for a pending command
styler.ColourTo(i + 2, SCE_H_COMMENT);
i += 2; // follow styling after the --
+ chNext = SafeGetUnsignedCharAt(styler, i + 1);
+ if ((chNext == '>') || (chNext == '-' && SafeGetUnsignedCharAt(styler, i + 2) == '>')) {
+ // https://html.spec.whatwg.org/multipage/parsing.html#parse-error-abrupt-closing-of-empty-comment
+ i -= (ch == '>') ? 2 : 1;
+ chPrev = '-';
+ ch = '-';
+ }
} else if (isWordCdata(i + 1, i + 7, styler)) {
state = SCE_H_CDATA;
} else {
@@ -1843,7 +1850,11 @@ void SCI_METHOD LexerHTML::Lex(Sci_PositionU startPos, Sci_Position length, int
}
break;
case SCE_H_COMMENT:
- if ((scriptLanguage != eScriptComment) && (chPrev2 == '-') && (chPrev == '-') && (ch == '>')) {
+ if ((scriptLanguage != eScriptComment) && (chPrev2 == '-') && (chPrev == '-') && (ch == '>' || (ch == '!' && chNext == '>'))) {
+ // https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment
+ if (ch == '!') {
+ i += 1;
+ }
styler.ColourTo(i, StateToPrint);
state = SCE_H_DEFAULT;
levelCurrent--;
More tests may be needed, as there are restrictions listed at https://html.spec.whatwg.org/multipage/syntax.html#comments
Comments must have the following format:
The string "<!--".
Optionally, text, with the additional restriction that the text must not start with the string ">", nor start with the string "->", nor contain the strings "<!--", "-->", or "--!>", nor end with the string "<!-".
The string "-->".
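The restrictions quoted above could be used to sort test inputs into well-formed and malformed comments. The following is a minimal sketch under that assumption. All function names are hypothetical and are not part of Lexilla or any HTML library.

```cpp
#include <string_view>

// Hypothetical helpers for illustration only.
bool StartsWith(std::string_view s, std::string_view prefix) {
    return s.substr(0, prefix.size()) == prefix;
}

bool EndsWith(std::string_view s, std::string_view suffix) {
    return s.size() >= suffix.size() &&
           s.substr(s.size() - suffix.size()) == suffix;
}

bool Contains(std::string_view s, std::string_view needle) {
    return s.find(needle) != std::string_view::npos;
}

// Checks only the text between "<!--" and "-->" against the listed restrictions.
bool IsValidCommentText(std::string_view text) {
    if (StartsWith(text, ">") || StartsWith(text, "->")) {
        return false;
    }
    if (Contains(text, "<!--") || Contains(text, "-->") || Contains(text, "--!>")) {
        return false;
    }
    return !EndsWith(text, "<!-");
}

// Checks a whole comment: "<!--", optional valid text, then "-->".
bool IsWellFormedComment(std::string_view comment) {
    constexpr std::string_view open = "<!--";
    constexpr std::string_view close = "-->";
    if (comment.size() < open.size() + close.size()) {
        return false;  // too short to hold both delimiters, e.g. "<!-->"
    }
    if (!StartsWith(comment, open) || !EndsWith(comment, close)) {
        return false;
    }
    return IsValidCommentText(
        comment.substr(open.size(), comment.size() - open.size() - close.size()));
}
```

Of the five comments in the snippet at the top of this issue, only `<!---->` passes this check. The other four are exactly the parse-error cases that browsers still accept, so they make useful negative inputs for the syntax check and positive inputs for the lexer.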