For issues relating to RemexHtml, an HTML 5 parser library for PHP.
See https://www.mediawiki.org/wiki/RemexHtml
Maintained by MediaWiki-Engineering
In T287972#9771259, @Zabe wrote:
Mid-2021 was some time ago :p
Change #1017392 merged by jenkins-bot:
[mediawiki/core@REL1_40] build: Raise TestingAccessWrapper from 2.0.0 to 3.0.0
Change #1017393 merged by jenkins-bot:
[mediawiki/core@REL1_39] build: Raise TestingAccessWrapper from 2.0.0 to 3.0.0
Change #1017391 merged by jenkins-bot:
[mediawiki/core@REL1_41] build: Raise TestingAccessWrapper from 2.0.0 to 3.0.0
Change #1017393 had a related patch set uploaded (by Reedy; author: Jforrester):
[mediawiki/core@REL1_39] build: Raise TestingAccessWrapper from 2.0.0 to 3.0.0
Change #1017392 had a related patch set uploaded (by Reedy; author: Jforrester):
[mediawiki/core@REL1_40] build: Raise TestingAccessWrapper from 2.0.0 to 3.0.0
Change #1017391 had a related patch set uploaded (by Reedy; author: Jforrester):
[mediawiki/core@REL1_41] build: Raise TestingAccessWrapper from 2.0.0 to 3.0.0
This seems to be fixed in Parsoid.
FYI; this revert also solved T353849
The reverting patch is deployed, my own basic tests seem to confirm that the issue is resolved. Affected pages might require a reparse to display correctly.
Mentioned in SAL (#wikimedia-operations) [2023-12-22T13:45:13Z] <reedy@deploy2002> Finished scap: T353920 (duration: 08m 02s)
Mentioned in SAL (#wikimedia-operations) [2023-12-22T13:37:11Z] <reedy@deploy2002> Started scap: T353920
Change 985033 merged by jenkins-bot:
[mediawiki/core@wmf/1.42.0-wmf.10] Revert "Use Remex for DeduplicateStyles transform"
Change 985033 had a related patch set uploaded (by Reedy; author: Isabelle Hurbain-Palatin):
[mediawiki/core@wmf/1.42.0-wmf.10] Revert "Use Remex for DeduplicateStyles transform"
Change 985120 merged by jenkins-bot:
[mediawiki/core@master] Revert "Use Remex for DeduplicateStyles transform"
Agreed on the fact it should exist; not _entirely_ sure about the fact it would have caught it. Will have a look.
We should probably have a parser test with the &ndash; combo, btw. I did a quick grep and couldn't find any test case that had an escaped amp followed by an entity name (which makes sense, because otherwise we would have caught this, of course). But entity forms in general also do not seem well guarded. A tests/parser/entities.txt might make sense after this.
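To make the suggested coverage concrete, here is a rough sketch of a table-driven check in Python rather than in the parser-test format. Everything in it is an assumption for illustration: the case list, the helper names (escape_text, round_trip, failing_cases) and the use of Python's html.unescape as a stand-in for the parse step. It is not the MediaWiki parser test syntax and does not call Remex.

```python
import html

# Hypothetical cases: an escaped ampersand followed by an entity name or number.
CASES = ["&amp;ndash;", "&amp;amp;", "&amp;lt;", "&amp;nbsp;", "&amp;#8211;"]


def escape_text(text: str) -> str:
    """Minimal text-node serializer: escape &, < and > only."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def round_trip(fragment: str) -> str:
    """Stand-in for parse-then-serialize: decode once, then re-escape."""
    return escape_text(html.unescape(fragment))


def failing_cases(transform, cases=CASES):
    """Return every case that the transform does not reproduce unchanged."""
    return [case for case in cases if transform(case) != case]


# A transform that decodes once preserves every case.
assert failing_cases(round_trip) == []
```

A transform that decodes entities twice would fail every case in this list, which is the kind of regression such a test file is meant to catch.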
Change 985120 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):
[mediawiki/core@master] Revert "Use Remex for DeduplicateStyles transform"
The issue is as follows: instead of going through a regex to deduplicate styles, we pass the whole document through Remex, which apparently at some point decides to interpret &ndash; as −. This is indeed caused by the above patch, which I believe is safe to revert (pages parsed in the meantime may require a cache purge for the fix to take effect).
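As a hedged illustration of this class of bug (not Remex's actual code path, which is not shown in this thread), the Python sketch below uses the standard library's html.unescape to show how text that should display as a literal entity name turns into a dash when it is decoded one time too many. The serialize_text helper and the variable names are assumptions made for the example.

```python
import html


def serialize_text(text: str) -> str:
    """Minimal text-node serializer: escape &, < and > only."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


# The author wants the page to display the literal text "&ndash;".
source = "&amp;ndash;"

# Expected round trip: decode once while parsing, re-escape while serializing.
parsed_once = html.unescape(source)
assert parsed_once == "&ndash;"
assert serialize_text(parsed_once) == source

# Faulty round trip: a second decode turns the literal text into a dash (U+2013).
parsed_twice = html.unescape(parsed_once)
assert parsed_twice == "\u2013"
assert serialize_text(parsed_twice) != source
```

The regex-based approach never decoded the document at all, so it could not introduce this kind of change; any parse-and-reserialize pass has to keep decoding and escaping symmetric.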
Interestingly, the issue does NOT trigger on Parsoid rendering, which I suspect may have something to do with "$options['isParsoidContent'] ?? false", which sets things to html5format vs. not in Remex.
I'm marking this as UBN.
Change 980452 merged by jenkins-bot:
[mediawiki/core@master] build: Raise TestingAccessWrapper from 2.0.0 to 3.0.0
Change 980452 had a related patch set uploaded (by Jforrester; author: Jforrester):
[mediawiki/core@master] build: Raise TestingAccessWrapper from 2.0.0 to 3.0.0
Change 970422 merged by jenkins-bot:
[AhoCorasick@master] Release v2.0.0