Why do I get exactly the same count (≈983) for both
The first one is a redirect to the second one. The first one has less than 200 transclusions, the second one does contain all transclusions of the first one, but 750 more.
As far as CirrusSearch is concerned, a template and its redirects are the same page. I will have to find some time next week to look closer into this particular case, but in general there is no useful distinction in CirrusSearch between redirects and the pages they redirect to.
Well, the use case is that I want to edit those pages which transclude the redirect, in order to modify template name and old parameters, rather than pages transcluding the generic template name where new parameters are already used.
That works fine if generic template name and redirect name differ significantly.
Change 512198 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/core@master] Templates in search should be case sensitive
I took a closer look at this and indeed, the mapping lowercases all queries to the template field (and we only have a single analysis chain applied). We can probably simply change the analysis chain there, though that will require a quick review that template boosting uses appropriately cased template names across the wikis that have configured it.
I'm not sure if there might be more knock-on effects though...
Change 512198 abandoned by EBernhardson:
Templates in search should be case sensitive
Reason:
Discussed with Stas; we think a better way forward is to have the field indexed both ways (case sensitive and case insensitive). Unfortunately we are having some disk space problems, and adding new fields will have to wait for Q1, when we replace aging servers.
Thank you for now.
BTW, linksto: and incategory: are both using page names as well.
I do not expect categories to be distinguished by letter case only, but for linksto: there might be a difference between the BIOS and Bios articles, even more so on Wiktionary, where the first letter is significant.
High level plan:
references
Today the template field is defined as:
$fields['template'] = $engine->makeSearchFieldMapping( 'template', SearchIndexField::INDEX_TYPE_KEYWORD );
$fields['template']->setFlag( SearchIndexField::FLAG_CASEFOLD );
FLAG_CASEFOLD is used to tell the search engine that it should ignore case for this field. It seems like what we actually want to tell the search engine is that casefolding is convenient for default searches, but to identify a specific template requires case-sensitive matching. Whatever name is chosen to indicate this, KeywordIndexField::getMapping will need to be adjusted to recognize the flag and generate an appropriate multi-field.
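To make the multi-field idea concrete, here is a rough sketch of what the generated Elasticsearch mapping could look like, written as a Python dict for readability. The analyzer name lowercase_keyword and the subfield name keyword are assumptions for illustration only; whatever KeywordIndexField::getMapping ends up emitting may well differ.

```python
import json


def build_analysis() -> dict:
    """Analysis settings for the case-folded main field.

    'lowercase_keyword' is an assumed name: the whole value is kept as a
    single token (keyword tokenizer) and then lowercased.
    """
    return {
        "analyzer": {
            "lowercase_keyword": {
                "type": "custom",
                "tokenizer": "keyword",
                "filter": ["lowercase"],
            }
        }
    }


def build_template_mapping() -> dict:
    """Multi-field sketch: case-folded main field, exact-case subfield."""
    return {
        "template": {
            "type": "text",
            "analyzer": "lowercase_keyword",
            # Assumed subfield name; it stores the value verbatim, so
            # queries against template.keyword are case sensitive.
            "fields": {"keyword": {"type": "keyword"}},
        }
    }


if __name__ == "__main__":
    print(json.dumps({"analysis": build_analysis(),
                      "properties": build_template_mapping()}, indent=2))
```

With a mapping of this shape, default searches keep hitting template while an exact lookup can target template.keyword, at the cost of indexing the value twice.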
https://wikitech.wikimedia.org/wiki/Search#In_place_reindex
Adjust CirrusSearch\Query\HasTemplateFeature::parseValue to recognize whatever syntax is agreed on to trigger case-sensitive matching, returning a 'case-sensitive' property along with the current templates. Use this value in HasTemplateFeature::doGetFilterQuery to decide the appropriate field to filter on.
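As a minimal sketch of that last step (in Python rather than the actual PHP), the parse and filter halves might fit together as below. The marker that triggers case-sensitive matching is deliberately a caller-supplied placeholder since no syntax has been agreed on; the '|' separator and the template.keyword field name are also assumptions, and namespace prefix handling is left out.

```python
def parse_value(value: str, marker: str) -> dict:
    """Split a hastemplate value into template names.

    `marker` is a hypothetical prefix that requests case-sensitive
    matching; it stands in for whatever syntax gets agreed on.
    """
    case_sensitive = bool(marker) and value.startswith(marker)
    if case_sensitive:
        value = value[len(marker):]
    # Assumption: several templates may be given, separated by '|'.
    templates = [t.strip() for t in value.split("|") if t.strip()]
    return {"templates": templates, "case-sensitive": case_sensitive}


def get_filter_query(parsed: dict) -> dict:
    """Build an Elasticsearch bool filter on the appropriate field."""
    # Assumed field names: 'template' is case-folded, 'template.keyword'
    # keeps the original case.
    field = "template.keyword" if parsed["case-sensitive"] else "template"
    return {
        "bool": {
            "should": [
                {"match": {field: {"query": name}}}
                for name in parsed["templates"]
            ],
            "minimum_should_match": 1,
        }
    }


if __name__ == "__main__":
    # '=' is only a placeholder marker for this demo.
    print(get_filter_query(parse_value("=Foo|bar", "=")))
```

Keeping the flag in the parsed value, rather than deciding the field during parsing, means the filter builder is the only place that needs to know the field names.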
Just a reminder:
I suppose that the last remark refers to the $wgCapitalLinks and
$wgCapitalLinkOverrides configuration variables.
When querying, CirrusSearch properly honors these parameters, so that searching for hastemplate:foo will actually search for Template:Foo on English Wikipedia but Template:foo on English Wiktionary.
For the indexed value I think a single flag is enough, because CirrusSearch will take the wiki configuration into account when searching.
Well, on a Wiktionary only pages in the main namespace are SENSITIVE_ALL; templates and categories behave as on every other wiki.
On any non-Wiktionary wiki, main-space pages and all others are IGNORE1_SENSITIVE, afaik.
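For reference, the per-namespace first-letter behaviour discussed above can be sketched roughly as follows. This is a simplified Python model, not MediaWiki's code: the function names are made up, the config values are passed in explicitly, and namespaces that MediaWiki always capitalizes are ignored. Which values a given wiki actually uses is not asserted here.

```python
NS_MAIN = 0       # main (article) namespace ID
NS_TEMPLATE = 10  # Template namespace ID


def is_capitalized(namespace: int, capital_links: bool,
                   overrides: dict) -> bool:
    """A per-namespace override wins; otherwise use the global setting.

    `capital_links` models $wgCapitalLinks and `overrides` models
    $wgCapitalLinkOverrides (namespace ID -> bool).
    """
    return overrides.get(namespace, capital_links)


def normalize_title(text: str, namespace: int, capital_links: bool,
                    overrides: dict) -> str:
    """Uppercase the first letter only where the namespace requires it."""
    if text and is_capitalized(namespace, capital_links, overrides):
        return text[0].upper() + text[1:]
    return text


if __name__ == "__main__":
    # Hypothetical configuration: first letter significant in general,
    # but templates still capitalized via an override.
    overrides = {NS_TEMPLATE: True}
    print(normalize_title("foo", NS_MAIN, False, overrides))
    print(normalize_title("foo", NS_TEMPLATE, False, overrides))
```

Whatever the real configuration is, only the first letter is affected by these settings; the rest of the title is always matched case sensitively by MediaWiki itself, which is why the index needs the exact-case value.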
Change 565370 had a related patch set uploaded (by Mstyles; owner: Mstyles):
[mediawiki/extensions/CirrusSearch@master] Allow template keyword to be case sensitive
Change 566389 had a related patch set uploaded (by Mstyles; owner: Mstyles):
[mediawiki/extensions/CirrusSearch@master] Allow search for case sensitive template keyword
Change 565370 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Add case sensitive subfield for template keyword
The reindex for this is in progress, will be another week or more before it's complete.
Change 566389 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Allow search for case sensitive template keyword
The patch will go out with the train in the last week of April (no train is running next week). The reindex that allows this to work has mostly completed; a few wikis have to be re-run, but that will hopefully be finished at or soon after the time this train rolls forward.