Wiktionary:Beer parlour/2025/January

Bad ledes in Thesaurus namespace

@qwertygiy: Standard practice in the Thesaurus namespace is currently to put a blank line between {{ws header}} and the first L2. See, for example, Thesaurus:person, Thesaurus:berry, etc. This triggers a warning for anyone who creates a new Thesaurus entry (e.g. at this trigger of filter 115), because it violates WT:NORM. So either the Thesaurus namespace should be excluded from filter 115, or the practice should be changed to match NORM. -saph (user—talk—contribs) 03:12, 1 January 2025 (UTC)

2024 – Top pageviews statistics

The top pages by pageviews for en.wiktionary.org in 2024 (unfiltered list from dump files) are:

	25840383 Special:Search
	21800424 Wiktionary:Main_Page
	 1993839 Appendix:Glossary
	 1645478 rainbow_kiss
	 1506823 xxx
	 1451758 -
	 1244194 黑料
	 1145183 吃瓜
	  938861 Category:English_swear_words
	  681510 bokep
	  672460 I'll
	  633276 视频
	  599259 XXXX
	  598818 aww
	  567081 colmek
	  527691 麻豆
	  523495 bocil
	  493837 Appendix:Protologisms/Long_words/Titin
	  463888 Appendix:Filipino_surnames
	  444811 Wiktionary:International_Phonetic_Alphabet
	  427258 XXX
	  395379 لا_إله_إلا_الله_محمد_رسول_الله
	  395192 
	  386054 «
	  377134 astaghfirullah
	  356207 変態
	  349388 Category:English_surnames_from_Old_English
	  343450 سکس
	  342540 ‘
	  338717 pajeet

Details, and the lists for other WMF projects: https://archive.org/details/2024-top_2k_user_pageviews Dušan Kreheľ (talk) 08:49, 1 January 2025 (UTC)

Pronunciation of irregular plurals

Currently there is no way of knowing how to pronounce, for example, ibices or sphinges. JMGN (talk) 00:46, 2 January 2025 (UTC)

@JMGN: As īʹbĭsēz, /ˈaɪbɪsiːz/ and sfĭnʹjēz, /ˈsfɪnd͡ʒiːz/, respectively. 0DF (talk) 13:30, 4 January 2025 (UTC)

@0DF: Should we add them to the headword entries too, since they appear there and are irregular? Namely, in ibex and sphinge. JMGN (talk) 18:03, 4 January 2025 (UTC)

Then add the pronunciations or add a pronunciation request? -saph (user—talk—contribs) 03:42, 2 January 2025 (UTC)

To all of them? I thought this was the Beer parlour... JMGN (talk) 12:17, 2 January 2025 (UTC)

There's nothing preventing anyone from adding them now, so there's no need to bring the issue to the Beer parlour. See, for instance, vertices or indices. Andrew Sheedy (talk) 18:33, 2 January 2025 (UTC)

I'm really surprised that this can't be automated like the rest of the pronunciations, though... JMGN (talk) 21:21, 3 January 2025 (UTC)

English is a special case: it represents the collision of two branches of Indo-European, followed by a thousand years of history, including serving as the main language on two whole continents and in numerous countries worldwide, and as a second language for over a billion people. It has huge numbers of loanwords coming from languages all over the world throughout known history. On top of all that, it has no authoritative standard. Although it may be possible to automate the pronunciation, it would be a huge project and would probably add considerably to system overhead on the million+ pages where it would be deployed. Chuck Entz (talk) 02:01, 4 January 2025 (UTC)

Hard to believe that there are over a million entries with irregular plurals... JMGN (talk) 12:57, 4 January 2025 (UTC)

There don't have to be. What is there about irregular plurals that requires special treatment? Chuck Entz (talk) 14:01, 4 January 2025 (UTC)

@JMGN: We have 18,548 entries for English nouns with irregular plurals. 0DF (talk) 15:43, 4 January 2025 (UTC)

@0DF: Thnx. Let's do it then! JMGN (talk) 18:00, 4 January 2025 (UTC)

"number" or "numeral"?

We currently have a POS "numeral", hence CAT:Numerals by language, CAT:English numerals, etc., but for some reason we have CAT:Cardinal numbers by language and CAT:Ordinal numbers by language rather than cardinal numeral or ordinal numeral. Category:Numerical appendices has a mixture of appendices called "Foo numbers" and "Foo numerals". I'd like to straighten this out by using a consistent naming scheme, probably numeral instead of number. [The root of the issue seems to be that numeral is normally taken to be a symbol (like 2, 3, 4 in the Hindu-Arabic system) that refers to a number, which is an abstract concept. However, (a) it is less clear whether numerical words like two, three, four are considered "numerals" or "numbers" (technically it appears they are numerals, being symbols of sorts, or more precisely signs, that refer to abstract numbers), and (b) in common parlance, the distinction between numeral and number is elided.] Mixing both terms is unhelpful, so if we can settle on consistent terminology, either "numeral" or "number", I can do the renames. Benwing2 (talk) 03:33, 2 January 2025 (UTC)

I think I have a preference for numeral. Vininn126 (talk) 03:45, 2 January 2025 (UTC)

If we are to standardize to one of them, then "numeral" is the better option of the two. But I wonder if we could instead come up with some consistent distinction between the two. — SURJECTION / T / C / L / 07:58, 2 January 2025 (UTC)

@Surjection Can you be more specific? What sort of distinction were you thinking of? Benwing2 (talk) 09:17, 2 January 2025 (UTC)

Numerals are a POS, numbers are a semantic category. Ordinal numbers are often not numerals in many languages, same goes for adverbial numbers (once, twice) and fractional numbers (half, third), for instance.

In many languages, the numeral POS has different grammatical properties than other POS: in Finnish and Russian, it governs very specific cases on the adjacent noun, for instance. In many other languages, there is no numeral POS at all (e.g. Afar nammáy or Tokelauan lua). They are still numbers though.

So, basically, the appendices calling these "numerals" are 'wrong' (in the sense that they don't follow the above distinction), and should probably be standardised to numbers. Thadh (talk) 09:10, 2 January 2025 (UTC)

I'm not sure where you got the idea that a numeral word has to be its own part of speech to be a numeral. See Ordinal numeral on Wikipedia. That seems to be a Thadh-ism (if I may call it that), which you've extrapolated from a handful of languages. Benwing2 (talk) 09:16, 2 January 2025 (UTC)

OK, that was a bit snarky and I apologize for that, but I'm still confused as to where you've gotten your ideas from. Benwing2 (talk) 09:27, 2 January 2025 (UTC)

Yes, it was... No worries. I don't think I said that; rather, I said that for now that's the distinction we do (or should/could) make. 'Numeral' as a grammatical category is very useful for the languages that do have one, while those languages still have a distinct semantic category that includes other parts of speech. We can and should make a distinction between the two, and I think calling them by the same name will lead to much more confusion than the status quo. Thadh (talk) 09:16, 3 January 2025 (UTC)

I am not convinced of this. Not all Russian numbers work the same way by any means; they range from один (pure adjective) to миллион (pure noun), with in-between numbers getting progressively more noun-like and less adjective-like. I don't know Finnish, but I wouldn't be surprised if things are similar. If you want to make a number vs. numeral distinction, you need to spell out when one term is used and when the other is used, and what renames need to happen; otherwise I have no idea what you're getting at. Benwing2 (talk) 09:38, 3 January 2025 (UTC)

All right:

- number is used to denote any member of the semantic category/ies that denote a specified amount, position etc. that can be theoretically counted.

- numeral is used to denote any member of a syntactic category of nominals that exhibits syntactic behaviour not found in other nouns and adjectives, and is typically associated with amounts that can or cannot be counted.

Since POSs are already language-specific, I can't give you hard-and-fast rules for when to use the latter: just as I can't give you a way to recognise a noun or a verb or an adjective, it differs by language. However, it is clear to me that there are languages where numerals are nominals (in flectional languages, they can usually be inflected; in others, they can be used as the head of a nominal phrase) that do not syntactically behave the same way as nouns or adjectives or (if those exist) determiners, and have very specific rules for governing the head noun.

Maybe the numeral 'one' (yksi) could indeed be better analysed as an adjective, but that doesn't change the fact that kaksi is neither an adjective nor a noun, as it agrees in the oblique cases but takes a partitive-case noun in the nominative (non-agreement). Same thing for Russian два (dva): две девушки, but двум девушкам, i.e. partial agreement, not identical to adjectives or nouns. Thadh (talk) 09:55, 3 January 2025 (UTC)

Extended Mover request: User:Rex Aurorum

Hello. I'd like to request extended mover rights, mainly to be able to fix:

1. typos made by earlier editors (non-Indonesian speakers);
2. typos made by myself (I frequently make typos in certain clusters).

―Rex Aurōrum ｢Disputātiō｣ 10:29, 2 January 2025 (UTC)

Nominated at WT:WL. Svārtava (t ɕ) 16:18, 5 January 2025 (UTC)

Sundanese main entries

I noticed that Sundanese entries have their main entries in the Sundanese script, but according to @Udaradingin, the most common script used nowadays is the modern script. Shouldn't the main entries (and possibly also links) be moved to the Latin-script entries, just as Tagalog uses the Latin script and the Baybayin spelling is shown only as an alternative spelling? Thanks. 𝄽 ysrael214 (talk) 08:51, 3 January 2025 (UTC)

Stray Arabic-script digit entries

We have entries for the main series of Arabic-Indic digits in Unicode, and for what Unicode refers to as the Extended Arabic-Indic digits:

Comparison of digits
Digit	Main series	Main series language sections	Extended series	Extended series language sections
1	١	Translingual	۱	Ottoman Turkish, Persian, Punjabi, Urdu
2	٢	Translingual	۲	Ottoman Turkish, Persian, Punjabi, Urdu
3	٣	Translingual	۳	Ottoman Turkish, Persian, Punjabi, Urdu
4	٤	Translingual, Ottoman Turkish	۴	Persian, Punjabi, Urdu
5	٥	Translingual, Ottoman Turkish	۵	Persian, Punjabi, Urdu
6	٦	Translingual, Ottoman Turkish	۶	Pashto, Persian, Punjabi, Urdu
7	٧	Translingual	۷	Ottoman Turkish, Persian, Punjabi, Urdu
8	٨	Translingual	۸	Ottoman Turkish, Persian, Punjabi, Urdu
9	٩	Translingual	۹	Ottoman Turkish, Persian, Punjabi, Urdu
0	٠	Translingual	۰	Ottoman Turkish, Persian, Punjabi, Urdu

As you can see, the main series are all straightforward Translingual entries like we have for the Latin-script digits, though a few also have Ottoman Turkish entries. The Extended series, however, consist entirely of entries for individual languages. What's more, many of them have no headword templates, and the ones that do treat them as entries for the spelled-out word that the symbol represents in that language. I did my best to fix the entries at ۱ (in the Extended series), but then I realized that there shouldn't be entries for specific languages at all, just a Translingual section at the top.

I'm not sure what the Translingual entries for these characters should look like: some, at least, seem like variants used in some Arabic-script languages but not others. Others seem like identical glyphs that are separate due to some quirk in the early history of Unicode. There's a task listed in WT:Todo for fixing entries that use the wrong character for a given language, so there's probably a lot more to this.

I do think that all of the pages in both series should have only a Translingual section, and all the other language sections should be merged into the spelled-out versions that the digits represent (if there's anything worth keeping). I didn't see any idiomatic senses like "4" used for "for" in texting.

The main problem is that I'm not really proficient in these languages, so I'm not sure how, exactly, to fix this, but I am sure it needs to be fixed, somehow. Thanks, Chuck Entz (talk) 01:28, 4 January 2025 (UTC)

You are making sense. The Ottoman entries were added by @Moonpulsar in 2023, and the Persian and Urdu ones in 2006 and 2007, when Wiktionary's rules, standards and consistency were nowhere near as developed as they are now, and formatting was wild. The analogy to non-Arabic-script languages suggests that we keep only translingual entries, perhaps even with hard redirects for alternative forms.

I have seen both series of numerals in Arabic, Persian, and Ottoman prints alike. The second series need not have been encoded by Unicode in the first place; it could have been left to font systems that vary the display by language, just as, say, italic б looks different depending on whether it is Serbian or Russian (and, less clearly, Bulgarian).

The numerals did not even have names distinct from the ones we use in Europe. Once again I note that the terminology "Eastern Arabic numerals" vs. "Western Arabic numerals" is Wikipedia citogenesis, an instance of their frequent problem of citing terms that were added to some list but are barely used, in case anyone is trying to work out what Wiktionary should portray. Fay Freak (talk) 02:18, 4 January 2025 (UTC)
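For reference, the Unicode side of this can be checked against the Unicode Character Database directly. The Python sketch below is only an illustration and is not part of any Wiktionary tooling. It uses the standard `unicodedata` module to pair each main-series digit (U+0660 to U+0669) with its extended-series counterpart (U+06F0 to U+06F9), printing each one's official name and digit value. The helper name `describe` is made up for this example.

```python
import unicodedata

# Main series: U+0660..U+0669; Extended series: U+06F0..U+06F9 (digits 0 to 9).
MAIN_SERIES = "".join(chr(0x0660 + i) for i in range(10))
EXTENDED_SERIES = "".join(chr(0x06F0 + i) for i in range(10))


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for main, ext in zip(MAIN_SERIES, EXTENDED_SERIES):
    # Each pair consists of two distinct code points...
    assert main != ext
    # ...that both carry the same decimal digit value.
    assert unicodedata.digit(main) == unicodedata.digit(ext)
    print(unicodedata.digit(main), describe(main), "|", describe(ext))
```

For example, the line for 1 pairs "U+0661 ARABIC-INDIC DIGIT ONE" with "U+06F1 EXTENDED ARABIC-INDIC DIGIT ONE". Both series are decimal digits (general category Nd), so software such as Python's `int()` reads "١٢" and "۱۲" as the same number, 12, even though they are separate characters.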

Affix template standardization

The templates prefix, suffix and confix (and their shortcuts pre, suf and con, respectively) can all be handled by affix (and its shortcut af). The template compound (and its shortcut com) can also be handled by af, although compound+ (and its shortcut com+) provides additional text that is not currently replicable with af. Both pre and suf are designated as "less-preferred" on category pages in favor of af, so it appears that af is the de facto standard. However, the other templates can still be found on many pages, so those uses will need to be converted to af. Once that is done, the templates prefix, suffix and confix (and their shortcuts pre, suf and con, respectively) can be formally deprecated, similar to circumfix. Netizen3102 (talk) 17:11, 4 January 2025 (UTC)

What is the rationale? By analogy, changing all the for loops in a computer program into while loops reduces the number of keywords, but why is that better? It makes the program harder to understand. 2A00:23C5:FE1C:3701:C9DA:1ED4:BE2C:8235 17:13, 4 January 2025 (UTC)

One rationale is that it would be easier for NEW users. Less complexity in templates means a lower barrier to entry. The downside is that these are fairly widely used templates, and editors who have been using them for a while will have to adjust. I still think we should try to make things easier for new people, even if only in a small way. Vininn126 (talk) 17:15, 4 January 2025 (UTC)

Just remember that most people know what prefixes and suffixes are, but quite a few have never heard of affixes. Chuck Entz (talk) 19:51, 4 January 2025 (UTC)

What would be interesting is this: if we were to merge these templates, would we suddenly see threads popping up about the lack of {{pre}} and {{suf}}, asking how to deal with prefixes and suffixes? A hypothetical, to be sure, but I think an interesting one. Vininn126 (talk) 19:55, 4 January 2025 (UTC)

This can be addressed by having a few well-maintained, perfectly formatted role-model entries for each language, provided as examples for new users, similar to how parrot is used as an example for Wiktionary:Quotations. --Ssvb (talk) 12:55, 5 January 2025 (UTC)

Oppose: I don't think consolidating the complexity of these several templates into one would really make things easier for new editors. — excarnateSojourner (ta·co) 01:19, 6 January 2025 (UTC)

The af template requires dashes, matching how prefixes and suffixes are traditionally written, whereas pre, suf and con do not. For example, (af) un- + do and (pre) un- + do both produce the same output, but pre does not require the dash after the prefix, which could be confusing to editors. Netizen3102 (talk) 17:22, 4 January 2025 (UTC)

I also personally find it easier to keep track of affixes when the dashes are present; the other templates do ALLOW dashes, but they are not required. Vininn126 (talk) 17:25, 4 January 2025 (UTC)

I'm not convinced at all. Continuing the for/while analogy, that's like pointing out that while only needs a single expression and doesn't require semicolons or a "stepwise rule". Sure! But that's why we don't use it for everything: humans are not instruction sets, and they benefit from context. I'm sure there are ways to DRY it by having one template call into another. Reducing everything to an eventual Turing tape is nasty. 2A00:23C5:FE1C:3701:C54E:F82E:FAA1:E7A5 05:54, 6 January 2025 (UTC)

Fewer kinds of templates make Wiktionary more machine-readable, although I'm not sure whether that is considered a desirable goal. --Ssvb (talk) 19:11, 4 January 2025 (UTC)

Category:Ojibwe stem-building elements

There are a few Ojibwe entries that keep showing up in WT:Todo lists because they're not in Category:Ojibwe lemmas or Category:Ojibwe non-lemma forms, and lots more that don't show up in the lists, but have similar problems.

First, some background: as with many American Indian languages, Ojibwe is polysynthetic, meaning that it uses mostly complex systems of morphemes bound together instead of separate words. That makes it hard to analyze Ojibwe grammar using the categories established for the better known European languages. There are prefixes, suffixes, infixes, and circumfixes that attach, not just to a central root or stem, but also to each other in very complicated ways.

Apparently Ojibwe carries this even farther by having stems that are made up of separate sub-elements: initials, medials, and finals, as explained on the page for Category:Ojibwe stem-building elements. These aren't completely arbitrary: they each have specific roles and carry specific types of information.

For 5 months in 2020, @SteveGat spent a great deal of time expanding our coverage of Ojibwe, but in ways that never really got integrated into our POS headers and categories. I would like to do that part now.

The question is: how should we do that? I can see a few approaches:

- Make initials, medials and finals into prefixes, suffixes, and/or infixes
- Make all of them just plain morphemes
- Add them to the modules as lemmas

For the first two options, we would want to have secondary categories to preserve the information. These entries already have secondary categories such as Category:Ojibwe noun finals and Category:Ojibwe verb finals and tertiary categories attached to those. For the third option, we would want to also integrate the new lemma types into the category-tree modules so the categories can use {{auto cat}}. For that matter, we could do the same for the secondary and tertiary categories no matter what we do with the rest.

I should also mention Category:Ottawa initials, which indicates that there are probably more languages with similar issues that I don't know about. That one shows up in Wiktionary:Todo/Lists/Uncategorised pages (all namespaces)#Category, but there may be more with categories added by hand. Chuck Entz (talk) 19:38, 4 January 2025 (UTC)

I forgot to ping @-sche, who knows a lot more about Algonquian languages like this one than I do. Chuck Entz (talk) 19:40, 4 January 2025 (UTC)

That's also a feature of other Algonquian languages like Cree (however that language(s) happens to be treated on Wiktionary), though Ojibwe is the one with the most content. Circeus (talk) 17:21, 5 January 2025 (UTC)

I can't say I know a lot about Ojibwe but in general I would prefer to try and fit things like initials, medials and finals into existing categories like prefixes, suffixes and infixes rather than just use the language-specific terminology directly. This latter approach, in the extreme, leads to a proliferation of lemma types that is singularly unhelpful, e.g. as was done with Lojban, where someone added Lojban-specific lemma types cmavo, cmene, fu'ivla, gismu, lujvo and rafsi to Module:headword/data. I and most people can't tell a gismu from a Ginsu knife, making these terms completely opaque. I went through a year or two ago and tried to rewrite the opaque Lojban grammatical terminology into the most similar comprehensible term, hence the terms in the category Category:Lojban gismu now have a header "Root" instead of "gismu"; similarly "Predicate" in place of "lujvo"; etc. The actual categories haven't yet been renamed but should be. IMO if there's a one-to-one mapping between initial <-> prefix, final <-> suffix, etc. there is no need to have the same term categorized into both CAT:Ojibwe prefixes and CAT:Ojibwe initials (just use the former), but if there is some extra information in the CAT:Ojibwe initials category, I am not averse to having the term categorized both ways. I can add the {{auto cat}} support for language-specific (or family-specific) terminology like "initials", "finals", etc.; this is not hard as the underlying functionality for language-specific categories is already present. The only other thing I'd add is that we have nastily-named categories in Special:WantedCategories like Category:Unami animate intransitive (vai), Category:Unami verb transitive inanimate and Category:Unami inanimate intransitive verb. 
I remember having a discussion with someone (probably the same SteveGat) about putting these into separate Category:Unami animate verbs and Category:Unami intransitive verbs categories; he eventually convinced me that there is a reason for combining them, as apparently a "transitive inanimate" verb is a different beast from an "inanimate intransitive" verb, not just the transitive equivalent. But these definitely should use the Wiktionary standard naming format Category:Unami inanimate intransitive verbs and such, not Category:Unami verb inanimate intransitive or some other weirdness, even if the latter is the standard format used in the grammar of these languages. Benwing2 (talk) 06:07, 6 January 2025 (UTC)

How about...

creating a template called "all", that can do everything? You just need to know what to put in the parameters, as described in the template documentation subpages (we're limited to 2 MB per page, so there would have to be a number of them). In other news, there's a new Swiss Army Knife™ that can do thousands of different things. The only problem: with all the attachments, it's over a meter wide...

There's a certain amount of complexity inherent in any given task. The question is, how do we distribute it?

With lots of templates, we don't have to know as much to do any one task. With fewer templates, we have to know about more things to do one task, but if we do multiple tasks, the information is in fewer places.

We should be thinking about what tasks go together, and have one template that does the things that go together, but multiple templates to do things that don't.

Also, we need to think about the range of things the individual user deals with: someone who edits Mandarin Chinese needs to know about Han characters, tones, and various particles like classifiers, but not affixes or grammatical gender. Someone who works with most European languages, on the other hand, needs to know about the morphology for things like case, gender, number, mood, voice, tense, aspect, etc. Someone who works with Celtic languages needs to consider the interactions in sounds between syllables, separate words, and even sentences, while someone working with Hawaiian encounters little more than synchronic phonological changes in some vowels, and none at all in consonants.

Considering this, we should think about whether all of those people need to use the same templates for everything. Yes, we have specialized templates for specific languages that do extra things, but we should also think about whether to have templates for specific languages that do less, so users don't have to think about as much. We've been deleting lots of language-specific templates that can't do things that the general templates can, but also can't do anything that the general templates can't. The question that doesn't get asked is: are the things that the templates can't do things that editors in those languages will want to do?

Another thing to think about: knowing that a template called "xyz-noun" is all you need for headwords in language-xyz noun entries should make it easier to get started in that language. If it doesn't have features needed for that language, you can always use {{head}}, or learn to customize it. It's also nice to have things that are just for you and your community of editors.

That's not to say that such things should be used as barriers to keep others out or as a way to claim ownership over language entries or anything else. All of the things I mentioned above should be considered as needed, but shouldn't override all the other things we already look at; I'm talking about broadening the discourse, not replacing it. Chuck Entz (talk) 21:45, 4 January 2025 (UTC)

How about we not be so reliant on templates? There are too many templates as is, and they change way too frequently. (And while we're at it, you shouldn't be required to code to create or edit a category.) Purplebackpack89 16:12, 5 January 2025 (UTC)

Lua may have allowed incredible flexibility in templates, but it has also made them impossible to edit for 99% of people. I do not consider this a good thing. Circeus (talk) 17:23, 5 January 2025 (UTC)

Plus, any given template relies on a stack of dependencies that is completely impenetrable. I've given up on many templates and modules, and I'm more knowledgeable than your average person (but still pretty ignorant about programming). —Justin (koavf)❤T☮C☺M☯ 17:24, 5 January 2025 (UTC)

Hear, hear. DCDuring (talk) 17:39, 5 January 2025 (UTC)

Lua is by far nicer, cleaner and easier to read than wiki templates. Complex templates are way too cryptic. --Ssvb (talk) 21:27, 5 January 2025 (UTC)

The closest thing I've ever come up with is a universal definition template. I don't understand the bellyaching here, when we already have to deal with tons of functions, regardless of language (those who think you can't are sorely mistaken). On the other hand, I'm not sure you can create something THAT universal, at least not in one fell swoop. Vininn126 (talk) 17:50, 5 January 2025 (UTC)

I think the point is that templates shouldn't be designed for those who put in more than 30 hours a week on Wiktionary and/or have IQs over 200. {{en-noun}} is wonderfully powerful, but even high-volume contributors have had trouble with the keystroke-saving features using "+", "-", "~", not to mention the complexities of auto-pluralization. Why should users have to consult the documentation every third time they try to use the template? DCDuring (talk) 19:43, 5 January 2025 (UTC)

Sweet Heaven, you are singing my song. At the very least, there's no reason not to have more intelligible fall-back aliases like "plural=[foo]" or something for a normal person who edits casually. If you've ever edited SVGs at c: and tried to use c:Template:Valid SVG and its successor templates, you'll know how completely infuriating it is that the inputs are so clipped and counter-intuitive. —Justin (koavf)❤T☮C☺M☯ 19:58, 5 January 2025 (UTC)

@Chuck Entz What is this in reference to? Is there something specific you're annoyed about? Benwing2 (talk) 21:33, 5 January 2025 (UTC)

I'm going to take a wild guess and say that it is not a matter of one or a few templates, or even one or a few types of templates, but rather an attitude toward usability and the population of potential contributors. DCDuring (talk) 03:04, 6 January 2025 (UTC)

The downside is that the "xyz-noun" templates for inflected languages are enormously difficult to use and have a steep learning curve. Very few new editors can use them correctly on their first try, so their initial edits tend to need corrections. At least that's what I observed when looking at the new Belarusian entries added by new people. And I suspect that many potential new editors just give up rather than contribute incorrect edits if they notice problems in the previews of their edits. --Ssvb (talk) 21:42, 5 January 2025 (UTC)

@Ssvb I agree that many of the xyz-noun templates are complex, but I'm not sure there's much that can be done about this. The root of the issue is that your typical inflection system is itself quite complex, and if you want to support the system fully, the template itself will necessarily be complex. One alternative is to support only the most regular inflections, but (a) most people are more interested in the harder, less regular words, which also are usually the most common words; and (b) the templates are already designed (at least the ones I've designed) so that they have sensible defaults in most cases, making it relatively easy to specify the inflection of words with regular declensions or conjugations. Another alternative is to require people to specify a lot more information manually in the case of irregular inflections (e.g. just type out the entire inflection by hand); on the surface that may make it easier to enter for a native speaker who knows the inflection but doesn't want to or can't figure out the syntax of something like {{be-ndecl}}. But in practice (a) it's extremely tedious, with the result that a lot of words never get inflections; and (b) it leads to lots of mistakes. Whenever I design a new template for entering the noun, verb or adjective inflection of language Foo and convert old template uses, I invariably find tons of mistakes due to bad design in the previous template, where too much info had to be given manually. So I'm not really sure what a better approach would be. Benwing2 (talk) 05:38, 6 January 2025 (UTC)

th-cls

(Notifying Alifshinobi, Octahedron80, YURi, Judexvivorum, หมวดซาโต้, Atitarev, GinGlaep, RichardW57, Noktonissian):

I've made an inline classifier template for Thai similar to {{zh-mw}}. Here's an example of what it looks like:

(Classifier: ลูก (lûuk); ใบ (bai); ผล (pǒn); หวี (wǐi); เครือ (krʉʉa))

Are there any objections to my moving it into mainspace, or any other feedback? - saph ^_^ talk 00:28, 6 January 2025 (UTC)

I already made {{cls}}, which can be used for many languages, not only Thai (Tai languages and Vietnamese also use classifiers). I oppose making a template only for Thai. You should expand this template instead. (It is used a lot on thwikt.) --Octahedron80 (talk) 01:07, 6 January 2025 (UTC)

I'm not sure I would agree that this is a situation where a one-size-fits-all template is ideal, especially not a wikitext template. For one thing, {{th-cls}} has automatic translit where {{cls}} does not. - saph ^_^ talk 01:13, 6 January 2025 (UTC)

That's because I add tr=- to suppress translit, since otherwise it results in too many parentheses. There is no need to show them all. --Octahedron80 (talk) 01:17, 6 January 2025 (UTC)

See, for example, th:อาทิตย์, th:ຄຳ, th:ᦋᦲᧃᧉ, th:ကျား. --Octahedron80 (talk) 01:25, 6 January 2025 (UTC)

Didn't notice that, fair enough. I'll wait for other people to comment. - saph ^_^ talk 01:20, 6 January 2025 (UTC)

@Saph I support @Octahedron80's view that we should have a single language-independent {{cls}} template, since there are a lot of languages with classifiers and otherwise we'd end up with a proliferation of incompatible and subtly different templates. This template can have language-specific behaviors for certain languages if it makes sense to do so, e.g. we could make the default transliterating and turn it off for certain languages. (IMO however, transliteration should usually be enabled, since most non-Latin scripts are unfamiliar and hard to read for the average Wiktionary user; it might make sense, for example, to turn off translit in some circumstances for Greek and Cyrillic, which aren't so hard to read and with which many people will be familiar, but for most scripts transliteration is helpful. If the issue with transliteration is display-related, we should be able to come up with a display format that works better.) If Thai needs some special behavior of some sort, that could be supported under the hood in {{cls}}.

@Octahedron80 My main complaint about {{cls}} is not its implementation but the default positioning before the headword. This is nonstandard (we usually put labels and other information after the headword) and IMO looks bad. If you're OK with it, I can do a bot run moving the {{cls}} invocations after the headword. Benwing2 (talk) 05:27, 6 January 2025 (UTC)

~~No. Don't do that.~~ Originally we put classifier(s) after th-noun (and many other Tai noun headwords). But there are many cases where a sense cannot share the same classifier(s) with other senses, or some senses cannot have a classifier at all. So the cls template was created to add classifiers per sense (just like zh-mw, you know; what does mw stand for, anyway?). --Octahedron80 (talk) 05:49, 6 January 2025 (UTC)

As for transliteration, you can turn the tr display on or off as you like. By the way, zh-mw doesn't show pinyin, so I just followed that. --Octahedron80 (talk) 06:10, 6 January 2025 (UTC)

As for Tày, the template was not intended to be used before the headword, but someone is already using it that way widely, and I cannot turn them off. Tày classifiers should be integrated into tyz-noun, like Vietnamese vi-noun. Compare bó. If tyz-noun supported classifiers by itself, we could remove cls there. --Octahedron80 (talk) 05:54, 6 January 2025 (UTC)

@Octahedron80 You are misunderstanding me. I'm not objecting to putting classifiers per sense, following the sense definition. What I'm objecting to is putting the classifier directly *before* the headword. If it goes on the headword line, it needs to follow. So I'm suggesting moving {{cls}} uses from before the headword to after the headword. BTW this is largely an issue with Vietnamese, not with Tày or Thai. If it's better to not have it on the headword line at all, but instead on a sense line, that's fine, but I can't do that by bot; in the meantime it's better to have the classifiers after the headword than before. And since I assume the issue with per-sense classifiers occurs with all languages using classifiers (since classifiers are essentially semantic-based), I don't see how it's useful to integrate classifiers into the headword. BTW "mw" means "measure word". See measure word and classifier on Wikipedia. Benwing2 (talk) 06:13, 6 January 2025 (UTC)

Tày and Vietnamese use classifiers before the noun (as in Chinese), unlike other Tai languages, which use classifiers after the number and noun. Does Wiktionary need to show the classifier before the noun in the headword? If you asked Vietnamese users, I guess they would say yes. --Octahedron80 (talk) 06:28, 6 January 2025 (UTC)

Whether the classifier comes before the noun or after the noun in the grammar of the language has nothing to do with where we should put the classifier in the headword. All headword-related information always goes after the headword itself. There is no other situation that I know of where we put any headword-related information before the headword. Thus, putting the classifier before the headword is highly nonstandard and looks really awful (IMO) and janky. So it's important we move its position. Benwing2 (talk) 06:34, 6 January 2025 (UTC)

Okay. You can move cls to the end of the Tày headword for now, until we can improve the tyz (and vi?) templates. --Octahedron80 (talk) 06:39, 6 January 2025 (UTC)

@Octahedron80: Aside: I think classifier before noun is quite common amongst Tai languages in northern regions - quite possibly alignment with Chinese. --RichardW57 (talk) 08:18, 6 January 2025 (UTC)

If we wanted to do per-language transliteration (or per-language disabling of transliteration), would we keep it as wikitext? That seems like it would make the template a lot less readable. - saph ^_^^⠀talk⠀ 11:33, 6 January 2025 (UTC)

Category:Artsakh and subcats


Are these needed anymore? The Republic of Artsakh dissolved a year ago. 115.188.138.105 11:16, 6 January 2025 (UTC)

Cf. Category:Soviet Union. The words still exist or existed in regular use. Why would we delete this category? —Justin (koavf)❤T☮C☺M☯ 11:19, 6 January 2025 (UTC)

To OP's point, the category descriptions are worded as if Artsakh still exists. - saph ^_^^⠀talk⠀ 12:43, 6 January 2025 (UTC)

What? His point was about the existence of the category, not the wording of the description. No one needs to start a conversation about modifying the module's wording (which I will do now). —Justin (koavf)❤T☮C☺M☯ 12:54, 6 January 2025 (UTC)

https://en.wiktionary.org/w/index.php?title=Module%3Aplace%2Fshared-data&diff=83485136&oldid=83484470 —Justin (koavf)❤T☮C☺M☯ 12:58, 6 January 2025 (UTC)

Well, there's a Category:Rivers in Artsakh but no corresponding Category:Rivers in the Soviet Union or other toponym categories. 115.188.138.105 20:19, 6 January 2025 (UTC)

Sure, but the premise is "this place no longer exists (i.e. the state was dissolved), therefore, should we delete the categories?" and the answer is "no". There may be some subcats that shouldn't have existed or should be deleted, but that's not because the breakaway republic has been reintegrated into Azerbaijan. —Justin (koavf)❤T☮C☺M☯ 11:19, 7 January 2025 (UTC)

By that argument we should have categories for all polities that have ever existed. Category:en:Rivers in the Aztec Empire? I'd rather just have categories for currently-existing polities, as well as those which are of particular historical significance to particular languages (Category:la:Towns in the Roman Empire?). This, that and the other (talk) 12:00, 7 January 2025 (UTC)

By what argument? My argument was "there are enough words about [topic] to have a category", so yes, if there are enough words about that topic, go for it. Why would polities be any different than sports or political movements or breads or any of the other things we have categories about? —Justin (koavf)❤T☮C☺M☯ 12:06, 7 January 2025 (UTC)

As a general rule, geographic features such as rivers, cities, etc. are categorized according to current political boundaries. So there should be no Category:Rivers in Artsakh any longer. Possibly an exception could be made for cities that no longer exist, but I'm skeptical of that, and rivers usually don't come and go, so there's no reason to put anything in Category:Rivers in Artsakh. Benwing2 (talk) 00:10, 9 January 2025 (UTC)

The river categories can go but village and city categories should stay, because the invaders have either destroyed or renamed them. For example, Karin Tak belongs in the Category:en:Villages in Artsakh because it was a village only when Artsakh existed. Now neither the population, nor the village nor the name are there anymore. Vahag (talk) 08:37, 9 January 2025 (UTC)

Para-Nakh languages


I would like to discuss here adding a reference to the Para-Nakh languages to etymologies in the Nakh languages. In my opinion, it should work the same way Ancient Greek forms refer to a Pre-Greek substrate. This is explained in detail in Johanna Nichols' work {{R:cau-nkh:Nichols:2004}}, particularly from page 145 onwards. I think this option works well for explaining forms with phonologically close but irregular correspondences. For example, (1) Ingush ӏаж (ˀaž, “apple”), Chechen ӏа̄ж (ˀaaž, “id.”); (2) Ingush нихь (niḥʳ, “hide, animal skin”), Chechen неӏ (neˀ, “id.”); (3) Ingush зӏамига (zˀamiga, “little, small”), Chechen жима (žima, “id.”); (4) Ingush чил (čil, “ashes”), Chechen чим (čim, “id.”); (5) Ingush муа (mwa, “scar”), Chechen мо (mo, “id.”); (6) Ingush миинг (miı̇ng, “alder”), Chechen маъ (maʔ, “id.”), муъ (muʔ), Bats მურყაჼ (murq̇ã, “id.”) → Georgian მურყანი (murq̇ani, “id.”), as suggested by user:კვარია; (7) Ingush шуа (šwa, “abomasum”), Chechen шуа (šwa, “id.”) and their doublets with regular development, Ingush шоа (šoa, “id.”), Chechen шо (šo, “id.”) (my own example). Still, I think it would be wrong to reconstruct a Proto-Nakh form on the basis of these irregular daughter forms. I would very much like to avoid a situation like, for example, that of Proto-Finnic *omëna (“apple”), where the daughter forms show no regular correspondences. If you have a better idea on how to handle this, please let me know. @Vahagn Petrosyan, კვარია, Tollef Salemann, Tropylium, Chuck Entz, Thadh, Fay Freak, Surjection ɶLerman (talk) 15:36, 6 January 2025 (UTC)

Just to be clear, do you propose simply making a code for a pre-Nakh substrate? Thadh (talk) 16:08, 6 January 2025 (UTC)

@Thadh Yes, that's right, although Nichols doesn't have that. ɶLerman (talk) 16:16, 6 January 2025 (UTC)

Why can't you simply say "borrowed from a {{bor|ce|qfa-sub}} language"? That would put the term into Category:Chechen terms borrowed from substrate languages. By the way, I consider Category:Ancient Greek terms borrowed from a Pre-Greek substrate redundant to Category:Ancient Greek terms borrowed from substrate languages. I don't believe people who claim they can distinguish between different substrate sources within the same language. Vahag (talk) 17:18, 6 January 2025 (UTC)

Ok, I'll try to use this template, thanks. ɶLerman (talk) 10:53, 7 January 2025 (UTC)

Splitting WT:RFVE?


This page is one of the slowest-to-load high-usage pages we have. Despite User:Pious Eterino's best efforts, it has been above 700K almost all the time since 12/7/24. It would help to find ways to split it. I can imagine three basic ways:

by whether or not another dictionary has the challenged definition.
by whether or not the challenged definition is labeled as restricted geographically (or otherwise?).
by whether the challenged definition is hard to cite because it is for a term that is highly polysemic.

I don't know which one is best to start with. The first might encourage people to at least look at a few other dictionaries (I like to use OneLook.com for convenient access to multiple dictionaries, but the OED is an obvious resource for those who have access). The second would encourage those familiar with the restricted domain to focus their efforts on those areas. The third would be useful for isolating terms that should have a long dwell time in RfV.

It might also be useful to use categories and subcategories of RfVed items to isolate, say, challenged UK- or Commonwealth-specific definitions and those with other attributes suggested above as bases for splitting the page. Such a category system could be applied in other languages as well. DCDuring (talk) 22:14, 7 January 2025 (UTC)

In my view the problem is that the rate of new RFVs exceeds the capacity of cite-seekers to process the requests.

I would point the finger particularly at an IP-hopping user who has, in recent weeks, been posting large volumes of words from Webster, without (as far as I can tell) making any effort to assist with other requests. I believe it is incumbent on such users to help out by looking for cites for RFVs posted by others on the page. Perhaps we need to be a bit heavy-handed in making this into a proper obligation and enforcing it, a bit like Wikipedia's "quid pro quo" system for "did you know" entries on their Main Page.

I'd prefer to try this before splitting the page. But if a split is absolutely necessary, I think the best way would be to create a (hopefully temporary) subpage WT:Requests for verification/English/Old, which would be for RFVs of {{Webster 1913}} words, words/senses marked (obsolete) and the like. This, that and the other (talk) 23:30, 7 January 2025 (UTC)

@This, that and the other: I like that idea. Alternatively, we could impose a rule of, say, no more than two nominations a day. (In the case of nominations by IP addresses, nominations originating from the same IP range would be deemed to be from the same editor.) Also, I have asked this IP to sign and date nominations. If this request is ignored, I feel the nominations should simply be removed. — Sgconlaw (talk) 23:41, 7 January 2025 (UTC)

Analysis of words in terms of Pali roots


Is it legitimate to present an analysis of Pali words inherited from Old Indic in terms of Pali roots? For example, textbooks on Pali will present many participles sensu lato as being root + -ta or root + -ya, even though they have been inherited from Sanskrit. I believe that these are worthy of inclusion as surface analyses at the very least.

@Pulimaiyi has objected, «It has been brought to my attention that you have been creating Pali roots - some of which, have a questionable form - and then using these to synchronically derive terms which are clear-cut cases of inheritance from Old Indo-Aryan and as such cannot be analysed as intra-Pali derivations. This is very misleading. Case in point: satta and sakka. How can sakka, for instance, be categorised or analyzed as a "-ya" formation if it is indistinguishable from another hypothetical form "sakka", which could hypothetically derive from a hypothetical adjective *śakra (which would be a "-ra" suffix adjective)? These derivations were not done at the Pali-level. sakka is inherited from a "-ya" formation, it is not itself a "-ya" formation. Please desist from such edits. Thanks. -- 𝘗𝘶𝘭𝘪𝘮𝘢𝘪𝘺𝘪(𝘵𝘢𝘭𝘬) 17:07, 7 January 2025 (UTC)»

Insofar as sakka is indeed a gerundive (aka future passive participle), this analysis is legitimate. Multiple ancestries are possible, and can be noted in the etymology section. Kindly curb your objections to the presentation of Pali internal analyses, and instead enhance etymology sections where appropriate. So the short answer to your request is 'no', but I will heed a consensus. --RichardW57 (talk) 15:14, 8 January 2025 (UTC)

There was some debate regarding surface analysis of another word in terms of Pali roots, but sakka and satta are exceptionally clear-cut cases of terms that cannot be surface-analysed. Their function as participles is duly documented on the respective pages, but that is no reason to assume they can be analysed using the participle-forming suffixes inherited from Sanskrit. As a result, I have removed the surface analysis from these two pages. Svārtava (t ɕ) 18:01, 8 January 2025 (UTC)

sakka cannot be analysed as sak + ya. Where is the -ya component in it that the etymology claims? -- 𝘗𝘶𝘭𝘪𝘮𝘢𝘪𝘺𝘪(𝘵𝘢𝘭𝘬) 18:45, 8 January 2025 (UTC)

Exceptional behavior for modern Greek?


A lot of modern Greek pages do things differently from other languages, usually for no clear reason that I can see, e.g.:

Many uses of {{col}}, {{col2}}, etc. set |sort=0 and |collapse=0.
Many terms in {{col}}, {{col2}}, etc. manually disable transliteration.
Modern Greek and Ancient Greek seem to be essentially the only users of {{see}}, which is used heavily in these two languages, esp. modern Greek.
Several pages do unusual things like {{l|el|αγριοκοιτάω|αγριοκοιτάω/αγριοκοιτώ|t=}}, {{l|el|αγριοκοιτάζω}} (instead of just e.g. {{l|el|αγριοκοιτάω}}/{{l|el|αγριοκοιτώ}}, {{l|el|αγριοκοιτάζω}}).

The page κοιτάζω illustrates the first three, and κοιτάω illustrates (1) and (4). I am in the process of cleaning up {{col}} and variants and I'm going to fix (1) and (2) pending clear reasons why these things should remain. (1) requires manual auditing to see whether any invocations of |sort=0 should stay, but I expect there to be few cases of this. @Sarri.greek @Saltmarsh Benwing2 (talk) 00:24, 9 January 2025 (UTC)

Dear @Benwing2, Happy 2025! Please do as you wish, so that we can copy-paste your final style into our cheatsheets. I see that you discuss your changes on Discord. Just to explain:

_1 I used to copy-paste it from an older style I found in 2019. collapse=0 is used when the reader should see the whole table (without the silly hiding of a few lines). Later, I found Template:topx, which handles any number of columns; now I see top2, top3, etc. These do not use pipes, but normal links with asterisks, so they can be copy-pasted easily, here and at other wiktionaries. I was about to use them in all Related sections, but if you do not like that, please use whichever pattern is advisable to copy-paste everywhere.

_2 tr=- is used for repetitive, similar transliterations; but if readers want them, they can be repeated.

_3 Template:see is used heavily in Related sections. When the Related section is “polyplethes” (more than 5 words), we do not repeat it at each of its members. At a member (a derived or related word), we give links to its closest related terms plus all its compounds, and for the rest we urge the reader to see the full list at the central lemma, which has the complete index of the etymological field (as is often done in etymological dictionaries). Example: modern πείθω (peítho), with related terms grouped by stem. Or ψήφος (psífos), with related terms grouped by meaning, plus a large α...ω index (with Hide) so that words can be found easily. E.g. at the verb ψηφίζω (psifízo, “I vote”) or at ancient ψηφίζω (psēphízō, “I count; vote”), a selection of close related terms plus the compounds is given, along with the {{see|el|and=1|ψήφος}} link to the rest of the etymological field. The ancient ψῆφος (psêphos) has so many derived terms that, for the moment, I have just given a link to Perseus.

_4 Modern -άω/-ώ twin variant verbs: see Appendix:Greek_verbs#2nd_Conjugation (my 2024 notes). {{link|el|ωωωάω|ωωωάω/ωωωώ}} was used when it was judged that the only thing to see at the -ώ variant is "go to -άω". A separate link for the -ώ variant is used when it is still in use, sometimes equivalent to the -άω and not dated. Rarely, an -ώ is the main lemma rather than the -άω (e.g. τηλεφωνώ (tilefonó, “I phone”)). Or, an -ώ variant does not exist at all in practice. The modern -ώ is not a contraction of -άω (-áo), which is a modern suffix, unlike the ancient uncontracted -άω (-áō) (also -έω, -όω), contracting to -ῶ (-ô), which is the basic verb form from Hellenistic Koine onwards. The -ίζω (-ízo) is a completely different verb and always has a separate link.

Thank you for your hard work. PS ...#Waiting for Medieval Greek. ‑‑Sarri.greek ^♫ I 10:31, 9 January 2025 (UTC)

Waiting for Medieval Greek


Still waiting for the Medieval Greek (2024) L2 title to be implemented. En.wiktionary could become a pioneer, rectifying the absurd "Ancient Greek up to 1453" seen in all official catalogues, which still stands as a relic in 2025! Happy New Year! ‑‑Sarri.greek ^♫ I 10:31, 9 January 2025 (UTC)
