Wikidata:Requests for permissions/Bot/CarbonBot
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Withdrawn--Ymblanter (talk) 07:46, 3 November 2024 (UTC)
CarbonBot
Operator: Iamcarbon
Task/s:
1) Add default mul labels to given and family names when the item has an existing default label with a mul language.
2) Remove duplicated aliases matching the item's mul label when the item has a native label with a mul language. As mul has not been fully adopted, a limited number of aliases would be modified each day to ensure existing workflows are not disrupted.
I have withdrawn the proposal to delete duplicate aliases, as there are concerns that removing them would reduce the visibility of these items in search rankings.
It is expected that these tasks will apply to roughly 800,000 given and family names.
Code:
Function details:
The bot runs as a console application using the new Wikidata REST API.
The application executes a query for items containing a native label that do not yet have a mul label.
--Iamcarbon (talk) 18:41, 16 October 2024 (UTC)
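The function details above (query for items that have a native label but no mul label, then write the label through the REST API) can be sketched roughly as follows. This is a minimal, hypothetical sketch, not CarbonBot's actual code. The `Item` shape is a simplified stand-in rather than the REST API's JSON format; reading "a native label with a mul language" as a native label (P1705) statement whose language code is `mul` is an assumption; the rule that skips items with more than one such native label is an assumption; and the endpoint path and body fields in `build_label_edit` are an assumed shape to verify against the Wikibase REST API documentation. No network calls are made.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical, simplified view of a name item (not the REST API's JSON shape)."""
    item_id: str
    labels: dict[str, str] = field(default_factory=dict)  # language code -> label text
    native_labels: list[tuple[str, str]] = field(default_factory=list)  # (text, language) from P1705


def mul_label_to_add(item: Item) -> str | None:
    """Return the text to use as the item's mul label, or None if the item should be skipped."""
    if "mul" in item.labels:
        return None  # already has a default (mul) label; nothing to do
    mul_natives = {text for text, lang in item.native_labels if lang == "mul"}
    if len(mul_natives) != 1:
        return None  # no mul native label, or ambiguous (assumed rule): leave for human review
    return mul_natives.pop()


def build_label_edit(item: Item, label: str) -> tuple[str, dict]:
    """Build an ASSUMED REST API path and JSON body for setting the mul label."""
    # Assumed endpoint shape: PUT .../v1/entities/items/{id}/labels/{language}; verify against the docs.
    path = f"/w/rest.php/wikibase/v1/entities/items/{item.item_id}/labels/mul"
    body = {"label": label, "bot": True, "comment": "Add default mul label from native label"}
    return path, body


def plan_edits(items: list[Item]) -> list[tuple[str, dict]]:
    """Keep items that lack a mul label (the query step) and plan one label edit for each."""
    edits = []
    for item in items:
        label = mul_label_to_add(item)
        if label is not None:
            edits.append(build_label_edit(item, label))
    return edits


if __name__ == "__main__":
    # Placeholder item IDs, not real Wikidata items.
    sample = [
        Item("Q_EXAMPLE_1", labels={"en": "Anna"}, native_labels=[("Anna", "mul")]),
        Item("Q_EXAMPLE_2", labels={"mul": "Ben"}, native_labels=[("Ben", "mul")]),
    ]
    for path, body in plan_edits(sample):
        print("PUT", path, body)
```

Skipping items with several mul native labels, rather than picking one, is a conservative choice that leaves those cases for manual review instead of guessing.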
- Question Will this take into account the issue of duplicate items? StarTrekker (talk) 20:50, 16 October 2024 (UTC)
- The uniqueness constraint that prevents duplicates is based on (label + description). This bot task only proposes to remove duplicate aliases, so this constraint will still prevent duplicates. Iamcarbon (talk) 21:09, 16 October 2024 (UTC)
- Support, but this would probably also be useful for many websites and organizations. One problem there could be that, when removing labels, one could lose the information about which languages use the default name, since in some cases some languages may not use the default label. Some information on that would be good later. Prototyperspective (talk) 22:04, 16 October 2024 (UTC)
- Agreed that this can likely be extended to other types (e.g. editions of works, humans, and organizations) in the future. However, these will each need their own discussions to determine which rules work, and cause the least disruption, as these label sets have 10+ years of history that needs to be considered.
- The initial tasks here are limited to given and family names where there is an explicit default that is known to be mul, without historical baggage. We also aren't removing labels yet, as this impacts search rankings and removes an important check that prevents duplicate items from being created. Iamcarbon (talk) 22:35, 16 October 2024 (UTC)
- Support Previously discussed at Help talk:Default values for labels and aliases; there's consensus to remove these aliases in favor of mul. Midleading (talk) 01:58, 17 October 2024 (UTC)
I Oppose it in this form. I only support an intervention where the label and alias are the same within a language. If there is only an alias in the given language, I am convinced that it should not be removed for the time being. I also do not support intervention in non-Latin languages. I don't want to describe the expected problems again from scratch; I have already summarized them here. Unfortunately, subsequent edits ignored my comments. Pallor (talk) 06:42, 20 October 2024 (UTC)
- Some context to the opposition above: adding labels in a given language currently gives the item a boost in search, and removing that label removes the boost.
- In languages that have a limited number of localized labels, items (particularly names) can be easily boosted above others by adding a localized or duplicate label.
- For less common names, where the search results only return a few items, removals have no impact, as all items are still returned. However, for other more popular items, these removals can make those names harder to discover. The impact varies per language, depending on how many other items have localized labels.
- We also need to consider that WMDE may change the algorithm in the future, as default labels become more popular, to provide additional weight to site links and other factors. Any short term "boosts" that we get for certain languages are likely to be nullified in the future.
- We should be working toward sustainable long term solutions that do not rely on duplication. For example, adding contextual suggestions (i.e. suggesting only family names, when adding a family name.)
- It should also be considered that while only ~0.5% of items are given and family names, they are currently responsible for nearly 10% of all labels and aliases. At 500 labels per name, this requires us to maintain 350,000,000+ additional labels. These labels have real storage and indexing costs, make the site less responsive, account for millions of unnecessary edits (and related watchlist notifications), and require significant time from the community to provide oversight.
- The default names project has been in the works for years to facilitate the removal of these labels. While we could consider keeping them to maintain the status quo, I believe this would be a grave mistake that would prevent us from making bigger long-term improvements.
- This will cause some short term disruption, but can also be the catalyst for the community to react and improve. Iamcarbon (talk) 05:26, 24 October 2024 (UTC)
- It has also occurred to me that any lost search rankings may be regained once we delete the duplicated human name labels. Iamcarbon (talk) 16:06, 24 October 2024 (UTC)
- Until the prioritization of the free-text search engine is improved, I will not support the launch of the bot, but after that I see no obstacle. I think we have both already written all our arguments on other discussion pages. Deletion of labels currently results in a decrease in the distribution of certain data types and an increase in the number of duplicate items (which we will not notice, or will have difficulty noticing). Even if I do not consider this to be the most optimal solution for reducing the size of the database, I fundamentally support the "mul" project.
- But only with the reservation that we do everything we can to neutralize the negative effects before starting it. As it stands now, I think launching this bot will do more harm than good. Pallor (talk) 10:17, 25 October 2024 (UTC)
- I have withdrawn the proposal to delete duplicate aliases. Iamcarbon (talk) 20:43, 25 October 2024 (UTC)
- Oppose per Pallor. First we fix the search engine so that deleting labels does not affect results negatively, and then we can start removing labels. --So9q (talk) 18:48, 25 October 2024 (UTC)
- What exactly needs to be fixed with the search engine? Iamcarbon (talk) 19:07, 25 October 2024 (UTC)
- I have withdrawn the proposal to delete duplicate aliases. Iamcarbon (talk) 20:43, 25 October 2024 (UTC)
Question As we didn't gain consensus for these tasks, does anyone know if there is a way to withdraw this proposal and propose a new primary task? It appears that our bot request process requires the initial bot request to be approved before additional tasks can be requested.
- I can close this one as withdrawn, and you can then open another request.--Ymblanter (talk) 20:15, 2 November 2024 (UTC)
- @Ymblanter Yes, please close this as withdrawn. Thank you for your help. It is much appreciated. Iamcarbon (talk) 04:32, 3 November 2024 (UTC)