John of Reading
This is John of Reading's talk page, where you can send him messages and comments.
Archives: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28
Auto-archiving period: 21 days
Removing Template Assistance
Hi, I'm not an experienced editor here, though I did contribute significantly lately to the Zahran tribe page and would like you to review the authenticity of the template that reads "This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed." 144.86.34.230 (talk) 05:03, 24 May 2024 (UTC)
- Also, happy birthday! 144.86.34.230 (talk) 05:04, 24 May 2024 (UTC)
- Zahran tribe
- Thank you for the birthday wishes - that's a few weeks ago now.
- Let's see. The tag was added in 2018 by Bradv (talk · contribs) when the article looked like this. Since then, yes, the article has changed substantially, and many new sources have been added. I'm going to remove the tag. -- John of Reading (talk) 10:48, 26 May 2024 (UTC)
Direct uses of Template:Infobox
A decade(!) ago, you kindly created User:Pigsonthewing/Direct calls to Infobox. Please could you repeat that exercise (feel free to overwrite the original). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:32, 6 June 2024 (UTC)
- Doing... -- John of Reading (talk) 16:43, 6 June 2024 (UTC)
- @Pigsonthewing: Done - pretty speedy now I have the uncompressed dump on an SSD drive. At the bottom of User:Pigsonthewing/Direct calls to Infobox there's a short list of articles using redirects to {{Infobox}}. There aren't many redirects, and they aren't used much, so I looked through them all manually. I fixed Federal College of Agriculture, Akure. -- John of Reading (talk) 17:17, 6 June 2024 (UTC)
- Very helpful. Thank you. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:50, 7 June 2024 (UTC)
Deletion policy
Hello, what is the deletion policy? Gdfctjmm (talk) 19:48, 24 July 2024 (UTC)
- @Gdfctjmm: You can read about Wikipedia's deletion policy at Wikipedia:Deletion policy. -- John of Reading (talk) 07:25, 3 August 2024 (UTC)
Always precious
Ten years ago, you were found precious. That's what you are, always. --Gerda Arendt (talk) 09:37, 2 August 2024 (UTC)
- @Gerda Arendt: How the time flies! Thank you. -- John of Reading (talk) 07:25, 3 August 2024 (UTC)
Wikipedia edits
Good evening, I wanted to ask about a problem I'm having. In this account (MarianoMora23) I can move articles to the mainspace with no problem after barely making 10 edits. However in this SAME account (MarianoMora23) but in the SPANISH wikipedia I have more than 20 edits and still can't move from my sandbox to the mainspace. Any idea why? MarianoMora23 (talk) 03:46, 27 August 2024 (UTC)
- @MarianoMora23: Each version of Wikipedia sets its own rules. At es:Ayuda:Cómo cambiar el nombre de una página, it says you have to be autoconfirmed to move a page; at es:Wikipedia:Autoconfirmados, it says you have to make 50 edits to be autoconfirmed - as opposed to only 10 at the English-language Wikipedia. -- John of Reading (talk) 06:45, 27 August 2024 (UTC)
Hi John!
You like typofixing? I got tens of thousands of typos and I can't fix em all alone. Perhaps we can combine our forces? User:Polygnotus/typos. Polygnotus (talk) 16:21, 8 September 2024 (UTC)
- @Polygnotus: Interesting. I'm finding typos by running regular expressions on a database dump; how are you creating your work list? What's your false positive rate?
- I confess I'm so used to working with AWB and my 4000+ regular expressions that I'm unlikely to switch to a radically different method. -- John of Reading (talk) 16:47, 8 September 2024 (UTC)
- I take a list of the most frequently used words, create typos with a Levenshtein distance of 1, and check which occur in the dump. Then I do a bunch of filtering and I check which exist in the live version of Wikipedia.
- Which programming languages, if any, are you familiar with?
- We could use a custom AWB module in C# or perhaps just use some custom Selenium-based tool (which would be pretty damn similar, not radically different). Or perhaps a JWB-like interface on wiki. Haven't really decided how to approach that yet.
- I never really bothered to create stats of the amount of skips vs the amount of fixes but that is a good idea to have.
- I use a lot of regex to avoid typos that shouldn't be fixed, see User:Polygnotus/typo.js.
- I have at least 60.000 potential typos left to fix so it is probably worth it to create a decent tool for that.
- Polygnotus (talk) 17:14, 8 September 2024 (UTC)
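The list-building approach described above (take frequent words, generate every string at Levenshtein distance 1, keep those that occur in the dump) can be sketched roughly as follows. This is a minimal Python illustration, not the Java code actually used: the function names, the lowercase ASCII alphabet, and the single filtering step (dropping variants that are themselves in the common-word list) are all assumptions made for the example.

```python
import string


def edits1(word, alphabet=string.ascii_lowercase):
    """Return every string at plain Levenshtein distance exactly 1 from word.

    Covers single deletions, substitutions and insertions. Transpositions
    are not included: in plain Levenshtein terms they count as distance 2.
    """
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    substitutions = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    return (deletes | substitutions | inserts) - {word}


def candidate_typos(common_words, dump_words):
    """Map each common word to its distance-1 variants seen in dump_words.

    Hypothetical filtering step: variants that are themselves common
    words (for example "setting" for "getting") are dropped.
    """
    known = set(common_words)
    found = {}
    for word in common_words:
        hits = (edits1(word) & dump_words) - known
        if hits:
            found[word] = hits
    return found


if __name__ == "__main__":
    common = ["getting", "setting", "protection"]
    dump = {"gettig", "getting", "setting", "protectin"}
    print(candidate_typos(common, dump))
```

Candidates found this way still need human or rule-based review, since a string one edit away from a common word can be a legitimate surname, technical term or stage name.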
- @Polygnotus: Languages? Assembler, BCPL, C, C++ - all unused for a decade, I'm afraid. But I've used regular expressions on a copy of User:Polygnotus/typos to extract the 3000+ article names and the alleged typos, and have begun an AWB run to detect those words in those articles. So far I've saved 23 edits and have skipped 25 other articles - not a bad hit rate, by my standards, so I'll press on with this over the next few days. "Gettig" is a surname; "protectin" is a kind of protein; Supremme de Luxe is a stage name; and so on. -- John of Reading (talk) 18:08, 8 September 2024 (UTC)
- Yeah that is 3489 typos and then we got 2800 here and 9300 there and 1200 here. When my Raspberry Pi is done I will have another ~60.000. The typos have already had very similar regex run on them, as you saw in typo.js, so much of the WONTFIX stuff has been filtered out already. Polygnotus (talk) 18:15, 8 September 2024 (UTC)
- In an ideal world, AWB would accept lists in this format (christmas|chirstmas|My Christmas) as a list generator source. And AWB would contain code (very similar to typo.js) to not fix typos in certain situations. Do you know how we can get closer to that goal? WP:AWB lists some developers in the infobox. Polygnotus (talk) 18:44, 8 September 2024 (UTC)
- AWB has two checkboxes at the top left of the "Find & Replace" configuration, which aim to cover the "certain situations". I run with those turned off, though, so that I do fix errors in quotations, references, foreign-language text and so on - with appropriate care and checking. -- John of Reading (talk) 18:50, 8 September 2024 (UTC)
- I boldly created the WP:QUOTETYPO shortcut at some point and it hasn't been reverted yet. It doesn't really make sense to faithfully reproduce simple mistakes made by others when they are irrelevant and only distract imo. Your approach does affect the hitrate tho. Are there others who I should contact? I assume the 16789 typos above will keep you busy for a while but you know where to find me when you want more. Perhaps I should stick the lists in a subpage of WP:TYPO? I'll dive in the AWB code, thanks. Polygnotus (talk) 19:40, 8 September 2024 (UTC)
- Wikipedia:Quotations is marked as an essay; the authoritative guide is at MOS:QUOTE. Fortunately they say the same thing! I do fix typos in quotations if I think they are "insignificant" or are likely to have been copying errors. See User:John of Reading/Typo fixing with AutoWikiBrowser#Editing quotes, book titles and such like.
- If you post your links at Wikipedia talk:Typo Team you may attract more helpers. Oh, and are you aware of the Wikipedia:Typo Team/moss project? That's another attempt at co-ordinated checking using data-crunching techniques. -- John of Reading (talk) 20:14, 8 September 2024 (UTC)
- Thank you, redirect target improved. I combined typolist, typolist2 and typolist3 above (but not User:Polygnotus/typos, which you imported into AWB) into User:Polygnotus/Data/Typolist. If you want some, please delete them from the list so that it's clear that they've been handled.
- I added Moss and the (code behind the) AWB checkboxes to my todolist, thanks again! Polygnotus (talk) 04:30, 9 September 2024 (UTC)
- @Polygnotus: I've restarted the list after telling AWB not to sort the pages alphabetically, so I'm now processing them in the same order as they were listed in User:Polygnotus/typos. This makes it easier for me, as the fixes for the same target word turn up together, and perhaps for you, since you can compare my contribution list with the list I'm working from.
- Two of your "don't fix" tests aren't working correctly:
- In many cases the typo is embedded within a URL - example mmiller within Merle Miller
- In some cases the typo is embedded within a file name - example distribuion within Lesser blue-eared starling. I exclude those by peeking ahead for a known image suffix - (?![ \(\)\.\,\;\-\'\"\+\&\%\w\d]*\.(?i:(?:gif|jpe?g|ogg|ogv|pdf|png|svg|tiff?|webm))\b) - this regular expression isn't perfect, I know.
- -- John of Reading (talk) 07:26, 9 September 2024 (UTC)
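To illustrate how the "peek ahead for a known image suffix" guard behaves, here is a small Python sketch that appends the negative lookahead quoted above to a typo pattern. The helper name and the sample strings (including the file name) are made up for the example, and the regex is used here with Python's `re` module rather than in AWB, on the assumption that the two engines treat this particular pattern the same way.

```python
import re

# The lookahead quoted in the discussion, copied unchanged.
IMAGE_SUFFIX_GUARD = (
    r"(?![ \(\)\.\,\;\-\'\"\+\&\%\w\d]*"
    r"\.(?i:(?:gif|jpe?g|ogg|ogv|pdf|png|svg|tiff?|webm))\b)"
)


def typo_pattern(typo):
    """Hypothetical helper: match typo as a whole word unless the text
    that follows looks like the rest of a media file name."""
    return re.compile(r"\b" + re.escape(typo) + r"\b" + IMAGE_SUFFIX_GUARD)


if __name__ == "__main__":
    pattern = typo_pattern("distribuion")
    samples = [
        "range_map = Example distribuion map.png",   # file name: skipped
        "The distribuion of the species is wide.",   # prose: reported
    ]
    for sample in samples:
        print(bool(pattern.search(sample)), sample)
```

As noted in the discussion, the guard is not perfect: a typo in ordinary prose that happens to be followed later in the same run of text by a file name would also be skipped.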
- I make the lists with Java and then I use Javascript to actually make the edits. When I improved the url regex in Javascript I forgot to add it to the Java code as well. I had a bunch of ideas to improve my workflow so I am cooking up a fresh batch for you. Might take a while, even on a modern pc. Polygnotus (talk) 03:33, 10 September 2024 (UTC)
- Originally I used this for URLs, but a lot of them escaped the wrath of the regex:
((http|https)://)(www.)?[-a-z0-9@:%._\+~#?&//=]{2,256}\.[-a-z]{2,26}\b([-a-z0-9@:%._\+~#?&//=]*)
- I am considering using something like this instead, unless you have a better idea:
\b((?:https?://|www\.)(?:\S+(?::\S*)?@)?(?:(?:[0-9]{1,3}\.){3}[0-9]{1,3}|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:[/?#]\S*)?\b)
- For files I used:
- File:(.*?)(\\.|\\|)"
- Image:(.*?)\\.
- Category:(.*?)\\.
- and I haven't really decided how to improve on that. Not all of them have file extensions. Perhaps Commons Special:MediaStatistics and the local one can be used?
- My todolist is steadily growing. Polygnotus (talk) 03:41, 10 September 2024 (UTC)
- Are the URL regexes running with "ignore case" turned on? If not, the first URL regex fails to match the whole URL in the Merle Miller example because parts of it are uppercase.
- The filename in the Lesser blue-eared starling has no File: prefix because it is being used as an infobox parameter. To exclude those, you'll either have to look backwards for range_map = or similar, or look forwards for .png or similar. -- John of Reading (talk) 07:01, 10 September 2024 (UTC)
- I use Pattern.CASE_INSENSITIVE and Pattern.UNICODE_CASE. I have added range_map to the list of disallowed parameters. I am currently trying to figure out whether Ollama can help identify typos better than a coinflip. Polygnotus (talk) 07:47, 10 September 2024 (UTC)
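The point about "ignore case" can be checked with a short experiment. The sketch below runs the first URL regex from the thread with and without case-insensitive matching and reports whether a typo falls inside the matched URL. It is written in Python, with `re.IGNORECASE` standing in for Java's `Pattern.CASE_INSENSITIVE`; the helper name and the sample URL are invented for the illustration and are not taken from the Merle Miller article.

```python
import re

# The first URL regex from the discussion, copied unchanged.
FIRST_URL_REGEX = (
    r"((http|https)://)(www.)?[-a-z0-9@:%._\+~#?&//=]{2,256}"
    r"\.[-a-z]{2,26}\b([-a-z0-9@:%._\+~#?&//=]*)"
)


def typo_is_inside_url(text, typo, flags=0):
    """Hypothetical helper: True if the first occurrence of typo lies
    entirely within a span matched by FIRST_URL_REGEX."""
    start = text.find(typo)
    if start < 0:
        return False
    end = start + len(typo)
    return any(
        m.start() <= start and end <= m.end()
        for m in re.finditer(FIRST_URL_REGEX, text, flags)
    )


if __name__ == "__main__":
    # Made-up URL with an uppercase path segment before the typo.
    text = "See https://www.example.org/Archive/mmiller.html for details."
    print(typo_is_inside_url(text, "mmiller"))                 # case-sensitive
    print(typo_is_inside_url(text, "mmiller", re.IGNORECASE))  # ignore case
```

With case-sensitive matching the match stops at the first uppercase letter in the path, so the typo is left outside the protected span; with ignore-case turned on the whole URL is matched.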
Wikipedia:Talk page guidelines has an RfC for possible consensus. A discussion is taking place. If you would like to participate in the discussion, you are invited to add your comments on the discussion page. Thank you. Gnomingstuff (talk) 18:14, 16 October 2024 (UTC)