[Foundation-l] Old Wikipedia backups discovered

FT2 ft2.wiki at gmail.com
Tue Dec 14 16:24:28 UTC 2010


Wow, Tim. Just wow!

Is it just me who sees NYT carrying a headline, "On eve of 10th anniversary,
WIkipedia developers turn up earliest records" ?

FT2



On Tue, Dec 14, 2010 at 3:54 PM, Tim Starling <tstarling at wikimedia.org>wrote:

> I was looking through some old files in our SourceForge project. I
> opened a file called wiki.tar.gz, and inside were three complete
> backups of the text of Wikipedia, from February, March and August 2001!
>
> This is exciting, because there is lots of article history in here
> which was assumed to be lost forever.
>
> I've long been interested in Wikipedia's history, and I've tried in
> the past to locate such backups. I asked various people who might have
> had one. I had given up hope.
>
> The history of particularly old Wikipedia articles, as seen in the
> present Wikipedia database, is incomplete, due to Usemod's policy of
> deleting old revisions of pages after about a month. The script which
> Brion wrote to import the article histories from UseMod to MediaWiki
> only fetched those revisions which hadn't been purged yet.
>
> I didn't want to believe that those revisions had been lost forever,
> and I even opened the UseMod source code and stared forlornly at the
> unlink() call. What I (and Brion before) missed is that UseMod appends
> a record of every change made to two files, called diff_log and rclog.
> In these two files is a record of every change made to Wikipedia from
> January 15 to August 17, 2001.
>
> I've put the two log files up on the web, at:
>
> http://noc.wikimedia.org/~tstarling/wikipedia-logs-2001-08-17.7z
>
> The 7-zip archive is only 8.4MB -- much more manageable than today's
> backups.
>
> rclog contains IP addresses. The Usemod software made IP addresses of
> logged-in users public, so the people who made these edits had no
> expectation that their IP address would be kept private. That, coupled
> with the passage of time, makes me think that no harm to user privacy
> can come from releasing these files.
>
> -- Tim Starling
>
> _______________________________________________
> foundation-l mailing list
> foundation-l at lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
>


More information about the foundation-l mailing list
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy