This task tracks a wish from the 2016 Community Wishlist Survey: https://meta.wikimedia.org/wiki/2016_Community_Wishlist_Survey/Categories/Wikisource#Upload_Wikisource_text_wizard
Original proposal
Upload Wikisource text wizard:
Problem: The text upload process is complex across many projects.
Who would benefit: Uploaders
Proposed solution: Create a wizard that includes the upload text process - search Internet Archive, use IA uploader to commons, set index at Wikisource to match Commons, adjust 'page offset' on index page.
Phabricator tickets: T49561: Book upload customization — to integrate book-specific features into UploadWizard
Proposer: Slowking4 02:46, 8 November 2016 (UTC)
Background
A common process for getting a work into Wikisource and ready for proofreading is as follows:
- Find the scan on the Internet Archive and download the DjVu version (which is no longer created by IA but still exists for older items)
- Upload the DjVu to Commons and populate the {{book}} template with the metadata
- Create a matching Index page on Wikisource and populate it with the metadata
- Set up the 'pagelist', which maps scan page numbers (starting from 1 for the cover of the book) to book page numbers (which can include independently numbered sections such as front matter, as well as unnumbered pages)
- Create a Wikidata item, again with all the above metadata as well as: Wikisource index page (P1957), Internet Archive ID (P724), and scanned file on Wikimedia Commons (P996)
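To illustrate the pagelist step above, here is a minimal sketch of mapping scan page numbers to book page labels. The `Section` class, the style names, and the "-" label for unnumbered pages are assumptions made for this example only; they are not taken from any existing tool or extension.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Section:
    """Hypothetical description of one run of pages in a scan."""
    first_scan_page: int   # 1-based position in the scan file
    style: str             # "arabic", "roman", or "unnumbered"
    first_number: int = 1  # book page number of the first page in this run


def to_roman(n: int) -> str:
    """Convert a positive integer to a lower-case Roman numeral."""
    pairs = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    out = []
    for value, numeral in pairs:
        while n >= value:
            out.append(numeral)
            n -= value
    return "".join(out)


def page_labels(sections: List[Section], total_scan_pages: int) -> Dict[int, str]:
    """Return a mapping of scan page number to book page label."""
    ordered = sorted(sections, key=lambda s: s.first_scan_page)
    labels: Dict[int, str] = {}
    for i, section in enumerate(ordered):
        # A section runs until the page before the next section starts.
        if i + 1 < len(ordered):
            last = ordered[i + 1].first_scan_page - 1
        else:
            last = total_scan_pages
        for scan_page in range(section.first_scan_page, last + 1):
            number = section.first_number + (scan_page - section.first_scan_page)
            if section.style == "unnumbered":
                labels[scan_page] = "-"
            elif section.style == "roman":
                labels[scan_page] = to_roman(number)
            else:
                labels[scan_page] = str(number)
    return labels


# Example: 4 unnumbered pages, 4 pages of front matter, then the main text.
example = page_labels(
    [Section(1, "unnumbered"), Section(5, "roman"), Section(9, "arabic")],
    total_scan_pages=12,
)
```

In this example, scan page 5 is labelled "i" and scan page 9 is labelled "1". Any real tool would need to write the result out in whatever form the Index page expects.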
Possible solutions
- Extend the UploadWizard extension to specifically handle books (this is what T49561 is about, and was the subject of a GSoC project)
- Create a new MediaWiki extension
- Extend the Book Uploader Bot tool (cf. T59813, which is about the creation of Bub and is still open)
- Create a new tool
Requirements
Starting from one of:
- a set of scan files,
- a PDF file,
- a DjVu file,
- an Internet Archive identifier, or
- another online library identifier (maybe Bub negates this?)
we want to end up with:
- the original files uploaded to both Commons (each with {{book}}) and the Internet Archive (with a link from the IA item back to the Commons category, placed in a review if we're unable to edit the original item)
- a generated DjVu file on Commons, also with {{book}}
- a category on Commons for the above
- a Wikisource Index page (as an index to the DjVu on Commons)
- the pagelist on the Index page
- a Wikidata item linking to all of the above
Many works will already have some of these resources in place: for example, a DjVu file may already exist on IA and so need not be generated, or a Wikidata item may exist and link to IA while nothing is on Commons or Wikisource yet. The system therefore needs to be able to work with partially imported works.
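As a rough sketch of what handling partially imported works could look like, the following works out which steps remain for a given work. The `WorkState` fields and the step descriptions are hypothetical names chosen for this example; looking up the actual state on IA, Commons, Wikisource, and Wikidata is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkState:
    """Hypothetical snapshot of which resources already exist for a work."""
    ia_identifier: Optional[str] = None
    djvu_on_ia: bool = False
    commons_djvu: Optional[str] = None
    commons_category: Optional[str] = None
    index_page: Optional[str] = None
    pagelist_done: bool = False
    wikidata_item: Optional[str] = None


def remaining_steps(state: WorkState) -> List[str]:
    """List the steps still needed, skipping resources that already exist."""
    steps: List[str] = []
    if state.commons_djvu is None:
        if state.djvu_on_ia:
            # No need to generate a DjVu if IA already has one.
            steps.append("copy DjVu from IA to Commons")
        else:
            steps.append("generate DjVu and upload to Commons")
    if state.commons_category is None:
        steps.append("create Commons category")
    if state.index_page is None:
        steps.append("create Wikisource Index page")
    if not state.pagelist_done:
        steps.append("set up pagelist")
    if state.wikidata_item is None:
        steps.append("create Wikidata item")
    elif steps:
        # The item exists but must link to the newly created resources.
        steps.append("update Wikidata item links")
    return steps


# Example: a Wikidata item links to IA, but nothing is on Commons or Wikisource.
# Both identifiers below are placeholders, not real items.
partial = WorkState(
    ia_identifier="example-identifier",
    djvu_on_ia=True,
    wikidata_item="Q-EXAMPLE",
)
todo = remaining_steps(partial)
```

Modelling the state explicitly means the same code path can serve both a fresh upload and a work that is already partly in place.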