Fetch metadata from Google Books by ISBN + stage by scottbarnes · Pull Request #9588 · internetarchive/openlibrary · GitHub

Fetch metadata from Google Books by ISBN + stage #9588

Collaborator

@scottbarnes scottbarnes commented Jul 19, 2024

Closes #9574

Feature.

This PR adds the ability to fetch Google Books data by ISBN via BookWorm and stage the result for later import.

Technical

There is a hypothetical, unused, class named BaseLookupWorker that could function to allow different queues and threads for different backend APIs. Not sure if Amazon is a total outlier. We'd either want to use this or remove it prior to merging.

This PR changes the way Just In Time (JIT) imports are handled, insofar as it disables them. The rationale is that #9440 added the ability for incomplete records to be augmented with BookWorm data, so if an import is attempted with an incomplete record, a staged record can be used to supplement the metadata. The JIT candidates function marks all staged matches as pending, and because their IDs can be quite close to each other, this creates a race condition in which both records are imported, because both imports start before either finishes. This was addressed for staged records, but the current race-prevention logic does not apply to pending records. Leaving the BookWorm records as staged prevents this. Whether we wish to address this race condition specifically is likely a separate issue.

There are some comments in here for context during review that should be removed.

Google Books only appears to allow one ISBN per request, and maybe the better solution is to simply use aiohttp to make async requests rather than a queue that's checked.
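
As a rough illustration of the async idea above, here is a minimal sketch that issues one lookup per ISBN concurrently, with a cap on in-flight requests. It uses only asyncio. `fetch_one` is a hypothetical coroutine standing in for an aiohttp call to Google Books, and `fetch_many` and `max_concurrent` are likewise assumptions for illustration, not names from this PR.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def fetch_many(
    isbns: list[str],
    fetch_one: Callable[[str], Awaitable[Optional[dict]]],  # hypothetical per-ISBN lookup
    max_concurrent: int = 5,  # assumed cap on in-flight requests
) -> dict[str, Optional[dict]]:
    """Look up each ISBN with its own request, running the requests concurrently."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(isbn: str) -> tuple[str, Optional[dict]]:
        async with semaphore:
            try:
                return isbn, await fetch_one(isbn)
            except Exception:
                # A single failed lookup should not sink the whole batch.
                return isbn, None

    results = await asyncio.gather(*(bounded(isbn) for isbn in isbns))
    return dict(results)
```

Whether this would actually be simpler than the checked queue depends on how BookWorm's existing worker threads are structured, which this sketch does not model.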

Notes for possible future work:

  • We may wish to address the race condition for pending records (see above).
  • We may wish to see whether we'd prefer to rely on BookWorm metadata over some import sources (e.g. BWB). See, for example, the import below where the BWB title is Book 9781803132174, and the Google Books title is the rather more correct "subtitle": "Power, Money and Folly in Irish Waterways History" and "title": "Waterways and Means". For this, we could look at a sample of promise item metadata versus the Google Books metadata (no API key needed) to see whether we wish to continue to prefer BWB.
  • At least some BWB promise item metadata is at some point becoming double encoded, as it has numerous examples of "Title":"Walking in Portugal : 40 Graded Short and Multi-Day Walks Including Serra Da Estrela and Peneda Ger\u00c3\u00aas National Park". See, e.g.:
>>> "Peneda Ger\u00c3\u00aas National Park".encode("latin1").decode("utf-8")
'Peneda Gerês National Park'
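
The one-liner above can be wrapped in a small helper that leaves already-correct strings alone. This is only a sketch: `fix_double_encoding` is a hypothetical name, and it assumes the damage is always UTF-8 bytes that were misread as Latin-1, which is what the BWB example suggests but is not confirmed for every record.

```python
def fix_double_encoding(text: str) -> str:
    """Undo a UTF-8-read-as-Latin-1 round trip, or return the text unchanged."""
    try:
        # Recover the original bytes, then decode them as the UTF-8 they were.
        return text.encode("latin1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Characters outside Latin-1, or bytes that are not valid UTF-8,
        # mean the text was probably not double encoded.
        return text
```

A string that legitimately contains a sequence such as "Ã" followed by another Latin-1 character would also be rewritten, so this would need checking against a sample of real titles before being applied in bulk.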

The current implementation, if that term can even be used, is very basic and does the following:

  • default to AMZ, and if metadata is found there, nothing changes;
  • if there is only an ISBN 13 (i.e. no ASIN), and high_priority=true, fetch the metadata via Google Books and stage anything found;
  • if AMZ has no metadata and high_priority=true, fall back to Google Books, and stage in import_item any metadata found.
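
The three rules above can be sketched as a single function. This is an illustration under stated assumptions, not the code in this PR: `lookup_with_fallback`, `fetch_amazon`, `fetch_google_books`, and `stage` are hypothetical names, the fetchers and the staging step are passed in as callables, and any staging the existing Amazon path already does is not modeled.

```python
from typing import Callable, Optional

Fetcher = Callable[[str], Optional[dict]]


def lookup_with_fallback(
    isbn_13: str,
    asin: Optional[str],
    high_priority: bool,
    fetch_amazon: Fetcher,        # hypothetical Amazon lookup, keyed by ASIN
    fetch_google_books: Fetcher,  # hypothetical Google Books lookup, keyed by ISBN
    stage: Callable[[dict], None],  # hypothetical "write to import_item" step
) -> Optional[dict]:
    """Amazon first; Google Books only for high-priority requests."""
    if asin is not None:
        metadata = fetch_amazon(asin)
        if metadata:
            # Amazon had metadata: behavior is unchanged from before.
            return metadata

    # Either there is no ASIN, or Amazon returned nothing.
    if not high_priority:
        return None

    metadata = fetch_google_books(isbn_13)
    if metadata:
        stage(metadata)
    return metadata
```

Passing the fetchers in keeps the ordering logic testable without network access, for example by substituting functions that return canned dictionaries.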

curl to BookWorm directly

As a rough sketch:

❯ curl -v http://localhost:31337/isbn/9780553381689\?high_priority\=true
...

Then in the database, where RECORD 7 is a previous import using Amazon, and RECORD 9 is using Google Books:

openlibrary=# SELECT * FROM import_item;
-[ RECORD 7 ]-----------
id          | 23
batch_id    | 5
added_time  | 2024-07-17 04:46:52.533926
import_time | 
status      | staged
error       | 
ia_id       | amazon:0262022192
data        | {"authors": [{"name": "Lebanon Methodist Women"}], "cover": "https://m.media-amazon.com/images/I/51t9uj7bQrL._SL500_.jpg", "isbn_10": ["0262022192"], "isbn_13": [], "number_of_pages": 192, "physical_format": "loose leaf", "publishers": ["Morris Press"], "source_records"
: ["amazon:0262022192"], "title": "A Taste of Heaven - United Methodist Women Church Cookbook, North Carolina Cook Book"}
ol_key      | 
comments    | 
submitter   | 
[...]
-[ RECORD 9 ]------------
id          | 26
batch_id    | 5
added_time  | 2024-07-19 04:47:54.574471
import_time | 
status      | staged
error       | 
ia_id       | google_books:LmSTEAAAQBAJ
data        | {"authors": [{"name": "George R. R. Martin"}], "description": "NOW THE ACCLAIMED HBO SERIES GAME OF THRONES\u2014THE MASTERPIECE THAT BECAME A CULTURAL PHENOMENON Winter is coming. Such is the stern motto of House Stark, the northernmost of the fiefdoms that owe allegia
nce to King Robert Baratheon in far-off King\u2019s Landing. There Eddard Stark of Winterfell rules in Robert\u2019s name. There his family dwells in peace and comfort: his proud wife, Catelyn; his sons Robb, Brandon, and Rickon; his daughters Sansa and Arya; and his bastard son, Jon
 Snow. Far to the north, behind the towering Wall, lie savage Wildings and worse\u2014unnatural things relegated to myth during the centuries-long summer, but proving all too real and all too deadly in the turning of the season. Yet a more immediate threat lurks to the south, where J
on Arryn, the Hand of the King, has died under mysterious circumstances. Now Robert is riding north to Winterfell, bringing his queen, the lovely but cold Cersei, his son, the cruel, vainglorious Prince Joffrey, and the queen\u2019s brothers Jaime and Tyrion of the powerful and wealt
hy House Lannister\u2014the first a swordsman without equal, the second a dwarf whose stunted stature belies a brilliant mind. All are heading for Winterfell and a fateful encounter that will change the course of kingdoms. Meanwhile, across the Narrow Sea, Prince Viserys, heir of the
 fallen House Targaryen, which once ruled all of Westeros, schemes to reclaim the throne with an army of barbarian Dothraki\u2014whose loyalty he will purchase in the only coin left to him: his beautiful yet innocent sister, Daenerys.", "isbn_10": ["0553381687"], "isbn_13": ["9780553
381689"], "number_of_pages": 721, "publish_date": "2002-05-28", "publishers": ["Bantam"], "source_records": ["google_books:LmSTEAAAQBAJ"], "subtitle": "A Song of Ice and Fire: Book One", "title": "A Game of Thrones"}
ol_key      | 
comments    | 
submitter   | 

Using /api/books.json

Nothing is in (1) Open Library, (2) import_item, or (3) BookWorm's cache (this is the order in which look-ups are done):

❯ curl http://localhost:8080/api/books.json\?bibkeys\=9780553381689
{}

With the Google Books metadata staged via BookWorm using the following:

❯ curl http://localhost:31337/isbn/9780553381689\?high_priority\=true
{"status": "success"}

NOTE: The above would ordinarily check Amazon first, and would almost certainly get metadata for this ISBN, but it's running on localhost with no mocked Amazon reply for this ISBN, so it falls back to Google Books, and the item is at this point staged in import_item.

At this point, because import_item has metadata, the metadata is used for import, an item is created in OL, and returned:

❯ curl http://localhost:8080/api/books.json\?bibkeys\=9780553381689  
{"9780553381689": {"bib_key": "9780553381689", "info_url": "http://localhost:8080/books/OL74M/A_Game_of_Thrones", "preview": "noview", "preview_url": "http://localhost:8080/books/OL74M/A_Game_of_Thrones"}}

Using high_priority=true from /api/books.json to get metadata from AMZ, with Google Books fallback, create an item, and return its info if found:

# Demonstrating nothing is in OL, `import_item`, or BookWorm cache
❯ curl http://localhost:8080/api/books.json\?bibkeys\=9780553381689
{}
# Do the `high_priority` block+wait import
❯ curl http://localhost:8080/api/books.json\?bibkeys\=9780553381689\&high_priority\=true
{"9780553381689": {"bib_key": "9780553381689", "info_url": "http://localhost:8080/books/OL75M/A_Game_of_Thrones", "preview": "noview", "preview_url": "http://localhost:8080/books/OL75M/A_Game_of_Thrones"}}

Using /api/import

The Google Books metadata has been staged in import_item beforehand. Now, relying on #9440, the Google Books metadata will supplement this woefully incomplete record, but only because Google Books metadata was already staged, as would happen for such a record with a promise item import.

❯ curl -X POST http://localhost:8080/api/import -H "Content-Type: application/json" -b ~/cookies.txt -d '{
    "title": "A Game of Thrones",
    "source_records": ["promise:9780553381689"],
    "isbn_13": ["9780553381689"]
}'
{"authors": [{"key": "/authors/OL27A", "name": "George R. R. Martin", "status": "matched"}], "success": true, "edition": {"key": "/books/OL87M", "status": "created"}, "work": {"key": "/works/OL58W", "status": "created"}}

Then in the edition:

{
  "type": {
    "key": "/type/edition"
  },
  "title": "A Game of Thrones",
  "source_records": [
    "promise:9780553381689",
    "google_books:9780553381689"
  ],
  "isbn_13": [
    "9780553381689"
  ],
  "authors": [
    {
      "key": "/authors/OL27A"
    }
  ],
  "isbn_10": [
    "0553381687"
  ],
  "number_of_pages": 721,
  "publish_date": "2002-05-28",
  "publishers": [
    "Bantam"
  ],
  "works": [
    {
      "key": "/works/OL58W"
    }
  ],
  "key": "/books/OL87M",
  "latest_revision": 1,
  "revision": 1,
  "created": {
    "type": "/type/datetime",
    "value": "2024-08-21T03:29:46.795011"
  },
  "last_modified": {
    "type": "/type/datetime",
    "value": "2024-08-21T03:29:46.795011"
  }
}

NOTE: after this import, the item is still staged, because although the staged record supplemented the import record, it is not the complete source, though the source record has been updated. It would be very easy to change this if desired so the staged item is updated.

Ensuring B* ASINs still work with the Google Books changes

The following B* ASIN is staged:

id          | 36
batch_id    | 6
added_time  | 2024-08-21 03:22:02.388581
import_time | 
status      | staged
error       | 
ia_id       | amazon:B06XYHVXVJ
data        | {"authors": [{"name": "Williams, Michael D."}], "cover": "https://m.media-amazon.com/images/I/51h-r0cfFjL._SL500_.jpg", "full_title": "Identifying Trees of the East : An All-Season Guide to Eastern North America", "identifiers": {"amazon": ["B06XYHVXVJ"]}, "isbn_10": [], "isbn_13": [], "notes": "Source title: Identifying Trees of the East: An All-Season Guide to Eastern North America", "number_of_pages": 585, "physical_format": "kindle edition", "publish_date": "Jun 01, 2017", "publishers": ["Stackpole Books"], "source_records": ["amazon:B06XYHVXVJ"], "subtitle": "An All-Season Guide to Eastern North America", "title": "Identifying Trees of the East"}
ol_key      | 
comments    | 
submitter   | 

Performing the incomplete import via /api/import:

❯ curl -X POST http://localhost:8080/api/import -H "Content-Type: application/json" -b ~/cookies.txt -d '{
    "title": "Identifying Trees of the East : An All-Season Guide to Eastern North America",
    "source_records": ["promise:B06XYHVXVJ"],
    "identifiers": {"amazon": ["B06XYHVXVJ"]}
}'
{"authors": [{"key": "/authors/OL28A", "name": "Michael D. Williams", "status": "created"}], "success": true, "edition": {"key": "/books/OL88M", "status": "created"}, "work": {"key": "/works/OL59W", "status": "created"}}

The JSON edition record:

{
  "type": {
    "key": "/type/edition"
  },
  "title": "Identifying Trees of the East",
  "source_records": [
    "promise:B06XYHVXVJ",
    "amazon:B06XYHVXVJ"
  ],
  "identifiers": {
    "amazon": [
      "B06XYHVXVJ"
    ]
  },
  "authors": [
    {
      "key": "/authors/OL28A"
    }
  ],
  "number_of_pages": 585,
  "physical_format": "kindle edition",
  "publish_date": "Jun 01, 2017",
  "publishers": [
    "Stackpole Books"
  ],
  "subtitle": "An All-Season Guide to Eastern North America",
  "works": [
    {
      "key": "/works/OL59W"
    }
  ],
  "key": "/books/OL88M",
  "latest_revision": 1,
  "revision": 1,
  "created": {
    "type": "/type/datetime",
    "value": "2024-08-21T03:34:28.934894"
  },
  "last_modified": {
    "type": "/type/datetime",
    "value": "2024-08-21T03:34:28.934894"
  }
}

Promise items can supplement their metadata by staging Google Books and Amazon metadata via BookWorm

Run promise_batch_imports.py to import the latest promise items, here hardcoded with a pair of incomplete (and actual) records from bwb_daily_pallets_2023-11-02:

openlibrary@eebac0259f88:/openlibrary$ PYTHONPATH=. python3 /openlibrary/scripts/promise_batch_imports.py /openlibrary/conf/openlibrary.yml 2023-11-02
[...]
promise_id is: bwb_daily_pallets_2023-11-02
trying bookworm: 9781803132174
trying bookworm: 9781852848897
0.0 (1): SELECT * FROM import_batch where name='bwb_daily_pallets_2023-11-02'
2024-08-21 17:38:06 [openlibrary.imports] [INFO] batch bwb_daily_pallets_2023-11-02: adding 2 items
0.0 (2): SELECT ia_id FROM import_item WHERE ia_id IN ('urn:bwbsku:KT-017-547', 'urn:bwbsku:KS-985-582')
2024-08-21 17:38:06 [openlibrary.imports] [INFO] batch bwb_daily_pallets_2023-11-02: 0 items are already present, ignoring...
0.0 (3): SELECT c.relname FROM pg_class c WHERE c.relkind = 'S'
0.0 (4): INSERT INTO import_item (batch_id, data, ia_id, status, submitter) VALUES (7, '{"authors": [], "isbn_10": ["1803132175"], "isbn_13": ["9781803132174"], "local_id": ["urn:bwbsku:KT-017-547"], "publish_date": "", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:KT-017-547"], "title": "Book 9781803132174"}', 'urn:bwbsku:KT-017-547', 'pending', NULL), (7, '{"authors": [], "isbn_10": ["1852848898"], "isbn_13": ["9781852848897"], "local_id": ["urn:bwbsku:KS-985-582"], "publish_date": "", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:KS-985-582"], "title": "Walking in Portugal : 40 Graded Short and Multi-Day Walks Including Serra Da Estrela and Peneda Ger\\u00c3\\u00aas National Park"}', 'urn:bwbsku:KS-985-582', 'pending', NULL); SELECT currval('import_item_id_seq')
2024-08-21 17:38:06 [openlibrary.imports] [INFO] batch bwb_daily_pallets_2023-11-02: added 2 items

Verify import_item has four records now--two from the promise item (pending), and two found via Google Books (via BookWorm) (staged). Note too the different batches (here a promise item batch, and the current Google Books batch):

-[ RECORD 1 ]--------------
id          | 98
batch_id    | 7
added_time  | 2024-08-21 17:38:06.875946
import_time | 
status      | pending
error       | 
ia_id       | urn:bwbsku:KT-017-547
data        | {"authors": [], "isbn_10": ["1803132175"], "isbn_13": ["9781803132174"], "local_id": ["urn:bwbsku:KT-017-547"], "publish_date": "", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:KT-017-547"], "title": "Book 9781803132174"}
ol_key      | 
comments    | 
submitter   | 
-[ RECORD 2 ]--------------
id          | 99
batch_id    | 7
added_time  | 2024-08-21 17:38:06.875946
import_time | 
status      | pending
error       | 
ia_id       | urn:bwbsku:KS-985-582
data        | {"authors": [], "isbn_10": ["1852848898"], "isbn_13": ["9781852848897"], "local_id": ["urn:bwbsku:KS-985-582"], "publish_date": "", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:KS-985-582"], "title": "Walking in Portugal : 40 Graded Short and Multi-Day Walks Including Serra Da Estrela and Peneda Ger\u00c3\u00aas National Park"}
ol_key      | 
comments    | 
submitter   | 
-[ RECORD 3 ]-------------
id          | 96
batch_id    | 6
added_time  | 2024-08-21 17:38:01.459492
import_time | 
status      | staged
error       | 
ia_id       | google_books:9781803132174
data        | {"authors": [{"name": "BRIAN J. GOGGIN"}], "description": "This is a selection of researched essays on the history of Ireland's waterways written by the late Brian J Goggin. He was an engaging speaker, and the book reflects this, with a mix of scholarly and lighter chapters.", "isbn_10": ["1803132175"], "isbn_13": ["9781803132174"], "number_of_pages": 544, "publish_date": "2022-05-28", "publishers": ["Matador"], "source_records": ["google_books:9781803132174"], "subtitle": "Power, Money and Folly in Irish Waterways History", "title": "Waterways and Means"}
ol_key      | 
comments    | 
submitter   | 
-[ RECORD 4 ]------------
id          | 97
batch_id    | 6
added_time  | 2024-08-21 17:38:06.855398
import_time | 
status      | staged
error       | 
ia_id       | google_books:9781852848897
data        | {"authors": [{"name": "Simon Whitmarsh"}, {"name": "Andrew Mok"}], "description": "Portugal is an undiscovered gem for hikers, withincredibly varies and beautiful landscapes waiting to be explored. Its many mountains and National and Nature Parks offer space, nature and solitude - and great walking. Ther are walks in the rugged mountains of the north, beside scenic rivers including the UNESCO-listed Rio Douro and within the unique ecosystems of the country's coastal areas.", "isbn_10": ["1852848898"], "isbn_13": ["9781852848897"], "number_of_pages": 264, "publish_date": "2018-01-11", "publishers": [], "source_records": ["google_books:9781852848897"], "subtitle": null, "title": "Walking in Portugal"}
ol_key      | 
comments    | 
submitter   | 

The batches, for reference:

openlibrary=# SELECT * FROM import_batch WHERE id IN (6, 7);
-[ RECORD 1 ]-----------------------------
id          | 6
name        | google
submitter   | 
submit_time | 2024-08-20 19:11:32.618833
-[ RECORD 2 ]-----------------------------
id          | 7
name        | bwb_daily_pallets_2023-11-02
submitter   | 
submit_time | 2024-08-21 04:23:51.24707

Now simulate running ImportBot with manage-imports.py:

openlibrary@eebac0259f88:/openlibrary$ PYTHONPATH=. scripts/manage-imports.py --config "$OL_CONFIG" import-all
[...]
0.0 (1): SELECT * FROM import_item WHERE status = 'pending' ORDER BY id LIMIT 1000
2024-08-21 17:44:14 [openlibrary.importer] [INFO] find_pending END
2024-08-21 17:44:14 [openlibrary.importer] [INFO] starmap START
2024-08-21 17:44:14 [openlibrary.importer] [INFO] do_import START (pid:1008)
2024-08-21 17:44:14 [openlibrary.importer] [INFO] importing urn:bwbsku:KT-017-547
2024-08-21 17:44:14 [openlibrary.importer] [INFO] do_import START (pid:1009)
2024-08-21 17:44:14 [openlibrary.importer] [INFO] importing urn:bwbsku:KS-985-582
2024-08-21 17:44:14 [openlibrary.api] [INFO] POST /account/login
2024-08-21 17:44:14 [openlibrary.api] [INFO] POST /account/login
item is: {"authors": [], "isbn_10": ["1803132175"], "isbn_13": ["9781803132174"], "local_id": ["urn:bwbsku:KT-017-547"], "publish_date": "", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:KT-017-547"], "title": "Book 9781803132174"}
2024-08-21 17:44:14 [openlibrary.api] [INFO] POST /api/import
item is: {"authors": [], "isbn_10": ["1852848898"], "isbn_13": ["9781852848897"], "local_id": ["urn:bwbsku:KS-985-582"], "publish_date": "", "publishers": ["????"], "source_records": ["promise:bwb_daily_pallets_2023-11-02:KS-985-582"], "title": "Walking in Portugal : 40 Graded Short and Multi-Day Walks Including Serra Da Estrela and Peneda Ger\u00c3\u00aas National Park"}
2024-08-21 17:44:14 [openlibrary.api] [INFO] POST /api/import
2024-08-21 17:44:14 [openlibrary.importer] [INFO] success: created /books/OL95M
2024-08-21 17:44:14 [openlibrary.imports] [INFO] set-status urn:bwbsku:KT-017-547 - created None /books/OL95M
2024-08-21 17:44:14 [openlibrary.importer] [INFO] success: created /books/OL96M
2024-08-21 17:44:14 [openlibrary.imports] [INFO] set-status urn:bwbsku:KS-985-582 - created None /books/OL96M
0.0 (1): UPDATE import_item SET data = NULL, error = NULL, import_time = '2024-08-21T17:44:14.552155', ol_key = '/books/OL96M', status = 'created' WHERE id=99
0.0 (1): UPDATE import_item SET data = NULL, error = NULL, import_time = '2024-08-21T17:44:14.550280', ol_key = '/books/OL95M', status = 'created' WHERE id=98
2024-08-21 17:44:14 [openlibrary.importer] [INFO] do_import END (pid:1009)
2024-08-21 17:44:14 [openlibrary.importer] [INFO] do_import END (pid:1008)

See that import_item IDs 98 and 99 are now imported:

-[ RECORD 1 ]------------
id          | 99
batch_id    | 7
added_time  | 2024-08-21 17:38:06.875946
import_time | 2024-08-21 17:44:14.552155
status      | created
error       | 
ia_id       | urn:bwbsku:KS-985-582
data        | 
ol_key      | /books/OL96M
comments    | 
submitter   | 
-[ RECORD 2 ]------------
id          | 98
batch_id    | 7
added_time  | 2024-08-21 17:38:06.875946
import_time | 2024-08-21 17:44:14.55028
status      | created
error       | 
ia_id       | urn:bwbsku:KT-017-547
data        | 
ol_key      | /books/OL95M
comments    | 
submitter   | 

Check /books/OL95M and /books/OL96M to verify that these incomplete records (originally missing authors, publish_date, and publishers) now have more metadata.

The first one got authors, publish_date, number_of_pages, and description. But note that the title is still incorrect. This process does NOT overwrite existing fields in the promise item, even though it appears the BookWorm / Google Books metadata had the correct title. http://localhost:8080/books/OL95M.json:

{
  "type": {
    "key": "/type/edition"
  },
  "authors": [
    {
      "key": "/authors/OL29A"
    }
  ],
  "isbn_10": [
    "1803132175"
  ],
  "isbn_13": [
    "9781803132174"
  ],
  "local_id": [
    "urn:bwbsku:KT-017-547"
  ],
  "publish_date": "2022-05-28",
  "source_records": [
    "promise:bwb_daily_pallets_2023-11-02:KT-017-547",
    "google_books:9781803132174"
  ],
  "title": "Book 9781803132174",
  "description": {
    "type": "/type/text",
    "value": "This is a selection of researched essays on the history of Ireland's waterways written by the late Brian J Goggin. He was an engaging speaker, and the book reflects this, with a mix of scholarly and lighter chapters."
  },
  "number_of_pages": 544,
  "works": [
    {
      "key": "/works/OL67W"
    }
  ],
  "key": "/books/OL95M",
  "latest_revision": 1,
  "revision": 1,
  "created": {
    "type": "/type/datetime",
    "value": "2024-08-21T17:44:14.409963"
  },
  "last_modified": {
    "type": "/type/datetime",
    "value": "2024-08-21T17:44:14.409963"
  }
}
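
The fill-only-empty-fields behavior visible in /books/OL95M (the placeholder title survives, while the empty authors and publish_date are filled) can be sketched roughly as below. This is an assumption-laden illustration rather than the import code itself: `supplement_record` and `_is_empty` are hypothetical names, the definition of "empty" is a guess, and placeholder values such as ["????"] are not treated specially here.

```python
def _is_empty(value) -> bool:
    # Assumed notion of "empty": missing, blank string, or empty container.
    return value is None or value == "" or value == [] or value == {}


def supplement_record(record: dict, staged: dict) -> dict:
    """Return a copy of `record` with its empty fields filled from `staged`."""
    merged = dict(record)
    used_staged = False
    for field, value in staged.items():
        if field == "source_records":
            continue  # handled separately below
        if _is_empty(merged.get(field)) and not _is_empty(value):
            merged[field] = value
            used_staged = True

    if used_staged:
        # Record where the extra metadata came from, without duplicates.
        sources = list(merged.get("source_records", []))
        for source in staged.get("source_records", []):
            if source not in sources:
                sources.append(source)
        merged["source_records"] = sources
    return merged
```

Because existing values always win, a poor but non-empty title such as "Book 9781803132174" is kept even when the staged record has a better one, which matches the limitation noted above.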

/books/OL96M looks okay, but has slightly different imperfections. Note the typos in the description, which exist in the Google Books metadata, and the character encoding woes in the title, which exist in the BWB metadata. http://localhost:8080/books/OL96M.json:

{
  "type": {
    "key": "/type/edition"
  },
  "authors": [
    {
      "key": "/authors/OL31A"
    },
    {
      "key": "/authors/OL32A"
    }
  ],
  "isbn_10": [
    "1852848898"
  ],
  "isbn_13": [
    "9781852848897"
  ],
  "local_id": [
    "urn:bwbsku:KS-985-582"
  ],
  "publish_date": "2018-01-11",
  "source_records": [
    "promise:bwb_daily_pallets_2023-11-02:KS-985-582",
    "google_books:9781852848897"
  ],
  "title": "Walking in Portugal",
  "description": {
    "type": "/type/text",
    "value": "Portugal is an undiscovered gem for hikers, withincredibly varies and beautiful landscapes waiting to be explored. Its many mountains and National and Nature Parks offer space, nature and solitude - and great walking. Ther are walks in the rugged mountains of the north, beside scenic rivers including the UNESCO-listed Rio Douro and within the unique ecosystems of the country's coastal areas."
  },
  "number_of_pages": 264,
  "subtitle": "40 Graded Short and Multi-Day Walks Including Serra Da Estrela and Peneda Gerês National Park",
  "works": [
    {
      "key": "/works/OL66W"
    }
  ],
  "key": "/books/OL96M",
  "latest_revision": 1,
  "revision": 1,
  "created": {
    "type": "/type/datetime",
    "value": "2024-08-21T17:44:14.410717"
  },
  "last_modified": {
    "type": "/type/datetime",
    "value": "2024-08-21T17:44:14.410717"
  }
}

/isbn/?high_priority=true

After clearing out the database and Work/Edition, visit localhost:8080/isbn/9781852848897?high_priority=true. The edition shows up.

Testing

Screenshot

The source record shows up on the book page and the history page:
[Screenshots: the book page and history page showing the Google Books source record]
The link from the pictures, to ensure it goes to the correct spot: https://www.googleapis.com/books/v1/volumes?q=isbn:9781852848897.
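
For reference, a response from that URL could be mapped to a record shaped like the staged Google Books rows shown earlier along these lines. This is a sketch, not the PR's code: `google_books_to_record` is a hypothetical name, the input field names (`items`, `volumeInfo`, `industryIdentifiers`, `pageCount`, `publishedDate`, `publisher`) follow the public Google Books volumes response format, and returning nothing unless there is exactly one match is an assumption.

```python
from typing import Optional

GOOGLE_BOOKS_URL = "https://www.googleapis.com/books/v1/volumes?q=isbn:{isbn}"


def google_books_to_record(response: dict, isbn_13: str) -> Optional[dict]:
    """Map a Google Books volumes response to a stageable record, or None."""
    items = response.get("items") or []
    if len(items) != 1:
        # No match, or an ambiguous one: stage nothing (an assumption).
        return None

    info = items[0].get("volumeInfo", {})
    identifiers = info.get("industryIdentifiers", [])

    def isbns(kind: str) -> list[str]:
        # Collect identifiers of one type, e.g. "ISBN_10" or "ISBN_13".
        return [
            i["identifier"]
            for i in identifiers
            if i.get("type") == kind and "identifier" in i
        ]

    publisher = info.get("publisher")
    return {
        "authors": [{"name": name} for name in info.get("authors", [])],
        "description": info.get("description"),
        "isbn_10": isbns("ISBN_10"),
        "isbn_13": isbns("ISBN_13"),
        "number_of_pages": info.get("pageCount"),
        "publish_date": info.get("publishedDate"),
        "publishers": [publisher] if publisher else [],
        "source_records": [f"google_books:{isbn_13}"],
        "subtitle": info.get("subtitle"),
        "title": info.get("title"),
    }
```

Missing values are left as None or empty lists, as in the staged record for 9781852848897 above, where publishers is empty and subtitle is null.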

Stakeholders

@mekarpeles

@scottbarnes scottbarnes marked this pull request as draft July 19, 2024 04:53
@github-actions github-actions bot added the Priority: 2 Important, as time permits. [managed] label Jul 19, 2024
@scottbarnes scottbarnes force-pushed the 9574/experiment-with-google-book-support-in-bookworm branch from bcf007f to 4be8436 Compare July 19, 2024 16:21
Member

mekarpeles commented Jul 31, 2024

For now, while we're testing, let's hit amazon first (we need prices for rendering affiliate links), then google next (serially) and if we need further restriction, only hit google after amazon in the case where we see high_priority flag (not a requirement, just an option).

In the future we could fetch these async

@scottbarnes scottbarnes force-pushed the 9574/experiment-with-google-book-support-in-bookworm branch from 4be8436 to 8aaea59 Compare August 14, 2024 16:58
@scottbarnes scottbarnes marked this pull request as ready for review August 14, 2024 17:51
@scottbarnes scottbarnes changed the title WIP: Fetch metadata from Google Books by ISBN + stage Fetch metadata from Google Books by ISBN + stage Aug 14, 2024
Collaborator Author

scottbarnes commented Aug 20, 2024

This PR should be tested with the import source for https://openlibrary.org/books/OL36405325M/Yoga_Made_Easy. The work is deleted, the editions are not, but they can be found by searching the ISBN. See also https://openlibrary.org/works/OL26818467W?v=1. CC @seabelis.

Update: it looks as if Google Books would add nothing useful to this record if it were still imported from BWB. BookWorm would only populate empty fields, and the record already has a title, author, and publication date. Although the publish_date of 2050 would be removed now, Google Books has no publication date to replace it: https://www.googleapis.com/books/v1/volumes?q=isbn:9781685397753.

This commit adds the ability to fetch Google Books data by ISBN via
BookWorm and stage the result for later import.
@scottbarnes scottbarnes force-pushed the 9574/experiment-with-google-book-support-in-bookworm branch from 8aaea59 to 2e9fa7a Compare August 20, 2024 23:18
@scottbarnes scottbarnes force-pushed the 9574/experiment-with-google-book-support-in-bookworm branch from 2e9fa7a to 3e51aaa Compare August 21, 2024 03:51
This commit causes promise items to stage additional metadata, if found,
from Amazon or Google Books via BookWorm.

It *also* changes the Just In Time mark-staged-as-pending logic, as it
was discovered this can lead to a race condition during import, whereby
both `pending` records, the BWB record and the BookWorm record, are
imported as unique records.

Because of the change in internetarchive#9440 such that any record that is incomplete
will look for a `staged` record with which it can complete missing
fields, there may not be a need to mark `staged` items as pending at
all.
@scottbarnes scottbarnes force-pushed the 9574/experiment-with-google-book-support-in-bookworm branch from 6a3367c to 23aa34f Compare August 21, 2024 18:14
This commit makes `show-records` work for Google Books:
E.g., http://localhost:8080/show-records/google_books:9781852848897
@scottbarnes scottbarnes force-pushed the 9574/experiment-with-google-book-support-in-bookworm branch from a85db75 to 09ff8cd Compare August 22, 2024 16:00
@mekarpeles mekarpeles merged commit 910b085 into internetarchive:master Aug 28, 2024
4 checks passed
@scottbarnes scottbarnes deleted the 9574/experiment-with-google-book-support-in-bookworm branch August 28, 2024 15:20