Content-Length: 406192 | pFad | http://github.com/internetarchive/openlibrary/pull/10460

52 Fix issue #2723: Improve list search functionality by isabellabonilla · Pull Request #10460 · internetarchive/openlibrary · GitHub
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue #2723: Improve list search functionality #10460

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

isabellabonilla
Copy link

Closes #2723

Fix: Improves list search functionality to allow partial matches in list titles.

Technical

  • Added a new ListSearchScheme to leverage Solr for list searches.
  • Updated /search/lists controller to use the new Solr-backed search.
  • Refactored the template (lists.html) to align with patterns used in other templates like authors.html and subjects.html to display improved search results.
  • Added pagination support to /search/lists, allowing users to navigate through large sets of search results.
  • Implemented error handling for list searches, displaying errors when they occur.
  • Changes were made in:
    • openlibrary/plugins/worksearch/code.py
    • openlibrary/plugins/worksearch/schemes/lists.py (new file)
    • openlibrary/templates/search/lists.html

Testing

  1. Navigate to /search/lists.
  2. Search for a term that is part of a list title (e.g., "banned").
  3. Verify that all lists containing the search term in their titles are displayed, not just exact matches.

Stakeholders

@cdrini
@seabelis
@el4ctr0n

@github-actions github-actions bot added the Priority: 2 Important, as time permits. [managed] label Feb 17, 2025
Copy link
Collaborator

@cdrini cdrini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome great work @isabellabonilla ! This was a trickier issue, nice work getting all the difference pieces implemented and playing together -- and as a first time contributor! 😊

A few pieces of feedback to fix some issues with wrong field names, and allowing for backwards compatibility of the public search API 👍

'key', # unique identifier for the list
'name', # name/title of the list
'description', # short description of the list
'created', # timestamp when the list was created
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field unfortunately doesn't exist in our solr, so we can't use it.

Suggested change
'created', # timestamp when the list was created

all_fields = {
'key', # unique identifier for the list
'name', # name/title of the list
'description', # short description of the list
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto with this one, but this does exist in our db, so you can add it to the non_solr_fields so that it can be fetched for reading.

Comment on lines +26 to +27
'created desc': 'created desc', # sort by newest lists first (default)
'created asc': 'created asc', # sort by oldest lists first
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here, these fields don't exist in our solr.

Suggested change
'created desc': 'created desc', # sort by newest lists first (default)
'created asc': 'created asc', # sort by oldest lists first

'name', # name/title of the list
'description', # short description of the list
'created', # timestamp when the list was created
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have a few other keys that will be available for search:

Suggested change
}
"subject",
"subject_key",
"person",
"person_key",
"place",
"place_key",
"time",
"time_key",
}

Comment on lines +39 to +40
'description',
'created',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'description',
'created',

Comment on lines +643 to +644
fields=fields,
sort="created_desc", # default sorting with most recently created lists first
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fields=fields,
sort="created_desc", # default sorting with most recently created lists first


web.header('Content-Type', 'application/json')
return delegate.RawText(json.dumps(response))
return delegate.RawText(json.dumps(raw_resp))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah this is going to be a breaking change to our public lists search API... we'll likely have to have both for a little while. To do this, let's add this code:

Suggested change
return delegate.RawText(json.dumps(raw_resp))
if i.api == 'next':
return delegate.RawText(json.dumps(raw_resp))
else:
# Default to the old API shape for a while, then we'll flip
return delegate.RawText(json.dumps({
'start': offset,
'docs': [
lst.preview()
for lst in web.ctx.site.get_many(doc['key'] for doc in raw_resp['docs'])
]
})

This will let us use the better solr-based search, while maintaining the same shape of response.

"limit": int(limit),
"offset": int(offset),
}
def get_results(elf, q, offset=0, limit=100, fields='*', sort=''):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to avoid defaulting to * now since that results in large responses ; I reckon you got this from the authors page which we have yet to fix 😁 Defaulting it to None should do the trick 👍 The down stream code should then just the default fields from the Scheme.

Suggested change
def get_results(elf, q, offset=0, limit=100, fields='*', sort=''):
def get_results(elf, q, offset=0, limit=100, fields=None, sort=''):


docs = self.get_results(i.q, offset=offset, limit=limit)
response = self.get_results(i.q, offset=offset, limit=limit)
Copy link
Collaborator

@cdrini cdrini Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also add the sort=i.sort and fields=i.fields.split(',') if i.fields else None so we can test those are working correctly.

@cdrini cdrini added the Needs: Submitter Input Waiting on input from the creator of the issue/pr [managed] label Feb 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs: Submitter Input Waiting on input from the creator of the issue/pr [managed] Priority: 2 Important, as time permits. [managed]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

List search only finds exact matches: Switch to use Solr
2 participants








ApplySandwichStrip

pFad - (p)hone/(F)rame/(a)nonymizer/(d)eclutterfier!      Saves Data!


--- a PPN by Garber Painting Akron. With Image Size Reduction included!

Fetched URL: http://github.com/internetarchive/openlibrary/pull/10460

Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy