pgml sdk examples #669
@@ -777,5 +777,8 @@ def vector_search(

search_results = run_select_statement(conn, cte_select_statement)
self.pool.putconn(conn)

# Sort the list of dictionaries based on the 'score' key in descending order
search_results = sorted(search_results, key=lambda x: x['score'], reverse=True)
That's already done in SQL above. Postgres is much better at sorting than Python.
I convert this to JSON using SQL that returns an unsorted array. Is there a way to return a sorted array of dicts using SQL?
SELECT array_to_json(array_agg(row_to_json(t)))
FROM ({select_statement}) t;
select array(select unnest(array[1.0, 2.0, 3.0]) order by 1 desc);
https://stackoverflow.com/questions/2913368/sorting-array-elements
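One way to get the JSON array back already sorted is to put the ORDER BY inside the aggregate itself, which Postgres allows for any aggregate function. The following is a minimal sketch, not the SDK's implementation: the helper name `sorted_json_statement`, its `sort_column` parameter, and the `results` table in the usage line are hypothetical, and it assumes the wrapped query exposes a `score` column.

```python
def sorted_json_statement(select_statement: str, sort_column: str = "score") -> str:
    """Wrap a SELECT so Postgres returns one JSON array ordered by sort_column.

    Hypothetical helper for illustration only.
    """
    # sort_column is interpolated into the SQL text, so accept only a plain identifier.
    if not sort_column.isidentifier():
        raise ValueError(f"unsafe column name: {sort_column!r}")
    # The ORDER BY inside json_agg fixes the order of the aggregated elements.
    return (
        f"SELECT json_agg(row_to_json(t) ORDER BY t.{sort_column} DESC)\n"
        f"FROM ({select_statement}) t;"
    )


# Hypothetical inner query; any SELECT that returns a score column would do.
statement = sorted_json_statement("SELECT chunk, score FROM results")
print(statement)
```

With this shape the Python-side `sorted(...)` call would no longer be needed, since the array arrives in descending score order.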
context = " ".join(results[0]["chunk"].strip().split())
context = context.replace('"', '\\"').replace("'", "''")

select_statement = """SELECT pgml.transform(
I think a great future iteration for the pgml.transform call would be to have collection.vector_search() return a query object rather than a result, and then pass that query object as "context" to a new Python question_answering API. That API should then build the context as a sub-select inside the question-answering query, so the documents never leave the database.
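A rough sketch of that idea, under stated assumptions: `VectorSearchQuery` and `question_answering_sql` are hypothetical names, not part of the current SDK, and the search query is assumed to select `chunk` and `score` columns. The context is assembled with `string_agg` in a sub-select, and the question is left as a named placeholder for the database driver to bind.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorSearchQuery:
    """Hypothetical query object: holds SQL text instead of fetched rows."""

    sql: str  # assumed to select at least `chunk` and `score` columns


def question_answering_sql(context: VectorSearchQuery) -> str:
    """Build one statement whose context is a sub-select over the search query."""
    # Drop a trailing semicolon so the search SQL can be nested in parentheses.
    inner = context.sql.strip().rstrip(";")
    return (
        "SELECT pgml.transform(\n"
        "    'question-answering',\n"
        "    inputs => ARRAY[(\n"
        "        SELECT json_build_object(\n"
        "            'question', %(question)s::text,\n"
        "            'context', string_agg(ctx.chunk, ' ' ORDER BY ctx.score DESC)\n"
        "        )::text\n"
        f"        FROM ({inner}) AS ctx\n"
        "    )]\n"
        ") AS answer;"
    )


# Hypothetical search query standing in for what vector_search() might return.
query = VectorSearchQuery("SELECT chunk, score FROM hits;")
qa_sql = question_answering_sql(query)
print(qa_sql)
```

The statement would be executed with the question bound as a parameter, for example `cur.execute(qa_sql, {"question": "..."})` with a driver that supports named placeholders, so no manual quote escaping of the context is needed and the chunks are never fetched into Python.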
Running the scripts with LOGLEVEL=INFO set writes the query to the terminal. We could have a dry_run option that outputs the query instead of a result.
LOGLEVEL=INFO python examples/question_answering.py
Details on examples here
Also added a roadmap section to the main README.
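The dry_run option suggested in this comment could look roughly like the sketch below. It is an illustration only: `run_select` and its `dry_run` flag are hypothetical, the WARNING default for LOGLEVEL is an assumption, and `conn` is assumed to be a DB-API connection whose cursor works as a context manager (as psycopg2 cursors do).

```python
import logging
import os

# Read LOGLEVEL from the environment; fall back to WARNING for unset or unknown values.
_level = getattr(logging, os.environ.get("LOGLEVEL", "WARNING").upper(), logging.WARNING)
logging.basicConfig(level=_level)
logger = logging.getLogger("dry_run_sketch")


def run_select(conn, statement: str, dry_run: bool = False):
    """Log the statement; return it unexecuted when dry_run is set."""
    logger.info("Query:\n%s", statement)
    if dry_run:
        # No database round trip: hand the SQL text back to the caller.
        return statement
    with conn.cursor() as cur:
        cur.execute(statement)
        return cur.fetchall()


# In dry-run mode no connection is needed, so None is passed here.
preview = run_select(None, "SELECT 1 AS score", dry_run=True)
print(preview)
```

Returning the SQL text keeps the two modes symmetric: callers either get rows or get the exact statement that would have produced them.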