From 67300b2b66de54145bd84d4312ce49fd79905fa4 Mon Sep 17 00:00:00 2001 From: SilasMarvin <19626586+SilasMarvin@users.noreply.github.com> Date: Fri, 25 Aug 2023 12:54:43 -0700 Subject: [PATCH 1/2] Added new examples for JavaScript --- .../rust/pgml/javascript/examples/README.md | 14 ++-- .../examples/extractive_question_answering.js | 64 +++++++++++++++++++ .../examples/getting-started/README.md | 12 ---- .../{getting-started => }/package-lock.json | 0 .../{getting-started => }/package.json | 0 .../javascript/examples/question_answering.js | 55 ++++++++++++++++ .../examples/question_answering_instructor.js | 60 +++++++++++++++++ .../index.js => semantic_search.js} | 6 +- pgml-sdks/rust/pgml/python/examples/README.md | 2 +- 9 files changed, 195 insertions(+), 18 deletions(-) create mode 100644 pgml-sdks/rust/pgml/javascript/examples/extractive_question_answering.js delete mode 100644 pgml-sdks/rust/pgml/javascript/examples/getting-started/README.md rename pgml-sdks/rust/pgml/javascript/examples/{getting-started => }/package-lock.json (100%) rename pgml-sdks/rust/pgml/javascript/examples/{getting-started => }/package.json (100%) create mode 100644 pgml-sdks/rust/pgml/javascript/examples/question_answering.js create mode 100644 pgml-sdks/rust/pgml/javascript/examples/question_answering_instructor.js rename pgml-sdks/rust/pgml/javascript/examples/{getting-started/index.js => semantic_search.js} (90%) diff --git a/pgml-sdks/rust/pgml/javascript/examples/README.md b/pgml-sdks/rust/pgml/javascript/examples/README.md index 3c93410c5..440058e4f 100644 --- a/pgml-sdks/rust/pgml/javascript/examples/README.md +++ b/pgml-sdks/rust/pgml/javascript/examples/README.md @@ -1,7 +1,13 @@ -## Javascript Examples +## Examples -Here we have a set of examples of different use cases of the pgml javascript SDK. +### [Semantic Search](./semantic_search.js) +This is a basic example to perform semantic search on a collection of documents. 
Embeddings are created using `intfloat/e5-small` model. The results are semantically similar documemts to the query. Finally, the collection is archived. -## Examples: +### [Question Answering](./question_answering.js) +This is an example to find documents relevant to a question from the collection of documents. The query is passed to vector search to retrieve documents that match closely in the embeddings space. A score is returned with each of the search result. -1. [Getting Started](./getting-started/) - Simple project that uses the pgml SDK to create a collection, add a pipeline, upsert documents, and run a vector search on the collection. +### [Question Answering using Instructore Model](./question_answering_instructor.js) +In this example, we will use `hknlp/instructor-base` model to build text embeddings instead of the default `intfloat/e5-small` model. + +### [Extractive Question Answering](./extractive_question_answering.js) +In this example, we will show how to use `vector_recall` result as a `context` to a HuggingFace question answering model. We will use `Builtins.transform()` to run the model on the database. 
diff --git a/pgml-sdks/rust/pgml/javascript/examples/extractive_question_answering.js b/pgml-sdks/rust/pgml/javascript/examples/extractive_question_answering.js new file mode 100644 index 000000000..7483f5507 --- /dev/null +++ b/pgml-sdks/rust/pgml/javascript/examples/extractive_question_answering.js @@ -0,0 +1,64 @@ +const pgml = require("pgml"); +require("dotenv").config(); + +pgml.js_init_logger(); + +const main = async () => { + // Initialize the collection + const collection = pgml.newCollection("my_javascript_eqa_collection_2"); + + // Add a pipeline + const model = pgml.newModel(); + const splitter = pgml.newSplitter(); + const pipeline = pgml.newPipeline( + "my_javascript_eqa_pipeline_1", + model, + splitter, + ); + await collection.add_pipeline(pipeline); + + // Upsert documents, these documents are automatically split into chunks and embedded by our pipeline + const documents = [ + { + id: "Document One", + text: "PostgresML is the best tool for machine learning applications!", + }, + { + id: "Document Two", + text: "PostgresML is open source and available to everyone!", + }, + ]; + await collection.upsert_documents(documents); + + const query = "What is the best tool for machine learning?"; + + // Perform vector search + const queryResults = await collection + .query() + .vector_recall(query, pipeline) + .limit(1) + .fetch_all(); + + // Construct context from results + const context = queryResults + .map((result) => { + return result[1]; + }) + .join("\n") + .replace('"', '\\"') + .replace("'", "''"); + + // Query for answer + const builtins = pgml.newBuiltins(); + const answer = await builtins.transform("question-answering", [ + JSON.stringify({ question: query, context: context }), + ]); + + // Archive the collection + await collection.archive(); + return answer; +}; + +main().then((results) => { + console.log("Question answer: \n", results); +}); diff --git a/pgml-sdks/rust/pgml/javascript/examples/getting-started/README.md 
b/pgml-sdks/rust/pgml/javascript/examples/getting-started/README.md deleted file mode 100644 index 293b5a3ca..000000000 --- a/pgml-sdks/rust/pgml/javascript/examples/getting-started/README.md +++ /dev/null @@ -1,12 +0,0 @@ -# Getting Started with the PGML Javascript SDK - -In this example repo you will find a basic script that you can run to get started with the PGML Javascript SDK. This script will create a collection, add a pipeline, and run a vector search on the collection. - -## Steps to run the example - -1. Clone the repo -2. Install dependencies - `npm install` -3. Create a .env file and set `DATABASE_URL` to your Postgres connection string -4. Open index.js and check out the code -5. Run the script `node index.js` diff --git a/pgml-sdks/rust/pgml/javascript/examples/getting-started/package-lock.json b/pgml-sdks/rust/pgml/javascript/examples/package-lock.json similarity index 100% rename from pgml-sdks/rust/pgml/javascript/examples/getting-started/package-lock.json rename to pgml-sdks/rust/pgml/javascript/examples/package-lock.json diff --git a/pgml-sdks/rust/pgml/javascript/examples/getting-started/package.json b/pgml-sdks/rust/pgml/javascript/examples/package.json similarity index 100% rename from pgml-sdks/rust/pgml/javascript/examples/getting-started/package.json rename to pgml-sdks/rust/pgml/javascript/examples/package.json diff --git a/pgml-sdks/rust/pgml/javascript/examples/question_answering.js b/pgml-sdks/rust/pgml/javascript/examples/question_answering.js new file mode 100644 index 000000000..f8f7f83f5 --- /dev/null +++ b/pgml-sdks/rust/pgml/javascript/examples/question_answering.js @@ -0,0 +1,55 @@ +const pgml = require("pgml"); +require("dotenv").config(); + +const main = async () => { + // Initialize the collection + const collection = pgml.newCollection("my_javascript_qa_collection"); + + // Add a pipeline + const model = pgml.newModel(); + const splitter = pgml.newSplitter(); + const pipeline = pgml.newPipeline( + "my_javascript_qa_pipeline", 
+ model, + splitter, + ); + await collection.add_pipeline(pipeline); + + // Upsert documents, these documents are automatically split into chunks and embedded by our pipeline + const documents = [ + { + id: "Document One", + text: "PostgresML is the best tool for machine learning applications!", + }, + { + id: "Document Two", + text: "PostgresML is open source and available to everyone!", + }, + ]; + await collection.upsert_documents(documents); + + // Perform vector search + const queryResults = await collection + .query() + .vector_recall("What is the best tool for machine learning?", pipeline) + .limit(1) + .fetch_all(); + + // Convert the results to an array of objects + const results = queryResults.map((result) => { + const [similarity, text, metadata] = result; + return { + similarity, + text, + metadata, + }; + }); + + // Archive the collection + await collection.archive(); + return results; +}; + +main().then((results) => { + console.log("Vector search Results: \n", results); +}); diff --git a/pgml-sdks/rust/pgml/javascript/examples/question_answering_instructor.js b/pgml-sdks/rust/pgml/javascript/examples/question_answering_instructor.js new file mode 100644 index 000000000..1e4c22164 --- /dev/null +++ b/pgml-sdks/rust/pgml/javascript/examples/question_answering_instructor.js @@ -0,0 +1,60 @@ +const pgml = require("pgml"); +require("dotenv").config(); + +const main = async () => { + // Initialize the collection + const collection = pgml.newCollection("my_javascript_qai_collection"); + + // Add a pipeline + const model = pgml.newModel("hkunlp/instructor-base", "pgml", { + instruction: "Represent the Wikipedia document for retrieval: ", + }); + const splitter = pgml.newSplitter(); + const pipeline = pgml.newPipeline( + "my_javascript_qai_pipeline", + model, + splitter, + ); + await collection.add_pipeline(pipeline); + + // Upsert documents, these documents are automatically split into chunks and embedded by our pipeline + const documents = [ + { + id: 
"Document One", + text: "PostgresML is the best tool for machine learning applications!", + }, + { + id: "Document Two", + text: "PostgresML is open source and available to everyone!", + }, + ]; + await collection.upsert_documents(documents); + + // Perform vector search + const queryResults = await collection + .query() + .vector_recall("What is the best tool for machine learning?", pipeline, { + instruction: + "Represent the Wikipedia question for retrieving supporting documents: ", + }) + .limit(1) + .fetch_all(); + + // Convert the results to an array of objects + const results = queryResults.map((result) => { + const [similarity, text, metadata] = result; + return { + similarity, + text, + metadata, + }; + }); + + // Archive the collection + await collection.archive(); + return results; +}; + +main().then((results) => { + console.log("Vector search Results: \n", results); +}); diff --git a/pgml-sdks/rust/pgml/javascript/examples/getting-started/index.js b/pgml-sdks/rust/pgml/javascript/examples/semantic_search.js similarity index 90% rename from pgml-sdks/rust/pgml/javascript/examples/getting-started/index.js rename to pgml-sdks/rust/pgml/javascript/examples/semantic_search.js index 0c1c5f7eb..b1458e889 100644 --- a/pgml-sdks/rust/pgml/javascript/examples/getting-started/index.js +++ b/pgml-sdks/rust/pgml/javascript/examples/semantic_search.js @@ -27,7 +27,10 @@ const main = async () => { // Perform vector search const queryResults = await collection .query() - .vector_recall("Some user query that will match document one first", pipeline) + .vector_recall( + "Some user query that will match document one first", + pipeline, + ) .limit(2) .fetch_all(); @@ -41,6 +44,7 @@ const main = async () => { }; }); + // Archive the collection await collection.archive(); return results; }; diff --git a/pgml-sdks/rust/pgml/python/examples/README.md b/pgml-sdks/rust/pgml/python/examples/README.md index dc8fce385..e2e22eb6e 100644 --- 
a/pgml-sdks/rust/pgml/python/examples/README.md +++ b/pgml-sdks/rust/pgml/python/examples/README.md @@ -1,7 +1,7 @@ ## Examples ### [Semantic Search](./semantic_search.py) -This is a basic example to perform semantic search on a collection of documents. It loads the Quora dataset, creates a collection in a PostgreSQL database, upserts documents, generates chunks and embeddings, and then performs a vector search on a query. Embeddings are created using `intfloat/e5-small` model. The results are are semantically similar documemts to the query. Finally, the collection is archived. +This is a basic example to perform semantic search on a collection of documents. It loads the Quora dataset, creates a collection in a PostgreSQL database, upserts documents, generates chunks and embeddings, and then performs a vector search on a query. Embeddings are created using `intfloat/e5-small` model. The results are semantically similar documemts to the query. Finally, the collection is archived. ### [Question Answering](./question_answering.py) This is an example to find documents relevant to a question from the collection of documents. It loads the Stanford Question Answering Dataset (SQuAD) into the database, generates chunks and embeddings. Query is passed to vector search to retrieve documents that match closely in the embeddings space. A score is returned with each of the search result. 
From 78a26b95de38ba9af02a6865abddad0dda306325 Mon Sep 17 00:00:00 2001 From: SilasMarvin <19626586+SilasMarvin@users.noreply.github.com> Date: Fri, 25 Aug 2023 12:56:14 -0700 Subject: [PATCH 2/2] Small cleanup --- .../pgml/javascript/examples/extractive_question_answering.js | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/pgml-sdks/rust/pgml/javascript/examples/extractive_question_answering.js b/pgml-sdks/rust/pgml/javascript/examples/extractive_question_answering.js index 7483f5507..fac0925ff 100644 --- a/pgml-sdks/rust/pgml/javascript/examples/extractive_question_answering.js +++ b/pgml-sdks/rust/pgml/javascript/examples/extractive_question_answering.js @@ -44,9 +44,7 @@ const main = async () => { .map((result) => { return result[1]; }) - .join("\n") - .replace('"', '\\"') - .replace("'", "''"); + .join("\n"); // Query for answer const builtins = pgml.newBuiltins();