Reorganized the SDK directory #954

Merged: 16 commits merged on Aug 28, 2023
Changes from 1 commit
Updated README and added requirements.txt
SilasMarvin committed Aug 28, 2023
commit 59aa1e9163307eb922ac981ffcb178901dbe916a
26 changes: 17 additions & 9 deletions pgml-sdks/pgml/python/examples/README.md
@@ -1,20 +1,28 @@
## Examples
# Examples

### [Semantic Search](./semantic_search.py)
## Prerequisites
Before running any examples, first install the dependencies and set the `DATABASE_URL` environment variable:
```
pip install -r requirements.txt
export DATABASE_URL={YOUR DATABASE URL}
```

Optionally, configure a .env file containing a DATABASE_URL variable.
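As a quick illustration (not taken from the examples themselves), a `.env` file can be loaded with the `python-dotenv` package listed in `requirements.txt`; the exact loading code in each example may differ:

```python
# Example .env contents (placeholder URL, replace with your own):
# DATABASE_URL=postgres://user:password@localhost:5432/pgml

import os
from dotenv import load_dotenv  # provided by python-dotenv in requirements.txt

load_dotenv()  # reads .env from the current working directory
database_url = os.environ["DATABASE_URL"]  # the pgml SDK connects using this URL
```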

## [Semantic Search](./semantic_search.py)
This is a basic example to perform semantic search on a collection of documents. It loads the Quora dataset, creates a collection in a PostgreSQL database, upserts documents, generates chunks and embeddings, and then performs a vector search on a query. Embeddings are created using the `intfloat/e5-small` model. The results are documents semantically similar to the query. Finally, the collection is archived.
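As a rough sketch of that flow (not the literal contents of `semantic_search.py`, and assuming the 0.9-era SDK surface of `Collection`, `Model`, `Splitter`, and `Pipeline`), the example roughly follows this shape:

```python
import asyncio
from pgml import Collection, Model, Splitter, Pipeline

async def main():
    # Assumed API shapes; see semantic_search.py for the authoritative version.
    collection = Collection("quora_collection")
    model = Model("intfloat/e5-small")      # embedding model named in the README
    splitter = Splitter()                   # default chunking
    pipeline = Pipeline("quora_pipeline", model, splitter)
    await collection.add_pipeline(pipeline)

    # Upsert a few documents; the real example loads the Quora dataset.
    await collection.upsert_documents([
        {"id": "1", "text": "PostgresML brings machine learning to your database."},
        {"id": "2", "text": "Semantic search ranks documents by meaning."},
    ])

    # Vector search over the generated embeddings.
    results = await (
        collection.query()
        .vector_recall("What is semantic search?", pipeline)
        .limit(3)
        .fetch_all()
    )
    print(results)

    # Clean up by archiving the collection, as the example does.
    await collection.archive()

asyncio.run(main())
```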

### [Question Answering](./question_answering.py)
## [Question Answering](./question_answering.py)
This is an example to find documents relevant to a question from the collection of documents. It loads the Stanford Question Answering Dataset (SQuAD) into the database and generates chunks and embeddings. The query is passed to vector search to retrieve documents that closely match in the embedding space. A score is returned with each search result.

### [Question Answering using Instructor Model](./question_answering_instructor.py)
## [Question Answering using Instructor Model](./question_answering_instructor.py)
In this example, we will use the `hkunlp/instructor-base` model to build text embeddings instead of the default `intfloat/e5-small` model.
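A minimal, hedged sketch of the intended change, assuming the same `Model`/`Pipeline` shapes as above (the real example may also pass an instruction prompt through model parameters):

```python
from pgml import Model, Splitter, Pipeline

# Assumed constructor shape: only the embedding model name changes
# relative to the default setup.
model = Model("hkunlp/instructor-base")  # instead of the default intfloat/e5-small
pipeline = Pipeline("squad_instructor", model, Splitter())
```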

### [Extractive Question Answering](./extractive_question_answering.py)
## [Extractive Question Answering](./extractive_question_answering.py)
In this example, we will show how to use the `vector_recall` result as a `context` for a Hugging Face question answering model. We will use `Builtins.transform()` to run the model on the database.
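A hedged sketch of that pattern, assuming `Builtins().transform()` accepts a task name and a list of JSON-encoded question/context pairs (the actual example may assemble the context differently):

```python
import json
from pgml import Builtins

async def extract_answer(question: str, context: str):
    builtins = Builtins()
    # Assumed call shape: run a Hugging Face question-answering model in the
    # database, using text returned by vector_recall as the context.
    return await builtins.transform(
        "question-answering",
        [json.dumps({"question": question, "context": context})],
    )
```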

### [Table Question Answering](./table_question_answering.py)
In this example, we will use [Open Table-and-Text Question Answering (OTT-QA)
](https://github.com/wenhuchen/OTT-QA) dataset to run queries on tables. We will use `deepset/all-mpnet-base-v2-table` model that is trained for embedding tabular data for retrieval tasks.
## [Table Question Answering](./table_question_answering.py)
In this example, we will use the [Open Table-and-Text Question Answering (OTT-QA)](https://github.com/wenhuchen/OTT-QA) dataset to run queries on tables. We will use the `deepset/all-mpnet-base-v2-table` model, which is trained for embedding tabular data for retrieval tasks.

### [Summarizing Question Answering](./summarizing_question_answering.py)
## [Summarizing Question Answering](./summarizing_question_answering.py)
This example finds documents relevant to a question from the collection of documents and then summarizes those documents.
36 changes: 36 additions & 0 deletions pgml-sdks/pgml/python/examples/requirements.txt
@@ -0,0 +1,36 @@
aiohttp==3.8.5
aiosignal==1.3.1
async-timeout==4.0.3
attrs==23.1.0
certifi==2023.7.22
charset-normalizer==3.2.0
datasets==2.14.4
dill==0.3.7
filelock==3.12.3
frozenlist==1.4.0
fsspec==2023.6.0
huggingface-hub==0.16.4
idna==3.4
markdown-it-py==3.0.0
mdurl==0.1.2
multidict==6.0.4
multiprocess==0.70.15
numpy==1.25.2
packaging==23.1
pandas==2.0.3
pgml==0.9.0
pyarrow==13.0.0
Pygments==2.16.1
python-dateutil==2.8.2
python-dotenv==1.0.0
pytz==2023.3
PyYAML==6.0.1
requests==2.31.0
rich==13.5.2
six==1.16.0
tqdm==4.66.1
typing_extensions==4.7.1
tzdata==2023.3
urllib3==2.0.4
xxhash==3.3.0
yarl==1.9.2