postgresml · SilasMarvin · Mar 1, 2024
diff --git a/pgml-cms/docs/SUMMARY.md b/pgml-cms/docs/SUMMARY.md
@@ -38,12 +38,11 @@
     * [Overview](introduction/apis/client-sdks/getting-started.md)
     * [Collections](introduction/apis/client-sdks/collections.md)
     * [Pipelines](introduction/apis/client-sdks/pipelines.md)
-    * [Search](introduction/apis/client-sdks/search.md)
+    * [Vector Search](introduction/apis/client-sdks/search.md)
+    * [Document Search](introduction/apis/client-sdks/document-search.md)
     * [Tutorials](introduction/apis/client-sdks/tutorials/README.md)
       * [Semantic Search](introduction/apis/client-sdks/tutorials/semantic-search.md)
-      * [Semantic Search using Instructor model](introduction/apis/client-sdks/tutorials/semantic-search-using-instructor-model.md)
-      * [Extractive Question Answering](introduction/apis/client-sdks/tutorials/extractive-question-answering.md)
-      * [Summarizing Question Answering](introduction/apis/client-sdks/tutorials/summarizing-question-answering.md)
+      * [Semantic Search Using Instructor Model](introduction/apis/client-sdks/tutorials/semantic-search-1.md)
 
 ## Product
 

diff --git a/pgml-cms/docs/introduction/apis/client-sdks/collections.md b/pgml-cms/docs/introduction/apis/client-sdks/collections.md
@@ -1,16 +1,16 @@
 ---
-description: >-
-  Organizational building blocks of the SDK. Manage all documents and related chunks, embeddings, tsvectors, and pipelines.
+description: Organizational building blocks of the SDK. Manage all documents and related chunks, embeddings, tsvectors, and pipelines.
 ---
+
 # Collections
 
 Collections are the organizational building blocks of the SDK. They manage all documents and related chunks, embeddings, tsvectors, and pipelines.
 
 ## Creating Collections
 
-By default, collections will read and write to the database specified by `DATABASE_URL` environment variable.
+By default, collections will read and write to the database specified by the `PGML_DATABASE_URL` environment variable.
 
-### **Default `DATABASE_URL`**
+### **Default `PGML_DATABASE_URL`**
 
 {% tabs %}
 {% tab title="JavaScript" %}
@@ -26,9 +26,9 @@ collection = Collection("test_collection")
 {% endtab %}
 {% endtabs %}
 
-### **Custom DATABASE\_URL**
+### Custom `PGML_DATABASE_URL`
 
-Create a Collection that reads from a different database than that set by the environment variable `DATABASE_URL`.
+Create a Collection that reads from a different database than that set by the environment variable `PGML_DATABASE_URL`.
 
 {% tabs %}
 {% tab title="Javascript" %}
@@ -46,21 +46,23 @@ collection = Collection("test_collection", CUSTOM_DATABASE_URL)
 
 ## Upserting Documents
 
-Documents are dictionaries with two required keys: `id` and `text`. All other keys/value pairs are stored as metadata for the document.
+Documents are dictionaries with one required key: `id`. All other key/value pairs are stored and can be chunked, embedded, broken into tsvectors, and searched over as specified by a `Pipeline`.
 
 {% tabs %}
 {% tab title="JavaScript" %}
 ```javascript
 const documents = [
   {
-    id: "Document One",
+    id: "document_one",
+    title: "Document One",
     text: "document one contents...",
-    random_key: "this will be metadata for the document",
+    random_key: "here is some random data",
   },
   {
-    id: "Document Two",
+    id: "document_two",
+    title: "Document Two",
     text: "document two contents...",
-    random_key: "this will be metadata for the document",
+    random_key: "here is some random data",
   },
 ];
 await collection.upsert_documents(documents);
@@ -71,35 +73,40 @@ await collection.upsert_documents(documents);
 ```python
 documents = [
     {
-        "id": "Document 1",
+        "id": "document_one",
+        "title": "Document One",
         "text": "Here are the contents of Document 1",
-        "random_key": "this will be metadata for the document"
+        "random_key": "here is some random data",
     },
     {
-        "id": "Document 2",
+        "id": "document_two",
+        "title": "Document Two",
         "text": "Here are the contents of Document 2",
-        "random_key": "this will be metadata for the document"
-    }
+        "random_key": "here is some random data",
+    },
 ]
-collection = Collection("test_collection")
 await collection.upsert_documents(documents)
 ```
 {% endtab %}
 {% endtabs %}
 
-Document metadata can be replaced by upserting the document without the `text` key.
+Documents can be replaced by upserting documents with the same `id`.
 
 {% tabs %}
 {% tab title="JavaScript" %}
 ```javascript
 const documents = [
   {
-    id: "Document One",
-    random_key: "this will be NEW metadata for the document",
+    id: "document_one",
+    title: "Document One New Title",
+    text: "Here is some new text for document one",
+    random_key: "here is some new random data",
   },
   {
-    id: "Document Two",
-    random_key: "this will be NEW metadata for the document",
+    id: "document_two",
+    title: "Document Two New Title",
+    text: "Here is some new text for document two",
+    random_key: "here is some new random data",
   },
 ];
 await collection.upsert_documents(documents);
@@ -110,39 +117,42 @@ await collection.upsert_documents(documents);
 ```python
 documents = [
     {
-        "id": "Document 1",
-        "random_key": "this will be NEW metadata for the document"
+        "id": "document_one",
+        "title": "Document One",
+        "text": "Here is some new text for document one",
+        "random_key": "here is some random data",
     },
     {
-        "id": "Document 2",
-        "random_key": "this will be NEW metadata for the document"
-    }
+        "id": "document_two",
+        "title": "Document Two",
+        "text": "Here is some new text for document two",
+        "random_key": "here is some random data",
+    },
 ]
-collection = Collection("test_collection")
 await collection.upsert_documents(documents)
 ```
 {% endtab %}
 {% endtabs %}
 
-Document metadata can be merged with new metadata by upserting the document without the `text` key and specifying the merge option.
+Documents can be merged by setting the `merge` option. On conflict, new document keys will override old document keys.
 
 {% tabs %}
 {% tab title="JavaScript" %}
 ```javascript
 const documents = [
   {
-    id: "Document One",
-    text: "document one contents...",
+    id: "document_one",
+    new_key: "this will be a new key in document one",
+    random_key: "this will replace old random_key"
   },
   {
-    id: "Document Two",
-    text: "document two contents...",
+    id: "document_two",
+    new_key: "this will be a new key in document two",
+    random_key: "this will replace old random_key"
   },
 ];
 await collection.upsert_documents(documents, {
-  metdata: {
-    merge: true
-  }
+  merge: true
 });
 ```
 {% endtab %}
@@ -151,20 +161,17 @@ await collection.upsert_documents(documents, {
 ```python
 documents = [
     {
-        "id": "Document 1",
-        "random_key": "this will be NEW merged metadata for the document"
+        "id": "document_one",
+        "new_key": "this will be a new key in document one",
+        "random_key": "this will replace old random_key",
     },
     {
-        "id": "Document 2",
-        "random_key": "this will be NEW merged metadata for the document"
-    }
+        "id": "document_two",
+        "new_key": "this will be a new key in document two",
+        "random_key": "this will replace old random_key",
+    },
 ]
-collection = Collection("test_collection")
-await collection.upsert_documents(documents, {
-    "metadata": {
-        "merge": True
-    }
-})
+await collection.upsert_documents(documents, {"merge": True})
 ```
 {% endtab %}
 {% endtabs %}
@@ -176,14 +183,12 @@ Documents can be retrieved using the `get_documents` method on the collection ob
 {% tabs %}
 {% tab title="JavaScript" %}
 ```javascript
-const collection = Collection("test_collection")
 const documents = await collection.get_documents({limit: 100 })
 ```
 {% endtab %}
 
 {% tab title="Python" %}
 ```python
-collection = Collection("test_collection")
 documents = await collection.get_documents({ "limit": 100 })
 ```
 {% endtab %}
@@ -198,14 +203,12 @@ The SDK supports limit-offset pagination and keyset pagination.
 {% tabs %}
 {% tab title="JavaScript" %}
 ```javascript
-const collection = pgml.newCollection("test_collection")
 const documents = await collection.get_documents({ limit: 100, offset: 10 })
 ```
 {% endtab %}
 
 {% tab title="Python" %}
 ```python
-collection = Collection("test_collection")
 documents = await collection.get_documents({ "limit": 100, "offset": 10 })
 ```
 {% endtab %}
@@ -216,41 +219,31 @@ documents = await collection.get_documents({ "limit": 100, "offset": 10 })
 {% tabs %}
 {% tab title="JavaScript" %}
 ```javascript
-const collection = Collection("test_collection")
 const documents = await collection.get_documents({ limit: 100, last_row_id: 10 })
 ```
 {% endtab %}
 
 {% tab title="Python" %}
 ```python
-collection = Collection("test_collection")
 documents = await collection.get_documents({ "limit": 100, "last_row_id": 10 })
 ```
 {% endtab %}
 {% endtabs %}
 
-The `last_row_id` can be taken from the `row_id` field in the returned document's dictionary.
+The `last_row_id` can be taken from the `row_id` field in the returned document's dictionary. Keyset pagination does not currently work when specifying the `order_by` key.
 
 ### Filtering Documents
 
-Metadata and full text filtering are supported just like they are in vector recall.
+Documents can be filtered by passing in the `filter` key.
 
 {% tabs %}
 {% tab title="JavaScript" %}
 ```javascript
-const collection = pgml.newCollection("test_collection")
 const documents = await collection.get_documents({
-  limit: 100,
-  offset: 10,
+  limit: 10,
   filter: {
-    metadata: {
-      id: {
-        $eq: 1
-      }
-    },
-    full_text_search: {
-      configuration: "english",
-      text: "Some full text query"
+    id: {
+      $eq: "document_one"
     }
   }
 })
@@ -259,34 +252,25 @@ const documents = await collection.get_documents({
 
 {% tab title="Python" %}
 ```python
-collection = Collection("test_collection")
-documents = await collection.get_documents({
-    "limit": 100,
-    "offset": 10,
-    "filter": {
-        "metadata": {
-            "id": {
-                "$eq": 1
-            }
+documents = await collection.get_documents(
+    {
+        "limit": 100,
+        "filter": {
+            "id": {"$eq": "document_one"},
         },
-        "full_text_search": {
-            "configuration": "english",
-            "text": "Some full text query"
-        }
     }
-})
+)
 ```
 {% endtab %}
 {% endtabs %}
 
 ### Sorting Documents
 
-Documents can be sorted on any metadata key. Note that this does not currently work well with Keyset based pagination. If paginating and sorting, use Limit-Offset based pagination.
+Documents can be sorted on any key. Note that this does not currently work well with Keyset based pagination. If paginating and sorting, use Limit-Offset based pagination.
 
 {% tabs %}
 {% tab title="JavaScript" %}
 ```javascript
-const collection = pgml.newCollection("test_collection")
 const documents = await collection.get_documents({
   limit: 100,
   offset: 10,
@@ -299,7 +283,6 @@ const documents = await collection.get_documents({
 
 {% tab title="Python" %}
 ```python
-collection = Collection("test_collection")
 documents = await collection.get_documents({
     "limit": 100,
     "offset": 10,
@@ -315,39 +298,24 @@ documents = await collection.get_documents({
 
 Documents can be deleted with the `delete_documents` method on the collection object.
 
-Metadata and full text filtering are supported just like they are in vector recall.
-
 {% tabs %}
 {% tab title="JavaScript" %}
 ```javascript
-const collection = pgml.newCollection("test_collection")
 const documents = await collection.delete_documents({
-  metadata: {
     id: {
       $eq: 1
     }
-  },
-  full_text_search: {
-    configuration: "english",
-    text: "Some full text query"
-  }
 })
 ```
 {% endtab %}
 
 {% tab title="Python" %}
 ```python
-documents = await collection.delete_documents({
-    "metadata": {
-        "id": {
-            "$eq": 1
-        }
-    },
-    "full_text_search": {
-        "configuration": "english",
-        "text": "Some full text query"
+documents = await collection.delete_documents(
+    {
+        "id": {"$eq": 1},
     }
-})
+)
 ```
 {% endtab %}
 {% endtabs %}
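
Note on the new upsert wording in `collections.md`: the difference between replacing a document (upsert with the same `id`) and merging it (`merge` option, new keys win on conflict) can be illustrated with plain dictionaries. This is only a sketch of the semantics as the diff describes them, not the SDK's implementation. The `upsert` helper and the in-memory `store` are hypothetical, and the sketch assumes the merge is shallow (top-level keys only), which the diff does not state.

```python
# Hypothetical in-memory model of the upsert semantics described in the docs.
# Plain Python only; nothing here calls the pgml SDK.

def upsert(store, documents, merge=False):
    """Insert or update documents keyed by their required `id` field."""
    for doc in documents:
        doc_id = doc["id"]  # `id` is the only required key
        if merge and doc_id in store:
            # Assumed shallow merge: keys in the new document override old ones
            store[doc_id] = {**store[doc_id], **doc}
        else:
            # Without merge, the stored document is replaced wholesale
            store[doc_id] = dict(doc)
    return store


store = {}

# Initial upsert
upsert(store, [{
    "id": "document_one",
    "title": "Document One",
    "random_key": "here is some random data",
}])

# Merge: adds `new_key`, overrides `random_key`, keeps `title`
upsert(store, [{
    "id": "document_one",
    "new_key": "this will be a new key in document one",
    "random_key": "this will replace old random_key",
}], merge=True)
merged = dict(store["document_one"])

# Replace: upserting the same `id` without merge drops keys that are not resent
upsert(store, [{
    "id": "document_one",
    "text": "Here is some new text for document one",
}])
replaced = dict(store["document_one"])
```

Under these assumptions, `merged` keeps `title` from the original document while `replaced` contains only `id` and `text`. Whether nested keys are merged recursively is not covered by the diff and would be worth stating in the docs.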