Revert changes made to SDK docs in Revert "GITBOOK-120: Update to SDK… #1344

Merged (3 commits) on Mar 1, 2024
pgml-cms/docs/SUMMARY.md (7 changes: 3 additions & 4 deletions)
@@ -38,12 +38,11 @@
* [Overview](introduction/apis/client-sdks/getting-started.md)
* [Collections](introduction/apis/client-sdks/collections.md)
* [Pipelines](introduction/apis/client-sdks/pipelines.md)
* [Search](introduction/apis/client-sdks/search.md)
* [Vector Search](introduction/apis/client-sdks/search.md)
* [Document Search](introduction/apis/client-sdks/document-search.md)
* [Tutorials](introduction/apis/client-sdks/tutorials/README.md)
* [Semantic Search](introduction/apis/client-sdks/tutorials/semantic-search.md)
* [Semantic Search using Instructor model](introduction/apis/client-sdks/tutorials/semantic-search-using-instructor-model.md)
* [Extractive Question Answering](introduction/apis/client-sdks/tutorials/extractive-question-answering.md)
* [Summarizing Question Answering](introduction/apis/client-sdks/tutorials/summarizing-question-answering.md)
* [Semantic Search Using Instructor Model](introduction/apis/client-sdks/tutorials/semantic-search-1.md)

## Product

pgml-cms/docs/introduction/apis/client-sdks/collections.md (172 changes: 70 additions & 102 deletions)
@@ -1,16 +1,16 @@
---
description: >-
Organizational building blocks of the SDK. Manage all documents and related chunks, embeddings, tsvectors, and pipelines.
description: Organizational building blocks of the SDK. Manage all documents and related chunks, embeddings, tsvectors, and pipelines.
---

# Collections

Collections are the organizational building blocks of the SDK. They manage all documents and related chunks, embeddings, tsvectors, and pipelines.

## Creating Collections

By default, collections will read and write to the database specified by `DATABASE_URL` environment variable.
By default, collections will read and write to the database specified by the `PGML_DATABASE_URL` environment variable.

### **Default `DATABASE_URL`**
### **Default `PGML_DATABASE_URL`**

{% tabs %}
{% tab title="JavaScript" %}
@@ -26,9 +26,9 @@ collection = Collection("test_collection")
{% endtab %}
{% endtabs %}
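The lookup order described in this section (an explicitly passed URL, otherwise the `PGML_DATABASE_URL` environment variable) can be pictured with a small pure-Python sketch. `resolve_database_url` is a hypothetical helper written for illustration, not part of the pgml SDK, and raising an error when neither source is available is an assumption rather than documented SDK behavior.

```python
import os
from typing import Optional


def resolve_database_url(custom_url: Optional[str] = None) -> str:
    """Hypothetical helper: pick the database a collection would talk to.

    An explicitly passed URL (like CUSTOM_DATABASE_URL in the examples)
    takes precedence; otherwise fall back to PGML_DATABASE_URL.
    """
    if custom_url is not None:
        return custom_url
    url = os.environ.get("PGML_DATABASE_URL")
    if url is None:
        # Assumption: fail loudly instead of guessing a default database.
        raise RuntimeError("set PGML_DATABASE_URL or pass a URL explicitly")
    return url


os.environ["PGML_DATABASE_URL"] = "postgres://localhost:5432/default_db"
print(resolve_database_url())  # postgres://localhost:5432/default_db
print(resolve_database_url("postgres://localhost:5432/other_db"))  # postgres://localhost:5432/other_db
```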

### **Custom DATABASE\_URL**
### Custom `PGML_DATABASE_URL`

Create a Collection that reads from a different database than that set by the environment variable `DATABASE_URL`.
Create a Collection that reads from a different database than the one set by the `PGML_DATABASE_URL` environment variable.

{% tabs %}
{% tab title="JavaScript" %}
@@ -46,21 +46,23 @@ collection = Collection("test_collection", CUSTOM_DATABASE_URL)

## Upserting Documents

Documents are dictionaries with two required keys: `id` and `text`. All other keys/value pairs are stored as metadata for the document.
Documents are dictionaries with one required key: `id`. All other key/value pairs are stored and can be chunked, embedded, broken into tsvectors, and searched over as specified by a `Pipeline`.

{% tabs %}
{% tab title="JavaScript" %}
```javascript
const documents = [
{
id: "Document One",
id: "document_one",
title: "Document One",
text: "document one contents...",
random_key: "this will be metadata for the document",
random_key: "here is some random data",
},
{
id: "Document Two",
id: "document_two",
title: "Document Two",
text: "document two contents...",
random_key: "this will be metadata for the document",
random_key: "here is some random data",
},
];
await collection.upsert_documents(documents);
@@ -71,35 +73,40 @@ await collection.upsert_documents(documents);
```python
documents = [
{
"id": "Document 1",
"id": "document_one",
"title": "Document One",
"text": "Here are the contents of Document 1",
"random_key": "this will be metadata for the document"
"random_key": "here is some random data",
},
{
"id": "Document 2",
"id": "document_two",
"title": "Document Two",
"text": "Here are the contents of Document 2",
"random_key": "this will be metadata for the document"
}
"random_key": "here is some random data",
},
]
collection = Collection("test_collection")
await collection.upsert_documents(documents)
```
{% endtab %}
{% endtabs %}
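Because `id` is the only required key, a client-side sanity check before calling `upsert_documents` only needs to look for that key. The sketch below is hypothetical (`validate_documents` is not an SDK function); it simply separates the required `id` from the free-form keys that a `Pipeline` may later chunk, embed, or index.

```python
from typing import Any, Dict, List, Tuple

Document = Dict[str, Any]


def validate_documents(documents: List[Document]) -> List[Tuple[Any, Document]]:
    """Hypothetical pre-flight check mirroring the documented rule:
    every document needs an "id"; every other key is free-form."""
    checked = []
    for position, doc in enumerate(documents):
        if "id" not in doc:
            raise ValueError(f"document at position {position} is missing 'id'")
        # Everything except "id" is left for a Pipeline to process (or ignore).
        other_fields = {k: v for k, v in doc.items() if k != "id"}
        checked.append((doc["id"], other_fields))
    return checked


documents = [
    {"id": "document_one", "title": "Document One", "text": "document one contents..."},
    {"id": "document_two", "title": "Document Two", "random_key": "here is some random data"},
]
for doc_id, fields in validate_documents(documents):
    print(doc_id, sorted(fields))
# document_one ['text', 'title']
# document_two ['random_key', 'title']
```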

Document metadata can be replaced by upserting the document without the `text` key.
Documents can be replaced by upserting documents with the same `id`.

{% tabs %}
{% tab title="JavaScript" %}
```javascript
const documents = [
{
id: "Document One",
random_key: "this will be NEW metadata for the document",
id: "document_one",
title: "Document One New Title",
text: "Here is some new text for document one",
random_key: "here is some new random data",
},
{
id: "Document Two",
random_key: "this will be NEW metadata for the document",
id: "document_two",
title: "Document Two New Title",
text: "Here is some new text for document two",
random_key: "here is some new random data",
},
];
await collection.upsert_documents(documents);
@@ -110,39 +117,42 @@ await collection.upsert_documents(documents);
```python
documents = [
{
"id": "Document 1",
"random_key": "this will be NEW metadata for the document"
"id": "document_one",
"title": "Document One",
"text": "Here is some new text for document one",
"random_key": "here is some random data",
},
{
"id": "Document 2",
"random_key": "this will be NEW metadata for the document"
}
"id": "document_two",
"title": "Document Two",
"text": "Here is some new text for document two",
"random_key": "here is some random data",
},
]
collection = Collection("test_collection")
await collection.upsert_documents(documents)
```
{% endtab %}
{% endtabs %}
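A plain upsert keyed on `id` behaves like assigning into a dictionary. The toy in-memory model below assumes, as the contrast with the `merge` option suggests, that without `merge` the stored document is replaced wholesale, so keys missing from the new version disappear. `upsert_replace` and the `store` dict are hypothetical stand-ins for the collection's table, not SDK code.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


def upsert_replace(store: Dict[Any, Document], documents: List[Document]) -> None:
    """Toy model of upsert without merge: same id => whole document replaced."""
    for doc in documents:
        store[doc["id"]] = dict(doc)  # copy so later edits by the caller don't leak in


store: Dict[Any, Document] = {}
upsert_replace(store, [{"id": "document_one", "title": "Document One", "random_key": "here is some random data"}])
upsert_replace(store, [{"id": "document_one", "title": "Document One New Title"}])
print(store["document_one"])  # {'id': 'document_one', 'title': 'Document One New Title'}
```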

Document metadata can be merged with new metadata by upserting the document without the `text` key and specifying the merge option.
Documents can be merged by setting the `merge` option. On conflict, new document keys will override old document keys.

{% tabs %}
{% tab title="JavaScript" %}
```javascript
const documents = [
{
id: "Document One",
text: "document one contents...",
id: "document_one",
new_key: "this will be a new key in document one",
random_key: "this will replace old random_key"
},
{
id: "Document Two",
text: "document two contents...",
id: "document_two",
new_key: "this will be a new key in document two",
random_key: "this will replace old random_key"
},
];
await collection.upsert_documents(documents, {
metdata: {
merge: true
}
merge: true
});
```
{% endtab %}
@@ -151,20 +161,17 @@ await collection.upsert_documents(documents, {
```python
documents = [
{
"id": "Document 1",
"random_key": "this will be NEW merged metadata for the document"
"id": "document_one",
"new_key": "this will be a new key in document one",
"random_key": "this will replace old random_key",
},
{
"id": "Document 2",
"random_key": "this will be NEW merged metadata for the document"
}
"id": "document_two",
"new_key": "this will be a new key in document two",
"random_key": "this will replace old random_key",
},
]
collection = Collection("test_collection")
await collection.upsert_documents(documents, {
"metadata": {
"merge": True
}
})
await collection.upsert_documents(documents, {"merge": True})
```
{% endtab %}
{% endtabs %}
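The `merge` rule stated above (on conflict, new keys override old ones, while untouched keys survive) matches Python's dictionary unpacking. This is a hypothetical in-memory model, not the SDK's implementation; in particular, merging into an `id` that does not exist yet is assumed here to behave like a plain insert.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


def upsert_merge(store: Dict[Any, Document], documents: List[Document]) -> None:
    """Toy model of upsert with merge: keys from the new document win."""
    for doc in documents:
        existing = store.get(doc["id"], {})  # assumed: a missing id behaves like an insert
        store[doc["id"]] = {**existing, **doc}


store: Dict[Any, Document] = {
    "document_one": {"id": "document_one", "title": "Document One", "random_key": "here is some random data"},
}
upsert_merge(store, [{
    "id": "document_one",
    "new_key": "this will be a new key in document one",
    "random_key": "this will replace old random_key",
}])
merged = store["document_one"]
print(merged["title"])       # Document One
print(merged["random_key"])  # this will replace old random_key
print(merged["new_key"])     # this will be a new key in document one
```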
@@ -176,14 +183,12 @@ Documents can be retrieved using the `get_documents` method on the collection ob
{% tabs %}
{% tab title="JavaScript" %}
```javascript
const collection = Collection("test_collection")
const documents = await collection.get_documents({limit: 100 })
```
{% endtab %}

{% tab title="Python" %}
```python
collection = Collection("test_collection")
documents = await collection.get_documents({ "limit": 100 })
```
{% endtab %}
@@ -198,14 +203,12 @@ The SDK supports limit-offset pagination and keyset pagination.
{% tabs %}
{% tab title="JavaScript" %}
```javascript
const collection = pgml.newCollection("test_collection")
const documents = await collection.get_documents({ limit: 100, offset: 10 })
```
{% endtab %}

{% tab title="Python" %}
```python
collection = Collection("test_collection")
documents = await collection.get_documents({ "limit": 100, "offset": 10 })
```
{% endtab %}
@@ -216,41 +219,31 @@ documents = await collection.get_documents({ "limit": 100, "offset": 10 })
{% tabs %}
{% tab title="JavaScript" %}
```javascript
const collection = Collection("test_collection")
const documents = await collection.get_documents({ limit: 100, last_row_id: 10 })
```
{% endtab %}

{% tab title="Python" %}
```python
collection = Collection("test_collection")
documents = await collection.get_documents({ "limit": 100, "last_row_id": 10 })
```
{% endtab %}
{% endtabs %}

The `last_row_id` can be taken from the `row_id` field in the returned document's dictionary.
The `last_row_id` can be taken from the `row_id` field in the returned document's dictionary. Keyset pagination does not currently work when specifying the `order_by` key.
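The two pagination styles differ in what the next request carries: limit-offset passes a number of rows to skip, while keyset passes the last `row_id` seen. The pure-Python model below sketches both over an in-memory list. It assumes rows come back in ascending `row_id` order and that `last_row_id` means "strictly after this row", which is how keyset pagination is usually defined but is not spelled out above; `page_limit_offset` and `page_keyset` are hypothetical names.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def page_limit_offset(rows: List[Row], limit: int, offset: int = 0) -> List[Row]:
    """Limit-offset: skip `offset` rows, then return at most `limit` rows."""
    ordered = sorted(rows, key=lambda r: r["row_id"])
    return ordered[offset:offset + limit]


def page_keyset(rows: List[Row], limit: int, last_row_id: Optional[int] = None) -> List[Row]:
    """Keyset: return at most `limit` rows whose row_id is greater than last_row_id."""
    ordered = sorted(rows, key=lambda r: r["row_id"])
    if last_row_id is not None:
        ordered = [r for r in ordered if r["row_id"] > last_row_id]
    return ordered[:limit]


rows = [{"row_id": i, "id": f"doc_{i}"} for i in range(1, 26)]  # 25 rows

# Walk all pages with keyset pagination, feeding each page's last row_id back in.
pages = []
last_row_id = None
while True:
    page = page_keyset(rows, limit=10, last_row_id=last_row_id)
    if not page:
        break
    pages.append(page)
    last_row_id = page[-1]["row_id"]

print([len(p) for p in pages])  # [10, 10, 5]
print(page_limit_offset(rows, limit=10, offset=10)[0]["row_id"])  # 11
```

In a real PostgreSQL table, keyset pagination typically lets the planner seek straight to the next `row_id` through an index instead of reading and discarding `offset` rows, which is the usual reason to prefer it for deep pages.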

### Filtering Documents

Metadata and full text filtering are supported just like they are in vector recall.
Documents can be filtered by passing in the `filter` key.

{% tabs %}
{% tab title="JavaScript" %}
```javascript
const collection = pgml.newCollection("test_collection")
const documents = await collection.get_documents({
limit: 100,
offset: 10,
limit: 10,
filter: {
metadata: {
id: {
$eq: 1
}
},
full_text_search: {
configuration: "english",
text: "Some full text query"
id: {
$eq: "document_one"
}
}
})
@@ -259,34 +252,25 @@ const documents = await collection.get_documents({

{% tab title="Python" %}
```python
collection = Collection("test_collection")
documents = await collection.get_documents({
"limit": 100,
"offset": 10,
"filter": {
"metadata": {
"id": {
"$eq": 1
}
documents = await collection.get_documents(
{
"limit": 100,
"filter": {
"id": {"$eq": "document_one"},
},
"full_text_search": {
"configuration": "english",
"text": "Some full text query"
}
}
})
)
```
{% endtab %}
{% endtabs %}
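A filter such as `{"id": {"$eq": "document_one"}}` maps each key to an operator and a value. The toy matcher below models only `$eq`, the one operator used in these examples, and assumes that multiple keys combine with AND; it does not reproduce the SDK's full filter language, and `matches` and `filter_documents` are hypothetical names.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]
Filter = Dict[str, Dict[str, Any]]


def matches(doc: Document, flt: Filter) -> bool:
    """True if doc satisfies every {key: {"$eq": value}} clause (assumed AND)."""
    for key, condition in flt.items():
        for op, expected in condition.items():
            if op != "$eq":
                raise NotImplementedError(f"{op} is not modelled in this sketch")
            if key not in doc or doc[key] != expected:
                return False
    return True


def filter_documents(docs: List[Document], flt: Filter, limit: int) -> List[Document]:
    """Apply the filter first, then the limit."""
    return [d for d in docs if matches(d, flt)][:limit]


docs = [
    {"id": "document_one", "title": "Document One"},
    {"id": "document_two", "title": "Document Two"},
]
result = filter_documents(docs, {"id": {"$eq": "document_one"}}, limit=10)
print([d["id"] for d in result])  # ['document_one']
```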

### Sorting Documents

Documents can be sorted on any metadata key. Note that this does not currently work well with Keyset based pagination. If paginating and sorting, use Limit-Offset based pagination.
Documents can be sorted on any key. Note that sorting does not currently work well with keyset-based pagination; if you need to paginate and sort, use limit-offset pagination.
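To see why sorting and keyset pagination can clash, consider a naive keyset scheme that filters on `row_id` and then sorts by another key: once rows are reordered, the last `row_id` on a page no longer marks a clean boundary, so rows get skipped. The model below is a hypothetical illustration of that failure mode, not a description of how the SDK itself behaves; in this model, limit-offset paging over the same sort order visits every row. The `key` parameter stands in for whatever sort field is chosen and is not the SDK's `order_by` syntax.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]

rows: List[Row] = [
    {"row_id": 1, "title": "Charlie"},
    {"row_id": 2, "title": "Alpha"},
    {"row_id": 3, "title": "Bravo"},
    {"row_id": 4, "title": "Delta"},
]


def walk_limit_offset(rows: List[Row], key: str, limit: int) -> List[str]:
    """Sort once by `key`, then page by offset: every row is visited."""
    ordered = sorted(rows, key=lambda r: r[key])
    seen: List[str] = []
    offset = 0
    while True:
        page = ordered[offset:offset + limit]
        if not page:
            break
        seen.extend(r[key] for r in page)
        offset += limit
    return seen


def walk_keyset_naive(rows: List[Row], key: str, limit: int) -> List[str]:
    """Filter on row_id > last_row_id, then sort by `key`: rows can be skipped."""
    seen: List[str] = []
    last_row_id: Optional[int] = None
    while True:
        candidates = [r for r in rows if last_row_id is None or r["row_id"] > last_row_id]
        page = sorted(candidates, key=lambda r: r[key])[:limit]
        if not page:
            break
        seen.extend(r[key] for r in page)
        last_row_id = page[-1]["row_id"]  # strictly increases, so the loop ends
    return seen


print(walk_limit_offset(rows, "title", 2))  # ['Alpha', 'Bravo', 'Charlie', 'Delta']
print(walk_keyset_naive(rows, "title", 2))  # ['Alpha', 'Bravo', 'Delta']
```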

{% tabs %}
{% tab title="JavaScript" %}
```javascript
const collection = pgml.newCollection("test_collection")
const documents = await collection.get_documents({
limit: 100,
offset: 10,
@@ -299,7 +283,6 @@ const documents = await collection.get_documents({

{% tab title="Python" %}
```python
collection = Collection("test_collection")
documents = await collection.get_documents({
"limit": 100,
"offset": 10,
@@ -315,39 +298,24 @@ documents = await collection.get_documents({

Documents can be deleted with the `delete_documents` method on the collection object.

Metadata and full text filtering are supported just like they are in vector recall.

{% tabs %}
{% tab title="JavaScript" %}
```javascript
const collection = pgml.newCollection("test_collection")
const documents = await collection.delete_documents({
metadata: {
id: {
$eq: 1
}
},
full_text_search: {
configuration: "english",
text: "Some full text query"
}
})
```
{% endtab %}

{% tab title="Python" %}
```python
documents = await collection.delete_documents({
"metadata": {
"id": {
"$eq": 1
}
},
"full_text_search": {
"configuration": "english",
"text": "Some full text query"
documents = await collection.delete_documents(
{
"id": {"$eq": 1},
}
})
)
```
{% endtab %}
{% endtabs %}
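Deletion appears to take a filter in the same shape as the one passed to `get_documents`. The toy model below (hypothetical names, `$eq` only, an in-memory dict instead of a database) shows the effect. In this model a filter value must equal the stored value exactly, so with string ids like `"document_one"` a numeric filter such as `{"$eq": 1}` matches nothing.

```python
from typing import Any, Dict

Document = Dict[str, Any]
Filter = Dict[str, Dict[str, Any]]


def matches(doc: Document, flt: Filter) -> bool:
    """True if doc satisfies every {key: {"$eq": value}} clause (only $eq modelled)."""
    return all(key in doc and doc[key] == cond["$eq"] for key, cond in flt.items())


def delete_documents(store: Dict[Any, Document], flt: Filter) -> int:
    """Remove every stored document matching flt; return how many were removed."""
    doomed = [doc_id for doc_id, doc in store.items() if matches(doc, flt)]
    for doc_id in doomed:  # collect first: never mutate a dict while iterating it
        del store[doc_id]
    return len(doomed)


store: Dict[Any, Document] = {
    "document_one": {"id": "document_one", "title": "Document One"},
    "document_two": {"id": "document_two", "title": "Document Two"},
}
print(delete_documents(store, {"id": {"$eq": 1}}))               # 0
print(delete_documents(store, {"id": {"$eq": "document_one"}}))  # 1
print(sorted(store))                                             # ['document_two']
```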