diff --git a/pgml-cms/docs/SUMMARY.md b/pgml-cms/docs/SUMMARY.md index 84e656fcb..bfc9ef6a1 100644 --- a/pgml-cms/docs/SUMMARY.md +++ b/pgml-cms/docs/SUMMARY.md @@ -36,7 +36,7 @@ * [pgml.tune()](introduction/apis/sql-extensions/pgml.tune.md) * [Client SDKs](introduction/apis/client-sdks/README.md) * [Overview](introduction/apis/client-sdks/getting-started.md) - * [Collections](../../pgml-docs/docs/guides/sdks/collections.md) + * [Collections](introduction/apis/client-sdks/collections.md) * [Pipelines](introduction/apis/client-sdks/pipelines.md) * [Search](introduction/apis/client-sdks/search.md) * [Tutorials](introduction/apis/client-sdks/tutorials/README.md) diff --git a/pgml-cms/docs/introduction/apis/sql-extensions/pgml.deploy.md b/pgml-cms/docs/introduction/apis/sql-extensions/pgml.deploy.md index e24dabf05..e5c52f793 100644 --- a/pgml-cms/docs/introduction/apis/sql-extensions/pgml.deploy.md +++ b/pgml-cms/docs/introduction/apis/sql-extensions/pgml.deploy.md @@ -26,11 +26,11 @@ pgml.deploy( There are 3 different deployment strategies available: -| Strategy | Description | -| ------------- | --------------------------------------------------------------------------------------------------------------------- | -| `most_recent` | The most recently trained model for this project is immediately deployed, regardless of metrics. | -| `best_score` | The model that achieved the best key metric score is immediately deployed. | -| `rollback` | The model that was last deployed for this project is immediately redeployed, overriding the currently deployed model. | +| Strategy | Description | +| ------------- |--------------------------------------------------------------------------------------------------| +| `most_recent` | The most recently trained model for this project is immediately deployed, regardless of metrics. | +| `best_score` | The model that achieved the best key metric score is immediately deployed. 
| +| `rollback` | The model that was deployed prior to the current one is redeployed. | The default deployment behavior allows any algorithm to qualify. It's automatically used during training, but can be manually executed as well: @@ -40,11 +40,12 @@ The default deployment behavior allows any algorithm to qualify. It's automatica #### SQL -```
-SELECT * FROM pgml.deploy(
-    'Handwritten Digit Image Classifier',
+```sql
+SELECT * FROM pgml.deploy(
+   'Handwritten Digit Image Classifier',
     strategy => 'best_score'
 );
-```
+``` #### Output @@ -121,3 +122,22 @@ SELECT * FROM pgml.deploy( Handwritten Digit Image Classifier | rollback | xgboost (1 row) ``` + +### Specific Model IDs + +If you need to deploy an exact model that is not the `most_recent` or `best_score`, you may deploy a model by id. Model IDs can be found in the `pgml.models` table. + +#### SQL + +```sql +SELECT * FROM pgml.deploy(12); +``` + +#### Output + +```sql + project | strategy | algorithm +------------------------------------+----------+----------- + Handwritten Digit Image Classifier | specific | xgboost +(1 row) +``` diff --git a/pgml-cms/docs/introduction/apis/sql-extensions/pgml.train/data-pre-processing.md b/pgml-cms/docs/introduction/apis/sql-extensions/pgml.train/data-pre-processing.md index 8d4aeb222..3362c99bd 100644 --- a/pgml-cms/docs/introduction/apis/sql-extensions/pgml.train/data-pre-processing.md +++ b/pgml-cms/docs/introduction/apis/sql-extensions/pgml.train/data-pre-processing.md @@ -25,11 +25,11 @@ In this example: There are 3 steps to preprocessing data: -* [Encoding](data-pre-processing.md#ordinal-encoding) categorical values into quantitative values -* [Imputing](data-pre-processing.md#imputing-missing-values) NULL values to some quantitative value -* [Scaling](data-pre-processing.md#scaling-values) quantitative values across all variables to similar ranges +* [Encoding](../../../../../../pgml-dashboard/content/docs/training/preprocessing.md#categorical-encodings) categorical values into quantitative values +* [Imputing](../../../../../../pgml-dashboard/content/docs/training/preprocessing.md#imputing-missing-values) NULL values to some quantitative value +* [Scaling](../../../../../../pgml-dashboard/content/docs/training/preprocessing.md#scaling-values) quantitative values across all variables to similar ranges -These preprocessing steps may be specified on a per-column basis to the [train()](./) function.
By default, PostgresML does minimal preprocessing on training data, and will raise an error during analysis if NULL values are encountered without a preprocessor. All types other than `TEXT` are treated as quantitative variables and cast to floating point representations before passing them to the underlying algorithm implementations. +These preprocessing steps may be specified on a per-column basis to the [train()](../../../../../../docs/training/overview/) function. By default, PostgresML does minimal preprocessing on training data, and will raise an error during analysis if NULL values are encountered without a preprocessor. All types other than `TEXT` are treated as quantitative variables and cast to floating point representations before passing them to the underlying algorithm implementations. ```sql SELECT pgml.train( diff --git a/pgml-cms/docs/resources/developer-docs/contributing.md b/pgml-cms/docs/resources/developer-docs/contributing.md index 38688dc26..3648acbe3 100644 --- a/pgml-cms/docs/resources/developer-docs/contributing.md +++ b/pgml-cms/docs/resources/developer-docs/contributing.md @@ -67,7 +67,7 @@ Once there, you can initialize `pgrx` and get going: #### Pgrx command line and environments ```commandline -cargo install cargo-pgrx --version "0.9.8" --locked && \ +cargo install cargo-pgrx --version "0.11.2" --locked && \ cargo pgrx init # This will take a few minutes ``` diff --git a/pgml-cms/docs/resources/developer-docs/installation.md b/pgml-cms/docs/resources/developer-docs/installation.md index 990cec5a8..119080bf2 100644 --- a/pgml-cms/docs/resources/developer-docs/installation.md +++ b/pgml-cms/docs/resources/developer-docs/installation.md @@ -36,7 +36,7 @@ brew bundle PostgresML is written in Rust, so you'll need to install the latest compiler from [rust-lang.org](https://rust-lang.org). 
Additionally, we use the Rust PostgreSQL extension framework `pgrx`, which requires some initialization steps: ```bash -cargo install cargo-pgrx --version 0.9.8 && \ +cargo install cargo-pgrx --version 0.11.2 && \ cargo pgrx init ``` @@ -63,8 +63,7 @@ To install the necessary Python packages into a virtual environment, use the `vi ```bash virtualenv pgml-venv && \ source pgml-venv/bin/activate && \ -pip install -r requirements.txt && \ -pip install -r requirements-xformers.txt --no-dependencies +pip install -r requirements.txt ``` {% endtab %} @@ -146,7 +145,7 @@ pgml_test=# SELECT pgml.version(); We like and use pgvector a lot, as documented in our blog posts and examples, to store and search embeddings. You can install pgvector from source pretty easily: ```bash -git clone --branch v0.4.4 https://github.com/pgvector/pgvector && \ +git clone --branch v0.5.0 https://github.com/pgvector/pgvector && \ cd pgvector && \ echo "trusted = true" >> vector.control && \ make && \ @@ -288,7 +287,7 @@ We use the `pgrx` Postgres Rust extension framework, which comes with its own in ```bash cd pgml-extension && \ -cargo install cargo-pgrx --version 0.9.8 && \ +cargo install cargo-pgrx --version 0.11.2 && \ cargo pgrx init ``` diff --git a/pgml-docs/docs/guides/sdks/collections.md b/pgml-docs/docs/guides/sdks/collections.md deleted file mode 100644 index 2ebc415d5..000000000 --- a/pgml-docs/docs/guides/sdks/collections.md +++ /dev/null @@ -1,349 +0,0 @@ -# Collections - -Collections are the organizational building blocks of the SDK. They manage all documents and related chunks, embeddings, tsvectors, and pipelines. - -## Creating Collections - -By default, collections will read and write to the database specified by `DATABASE_URL` environment variable. 
- -### **Default `DATABASE_URL`** - -{% tabs %} -{% tab title="JavaScript" %} -```javascript -const collection = pgml.newCollection("test_collection") -``` -{% endtab %} - -{% tab title="Python" %} -```python -collection = Collection("test_collection") -``` -{% endtab %} -{% endtabs %} - -### **Custom DATABASE\_URL** - -Create a Collection that reads from a different database than that set by the environment variable `DATABASE_URL`. - -{% tabs %} -{% tab title="Javascript" %} -```javascript -const collection = pgml.newCollection("test_collection", CUSTOM_DATABASE_URL) -``` -{% endtab %} - -{% tab title="Python" %} -```python -collection = Collection("test_collection", CUSTOM_DATABASE_URL) -``` -{% endtab %} -{% endtabs %} - -## Upserting Documents - -Documents are dictionaries with two required keys: `id` and `text`. All other keys/value pairs are stored as metadata for the document. - -{% tabs %} -{% tab title="JavaScript" %} -```javascript -const documents = [ - { - id: "Document One", - text: "document one contents...", - random_key: "this will be metadata for the document", - }, - { - id: "Document Two", - text: "document two contents...", - random_key: "this will be metadata for the document", - }, -]; -await collection.upsert_documents(documents); -``` -{% endtab %} - -{% tab title="Python" %} -```python -documents = [ - { - "id": "Document 1", - "text": "Here are the contents of Document 1", - "random_key": "this will be metadata for the document" - }, - { - "id": "Document 2", - "text": "Here are the contents of Document 2", - "random_key": "this will be metadata for the document" - } -] -collection = Collection("test_collection") -await collection.upsert_documents(documents) -``` -{% endtab %} -{% endtabs %} - -Document metadata can be replaced by upserting the document without the `text` key. 
- -{% tabs %} -{% tab title="JavaScript" %} -```javascript -const documents = [ - { - id: "Document One", - random_key: "this will be NEW metadata for the document", - }, - { - id: "Document Two", - random_key: "this will be NEW metadata for the document", - }, -]; -await collection.upsert_documents(documents); -``` -{% endtab %} - -{% tab title="Python" %} -```python -documents = [ - { - "id": "Document 1", - "random_key": "this will be NEW metadata for the document" - }, - { - "id": "Document 2", - "random_key": "this will be NEW metadata for the document" - } -] -collection = Collection("test_collection") -await collection.upsert_documents(documents) -``` -{% endtab %} -{% endtabs %} - -Document metadata can be merged with new metadata by upserting the document without the `text` key and specifying the merge option. - -{% tabs %} -{% tab title="JavaScript" %} -```javascript -const documents = [ - { - id: "Document One", - text: "document one contents...", - }, - { - id: "Document Two", - text: "document two contents...", - }, -]; -await collection.upsert_documents(documents, { - metdata: { - merge: true - } -}); -``` -{% endtab %} - -{% tab title="Python" %} -```python -documents = [ - { - "id": "Document 1", - "random_key": "this will be NEW merged metadata for the document" - }, - { - "id": "Document 2", - "random_key": "this will be NEW merged metadata for the document" - } -] -collection = Collection("test_collection") -await collection.upsert_documents(documents, { - "metadata": { - "merge": True - } -}) -``` -{% endtab %} -{% endtabs %} - -## Getting Documents - -Documents can be retrieved using the `get_documents` method on the collection object. 
- -{% tabs %} -{% tab title="JavaScript" %} -```javascript -const collection = Collection("test_collection") -const documents = await collection.get_documents({limit: 100 }) -``` -{% endtab %} - -{% tab title="Python" %} -```python -collection = Collection("test_collection") -documents = await collection.get_documents({ "limit": 100 }) -``` -{% endtab %} -{% endtabs %} - -### Paginating Documents - -The SDK supports limit-offset pagination and keyset pagination. - -#### Limit-Offset Pagination - -{% tabs %} -{% tab title="JavaScript" %} -```javascript -const collection = pgml.newCollection("test_collection") -const documents = await collection.get_documents({ limit: 100, offset: 10 }) -``` -{% endtab %} - -{% tab title="Python" %} -```python -collection = Collection("test_collection") -documents = await collection.get_documents({ "limit": 100, "offset": 10 }) -``` -{% endtab %} -{% endtabs %} - -#### Keyset Pagination - -{% tabs %} -{% tab title="JavaScript" %} -```javascript -const collection = Collection("test_collection") -const documents = await collection.get_documents({ limit: 100, last_row_id: 10 }) -``` -{% endtab %} - -{% tab title="Python" %} -```python -collection = Collection("test_collection") -documents = await collection.get_documents({ "limit": 100, "last_row_id": 10 }) -``` -{% endtab %} -{% endtabs %} - -The `last_row_id` can be taken from the `row_id` field in the returned document's dictionary. - -### Filtering Documents - -Metadata and full text filtering are supported just like they are in vector recall. 
- -{% tabs %} -{% tab title="JavaScript" %} -```javascript -const collection = pgml.newCollection("test_collection") -const documents = await collection.get_documents({ - limit: 100, - offset: 10, - filter: { - metadata: { - id: { - $eq: 1 - } - }, - full_text_search: { - configuration: "english", - text: "Some full text query" - } - } -}) -``` -{% endtab %} - -{% tab title="Python" %} -```python -collection = Collection("test_collection") -documents = await collection.get_documents({ - "limit": 100, - "offset": 10, - "filter": { - "metadata": { - "id": { - "$eq": 1 - } - }, - "full_text_search": { - "configuration": "english", - "text": "Some full text query" - } - } -}) -``` -{% endtab %} -{% endtabs %} - -### Sorting Documents - -Documents can be sorted on any metadata key. Note that this does not currently work well with Keyset based pagination. If paginating and sorting, use Limit-Offset based pagination. - -{% tabs %} -{% tab title="JavaScript" %} -```javascript -const collection = pgml.newCollection("test_collection") -const documents = await collection.get_documents({ - limit: 100, - offset: 10, - order_by: { - id: "desc" - } -}) -``` -{% endtab %} - -{% tab title="Python" %} -```python -collection = Collection("test_collection") -documents = await collection.get_documents({ - "limit": 100, - "offset": 10, - "order_by": { - "id": "desc" - } -}) -``` -{% endtab %} -{% endtabs %} - -### Deleting Documents - -Documents can be deleted with the `delete_documents` method on the collection object. - -Metadata and full text filtering are supported just like they are in vector recall. 
- -{% tabs %} -{% tab title="JavaScript" %} -```javascript -const collection = pgml.newCollection("test_collection") -const documents = await collection.delete_documents({ - metadata: { - id: { - $eq: 1 - } - }, - full_text_search: { - configuration: "english", - text: "Some full text query" - } -}) -``` -{% endtab %} - -{% tab title="Python" %} -```python -documents = await collection.delete_documents({ - "metadata": { - "id": { - "$eq": 1 - } - }, - "full_text_search": { - "configuration": "english", - "text": "Some full text query" - } -}) -``` -{% endtab %} -{% endtabs %} pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy