Sql extension docs + fixes #1427

Merged: 5 commits, Apr 29, 2024
Binary file added pgml-cms/docs/.gitbook/assets/architecture_1.png
Binary file added pgml-cms/docs/.gitbook/assets/architecture_2.png
Binary file added pgml-cms/docs/.gitbook/assets/architecture_3.png
Binary file added pgml-cms/docs/.gitbook/assets/performance_1.png
Binary file added pgml-cms/docs/.gitbook/assets/performance_2.png
10 changes: 6 additions & 4 deletions pgml-cms/docs/README.md
@@ -21,16 +21,18 @@ PostgresML allows you to take advantage of the fundamental relationship between

<figure><img src=".gitbook/assets/ml_system.svg" alt="Machine Learning Infrastructure (2.0) by a16z"><figcaption class="mt-2"><p>PostgresML handles all of the functions <a href="https://a16z.com/emerging-architectures-for-modern-data-infrastructure/">described by a16z</a></p></figcaption></figure>

These capabilities are primarily provided by two open-source software projects that may be used independently, but are designed to be used with the rest of the Postgres ecosystem:
These capabilities are primarily provided by two open-source software projects that may be used independently, but are designed to be used together with the rest of the Postgres ecosystem:

* **pgml** - an open source extension for PostgreSQL. It adds support for GPUs and the latest ML & AI algorithms _inside_ the database with a SQL API and no additional infrastructure, networking latency, or reliability costs
* **PgCat** - an open source pooler for PostgreSQL. It abstracts the scalability and reliability concerns of managing a distributed cluster of Postgres databases. Client applications connect only to the pooler, which handles load balancing, sharding, and failover, outside of any single database server.
* [**pgml**](/docs/api/sql-extension/) - an open source extension for PostgreSQL. It adds support for GPUs and the latest ML & AI algorithms _inside_ the database with a SQL API and no additional infrastructure, networking latency, or reliability costs.
* [**PgCat**](/docs/product/pgcat/) - an open source connection pooler for PostgreSQL. It abstracts the scalability and reliability concerns of managing a distributed cluster of Postgres databases. Client applications connect only to the pooler, which handles load balancing, sharding, and failover, outside of any single database server.

<figure><img src=".gitbook/assets/architecture.png" alt="PostgresML architectural diagram"><figcaption></figcaption></figure>

To learn more about how we designed PostgresML, take a look at our [architecture overview](/docs/resources/architecture/).

## Client SDK

The PostgresML team also provides [native language SDKs](https://github.com/postgresml/postgresml/tree/master/pgml-sdks/pgml) which implement best practices for common ML & AI applications. The JavaScript and Python SDKs are generated from a core Rust library, which provides a uniform API, correctness and efficiency across all environments.
The PostgresML team also provides [native language SDKs](/docs/api/client-sdk/) which implement best practices for common ML & AI applications. The JavaScript and Python SDKs are generated from a core Rust library, which provides a uniform API, correctness and efficiency across all environments.

While using the SDK is completely optional, SDK clients can perform advanced machine learning tasks in a single SQL request, without having to transfer additional data, models, hardware or dependencies to the client application.

26 changes: 13 additions & 13 deletions pgml-cms/docs/SUMMARY.md
@@ -16,8 +16,18 @@

* [Overview](api/apis.md)
* [SQL extension](api/sql-extension/README.md)
* [pgml.deploy()](api/sql-extension/pgml.deploy.md)
* [pgml.embed()](api/sql-extension/pgml.embed.md)
* [pgml.transform()](api/sql-extension/pgml.transform/README.md)
* [Fill Mask](api/sql-extension/pgml.transform/fill-mask.md)
* [Question Answering](api/sql-extension/pgml.transform/question-answering.md)
* [Summarization](api/sql-extension/pgml.transform/summarization.md)
* [Text Classification](api/sql-extension/pgml.transform/text-classification.md)
* [Text Generation](api/sql-extension/pgml.transform/text-generation.md)
* [Text-to-Text Generation](api/sql-extension/pgml.transform/text-to-text-generation.md)
* [Token Classification](api/sql-extension/pgml.transform/token-classification.md)
* [Translation](api/sql-extension/pgml.transform/translation.md)
* [Zero-shot Classification](api/sql-extension/pgml.transform/zero-shot-classification.md)
* [pgml.deploy()](api/sql-extension/pgml.deploy.md)
* [pgml.chunk()](api/sql-extension/pgml.chunk.md)
* [pgml.generate()](api/sql-extension/pgml.generate.md)
* [pgml.predict()](api/sql-extension/pgml.predict/README.md)
@@ -29,16 +39,6 @@
* [Data Pre-processing](api/sql-extension/pgml.train/data-pre-processing.md)
* [Hyperparameter Search](api/sql-extension/pgml.train/hyperparameter-search.md)
* [Joint Optimization](api/sql-extension/pgml.train/joint-optimization.md)
* [pgml.transform()](api/sql-extension/pgml.transform/README.md)
* [Fill Mask](api/sql-extension/pgml.transform/fill-mask.md)
* [Question Answering](api/sql-extension/pgml.transform/question-answering.md)
* [Summarization](api/sql-extension/pgml.transform/summarization.md)
* [Text Classification](api/sql-extension/pgml.transform/text-classification.md)
* [Text Generation](api/sql-extension/pgml.transform/text-generation.md)
* [Text-to-Text Generation](api/sql-extension/pgml.transform/text-to-text-generation.md)
* [Token Classification](api/sql-extension/pgml.transform/token-classification.md)
* [Translation](api/sql-extension/pgml.transform/translation.md)
* [Zero-shot Classification](api/sql-extension/pgml.transform/zero-shot-classification.md)
* [pgml.tune()](api/sql-extension/pgml.tune.md)
* [Client SDK](api/client-sdk/README.md)
* [Collections](api/client-sdk/collections.md)
@@ -79,6 +79,8 @@

## Resources

* [Architecture](resources/architecture/README.md)
* [Why PostgresML?](resources/architecture/why-postgresml.md)
* [FAQs](resources/faqs.md)
* [Data Storage & Retrieval](resources/data-storage-and-retrieval/tabular-data.md)
* [Tabular data](resources/data-storage-and-retrieval/tabular-data.md)
@@ -97,8 +99,6 @@
* [Contributing](resources/developer-docs/contributing.md)
* [Distributed Training](resources/developer-docs/distributed-training.md)
* [GPU Support](resources/developer-docs/gpu-support.md)
* [Deploying PostgresML](resources/developer-docs/deploying-postgresml/README.md)
* [Monitoring](resources/developer-docs/deploying-postgresml/monitoring.md)
* [Self-hosting](resources/developer-docs/self-hosting/README.md)
* [Pooler](resources/developer-docs/self-hosting/pooler.md)
* [Building from source](resources/developer-docs/self-hosting/building-from-source.md)
103 changes: 77 additions & 26 deletions pgml-cms/docs/api/sql-extension/pgml.embed.md
@@ -6,48 +6,99 @@ description: >-

# pgml.embed()

Embeddings are a numeric representation of text. They are used to represent words and sentences as vectors, an array of numbers. Embeddings can be used to find similar pieces of text, by comparing the similarity of the numeric vectors using a distance measure, or they can be used as input features for other machine learning models, since most algorithms can't use text directly.

Many pretrained LLMs can be used to generate embeddings from text within PostgresML. You can browse all the [models](https://huggingface.co/models?library=sentence-transformers) available to find the best solution on Hugging Face.
The `pgml.embed()` function generates [embeddings](/docs/use-cases/embeddings/) from text, using in-database models downloaded from Hugging Face. Thousands of [open-source models](https://huggingface.co/models?library=sentence-transformers) are available and new and better ones are being published regularly.

## API

```sql
pgml.embed(
    transformer TEXT, -- huggingface sentence-transformer name
    text TEXT, -- input to embed
    kwargs JSON -- optional arguments (see below)
    transformer TEXT,
    "text" TEXT,
    kwargs JSON
)
```

## Example
| Argument | Description | Example |
|----------|-------------|---------|
| transformer | The name of a Hugging Face embedding model. | `intfloat/e5-large-v2` |
| text | The text to embed. This can be a string or the name of a column from a PostgreSQL table. | `'I am your father, Luke'` |
| kwargs | Additional arguments that are passed to the model. | |

Let's use the `pgml.embed` function to generate embeddings for tweets, so we can find similar ones. We will use the `distilbert-base-uncased` model from :hugging: HuggingFace. This model is a small version of the `bert-base-uncased` model. It is a good choice for short texts like tweets. To start, we'll load a dataset that provides tweets classified into different topics.
### Examples

```sql
SELECT pgml.load_dataset('tweet_eval', 'sentiment');
```
#### Generate embeddings from text

View some tweets and their topics.
Creating an embedding from text is as simple as calling the function with the text you want to embed:

```sql
SELECT *
FROM pgml.tweet_eval
LIMIT 10;
{% tabs %}
{% tab title="SQL" %}

```postgresql
SELECT * FROM pgml.embed(
    'intfloat/e5-small',
    'No, that''s not true, that''s impossible.'
) AS star_wars_embedding;
```

Get a preview of the embeddings for the first 10 tweets. This will also download the model and cache it for reuse, since it's the first time we've used it.
{% endtab %}
{% endtabs %}
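
The width of the embedding depends on the model: `intfloat/e5-small` produces 384-dimensional vectors, which is why the table in the next example declares its column as `vector(384)`. A quick way to check, sketched here under the assumption that the `pgvector` extension (which provides `vector_dims()`) is installed:

```postgresql
-- Cast the returned array to a vector and inspect its dimensionality.
SELECT vector_dims(
    pgml.embed('intfloat/e5-small', 'How wide is this embedding?')::vector
);
```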

```sql
SELECT text, pgml.embed('distilbert-base-uncased', text)
FROM pgml.tweet_eval
LIMIT 10;
#### Generate embeddings from a table

SQL functions can be used as part of a query to insert, update, or even automatically generate column values of any table:

{% tabs %}
{% tab title="SQL" %}

```postgresql
CREATE TABLE star_wars_quotes (
    quote TEXT NOT NULL,
    embedding vector(384) GENERATED ALWAYS AS (
        pgml.embed('intfloat/e5-small', quote)
    ) STORED
);

INSERT INTO
    star_wars_quotes (quote)
VALUES
    ('I find your lack of faith disturbing'),
    ('I''ve got a bad feeling about this.'),
    ('Do or do not, there is no try.');
```

It will take a few minutes to generate the embeddings for the entire dataset. We'll save the results to a new table.
{% endtab %}
{% endtabs %}

```sql
CREATE TABLE tweet_embeddings AS
SELECT text, pgml.embed('distilbert-base-uncased', text) AS embedding
FROM pgml.tweet_eval;
In this example, we're using [generated columns](https://www.postgresql.org/docs/current/ddl-generated-columns.html) to automatically create an embedding of the `quote` column whenever a quote is inserted or updated.
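
If you'd rather manage the column yourself than rely on a generated column, the same call works in a plain `UPDATE`. A minimal sketch, reusing the table above and a hypothetical `embedding_manual` column:

```postgresql
-- Illustrative alternative to the generated column: backfill embeddings by hand.
ALTER TABLE star_wars_quotes ADD COLUMN embedding_manual vector(384);

UPDATE star_wars_quotes
SET embedding_manual = pgml.embed('intfloat/e5-small', quote)::vector;
```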

#### Using embeddings in queries

Once you have embeddings, you can use them in queries to find text with similar semantic meaning:

{% tabs %}
{% tab title="SQL" %}

```postgresql
SELECT
    quote
FROM
    star_wars_quotes
ORDER BY
    pgml.embed(
        'intfloat/e5-small',
        'Feel the force!'
    )::vector <=> embedding
LIMIT 1;
```

{% endtab %}
{% endtabs %}

This query returns the quote whose meaning is closest to `'Feel the force!'`: it generates an embedding of that phrase and compares it against every stored embedding with the cosine distance operator (`<=>`), ordering by the smallest distance.
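
A sequential scan is fine for a handful of rows, but larger tables usually benefit from an approximate nearest-neighbor index. A sketch, assuming the `pgvector` extension (which supplies the `vector` type and `<=>` operator used above, version 0.5.0 or newer for HNSW) is installed:

```postgresql
-- HNSW index using cosine distance, matching the <=> ordering in the query above.
CREATE INDEX ON star_wars_quotes USING hnsw (embedding vector_cosine_ops);
```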

## Performance

The first time `pgml.embed()` is called with a new model, the model is downloaded from Hugging Face and saved in the cache directory. Subsequent calls use the cached model, which is much faster, and if the database connection is kept open, the model stays loaded in memory and is reused across queries.

If a GPU is available, the model will be automatically loaded onto the GPU and the embedding generation will be even faster.
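
A quick way to see the caching behavior is to time the same call twice on one connection. The sketch below uses `psql`'s `\timing` meta-command; the exact numbers depend on your hardware and network:

```postgresql
\timing on

-- First call with a new model: downloads (if necessary) and loads it, so it is slow.
SELECT pgml.embed('intfloat/e5-small', 'warm up the model');

-- Second call on the same connection: the model is already in memory, so it returns quickly.
SELECT pgml.embed('intfloat/e5-small', 'reuse the cached model');
```
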
125 changes: 111 additions & 14 deletions pgml-cms/docs/api/sql-extension/pgml.transform/README.md
@@ -17,37 +17,134 @@ layout:

# pgml.transform()

PostgresML integrates [🤗 Hugging Face Transformers](https://huggingface.co/transformers) to bring state-of-the-art models into the data layer. There are tens of thousands of pre-trained models with pipelines to turn raw inputs into useful results. Many state of the art deep learning architectures have been published and made available for download. You will want to browse all the [models](https://huggingface.co/models) available to find the perfect solution for your [dataset](https://huggingface.co/dataset) and [task](https://huggingface.co/tasks).
The `pgml.transform()` function is the most powerful feature of PostgresML. It integrates open-source large language models, like Llama, Mixtral, and many more, allowing you to perform complex tasks on your data.

We'll demonstrate some of the tasks that are immediately available to users of your database upon installation: [translation](https://github.com/postgresml/postgresml/blob/v2.7.12/pgml-dashboard/content/docs/guides/transformers/pre\_trained\_models.md#translation), [sentiment analysis](https://github.com/postgresml/postgresml/blob/v2.7.12/pgml-dashboard/content/docs/guides/transformers/pre\_trained\_models.md#sentiment-analysis), [summarization](https://github.com/postgresml/postgresml/blob/v2.7.12/pgml-dashboard/content/docs/guides/transformers/pre\_trained\_models.md#summarization), [question answering](https://github.com/postgresml/postgresml/blob/v2.7.12/pgml-dashboard/content/docs/guides/transformers/pre\_trained\_models.md#question-answering) and [text generation](https://github.com/postgresml/postgresml/blob/v2.7.12/pgml-dashboard/content/docs/guides/transformers/pre\_trained\_models.md#text-generation).
The models are downloaded from [🤗 Hugging Face](https://huggingface.co/transformers), which hosts tens of thousands of pre-trained and fine-tuned models for tasks like text generation, question answering, summarization, text classification, and more.

### Examples
## API

All of the tasks and models demonstrated here can be customized by passing additional arguments to the `Pipeline` initializer or call. You'll find additional links to documentation in the examples below.
The `pgml.transform()` function comes in two flavors, task-based and model-based.

The Hugging Face [`Pipeline`](https://huggingface.co/docs/transformers/main\_classes/pipelines) API is exposed in Postgres via:
### Task-based API

```sql
The task-based API automatically chooses a model to use based on the task:

```postgresql
pgml.transform(
    task TEXT,
    args JSONB,
    inputs TEXT[]
)
```

| Argument | Description | Example |
|----------|-------------|---------|
| task | The name of a natural language processing task. | `text-generation` |
| args | Additional kwargs to pass to the pipeline. | `{"max_new_tokens": 50}` |
| inputs | Array of prompts to pass to the model for inference. | `['Once upon a time...']` |

#### Example

{% tabs %}
{% tab title="SQL" %}

```postgresql
SELECT *
FROM pgml.transform(
    task   => 'translation_en_to_fr',
    inputs => ARRAY['How do I say hello in French?']
);
```

{% endtab %}
{% endtabs %}
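
The `args` parameter works with the task-based form as well. A sketch, with illustrative argument values:

```postgresql
SELECT pgml.transform(
    task   => 'text-generation',
    args   => '{"max_new_tokens": 50}'::JSONB,
    inputs => ARRAY['Once upon a time,']
);
```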

### Model-based API

The model-based API requires the name of the model and the task, passed as a JSON object, which allows it to be more generic:

```postgresql
pgml.transform(
    task TEXT OR JSONB, -- task name or full pipeline initializer arguments
    call JSONB, -- additional call arguments alongside the inputs
    inputs TEXT[] OR BYTEA[] -- inputs for inference
    task JSONB,
    args JSONB,
    inputs TEXT[]
)
```

This is roughly equivalent to the following Python:
| Argument | Description | Example |
|----------|-------------|---------|
| task | Model configuration, including name and task. | `{"task": "text-generation", "model": "mistralai/Mixtral-8x7B-v0.1"}` |
| args | Additional kwargs to pass to the pipeline. | `{"max_new_tokens": 50}` |
| inputs | Array of prompts to pass to the model for inference. | `['Once upon a time...']` |

#### Example

{% tabs %}
{% tab title="SQL" %}

```postgresql
SELECT pgml.transform(
    task => '{
        "task": "text-generation",
        "model": "TheBloke/zephyr-7B-beta-GPTQ",
        "model_type": "mistral",
        "revision": "main"
    }'::JSONB,
    inputs => ARRAY['AI is going to change the world in the following ways:'],
    args => '{
        "max_new_tokens": 100
    }'::JSONB
);
```

{% endtab %}

{% tab title="Equivalent Python" %}

```python
import transformers

def transform(task, call, inputs):
    return transformers.pipeline(**task)(inputs, **call)

transform(
    {
        "task": "text-generation",
        "model": "TheBloke/zephyr-7B-beta-GPTQ",
        "model_type": "mistral",
        "revision": "main",
    },
    {"max_new_tokens": 100},
    ['AI is going to change the world in the following ways:']
)
```

Most pipelines operate on `TEXT[]` inputs, but some require binary `BYTEA[]` data like audio classifiers. `inputs` can be `SELECT`ed from tables in the database, or they may be passed in directly with the query. The output of this call is a `JSONB` structure that is task specific. See the [Postgres JSON](https://www.postgresql.org/docs/14/functions-json.html) reference for ways to process this output dynamically.
{% endtab %}
{% endtabs %}
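
The output of `pgml.transform()` is task-specific `JSONB`, so it can be unpacked with the standard Postgres JSON operators, and the inputs can come from a table rather than a literal. A sketch, assuming a hypothetical `reviews` table with a `body` column:

```postgresql
-- Classify rows from a table and pull the predicted label out of the JSONB result.
-- The reviews table and body column are illustrative assumptions.
SELECT
    body,
    pgml.transform(
        task   => 'text-classification',
        inputs => ARRAY[body]
    ) -> 0 ->> 'label' AS sentiment
FROM reviews
LIMIT 10;
```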


### Supported tasks

PostgresML currently supports most NLP tasks available on Hugging Face:

| Task | Name | Description |
|------|-------------|---------|
| [Fill mask](fill-mask) | `fill-mask` | Fill in the blank in a sentence. |
| [Question answering](question-answering) | `question-answering` | Answer a question based on a context. |
| [Summarization](summarization) | `summarization` | Summarize a long text. |
| [Text classification](text-classification) | `text-classification` | Classify a text as positive or negative. |
| [Text generation](text-generation) | `text-generation` | Generate text based on a prompt. |
| [Text-to-text generation](text-to-text-generation) | `text-to-text-generation` | Generate text based on an instruction in the prompt. |
| [Token classification](token-classification) | `token-classification` | Classify tokens in a text. |
| [Translation](translation) | `translation` | Translate text from one language to another. |
| [Zero-shot classification](zero-shot-classification) | `zero-shot-classification` | Classify a text without training data. |
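
As one concrete example from the table above, zero-shot classification takes its candidate labels through `args`. A sketch with illustrative labels and input:

```postgresql
SELECT pgml.transform(
    task   => 'zero-shot-classification',
    args   => '{"candidate_labels": ["technology", "sports", "politics"]}'::JSONB,
    inputs => ARRAY['PostgresML runs machine learning models inside the database.']
);
```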


## Performance

!!! tip
Much like `pgml.embed()`, the models used in `pgml.transform()` are downloaded from Hugging Face and cached locally. If the connection to the database is kept open, the model remains in memory, which allows for faster inference on subsequent calls. If you want to free up memory, you can close the connection.

Models will be downloaded and stored locally on disk after the first call. They are also cached per connection to improve repeated calls in a single session. To free that memory, you'll need to close your connection. You may want to establish dedicated credentials and connection pools via [pgcat](https://github.com/levkk/pgcat) or [pgbouncer](https://www.pgbouncer.org/) for larger models that have billions of parameters. You may also pass `{"cache": false}` in the JSON `call` args to prevent this behavior.
## Additional resources

!!!
- [Hugging Face datasets](https://huggingface.co/datasets)
- [Hugging Face tasks](https://huggingface.co/tasks)