From 2c353b10fce89610c658b839e15ca39ca99c7941 Mon Sep 17 00:00:00 2001 From: Montana Low Date: Tue, 16 Jul 2024 17:35:04 -0700 Subject: [PATCH 1/4] move guides under pgml --- ...s-with-open-source-models-in-postgresml.md | 2 +- ...rom-closed-to-open-source-ai-in-minutes.md | 2 +- ...mantic-search-in-postgres-in-15-minutes.md | 2 +- pgml-cms/docs/SUMMARY.md | 26 +++++++++---------- .../pgml}/guides/chatbots/README.md | 0 .../pgml}/guides/embeddings/README.md | 2 +- .../embeddings/dimensionality-reduction.md | 0 .../embeddings/in-database-generation.md | 0 .../guides/embeddings/indexing-w-pgvector.md | 0 .../guides/embeddings/proprietary-models.md | 0 .../re-ranking-nearest-neighbors.md | 0 .../guides/embeddings/vector-aggregation.md | 0 .../guides/embeddings/vector-normalization.md | 0 .../guides/embeddings/vector-similarity.md | 0 ...ve-search-results-with-machine-learning.md | 0 .../guides/natural-language-processing.md | 0 .../pgml}/guides/opensourceai.md | 0 .../pgml}/guides/supervised-learning.md | 0 .../pgml}/guides/unified-rag.md | 2 +- .../pgml}/guides/vector-database.md | 4 +-- pgml-cms/docs/summary_draft.md | 22 ++++++++-------- .../navigation/navbar/marketing/template.html | 10 +++---- .../sections/footers/marketing_footer/mod.rs | 10 +++---- 23 files changed, 41 insertions(+), 41 deletions(-) rename pgml-cms/docs/{ => open-source/pgml}/guides/chatbots/README.md (100%) rename pgml-cms/docs/{ => open-source/pgml}/guides/embeddings/README.md (93%) rename pgml-cms/docs/{ => open-source/pgml}/guides/embeddings/dimensionality-reduction.md (100%) rename pgml-cms/docs/{ => open-source/pgml}/guides/embeddings/in-database-generation.md (100%) rename pgml-cms/docs/{ => open-source/pgml}/guides/embeddings/indexing-w-pgvector.md (100%) rename pgml-cms/docs/{ => open-source/pgml}/guides/embeddings/proprietary-models.md (100%) rename pgml-cms/docs/{ => open-source/pgml}/guides/embeddings/re-ranking-nearest-neighbors.md (100%) rename pgml-cms/docs/{ => 
open-source/pgml}/guides/embeddings/vector-aggregation.md (100%) rename pgml-cms/docs/{ => open-source/pgml}/guides/embeddings/vector-normalization.md (100%) rename pgml-cms/docs/{ => open-source/pgml}/guides/embeddings/vector-similarity.md (100%) rename pgml-cms/docs/{ => open-source/pgml}/guides/improve-search-results-with-machine-learning.md (100%) rename pgml-cms/docs/{ => open-source/pgml}/guides/natural-language-processing.md (100%) rename pgml-cms/docs/{ => open-source/pgml}/guides/opensourceai.md (100%) rename pgml-cms/docs/{ => open-source/pgml}/guides/supervised-learning.md (100%) rename pgml-cms/docs/{ => open-source/pgml}/guides/unified-rag.md (99%) rename pgml-cms/docs/{ => open-source/pgml}/guides/vector-database.md (97%) diff --git a/pgml-cms/blog/generating-llm-embeddings-with-open-source-models-in-postgresml.md b/pgml-cms/blog/generating-llm-embeddings-with-open-source-models-in-postgresml.md index 664569814..6171b93b9 100644 --- a/pgml-cms/blog/generating-llm-embeddings-with-open-source-models-in-postgresml.md +++ b/pgml-cms/blog/generating-llm-embeddings-with-open-source-models-in-postgresml.md @@ -120,7 +120,7 @@ LIMIT 5; ## Generating embeddings from natural language text -PostgresML provides a simple interface to generate embeddings from text in your database. You can use the [`pgml.embed`](https://postgresml.org/docs/guides/transformers/embeddings) function to generate embeddings for a column of text. The function takes a transformer name and a text value. The transformer will automatically be downloaded and cached on your connection process for reuse. You can see a list of potential good candidate models to generate embeddings on the [Massive Text Embedding Benchmark leaderboard](https://huggingface.co/spaces/mteb/leaderboard). +PostgresML provides a simple interface to generate embeddings from text in your database. 
You can use the [`pgml.embed`](https://postgresml.org/docs/open-source/pgml/guides/transformers/embeddings) function to generate embeddings for a column of text. The function takes a transformer name and a text value. The transformer will automatically be downloaded and cached on your connection process for reuse. You can see a list of potential good candidate models to generate embeddings on the [Massive Text Embedding Benchmark leaderboard](https://huggingface.co/spaces/mteb/leaderboard). Since our corpus of documents (movie reviews) are all relatively short and similar in style, we don't need a large model. [`Alibaba-NLP/gte-base-en-v1.5`](https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5) will be a good first attempt. The great thing about PostgresML is you can always regenerate your embeddings later to experiment with different embedding models. diff --git a/pgml-cms/blog/introducing-the-openai-switch-kit-move-from-closed-to-open-source-ai-in-minutes.md b/pgml-cms/blog/introducing-the-openai-switch-kit-move-from-closed-to-open-source-ai-in-minutes.md index 8384b6fc8..196c4fb37 100644 --- a/pgml-cms/blog/introducing-the-openai-switch-kit-move-from-closed-to-open-source-ai-in-minutes.md +++ b/pgml-cms/blog/introducing-the-openai-switch-kit-move-from-closed-to-open-source-ai-in-minutes.md @@ -210,7 +210,7 @@ We have truncated the output to two items !!! -We also have asynchronous versions of the create and `create_stream` functions relatively named `create_async` and `create_stream_async`. Checkout [our documentation](https://postgresml.org/docs/guides/opensourceai) for a complete guide of the open-source AI SDK including guides on how to specify custom models. +We also have asynchronous versions of the create and `create_stream` functions relatively named `create_async` and `create_stream_async`. 
Checkout [our documentation](https://postgresml.org/docs/open-source/pgml/guides/opensourceai) for a complete guide of the open-source AI SDK including guides on how to specify custom models. PostgresML is free and open source. To run the above examples yourself [create an account](https://postgresml.org/signup), install korvus, and get running! diff --git a/pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md b/pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md index e638e4b47..34cc0ae1b 100644 --- a/pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md +++ b/pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md @@ -152,7 +152,7 @@ SELECT '[1,2,3]'::vector <=> '[2,3,4]'::vector; !!! -Other distance functions have similar formulas and provide convenient operators to use as well. It may be worth testing other operators and to see which performs better for your use case. For more information on the other distance functions, take a look at our [Embeddings guide](https://postgresml.org/docs/guides/embeddings/vector-similarity). +Other distance functions have similar formulas and provide convenient operators to use as well. It may be worth testing other operators and to see which performs better for your use case. For more information on the other distance functions, take a look at our [Embeddings guide](https://postgresml.org/docs/open-source/pgml/guides/embeddings/vector-similarity). 
Going back to our search example, we can compute the cosine distance between our query embedding and our documents: diff --git a/pgml-cms/docs/SUMMARY.md b/pgml-cms/docs/SUMMARY.md index b29645395..01a9704c1 100644 --- a/pgml-cms/docs/SUMMARY.md +++ b/pgml-cms/docs/SUMMARY.md @@ -72,20 +72,20 @@ ## Guides -* [Embeddings](guides/embeddings/README.md) - * [In-database Generation](guides/embeddings/in-database-generation.md) - * [Dimensionality Reduction](guides/embeddings/dimensionality-reduction.md) - * [Aggregation](guides/embeddings/vector-aggregation.md) - * [Similarity](guides/embeddings/vector-similarity.md) - * [Normalization](guides/embeddings/vector-normalization.md) -* [Search](guides/improve-search-results-with-machine-learning.md) -* [Chatbots](guides/chatbots/README.md) +* [Embeddings](open-source/pgml/guides/embeddings/README.md) + * [In-database Generation](open-source/pgml/guides/embeddings/in-database-generation.md) + * [Dimensionality Reduction](open-source/pgml/guides/embeddings/dimensionality-reduction.md) + * [Aggregation](open-source/pgml/guides/embeddings/vector-aggregation.md) + * [Similarity](open-source/pgml/guides/embeddings/vector-similarity.md) + * [Normalization](open-source/pgml/guides/embeddings/vector-normalization.md) +* [Search](open-source/pgml/guides/improve-search-results-with-machine-learning.md) +* [Chatbots](open-source/pgml/guides/chatbots/README.md) * [Example Application](use-cases/chatbots.md) -* [Supervised Learning](guides/supervised-learning.md) -* [Unified RAG](guides/unified-rag.md) -* [OpenSourceAI](guides/opensourceai.md) -* [Natural Language Processing](guides/natural-language-processing.md) -* [Vector database](guides/vector-database.md) +* [Supervised Learning](open-source/pgml/guides/supervised-learning.md) +* [Unified RAG](open-source/pgml/guides/unified-rag.md) +* [OpenSourceAI](open-source/pgml/guides/opensourceai.md) +* [Natural Language Processing](open-source/pgml/guides/natural-language-processing.md) +* 
[Vector database](open-source/pgml/guides/vector-database.md) ## Resources diff --git a/pgml-cms/docs/guides/chatbots/README.md b/pgml-cms/docs/open-source/pgml/guides/chatbots/README.md similarity index 100% rename from pgml-cms/docs/guides/chatbots/README.md rename to pgml-cms/docs/open-source/pgml/guides/chatbots/README.md diff --git a/pgml-cms/docs/guides/embeddings/README.md b/pgml-cms/docs/open-source/pgml/guides/embeddings/README.md similarity index 93% rename from pgml-cms/docs/guides/embeddings/README.md rename to pgml-cms/docs/open-source/pgml/guides/embeddings/README.md index 39557d79f..46b303ecd 100644 --- a/pgml-cms/docs/guides/embeddings/README.md +++ b/pgml-cms/docs/open-source/pgml/guides/embeddings/README.md @@ -39,7 +39,7 @@ Vectors can be stored in the native Postgres [`ARRAY[]`](https://www.postgresql. !!! warning -Other cloud providers claim to offer embeddings "inside the database", but [benchmarks](../../resources/benchmarks/mindsdb-vs-postgresml.md) show that they are orders of magnitude slower than PostgresML. The reason is they don't actually run inside the database with hardware acceleration. They are thin wrapper functions that make network calls to remote service providers. PostgresML is the only cloud that puts GPU hardware in the database for full acceleration, and it shows. +Other cloud providers claim to offer embeddings "inside the database", but [benchmarks](../../../../resources/benchmarks/mindsdb-vs-postgresml.md) show that they are orders of magnitude slower than PostgresML. The reason is they don't actually run inside the database with hardware acceleration. They are thin wrapper functions that make network calls to remote service providers. PostgresML is the only cloud that puts GPU hardware in the database for full acceleration, and it shows. !!! 
diff --git a/pgml-cms/docs/guides/embeddings/dimensionality-reduction.md b/pgml-cms/docs/open-source/pgml/guides/embeddings/dimensionality-reduction.md similarity index 100% rename from pgml-cms/docs/guides/embeddings/dimensionality-reduction.md rename to pgml-cms/docs/open-source/pgml/guides/embeddings/dimensionality-reduction.md diff --git a/pgml-cms/docs/guides/embeddings/in-database-generation.md b/pgml-cms/docs/open-source/pgml/guides/embeddings/in-database-generation.md similarity index 100% rename from pgml-cms/docs/guides/embeddings/in-database-generation.md rename to pgml-cms/docs/open-source/pgml/guides/embeddings/in-database-generation.md diff --git a/pgml-cms/docs/guides/embeddings/indexing-w-pgvector.md b/pgml-cms/docs/open-source/pgml/guides/embeddings/indexing-w-pgvector.md similarity index 100% rename from pgml-cms/docs/guides/embeddings/indexing-w-pgvector.md rename to pgml-cms/docs/open-source/pgml/guides/embeddings/indexing-w-pgvector.md diff --git a/pgml-cms/docs/guides/embeddings/proprietary-models.md b/pgml-cms/docs/open-source/pgml/guides/embeddings/proprietary-models.md similarity index 100% rename from pgml-cms/docs/guides/embeddings/proprietary-models.md rename to pgml-cms/docs/open-source/pgml/guides/embeddings/proprietary-models.md diff --git a/pgml-cms/docs/guides/embeddings/re-ranking-nearest-neighbors.md b/pgml-cms/docs/open-source/pgml/guides/embeddings/re-ranking-nearest-neighbors.md similarity index 100% rename from pgml-cms/docs/guides/embeddings/re-ranking-nearest-neighbors.md rename to pgml-cms/docs/open-source/pgml/guides/embeddings/re-ranking-nearest-neighbors.md diff --git a/pgml-cms/docs/guides/embeddings/vector-aggregation.md b/pgml-cms/docs/open-source/pgml/guides/embeddings/vector-aggregation.md similarity index 100% rename from pgml-cms/docs/guides/embeddings/vector-aggregation.md rename to pgml-cms/docs/open-source/pgml/guides/embeddings/vector-aggregation.md diff --git 
a/pgml-cms/docs/guides/embeddings/vector-normalization.md b/pgml-cms/docs/open-source/pgml/guides/embeddings/vector-normalization.md similarity index 100% rename from pgml-cms/docs/guides/embeddings/vector-normalization.md rename to pgml-cms/docs/open-source/pgml/guides/embeddings/vector-normalization.md diff --git a/pgml-cms/docs/guides/embeddings/vector-similarity.md b/pgml-cms/docs/open-source/pgml/guides/embeddings/vector-similarity.md similarity index 100% rename from pgml-cms/docs/guides/embeddings/vector-similarity.md rename to pgml-cms/docs/open-source/pgml/guides/embeddings/vector-similarity.md diff --git a/pgml-cms/docs/guides/improve-search-results-with-machine-learning.md b/pgml-cms/docs/open-source/pgml/guides/improve-search-results-with-machine-learning.md similarity index 100% rename from pgml-cms/docs/guides/improve-search-results-with-machine-learning.md rename to pgml-cms/docs/open-source/pgml/guides/improve-search-results-with-machine-learning.md diff --git a/pgml-cms/docs/guides/natural-language-processing.md b/pgml-cms/docs/open-source/pgml/guides/natural-language-processing.md similarity index 100% rename from pgml-cms/docs/guides/natural-language-processing.md rename to pgml-cms/docs/open-source/pgml/guides/natural-language-processing.md diff --git a/pgml-cms/docs/guides/opensourceai.md b/pgml-cms/docs/open-source/pgml/guides/opensourceai.md similarity index 100% rename from pgml-cms/docs/guides/opensourceai.md rename to pgml-cms/docs/open-source/pgml/guides/opensourceai.md diff --git a/pgml-cms/docs/guides/supervised-learning.md b/pgml-cms/docs/open-source/pgml/guides/supervised-learning.md similarity index 100% rename from pgml-cms/docs/guides/supervised-learning.md rename to pgml-cms/docs/open-source/pgml/guides/supervised-learning.md diff --git a/pgml-cms/docs/guides/unified-rag.md b/pgml-cms/docs/open-source/pgml/guides/unified-rag.md similarity index 99% rename from pgml-cms/docs/guides/unified-rag.md rename to 
pgml-cms/docs/open-source/pgml/guides/unified-rag.md index ee7e38941..cf37afba7 100644 --- a/pgml-cms/docs/guides/unified-rag.md +++ b/pgml-cms/docs/open-source/pgml/guides/unified-rag.md @@ -18,7 +18,7 @@ RAG has grown rapidly in popularity. It is not an esoteric practice run only by As quick reminder, the typical modern RAG workflow looks like this: -

Steps one through three prepare our RAG system, and steps four through eight are RAG itself.
+Steps one through three prepare our RAG system, and steps four through eight are RAG itself.

## Unified RAG diff --git a/pgml-cms/docs/guides/vector-database.md b/pgml-cms/docs/open-source/pgml/guides/vector-database.md similarity index 97% rename from pgml-cms/docs/guides/vector-database.md rename to pgml-cms/docs/open-source/pgml/guides/vector-database.md index a28d88218..17700a4d2 100644 --- a/pgml-cms/docs/guides/vector-database.md +++ b/pgml-cms/docs/open-source/pgml/guides/vector-database.md @@ -18,7 +18,7 @@ Vectors can be stored in columns, just like any other data type. To add a vector #### Adding a vector column -Using the example from [Tabular data](../resources/data-storage-and-retrieval/README.md), let's add a vector column to our USA House Prices table: +Using the example from [Tabular data](../../../resources/data-storage-and-retrieval/README.md), let's add a vector column to our USA House Prices table: {% tabs %} {% tab title="SQL" %} @@ -288,4 +288,4 @@ CREATE INDEX #### Maintaining an HNSW index -HNSW requires little to no maintenance. When new vectors are added, they are automatically inserted at the optimal place in the graph. However, as the graph gets bigger, rebalancing it becomes more expensive, and inserting new rows becomes slower. We address this trade-off and how to solve this problem in [Partitioning](../resources/data-storage-and-retrieval/partitioning.md). +HNSW requires little to no maintenance. When new vectors are added, they are automatically inserted at the optimal place in the graph. However, as the graph gets bigger, rebalancing it becomes more expensive, and inserting new rows becomes slower. We address this trade-off and how to solve this problem in [Partitioning](../../../resources/data-storage-and-retrieval/partitioning.md). 
diff --git a/pgml-cms/docs/summary_draft.md b/pgml-cms/docs/summary_draft.md index e207aa1be..6f18c1972 100644 --- a/pgml-cms/docs/summary_draft.md +++ b/pgml-cms/docs/summary_draft.md @@ -53,21 +53,21 @@ ## Guides -* [Embeddings](guides/embeddings/README.md) - * [In-database Generation](guides/embeddings/in-database-generation.md) - * [Dimensionality Reduction](guides/embeddings/dimensionality-reduction.md) - * [Aggregation](guides/embeddings/vector-aggregation.md) - * [Similarity](guides/embeddings/vector-similarity.md) - * [Normalization](guides/embeddings/vector-normalization.md) +* [Embeddings](open-source/pgml/guides/embeddings/README.md) + * [In-database Generation](open-source/pgml/guides/embeddings/in-database-generation.md) + * [Dimensionality Reduction](open-source/pgml/guides/embeddings/dimensionality-reduction.md) + * [Aggregation](open-source/pgml/guides/embeddings/vector-aggregation.md) + * [Similarity](open-source/pgml/guides/embeddings/vector-similarity.md) + * [Normalization](open-source/pgml/guides/embeddings/vector-normalization.md) -* [Search](guides/improve-search-results-with-machine-learning.md) -* [Chatbots](guides/chatbots/README.md) +* [Search](open-source/pgml/guides/improve-search-results-with-machine-learning.md) +* [Chatbots](open-source/pgml/guides/chatbots/README.md) * [Example Application](use-cases/chatbots.md) -* [Supervised Learning](guides/supervised-learning.md) -* [OpenSourceAI](guides/opensourceai.md) -* [Natural Language Processing](guides/natural-language-processing.md) +* [Supervised Learning](open-source/pgml/guides/supervised-learning.md) +* [OpenSourceAI](open-source/pgml/guides/opensourceai.md) +* [Natural Language Processing](open-source/pgml/guides/natural-language-processing.md) + * [Developers](open-source/pgml/developers/README.md) + * [Local Docker Development](open-source/pgml/developers/quick-start-with-docker.md) + * [Installation](open-source/pgml/developers/installation.md) + * 
[Contributing](open-source/pgml/developers/contributing.md) + * [Distributed Training](open-source/pgml/developers/distributed-training.md) + * [GPU Support](open-source/pgml/developers/gpu-support.md) + * [Self-hosting](open-source/pgml/developers/self-hosting/README.md) + * [Pooler](open-source/pgml/developers/self-hosting/pooler.md) + * [Building from source](open-source/pgml/developers/self-hosting/building-from-source.md) + * [Replication](open-source/pgml/developers/self-hosting/replication.md) + * [Backups](open-source/pgml/developers/self-hosting/backups.md) + * [Running on EC2](open-source/pgml/developers/self-hosting/running-on-ec2.md) * [Korvus](open-source/korvus/README.md) * [API](open-source/korvus/api/README.md) * [Collections](open-source/korvus/api/collections.md) @@ -69,48 +131,24 @@ * [Enterprise](cloud/enterprise/README.md) * [Teams](cloud/enterprise/teams.md) * [VPC](cloud/enterprise/vpc.md) +* [Privacy Policy](cloud/privacy-policy.md) +* [Terms of Service](cloud/terms-of-service.md) -## Guides - -* [Embeddings](open-source/pgml/guides/embeddings/README.md) - * [In-database Generation](open-source/pgml/guides/embeddings/in-database-generation.md) - * [Dimensionality Reduction](open-source/pgml/guides/embeddings/dimensionality-reduction.md) - * [Aggregation](open-source/pgml/guides/embeddings/vector-aggregation.md) - * [Similarity](open-source/pgml/guides/embeddings/vector-similarity.md) - * [Normalization](open-source/pgml/guides/embeddings/vector-normalization.md) -* [Search](open-source/pgml/guides/improve-search-results-with-machine-learning.md) -* [Chatbots](open-source/pgml/guides/chatbots/README.md) - * [Example Application](use-cases/chatbots.md) -* [Supervised Learning](open-source/pgml/guides/supervised-learning.md) -* [Unified RAG](open-source/pgml/guides/unified-rag.md) -* [OpenSourceAI](open-source/pgml/guides/opensourceai.md) -* [Natural Language Processing](open-source/pgml/guides/natural-language-processing.md) -* [Vector 
database](open-source/pgml/guides/vector-database.md) - -## Resources + diff --git a/pgml-cms/docs/resources/architecture/README.md b/pgml-cms/docs/TODO/architecture/README.md similarity index 100% rename from pgml-cms/docs/resources/architecture/README.md rename to pgml-cms/docs/TODO/architecture/README.md diff --git a/pgml-cms/docs/resources/architecture/why-postgresml.md b/pgml-cms/docs/TODO/architecture/why-postgresml.md similarity index 100% rename from pgml-cms/docs/resources/architecture/why-postgresml.md rename to pgml-cms/docs/TODO/architecture/why-postgresml.md diff --git a/pgml-cms/docs/use-cases/chatbots.md b/pgml-cms/docs/TODO/chatbots.md similarity index 100% rename from pgml-cms/docs/use-cases/chatbots.md rename to pgml-cms/docs/TODO/chatbots.md diff --git a/pgml-cms/docs/resources/benchmarks/ggml-quantized-llm-support-for-huggingface-transformers.md b/pgml-cms/docs/TODO/ggml-quantized-llm-support-for-huggingface-transformers.md similarity index 100% rename from pgml-cms/docs/resources/benchmarks/ggml-quantized-llm-support-for-huggingface-transformers.md rename to pgml-cms/docs/TODO/ggml-quantized-llm-support-for-huggingface-transformers.md diff --git a/pgml-cms/docs/cloud/privacy-policy.md b/pgml-cms/docs/cloud/privacy-policy.md new file mode 100644 index 000000000..82e718522 --- /dev/null +++ b/pgml-cms/docs/cloud/privacy-policy.md @@ -0,0 +1,132 @@ +# Privacy Policy + +Effective Date: 7/16/2024 + +This privacy policy (“Policy”) describes how Hyperparam Inc. (“Company”, “PostgresML”, “we”, “us”) collects, uses, and shares personal information of consumer users of this website, https://postgresml.org (the “Site”), as well as associated products and services (together, the “Services”), and applies to personal information that we collect through the Site and our Services as well as personal information you provide to us directly. This Policy also applies to any of our other websites that post this Policy. 
Please note that by using the Site or the Services, you accept the practices and policies described in this Policy and you consent that we will collect, use, and share your personal information as described below. If you do not agree to this Policy, please do not use the Site or the Services. + +## Personal Information We Collect + +We collect personal information about you in a number of different ways: +**Personal Information Collected From You.** When you use the Site or our Services, we collect personal information that you provide to us, which may include the following categories of personal information depending on how you use the Site or our Services and communicate with us: +- **General identifiers**, such as your full name, home or work address, zip code, telephone number, email address, job title and organizational affiliation. +- **Online identifiers**, such as your username and passwords for any of our Sites, or information we automatically collect through cookies and similar technologies used on our websites. +- **Commercial information**, such as your billing and payment history, and any records of personal property that we collect in connection with providing our Services to you. We also collect information about your preferences regarding marketing communications. +- **Protected classification characteristics**, such as any information that you choose to provide to us or that we collect in connection with providing our Services to you, including age, race, color, ancestry, national origin, citizenship, religion or creed, marital status, medical condition, physical or mental disability, sex, sexual orientation, veteran or military status or genetic information. +- **Audio, electronic, and visual information** that we collect in connection with providing our Services to you, such as video or audio recordings of conversations made with your consent. 
+- **Professional or employment-related information** that we collect in connection with providing our Services to you, such as your job title, employer information and work history. +- **Other information you provide to us**. + +**Personal Information We Get From Others.** We may collect personal information about you from other sources. We may add this to information we collect from the Site and through our Services. + +**Information We Collect Automatically.** We automatically log information about you and your computer, phone, tablet, or other devices you use to access the Site and the Services. For example, when visiting our Site or using the Services, we may log your computer or device identification, operating system type, browser type, screen resolution, browser language, internet protocol (IP) address, unique identifier, general location such as city, state or geographic area, the website you visited before browsing to our Site, pages you viewed, how long you spent on a page, access times and information about your use of and actions on our Site or Services. How much of this information we collect depends on the type and settings of the device you use to access the Site and Services. + +**Cookies.** We may log information using “cookies.” Cookies are small data files stored on your hard drive by a website. We may use both session Cookies (which expire once you close your web browser) and persistent Cookies (which stay on your computer until you delete them) to provide you with a more personal and interactive experience on our Site. Other similar tools we may use to collect information by automated means include web server logs, web beacons and pixel tags. This type of information is collected to make the Site and Services more useful to you and to tailor the experience with us to meet your interests and needs. + +**Google Analytics.** We may use Google Analytics to help analyze how users use the Site. 
Google Analytics uses Cookies to collect information such as how often users visit the Site, what pages they visit, and what other sites they used prior to coming to the Site. We use the information we get from Google Analytics only to improve our Site and the Services. Although Google Analytics plants a persistent Cookie on your web browser to identify you as a unique user the next time you visit the Site, the Cookie cannot be used by anyone but Google. Google’s ability to use and share information collected by Google Analytics about your visits to the Site is restricted by the Google Analytics Terms of Use and the Google Privacy Policy. + +**Session Replay Technology.** We use session replay technology, such as Hotjar, Inc., to collect information regarding visitor behavior on the Site and the Services. Hotjar is a full-session replay product that helps us see clearly what actions our Site visitors take and where they might get stuck or confused. Hotjar’s service allows us to record and replay an individual’s interaction with the Site and the Services. This helps us to understand our customer’s experience, where they might get stuck, and how we can improve the Site and the Services. You can review Hotjar’s privacy policy by visiting https://www.hotjar.com/legal/policies/privacy/. + +**Additional Information.** If you choose to interact on the Site or through the Services (such as by registering; using our Services; entering into agreements with us; or requesting information from us), we will collect the personal information that you provide. We may collect personal information about you that you provide through telephone, email, or other communications. If you provide us with personal information regarding another individual, please do not do so unless you have that person’s consent to give us their personal information. 
+ +## How We Use Your Personal Information + +Generally, we may use your personal information in the following ways and as otherwise described in this Privacy Policy or to you at the time we collect the personal information from you: + +**To Provide the Services and Personalize Your Experience.** We use personal information about you to provide the Services to you, including: + +- To help establish and verify your identity; +- For the purposes for which you specifically provided it to us, including, without limitation, to enable us to process and fulfill your requests or provide the Services to you; +- To provide you with effective customer service; +- To provide you with a personalized experience when you use the Site or the Services or by delivering relevant Site or Services content; +- To send you information about your relationship or transactions with us; +- To otherwise contact you with information that we believe will be of interest to you, including marketing and promotional communications; and +- To enhance or develop features, products or services. + +**Research and development.** We may use your personal information for research and development purposes, including to analyze and improve the Services, our Sites and our business. As part of these activities, we may create aggregated, de-identified or other anonymous data from personal information we collect. We make personal information into anonymous data by removing information that makes the data personally identifiable to you. We may use this anonymous data and share it with third parties for our lawful business purposes. + +**Marketing.** We may use your personal information in connection with sending you marketing communications as permitted by law, including by mail and email. You may opt out of marketing communications by following the unsubscribe instructions at the bottom of our marketing communications or by emailing us at contact@postgresml.org. 
+ +**Compliance and protection.** We may use any of the categories of personal information described above to: + +- Comply with applicable laws, lawful requests, and legal process, such as to respond to subpoenas or requests from government authorities. +- Protect our, your and others’ rights, privacy, safety and property (including by making and defending legal claims). +- Audit our internal processes for compliance with legal and contractual requirements and internal policies. +- Enforce the terms and conditions that govern the Site and our Services. +- Prevent, identify, investigate and deter fraudulent, harmful, unauthorized, unethical or illegal activity, including cyberattacks and identity theft. + +We may also use your personal information for other purposes consistent with this Privacy Policy or that are explained to you at the time of collection of your personal information. + +## How We Share Your Personal Information + +We may disclose all categories of personal information described above with the following categories of third parties: + +**Affiliates.** We may share your personal information with our affiliates, for purposes consistent with this notice or that operate shared infrastructure, systems and technology. + +**Third Party Service Providers.** We may provide your personal information to third party service providers that help us provide you with the Services that we offer through the Site or otherwise, and to operate our business. + +**Professional Advisors.** We may provide your personal information to our lawyers, accountants, bankers and other outside professional advisors in the course of the services they provide to us. + +**Corporate Restructuring.** We may share some or all of your personal information in connection with or during negotiation of any merger, financing, acquisition or dissolution, transaction or proceeding involving the sale, transfer, divestiture, or disclosure of all or a portion of our business or assets. 
In the event of an insolvency, bankruptcy, or receivership, personal information may also be transferred as a business asset. If another company acquires PostgresML, our business, or assets, that company will possess the personal information collected by us and will assume the rights and obligations regarding your personal information described in this Privacy Policy. + +**Other Disclosures.** PostgresML may disclose your personal information if it believes in good faith that such disclosure is necessary for any of the following: + +- In connection with a legal investigation; +- To comply with relevant laws or to respond to subpoenas or warrants served on PostgresML; +- To protect or defend the rights or property of PostgresML or users of the Site or Services; and/or +- To investigate or assist in preventing any violation or potential violation of the law, this Privacy Policy, or our terms of service/terms of use. + +We may also share personal information with other categories of third parties with your consent or as described to you at the time of collection of your personal information. + +**Third Party Websites.** Our Site or the Services may contain links to third party websites or services. When you click on a link to any other website or location, you will leave our Site or the Services and go to another site and another entity may collect your personal information from you. We have no control over, do not review, and cannot be responsible for these outside websites or their content, or any collection of your personal information after you click on links to such outside websites. The links to third party websites or locations are for your convenience and do not signify our endorsement of such third parties or their products, content, websites or privacy practices. 
+ +## Your Choices Regarding Your Personal Information + +You have several choices regarding the use of your personal information on the Site and our Services: + +**Email Communications.** We may periodically send you free newsletters and e-mails that directly promote the use of our Site or the Services. When you receive newsletters or promotional communications from us, you may indicate a preference to stop receiving further communications from us and you will have the opportunity to “opt-out” by following the unsubscribe instructions provided in the e-mail you receive or by contacting us directly (please see contact information below). Despite your indicated e-mail preferences, we may send you Services-related communications, including notices of any updates to our Privacy Policy or terms of service/terms of use. + +**Cookies.** If you decide at any time that you no longer wish to accept cookies from our Site for any of the purposes described above, then you can instruct your browser, by changing its settings, to stop accepting cookies or to prompt you before accepting a cookie from the websites you visit. Consult your browser’s technical information. If you do not accept cookies, however, you may not be able to use all portions of the Site or all functionality of the Services. If you have any questions about how to disable or modify cookies, visit https://www.allaboutcookies.org/. + +**Session Replay Technology.** If you decide that you do not wish to participate in Hotjar’s session replay technology, you can opt out of Hotjar’s collection and processing of data generated by your use of the Site and the Services by visiting https://www.hotjar.com/policies/do-not-track/. + +## Security Of Your Personal Information + +PostgresML is committed to protecting the security of your personal information. We use a variety of security technologies and procedures to help protect your personal information from unauthorized access, use, or disclosure. 
No method of transmission over the internet, or method of electronic storage, is 100% secure, however. Therefore, while PostgresML uses reasonable efforts to protect your personal information, we cannot guarantee its absolute security. + +## International Users + +Please note that our Site and the Services are hosted in the United States. If you use our Site or our Services from outside the United States, please be aware that your personal information may be transferred to, stored, and processed in the United States or other countries where our servers are located and our central database is operated. The data protection and privacy laws of the United States may differ from the laws in your country. By using our Site or our Services, you consent to the transfer of your personal information to the United States or other countries as described in this Privacy Policy. + +## Children + +Our Site and the Services are not intended for children under 18 years of age, and you must be at least 18 years old to have our permission to use the Site or the Services. We do not knowingly collect, use, or disclose personally identifiable information from children under 13. If you believe that we have collected, used, or disclosed personally identifiable information of a child under the age of 13, please contact us using the contact information below so that we can take appropriate action. + +## Do Not Track + +We currently do not support the Do Not Track browser setting or respond to Do Not Track signals. Do Not Track (or DNT) is a preference you can set in your browser to let the websites you visit know that you do not want them collecting certain information about you. For more details about Do Not Track, including how to enable or disable this preference, visit http://www.allaboutdnt.com. + +## Updates To This Privacy Policy + +We reserve the right to change this Privacy Policy at any time. 
If we make any material changes to this Privacy Policy, we will post the revised version to our website and update the “Effective Date” at the top of this Privacy Policy. Except as otherwise indicated, any changes will become effective when we post the revised Privacy Policy on our website. + +## California Consumer Privacy Act (CCPA) + +If you are a California resident, you have the right to request that we disclose certain information about our collection and use of your personal information over the past 12 months. You also have the right to request that we delete any personal information that we have collected from you, subject to certain exceptions. To make such requests, please contact us using the contact information provided below. + +We will not discriminate against you for exercising any of your CCPA rights, such as by denying you goods or services, charging you a different price, or providing you with a different level or quality of goods or services. For purposes of compliance with the CCPA, in the preceding 12 months, we have not sold any personal information. We do not sell personal information without affirmative authorization. + +## General Data Protection Regulation (GDPR) + +If you are a resident of the European Economic Area (EEA), you have certain rights under the General Data Protection Regulation (GDPR) regarding the collection, use, and retention of your personal data (which, as defined in the GDPR, means any information related to an identified or identifiable natural person). + +You have the right to access, correct, update, or delete any personal data we hold about you. You may also have the right to restrict or object to our processing of your personal data or to request that we provide a copy of your personal data to you or another controller. To exercise any of these rights, please contact us using the contact information provided below. 
You also have the right to lodge a complaint with a supervisory authority if you believe that our processing of your personal data violates applicable law. + +We may collect, use, and retain your personal data for the purposes of providing the Services to you and for other legitimate business purposes. Your personal data may be transferred to and stored in the United States or other countries outside the EEA. When we transfer your personal data outside the EEA, we will take appropriate steps to ensure that your personal data receives the same level of protection as it would in the EEA, including by entering into appropriate data transfer agreements. + +Our legal basis for collecting and processing your personal data is typically based on your consent or our legitimate business interests. In certain cases, we may also have a legal obligation to collect and process your personal data or may need to do so to perform services for you. + +If you have any questions or concerns about our privacy practices, please contact us using the contact information provided below. + +## Contact Us + +Our contact information is as follows: contact@postgresml.org diff --git a/pgml-cms/docs/cloud/terms-of-service.md b/pgml-cms/docs/cloud/terms-of-service.md new file mode 100644 index 000000000..93a83d750 --- /dev/null +++ b/pgml-cms/docs/cloud/terms-of-service.md @@ -0,0 +1,160 @@ +# Terms of Service + +Last Updated: 7/16/2024 + +## Introduction + +Welcome to PostgresML! Your use of PostgresML’s services, including the services PostgresML makes available through this website and applications which link to these terms of service (the “Site”) and to all software or services offered by PostgresML in connection with any of those (the “Services”), is governed by these terms of service (the “Terms”), so please carefully read them before using the Services. 
For the purposes of these Terms, “we,” “our,” “us,” and “PostgresML” refer to Hyperparam Inc., the providers and operators of the Services. + +In order to use the Services, you must first agree to these Terms. If you are registering for or using the Services on behalf of an organization, you are agreeing to these Terms for that organization and promising that you have the authority to bind that organization to these Terms. In that case, “you” and “Customer” will also refer to that organization, wherever possible. + +You agree your purchases and/or use of the Services are not contingent on the delivery of any future functionality or features or dependent on any oral or written public comments made by PostgresML or any of its affiliates regarding future functionality or features. + +If you have entered into a separate written agreement with PostgresML for use of the Services, the terms and conditions of such other agreement shall prevail over any conflicting terms or conditions in these Terms with respect to the Services specified in such agreement. + +Arbitration notice: except for certain types of disputes described in the arbitration clause below, you agree that disputes between you and PostgresML will be resolved by mandatory binding arbitration and you waive any right to participate in a class-action lawsuit or class-wide arbitration. + +By using, downloading, installing, or otherwise accessing the services or any materials included in or with the services, you hereby agree to be bound by these terms. If you do not accept these terms, then you may not use, download, install, or otherwise access the services. + +Certain features of the services or site may be subject to additional guidelines, terms, or rules, which will be posted on the service or site in connection with such features. To the extent such terms, guidelines, and rules conflict with these terms, such terms shall govern solely with respect to such features. 
In all other situations, these terms shall govern. + +## Your Account + +In the course of registering for or using the Services, you may be required to provide PostgresML with certain information, including your name, contact information, username and password (“Credentials”). PostgresML handles such information with the utmost attention, care and security. Nonetheless, you, not PostgresML, shall be responsible for maintaining and protecting your Credentials in connection with the Services. If your contact information or other information relating to your account changes, you must notify PostgresML promptly and keep such information current. You are solely responsible for any activity using your Credentials, whether or not you authorized that activity. You should immediately notify PostgresML of any unauthorized use of your Credentials or if your email or password has been hacked or stolen. If you discover that someone is using your Credentials without your consent, or you discover any other breach of security, you agree to notify PostgresML immediately. + +## Content + +A variety of information, reviews, recommendations, messages, comments, posts, text, graphics, software, photographs, videos, data, and other materials (“Content”) may be made available through the Services by PostgresML or its suppliers (“PostgresML-Supplied Content”). While PostgresML strives to keep the Content that it provides through the Services accurate, complete, and up-to-date, PostgresML cannot guarantee, and is not responsible for the accuracy, completeness, or timeliness of any PostgresML-Supplied Content. + +You acknowledge that you may also be able to create, transmit, publish or display information (such as data files, written text, computer software, music, audio files or other sounds, photographs, videos or other images) through use of the Services. 
All such information is referred to below as “User Content.” + +You agree that you are solely responsible for (and that PostgresML has no responsibility to you or to any third party for) any User Content, and for the consequences of your actions (including any loss or damage which PostgresML may suffer) in connection with such User Content. If you are registering for these Services on behalf of an organization, you also agree that you are also responsible for the actions of associated Users and for any User Content that such associated Users might upload, record, publish, post, link to, or otherwise transmit or distribute through use of the Services. Furthermore, you acknowledge that PostgresML does not control or actively monitor Content uploaded by users and, as such, does not guarantee the accuracy, integrity or quality of such Content. You acknowledge that by using the Services, you may be exposed to materials that are offensive, indecent or objectionable. Under no circumstances will PostgresML be liable in any way for any such Content. + +PostgresML may refuse to store, provide, or otherwise maintain your User Content for any or no reason. PostgresML may remove your User Content from the Services at any time if you violate these Terms or if the Services are canceled or suspended. If User Content is stored using the Services with an expiration date, PostgresML may also delete the User Content as of that date. User Content that is deleted may be irretrievable. You agree that PostgresML has no responsibility or liability for the deletion or failure to store any User Content or other communications maintained or transmitted through use of the Services. + +PostgresML reserves the right (but shall have no obligation) to monitor and remove User Content from the Services, in its discretion. You agree to immediately take down any Content that violates these Terms, including pursuant to a takedown request from PostgresML. 
PostgresML also reserves the right to directly take down such Content. + +By submitting, posting or otherwise uploading User Content on or through the Services you give PostgresML a worldwide, nonexclusive, perpetual, fully sub-licensable, royalty-free right and license as set forth below: + +with respect to User Content that you submit, post or otherwise make publicly or generally available via the Services (e.g. public forum posts), the license to use, reproduce, modify, adapt, publish, translate, create derivative works from, distribute, publicly perform, and publicly display such User Content (in whole or part) worldwide via the Services or otherwise, and/or to incorporate it in other works in any form, media, or technology now known or later developed for any legal business purpose; and + +with respect to User Content that you submit, post or otherwise transmit privately via the Services, the license to use, reproduce, modify, adapt, publish, translate, create derivative works from, distribute, publicly perform and publicly display such User Content for the purpose of enabling PostgresML to provide you with the Services, and for the limited purposes stated in our Privacy Policy. + +Notwithstanding anything to the contrary in these Terms, PostgresML may monitor Customer's use of the Services and collect and compile Aggregated Data. As between PostgresML and you, all right, title, and interest in Aggregated Data, and all intellectual property rights therein, belong to and are retained solely by PostgresML. You acknowledge that PostgresML may compile Aggregated Data based on User Content input into the Services. Customer agrees that PostgresML may (i) make Aggregated Data available to third parties including its other customers in compliance with applicable law, and (ii) use Aggregated Data to the extent and in the manner permitted under applicable law. 
As used herein, “Aggregated Data” means data and information related to or derived from User Content or your use of the Services that is used by PostgresML in an aggregate and anonymized manner, including to compile statistical and performance information related to the Services. + +## Proprietary Rights + +You acknowledge and agree that PostgresML (and/or PostgresML’s licensors) own all legal right, title and interest in and to the Services and PostgresML-Supplied Content and that the Services and PostgresML-Supplied Content are protected by copyrights, trademarks, patents, or other proprietary rights and laws (whether those rights happen to be registered or not, and wherever in the world those rights may exist). + +Except as provided in Section 3, PostgresML acknowledges and agrees that it obtains no right, title or interest from you (or your licensors) under these Terms in or to any Content that you create, upload, submit, post, transmit, share or display on, or through, the Services, including any intellectual property rights which subsist in that Content (whether those rights happen to be registered or not, and wherever in the world those rights may exist). Unless you have agreed otherwise in writing with PostgresML, you agree that you are responsible for protecting and enforcing those rights and that PostgresML has no obligation to do so on your behalf. + + +## License from PostgresML and Restrictions on Use + +PostgresML gives you a personal, worldwide, royalty-free, non-assignable and non-exclusive license to use the Site and Services for the sole purpose of allowing you to access the Services for your non-commercial or internal business purposes, in the manner permitted by these Terms. 
+ +You may not (and you may not permit anyone else to): (i) copy, modify, create a derivative work of, reverse engineer, decompile or otherwise attempt to extract the source code of the Services or any part thereof, unless this is expressly permitted or required by law, or unless you have been specifically told that you may do so by PostgresML, in writing (e.g., through an open source software license); or (ii) attempt to disable or circumvent any security mechanisms used by the Services or any applications running on the Services. + +You may not engage in any activity that interferes with or disrupts the Services (or the servers and networks which are connected to the Services). + +You may not rent, lease, provide access to or sublicense any elements of the Services to a third party or use the Services on behalf of or to provide services to third parties. + +You may not access the Services in a manner intended to avoid incurring fees or exceeding usage limits or quotas. + +You may not access the Services for the purpose of bringing an intellectual property infringement claim against PostgresML or for the purpose of creating a product or service competitive with the Services. You may not use any robot, spider, site search/retrieval application or other manual or automatic program or device to retrieve, index, “scrape,” “data mine” or in any way gather Content from the Services. 
+ +You agree that you will not upload, record, publish, post, link to, transmit or distribute User Content, or otherwise utilize the Services in a manner that: (i) advocates, promotes, incites, instructs, informs, assists or otherwise encourages violence or any illegal activities; (ii) infringes or violates the copyright, patent, trademark, service mark, trade name, trade secret, or other intellectual property rights of any third party or PostgresML, or any rights of publicity or privacy of any party; (iii) attempts to mislead others about your identity or the origin of a message or other communication, or impersonates or otherwise misrepresents your affiliation with any other person or entity, or is otherwise materially false, misleading, or inaccurate; (iv) promotes, solicits or comprises inappropriate, harassing, abusive, profane, hateful, defamatory, libelous, threatening, obscene, indecent, vulgar, pornographic or otherwise objectionable or unlawful content or activity; (v) is harmful to minors; (vi) utilizes or contains any viruses, Trojan horses, worms, time bombs, or any other similar software, data, or programs that may damage, detrimentally interfere with, surreptitiously intercept, or expropriate any system, data, personal information, or property of another; or (vii) violates any law, statute, ordinance, or regulation (including without limitation the laws and regulations governing export control, unfair competition, anti-discrimination, or false advertising). + +You may not use the Services if you are a person barred from receiving the Services under the laws of the United States or other countries, including the country in which you are resident or from which you use the Services. You affirm that you are over the age of 13, as the Services are not intended for children under 13. 
+ +Customer is responsible and liable for all uses of the Services and Documentation resulting from access provided by Customer, directly or indirectly, whether such access or use is permitted by or in violation of these Terms. Without limiting the generality of the foregoing, Customer is responsible for all acts and omissions of authorized users, and any act or omission by an authorized user that would constitute a breach of these Terms if taken by Customer will be deemed a breach of these Terms by Customer. Customer shall use reasonable efforts to make all authorized users aware of the provisions of these Terms as applicable to such authorized users’ use of the Services and shall cause authorized users to comply with such provisions. + +PostgresML may from time to time make third-party products available to Customer or PostgresML may allow for certain third-party products to be integrated with the Services to allow for the transmission of User Content from such third-party products into the Services. For purposes of these Terms, such third-party products are subject to their own terms and conditions. If Customer does not agree to abide by the applicable terms for any such third-party products, then Customer should not install or use such third-party products. By authorizing PostgresML to transmit User Content from third-party products into the Services, Customer represents and warrants to PostgresML that it has all right, power, and authority to provide such authorization. 
+ +Customer has and will retain sole responsibility for: (i) all User Content, including its content and use; (ii) all information, instructions, and materials provided by or on behalf of Customer or any authorized user in connection with the Services; (iii) Customer's information technology infrastructure, including computers, software, databases, electronic systems (including database management systems), and networks, whether operated directly by Customer or through the use of third-party services ("Customer Systems"); (iv) the security and use of Customer's and its authorized users' access credentials; and (v) all access to and use of the Services directly or indirectly by or through the Customer Systems or its or its authorized users' access credentials, with or without Customer's knowledge or consent, including all results obtained from, and all conclusions, decisions, and actions based on, such access or use. + +## Pricing Terms + +Subject to the Terms, the Services are provided to you without charge up to certain usage limits, and usage in excess of these limits may require purchase of additional resources and the payment of fees. Please see the [pricing](/pricing) terms for details regarding pricing for the Services. + +## Privacy Policies + +These Services are provided in accordance with our [Privacy Policy](/docs/cloud/privacy-policy). You agree to the use of your User Content and personal information in accordance with these Terms and PostgresML’s Privacy Policy. + +You agree to protect the privacy and legal rights of your End Users. If your End Users provide you with user names, passwords, or other login information or personal information, you agree to make such End Users aware that such information may be made available to PostgresML and to refer such End Users to our Privacy Policy linked above. 
+ +Notwithstanding anything to the contrary, in the event you use the Services as an organization, you agree to permit PostgresML to identify you as a customer and to use your name and/or logo in PostgresML’s website and marketing materials. + +## Modification and Termination of Services + +PostgresML is constantly innovating in order to provide the best possible experience for its users. You acknowledge and agree that the form and nature of the Services which PostgresML provides may change from time to time without prior notice to you, subject to the terms in its Privacy Policy. Changes to the form and nature of the Services will be effective with respect to all versions of the Services; examples of changes to the form and nature of the Services include without limitation changes to fee and payment policies, security patches, added functionality, automatic updates, and other enhancements. Any new features that may be added to the website or the Services from time to time will be subject to these Terms, unless stated otherwise. + +You may terminate these Terms at any time by canceling your account on the Services, subject to any terms and conditions in connection with termination contained in the separate written agreement between you and PostgresML. + +You agree that PostgresML, in its sole discretion and for any or no reason, may terminate your account or any part thereof. You agree that any termination of your access to the Services may be without prior notice, and you agree that PostgresML will not be liable to you or any third party for such termination. + +You are solely responsible for exporting your User Content from the Services prior to termination of your account for any reason, provided that if we terminate your account for our convenience, we will endeavor to provide you a reasonable opportunity to retrieve your User Content. 
+ +Upon any termination of the Services or your account these Terms will also terminate, but all provisions of these Terms which, by their nature, should survive termination, shall survive termination, including, without limitation, ownership provisions, warranty disclaimers, and limitations of liability. + +## Changes to the Terms + +These Terms may be amended or updated from time to time without notice and may have changed since your last visit to the website or use of the Services. It is your responsibility to review these Terms for any changes. By continuing to access or use the Services after revisions become effective, you agree to be bound by the revised Terms. If you do not agree to the new Terms, please stop using the Services. Please visit this page regularly to review these Terms for any changes. + +## Disclaimer of Warranty + +YOU EXPRESSLY UNDERSTAND AND AGREE THAT YOUR USE OF THE SERVICES IS AT YOUR SOLE RISK AND THAT THE SERVICES ARE PROVIDED “AS IS” AND “AS AVAILABLE.” + +POSTGRESML, ITS SUBSIDIARIES AND AFFILIATES, AND ITS LICENSORS MAKE NO EXPRESS WARRANTIES AND DISCLAIM ALL IMPLIED WARRANTIES REGARDING THE SERVICES, INCLUDING IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. WITHOUT LIMITING THE GENERALITY OF THE FOREGOING, POSTGRESML, ITS SUBSIDIARIES AND AFFILIATES, AND ITS LICENSORS DO NOT REPRESENT OR WARRANT TO YOU THAT: (A) YOUR USE OF THE SERVICES WILL MEET YOUR REQUIREMENTS, (B) YOUR USE OF THE SERVICES WILL BE UNINTERRUPTED, TIMELY, SECURE OR FREE FROM ERROR, AND (C) USAGE DATA PROVIDED THROUGH THE SERVICES WILL BE ACCURATE. + +NOTHING IN THESE TERMS, INCLUDING SECTIONS 10 AND 11, SHALL EXCLUDE OR LIMIT POSTGRESML’S WARRANTY OR LIABILITY FOR LOSSES WHICH MAY NOT BE LAWFULLY EXCLUDED OR LIMITED BY APPLICABLE LAW. 
+ +## Limitation of Liability + +SUBJECT TO SECTION 10 ABOVE, YOU EXPRESSLY UNDERSTAND AND AGREE THAT POSTGRESML, ITS SUBSIDIARIES AND AFFILIATES, AND ITS LICENSORS SHALL NOT BE LIABLE TO YOU FOR ANY INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL, OR EXEMPLARY DAMAGES WHICH MAY BE INCURRED BY YOU, HOWEVER CAUSED AND UNDER ANY THEORY OF LIABILITY. THIS SHALL INCLUDE, BUT NOT BE LIMITED TO, ANY LOSS OF PROFIT (WHETHER INCURRED DIRECTLY OR INDIRECTLY), ANY LOSS OF GOODWILL OR BUSINESS REPUTATION, ANY LOSS OF DATA SUFFERED, COST OF PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES, OR OTHER INTANGIBLE LOSS. THESE LIMITATIONS SHALL APPLY NOTWITHSTANDING THE FAILURE OF ESSENTIAL PURPOSE OF ANY LIMITED REMEDY. + +THE LIMITATIONS ON POSTGRESML’S LIABILITY TO YOU IN THIS SECTION SHALL APPLY WHETHER OR NOT POSTGRESML HAS BEEN ADVISED OF OR SHOULD HAVE BEEN AWARE OF THE POSSIBILITY OF ANY SUCH LOSSES ARISING. + +SOME STATES AND JURISDICTIONS MAY NOT ALLOW THE LIMITATION OR EXCLUSION OF LIABILITY FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES, SO THE ABOVE LIMITATION OR EXCLUSION MAY NOT APPLY TO YOU. IN NO EVENT SHALL POSTGRESML’S TOTAL LIABILITY TO YOU FOR ALL DAMAGES, LOSSES, AND CAUSES OF ACTION (WHETHER IN CONTRACT, TORT (INCLUDING NEGLIGENCE), OR OTHERWISE) EXCEED THE AMOUNT THAT YOU HAVE ACTUALLY PAID FOR THE SERVICES IN THE PAST TWELVE MONTHS, OR ONE HUNDRED DOLLARS ($100.00), WHICHEVER IS GREATER. 
+ +## Indemnification + +You agree to hold harmless and indemnify PostgresML, and its subsidiaries, affiliates, officers, agents, employees, advertisers, licensors, suppliers or partners (collectively “PostgresML and Partners”) from and against any third party claim arising from or in any way related to (a) your breach of the Terms, (b) your use of the Services, (c) your violation of applicable laws, rules or regulations in connection with the Services, or (d) your User Content, including any liability or expense arising from all claims, losses, damages (actual and consequential), suits, judgments, litigation costs and attorneys’ fees, of every kind and nature. + +## Third-Party Content and Materials + +You may be able to access or use third party websites, resources, content, communications or information (“Third Party Materials”) via the Services. You acknowledge sole responsibility for and assume all risk arising from your access to, reliance upon or use of any such Third Party Materials and PostgresML disclaims any liability that you may incur arising from access to, reliance upon or use of such Third Party Materials via the Services. + +You acknowledge and agree that PostgresML: (a) is not responsible for the availability or accuracy of such Third Party Materials; (b) has no liability to you or any third party for any harm, injuries or losses suffered as a result of your access to, reliance upon or use of such Third Party Materials; and (c) does not make any promises to remove Third Party Materials from being accessed through the Services. + +## Third Party Software + +The Services may incorporate certain third party software (“Third Party Software”), which is licensed subject to the terms and conditions of the third party licensing such Third Party Software. Nothing in these Terms limits your rights under, or grants you rights that supersede, the terms and conditions of any applicable license for such Third Party Software. 
+ +## Feedback + +You may choose to or we may invite you to submit comments or ideas about the Services, including without limitation about how to improve the Services or our products. By submitting any feedback, you agree that your disclosure is gratuitous, unsolicited and without restriction and will not place PostgresML under any fiduciary or other obligation, and that we are free to use such feedback without any additional compensation to you, and/or to disclose such feedback on a non-confidential basis or otherwise to anyone. Further, you warrant that your feedback is not subject to any license terms that would purport to require us to comply with any additional obligations with respect to any products or services that incorporate any of your feedback. + +## Disputes + +**Please read the following section carefully because it requires you to arbitrate certain disputes and claims with PostgresML and limits the manner in which you can seek relief from us.** + +These Terms and any action related thereto will be governed by the laws of the State of California without regard to its conflict of laws provisions. Except for small claims disputes in which you or PostgresML seek to bring an individual action in small claims court located in the county of your billing address or claims for injunctive relief by either party, any dispute or controversy arising out of, in relation to, or in connection with these Terms or your use of the Services shall be finally settled by binding arbitration in San Francisco County, California under the Federal Arbitration Act (9 U.S.C. §§ 1-307) and the then current rules of JAMS (formerly known as Judicial Arbitration & Mediation Services) by one (1) arbitrator appointed in accordance with such rules. 
Where arbitration is not required by these Terms, the exclusive jurisdiction and venue of any action with respect to the subject matter of these Terms will be the state and federal courts located in San Francisco County, California, and each of the parties hereto waives any objection to jurisdiction and venue in such courts. ANY DISPUTE RESOLUTION PROCEEDING ARISING OUT OF OR RELATED TO THESE TERMS OR THE SALES TRANSACTIONS BETWEEN YOU AND POSTGRESML, WHETHER IN ARBITRATION OR OTHERWISE, SHALL BE CONDUCTED ONLY ON AN INDIVIDUAL BASIS AND NOT IN A CLASS, CONSOLIDATED OR REPRESENTATIVE ACTION, AND YOU EXPRESSLY AGREE THAT CLASS ACTION AND REPRESENTATIVE ACTION PROCEDURES SHALL NOT BE ASSERTED IN NOR APPLY TO ANY ARBITRATION PURSUANT TO THESE TERMS AND CONDITIONS. YOU ALSO AGREE NOT TO BRING ANY LEGAL ACTION, BASED UPON ANY LEGAL THEORY INCLUDING CONTRACT, TORT, EQUITY OR OTHERWISE, AGAINST POSTGRESML THAT IS MORE THAN ONE YEAR AFTER THE DATE OF THE APPLICABLE ORDER. + +You have the right to opt out of binding arbitration within 30 days of the date you first accepted the terms of this Section by emailing us at contact@postgresml.org. In order to be effective, the opt out notice must include your full name and clearly indicate your intent to opt out of binding arbitration. + +## Miscellaneous + +These Terms, together with our Privacy Policy, constitute the entire agreement between the parties relating to the Services and all related activities. These Terms shall not be modified except in writing signed by both parties or by a new posting of these Terms issued by us. If any part of these Terms is held to be unlawful, void, or unenforceable, that part shall be deemed severed and shall not affect the validity and enforceability of the remaining provisions. The failure of PostgresML to exercise or enforce any right or provision under these Terms shall not constitute a waiver of such right or provision.
Any waiver of any right or provision by PostgresML must be in writing and shall only apply to the specific instance identified in such writing. You may not assign these Terms, or any rights or licenses granted hereunder, whether voluntarily, by operation of law, or otherwise without our prior written consent. + +You must be over 13 years of age to use the Services, and children under the age of 13 cannot use or register for the Services. If you are over 13 years of age but are not yet of legal age to form a binding contract (in many jurisdictions, this age is 18), then you must get your parent or guardian to read these Terms and agree to them for you before you use the Services. If you are a parent or guardian and you provide your consent to your child's registration with the Services, you agree to be bound by these Terms with respect to your child’s use of the Services. + + +## Contact Us + +If you have any questions about these Terms or if you wish to make any complaint or claim with respect to the Services, please contact us at: contact@postgresml.org. + +When submitting a complaint, please provide a brief description of the nature of your complaint and the specific services to which your complaint relates. + + + diff --git a/pgml-cms/docs/resources/faqs.md b/pgml-cms/docs/introduction/faq.md similarity index 99% rename from pgml-cms/docs/resources/faqs.md rename to pgml-cms/docs/introduction/faq.md index 2d8ede8c6..c1feae5af 100644 --- a/pgml-cms/docs/resources/faqs.md +++ b/pgml-cms/docs/introduction/faq.md @@ -2,7 +2,7 @@ description: PostgresML Frequently Asked Questions --- -# FAQs +# FAQ ## What is PostgresML?
diff --git a/pgml-cms/docs/introduction/getting-started/import-your-data/README.md b/pgml-cms/docs/introduction/import-your-data/README.md similarity index 100% rename from pgml-cms/docs/introduction/getting-started/import-your-data/README.md rename to pgml-cms/docs/introduction/import-your-data/README.md diff --git a/pgml-cms/docs/introduction/getting-started/import-your-data/copy.md b/pgml-cms/docs/introduction/import-your-data/copy.md similarity index 100% rename from pgml-cms/docs/introduction/getting-started/import-your-data/copy.md rename to pgml-cms/docs/introduction/import-your-data/copy.md diff --git a/pgml-cms/docs/introduction/getting-started/import-your-data/foreign-data-wrappers.md b/pgml-cms/docs/introduction/import-your-data/foreign-data-wrappers.md similarity index 100% rename from pgml-cms/docs/introduction/getting-started/import-your-data/foreign-data-wrappers.md rename to pgml-cms/docs/introduction/import-your-data/foreign-data-wrappers.md diff --git a/pgml-cms/docs/introduction/getting-started/import-your-data/logical-replication/README.md b/pgml-cms/docs/introduction/import-your-data/logical-replication/README.md similarity index 100% rename from pgml-cms/docs/introduction/getting-started/import-your-data/logical-replication/README.md rename to pgml-cms/docs/introduction/import-your-data/logical-replication/README.md diff --git a/pgml-cms/docs/introduction/getting-started/import-your-data/logical-replication/inside-a-vpc.md b/pgml-cms/docs/introduction/import-your-data/logical-replication/inside-a-vpc.md similarity index 82% rename from pgml-cms/docs/introduction/getting-started/import-your-data/logical-replication/inside-a-vpc.md rename to pgml-cms/docs/introduction/import-your-data/logical-replication/inside-a-vpc.md index 55da8bafb..278d8e865 100644 --- a/pgml-cms/docs/introduction/getting-started/import-your-data/logical-replication/inside-a-vpc.md +++ b/pgml-cms/docs/introduction/import-your-data/logical-replication/inside-a-vpc.md @@ -3,7 
+3,7 @@ If your database doesn't have Internet access, PostgresML will need a service to proxy connections to your database. Any TCP proxy will do, and we also provide an nginx-based Docker image that can be used without any additional configuration. -
VPC
+
VPC
## PostgresML IPs by region diff --git a/pgml-cms/docs/introduction/getting-started/import-your-data/pg-dump.md b/pgml-cms/docs/introduction/import-your-data/pg-dump.md similarity index 100% rename from pgml-cms/docs/introduction/getting-started/import-your-data/pg-dump.md rename to pgml-cms/docs/introduction/import-your-data/pg-dump.md diff --git a/pgml-cms/docs/resources/data-storage-and-retrieval/README.md b/pgml-cms/docs/introduction/import-your-data/storage-and-retrieval/README.md similarity index 100% rename from pgml-cms/docs/resources/data-storage-and-retrieval/README.md rename to pgml-cms/docs/introduction/import-your-data/storage-and-retrieval/README.md diff --git a/pgml-cms/docs/resources/data-storage-and-retrieval/documents.md b/pgml-cms/docs/introduction/import-your-data/storage-and-retrieval/documents.md similarity index 100% rename from pgml-cms/docs/resources/data-storage-and-retrieval/documents.md rename to pgml-cms/docs/introduction/import-your-data/storage-and-retrieval/documents.md diff --git a/pgml-cms/docs/resources/data-storage-and-retrieval/llm-based-pipelines-with-postgresml-and-dbt-data-build-tool.md b/pgml-cms/docs/introduction/import-your-data/storage-and-retrieval/llm-based-pipelines-with-postgresml-and-dbt-data-build-tool.md similarity index 100% rename from pgml-cms/docs/resources/data-storage-and-retrieval/llm-based-pipelines-with-postgresml-and-dbt-data-build-tool.md rename to pgml-cms/docs/introduction/import-your-data/storage-and-retrieval/llm-based-pipelines-with-postgresml-and-dbt-data-build-tool.md diff --git a/pgml-cms/docs/resources/data-storage-and-retrieval/partitioning.md b/pgml-cms/docs/introduction/import-your-data/storage-and-retrieval/partitioning.md similarity index 100% rename from pgml-cms/docs/resources/data-storage-and-retrieval/partitioning.md rename to pgml-cms/docs/introduction/import-your-data/storage-and-retrieval/partitioning.md diff --git a/pgml-cms/docs/resources/data-storage-and-retrieval/tabular-data.md 
b/pgml-cms/docs/introduction/import-your-data/storage-and-retrieval/tabular-data.md similarity index 100% rename from pgml-cms/docs/resources/data-storage-and-retrieval/tabular-data.md rename to pgml-cms/docs/introduction/import-your-data/storage-and-retrieval/tabular-data.md diff --git a/pgml-cms/docs/open-source/pgml/developers/README.md b/pgml-cms/docs/open-source/pgml/developers/README.md new file mode 100644 index 000000000..eb352d266 --- /dev/null +++ b/pgml-cms/docs/open-source/pgml/developers/README.md @@ -0,0 +1,3 @@ +# Developers + +Documentation relevant to self-hosting, compiling or contributing to PostgresML diff --git a/pgml-cms/docs/resources/developer-docs/contributing.md b/pgml-cms/docs/open-source/pgml/developers/contributing.md similarity index 100% rename from pgml-cms/docs/resources/developer-docs/contributing.md rename to pgml-cms/docs/open-source/pgml/developers/contributing.md diff --git a/pgml-cms/docs/resources/developer-docs/distributed-training.md b/pgml-cms/docs/open-source/pgml/developers/distributed-training.md similarity index 100% rename from pgml-cms/docs/resources/developer-docs/distributed-training.md rename to pgml-cms/docs/open-source/pgml/developers/distributed-training.md diff --git a/pgml-cms/docs/resources/developer-docs/gpu-support.md b/pgml-cms/docs/open-source/pgml/developers/gpu-support.md similarity index 100% rename from pgml-cms/docs/resources/developer-docs/gpu-support.md rename to pgml-cms/docs/open-source/pgml/developers/gpu-support.md diff --git a/pgml-cms/docs/resources/developer-docs/installation.md b/pgml-cms/docs/open-source/pgml/developers/installation.md similarity index 100% rename from pgml-cms/docs/resources/developer-docs/installation.md rename to pgml-cms/docs/open-source/pgml/developers/installation.md diff --git a/pgml-cms/docs/resources/developer-docs/quick-start-with-docker.md b/pgml-cms/docs/open-source/pgml/developers/quick-start-with-docker.md similarity index 100% rename from 
pgml-cms/docs/resources/developer-docs/quick-start-with-docker.md rename to pgml-cms/docs/open-source/pgml/developers/quick-start-with-docker.md diff --git a/pgml-cms/docs/resources/developer-docs/self-hosting/README.md b/pgml-cms/docs/open-source/pgml/developers/self-hosting/README.md similarity index 100% rename from pgml-cms/docs/resources/developer-docs/self-hosting/README.md rename to pgml-cms/docs/open-source/pgml/developers/self-hosting/README.md diff --git a/pgml-cms/docs/resources/developer-docs/self-hosting/backups.md b/pgml-cms/docs/open-source/pgml/developers/self-hosting/backups.md similarity index 100% rename from pgml-cms/docs/resources/developer-docs/self-hosting/backups.md rename to pgml-cms/docs/open-source/pgml/developers/self-hosting/backups.md diff --git a/pgml-cms/docs/resources/developer-docs/self-hosting/building-from-source.md b/pgml-cms/docs/open-source/pgml/developers/self-hosting/building-from-source.md similarity index 100% rename from pgml-cms/docs/resources/developer-docs/self-hosting/building-from-source.md rename to pgml-cms/docs/open-source/pgml/developers/self-hosting/building-from-source.md diff --git a/pgml-cms/docs/resources/developer-docs/self-hosting/pooler.md b/pgml-cms/docs/open-source/pgml/developers/self-hosting/pooler.md similarity index 100% rename from pgml-cms/docs/resources/developer-docs/self-hosting/pooler.md rename to pgml-cms/docs/open-source/pgml/developers/self-hosting/pooler.md diff --git a/pgml-cms/docs/resources/developer-docs/self-hosting/replication.md b/pgml-cms/docs/open-source/pgml/developers/self-hosting/replication.md similarity index 100% rename from pgml-cms/docs/resources/developer-docs/self-hosting/replication.md rename to pgml-cms/docs/open-source/pgml/developers/self-hosting/replication.md diff --git a/pgml-cms/docs/resources/developer-docs/self-hosting/running-on-ec2.md b/pgml-cms/docs/open-source/pgml/developers/self-hosting/running-on-ec2.md similarity index 100% rename from 
pgml-cms/docs/resources/developer-docs/self-hosting/running-on-ec2.md rename to pgml-cms/docs/open-source/pgml/developers/self-hosting/running-on-ec2.md diff --git a/pgml-cms/docs/open-source/pgml/guides/README.md b/pgml-cms/docs/open-source/pgml/guides/README.md new file mode 100644 index 000000000..da87d1334 --- /dev/null +++ b/pgml-cms/docs/open-source/pgml/guides/README.md @@ -0,0 +1,16 @@ +# Guides + +* [Embeddings](open-source/pgml/guides/embeddings/README.md) + * [In-database Generation](open-source/pgml/guides/embeddings/in-database-generation.md) + * [Dimensionality Reduction](open-source/pgml/guides/embeddings/dimensionality-reduction.md) + * [Aggregation](open-source/pgml/guides/embeddings/vector-aggregation.md) + * [Similarity](open-source/pgml/guides/embeddings/vector-similarity.md) + * [Normalization](open-source/pgml/guides/embeddings/vector-normalization.md) +* [Search](open-source/pgml/guides/improve-search-results-with-machine-learning.md) +* [Chatbots](open-source/pgml/guides/chatbots/README.md) + * [Example Application](TODO/chatbots.md) +* [Supervised Learning](open-source/pgml/guides/supervised-learning.md) +* [Unified RAG](open-source/pgml/guides/unified-rag.md) +* [OpenSourceAI](open-source/pgml/guides/opensourceai.md) +* [Natural Language Processing](open-source/pgml/guides/natural-language-processing.md) +* [Vector database](open-source/pgml/guides/vector-database.md) diff --git a/pgml-cms/docs/open-source/pgml/guides/embeddings/README.md b/pgml-cms/docs/open-source/pgml/guides/embeddings/README.md index 46b303ecd..a50e0b673 100644 --- a/pgml-cms/docs/open-source/pgml/guides/embeddings/README.md +++ b/pgml-cms/docs/open-source/pgml/guides/embeddings/README.md @@ -39,7 +39,7 @@ Vectors can be stored in the native Postgres [`ARRAY[]`](https://www.postgresql. !!! 
warning -Other cloud providers claim to offer embeddings "inside the database", but [benchmarks](../../../../resources/benchmarks/mindsdb-vs-postgresml.md) show that they are orders of magnitude slower than PostgresML. The reason is they don't actually run inside the database with hardware acceleration. They are thin wrapper functions that make network calls to remote service providers. PostgresML is the only cloud that puts GPU hardware in the database for full acceleration, and it shows. +Other cloud providers claim to offer embeddings "inside the database", but [benchmarks](/blog/mindsdb-vs-postgresml.md) show that they are orders of magnitude slower than PostgresML. The reason is they don't actually run inside the database with hardware acceleration. They are thin wrapper functions that make network calls to remote service providers. PostgresML is the only cloud that puts GPU hardware in the database for full acceleration, and it shows. !!! diff --git a/pgml-cms/docs/use-cases/embeddings/personalize-embedding-results-with-application-data-in-your-database.md b/pgml-cms/docs/open-source/pgml/guides/embeddings/personalization.md similarity index 100% rename from pgml-cms/docs/use-cases/embeddings/personalize-embedding-results-with-application-data-in-your-database.md rename to pgml-cms/docs/open-source/pgml/guides/embeddings/personalization.md diff --git a/pgml-cms/docs/open-source/pgml/guides/vector-database.md b/pgml-cms/docs/open-source/pgml/guides/vector-database.md index 17700a4d2..bdc12a456 100644 --- a/pgml-cms/docs/open-source/pgml/guides/vector-database.md +++ b/pgml-cms/docs/open-source/pgml/guides/vector-database.md @@ -18,7 +18,7 @@ Vectors can be stored in columns, just like any other data type. 
To add a vector #### Adding a vector column -Using the example from [Tabular data](../../../resources/data-storage-and-retrieval/README.md), let's add a vector column to our USA House Prices table: +Using the example from [Tabular data](../../../introduction/import-your-data/storage-and-retrieval/README.md), let's add a vector column to our USA House Prices table: {% tabs %} {% tab title="SQL" %} @@ -288,4 +288,4 @@ CREATE INDEX #### Maintaining an HNSW index -HNSW requires little to no maintenance. When new vectors are added, they are automatically inserted at the optimal place in the graph. However, as the graph gets bigger, rebalancing it becomes more expensive, and inserting new rows becomes slower. We address this trade-off and how to solve this problem in [Partitioning](../../../resources/data-storage-and-retrieval/partitioning.md). +HNSW requires little to no maintenance. When new vectors are added, they are automatically inserted at the optimal place in the graph. However, as the graph gets bigger, rebalancing it becomes more expensive, and inserting new rows becomes slower. We address this trade-off and how to solve this problem in [Partitioning](../../../introduction/import-your-data/storage-and-retrieval/partitioning.md). diff --git a/pgml-cms/docs/resources/benchmarks/README.md b/pgml-cms/docs/resources/benchmarks/README.md deleted file mode 100644 index ce4a798b7..000000000 --- a/pgml-cms/docs/resources/benchmarks/README.md +++ /dev/null @@ -1,2 +0,0 @@ -# Benchmarks - diff --git a/pgml-cms/docs/resources/benchmarks/making-postgres-30-percent-faster-in-production.md b/pgml-cms/docs/resources/benchmarks/making-postgres-30-percent-faster-in-production.md deleted file mode 100644 index 030a84398..000000000 --- a/pgml-cms/docs/resources/benchmarks/making-postgres-30-percent-faster-in-production.md +++ /dev/null @@ -1,49 +0,0 @@ ---- -description: >- - Anyone who runs Postgres at scale knows that performance comes with trade - offs. 
---- - -# Making Postgres 30 Percent Faster in Production - -Anyone who runs Postgres at scale knows that performance comes with trade offs. The typical playbook is to place a pooler like PgBouncer in front of your database and turn on transaction mode. This makes multiple clients reuse the same server connection, which allows thousands of clients to connect to your database without causing a fork bomb. - -Unfortunately, this comes with a trade off. Since multiple clients use the same server, they couldn't take advantage of prepared statements. Prepared statements are a way for Postgres to cache a query plan and execute it multiple times with different parameters. If you have never tried this before, you can run `pgbench` against your local DB and you'll see that `--protocol prepared` outperforms `simple` and `extended` by at least 30 percent. Giving up this feature has been a given for production deployments for as long as I can remember, but not anymore. - -## PgCat Prepared Statements - -Since [#474](https://github.com/postgresml/pgcat/pull/474), PgCat supports prepared statements in session and transaction mode. Our initial benchmarks show 30% increase over extended protocol (`--protocol extended`) and 15% against simple protocol (`--simple`). Most (all?) web frameworks use at least the extended protocol, so we are looking at a **30% performance increase across the board for everyone** who writes web apps and uses Postgres in production, by just switching to named prepared statements. - -In Rails apps, it's as simple as setting `prepared_statements: true`. - -This is not only a performance benefit, but also a usability improvement for client libraries that have to use prepared statements, like the popular Rust crate [SQLx](https://github.com/launchbadge/sqlx). Until now, the typical recommendation was to just not use a pooler. - -## Benchmark - -
- -The benchmark was conducted using `pgbench` with 1, 10, 100 and 1000 clients sending millions of queries to PgCat, which itself was running on a different EC2 machine alongside the database. This is a simple setup often used in production. Another configuration sees a pooler use its own machine, which of course increases latency but improves on availability. The clients were on another EC2 machine to simulate the latency experienced in typical web apps deployed in Kubernetes, ECS, EC2 and others. - -Benchmark ran in transaction mode. Session mode is faster with fewer clients, but does not scale in production with more than a few hundred clients. Only `SELECT` statements (`-S` option) were used, since the typical `pgbench` benchmark uses a similar number of writes to reads, which is an atypical production workload. Most apps read 90% of the time, and write 10% of the time. Reads are where prepared statements truly shine. - -## Implementation - -PgCat implements an internal cache & mapping between clients' prepared statements and servers that may or may not have them. If a server has the prepared statement, PgCat just forwards the `Bind (F)`, `Execute (F)` and `Describe (F)` messages. If the server doesn't have the prepared statement, PgCat fetches it from the client cache & prepares it using the `Parse (F)` message. You can refer to [Postgres docs](https://www.postgresql.org/docs/current/protocol-flow.html) for a more detailed explanation of how the extended protocol works. - -An important feature of PgCat's implementation is that all prepared statements are renamed and assigned globally unique names. This means that clients that don't randomize their prepared statement names and expect it to be gone after they disconnect from the "Postgres server", work as expected (I put "Postgres server" in quotes because they are actually talking to a proxy that pretends to be a Postgres database). 
Typical error when using such clients with PgBouncer is `prepared statement "sqlx_s_2" already exists`, which is pretty confusing when you see it for the first time. - -## Metrics - -We've added two new metrics to the admin database: `prepare_cache_hit` and `prepare_cache_miss`. Prepare cache hits indicate that the prepared statement requested by the client already exists on the server. That's good because PgCat can just rewrite the messages and send them to the server immediately. Prepare cache misses indicate that PgCat had to issue a prepared statement call to the server, which requires additional time and decreases throughput. In the ideal scenario, the cache hits outnumber the cache misses by an order of magnitude. If they are the same or worse, the prepared statements are not being used correctly by the clients. - -
- -Our benchmark had a 99.99% cache hit ratio, which is really good, but in production this number is likely to be lower. You can monitor your cache hit/miss ratios through the admin database by querying it with `SHOW SERVERS`. - -## Roadmap - -Our implementation is pretty simple and we are already seeing massive improvements, but we can still do better. A `Parse (F)` made prepared statement works, but if one prepares their statements using `PREPARE` explicitly, PgCat will ignore it and that query isn't likely to work outside of session mode. - -Another issue is explicit `DEALLOCATE` and `DISCARD` calls. PgCat doesn't detect them currently, and a client can potentially bust the server prepared statement cache without PgCat knowing about it. It's an easy enough fix to intercept and act on that query accordingly, but we haven't built that yet. - -Testing with `pgbench` is an artificial benchmark, which is good and bad. It's good because, other things being equal, we can demonstrate that one implementation & configuration of the database/pooler cluster is superior to another. It's bad because in the real world, the results can differ. We are looking for users who would be willing to test our implementation against their production traffic and tell us how we did. This feature is optional and can be enabled & disabled dynamically, without restarting PgCat, with `prepared_statements = true` in `pgcat.toml`. diff --git a/pgml-cms/docs/resources/benchmarks/million-requests-per-second.md b/pgml-cms/docs/resources/benchmarks/million-requests-per-second.md deleted file mode 100644 index 716b91eba..000000000 --- a/pgml-cms/docs/resources/benchmarks/million-requests-per-second.md +++ /dev/null @@ -1,234 +0,0 @@ ---- -description: >- - The question "Does it Scale?" has become somewhat of a meme in software - engineering. ---- - -# Scaling to 1 Million Requests per Second - -The question "Does it Scale?" has become somewhat of a meme in software engineering. 
There is a good reason for it though, because most businesses plan for success. If your app, online store, or SaaS becomes popular, you want to be sure that the system powering it can serve all your new customers. - -At PostgresML, we are very concerned with scale. Our engineering background took us through scaling PostgreSQL to 100 TB+, so we're certain that it scales, but could we scale machine learning alongside it? - -In this post, we'll discuss how we horizontally scale PostgresML to achieve more than **1 million XGBoost predictions per second** on commodity hardware. - -If you missed our previous post and are wondering why someone would combine machine learning and Postgres, take a look at our PostgresML vs. Python benchmark. - -## Architecture Overview - -If you're familiar with how one runs PostgreSQL at scale, you can skip straight to the [results](../../benchmarks/broken-reference/). - -Part of our thesis, and the reason why we chose Postgres as our host for machine learning, is that scaling machine learning inference is very similar to scaling read queries in a typical database cluster. - -Inference speed varies based on the model complexity (e.g. `n_estimators` for XGBoost) and the size of the dataset (how many features the model uses), which is analogous to query complexity and table size in the database world and, as we'll demonstrate further on, scaling the latter is mostly a solved problem. - -

System Architecture

- -| Component | Description | -| --------- | --------------------------------------------------------------------------------------------------------- | -| Clients | Regular Postgres clients | -| ELB | [Elastic Network Load Balancer](https://aws.amazon.com/elasticloadbalancing/) | -| PgCat | A Postgres [pooler](https://github.com/levkk/pgcat/) with built-in load balancing, failover, and sharding | -| Replica | Regular Postgres [replicas](https://www.postgresql.org/docs/current/high-availability.html) | -| Primary | Regular Postgres primary | - -Our architecture has four components that may need to scale up or down based on load: - -1. Clients -2. Load balancer -3. [PgCat](https://github.com/levkk/pgcat/) pooler -4. Postgres replicas - -We intentionally don't discuss scaling the primary in this post, because sharding, which is the most effective way to do so, is a fascinating subject that deserves its own series of posts. Spoiler alert: we sharded Postgres without any problems. - -### Clients - -Clients are regular Postgres connections coming from web apps, job queues, or pretty much anywhere that needs data. They can be long-living or ephemeral and they typically grow in number as the application scales. - -Most modern deployments use containers which are added as load on the app increases, and removed as the load decreases. This is called dynamic horizontal scaling, and it's an effective way to adapt to changing traffic patterns experienced by most businesses. - -### Load Balancer - -The load balancer is a way to spread traffic across horizontally scalable components, by routing new connections to targets in a round robin (or random) fashion. It's typically a very large box (or a fast router), but even those need to be scaled if traffic suddenly increases. Since we're running our system on AWS, this is already taken care of, for a reasonably small fee, by using an Elastic Load Balancer. 
- -### PgCat - -If you've used Postgres in the past, you know that it can't handle many concurrent connections. For large deployments, it's necessary to run something we call a pooler. A pooler routes thousands of clients to only a few dozen server connections by time-sharing when a client can use a server. Because most queries are very quick, this is a very effective way to run Postgres at scale. - -There are many poolers available presently, the most notable being PgBouncer, which has been around for a very long time, and is trusted by many large organizations. Unfortunately, it hasn't evolved much with the growing needs of highly available Postgres deployments, so we wrote [our own](https://github.com/levkk/pgcat/) which added important functionality we needed: - -* Load balancing of read queries -* Failover in case a read replica is broken -* Sharding (this feature is still being developed) - -In this benchmark, we used its load balancing feature to evenly distribute XGBoost predictions across our Postgres replicas. - -### Postgres Replicas - -Scaling Postgres reads is pretty straightforward. If more read queries are coming in, we add a replica to serve the increased load. If the load is decreasing, we remove a replica to save money. The data is replicated from the primary, so all replicas are identical, and all of them can serve any query, or in our case, an XGBoost prediction. PgCat can dynamically add and remove replicas from its config without disconnecting clients, so we can add and remove replicas as needed, without downtime. - -#### Parallelizing XGBoost - -Scaling XGBoost predictions is a little bit more interesting. XGBoost cannot serve predictions concurrently because of internal data structure locks. This is common to many other machine learning algorithms as well, because making predictions can temporarily modify internal components of the model. - -PostgresML bypasses that limitation because of how Postgres itself handles concurrency: - -
- -_PostgresML concurrency_ - -PostgreSQL uses the fork/multiprocessing architecture to serve multiple clients concurrently: each new client connection becomes an independent OS process. During connection startup, PostgresML loads all models inside the process' memory space. This means that each connection has its own copy of the XGBoost model and PostgresML ends up serving multiple XGBoost predictions at the same time without any lock contention. - -## Results - -We ran over 100 different benchmarks, by changing the number of clients, poolers, replicas, and XGBoost predictions we requested. The benchmarks were meant to test the limits of each configuration, and what remediations were needed in each scenario. Our raw data is available below. - -One of the tests we ran used 1,000 clients, which were connected to 1, 2, and 5 replicas. The results were exactly what we expected. - -### Linear Scaling - -
- -_Figure panels: Latency, Throughput_
- -Both latency and throughput, the standard measurements of system performance, scale mostly linearly with the number of replicas. Linear scaling is the north star of all horizontally scalable systems, and most are not able to achieve it because of increasing complexity that comes with synchronization. - -Our architecture shares nothing and requires no synchronization. The replicas don't talk to each other and the poolers don't either. Every component has the knowledge it needs (through configuration) to do its job, and they do it well. - -The most impressive result is serving close to a million predictions with an average latency of less than 1ms. You might notice though that `950160.7` isn't quite one million, and that's true. We couldn't reach one million with 1000 clients, so we increased to 2000 and got our magic number: **1,021,692.7 req/sec**, with an average latency of **1.7ms**. - -### Batching Predictions - -Batching is a proven method to optimize performance. If you need to get several data points, batch the requests into one query, and it will run faster than making individual requests. - -We should precede this result by stating that PostgresML does not yet have a batch prediction API as such. Our `pgml.predict()` function can predict multiple points, but we haven't implemented a query pattern to pass multiple rows to that function at the same time. Once we do, based on our tests, we should see a substantial increase in batch prediction performance. - -Regardless of that limitation, we still managed to get better results by batching queries together since Postgres needed to do less query parsing and searching, and we saved on network round trip time as well. - -
- -If batching did not work at all, we would see a linear increase in latency and a linear decrease in throughput. That did not happen; instead, we got a 1.5x improvement by batching 5 predictions together, and a 1.2x improvement by batching 20. A modest success, but a success nonetheless. - -### Graceful Degradation and Queuing - -
- -All systems, at some point in their lifetime, will come under more load than they were designed for; what happens then is an important feature (or bug) of their design. Horizontal scaling is never immediate: it takes a bit of time to spin up additional hardware to handle the load. It can take a second, or a minute, depending on availability, but in both cases, existing resources need to serve traffic the best way they can. - -We were hoping to test PostgresML to its breaking point, but we couldn't quite get there. As the load (number of clients) increased beyond provisioned capacity, the only thing we saw was a gradual increase in latency. Throughput remained roughly the same. This gradual latency increase was caused by simple queuing: the replicas couldn't serve requests concurrently, so the requests had to patiently wait in the poolers. - -
- -_"What's taking so long over there!?"_ - -Among many others, this is a very important feature of any proxy: it's a FIFO queue (first in, first out). If the system is underutilized, queue size is 0 and all requests are served as quickly as physically possible. If the system is overutilized, the queue size increases, holds as the number of requests stabilizes, and decreases back to 0 as the system is scaled up to accommodate new traffic. - -Queueing overall is not desirable, but it's a feature, not a bug. While autoscaling spins up an additional replica, the app continues to work, although a few milliseconds slower, which is a good trade off for not overspending on hardware. - -As the demand on PostgresML increases, the system gracefully handles the load. If the number of replicas stays the same, latency slowly increases, all the while remaining well below acceptable ranges. Throughput holds as well, as increasing number of clients evenly split available resources. - -If we increase the number of replicas, latency decreases and throughput increases, as the number of clients increases in parallel. We get the best result with 5 replicas, but this number is variable and can be changed as needs for latency compete with cost. - -## What's Next - -Horizontal scaling and high availability are fascinating topics in software engineering. Needing to serve 1 million predictions per second is rare, but having the ability to do that, and more if desired, is an important aspect for any new system. - -The next challenge for us is to scale writes horizontally. In the database world, this means sharding the database into multiple separate machines using a hashing function, and automatically routing both reads and writes to the right shards. There are many possible solutions on the market for this already, e.g. 
Citus and Foreign Data Wrappers, but none are as horizontally scalable as we like, although we will incorporate them into our architecture until we build the one we really want. - -For that purpose, we're building our own open source [Postgres proxy](https://github.com/levkk/pgcat/) which we discussed earlier in the article. As we progress further in our journey, we'll be adding more features and performance improvements. - -By combining PgCat with PostgresML, we are aiming to build the next generation of machine learning infrastructure that can power anything from tiny startups to unicorns and massive enterprises, without the data ever leaving our favorite database. - -## Methodology - -### ML - -This time, we used an XGBoost model with 100 trees: - -```postgresql -SELECT * FROM pgml.train( - 'flights', - task => 'regression', - relation_name => 'flights_mat_3', - y_column_name => 'depdelayminutes', - algorithm => 'xgboost', - hyperparams => '{"n_estimators": 100 }', - runtime => 'rust' -); -``` - -and fetched our predictions the usual way: - -```postgresql -SELECT pgml.predict( - 'flights', - ARRAY[ - year, - quarter, - month, - distance, - dayofweek, - dayofmonth, - flight_number_operating_airline, - originairportid, - destairportid, - flight_number_marketing_airline, - departure - ] -) AS prediction -FROM flights_mat_3 LIMIT :limit; -``` - -where `:limit` is the batch size of 1, 5, and 20. - -#### Model - -The model is roughly the same as the one we used in our previous post, with just one extra feature added, which improved R2 a little bit. - -### Hardware - -#### Client - -The client was a `c5n.4xlarge` box on EC2. We chose the `c5n` class to have the 100 GBit NIC, since we wanted it to saturate our network as much as possible. Thousands of clients were simulated using [`pgbench`](https://www.postgresql.org/docs/current/pgbench.html). 
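The post does not list the exact `pgbench` flags that were used, so the following is only a sketch of how one run per batch size could be assembled. It assumes the `pgml.predict` query above is saved in a file (hypothetically `predict.sql`) and that `:limit` is supplied through pgbench's `-D` variable definition. The host name, port, user, database name, thread count, and duration are all placeholders, not values from the benchmark.

```python
import shlex


def pgbench_command(clients, batch_size, jobs=8, seconds=60,
                    script="predict.sql", host="pgcat.internal",
                    port=6432, user="postgres", dbname="postgres"):
    """Return the argv list for one pgbench run (all defaults are hypothetical)."""
    return [
        "pgbench",
        "-n",                          # skip vacuuming pgbench's built-in tables, unused here
        "-c", str(clients),            # number of simulated clients
        "-j", str(jobs),               # pgbench worker threads
        "-T", str(seconds),            # run duration in seconds
        "-f", script,                  # file holding the SELECT pgml.predict(...) query
        "-D", f"limit={batch_size}",   # fills in :limit inside the script
        "-h", host,
        "-p", str(port),
        "-U", user,
        dbname,
    ]


# One command line per batch size mentioned in the text: 1, 5, and 20.
for batch_size in (1, 5, 20):
    print(shlex.join(pgbench_command(clients=1000, batch_size=batch_size)))
```

Building the command as a list rather than a single string keeps quoting out of the picture if it is later handed to `subprocess.run`.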
- -#### PgCat Pooler - -PgCat, written in asynchronous Rust, was running on `c5.xlarge` machines (4 vCPUs, 8GB RAM) with 4 Tokio workers. We used between 1 and 35 machines, and scaled them in increments of 5-20 at a time. - -The pooler did a decent amount of work around parsing queries, making sure they are read-only `SELECT`s, and routing them, at random, to replicas. If any replica was down for any reason, it would route around it to remaining machines. - -#### Postgres Replicas - -Postgres replicas were running on `c5.9xlarge` machines with 36 vCPUs and 72 GB of RAM. The hot dataset fits entirely in memory. The servers were intentionally saturated to maximum capacity before scaling up to test queuing and graceful degradation of performance. - -#### Raw Results - -Raw latency data is available [here](https://static.postgresml.org/benchmarks/reads-latency.csv) and raw throughput data is available [here](https://static.postgresml.org/benchmarks/reads-throughput.csv). - -## Call to Early Adopters - -[PostgresML](https://github.com/postgresml/postgresml/) and [PgCat](https://github.com/levkk/pgcat/) are free and open source. If your organization can benefit from simplified and fast machine learning, get in touch! We can help deploy PostgresML internally, and collaborate on new and existing features. Join our [Discord](https://discord.gg/DmyJP3qJ7U) or [email](mailto:team@postgresml.org) us! - -Many thanks and ❤️ to all those who are supporting this endeavor. We’d love to hear feedback from the broader ML and Engineering community about applications and other real world scenarios to help prioritize our work. You can show your support by starring us on our [Github](https://github.com/postgresml/postgresml/). 
diff --git a/pgml-cms/docs/resources/benchmarks/mindsdb-vs-postgresml.md b/pgml-cms/docs/resources/benchmarks/mindsdb-vs-postgresml.md deleted file mode 100644 index c82d4eea1..000000000 --- a/pgml-cms/docs/resources/benchmarks/mindsdb-vs-postgresml.md +++ /dev/null @@ -1,293 +0,0 @@ ---- -description: "Compare two projects that both aim to provide an SQL interface to ML algorithms and the data they require." ---- - -# MindsDB vs PostgresML - -## Introduction - -There are many ways to do machine learning with data in a SQL database. In this article, we'll compare 2 projects that both aim to provide a SQL interface to machine learning algorithms and the data they require: **MindsDB** and **PostgresML**. We'll look at how they work, what they can do, and how they compare to each other. The **TLDR** is that PostgresML is more opinionated, more scalable, more capable and several times faster than MindsDB. On the other hand, MindsDB is 5 times more mature than PostgresML according to age and GitHub Stars. What are the important factors? - -_We're occasionally asked what the difference is between PostgresML and MindsDB. We'd like to answer that question at length, and let you decide if the reasoning is fair._ - -### At a glance - -Both projects are Open Source, although PostgresML allows for more permissive use with the MIT license, compared to the GPL-3.0 license used by MindsDB. PostgresML is also a significantly newer project, with the first commit in 2022, compared to MindsDB which has been around since 2017, but one of the first hints at the real differences between the two projects is the choice of programming languages. MindsDB is implemented in Python, while PostgresML is implemented with Rust. I say _in_ Python, because it's a language with a runtime, and _with_ Rust, because it's a language with a compiler that does not require a runtime. We'll see how this difference in implementation languages leads to different outcomes. 
- -| | MindsDB | PostgresML | -| -------- | ------- | ---------- | -| Age | 5 years | 1 year | -| License | GPL-3.0 | MIT | -| Language | Python | Rust | - -### Algorithms - -Both Projects integrate several dozen machine learning algorithms, including the latest LLMs from Hugging Face. - -| | MindsDB | PostgresML | -| ----------------- | ------- | ---------- | -| Classification | ✅ | ✅ | -| Regression | ✅ | ✅ | -| Time Series | ✅ | ✅ | -| LLM Support | ✅ | ✅ | -| Embeddings | - | ✅ | -| Vector Support | - | ✅ | -| Full Text Search | - | ✅ | -| Geospatial Search | - | ✅ | - -Both MindsDB and PostgresML support many classical machine learning algorithms to do classification and regression. They are both able to load ~~the latest LLMs~~ some models from Hugging Face, supported by underlying implementations in libtorch. I had to cross that out after exploring all the caveats in the MindsDB implementations. PostgresML supports the models released immediately as long as underlying dependencies are met. MindsDB has to release an update to support any new models, and their current model support is extremely limited. New algorithms, tasks, and models are constantly released, so it's worth checking the documentation for the latest list. - -Another difference is that PostgresML also supports embedding models, and closely integrates them with vector search inside the database, which is well beyond the scope of MindsDB, since it's not a database at all. PostgresML has direct access to all the functionality provided by other Postgres extensions, like vector indexes from [pgvector](https://github.com/pgvector/pgvector) to perform efficient KNN & ANN vector recall, or [PostGIS](http://postgis.net/) for geospatial information as well as built in full text search. 
Multiple algorithms and extensions can be combined in compound queries to build state-of-the-art systems, like search and recommendations or fraud detection that generate an end to end result with a single query, something that might take a dozen different machine learning models and microservices in a more traditional architecture. - -### Architecture - -The architectural implementations for these projects are significantly different. PostgresML takes a data centric approach with Postgres as the provider for both storage _and_ compute. To provide horizontal scalability for inference, the PostgresML team has also created [PgCat](https://github.com/postgresml/pgcat) to distribute workloads across many Postgres databases. On the other hand, MindsDB takes a service oriented approach that connects to various databases over the network. - -
- -| | MindsDB | PostgresML | -| ------------- | ------------- | ---------- | -| Data Access | Over the wire | In process | -| Multi Process | ✅ | ✅ | -| Database | - | ✅ | -| Replication | - | ✅ | -| Sharding | - | ✅ | -| Cloud Hosting | ✅ | ✅ | -| On Premise | ✅ | ✅ | -| Web UI | ✅ | ✅ | - -The difference in architecture leads to different tradeoffs and challenges. There are already hundreds of ways to get data into and out of a Postgres database, from just about every other service, language and platform that makes PostgresML highly compatible with other application workflows. On the other hand, the MindsDB Python service accepts connections from specifically supported clients like `psql` and provides a pseudo-SQL interface to the functionality. The service will parse incoming MindsDB commands that look similar to SQL (but are not), for tasks like configuring database connections, or doing actual machine learning. These commands typically have what looks like a sub-select, that will actually fetch data over the wire from configured databases for Machine Learning training and inference. - -MindsDB is actually a pretty standard Python microservice based architecture that separates data from compute over the wire, just with an SQL like API, instead of gRPC or REST. MindsDB isn't actually a DB at all, but rather an ML service with adapters for just about every database that Python can connect to. - -On the other hand, PostgresML runs ML algorithms inside the database itself. It shares memory with the database, and can access data directly, using pointers to avoid the serialization and networking overhead that frequently dominates data hungry machine learning applications. Rust is an important language choice for PostgresML because its memory safety simplifies the effort required to achieve stability along with performance in a large and complex memory space. The "tradeoff", is that it requires a Postgres database to actually host the data it operates on. 
- -In addition to the extension, PostgresML relies on PgCat to scale Postgres clusters horizontally using both sharding and replication strategies to provide both scalable compute and storage. Scaling a low latency and high availability feature store is often the most difficult operational challenge for Machine Learning applications. That's the primary driver of PostgresML's architectural choices. MindsDB leaves those issues as an exercise for the adopter, while also introducing a new single service bottleneck for ML compute implemented in Python. - -## Benchmarks - -If you missed our previous article benchmarking PostgresML vs Python Microservices, spoiler alert, PostgresML is 8-40x faster than Python microservice architectures that do the same thing, even if they use "specialized" in memory databases like Redis. The network transit cost as well as data serialization is a major cost for data hungry machine learning algorithms. Since MindsDB doesn't actually provide a DB, we'll create a synthetic benchmark that doesn't use stored data in a database (even though that's the whole point of SQL ML, right?). This will negate the network serialization and transit costs a MindsDB service would typically incur, and highlight the performance differences between Python and Rust implementations. - -#### PostgresML - -We'll connect to our Postgres server running locally: - -```commandline -psql postgres://postgres:password@127.0.0.1:5432 -``` - -For both implementations, we can just pass in our data as part of the query for an apples to apples performance comparison. PostgresML adds the `pgml.transform` function, that takes an array of inputs to transform, given a task and model, without any setup beyond installing the extension. Let's see how long it takes to run a sentiment analysis model on a single sentence: - -!!! generic - -!!! 
code\_block time="4769.337 ms" - -```postgresql -SELECT pgml.transform( - inputs => ARRAY[ - 'I am so excited to benchmark deep learning models in SQL. I can not wait to see the results!' - ], - task => '{ - "task": "text-classification", - "model": "cardiffnlp/twitter-roberta-base-sentiment" - }'::JSONB -); -``` - -!!! - -!!! results - -| positivity | -| ---------------------------------------------------- | -| \[{"label": "LABEL\_2", "score": 0.990081250667572}] | - -!!! - -!!! - -The first time `transform` is run with a particular model name, it will download that pretrained transformer from HuggingFace, and load it into RAM, or VRAM if a GPU is available. In this case, that took about 5 seconds, but let's see how fast it is now that the model is cached. - -!!! generic - -!!! code\_block time="45.094 ms" - -```postgresql -SELECT pgml.transform( - inputs => ARRAY[ - 'I don''t really know if 5 seconds is fast or slow for deep learning. How much time is spent downloading vs running the model?' - ], - task => '{ - "task": "text-classification", - "model": "cardiffnlp/twitter-roberta-base-sentiment" - }'::JSONB -); -``` - -!!! - -!!! results - -| transform | -| ------------------------------------------------------ | -| \[{"label": "LABEL\_1", "score": 0.49658918380737305}] | - -!!! - -!!! - -45ms is below the level of human perception, so we could use a deep learning model like this to build an interactive application that feels instantaneous to our users. It's worth noting that PostgresML will automatically use a GPU if it's available. This benchmark machine includes an NVIDIA RTX 3090. We can also check the speed on CPU only, by setting the `device` argument to `cpu`: - -!!! generic - -!!! code\_block time="165.036 ms" - -```postgresql -SELECT pgml.transform( - inputs => ARRAY[ - 'Are GPUs really worth it? Sometimes they are more expensive than the rest of the computer combined.' 
- ], - task => '{ - "task": "text-classification", - "model": "cardiffnlp/twitter-roberta-base-sentiment", - "device": "cpu" - }'::JSONB -); -``` - -!!! - -!!! results - -| transform | -| ----------------------------------------------------- | -| \[{"label": "LABEL\_0", "score": 0.7333963513374329}] | - -!!! - -!!! - -The GPU is able to run this model about 4x faster than the i9-13900K with 24 cores. - -#### Model Outputs - -You might have noticed that the `inputs` the model was analyzing got less positive over time, and the model moved from `LABEL_2` to `LABEL_1` to `LABEL_0`. Some models use more descriptive outputs, but in this case I had to look at the [README](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment/blob/main/README.md) to see what the labels represent. - -Labels: - -* 0 -> Negative -* 1 -> Neutral -* 2 -> Positive - -It looks like this model did correctly pick up on the decreasing enthusiasm in the text, so not only is it relatively fast on a GPU, it's usefully accurate. Another thing to consider when it comes to model quality is that this model was trained on tweets, and these inputs were chosen to be about as long and complex as a tweet. It's not always clear how well a model will generalize to novel looking inputs, so it's always important to do a little reading about a model when you're looking for ways to test and improve the quality of its output. - -#### MindsDB - -MindsDB requires a bit more setup than just the database, but I'm running it on the same machine with the latest version. I'll also use the same model, so we can compare apples to apples. - -```commandline -python -m mindsdb --api postgres -``` - -Then we can connect to this Python service with our Postgres client: - -``` -psql postgres://mindsdb:123@127.0.0.1:55432 -``` - -And turn timing on to see how long it takes to run the same query: - -```postgresql -\timing on -``` - -And now we can issue some MindsDB pseudo sql: - -!!! 
code\_block time="277.722 ms" - -``` -CREATE MODEL mindsdb.sentiment_classifier -PREDICT sentiment -USING - engine = 'huggingface', - task = 'text-classification', - model_name = 'cardiffnlp/twitter-roberta-base-sentiment', - input_column = 'text', - labels = ['negativ', 'neutral', 'positive']; -``` - -!!! - -This kicked off a background job in the Python service to download the model and set it up, which took about 4 seconds judging from the logs, but I don't have an exact time for exactly when the model became "status: complete" and was ready to handle queries. - -Now we can write a query that will make a prediction similar to PostgresML, using the same Huggingface model. - -!!! generic - -!!! code\_block time="741.650 ms" - -``` -SELECT * -FROM mindsdb.sentiment_classifier -WHERE text = 'I am so excited to benchmark deep learning models in SQL. I can not wait to see the results!' -``` - -!!! - -!!! results - -| sentiment | sentiment\_explain | text | -| --------- | -------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------- | -| positive | {"positive": 0.990081250667572, "neutral": 0.008058485575020313, "negativ": 0.0018602772615849972} | I am so excited to benchmark deep learning models in SQL. I can not wait to see the results! | - -!!! - -!!! - -Since we've provided the MindsDB model with more human-readable labels, they're reusing those (including the negativ typo), and returning all three scores along with the input by default. However, this seems to be a bit slower than anything we've seen so far. Let's try to speed it up by only returning the label without the full sentiment\_explain. - -!!! generic - -!!! code\_block time="841.936 ms" - -``` -SELECT sentiment -FROM mindsdb.sentiment_classifier -WHERE text = 'I am so excited to benchmark deep learning models in SQL. I can not wait to see the results!' -``` - -!!! - -!!! 
results - -| sentiment | -| --------- | -| positive | - -!!! - -!!! - -It's not the sentiment\_explain that's slowing it down. I spent several hours debugging, and learned a lot more about the internal Python service architecture. I've confirmed that even though inside the Python service, `torch.cuda.is_available()` returns `True` when the service starts, I never see a Python process use the GPU with `nvidia-smi`. MindsDB also claims to run on GPU, but I haven't been able to find any documentation, or indication in the code why it doesn't "just work". I'm stumped on this front, but I think it's fair to assume this is a pure CPU benchmark. - -The other thing I learned trying to get this working is that MindsDB isn't just a single Python process. Python famously has a GIL that will impair parallelism, so the MindsDB team has cleverly built a service that can run multiple Python processes in parallel. This is great for scaling out, but it means that our query is serialized to JSON and sent to a worker, and then the worker actually runs the model and sends the results back to the parent, again as JSON, which as far as I can tell is where the 5x slow-down is happening. - -## Results - -PostgresML is the clear winner in terms of performance. It seems to me that it currently also supports more models with a looser function API than the pseudo SQL required to create a MindsDB model. You'll notice the output structure for models on HuggingFace can vary widely. I tried several not listed in the MindsDB documentation, but received errors on creation. PostgresML just returns the model's output without restructuring, so it's able to handle more discrepancies, although that does leave it up to the end user to sort out how to use models. 
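As a rough aid for reading the results table in this section, the per-query timings (in milliseconds) can be turned into speedup ratios. The numbers below are copied from that table; the dictionary keys and the `speedups` helper are illustrative names only, not part of either project.

```python
# Query times in milliseconds, copied from the results table in this section.
timings_ms = {
    "text-classification": {"mindsdb": 741, "pgml_cpu": 165, "pgml_gpu": 45},
    "translation_en_to_es": {"mindsdb": 1573, "pgml_cpu": 1148, "pgml_gpu": 294},
    "summarization": {"mindsdb": 4289, "pgml_cpu": 3450, "pgml_gpu": 479},
}


def speedups(row):
    """Return how many times faster PostgresML was than MindsDB for one task."""
    return {
        "cpu": row["mindsdb"] / row["pgml_cpu"],
        "gpu": row["mindsdb"] / row["pgml_gpu"],
    }


for task, row in timings_ms.items():
    ratio = speedups(row)
    print(f"{task}: {ratio['cpu']:.1f}x on CPU, {ratio['gpu']:.1f}x on GPU")
```

The CPU-only ratio for the text-classification model lands in the same ballpark as the roughly 5x slow-down attributed above to JSON serialization between MindsDB's Python processes, and the CPU ratios shrink for the larger models.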
- -| task | model | MindsDB | PostgresML CPU | PostgresML GPU | -| ----------------------- | ----------------------------------------- | ------- | -------------- | -------------- | -| text-classification | cardiffnlp/twitter-roberta-base-sentiment | 741 | 165 | 45 | -| translation\_en\_to\_es | t5-base | 1573 | 1148 | 294 | -| summarization | sshleifer/distilbart-cnn-12-6 | 4289 | 3450 | 479 | - -There is a general trend, the larger and slower the model is, the more work is spent inside libtorch, the less the performance of the rest matters, but for interactive models and use cases there is a significant difference. We've tried to cover the most generous use case we could between these two. If we were to compare XGBoost or other classical algorithms, that can have sub millisecond prediction times in PostgresML, the 20ms Python service overhead of MindsDB just to parse the incoming query would be hundreds of times slower. - -## Clouds - -Setting these services up is a bit of work, even for someone heavily involved in the day-to-day machine learning mayhem. Managing machine learning services and databases at scale requires a significant investment over time. Both services are available in the cloud, so let's see how they compare on that front as well. - -MindsDB is available on the AWS marketplace on top of your own hardware instances. You can scale it out and configure your data sources through their Web UI, very similar to the local installation, but you'll also need to figure out your data sources and how to scale them for machine learning workloads. Good luck! - -PostgresML is available as a fully managed database service, that includes the storage, backups, metrics, and scalability through PgCat that large ML deployments need. 
End-to-end machine learning is rarely just about running the models, and often more about scaling the data pipelines and managing the data infrastructure around them, so in this case PostgresML also provides a large service advantage, whereas with MindsDB, you'll still need to figure out your cloud data storage solution independently. diff --git a/pgml-cms/docs/resources/benchmarks/postgresml-is-8-40x-faster-than-python-http-microservices.md b/pgml-cms/docs/resources/benchmarks/postgresml-is-8-40x-faster-than-python-http-microservices.md deleted file mode 100644 index c5812fd56..000000000 --- a/pgml-cms/docs/resources/benchmarks/postgresml-is-8-40x-faster-than-python-http-microservices.md +++ /dev/null @@ -1,177 +0,0 @@ ---- -description: PostgresML is a simpler alternative to that ever-growing complexity. ---- - -# PostgresML is 8-40x faster than Python HTTP microservices - -Machine learning architectures can be some of the most complex, expensive and _difficult_ arenas in modern systems. The number of technologies and the amount of required hardware compete for tightening headcount, hosting, and latency budgets. Unfortunately, the trend in the industry is only getting worse along these lines, with increased usage of state-of-the-art architectures that center around data warehouses, microservices and NoSQL databases. - -PostgresML is a simpler alternative to that ever-growing complexity. In this post, we explore some additional performance benefits of a more elegant architecture and discover that PostgresML outperforms traditional Python microservices by a **factor of 8** in local tests and by a **factor of 40** on AWS EC2. - -## Candidate architectures - -To consider Python microservices with every possible advantage, our first benchmark is run with Python and Redis located on the same machine. Our goal is to avoid any additional network latency, which puts it on a more even footing with PostgresML. 
Our second test takes place on AWS EC2, with Redis and Gunicorn separated by a network; this benchmark proves to be relatively devastating. - -The full source code for both benchmarks is available on [Github](https://github.com/postgresml/postgresml/tree/master/pgml-cms/docs/blog/benchmarks/python\_microservices\_vs\_postgresml). - -### PostgresML - -PostgresML architecture is composed of: - -1. A PostgreSQL server with PostgresML v2.0 -2. [pgbench](https://www.postgresql.org/docs/current/pgbench.html) SQL client - -### Python - -Python architecture is composed of: - -1. A Flask/Gunicorn server accepting and returning JSON -2. CSV file with the training data -3. Redis feature store with the inference dataset, serialized with JSON -4. [ab](https://httpd.apache.org/docs/2.4/programs/ab.html) HTTP client - -### ML - -Both architectures host the same XGBoost model, running predictions against the same dataset. See [Methodology](../../benchmarks/broken-reference/) for more details. - -## Results - -### Throughput - -
- -Throughput is defined as the number of XGBoost predictions the architecture can serve per second. In this benchmark, PostgresML outperformed Python and Redis, running on the same machine, by a **factor of 8**. - -In Python, most of the bottleneck comes from having to fetch and deserialize Redis data. Since the features are externally stored, they need to be passed through Python and into XGBoost. XGBoost itself is written in C++, and its Python library only provides a convenient interface. The prediction coming out of XGBoost has to go through Python again, serialized as JSON, and sent via HTTP to the client. - -This is pretty much the bare minimum amount of work you can do for an inference microservice. - -PostgresML, on the other hand, collocates data and compute. It fetches data from a Postgres table, which already comes in a standard floating point format, and the Rust inference layer forwards it to XGBoost via a pointer. - -An interesting thing happened when the benchmark hit 20 clients: PostgresML throughput started to quickly decrease. This may be surprising to some, but to Postgres enthusiasts it's a known issue: Postgres isn't very good at handling more concurrent active connections than CPU threads. To mitigate this, we introduced PgBouncer (a Postgres proxy and pooler) in front of the database, and the throughput increased back up, and continued to hold as we went to 100 clients. - -It's worth noting that the benchmarking machine had only 16 available CPU threads (8 cores). If more cores were available, the bottleneck would only occur with more clients. The general recommendation for Postgres servers is to open around 2 connections per available CPU core, although newer versions of PostgreSQL have been incrementally chipping away at this limitation. - -#### Why throughput is important - -Throughput allows you to do more with less. 
If you're able to serve 30,000 queries per second using a single machine, but only using 1,000 today, you're unlikely to need an upgrade anytime soon. On the other hand, if the system can only serve 5,000 requests, an expensive and possibly stressful upgrade is in your near future. - -### Latency - -
- -Latency is defined as the time it takes to return a single XGBoost prediction. Since most systems have limited resources, throughput directly impacts latency (and vice versa). If there are many active requests, clients waiting in the queue take longer to be serviced, and overall system latency increases. - -In this benchmark, PostgresML outperformed Python by a **factor of 8** as well. You'll note the same issue happens at 20 clients, and the same mitigation using PgBouncer reduces its impact. Meanwhile, Python's latency continues to increase substantially. - -Latency is a good metric to use when describing the performance of an architecture. In other words, if I were to use this service, I would get a prediction back in at most this long, irrespective of how many other clients are using it. - -#### Why latency is important - -Latency is important in machine learning services because they are often running as an addition to the main application, and sometimes have to be accessed multiple times during the same HTTP request. - -Let's take the example of an e-commerce website. A typical storefront wants to show many personalization models concurrently. Examples of such models could include "buy it again" recommendations for recurring purchases (binary classification), or "popular items in your area" (geographic clustering of purchase histories) or "customers like you bought this item" (nearest neighbour model). - -All of these models are important because they have been proven, over time, to be very successful at driving purchases. If inference latency is high, the models start to compete for very expensive real estate, front page and checkout, and the business has to drop some of them or, more likely, suffer from slow page loads. Nobody likes a slow app when they are trying to order groceries or dinner. - -### Memory utilization - -
- -Python is known for using more memory than more optimized languages and, in this case, it uses **7 times** more than PostgresML. - -PostgresML is a Postgres extension, and it shares RAM with the database server. Postgres is very efficient at fetching and allocating only the memory it needs: it reuses `shared_buffers` and OS page cache to store rows for inference, and requires very little to no memory allocation to serve queries. - -Meanwhile, Python must allocate memory for each feature it receives from Redis and for each HTTP response it returns. This benchmark did not measure Redis memory utilization, which is an additional and often substantial cost of running traditional machine learning microservices. - -#### Training - -
- -Since Python often uses Pandas to load and preprocess data, it is notably more memory hungry. Before even passing the data into XGBoost, we were already at 8GB RSS (resident set size); during actual fitting, memory utilization went to almost 12GB. This test is another best case scenario for Python, since the data has already been preprocessed, and was merely passed on to the algorithm. - -Meanwhile, PostgresML enjoys sharing RAM with the Postgres server and only allocates the memory needed by XGBoost. The dataset size was significant, but we managed to train the same model using only 5GB of RAM. PostgresML therefore allows training models on datasets at least twice as large as Python, all the while using identical hardware. - -#### Why memory utilization is important - -This is another example of doing more with less. Most machine learning algorithms, outside of FAANG and research universities, require the dataset to fit into the memory of a single machine. Distributed training is not where we want it to be, and there is still so much value to be extracted from simple linear regressions. - -Using less RAM allows you to train larger and better models on larger and more complete datasets. If you happen to suffer from large machine learning compute bills, using less RAM can be a pleasant surprise at the end of your fiscal year. - -## What about UltraJSON/MessagePack/Serializer X? - -We spent a lot of time talking about serialization, so it makes sense to look at prior work in that field. - -JSON is the most user-friendly format, but it's certainly not the fastest. MessagePack and UltraJSON, for example, are sometimes faster and more efficient at reading and storing binary information. So, would using them in this benchmark be better, instead of Python's built-in `json` module? - -The answer is: not really. - -
- -
- -Time to (de)serialize is important, but ultimately needing (de)serialization in the first place is the bottleneck. Taking data out of a remote system (e.g. a feature store like Redis), sending it over a network socket, parsing it into a Python object (which requires memory allocation), only to convert it again to a binary type for XGBoost, is causing unnecessary delays in the system. - -PostgresML does **one in-memory copy** of features from Postgres. No network, no (de)serialization, no unnecessary latency. - -## What about the real world? - -Testing over localhost is convenient, but it's not the most realistic benchmark. In production deployments, the client and the server are on different machines, and in the case of the Python + Redis architecture, the feature store is yet another network hop away. - -To demonstrate this, we spun up 3 EC2 instances and ran the benchmark again. This time, PostgresML outperformed Python and Redis **by a factor of 40**. - -
- -
- -The network gap between Redis and Gunicorn made things worse... a lot worse. Fetching data from a remote feature store added milliseconds to each request that the Python architecture could not spare. The additional latency compounded, and in a system that has finite resources, caused contention. Most Gunicorn threads were simply waiting on the network, and thousands of requests were stuck in the queue. - -PostgresML didn't have this issue, because the features and the Rust inference layer live on the same system. This architectural choice removes network latency and (de)serialization from the equation. - -You'll note the concurrency issue we discussed earlier hit Postgres at 20 connections, and we used PgBouncer again to save the day. - -Scaling Postgres, once you know how to do it, isn't as difficult as it sounds. - -## Methodology - -### Hardware - -Both the client and the server in the first benchmark were located on the same machine. Redis was local as well. The machine is an 8-core, 16-thread AMD Ryzen 7 5800X with 32GB RAM and a 1TB NVMe SSD, running Ubuntu 22.04. - -AWS EC2 benchmarks were done with one `c5.4xlarge` instance hosting Gunicorn and PostgresML, and two `c5.large` instances hosting the client and Redis, respectively. They were located in the same VPC. - -### Configuration - -Gunicorn was running with 5 workers and 2 threads per worker. Postgres was using 1, 5 and 20 connections for 1, 5 and 20 clients, respectively. PgBouncer was given a `default_pool_size` of 10, so a maximum of 10 Postgres connections were used for 20 and 100 clients. - -XGBoost was allowed to use 2 threads during inference, and all available CPU cores (16 threads) during training. - -Both `ab` and `pgbench` use all available resources, but are very lightweight; the requests were a single JSON object and a single query, respectively. 
Both clients use persistent connections: `ab` by using HTTP Keep-Alives, and `pgbench` by keeping the Postgres connection open for the duration of the benchmark. - -## ML - -### Data - -We used the [Flight Status Prediction](https://www.kaggle.com/datasets/robikscube/flight-delay-dataset-20182022) dataset from Kaggle. After some post-processing, it ended up being about 2 GB of floating point features. We didn't use all columns because some of them are redundant, e.g. airport name and airport identifier, which refer to the same thing. - -### Model - -Our XGBoost model was trained with default hyperparameters and 25 estimators (also known as boosting rounds). - -Data used for training and inference is available [here](https://static.postgresml.org/benchmarks/flights.csv). Data stored in the Redis feature store is available [here](https://static.postgresml.org/benchmarks/flights\_sub.csv). It's only a subset because it was taking hours to load the entire dataset into Redis with a single Python process (28 million rows). Meanwhile, Postgres `COPY` only took about a minute. - -The PostgresML model is trained with: - -```postgresql -SELECT * FROM pgml.train( - project_name => 'r2', - algorithm => 'xgboost', - hyperparams => '{ "n_estimators": 25 }' -); -``` - -It had terrible accuracy (as did the Python version), probably because we were missing any kind of weather information, and weather is the most likely cause of delays at airports. - -### Source code - -Benchmark source code can be found on [GitHub](https://github.com/postgresml/postgresml/tree/master/pgml-cms/docs/blog/benchmarks/python\_microservices\_vs\_postgresml/). 
diff --git a/pgml-cms/docs/resources/developer-docs/README.md b/pgml-cms/docs/resources/developer-docs/README.md deleted file mode 100644 index b9194723c..000000000 --- a/pgml-cms/docs/resources/developer-docs/README.md +++ /dev/null @@ -1,2 +0,0 @@ -# Developer Docs - diff --git a/pgml-cms/docs/summary_draft.md b/pgml-cms/docs/summary_draft.md deleted file mode 100644 index 6f18c1972..000000000 --- a/pgml-cms/docs/summary_draft.md +++ /dev/null @@ -1,154 +0,0 @@ -# Table of contents - -## Introduction - -* [Overview](README.md) -* [Getting started](introduction/getting-started/README.md) - * [Create your database](introduction/getting-started/create-your-database.md) - * [Connect your app](introduction/getting-started/connect-your-app.md) -* [Import your data](introduction/getting-started/import-your-data/README.md) - * [Logical replication](introduction/getting-started/import-your-data/logical-replication/README.md) - * [Foreign Data Wrappers](introduction/getting-started/import-your-data/foreign-data-wrappers.md) - * [Move data with COPY](introduction/getting-started/import-your-data/copy.md) - * [Migrate with pg_dump](introduction/getting-started/import-your-data/pg-dump.md) - -## API - -* [Overview](api/overview.md) -* [SQL extension](api/sql-extension/README.md) - * [pgml.embed()](api/sql-extension/pgml.embed.md) - * [pgml.transform()](api/sql-extension/pgml.transform/README.md) - * [Fill-Mask](api/sql-extension/pgml.transform/fill-mask.md) - * [Question answering](api/sql-extension/pgml.transform/question-answering.md) - * [Summarization](api/sql-extension/pgml.transform/summarization.md) - * [Text classification](api/sql-extension/pgml.transform/text-classification.md) - * [Text Generation](api/sql-extension/pgml.transform/text-generation.md) - * [Text-to-Text Generation](api/sql-extension/pgml.transform/text-to-text-generation.md) - * [Token Classification](api/sql-extension/pgml.transform/token-classification.md) - * 
[Translation](api/sql-extension/pgml.transform/translation.md) - * [Zero-shot Classification](api/sql-extension/pgml.transform/zero-shot-classification.md) - * [pgml.deploy()](api/sql-extension/pgml.deploy.md) - * [pgml.decompose()](api/sql-extension/pgml.decompose.md) - * [pgml.chunk()](api/sql-extension/pgml.chunk.md) - * [pgml.generate()](api/sql-extension/pgml.generate.md) - * [pgml.predict()](api/sql-extension/pgml.predict/README.md) - * [Batch Predictions](api/sql-extension/pgml.predict/batch-predictions.md) - * [pgml.train()](api/sql-extension/pgml.train/README.md) - * [Regression](api/sql-extension/pgml.train/regression.md) - * [Classification](api/sql-extension/pgml.train/classification.md) - * [Clustering](api/sql-extension/pgml.train/clustering.md) - * [Decomposition](api/sql-extension/pgml.train/decomposition.md) - * [Data Pre-processing](api/sql-extension/pgml.train/data-pre-processing.md) - * [Hyperparameter Search](api/sql-extension/pgml.train/hyperparameter-search.md) - * [Joint Optimization](api/sql-extension/pgml.train/joint-optimization.md) - * [pgml.tune()](api/sql-extension/pgml.tune.md) -* [Client SDK](api/client-sdk/README.md) - * [Collections](api/client-sdk/collections.md) - * [Pipelines](api/client-sdk/pipelines.md) - * [Vector Search](api/client-sdk/search.md) - * [Document Search](api/client-sdk/document-search.md) - * [Tutorials](api/client-sdk/tutorials/README.md) - * [Semantic Search](api/client-sdk/tutorials/semantic-search.md) - * [Semantic Search Using Instructor Model](api/client-sdk/tutorials/semantic-search-1.md) - -## Guides - -* [Embeddings](open-source/pgml/guides/embeddings/README.md) - * [In-database Generation](open-source/pgml/guides/embeddings/in-database-generation.md) - * [Dimensionality Reduction](open-source/pgml/guides/embeddings/dimensionality-reduction.md) - * [Aggregation](open-source/pgml/guides/embeddings/vector-aggregation.md) - * [Similarity](open-source/pgml/guides/embeddings/vector-similarity.md) - * 
[Normalization](open-source/pgml/guides/embeddings/vector-normalization.md) - - - -* [Search](open-source/pgml/guides/improve-search-results-with-machine-learning.md) -* [Chatbots](open-source/pgml/guides/chatbots/README.md) - * [Example Application](use-cases/chatbots.md) -* [Supervised Learning](open-source/pgml/guides/supervised-learning.md) -* [OpenSourceAI](open-source/pgml/guides/opensourceai.md) -* [Natural Language Processing](open-source/pgml/guides/natural-language-processing.md) - - - -## Product - -* [Cloud database](product/cloud-database/README.md) - * [Serverless](product/cloud-database/serverless.md) - * [Dedicated](product/cloud-database/dedicated.md) - * [Enterprise](product/cloud-database/plans.md) -* [Vector database](product/vector-database.md) -* [PgCat pooler](product/pgcat/README.md) - * [Features](product/pgcat/features.md) - * [Installation](product/pgcat/installation.md) - * [Configuration](product/pgcat/configuration.md) - - -## Resources - -* [Architecture](resources/architecture/README.md) - * [Why PostgresML?](resources/architecture/why-postgresml.md) -* [FAQs](resources/faqs.md) -* [Data Storage & Retrieval](resources/data-storage-and-retrieval/README.md) - * [Documents](resources/data-storage-and-retrieval/documents.md) - * [Partitioning](resources/data-storage-and-retrieval/partitioning.md) - * [LLM based pipelines with PostgresML and dbt (data build tool)](resources/data-storage-and-retrieval/llm-based-pipelines-with-postgresml-and-dbt-data-build-tool.md) -* [Benchmarks](resources/benchmarks/postgresml-is-8-40x-faster-than-python-http-microservices.md) - * [PostgresML is 8-40x faster than Python HTTP microservices](resources/benchmarks/postgresml-is-8-40x-faster-than-python-http-microservices.md) - * [Scaling to 1 Million Requests per Second](resources/benchmarks/million-requests-per-second.md) - * [MindsDB vs PostgresML](resources/benchmarks/mindsdb-vs-postgresml.md) - * [GGML Quantized LLM support for Huggingface 
Transformers](resources/benchmarks/ggml-quantized-llm-support-for-huggingface-transformers.md) - * [Making Postgres 30 Percent Faster in Production](resources/benchmarks/making-postgres-30-percent-faster-in-production.md) -* [Developer Docs](resources/developer-docs/README.md) - * [Local Docker Development](resources/developer-docs/quick-start-with-docker.md) - * [Installation](resources/developer-docs/installation.md) - * [Contributing](resources/developer-docs/contributing.md) - * [Distributed Training](resources/developer-docs/distributed-training.md) - * [GPU Support](resources/developer-docs/gpu-support.md) - * [Self-hosting](resources/developer-docs/self-hosting/README.md) - * [Pooler](resources/developer-docs/self-hosting/pooler.md) - * [Building from source](resources/developer-docs/self-hosting/building-from-source.md) - * [Replication](resources/developer-docs/self-hosting/replication.md) - * [Backups](resources/developer-docs/self-hosting/backups.md) - * [Running on EC2](resources/developer-docs/self-hosting/running-on-ec2.md) diff --git a/pgml-cms/docs/use-cases/README.md b/pgml-cms/docs/use-cases/README.md deleted file mode 100644 index 9b163e6e0..000000000 --- a/pgml-cms/docs/use-cases/README.md +++ /dev/null @@ -1 +0,0 @@ -use-cases section is deprecated, and is being refactored into guides, or a new section under product \ No newline at end of file diff --git a/pgml-cms/docs/use-cases/embeddings/README.md b/pgml-cms/docs/use-cases/embeddings/README.md deleted file mode 100644 index 1906c7873..000000000 --- a/pgml-cms/docs/use-cases/embeddings/README.md +++ /dev/null @@ -1,87 +0,0 @@ -# Embeddings - -## Embeddings - -Embeddings are a numeric representation of text. They are used to represent words and sentences as vectors, an array of numbers. 
Embeddings can be used to find similar pieces of text, by comparing the similarity of the numeric vectors using a distance measure, or they can be used as input features for other machine learning models, since most algorithms can't use text directly. - -Many pretrained LLMs can be used to generate embeddings from text within PostgresML. You can browse all the [models](https://huggingface.co/models?library=sentence-transformers) available to find the best solution on Hugging Face. - -PostgresML provides a simple interface to generate embeddings from text in your database. You can use the `pgml.embed` function to generate embeddings for a column of text. The function takes a transformer name and a text value. The transformer will automatically be downloaded and cached for reuse. - -### Long Form Examples - -For a deeper dive, check out the following articles we've written illustrating the use of embeddings: - -* [Generating LLM embeddings in the database with open source models](https://postgresml.org/blog/generating-llm-embeddings-with-open-source-models-in-postgresml) -* [Tuning vector recall while generating query embeddings on the fly](https://postgresml.org/blog/tuning-vector-recall-while-generating-query-embeddings-in-the-database) -* [Personalize embedding results with application data in your database](https://postgresml.org/blog/personalize-embedding-results-with-application-data-in-your-database) - -### API - -```postgresql -pgml.embed( - transformer TEXT, -- huggingface sentence-transformer name - text TEXT, -- input to embed - kwargs JSON -- optional arguments (see below) -) -``` - -### Example - -Let's use the `pgml.embed` function to generate embeddings for tweets, so we can find similar ones. We will use the `distilbert-base-uncased` model. This model is a small version of the `bert-base-uncased` model. It is a good choice for short texts like tweets. To start, we'll load a dataset that provides tweets classified into different topics. 
- -```postgresql -SELECT pgml.load_dataset('tweet_eval', 'sentiment'); -``` - -View some tweets and their topics. - -```postgresql -SELECT * -FROM pgml.tweet_eval -LIMIT 10; -``` - -Get a preview of the embeddings for the first 10 tweets. This will also download the model and cache it for reuse, since it's the first time we've used it. - -```postgresql -SELECT text, pgml.embed('distilbert-base-uncased', text) -FROM pgml.tweet_eval -LIMIT 10; -``` - -It will take a few minutes to generate the embeddings for the entire dataset. We'll save the results to a new table. - -```postgresql -CREATE TABLE tweet_embeddings AS -SELECT text, pgml.embed('distilbert-base-uncased', text) AS embedding -FROM pgml.tweet_eval; -``` - -Now we can use the embeddings to find similar tweets. We'll use the `pgml.cosine_similarity` function to find the tweets that are most similar to a given tweet (or any other text input). - -```postgresql -WITH query AS ( - SELECT pgml.embed('distilbert-base-uncased', 'Star Wars christmas special is on Disney') AS embedding -) -SELECT text, pgml.cosine_similarity(tweet_embeddings.embedding, query.embedding) AS similarity -FROM tweet_embeddings, query -ORDER BY similarity DESC -LIMIT 50; -``` - -On small datasets (<100k rows), a linear search that compares every row to the query will give sub-second results, which may be fast enough for your use case. For larger datasets, you may want to consider various indexing strategies offered by additional extensions. - -* [Cube](https://www.postgresql.org/docs/current/cube.html) is a built-in extension that provides a fast indexing strategy for finding similar vectors. By default it has an arbitrary limit of 100 dimensions, unless Postgres is compiled with a larger size. -* [PgVector](https://github.com/pgvector/pgvector) supports embeddings up to 2000 dimensions out of the box, and provides a fast indexing strategy for finding similar vectors. 
- -```postgresql -CREATE EXTENSION vector; -CREATE TABLE items (text TEXT, embedding VECTOR(768)); -INSERT INTO items SELECT text, embedding FROM tweet_embeddings; -CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops); -WITH query AS ( - SELECT pgml.embed('distilbert-base-uncased', 'Star Wars christmas special is on Disney')::vector AS embedding -) -SELECT * FROM items, query ORDER BY items.embedding <=> query.embedding LIMIT 10; -``` diff --git a/pgml-cms/docs/use-cases/embeddings/generating-llm-embeddings-with-open-source-models-in-postgresml.md b/pgml-cms/docs/use-cases/embeddings/generating-llm-embeddings-with-open-source-models-in-postgresml.md deleted file mode 100644 index e69de29bb..000000000 diff --git a/pgml-cms/docs/use-cases/embeddings/tuning-vector-recall-while-generating-query-embeddings-in-the-database.md b/pgml-cms/docs/use-cases/embeddings/tuning-vector-recall-while-generating-query-embeddings-in-the-database.md deleted file mode 100644 index 96c99a15d..000000000 --- a/pgml-cms/docs/use-cases/embeddings/tuning-vector-recall-while-generating-query-embeddings-in-the-database.md +++ /dev/null @@ -1,502 +0,0 @@ -# Tuning vector recall while generating query embeddings in the database - - -PostgresML makes it easy to generate embeddings using open source models and perform complex queries with vector indexes unlike any other database. The full expressive power of SQL as a query language is available to seamlessly combine semantic, geospatial, and full text search, along with filtering, boosting, aggregation, and ML reranking in low latency use cases. You can do all of this faster, simpler and with higher quality compared to applications built on disjoint APIs like OpenAI + Pinecone. Prove the results in this series to your own satisfaction, for free, by signing up for a GPU accelerated database. 
- -## Introduction - -This article is the second in a multipart series that will show you how to build a post-modern semantic search and recommendation engine, including personalization, using open source models. - -1. Generating LLM Embeddings with HuggingFace models -2. Tuning vector recall with pgvector -3. Personalizing embedding results with application data -4. Optimizing semantic results with an XGBoost ranking model - coming soon! - -The previous article discussed how to generate embeddings that perform better than OpenAI's `text-embedding-ada-002` and save them in a table with a vector index. In this article, we'll show you how to query those embeddings effectively. - - -_Embeddings show us the relationships between rows in the database, using natural language._ - -Our example data is based on 5 million DVD reviews from Amazon customers submitted over a decade. For reference, that's more data than fits in a Pinecone Pod at the time of writing. Webscale: check. Let's start with a quick refresher on the data in our `pgml.amazon_us_reviews` table: - -!!! generic - -!!! code\_block time="107.207ms" - -```postgresql -SELECT * -FROM pgml.amazon_us_reviews -LIMIT 5; -``` - -!!! - -!!! 
results - -| marketplace | customer\_id | review\_id | product\_id | product\_parent | product\_title | product\_category | star\_rating | helpful\_votes | total\_votes | vine | verified\_purchase | review\_headline | review\_body | review\_date | id | review\_embedding\_e5\_large | -| ----------- | ------------ | -------------- | ----------- | --------------- | ----------------------------------------------------------------------------------------------------------------- | ----------------- | ------------ | -------------- | ------------ | ---- | ------------------ | ------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------ | -- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| US | 16164990 | RZKBT035JA0UQ | B00X797LUS | 883589001 | Revenge: Season 4 | Video DVD | 5 | 1 | 2 | 0 | 1 | It's a hit with me | I don't usually watch soap operas, but Revenge grabbed me from the first episode. Now I have all four seasons and can watch them over again. If you like suspense and who done it's, then you will like Revenge. The ending was terrific, not to spoil it for those who haven't seen the show, but it's more fun to start with season one. | 2015-08-31 | 11 | \[-0.44635132,-1.4744929,0.29134354,0.060305085,-0.41350508,0.5875407,-0.061205346,0.3317157,0.3318643,-0.31223094,0.4632605,1.1153598,0.8087972,0.24135485,-0.09573943,-0.6522662,0.3471857,0.06589421,-0.49588993,-0.10770899,-0.12906694,-0.6840891,-0.0079286955,0.6722917,-1.1333038,0.9841143,-0.05413917,-0.63103,0.4891317,0.49941555,0.36425045,-1.1122142,0.39679757,-0.16903037,2.0291917,-0.4769759,0.069017395,-0.13972181,0.26427677,0.05579555,0.7277221,-0.09724414,-0.4079459,0.8500204,-1.4091835,0.020688279,-0.68782306,-0.024399774,1.159901,-0.7870475,0.8028308,-0.48158854,0.7254225,0.31266358,-0.8171888,0.0016202603,0.18997599,1.1948254,-0.027479807,-0.46444815,-0.16508491,0.7332363,0.53439474,0.17962055,-0.5157759,0.6162931,-0.2308871,-1.2384704,0.9215715,0.093228154,-1.0873187,0.44506252,0.6780382,1.4210767,-0.035378184,-0.37101075,0.36248568,-0.20481548,1.7752264,0.96295184,0.25421357,0.32428253,0.15021282,1.2010641,1.3598334,-0.09641862,1.9206793,-0.6621351,-0.19654606,0.9614237,0.8942871,0.06781684,0.6154728,0.5322664,-0.47281718,-0.10806668,0.19615875,1.1427128,1.1363747,-0.7448851,-0.6235285,-0.4178455,0.2823742,0.2022872,0.4639155,-0.
82450366,-1.0911003,0.29300234,0.09920952,0.35992235,-0.89154017,0.6345019,-0.3539376,0.13820754,-0.08596075,-0.016720073,-0.86973023,0.60496914,1.0057746,1.4023327,1.3364636,0.41459054,0.8762501,-0.9326738,-0.62262,0.8540947,0.46354002,-0.5997743,0.14315224,1.276051,0.22685385,-0.27431846,-0.35084888,0.124737024,1.3882787,1.27789,-2.0416644,-1.2735635,0.45739195,-0.5252866,-0.049650192,-1.2893498,-0.13299808,-0.37871423,1.3282262,0.40052852,0.7439125,0.4438182,-0.11048192,0.28375423,-0.641405,-0.393038,-0.5177149,-0.9469533,-1.1396636,-1.2370745,0.36096996,0.02870304,0.5063284,-0.07706672,0.94798875,-0.27705917,-0.29239914,0.31463885,-1.0989273,-0.656829,2.8949435,-0.17305379,0.3815719,0.42526448,0.3081009,0.5685343,0.33076203,0.72707826,0.50143975,0.5845048,0.84975934,0.42427582,0.30121675,0.5989959,-0.7319157,-0.549556,0.63867736,0.012300444,-0.45165,0.6612118,-0.512683,-0.5376379,0.47559577,-0.8463519,-1.1943918,-0.76171356,0.7841424,0.5601279,-0.82258976,-1.0125699,-0.38812968,0.4420742,-0.6571599,-0.06353831,-0.59025985,0.61750174,1.126035,-1.280225,0.04327058,1.0567118,0.5743241,-1.1305283,0.45828968,-0.74915165,-1.0058457,0.44758803,-0.41461354,0.09315924,0.33658516,-0.0040031066,-0.06580057,0.5101937,-0.45152435,0.009831754,-0.86611366,0.71392256,1.3910902,1.0870686,0.7477381,0.96166354,0.27147853,0.044556435,0.6843247,-0.82584035,0.55440176,0.07432493,-0.0876536,0.89933145,-0.20821023,1.0045182,1.3212318,0.0023916673,0.30949935,-0.49783787,-0.0894654,0.42442265,0.16125606,-0.31338125,-0.18276067,0.8512234,0.29042283,1.1811026,0.17194802,0.104081966,-0.17348862,0.3214033,0.05323091,0.452102,0.44595376,-0.54339683,1.2369651,-0.90202415,-0.14463677,-0.40089816,0.4221295,-0.27183273,-0.46332398,0.03636483,-0.4491677,0.11768485,0.25375235,-0.5391649,1.6532613,-0.44395766,0.52174264,0.46777102,-0.6175785,-0.8521162,0.4074876,0.8601743,0.16133149,1.2534949,0.17186514,-1.4400607,0.12929483,0.19184573,-0.10323317,0.17845587,-0.9316995,-0.29608884,-0.15901098,0.1387
9488,0.7077851,0.7130752,-0.33218113,0.65922844,-0.16829759,-0.85618913,-0.50507075,0.04030782,0.28823212,0.63344556,-0.64391583,0.82986885,0.36421177,-0.31541574,0.15703243,-0.6918284,0.07207678,0.10856655,0.1837874,0.20774966,0.5002916,0.36118835,0.15846755,-0.59214884,-0.2806985,-1.4209367,-0.8781769,0.59149474,0.09860907,0.7798751,0.08356752,-0.3816034,0.62692493,1.0605069,0.009612969,-1.1639553,0.0387234,-0.62128127,-0.65425646,0.026634911,0.13652368,-0.31386188,0.5132959,-0.2279612,1.5733948,0.9453454,-0.47791338,-0.86752695,0.2590365,0.010133599,0.0731045,-0.08996825,1.5178722,0.2790404,0.42920277,0.16204502,0.51732993,0.7824352,-0.53204685,0.6322838,0.027865775,0.1909194,0.75459373,0.5329097,-0.25675827,-0.6438361,-0.6730749,0.0419199,1.647542,-0.79603523,-0.039030924,0.57257867,0.97090834,-0.18933444,0.061723463,0.054686982,0.057177402,0.24391848,-0.45859554,0.36363262,-0.028061919,0.5537379,0.23430054,0.06542831,-0.8465644,-0.61477613,-1.8602425,-0.5563627,0.5518607,1.1379824,0.05827968,0.6034838,0.10843904,0.66301763,-0.68257576,0.49940518,-1.0600849,0.3026614,0.20583217,0.45980504,-0.54227024,0.83065176,-0.12527004,0.94367605,-0.22141562,0.2656482,-1.0248334,-0.64097667,0.9686471,-0.2892358,-0.7154707,0.33837032,0.25886488,1.754326,0.040067837,-0.0130331945,1.014779,0.6381671,-0.14163442,-0.6668947,-0.52272713,0.44740087,1.0573436,0.7079764,-0.4765707,-0.45119467,0.33266848,-0.3335042,0.6264001,0.096436426,0.4861287,-0.64570946,-0.55701566,-0.8017526,-0.3268717,0.6509844,0.51674,0.5527258,0.06715509,0.13850002,-0.16415404,0.5339686,0.7038742,-0.23962326,-0.40861428,-0.80195314,-0.2562518,-0.31416067,-0.6004696,0.17173254,-0.08187528,-0.10650221,-0.8317999,0.21745056,0.5430748,-0.95596164,0.47898734,-0.6119156,0.41032174,-0.55160147,0.23355038,0.51838225,0.6097409,0.54803956,-0.64297825,-1.095854,-1.7266736,0.46846822,0.24315582,0.93500775,-1.2847418,-0.09460731,-0.9284272,-0.58228695,0.35412273,-1.338897,0.09689145,-0.9634888,-0.105158746,-0.24354713,-1.
8149018,-0.81706595,0.5610544,0.2604056,-0.15690021,-0.34233433,0.21085337,0.095561,0.3357639,-0.4168723,-0.16001065,0.019738067,-0.25119543,0.21538053,0.9338039,-1.3079301,-0.5274139,0.0042342604,-0.26708132,-1.1157236,0.41096166,-1.0650482,-0.92784685,0.1649683,-0.076478265,-0.89887,-0.49810255,-0.9988228,0.398151,-0.1489247,0.18536144,0.47142923,0.7188731,-0.19373408,-0.43892148,-0.007021479,0.27125278,-0.0755358,-0.21995014,-0.09820049,-1.1432658,-0.6438058,0.45684898,-0.16717891,-0.06339566,-0.54050285,-0.21786614,-0.009872514,0.95797646,-0.6364886,0.06476644,0.15031907,-0.114178315,-0.6920534,0.33618665,-0.20828676,-1.218436,1.0650855,0.92841274,0.15988845,1.5152671,-0.27995184,0.43647304,0.123278655,-1.320316,-0.25041837,0.24997042,0.87653285,0.12610753,-0.8309733,0.5842415,-0.840945,-0.46114716,0.51617026,-0.6507864,1.5720816,0.43062973,-0.7194931,-1.400388,-0.9877925,-0.87884194,0.46331164,-0.51055473,0.24852753,0.30240974,0.12866661,-0.84918654,-0.3372634,0.46535993,0.22479752,0.7400517,0.4833228,1.3157144,1.270739,0.93192166,0.9926317,0.7777536,-0.8000388,-0.22760339,-0.7243004,-0.90151507,-0.73649806,-0.18375495,-0.9876769,-0.22154166,0.15750378,-0.051066816,1.218425,0.58040893,-0.32723624,0.08092578,-0.41428035,-0.8565249,-1.3621647,0.42233124,0.49325675,1.4729465,0.957077,-0.40788552,-0.7064396,0.67477965,0.74812657,0.17461313,1.2278605,0.42229348,0.00287759,1.6320366,0.045381133,0.8773843,-0.23280792,0.025544237,0.75055337,0.8755495,-0.21244618,-0.6180616,-0.019127166,0.55689186,1.2838972,-0.8412692,0.8461143,0.39903468,0.1857164,-0.025012616,-0.8494315,-0.2573743,-1.1831325,-0.5007239,0.5891477,-1.2416826,0.38735542,0.41872358,1.0267426,0.2482442,-0.060767986,0.7538531,-0.24033615,0.9042795,-0.24176258,-0.44520715,0.7715707,-0.6773665,0.9288903,-0.3960447,-0.041194934,0.29724947,0.8664729,0.07247823,-1.7166628,-1.1924342,-1.1135329,0.4729775,0.5345159,0.57545316,0.14463085,-0.34623942,1.2155776,0.24223511,1.3281958,-1.0329959,-1.3902934,0.09121965,0.
18269718,-1.3109862,1.4591801,0.58750343,-0.8072534,0.23610781,-1.4992374,0.71078837,0.25371152,0.85618514,0.807575,1.2301548,-0.27820417,-0.29354396,0.28911537,1.2117325,4.4740834,1.3543533,0.214103,-1.3109514,-0.013579576,-0.53262085,-0.22086248,0.24246897,-0.26330945,0.30646166,-0.21399511,1.5816526,0.64849514,0.31172174,0.57089436,1.0467637,-0.42125005,-0.2877409,0.6157391,-0.6682809,-0.44719923,-0.251028,-1.0622188,-1.5241078,1.3073357,-0.21030799,0.75480264,-1.0422926,0.23265716,0.20796475,0.73489463,0.5507254,-0.04313501,1.30877,0.19338085,0.27448726,0.04000665,-0.7004063,-1.0822202,0.6009482,0.2412081,0.33919787,0.020680452,0.7649121,-0.69652104,-0.5461974,-0.60095215,-0.9746675,0.7837197,1.2018669,-0.23473008,-0.44692823,0.12413922,-1.3088125,-1.4267013,0.82524955,0.8647329,0.16150166,-1.4038807,-0.8987668,0.61025685,-0.8479041,0.59218127,0.65450156,-0.022710972,0.19090322,-0.55995494,0.12569806,0.019536465,-0.5719187,-1.1703067,0.13916619,-1.2546546,0.3547577,-0.6583496,1.4738533,0.15210527,0.045928936,-1.7701638,-1.1357217,0.0656034,0.34817895,-0.9715934,-0.036333986,-0.54871166,-0.28730902,-0.4544463,0.0044411435,-0.091176935,0.5609336,0.8184279,1.7430352,0.14487076,-0.54478693,0.13478011,-0.78083384,-0.5450215,-0.39379802,-0.52507687,0.8898843,-0.46146545,-0.6123672,-0.20210318,0.72413814,-1.3112601,0.20672223,0.73001564,-1.4695473,-0.3112792,-0.048050843,-0.25363198,-1.0228323,-0.071546085,-0.3245472,0.12762389,-0.064207725,-0.46297944,-0.61758167,1.1423731,-1.2279893,1.4896537,-0.61985505,-0.39032778,-1.1789387,-0.05861108,0.33709309,-0.11082967,0.35026795,0.011960861,-0.73383653,-0.5427297,-0.48166794,-1.1341039,-0.07019004,-0.6253811,-0.55956876,-0.87954766,0.0038243965,-1.1747614,-0.2742908,1.3408217,-0.8604027,-0.4190716,1.0705358,-0.17213087,0.2715014,0.8245274,0.06066578,0.82805973,0.47945866,-0.37825295,0.014340248,0.9461009,0.256653,-0.19689955,1.1786914,0.18505198,0.710402,-0.59817654,0.12953508,0.48922333,0.8255816,0.4042885,-0.75975555,0.20
467097,0.018755354,-0.69151515,-0.23537838,0.26312333,0.82981825,-0.10950847,-0.25987357,0.33299834,-0.31744313,-0.4765103,-0.8831548,0.056800444,0.07922315,0.5476093,-0.817339,0.22928628,0.5257919,-1.1328216,0.66853505,0.42755872,-0.18290512,-0.49680132,0.7065077,-0.2543334,0.3081367,0.5692426,0.31948256,0.668704,0.72916716,-0.3097971,0.04443544,0.5626836,1.5217534,-0.51814324,-1.2701787,0.6485761,-0.8157134,-0.74196255,0.7771558,-1.3504819,0.2796807,0.44736814,0.6552933,0.13390358,0.5573986,0.099469736,-0.48586744,-0.16189729,0.40172148,-0.18505138,0.3092212,-0.30285,-0.45625964,0.8346098,-0.14941978,-0.44034964,-0.13228996,-0.45626387,-0.5833162,-0.56918347,-0.10052125,0.011119543,-0.423692,-0.36374965,-1.0971813,0.88712555,0.38785303,-0.22129343,0.19810538,0.75521517,-0.34437984,-0.9454472,-0.006488466,-0.42379746,-0.67618704,-0.25211233,0.2702919,-0.6131363,0.896094,-0.4232919,-0.25754875,-0.39714852,1.4831372,0.064787336,-0.770308,0.036396563,0.2313668,0.5655817,-0.6738516,0.857144,0.77432656,0.1454645,-1.3901217,-0.46331334,0.109622695,0.45570934,0.92387015,-0.011060692,0.30186698,-0.35252112,0.1457121,-0.2570497,0.7082791,-0.30265188,-0.23325084,-0.026542446,-0.17957532,1.1194676,0.59331983,-0.34250805,0.39761257,-0.97051114,0.6302743,-1.0416062,-0.14316575,-0.17302139,0.25761867,-0.62417996,0.427799,-0.26894867,0.4448027,-0.6683409,-1.0712901,-0.49355477,0.46255362,-0.26607195,-0.1882482,-1.0833352,-1.2174416,-0.22160827,-0.63442576,-0.20239262,0.08509241,0.27062747,0.3231089,0.75656915,-0.59737813,0.64800847,-0.3792087,0.06189245,-1.0148673,-0.64977705,0.23959091,0.5693892,0.2220355,0.050067283,-1.1472284,-0.05411025,-0.51574,0.9436675,0.08399284,-0.1538182,-0.087096035,0.22088972,-0.74958104,-0.45439938,-0.9840612,0.18691222,-0.27567235,1.4122254,-0.5019997,0.59119046,-0.3159759,0.18572812,-0.8638007,-0.20484222,-0.22735544,0.009947425,0.08660857,-0.43803024,-0.87153643,0.06910624,1.3576175,-0.5727235,0.001615673,-0.5057925,0.93217665,-1.0369575,-0.886408
3,-0.76695895,-0.6097337,0.046172515,0.4706499,-0.43419397,-0.7006992,-1.2508268,-0.5113818,0.96917367,-0.65436345,-0.83149797,-0.9900211,0.38023964,0.16216993,-0.11047968] | -| US | 33386989 | R253N5W74SM7N3 | B00C6MXB42 | 734735137 | YOUNG INDIANA JONES CHRONICLES Volumes 1, 2 and 3 DVD Sets (Complete Collections All 3 Volumes DVD Sets Together) | Video DVD | 4 | 1 | 1 | 0 | 1 | great stuff. I thought excellent for the kids | great stuff. I thought excellent for the kids. The extras are a must after the movie. | 2015-08-31 | 12 | \[0.30739722,-1.2976353,0.44150844,0.28229898,0.8129836,0.19451006,-0.16999333,-0.07356771,0.5831099,-0.5702598,0.5513152,0.9893058,0.8913247,1.2790804,-0.21743622,-0.13258074,0.5267081,-1.1273692,0.08361904,-0.32674226,-0.7284242,-0.3742802,-0.315159,-0.06914908,-0.9370208,0.5965896,-0.46391407,-0.30802932,0.34784046,0.35328323,-0.06566019,-0.83673024,1.2235038,-0.5311309,1.7232236,0.100425154,-0.42236832,-0.4189702,0.65639615,-0.19411941,0.2861547,-0.011099293,0.6224927,0.2937978,-0.57707405,0.1723467,-1.1128687,-0.23458324,0.85969496,-0.5544667,0.69622403,0.20537117,0.5376313,0.18094051,-0.5935286,0.58459294,0.2588672,1.2592428,0.40739542,-0.3853751,0.5736207,-0.27588457,0.44027475,0.06457652,-0.40556684,-0.25630975,-0.0024269535,-0.63066584,1.435617,-0.41023165,-0.39362282,0.9855966,1.1903448,0.8181575,-0.13602419,-1.1992644,0.057811044,0.17973477,1.3552206,0.38971838,-0.021610033,0.19899082,-0.10303763,1.0268506,0.6143311,-0.21900427,2.4331384,-0.7311581,-0.07520742,0.25789547,0.78391874,-0.48391873,1.4095061,0.3000153,-1.1587081,-0.470519,0.63760203,1.212848,-0.13230722,0.1575143,0.5233601,-0.26733217,0.88544065,1.0455207,0.3242259,-0.08548101,-1.1858246,-0.34827423,0.10947221,0.7657727,-1.1886615,0.5846556,-0.06701131,-0.18275288,0.9688948,-0.44766253,-0.24283795,0.84013104,1.1865685,1.0322199,1.1621728,0.2904784,0.45513308,-0.046442263,-1.5924592,1.1268036,1.2244802,-0.12986387,-0.652806,1.3956618,0.09316843,0.0074809124,-0.409639
98,0.11233859,0.23004606,1.0019808,-1.1334686,-1.6484728,0.17822856,-0.52497756,-0.97292185,-1.3860162,-0.10179921,0.41441512,0.94668996,0.6478229,-0.1378847,0.2240062,0.12373086,0.37892383,-1.0213026,-0.002514686,-0.6206891,-1.2263044,-0.81023514,-2.1251488,-0.05212076,0.5007569,-0.10503322,-0.15165941,0.80570364,-0.67640734,-0.38113695,-0.7051068,-0.7457319,-1.1459444,1.2534835,-0.48408872,0.20323983,0.49218604,-0.01939073,0.42854333,0.871685,0.3215819,-0.016663345,0.492181,0.93779576,0.59563607,1.2095222,-0.1319952,-0.74563706,-0.7584777,-0.06784309,1.0673252,-0.18296064,1.180183,-0.01517544,-0.996551,1.4614015,-0.9834482,-0.8929142,-1.1343371,1.2919606,0.67674285,-1.264175,-0.78025484,-0.91170585,0.6446593,-0.44662225,-0.02165111,-0.34166083,0.23982073,-0.0695019,-0.55098635,0.061257105,0.14019178,0.58004445,-0.22117937,0.20757008,-0.47917584,-0.23402964,0.07655301,-0.28613323,-0.24914591,-0.40391505,-0.53980047,1.0352598,0.08218856,-0.21157777,0.5807184,-1.4730825,0.3812591,0.83882,0.5867736,0.74007905,1.0515761,-0.15946862,1.1032714,0.58210975,-1.3155121,-0.74103445,-0.65089387,0.8670826,0.43553326,-0.6407162,0.47036576,1.5228021,-0.45694724,0.7269809,0.5492361,-1.1711032,0.23924577,0.34736052,-0.12079343,-0.09562126,0.74119747,-0.6178057,1.3842496,-0.24629863,0.16725276,0.543255,0.28207174,0.58856744,0.87834567,0.50831103,-1.2316333,1.2317014,-1.0706112,-0.16112426,0.6000713,0.5483024,-0.13964792,-0.75518215,-0.98008883,0.6262824,-0.056649026,-0.14632829,-0.6952095,1.1196847,0.16559249,0.8219887,0.27358034,-0.37535465,-0.45660818,0.47437778,0.54943615,0.6596993,1.3418778,0.088481836,-1.0798514,-0.20523094,-0.043823265,-0.03007651,0.6147437,-1.2054923,0.21634094,0.5619677,-0.38945594,1.1649859,0.67147845,-0.67930675,0.25937733,-0.41399506,0.14421114,0.8055827,0.11315601,-0.25499323,0.5075335,-0.96640706,0.86042404,0.27332047,-0.262736,0.1961017,-0.85305786,-0.32757896,0.008568222,-0.46760023,-0.5723287,0.353183,0.20126922,-0.022152433,0.39879513,-0.57369196,-1
.1627877,-0.948688,0.54274577,0.52627236,0.7573314,-0.72570753,0.22652717,0.5562541,0.8202502,-1.0198171,-1.3022298,-0.2893229,-0.0275145,-0.46199337,0.119201764,0.73928577,0.05394686,0.5549575,0.5820973,0.5786865,0.4721187,-0.75830203,-1.2166464,-0.83674186,-0.3327995,-0.41074058,0.12167103,0.5753096,-0.39288408,0.101028144,-0.076566614,0.28128016,0.30121502,-0.45290747,0.3249064,0.29726675,0.060289554,1.012353,0.5653782,0.50774586,-1.1048855,-0.89840156,0.04853676,-0.0005516126,-0.43757257,0.52133596,0.90517247,1.2548338,0.032170154,-0.45365888,-0.32101494,0.52082396,0.06505445,-0.016106995,-0.15512307,0.4979914,0.019423941,-0.4410003,0.13686578,-0.55569375,-0.22618975,-1.3745868,0.14976598,0.31227916,0.22514923,-0.09152527,0.9595029,-0.24047574,0.9036276,0.06045522,0.4275914,-1.6211287,0.23627052,-0.123569466,1.0207809,-0.20820981,0.2928954,-0.37402752,-0.39281377,-0.9055283,0.42601687,-0.64971703,-0.83537567,-0.7551133,-0.3613483,-1.2591509,0.38164553,0.23480861,0.67463505,0.4188478,0.30875853,-0.23840418,-0.10466987,-0.45718357,-0.47870898,-0.7566724,-0.124758095,0.8912765,0.37436476,0.123713054,-0.9435858,-0.19343798,-0.7673082,0.45333877,-0.1314696,-0.046679523,-1.0924501,-0.36073965,-0.55994475,-0.25058964,0.6564909,-0.44103456,0.2519441,0.791008,0.7515483,-0.27565363,0.7055519,1.195922,0.37065807,-0.8460473,-0.070156336,0.46037647,-0.42738107,-0.40138105,0.13542275,-0.16810405,-0.17116192,-1.0791,0.094485305,0.499162,-1.3476236,0.21234894,-0.45902762,0.30559424,-0.75315285,-0.18889536,-0.18098111,0.6468135,-0.027758462,-0.4563393,-1.8142252,-1.1079813,0.15492673,0.67000175,1.7885993,-1.163623,-0.19585003,-1.265403,-0.65268534,0.8609888,-0.12089075,0.16340052,-0.40799433,0.1796395,-0.6490773,-1.1581244,-0.69040763,0.9861761,-0.94788885,-0.23661669,-0.26939982,-0.10966676,-0.2558066,0.11404798,0.2280753,1.1175905,1.2406538,-0.8405682,-0.0042185634,0.08700524,-1.490236,-0.83169794,0.80318516,-0.2759455,-1.2379494,1.2254013,-0.574187,-0.589692,-0.30691916,-0.23
825237,-0.26592287,-0.34925,-1.1334181,0.18125409,-0.15863669,0.5677274,0.15621394,0.69536006,-0.7235879,-0.4440141,0.72681504,-0.071697086,-0.28574806,0.1978488,-0.29763848,-1.3379228,-1.7364287,0.4866264,-0.4246215,0.39696288,-0.39847228,-0.43619227,0.74066365,1.3941747,-0.980746,0.28616947,-0.41534734,-0.37235045,-0.3020338,-0.078414746,0.5320422,-0.8390588,0.39802805,0.9956247,0.48060423,1.0830654,-0.3462163,0.1495632,-0.70074755,-1.4337711,-0.47201052,-0.20542778,1.4469681,-0.28534025,-0.8658506,0.43706423,-0.031963903,-1.1208986,0.24726066,-0.15195882,1.6915563,0.48345947,0.36665258,-0.84477395,-0.67024755,-1.3117748,0.5186414,-0.111863896,-0.24438074,0.4496351,-0.16038479,-0.6309886,0.30835655,0.5210999,-0.08546635,0.8993058,0.79404515,0.6026624,1.415141,0.99138695,0.32465398,0.40468198,1.0601974,-0.18599145,-0.13816476,-0.6396179,-0.3233479,0.03862472,-0.17224589,0.09181578,-0.07982533,-0.5043218,1.0261234,0.18545899,-0.49497896,-0.54437244,-0.7879132,0.5358195,-1.6340284,0.25045714,-0.8396354,0.83989215,0.3047345,-0.49021208,0.05403753,1.0338433,0.6628198,-0.3480594,1.3061327,0.54290605,-0.9569749,1.8446399,-0.030642787,0.87419564,-1.2377026,0.026958525,0.50364405,1.1583173,0.38988844,-0.101992935,-0.23575047,-0.3413202,0.7004839,-0.94112486,0.46198457,-0.35058874,-0.039545525,0.23826565,-0.7062571,-0.4111793,0.25476676,-0.6673185,1.0281954,-0.9923886,0.35417762,0.42138654,1.6712382,0.408056,-0.11521088,-0.13972034,-0.14252779,-0.30223042,-0.33124694,-0.811924,0.28540173,-0.7444932,0.45001662,0.24809383,-0.35693368,0.9220196,0.28611687,-0.48261562,-0.41284987,-0.9931806,-0.8012102,-0.06244095,0.27006462,0.12398263,-0.9655248,-0.5692315,0.61817557,0.2861948,1.370767,-0.28261876,-1.6861429,-0.28172758,-0.25411567,-0.61593235,0.9216087,-0.09091336,-0.5353816,0.8020888,-0.508142,0.3009135,1.110475,0.03977944,0.8507262,1.5284235,0.10842794,-0.20826894,0.65857565,0.36973011,4.5352683,0.5847559,-0.11878182,-1.5029415,0.28518912,-1.6161069,0.024860675,-0.044661783,
-0.28830758,-0.3638917,0.10329107,1.0316309,1.9032342,0.7131887,0.5412085,0.624381,-0.058650784,-0.99251175,0.61980045,-0.28385028,-0.79383695,-0.70285636,-1.2722979,-0.91541255,0.68193483,0.2765532,0.34829107,-0.4023206,0.25704393,0.5214571,0.13212398,0.28562054,0.20593974,1.0513201,0.9532814,0.095775016,-0.03877548,-0.33986154,-0.4798648,0.3228808,0.6315719,-0.10437137,0.14374955,0.48003596,-1.2454797,-0.40197062,-0.6159714,-0.6270214,0.25393748,0.72447217,-0.56466436,-0.958443,-0.096530266,-1.5505805,-1.6704174,0.8296298,0.05975852,-0.21028696,-0.5795715,-0.36282688,-0.24036546,-0.41609624,0.43595442,-0.14127952,0.6236689,-0.18053003,-0.38712737,0.70119154,-0.21448976,-0.9455639,-0.48454222,0.8712007,-0.94259155,1.1402144,-1.8355223,0.99784017,-0.10760504,0.01682847,-1.6035974,-1.2844374,0.01041493,0.258503,-0.46182942,-0.55694705,-0.36024556,-0.60274285,-0.7641168,-0.22333422,0.23358914,0.32214895,-0.2880609,2.0434432,0.021884317,-0.026297037,0.6764826,0.0018281384,-1.4232233,0.06965969,-0.6603106,1.7217827,-0.55071676,-0.5765741,0.41212377,0.47296098,-0.74749064,0.8318265,1.0190908,-0.30624846,0.1550751,-0.107695036,0.318128,-0.91269255,-0.084052026,-0.071086854,0.58557767,-0.059559256,-0.25214714,-0.37190074,0.1845709,-1.011793,1.6667081,-0.59240544,0.62364835,-0.87666374,0.5493202,0.15618894,-0.55065084,-1.1594291,0.013051172,-0.58089346,-0.69672656,-0.084555894,-1.002506,-0.12453595,-1.3197669,-0.6465615,0.18977834,0.70997524,-0.1717262,-0.06295184,0.7844014,-0.34741658,-0.79253453,0.50359297,0.12176384,0.43127277,0.51099414,-0.4762928,0.6427185,0.5405122,-0.50845987,-0.9031403,1.4412987,-0.14767419,0.2546413,0.1589461,-0.27697682,-0.2348109,-0.36988798,0.48541197,0.055055868,0.6457861,0.1634515,-0.4656323,0.09907467,-0.14479966,-0.7043871,0.36758122,0.37735868,1.0355871,-0.9822478,-0.19883083,-0.028797302,0.06903542,-0.72867984,-0.83410156,-0.44142655,-0.023862194,0.7508692,-1.2131448,0.73933,0.82066983,-0.9567533,0.8022456,-0.46039414,-0.122145995,-0.57758
415,1.6009285,-0.38629133,-0.719489,-0.26290792,0.2784449,0.4006592,0.7685309,0.021456026,-0.46657726,-0.045093264,0.27306503,0.11820289,-0.010290818,1.4277694,0.37877312,-0.6586902,0.6534258,-0.4882668,-0.013708393,0.5874833,0.67575705,0.0448849,0.79752296,-0.48222196,-0.27727848,0.1908209,-0.37270054,0.2255683,0.49677694,-0.8097378,-0.041833293,1.0997742,0.24664953,-0.13645545,0.60577506,-0.36643773,-0.38665995,-0.30393195,0.8074676,0.71181476,-1.1759185,-0.43375242,-0.54943913,0.60299504,-0.29033506,0.35640588,0.2535554,0.23497777,-0.6322611,-1.0659716,-0.5208576,-0.20098525,-0.70759755,-0.20329496,0.06746797,0.4192544,0.9459473,0.3056658,-0.41945052,-0.6862448,0.92653894,-0.28863263,0.1017883,-0.16960514,0.43107504,0.6719024,-0.19271156,0.84156036,1.4232695,0.23043889,-0.36577883,0.1706496,0.4989679,1.0149425,1.6899607,-0.017684896,0.14658369,-0.5460582,0.25970757,0.21367438,-0.23919336,0.00311709,0.24278529,-0.054968767,-0.1936215,1.0572686,1.1302485,-0.14131032,0.70154583,-0.6389119,0.56687975,-0.7653478,0.73563385,0.34357715,0.54296106,-0.289852,0.8999764,-0.51342,0.42874512,-0.15059376,-0.38104424,-1.255755,0.8929743,0.035588194,-0.032178655,-1.0616962,-1.2204084,-0.23632799,-1.692825,-0.23117402,0.57683736,0.50997025,-0.374657,1.6718119,0.41329297,1.0922033,-0.032909054,0.52968246,-0.15998183,-0.8479956,-0.08485309,1.350768,0.4181131,0.2278139,-0.4233213,0.77379596,0.020778842,1.4049225,0.6989054,0.38101918,-0.14007418,-0.020670284,-0.65089977,-0.9920829,-0.373814,0.31086117,-0.43933883,1.1054604,-0.30419546,0.3853193,-1.0691531,-0.010626761,-1.2146289,-0.41391885,-0.5968098,0.70136315,0.17279832,0.030435344,-0.8829543,-0.27144116,0.045436643,-1.4135028,0.70108044,-0.73424995,1.0382471,0.89125097,-0.6630885,-0.22839329,-0.631642,0.2600539,1.0844377,-0.24859901,-1.2038339,-1.1615102,0.013521354,2.0688252,-1.1227499,0.40164688,-0.57415617,0.18793584,0.39685404,0.27067253] | -| US | 45486371 | R2D5IFTFPHD3RN | B000EZ9084 | 821764517 | Survival Island | Video 
DVD | 4 | 1 | 1 | 0 | 1 | Four Stars | very good | 2015-08-31 | 13 | \[-0.04560827,-1.0738801,0.6053605,0.2644575,0.046181858,0.92946494,-0.14833489,0.12940715,0.45553935,-0.7009164,0.8873173,0.8739785,0.93965644,0.99645066,-0.3013455,0.009464348,0.49103707,-0.31142452,-0.698856,-0.68302655,0.09756764,0.08612168,-0.10133423,0.74844116,-1.1546779,-0.478543,-0.33127898,0.2641717,-0.16090837,0.77208316,-0.20998663,-1.0271599,-0.21180272,-0.441733,1.3920364,-0.29355,-0.14628173,-0.1670586,0.38985613,0.7232808,-0.1478917,-1.2944599,0.079248585,0.804303,-0.22106579,0.17671943,-0.16625091,-0.2116828,1.3004253,-1.0479127,0.7193388,-0.26320568,1.4964588,-0.10538341,-0.3048142,0.35343128,0.2383181,1.8991082,-0.18256101,-0.58556455,0.3282545,-0.5290774,1.0674107,0.5099032,-0.6321608,-0.19459783,-0.33794925,-1.2250574,0.30687732,0.10018553,-0.38825148,0.5468978,0.6464592,0.63404274,0.4275827,-0.4252685,0.20222056,0.37558758,0.67473555,0.43457538,-0.5480667,-0.5751551,-0.5282744,0.6499875,0.74931085,-0.41133487,2.1029837,-0.6469921,-0.36067986,0.87258714,0.9366592,-0.5068644,1.288624,0.42634118,-0.88624424,0.023693975,0.82858825,0.53235066,-0.21634954,-0.79934657,0.37243468,-0.43083912,0.6150686,0.9484009,-0.18876135,-0.24328673,-0.2675956,-0.6934638,-0.016312882,0.9681279,-0.93228894,0.49323967,0.08511063,-0.058108483,-0.10482833,-0.49948782,-0.50077546,0.16938816,0.6500032,1.2108738,0.98961586,0.47821587,0.88961387,-0.5261087,-0.97606266,1.334534,0.4484072,-0.15161656,-0.6182878,1.3505218,0.07164596,0.41611874,-0.19641197,0.055405065,0.7972649,0.10020526,-1.0767709,-0.90705204,0.48867372,-0.46962035,-0.7453811,-1.4456259,0.02953603,1.0104666,1.1868577,1.1099546,0.40447012,-0.042927116,-0.37483892,-0.09478704,-1.223529,-0.8275733,-0.2067015,-1.0913882,-0.3732751,-1.5847363,0.41378438,-0.29002684,-0.2014314,-0.016470056,0.32161012,-0.5640414,-0.14769524,-0.43124712,-1.4276416,-0.10542446,1.5781338,-0.2290403,0.45508677,0.080797836,0.16426548,0.63305223,1.0155399,0.28184965,0.253
35202,-0.6090523,1.181813,-0.5924076,1.4182706,-0.3111642,0.12979284,-0.5306278,-0.592878,0.67098105,-0.3403599,0.8093008,-0.425102,-0.20143461,0.88729143,-1.3048863,-0.8509538,-0.64478755,0.72528464,0.27115706,-0.91018283,-0.37501037,-0.25344363,-0.28149638,-0.65170574,0.058373883,-0.279707,0.3435093,0.15421666,-0.08175891,0.37342703,1.1068349,0.370284,-1.1112201,0.791234,-0.33149278,-0.906468,0.77429736,-0.16918264,0.07161721,-0.020805538,-0.19074778,0.9714475,0.4217115,-0.99798465,0.23597187,-1.1951764,0.72325313,1.371934,-0.2528682,0.17550357,1.0121015,-0.28758067,0.52312744,0.08538565,-0.9472321,-0.7915376,-0.41640997,0.83389455,0.6387671,0.18294477,0.1850706,1.3700297,-0.43967843,0.9739228,0.25433502,-0.7903001,0.29034948,0.4432687,0.23781417,0.64576876,0.89437866,-0.92056245,0.8566781,0.2436927,-0.06929546,0.35795254,0.7436991,0.21376142,0.23869698,0.14639515,-0.87127894,0.8130877,-1.0923429,-0.3279097,0.09232058,-0.19745012,0.31907612,-1.0878816,-0.04473375,0.4249065,0.34453565,0.45376292,-0.5525641,1.6031032,-0.017522424,-0.04903584,-0.2470398,-0.06611821,-0.33618444,0.04579974,0.28910857,0.5733638,1.1579076,-0.123608775,-1.1244149,-0.32105175,-0.0028353594,0.6315558,0.20455408,-1.0754945,0.2644,0.24109934,0.042885803,1.597761,0.20982133,-1.1588631,0.47945598,-0.59829426,-0.45671254,0.15635385,-0.25241938,0.2880083,0.17821103,-0.16359845,0.35200477,1.0819628,-0.4892587,0.24970399,-0.43380582,-0.5588407,0.31640014,-0.10481888,0.10812894,0.13438466,1.0478258,0.5863666,0.035384405,-0.30704767,-1.6373035,-1.2590733,0.9295908,0.1164237,0.68977344,-0.36746788,-0.40554866,0.64503556,0.42557728,-0.6643828,-1.2095946,0.5771222,-0.6911773,-0.96415323,0.07771304,0.8753759,-0.60232115,0.5423659,0.037202258,0.9478343,0.8238534,-0.04875912,-1.5575435,-0.023152929,-0.16479905,-1.123967,0.00679872,1.4028634,-0.9268266,-0.17736283,0.17429933,0.08551961,1.1467109,-0.09408428,0.32461596,0.5739471,0.41277337,0.4900577,0.6426135,-0.28586757,-0.7086031,-1.2137725,0.45787215,0.16
102555,0.27866384,0.5178121,0.7158286,1.0705677,0.07049831,-0.85161424,-0.3042984,0.42947394,0.060441002,-0.06413476,-0.25434074,0.020860653,0.18758196,-0.3637798,0.48589218,-0.38999668,-0.23843117,-1.7653351,-0.040434383,0.5825778,0.30748087,0.06381909,0.81247973,-0.39792076,0.7121066,0.2782456,0.59765404,-1.3232024,0.34060842,0.19809672,0.41175848,0.24246249,0.25381815,-0.44391263,-0.07614571,-0.87287176,0.33984363,-0.21994372,-1.4966714,0.10044764,-0.061777685,-0.71176904,-0.4737114,-0.057971925,1.3261204,0.49915332,0.3063325,-0.0374391,0.013750633,-0.19973677,-0.089847654,0.121245734,0.11679503,0.61989266,0.023939274,0.51651406,-0.7324229,0.19555955,-0.9648657,1.249217,-0.055881638,0.40515238,0.3683988,-0.42780614,-0.24780461,-0.032880165,0.6969112,0.66245943,0.54872966,0.67410636,0.35999185,-1.1955742,0.38909116,0.9214033,-0.5265669,-0.16324537,-0.49275506,-0.27807295,0.33720574,-0.6482551,0.6556906,0.09675206,0.035689153,-1.4017167,-0.42488196,0.53470165,-0.9318509,0.06659188,-0.9330244,-0.6317253,-0.5170034,-0.090258315,0.067027874,0.47430456,0.34263068,-0.034816273,-1.8725855,-2.0368457,0.43204042,0.3529114,1.3256972,-0.57799745,0.025022656,-1.2134962,-0.6376366,1.2210813,-0.8623049,0.47356188,-0.48248583,-0.30049723,-0.7189453,-0.6286008,-0.7182035,0.337718,-0.11861088,-0.67316926,0.03807467,-0.4894712,0.0021176785,0.6980891,0.24103045,0.54633296,0.58161646,-0.44642344,-0.16555169,0.7964468,-1.2131425,-0.67829454,0.4893405,-0.38461393,-1.1225401,0.44452366,-0.30833852,-0.6711606,0.051745616,-0.775163,-0.2677435,-0.39321816,-0.74936676,0.16192177,-0.059772447,0.68762016,0.53828514,0.6541142,-0.5421721,-0.26251954,-0.023202112,0.3014187,0.008828241,0.79605895,-0.3317026,-0.7724727,-1.2411877,0.31939238,-0.096119456,0.47874188,-0.7791832,-0.22323853,-0.08456612,1.0795188,-0.7827005,-0.28929207,0.46884036,-0.42510015,0.16214833,0.3501767,0.36617047,-1.119466,0.19195387,0.85851586,0.18922725,0.94338834,-0.32304144,0.4827557,-0.81715256,-1.4261038,0.49614763,0.06
2142983,1.249345,0.2014524,-0.6995533,-0.15864229,0.38652128,-0.659232,0.11766203,-0.2557698,1.4296027,0.9037317,-0.011628535,-1.1893693,-0.956275,-0.18136917,0.3941797,0.39998764,0.018311564,0.27029866,0.14892557,-0.48989707,0.05881763,0.49618796,-0.11214719,0.71434236,0.35651416,0.8689908,1.0284718,0.9596098,-0.009955626,0.40186208,0.4057858,-0.28830874,-0.72128904,-0.5276375,-0.44327998,-0.025095768,-0.7058158,-0.16796891,0.12855923,-0.34389406,0.4430077,0.16097692,-0.58964425,-0.80346566,0.32405907,0.06305365,-1.5064402,0.2241937,-0.6216805,0.1358616,0.3714332,-0.99806577,-0.22238642,0.33287752,0.14240637,-0.29236397,1.1396701,0.23270036,0.5262793,1.0991998,0.2879055,0.22905749,-0.95235413,0.52312446,0.10592761,0.30011278,-0.7657238,0.16400222,-0.5638396,-0.57501423,1.121968,-0.7843481,0.09353633,-0.18324867,0.21604645,-0.8815248,-0.07529478,-0.8126517,-0.011605805,-0.50744057,1.3081754,-0.852715,0.39023215,0.7651248,1.68998,0.5819176,-0.02141522,0.5877081,0.2024052,0.09264247,-0.13779058,-1.5314059,1.2719066,-1.0927896,0.48220706,0.05559338,-0.20929311,-0.4278733,0.28444275,-0.0008470379,-0.09534583,-0.6519637,-1.4282455,0.18477388,0.9507184,-0.6751443,-0.18364592,-0.37007314,1.0216024,0.6869564,1.1653348,-0.7538794,-1.3345296,0.6104916,0.08152369,-0.8394207,0.87403923,0.5290044,-0.56332856,0.37691587,-0.45009997,-0.17864561,0.5992149,-0.25145024,1.0287454,1.4305328,-0.011586349,0.3485581,0.66344,0.18219411,4.940573,1.0454609,-0.23867694,-0.8316158,0.4034564,-0.49062842,0.016044907,-0.22793365,-0.38472247,0.2440083,0.41246706,1.1865108,1.2949868,0.4173234,0.5325333,0.5680148,-0.07169041,-1.005387,0.965118,-0.340425,-0.4471613,-0.40878603,-1.1905128,-1.1868874,1.2017782,0.53103817,0.3596472,-0.9262005,0.31224424,0.72889113,0.63557464,-0.07019187,-0.68807346,0.69582283,0.45101142,0.014984587,0.577816,-0.1980364,-1.0826674,0.69556504,0.88146895,-0.2119645,0.6493935,0.9528447,-0.44620317,-0.9011973,-0.50394785,-1.0315249,-0.4472283,0.7796344,-0.15637895,-0.16639937
,-0.20352335,-0.68020046,-0.98728025,0.64242256,0.31667972,-0.71397847,-1.1293691,-0.9860645,0.39156264,-0.69573534,0.30602834,-0.1618791,0.23074874,-0.3379239,-0.12191323,1.6582693,0.2339738,-0.6107068,-0.26497284,0.17334077,-0.5923304,0.10445539,-0.7599427,0.5096536,-0.20216745,0.049196683,-1.1881349,-0.9009607,-0.83798426,0.44164553,-0.48808926,-0.04667333,-0.66054153,-0.66128224,-1.7136352,-0.7366011,-0.31853634,0.30232653,-0.10852443,1.9946622,0.13590258,-0.76326686,-0.25446486,0.32006142,-1.046221,0.30643058,0.52830505,1.7721215,0.71685624,0.35536727,0.02379851,0.7471644,-1.3178513,0.26788896,1.0505391,-0.8308426,-0.44220716,-0.2996315,0.2289448,-0.8129853,-0.32032526,-0.67732286,0.49977696,-0.58026063,-0.4267268,-1.165912,0.5383717,-0.2600939,0.4909254,-0.7529048,0.5186025,-0.68272185,0.37688586,-0.16525345,0.68933797,-0.43853116,0.2531767,-0.7273167,0.0042542545,0.2527112,-0.64449465,-0.07678814,-0.57123,-0.0017966144,-0.068321034,0.6406287,-0.81944615,-0.5292494,0.67187285,-0.45312735,-0.19861545,0.5808865,0.24339013,0.19081701,-0.3795915,-1.1802675,0.5864333,0.5542488,-0.026795216,-0.27652445,0.5329341,0.29494807,0.5427568,0.84580654,-0.39151683,-0.2985327,-1.0449492,0.69868237,0.39184457,0.9617548,0.8102169,0.07298472,-0.5491848,-1.012611,-0.76594234,-0.1864931,0.5790788,0.32611984,-0.7400497,0.23077846,-0.15595563,-0.06170243,-0.26768005,-0.7510913,-0.81110775,0.044999585,1.3336306,-1.774329,0.8607937,0.8938075,-0.9528547,0.43048507,-0.49937993,-0.61716783,-0.58577335,0.6208,-0.56602585,0.6925776,-0.50487256,0.80735886,0.36914152,0.6803319,0.000295409,-0.28081727,-0.65416694,0.9890088,0.5936174,-0.38552138,0.92602617,-0.46841428,-0.07666884,0.6774499,-1.1728637,0.23638526,0.35253218,0.5990712,0.47170952,1.1473405,-0.6329502,0.07515354,-0.6493073,-0.7312147,0.003280595,0.53415585,-0.84027874,0.21279827,0.73492074,-0.08271271,-0.6393985,0.21382183,-0.5933761,0.26885328,0.31527188,-0.17841923,0.8519613,-0.87693113,0.14174065,-0.3014772,0.21034332,0.7176752,
0.045435462,0.43554127,0.7759069,-0.2540516,-0.21126957,-0.1182913,0.504212,0.07782592,-0.06410891,-0.016180445,0.16819397,0.7418499,-0.028192373,-0.21616131,-0.46842667,0.8750199,0.16664875,0.4422129,-0.24636972,0.011146031,0.5407099,-0.1995775,0.9732007,0.79718286,-0.3531048,-0.17953855,-0.30455542,-0.011377579,-0.21079576,1.3742573,-0.4004308,-0.30791727,-1.06878,0.53180254,0.3412094,-0.06790889,0.08864223,-0.6960799,-0.12536404,0.24884924,0.9308994,0.46485603,0.12150945,0.8934372,-1.6594642,0.27694207,-1.1839775,-0.54069275,0.2967536,0.94271827,-0.21412376,1.5007582,-0.75979245,0.4711972,-0.005775435,-0.13180988,-0.9351274,0.5930414,0.23131478,-0.4255422,-1.1771399,-0.49364802,-0.32276222,-1.6043308,-0.27617428,0.76369554,-0.19217926,0.12788418,1.9225345,0.35335732,1.6825448,0.12466301,0.1598846,-0.43834555,-0.086372584,0.47859296,0.79709494,0.049911886,-0.52836734,-0.6721834,0.21632576,-0.36516222,1.6216894,0.8214337,0.6054308,-0.41862285,0.027636342,-0.1940268,-0.43570083,-0.14520688,0.4045223,-0.35977545,1.8254343,-0.31089872,0.19665615,-1.1023157,0.4019758,-0.4453815,-1.0864284,-0.1992614,0.11380532,0.16687272,-0.29629833,-0.728387,-0.5445154,0.23433375,-1.5238215,0.71899056,-0.8600819,1.0411007,-0.05895088,-0.8002717,-0.72914296,-0.59206986,-0.28384188,0.4074883,0.56018656,-1.068546,-1.021818,-0.050443307,1.116262,-1.3534596,0.6736171,-0.55024904,-0.31289905,0.36604482,0.004892461] | -| US | 14006420 | R1CECK3H1URK1G | B000CEXFZG | 115883890 | Teen Titans - The Complete First Season (DC Comics Kids Collection) | Video DVD | 5 | 0 | 0 | 0 | 1 | Five Stars | Kids love the DVD. It came quickly also. 
| 2015-08-31 | 14 | \[-0.6312561,-1.7367789,1.2021036,-0.048960943,0.20266847,-0.53402656,0.22530322,0.58472973,0.7067528,-0.4026424,0.48143443,1.320443,1.390252,0.8614183,-0.27450773,-0.5175409,0.35882184,0.029378487,-0.7798119,-0.9161627,0.21374469,-0.5097005,0.08925354,-0.03162415,-0.777172,0.26952067,0.21780597,-0.25940415,-0.43257955,0.5047774,-0.62753534,-0.18389052,0.3908125,-0.8562782,1.197537,-0.072108865,-0.26840302,0.1337818,0.5329664,-0.02881749,0.18806009,0.15675639,-0.46279088,0.33493695,-0.5976519,0.17071217,-0.79716325,0.1967204,1.1276897,-0.20772636,0.93440086,0.34529057,0.19401568,-0.41807452,-0.86519367,0.47235286,0.33779994,1.5397296,-0.18204026,-0.016024688,0.24120326,-0.17716222,0.3138746,-0.20993066,-0.09079028,0.25766942,-0.07014277,-0.8694822,0.64777964,-0.057605933,-0.28278375,0.8075776,1.8393523,0.81496745,-0.004307902,-0.84534615,-0.03156269,0.010678162,1.8573742,0.20478101,-0.1694233,0.3143575,-0.598893,0.80677253,0.6163861,-0.46703136,2.229697,-0.53163594,-0.32738847,-0.024545679,0.729927,-0.3483534,1.2920879,0.25684443,0.34726465,0.2070297,0.47215447,1.5762097,0.5379836,-0.011129107,0.83513135,0.18692249,0.2752282,0.6455876,0.129197,-0.5211538,-1.3686453,-0.44263896,-1.0396893,0.32529148,-1.4775138,0.16855894,-0.22110634,0.5737801,1.1978029,-0.3934193,-0.2697715,0.62218326,1.4344715,0.82834864,0.766156,0.3510282,0.59684426,-0.1322549,-0.9330995,1.8485514,0.6753625,-0.33342996,-0.23867355,0.8621254,-0.4277517,-0.26068765,-0.67580503,0.13551037,0.44111,1.0628351,-1.1878395,-1.2636286,0.55473286,0.18764772,-0.06866432,-2.0283139,0.46497917,0.5886715,0.30433393,0.3501315,0.23519383,0.5980003,0.36994958,0.30603382,-0.8369203,-0.25988623,-0.93126506,-0.873884,-0.5146805,-1.8220243,-0.28068694,0.39212993,0.20002748,-0.47740325,-0.251296,-0.85625666,-1.1412939,-0.73454237,-0.7070889,-0.8038149,1.5993606,-0.42553523,0.29790545,0.75804514,-0.14183688,1.28933,0.60941213,0.89150697,0.10587394,0.74460125,0.61516047,1.3431324,0.8083828,-0.11270667,-
0.5399225,-0.609704,-0.07033227,0.37664047,-0.17491077,1.3854522,-0.41539654,-0.4362298,1.1235062,-1.8496975,-2.0035222,-0.49260524,1.3446016,-0.031373296,-1.3091855,-0.19887531,-0.49534202,0.4523722,-0.16276014,-0.08273346,-0.5079003,-0.124883376,0.099591255,-0.8943932,-0.1293136,0.9836214,0.548599,-0.78369313,0.19080715,-0.088178605,-0.6870386,0.58293986,-0.39954463,-0.19963749,-0.37985775,-0.24642159,0.5121634,0.6653276,-0.4190921,1.0305376,-1.4589696,0.28977314,1.3795608,0.5321369,1.1054996,0.5312297,-0.028157832,0.4668366,1.0069275,-1.2730085,-0.11376997,-0.7962425,0.49372005,0.28656003,-0.30227122,0.24839808,1.923211,-0.37085673,0.3625795,0.16379173,-0.43515328,0.4553001,0.08762408,0.105411,-0.964348,0.66819906,-0.6617094,1.5985628,-0.23792887,0.32831386,0.38515973,-0.293926,0.5914876,-0.12198629,0.45570955,-0.703119,1.2077283,-0.82626694,-0.28149354,0.7069072,0.31349573,0.4899691,-0.4599767,-0.8091348,0.30254528,0.08147084,0.3877693,-0.79083973,1.3907013,-0.25077394,0.9531004,0.3682364,-0.8173011,-0.09942776,0.2869549,-0.045799185,0.5354464,0.6409063,-0.20659842,-0.9725278,-0.26192304,0.086217284,0.3165221,0.44227958,-0.7680571,0.5399834,0.6985113,-0.52230656,0.6970132,0.373832,-0.70743656,0.20157939,-0.6858654,-0.50790364,0.2795364,0.29279485,-0.012475173,0.076419905,-0.40851966,0.82844526,-0.48934165,-0.5245244,-0.20289789,-0.8136387,-0.5363099,0.48981985,-0.76652956,-0.1211052,-0.056907576,0.4420836,0.066036455,0.41965017,-0.6063774,-0.8071671,-1.0445249,0.66432387,0.5274697,1.0376729,-0.7697964,-0.37606835,0.3890853,0.6605356,-0.14112039,-1.5217428,-0.15197764,-0.3213161,-1.1519533,0.60909057,0.9403774,-0.27944884,0.7312047,-0.3696203,0.74681044,1.2170473,-0.69628173,-1.6213799,-0.5346468,-0.6516008,-0.33496094,-0.43141463,1.2713503,-0.8897746,-0.087588705,-0.46260807,0.5793111,0.09900403,-0.17237963,0.62258226,0.21377154,-0.010726848,0.6530878,-0.2783685,0.00858428,-1.1332816,-0.6482847,0.7085231,0.36013532,-0.92266655,0.22018129,0.9001391,0.92635745,-0.
008031485,-0.5917975,-0.568456,-0.06777777,0.8137389,-0.09866476,-0.22243339,0.64311814,-0.18830536,-0.39094377,0.19102454,-0.16511707,0.025081763,-1.8210138,-0.2697892,0.6846239,0.2854376,0.18948092,1.413507,-0.32061276,1.068837,-0.43719074,0.26041105,-1.3256634,-0.3310394,-0.727746,0.5768826,0.12309951,0.64337856,-0.35449612,0.5904533,-0.93767214,0.056747835,-0.96975976,-0.50144833,-0.68525606,0.08461835,-0.956482,0.39153412,-0.47589955,1.1512613,-0.15391372,0.22249506,0.34223804,-0.30088118,-0.12304757,-0.887302,-0.41605315,-0.4448053,0.11436053,0.36566892,0.051920563,-1.0589696,-0.21019076,-0.5414011,0.57006586,0.25899884,0.27656814,-1.2040092,-1.0228744,-0.9569173,-0.40212157,0.24625045,0.0363089,0.67136663,1.2104007,0.5976004,0.3837572,1.1889356,0.8584326,-0.19918711,-0.694845,-0.114167996,-0.108385384,-0.40644845,-0.8660314,0.7782318,0.1538889,-0.33543634,-1.2151926,0.15467443,0.68193775,-1.2943494,0.5995984,-0.954463,0.08679533,-0.70457053,-0.13386653,-0.49978074,0.75912595,0.6441198,-0.24760693,-1.6255957,-1.1165076,0.06757002,0.424513,0.8805125,-1.3958868,0.20875917,-1.9329861,-0.23697405,0.55918163,-0.23028342,0.7898856,-0.31575334,-0.10341185,-0.59226173,-0.6364673,-0.70446855,0.8730485,-0.3070955,-0.62998897,-0.25874397,-0.36943534,-0.006459128,0.19268708,0.25422436,0.7851406,0.5298526,-0.7919893,0.2925912,0.2669904,-1.3556485,-0.3184692,0.6531485,-0.43356547,-0.7023434,0.70575243,-0.64844227,-0.90868706,-0.37580702,-0.46109352,-0.06858048,-0.5020828,-1.0959914,0.19850428,-0.3697118,0.5327658,-0.24482745,-0.0050697043,-0.48321095,-0.8755402,0.33493343,0.0400091,-0.9211368,0.50489336,0.20374565,-0.49659476,-1.7711049,0.9425723,0.413107,-0.15736774,-0.3663932,-0.110296495,0.32382917,1.4628458,-0.9015841,1.0747851,0.20627196,-0.33258128,-0.68392354,0.45976254,0.7596731,-1.1001155,0.9608397,0.68715054,0.835493,1.0332432,-0.1770479,-0.47063908,-0.4371135,-1.5693063,-0.09170902,-0.14182071,0.9199287,0.089211576,-1.330432,0.74252445,-0.12902485,-1.1330069,0.37
604442,-0.08594573,1.1911551,0.514451,-0.820967,-0.7663223,-0.8453414,-1.6072954,-0.006961733,0.10301163,-0.9520235,0.09837824,-0.11854994,-0.676488,0.31623104,0.9415478,0.5674442,0.5121303,0.46830702,0.5967715,1.1180271,1.109548,0.57702965,0.33545986,0.88252956,-0.23821445,0.1681848,0.13121948,-0.21055935,0.14183077,-0.12930463,-0.66376144,-0.34428838,-0.6456075,0.7975275,0.7979727,-0.07281647,-0.786334,-0.9695745,0.7647379,-1.2006234,0.2262308,-0.5081758,0.035541046,0.0056368224,-0.30493388,0.4218361,1.5293287,0.33595875,-0.4748238,1.1775192,-0.33924198,-0.6341838,1.534413,-0.19799161,1.0994059,-0.51108354,0.35798654,0.17381774,1.0035061,0.35685256,0.15786275,-0.10758176,0.039194133,0.6899009,-0.65326214,0.91365,-0.15350929,-0.1537966,-0.010726042,-0.13360718,-0.6982152,-0.52826196,-0.011109476,0.65476435,-0.9023214,0.64104265,0.5995644,1.4986526,0.57909846,0.30374798,0.39150548,-0.3463178,0.34487796,0.052982118,-0.5143066,0.9766171,-0.74480146,1.2273649,-0.029264934,-0.21231978,0.5529358,-0.15056185,-0.021292707,-0.6332784,-0.9690395,-1.5970473,0.6537644,0.7459297,0.12835206,-0.13237919,-0.6256427,0.5145036,0.94801706,1.9347028,-0.69850945,-1.1467483,-0.14642377,0.58050627,-0.44958553,1.5241412,0.12447801,-0.5492241,0.61864674,-0.7053797,0.3704767,1.3781306,0.16836958,1.0158046,2.339806,0.25807586,-0.38426653,0.31904867,-0.18488075,4.3820143,0.3402816,0.075437106,-1.7444987,0.14969935,-1.032585,0.105298005,-0.48405352,-0.043107588,0.41331384,0.23115341,1.4535589,1.4320177,1.2625074,0.6917493,0.57606643,0.18086748,-0.56871295,0.50524384,-0.3616062,-0.030594595,0.031995427,-1.2015928,-1.0093418,0.8197662,-0.39160928,0.35074282,-1.0193396,0.536061,0.047622234,-0.24839634,0.6208857,0.59378546,1.1138327,1.1455421,0.28545633,-0.33827814,-0.10528313,-0.3800622,0.38597932,0.48995104,0.20974272,0.05999745,0.61636347,-1.0790776,0.40463042,-1.144643,-1.1443852,0.24288934,0.7188756,-0.43240666,-0.45432237,-0.026534924,-1.4719657,-0.6369496,1.2381822,-0.2820557,-0.40019664,-0
.42836204,0.009404399,-0.21320148,-0.68762875,0.79391354,0.13644795,0.2921131,0.5521372,-0.39167717,0.43077433,-0.1978993,-0.5903825,-0.5364767,1.2527494,-0.6508138,1.006776,-0.80243343,0.8591213,-0.5838775,0.51986057,-2.0343292,-1.1657227,-0.19022554,0.4203408,-0.85203123,0.27117053,-0.7466831,-0.54998875,-0.78761035,-0.23125184,-0.4558538,0.27839115,-0.8282628,1.9886168,-0.081262186,-0.7112829,0.9389117,-0.4538624,-1.4541539,-0.40657237,-0.3986729,2.1551015,-0.15287222,-0.49151388,-0.0558472,-0.08496425,-0.42135897,0.9383027,0.52064234,0.15240821,-0.083340704,0.18793257,-0.27070358,-0.7748509,-0.44401792,-0.84802055,0.38330504,-0.16992734,-0.04359399,-0.5745709,0.737314,-0.68381006,1.973286,-0.48940006,0.31930843,-0.033326432,0.26788878,-0.12552531,0.48650578,-0.37769738,0.28189135,-0.61763984,-0.7224581,-0.5546388,-1.0413891,0.38789925,-0.3598852,-0.032914143,-0.26091114,0.7435369,-0.55370283,-0.28856206,0.99145585,-0.65208393,-1.2676566,0.4271154,-0.109385125,0.07578249,0.36406067,-0.24682517,0.75629663,0.7614913,-1.0769705,-0.97570497,1.9109854,-0.33307776,0.0739104,1.1380597,-0.3641174,0.22451513,-0.33712614,0.19201177,0.4894991,0.10351006,0.6902971,-1.0849994,-0.26750708,0.3598063,-0.5578461,0.50199044,0.7905739,0.6338177,-0.5717301,-0.54366827,-0.10897577,-0.33433878,-0.6747299,-0.6021895,-0.19320905,-0.5550029,0.72644496,-1.1670401,0.024564115,1.0110236,-1.599555,0.68184775,-0.7405006,-0.42144236,-1.0563204,0.89424497,-0.48237786,-0.07939503,0.5832966,0.011636782,0.26296118,0.97361255,-0.61712617,0.023346817,0.13983403,0.47923192,0.015965229,-0.70331126,0.43716618,-0.16208862,-0.3113084,0.34937248,-0.9447899,-0.67551583,0.6474735,0.54826015,0.32212958,0.32812944,-0.25576934,-0.7014241,0.47824702,0.1297568,0.14742444,0.2605472,-1.0799223,-0.4960915,1.1971446,0.5583594,0.0546587,0.9143655,-0.27093348,-0.08269074,0.29264918,0.07787958,0.6288142,-0.96116096,-0.20745337,-1.2486024,0.44887972,-0.73063356,0.080278285,0.24266525,0.75150806,-0.87237483,-0.30616572,-
0.9860237,-0.009145497,-0.008834001,-0.4702344,-0.4934195,-0.13811351,1.2453324,0.25669295,-0.38921633,-0.73387384,0.80260897,0.4079765,0.11871702,-0.236781,0.38567695,0.24849908,0.07333609,0.96814114,1.071782,0.5340243,-0.58761954,0.6691571,0.059928205,1.1879109,1.6365756,0.5595157,0.27928302,-0.26380432,0.75958675,-0.19349675,-0.37584463,0.1626631,-0.11273714,0.081596196,0.64045995,0.76134443,0.7323921,-0.75440234,0.49163356,-0.36328706,0.3499968,-0.7155915,-0.12234358,0.31324995,0.3552525,-0.07196079,0.5915569,-0.48357463,0.042654503,-0.6132918,-0.539919,-1.3009099,0.83370167,-0.035098318,0.2308337,-1.3226038,-1.5454197,-0.40349385,-2.0024583,-0.011536424,-0.05012955,-0.054146707,0.07704314,1.1840333,0.007676903,1.3632768,0.1696332,0.39087996,-0.5171457,-0.42958948,0.0700221,1.8722692,0.08307789,-0.10879701,-0.0138636725,-0.02509088,-0.08575117,1.2478887,0.5698622,0.86583894,0.22210665,-0.5863262,-0.6379792,-0.2500705,-0.7450812,0.50900066,-0.8095482,1.7303423,-0.5499353,0.26281437,-1.161274,0.4653201,-1.0534812,-0.12422981,-0.1350228,0.23891108,-0.40800253,0.30440316,-0.43603706,-0.7405148,0.2974373,-0.4674921,-0.0037770707,-0.51527864,1.2588171,0.75661725,-0.42883956,-0.13898624,-0.45078608,0.14367218,0.2798476,-0.73272926,-1.0425364,-1.1782882,0.18875533,2.1849613,-0.7969517,-0.083258845,-0.21416587,0.021902844,0.861686,0.20170754] | -| US | 23411619 | R11MHQRE45204T | B00KXEM6XM | 651533797 | Fargo: Season 1 | Video DVD | 5 | 0 | 0 | 0 | 1 | A wonderful cover of the movie and so much more! | Great news Fargo Fans....there is another one in the works! We loved this series. Great characters....great story line and we loved the twists and turns. Cohen Bros. you are "done proud"! It was great to have the time to really explore the story and the characters. 
| 2015-08-31 | 15 | \[-0.19611593,-0.69027615,0.78467464,0.3645557,0.34207717,0.41759247,-0.23958844,0.11605658,0.92974365,-0.5541752,0.76759464,1.1066549,1.2487572,0.3000814,0.12316142,0.0537864,0.46125686,-0.7134164,-0.6902733,-0.030810203,-0.2626231,-0.17225128,0.29405335,0.4245395,-1.1013782,0.72367406,-0.32295582,-0.42930996,0.14767756,0.3164477,-0.2439065,-1.1365703,0.6799936,-0.21695563,1.9845483,0.29386163,-0.2292162,-0.5616508,-0.2090607,0.2147022,-0.36172745,-0.6168721,-0.7897761,1.1507696,-1.0567898,-0.5793794,-1.0577669,0.11405863,0.5670167,-0.67856425,0.41588035,-0.39696974,1.148421,-0.0018125019,-0.9563887,0.05888491,0.47841984,1.3950354,0.058197483,-0.7937125,-0.039544407,-0.02428613,0.37479407,0.40881336,-0.9731192,0.6479315,-0.5398291,-0.53990036,0.5293877,-0.60560757,-0.88233495,0.05452904,0.8653024,0.55807567,0.7858541,-0.9958526,0.33570826,-0.0056177955,0.9546163,1.0308326,-0.1942335,0.21661046,0.42235866,0.56544167,1.4272121,-0.74875134,2.0610666,0.09774256,-0.6197288,1.4207827,0.7629225,-0.053203158,1.6839175,-0.059772894,-0.978858,-0.23643266,-0.22536495,0.9444282,0.509495,-0.47264612,0.21497262,-0.60796165,0.47013962,0.8952143,-0.008930805,-0.17680325,-0.704242,-1.1091275,-0.6867162,0.5404577,-1.0234057,0.71886224,-0.769501,0.923611,-0.7606229,-0.19196886,-0.86931545,0.95357025,0.8420425,1.6821389,1.1922816,0.64718795,0.67438436,-0.83948326,-1.0336314,1.135635,0.9907036,0.14935225,-0.62381935,1.7775474,-0.054657657,0.78640664,-0.7279978,-0.45434985,1.1893182,1.2544643,-2.15092,-1.7235436,1.047173,-0.1170733,-0.051908553,-1.098293,0.17285198,-0.085874915,1.4612851,0.24653414,-0.14835985,0.3946811,-0.33008638,-0.17601183,-0.79181874,-0.001846984,-0.5688003,-0.32315254,-1.5091114,-1.3093823,0.35818374,-0.020578597,0.13254775,0.08677244,0.25909093,-0.46612057,0.02809602,-0.87092584,-1.1213324,-1.503037,1.8704559,-0.10248221,0.21668856,0.2714984,0.031719234,0.8509111,0.87941355,0.32090616,0.70586735,-0.2160697,1.2130814,0.81380475,0.8308766,0.6937
6045,0.20059735,-0.62706333,0.06513833,-0.25983867,-0.26937178,1.1370893,0.12345111,0.4245841,0.8032184,-0.85147107,-0.7817614,-1.1791542,0.054727774,0.33709362,-0.7165752,-0.6065557,-0.6793303,-0.10181883,-0.80588853,-0.60589695,0.04176558,0.9381139,0.86121285,-0.483753,0.27040368,0.7229057,0.3529946,-0.86491895,-0.0883965,-0.45674118,-0.57884586,0.4881854,-0.2732384,0.2983724,0.3962273,-0.12534264,0.8856427,1.3331532,-0.26294935,-0.14494254,-1.4339849,0.48596704,1.0052125,0.5438694,0.78611183,0.86212146,0.17376512,0.113286816,0.39630392,-0.9429737,-0.5384651,-0.31277686,0.98931545,0.35072982,-0.50156367,0.2987925,1.2240223,-0.3444314,-0.06413657,-0.4139552,-1.3548497,0.3713058,0.5338464,0.047096968,0.17121102,0.4908476,0.33481652,1.0725886,0.068777196,-0.18275931,-0.018743126,0.35847363,0.61257994,-0.01896591,0.53872716,-1.0410246,1.2810577,-0.65638995,-0.4950475,-0.14177354,-0.38749444,-0.12146497,-0.69324815,-0.8031308,-0.11394101,0.4511331,-0.36235264,-1.0423448,1.3434777,-0.61404437,0.103578284,-0.42243803,0.13448912,-0.0061332933,0.19688538,0.111303836,0.14047435,2.3025432,-0.20064694,-1.0677278,0.6088145,-0.038092047,0.26895407,0.11633718,-1.5688779,-0.09998454,0.10787329,-0.30374414,0.9052384,0.4006251,-0.7892597,0.7623954,-0.34756395,-0.54056764,0.3252798,0.33199653,0.62842965,0.37663814,-0.030949261,1.0469799,0.03405783,-0.62260365,-0.34344113,-0.39576128,0.24071567,-0.0143306,-0.36152077,-0.21019648,0.15403631,0.54536396,0.070417285,-1.1143794,-0.6841382,-1.4072497,-1.2050889,0.36286953,-0.48767778,1.0853148,-0.62063366,-0.22110772,0.30935922,0.657101,-1.0029979,-1.4981637,-0.05903004,-0.85891956,-0.8045846,0.05591573,0.86750376,0.5158197,0.42628267,0.45796645,1.8688178,0.84444594,-0.8722601,-1.099219,0.1675867,0.59336346,-0.12265335,-0.41956308,0.93164825,-0.12881526,0.28344584,0.21308619,-0.039647672,0.8919175,-0.8751169,0.1825347,-0.023952499,0.55597776,1.0254196,0.3826872,-0.08271052,-1.1974314,-0.8977747,0.55039763,1.5131414,-0.451007,0.14583892,0.2
4330004,1.0137768,-0.48189703,-0.48874113,-0.1470369,0.49510378,0.38879463,-0.7000347,-0.061767917,0.29879406,0.050993137,0.4503994,0.44063208,-0.844459,-0.10434887,-1.3999974,0.2449593,0.2624704,0.9094605,-0.15879464,0.7038591,0.30076742,0.7341888,-0.5257968,0.34079516,-1.7379513,0.13891199,0.0982849,1.2222294,0.11706773,0.05191148,0.12235231,0.34845573,0.62851644,0.3305461,-0.52740043,-0.9233819,0.4350543,-0.31442615,-0.84617394,1.1801229,-0.0564243,2.2154071,-0.114281625,0.809236,1.0508876,0.93325424,-0.14246169,-0.70618397,0.22045197,0.043732524,0.89360833,0.17979233,0.7782733,-0.16246022,-0.21719909,0.024336463,0.48491704,0.40749896,0.8901898,-0.57082295,-0.4949802,-0.5102787,-0.21259686,0.417162,0.37601888,1.0007366,0.7449076,0.6223696,-0.49961302,0.8396295,1.117957,0.008836402,-0.49906662,-0.03272103,0.13135666,0.25935343,-1.3398852,0.18256736,-0.011611674,-0.27749947,-0.84756446,0.11329307,-0.25090477,-1.1771594,0.67494935,-0.5614711,-0.09085327,-0.3132199,0.7154967,-0.3607141,0.5187279,0.16049784,-0.73461974,-1.7925078,-1.9164195,0.7991559,0.99091554,0.7067987,-0.57791114,-0.4848671,-1.100601,-0.59190345,0.30508074,-1.0731133,0.35330638,-1.1267302,-0.011746664,-0.6839462,-1.2538619,-0.94186044,0.44130656,-0.38140884,-0.37565815,-0.44280535,-0.053642027,0.6066312,0.12132282,0.035870302,0.5325165,-0.038058326,-0.70161515,0.005607947,1.0081267,-1.2909276,-0.92740905,0.5405458,0.53192127,-0.9372405,0.7400459,-0.5593214,-0.80438167,0.9196061,0.088677965,-0.5795356,-0.62158984,-1.4840353,0.48311192,0.76646256,-0.009653425,0.664507,1.0588721,-0.55877256,-0.55249715,-0.4854527,0.43072438,-0.29720852,0.31044763,0.41128498,-0.74395776,-1.1164409,0.6381095,-0.45213065,-0.41928747,-0.7472354,-0.17209144,0.307881,0.43353182,-1.2533877,0.10122644,0.28987703,-0.43614298,-0.15241891,0.26940024,0.16055605,-1.4585212,0.52161473,0.9048135,-0.20131661,0.7265157,-0.00018197215,-0.2497379,-0.38577276,-1.3037856,0.5999186,0.4910673,0.76949763,-0.061471477,-0.4325986,0.6368372,0.1
6506073,-0.37456205,-0.3420613,-0.54678524,1.8179338,0.09873521,-0.15852624,-1.2694672,-0.3394376,-0.7944524,0.42282122,0.20561744,-0.7579017,-0.02898455,0.3193843,-0.880837,0.21365796,0.121797614,1.0254698,0.6885746,0.3068437,0.53845966,0.7072179,1.1950152,0.2619351,0.5534848,0.36036322,-0.635574,0.19842437,-0.8263201,-0.34289825,0.10286513,-0.8120933,-0.47783035,0.5496924,0.052244812,1.3440897,0.9016641,-0.76071066,-0.3754273,-0.57156265,-0.3039743,-0.72466373,0.6158706,0.09669343,0.86211246,0.45682988,-0.56253654,-0.3554615,0.8981484,0.16338861,0.61401916,1.6700366,0.7903558,-0.11995987,1.6473453,0.21475694,0.94213593,-1.279444,0.40164223,0.77865,1.0799583,-0.5661335,-0.43656045,0.37110725,-0.23973094,0.6663116,-1.5518241,0.60228294,-0.8730299,-0.4106444,-0.46960723,-0.47547948,-0.918826,-0.079336844,-0.51174027,1.3490533,-0.927986,0.42585903,0.73130196,1.2575479,0.98948413,-0.314556,0.62689084,0.5758436,-0.11093489,0.039149974,-0.8506448,1.1751219,-0.96297604,0.5589994,-0.75090784,-0.33629242,0.7918035,0.75811136,-0.0606605,-0.7733524,-1.5680165,-0.6446142,0.7613113,0.721117,0.054847892,-0.4485187,-0.26608872,1.2188075,0.08169317,0.5978582,-0.64777404,-1.9049765,0.5166473,-0.7455406,-1.1504349,1.3784496,-0.24568361,-0.35371232,-0.013054923,-0.57237804,0.59931237,0.46333218,0.054302905,0.6114685,1.5471761,-0.19890086,0.84167045,0.33959422,-0.074407116,3.9876409,1.3817698,0.5491156,-1.5438982,0.07177756,-1.0054835,0.14944264,0.042414695,-0.3515721,0.049677286,0.4029755,0.9665063,1.0081058,0.40573725,0.86347926,0.74739635,-0.6202449,-0.78576154,0.8640424,-0.75356483,-0.0030959393,-0.7309192,-0.67107457,-1.1870506,0.9610583,0.14838722,0.55623454,-1.0180675,1.3138177,0.9418509,0.9516112,0.2749008,0.3799174,0.6875819,0.3593635,0.02494887,-0.042821404,-0.02257093,-0.20181343,0.24203236,0.3782816,0.16458313,-0.10500721,0.6841971,-0.85342956,-0.4882129,-1.1310949,-0.69270194,-0.16886552,0.82593036,-0.0031709322,-0.55615395,-0.31646764,-0.846376,-1.2038568,0.41713443,0.09
1425575,-0.050411556,-1.5898843,-0.65858334,1.0211359,-0.29832518,1.0239898,0.31851336,-0.12463779,0.06075947,-0.38864592,1.1107218,-0.6335154,-0.22827888,-0.9442285,0.93495697,-0.7868781,0.071433865,-0.9309406,0.4193446,-0.08388461,-0.530641,-1.116366,-1.057797,0.31456125,0.9027106,-0.06956576,0.18859546,-0.44057858,0.15511869,-0.70706356,0.3468956,-0.23489438,-0.21894005,0.1365304,1.2342967,0.24870403,-0.6072671,-0.56563044,-0.19893534,-1.6501249,-1.0609756,-0.14706758,1.8078117,-0.73515546,-0.42395878,0.40629613,0.5345876,-0.8564257,0.33988473,0.87946063,-0.70647347,-0.82399774,-0.28400525,-0.11244382,-1.1803491,-0.6051204,-0.48171222,0.6352527,0.9955332,0.060266595,-1.0434257,0.18751803,-0.8791377,1.5527687,-0.34049803,0.12179581,-0.65977687,-0.44843185,-0.5378742,0.41946766,0.46824372,0.24347036,-0.42384493,0.24210829,0.43362963,-0.17259134,0.47868198,-0.47093317,-0.33765036,0.15519959,-0.13469115,-0.9832437,-0.2315401,0.89967567,-0.2196765,-0.3911332,0.72678024,0.001113255,-0.03846649,-0.4437102,-0.105207585,0.9146223,0.2806104,-0.073881194,-0.08956877,0.6022565,0.34536007,0.1275348,0.5149897,-0.32749107,0.3006347,-0.10103988,0.21793392,0.9912135,0.86214256,0.30883485,-0.94117,0.98778534,0.015687397,-0.8764767,0.037501317,-0.12847403,0.0981208,-0.31701544,-0.32385334,0.43092263,-0.4069169,-0.8972079,-1.2575746,-0.47084373,-0.14999634,0.014707203,-0.37149346,0.3610224,0.2650979,-1.4389727,0.9148726,0.3496221,-0.07386527,-1.1408309,0.6867602,-0.704264,0.40382487,0.10580344,0.646804,0.9841216,0.5507306,-0.51492304,-0.34729987,0.22495836,0.42724502,-0.19653529,-1.1309057,0.5641935,-0.8154129,-0.84296966,0.29565218,-0.68338835,-0.28773895,0.21857412,0.9875624,0.80842453,0.60770905,-0.08765514,-0.512558,-0.45153108,0.022758177,-0.019249387,0.75011975,-0.5247193,-0.075737394,0.6226087,-0.42776236,0.27325255,-0.005929854,-1.0736796,0.100745015,-0.6502218,0.62724555,0.56331265,-1.1612102,0.47081968,-1.1985526,0.34841013,0.058391914,-0.51457083,0.53776836,0.66995555,-0.
034272604,-0.783307,0.04816275,-0.6867638,-0.7655091,-0.29570612,-0.24291794,0.12727965,1.1767148,-0.082389325,-0.52111506,-0.6173243,1.2472475,-0.32435313,-0.1451121,-0.15679994,0.7391408,0.49221176,-0.35564727,0.5744523,1.6231831,0.15846235,-1.2422205,-0.4208412,-0.2163598,0.38068682,1.6744317,-0.36821502,0.6042655,-0.5680786,1.0682867,0.019634644,-0.22854692,0.012767732,0.12615916,-0.2708234,0.08950687,1.3470159,0.33660004,-0.5529485,0.2527212,-0.4973868,0.2797395,-0.8398461,-0.45434773,-0.2114668,0.5345738,-0.95777416,1.04314,-0.5885558,0.4784298,-0.40601963,-0.27700382,-0.9475248,1.3175657,-0.22060044,-0.4138579,-0.5917306,-1.1157118,-0.19392541,-1.1205745,-0.45245594,0.6583289,-0.5018245,0.80024433,1.4671688,0.62446856,1.134583,-0.10825716,-0.58736664,-1.1071991,-1.7562832,0.080109626,0.7975777,0.19911054,0.69512564,-0.14862823,0.2053994,-0.4011153,1.2195913,1.0608866,0.45159817,-0.6997635,0.5517133,-0.40297875,-0.8871956,-0.5386776,0.4603326,-0.029690862,2.0928583,-0.5171186,0.9697673,-0.6123527,-0.07635037,-0.92834306,0.0715186,-0.34455565,0.4734149,0.3211016,-0.19668017,-0.79836154,-0.077905566,0.6725751,-0.73293614,-0.026289426,-0.9199058,0.66183317,-0.27440917,-0.8313121,-1.2987471,-0.73153865,-0.3919303,0.73370796,0.008246649,-1.048442,-1.7406054,-0.23710802,1.2845341,-0.8552668,0.11181834,-1.1165439,0.32813492,-0.08691622,0.21660605] | - -!!! - -!!! - -!!! note - -You may notice it took more than 100ms to retrieve those 5 rows with their embeddings. Scroll the results over to see how much numeric data there is. _Fetching an embedding over the wire takes about as long as generating it from scratch with a state-of-the-art model._ 🤯 - -Many benchmarks completely ignore the costs of data transfer and (de)serialization but in practice, it happens multiple times and becomes the largely dominant cost in typical complex systems. - -!!! - -Sorry, that was supposed to be a refresher, but it set me off. At PostgresML we're concerned about microseconds. 
107.207 milliseconds better be spent doing something _really_ useful, not just fetching 5 rows. Bear with me while I belabor this point, because it reveals the source of most latency in machine learning microservice architectures that separate the database from the model, or worse, put the model behind an HTTP API in a different datacenter. - -It's especially harmful because, in a mature organization, the models are often owned by one team and the database by another. Both teams (let's assume the best) may be using efficient implementations and purpose-built tech, but the latency problem lies in the gap between them while communicating over a wire, and it's impossible to solve due to Conway's Law. Eliminating this gap, with its cost and organizational misalignment, is central to the design of PostgresML. - -> _One query. One system. One team. Simple, fast, and efficient._ - -Rather than shipping the entire vector back to an application like a normal vector database, PostgresML includes all the algorithms needed to compute results internally. For example, we can ask PostgresML to compute the L2 norm for each embedding, a relevant computation that has the same cost as the cosine similarity function we're going to use for similarity search: - -!!! generic - -!!! code\_block time="2.268 ms" - -```postgresql -SELECT pgml.norm_l2(review_embedding_e5_large) -FROM pgml.amazon_us_reviews -LIMIT 5; -``` - -!!! - -!!! results - -| norm\_l2 | -| --------- | -| 22.485546 | -| 22.474796 | -| 21.914106 | -| 22.668892 | -| 22.680748 | - -!!! - -!!! - -Most people would assume that "complex ML functions" with _`O(n * m)`_ runtime will increase load on the database compared to a "simple" `SELECT *`, but in fact, _moving the function to the database reduced the latency 50 times over_, and now our application doesn't need to do the "ML function" at all.
This isn't just a problem with Postgres or databases in general; it's a problem with all programs that have to ship vectors over a wire, aka microservice architectures full of "feature stores" and "vector databases". - -> _Shuffling the data between programs is often more expensive than the actual computations the programs perform._ - -This is what should convince you that PostgresML's approach of bringing the algorithms to the data is the right one, rather than shipping data all over the place. We're not the only ones who think so. Initiatives like Apache Arrow prove the ML community is aware of this issue, but Arrow and Google's Protobuf are not a solution to this problem; they're excellently crafted band-aids spanning the festering wounds in complex ML systems. - -> _For legacy ML systems, it's time for surgery to cut out the necrotic tissue and stitch the wounds closed._ - -Some systems start simple enough, or deal with little enough data, that these inefficiencies don't matter. Over time, however, they will increase financial costs by orders of magnitude. If you're building new systems, rather than dealing with legacy data pipelines, you can avoid learning these painful lessons yourself, and build on top of 40 years of solid database engineering instead. - -## Similarity Search - -I hope my rant convinced you it's worth wrapping your head around some advanced SQL to handle this task more efficiently. If you're still skeptical, there are more benchmarks to come. Let's go back to our 5 million movie reviews. - -We'll start with semantic search. Given a user query, e.g. "Best 1980's scifi movie", we'll use an LLM to create an embedding on the fly. Then we can use our vector similarity index to quickly find the most similar embeddings we've indexed in our table of movie reviews. We'll use the `cosine distance` operator `<=>` to compare the request embedding to the review embedding, then sort by the closest match and take the top 5.
Cosine similarity is defined as `1 - cosine distance`. The two are complements of each other, but similarity is more natural to interpret on the scale `[-1, 1]`, where -1 is opposite, 0 is neutral, and 1 is identical. - -!!! generic - -!!! code\_block time="152.037 ms" - -```postgresql -WITH request AS ( - SELECT pgml.embed( - 'Alibaba-NLP/gte-base-en-v1.5', - 'query: Best 1980''s scifi movie' - )::vector(1024) AS embedding -) - -SELECT - review_body, - product_title, - star_rating, - total_votes, - 1 - ( - review_embedding_e5_large <=> ( - SELECT embedding FROM request - ) - ) AS cosine_similarity -FROM pgml.amazon_us_reviews -ORDER BY cosine_similarity DESC -LIMIT 5; -``` - -!!! - -!!! results - -| review\_body | product\_title | star\_rating | total\_votes | cosine\_similarity | -| --------------------------------------------------- | ------------------------------------------------------------- | ------------ | ------------ | ------------------ | -| best 80s SciFi movie ever | The Adventures of Buckaroo Banzai Across the Eighth Dimension | 5 | 1 | 0.956207707312679 | -| One of the best 80's sci-fi movies, beyond a doubt! | Close Encounters of the Third Kind \[Blu-ray] | 5 | 1 | 0.9298004258989776 | -| One of the Better 80's Sci-Fi, | Krull (Special Edition) | 3 | 5 | 0.9126601222760491 | -| the best of 80s sci fi horror! | The Blob | 5 | 2 | 0.9095577631102708 | -| Three of the best sci-fi movies of the seventies | Sci-Fi: Triple Feature (BD) \[Blu-ray] | 5 | 0 | 0.9024044582495285 | - -!!! - -!!! - -!!! tip - -Common Table Expressions (CTEs) that begin `WITH name AS (...)` can be a nice way to organize complex queries into more modular sections. They also make it easier for Postgres to create a query plan, by introducing an optimization gate and separating the conditions in the CTE from the rest of the query.
- -Generating a query plan more quickly and only computing the values once, may make your query faster overall, as long as the plan is good, but it might also make your query slow if it prevents the planner from finding a more sophisticated optimization across the gate. It's often worth checking the query plan with and without the CTE to see if it makes a difference. We'll cover query plans and tuning in more detail later. - -!!! - -There's some good stuff happening in those query results, so let's break it down: - -* **It's fast** - We're able to generate a request embedding on the fly with a state-of-the-art model, and search 5M reviews in 152ms, including fetching the results back to the client 😍. You can't even generate an embedding from OpenAI's API in that time, much less search 5M reviews in some other database with it. -* **It's good** - The `review_body` results are very similar to the "Best 1980's scifi movie" request text. We're using the `Alibaba-NLP/gte-base-en-v1.5` open source embedding model, which outperforms OpenAI's `text-embedding-ada-002` in most [quality benchmarks](https://huggingface.co/spaces/mteb/leaderboard). - * Qualitatively: the embeddings understand our request for `scifi` being equivalent to `Sci-Fi`, `sci-fi`, `SciFi`, and `sci fi`, as well as `1980's` matching `80s` and `80's` and is close to `seventies` (last place). We didn't have to configure any of this and the most enthusiastic for "best" is at the top, the least enthusiastic is at the bottom, so the model has appropriately captured "sentiment". - * Quantitatively: the `cosine_similarity` of all results are high and tight, 0.90-0.95 on a scale from -1:1. We can be confident we recalled very similar results from our 5M candidates, even though it would take 485 times as long to check all of them directly. -* **It's reliable** - The model is stored in the database, so we don't need to worry about managing a separate service. 
If you repeat this query over and over, the timings will be extremely consistent, because we don't have to deal with things like random network congestion. -* **It's SQL** - `SELECT`, `ORDER BY`, `LIMIT`, and `WITH` are all standard SQL, so you can use them on any data in your database, and further compose queries with standard SQL. - -This seems to actually just work out of the box... but there is some room for improvement. - -_Yeah, well, that's just like, your opinion, man_ - -1. **It's a single person's opinion** - We're searching individual reviews, not all reviews for a movie. The correct answer to this request is undisputedly "Episode V: The Empire Strikes Back". Ok, maybe "Blade Runner", but I really did like "Back to the Future"... Oh no, someone on the internet is wrong, and we need to fix it! -2. **It's approximate** - There are more than four 80's Sci-Fi movie reviews in this dataset of 5M. It really shouldn't be including results from the 70's. More relevant reviews are not being returned, which is a pretty sneaky optimization for a database to pull, but the disclaimer was in the name. -3. **It's narrow** - We're only searching the review text, not the product title, or incorporating other data like the star rating and total votes. Not to mention this is an intentionally crafted semantic search, rather than a keyword search of people looking for a specific title. - -We can fix all of these issues with the tools in PostgresML. First, to address The Dude's point, we'll need to aggregate reviews about movies and then search them. - -## Aggregating reviews about movies - -We'd really like a search for movies, not reviews, so let's create a new movies table out of our reviews table. We can use SQL aggregates over the reviews to generate some simple stats for each movie, like the number of reviews and average star rating. PostgresML provides aggregate functions for vectors.
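
To make the idea of aggregating vectors per group concrete, here is a minimal sketch on toy data. It is not the query used in this guide: it assumes the pgvector extension is installed at a version that ships a `sum` aggregate for the `vector` type (0.5.0 or later, to the best of my knowledge), and the `toy_reviews` rows and their 2-dimensional "embeddings" are hypothetical.

```postgresql
-- Hypothetical toy data: three reviews across two products.
-- Assumes `CREATE EXTENSION vector;` has already been run.
WITH toy_reviews (product_id, embedding) AS (
  VALUES
    ('a', '[1, 0]'::vector),
    ('a', '[1, 1]'::vector),
    ('b', '[0, 2]'::vector)
)
SELECT
  product_id,
  count(*) AS total_reviews,          -- ordinary SQL aggregate
  sum(embedding) AS summed_embedding  -- element-wise vector sum (pgvector aggregate)
FROM toy_reviews
GROUP BY product_id
ORDER BY product_id;
```

Each group collapses to one row carrying both a scalar statistic and a single vector, which is the same shape as a per-movie table built from per-review rows.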
- -A neat thing about embeddings is if you sum a bunch of related vectors up, the common components of the vectors will increase, and the components where there isn't good agreement will cancel out. The `sum` of all the movie review embeddings will give us a representative embedding for the movie, in terms of what people have said about it. Aggregating embeddings around related tables is a super powerful technique. In the next post, we'll show how to generate a related embedding for each reviewer, and then we can use that to personalize our search results, but one step at a time. - -!!! generic - -!!! code\_block time="3128724.177 ms (52:08.724)" - -```postgresql -CREATE TABLE movies AS -SELECT - product_id AS id, - product_title AS title, - product_parent AS parent, - product_category AS category, - count(*) AS total_reviews, - avg(star_rating) AS star_rating_avg, - pgml.sum(review_embedding_e5_large)::vector(1024) AS review_embedding_e5_large -FROM pgml.amazon_us_reviews -GROUP BY product_id, product_title, product_parent, product_category; -``` - -!!! - -!!! results - -| CREATE TABLE | -| ------------- | -| SELECT 298481 | - -!!! - -!!! - -We've just aggregated our original 5M reviews (including their embeddings) into \~300k unique movies. I like to include the model name used to generate the embeddings in the column name, so that as new models come out, we can just add new columns with new embeddings to compare side by side. Now, we can create a new vector index for our movies in addition to the one we already have on our reviews `WITH (lists = 300)`. `lists` is one of the key parameters for tuning the vector index; we're using a rule of thumb of about 1 list per thousand vectors. - -!!! generic - -!!! code\_block time="53236.884 ms (00:53.237)" - -```postgresql -CREATE INDEX CONCURRENTLY - index_movies_on_review_embedding_e5_large -ON movies -USING ivfflat (review_embedding_e5_large vector_cosine_ops) -WITH (lists = 300); -``` - -!!! - -!!! results - -!!! 
- -!!! - -Now we can quickly search for movies by what people have said about them: - -!!! generic - -!!! code\_block time="122.000 ms" - -```postgresql -WITH request AS ( - SELECT pgml.embed( - 'Alibaba-NLP/gte-base-en-v1.5', - 'Best 1980''s scifi movie' - )::vector(1024) AS embedding -) -SELECT - title, - 1 - ( - review_embedding_e5_large <=> (SELECT embedding FROM request) - ) AS cosine_similarity -FROM movies -ORDER BY review_embedding_e5_large <=> (SELECT embedding FROM request) -LIMIT 10; -``` - -!!! - -!!! results - -| title | cosine\_similarity | -| ------------------------------------------------------------------ | ------------------ | -| THX 1138 (The George Lucas Director's Cut Special Edition/ 2-Disc) | 0.8652007733744973 | -| 2010: The Year We Make Contact | 0.8621574666546908 | -| Forbidden Planet | 0.861032948199611 | -| Alien | 0.8596578185151328 | -| Andromeda Strain | 0.8592793014849687 | -| Forbidden Planet | 0.8587316047371392 | -| Alien (The Director's Cut) | 0.8583879679255717 | -| Forbidden Planet (Two-Disc 50th Anniversary Edition) | 0.8577616472530644 | -| Strange New World | 0.8576321103975245 | -| It Came from Outer Space | 0.8575860003514065 | - -!!! - -!!! - -It's somewhat expected that the movie vectors will have been diluted compared to review vectors during aggregation, but we still have results with pretty high cosine similarity of \~0.85 (compared to \~0.95 for reviews). - -It's important to remember that we're doing _Approximate_ Nearest Neighbor (ANN) search, so we're not guaranteed to get the exact best results. When we were searching 5M reviews, it was more likely we'd find 5 good matches just because there were more candidates, but now that we have fewer movie candidates, we may want to dig deeper into the dataset to find more high quality matches. - -## Tuning vector indexes for recall vs speed - -Inverted File Indexes (IVF) are built by clustering all the vectors into `lists` using cosine similarity. 
Once the `lists` are created, their center is computed by summing all the vectors in the list. It's the same thing we did when clustering the reviews around their movies, except these clusters are just some arbitrary number of similar vectors. - -When we perform a vector search, we will compare to the center of all `lists` to find the closest ones. The default number of `probes` in a query is 1. In that case, only the closest `list` will be exhaustively searched. This reduces the number of vectors that need to be compared from 300,000 to roughly 1,300: the 300 list centers plus the approximately 1,000 vectors in the closest list. That saves a lot of work, but sometimes the best results were just on the edges of the `lists` we skipped. - -Most applications have an acceptable latency limit. If we have some latency budget to spare, it may be worth increasing the number of `probes` to check more `lists` for better recall. If we up the number of `probes` to 300, we can exhaustively search all lists and get the best possible results: - -```postgresql -SET ivfflat.probes = 300; -``` - -!!! generic - -!!! code\_block time="2337.031 ms (00:02.337)" - -```postgresql -WITH request AS ( - SELECT pgml.embed( - 'Alibaba-NLP/gte-base-en-v1.5', - 'Best 1980''s scifi movie' - )::vector(1024) AS embedding -) -SELECT - title, - 1 - ( - review_embedding_e5_large <=> (SELECT embedding FROM request) - ) AS cosine_similarity -FROM movies -ORDER BY review_embedding_e5_large <=> (SELECT embedding FROM request) -LIMIT 10; -``` - -!!! - -!!! 
results - -| title | cosine\_similarity | -| ------------------------------------------------------------------ | ------------------ | -| THX 1138 (The George Lucas Director's Cut Special Edition/ 2-Disc) | 0.8652007733744973 | -| Big Trouble in Little China \[UMD for PSP] | 0.8649691870870362 | -| 2010: The Year We Make Contact | 0.8621574666546908 | -| Forbidden Planet | 0.861032948199611 | -| Alien | 0.8596578185151328 | -| Andromeda Strain | 0.8592793014849687 | -| Forbidden Planet | 0.8587316047371392 | -| Alien (The Director's Cut) | 0.8583879679255717 | -| Forbidden Planet (Two-Disc 50th Anniversary Edition) | 0.8577616472530644 | -| Strange New World | 0.8576321103975245 | - -!!! - -!!! - -There's a big difference in the time it takes to search 300,000 vectors vs 1,300 vectors, almost 20 times as long, although it does find one more vector that was not in the original list: - -| title | cosine\_similarity | -| ------------------------------------------ | ------------------ | -| Big Trouble in Little China \[UMD for PSP] | 0.8649691870870362 | - -This is a weird result. It's not Sci-Fi like all the others and it wasn't clustered with them in the closest list, which makes sense. So why did it rank so highly? Let's dig into the individual reviews to see if we can tell what's going on. - -## Digging deeper into recall quality - -SQL makes it easy to investigate these sorts of data issues. Let's look at the reviews for `Big Trouble in Little China [UMD for PSP]`, noting it only has 1 review. - -!!! generic - -!!! code\_block - -```postgresql -SELECT review_body -FROM pgml.amazon_us_reviews -WHERE product_title = 'Big Trouble in Little China [UMD for PSP]'; -``` - -!!! - -!!! results - -| review\_body | -| ----------------------- | -| Awesome 80's cult flick | - -!!! - -!!! - -This confirms our model has picked up on lingo like "flick" = "movie", and it seems it must have strongly associated "cult" flicks with the "scifi" genre. 
But with only 1 review, there hasn't been any generalization in the movie embedding. It's a relatively strong match for a movie, even if it's not the best for a single review match (0.86 vs 0.95). - -Overall, our movie results look better to me than the titles pulled just from single reviews, but we haven't completely addressed The Dude's point, as evidenced by this movie having a single review and being out of the requested genre. Embeddings often have fuzzy boundaries that we may need to firm up. - -## Adding a filter to the request - -To prevent noise in the data from leaking into our results, we can add a filter to the request to only consider movies with a minimum number of reviews. We can also add a filter to only consider movies with a minimum average review score with a `WHERE` clause. - -```postgresql -SET ivfflat.probes = 1; -``` - -!!! generic - -!!! code\_block time="107.359 ms" - -```postgresql -WITH request AS ( - SELECT pgml.embed( - 'Alibaba-NLP/gte-base-en-v1.5', - 'query: Best 1980''s scifi movie' - )::vector(1024) AS embedding -) - -SELECT - title, - total_reviews, - 1 - ( - review_embedding_e5_large <=> (SELECT embedding FROM request) - ) AS cosine_similarity -FROM movies -WHERE total_reviews > 10 -ORDER BY review_embedding_e5_large <=> (SELECT embedding FROM request) -LIMIT 10; -``` - -!!! - -!!! 
results - -| title | total\_reviews | cosine\_similarity | -| ---------------------------------------------------- | -------------- | ------------------ | -| 2010: The Year We Make Contact | 29 | 0.8621574666546908 | -| Forbidden Planet | 202 | 0.861032948199611 | -| Alien | 250 | 0.8596578185151328 | -| Andromeda Strain | 30 | 0.8592793014849687 | -| Forbidden Planet | 19 | 0.8587316047371392 | -| Alien (The Director's Cut) | 193 | 0.8583879679255717 | -| Forbidden Planet (Two-Disc 50th Anniversary Edition) | 255 | 0.8577616472530644 | -| Strange New World | 27 | 0.8576321103975245 | -| It Came from Outer Space | 155 | 0.8575860003514065 | -| The Quatermass Xperiment (The Creeping Unknown) | 46 | 0.8572098277579617 | - -!!! - -!!! - -There we go. We've filtered out the noise, and now we're getting a list of movies that are all Sci-Fi. As we play with this dataset a bit, I'm getting the feeling that some of these are legit (Alien), but most of these are a bit too out on the fringe for my interests. I'd like to see more popular movies as well. Let's influence these rankings to take an additional popularity score into account. - -## Boosting and Reranking - -There are a few simple examples where NoSQL vector databases facilitate a killer app, like recalling text chunks to build a prompt to feed an LLM chatbot, but in most cases, it requires more context to create good search results from a user's perspective. - -As the Product Manager for this blog post search engine, I have an expectation that results should favor the movies that have more `total_reviews`, so that we can rely on an established consensus. Movies with higher `star_rating_avg` should also be boosted, because people very explicitly like those results. We can add boosts directly to our query to achieve this. - -SQL is a very expressive language that can handle a lot of complexity. 
To keep things clean, we'll move our current query into a second CTE that will provide a first-pass ranking for our initial semantic search candidates. Then, we'll re-score and rerank those first round candidates to refine the final result with a boost to the `ORDER BY` clause for movies with a higher `star_rating_avg`: - -!!! generic - -!!! code\_block time="124.119 ms" - -```postgresql --- create a request embedding on the fly -WITH request AS ( - SELECT pgml.embed( - 'Alibaba-NLP/gte-base-en-v1.5', - 'query: Best 1980''s scifi movie' - )::vector(1024) AS embedding -), - --- vector similarity search for movies -first_pass AS ( - SELECT - title, - total_reviews, - star_rating_avg, - 1 - ( - review_embedding_e5_large <=> (SELECT embedding FROM request) - ) AS cosine_similarity, - star_rating_avg / 5 AS star_rating_score - FROM movies - WHERE total_reviews > 10 - ORDER BY review_embedding_e5_large <=> (SELECT embedding FROM request) - LIMIT 1000 -) - --- grab the top 10 results, re-ranked with a boost for the avg star rating -SELECT - title, - total_reviews, - round(star_rating_avg, 2) as star_rating_avg, - star_rating_score, - cosine_similarity, - cosine_similarity + star_rating_score AS final_score -FROM first_pass -ORDER BY final_score DESC -LIMIT 10; -``` - -!!! - -!!! 
results - -| title | total\_reviews | star\_rating\_avg | final\_score | star\_rating\_score | cosine\_similarity | -| ---------------------------------------------------- | -------------: | ----------------: | -----------------: | ---------------------: | -----------------: | -| Forbidden Planet (Two-Disc 50th Anniversary Edition) | 255 | 4.82 | 1.8216832158805154 | 0.96392156862745098000 | 0.8577616472530644 | -| Back to the Future | 31 | 4.94 | 1.82090702765472 | 0.98709677419354838000 | 0.8338102534611714 | -| Warning Sign | 17 | 4.82 | 1.8136734057737756 | 0.96470588235294118000 | 0.8489675234208343 | -| Plan 9 From Outer Space/Robot Monster | 13 | 4.92 | 1.8126103400815046 | 0.98461538461538462000 | 0.8279949554661198 | -| Blade Runner: The Final Cut (BD) \[Blu-ray] | 11 | 4.82 | 1.8120690455673043 | 0.96363636363636364000 | 0.8484326819309408 | -| The Day the Earth Stood Still | 589 | 4.76 | 1.8076752363401547 | 0.95212224108658744000 | 0.8555529952535671 | -| Forbidden Planet \[Blu-ray] | 223 | 4.79 | 1.8067426345035993 | 0.95874439461883408000 | 0.8479982398847651 | -| Aliens (Special Edition) | 25 | 4.76 | 1.803194119705901 | 0.95200000000000000000 | 0.851194119705901 | -| Night of the Comet | 22 | 4.82 | 1.802469182369724 | 0.96363636363636364000 | 0.8388328187333605 | -| Forbidden Planet | 19 | 4.68 | 1.795573710000297 | 0.93684210526315790000 | 0.8587316047371392 | - -!!! - -!!! - -This is starting to look pretty good! True confessions: I'm really surprised "Empire Strikes Back" is not on this list. What is wrong with people these days?! I'm glad I called "Blade Runner" and "Back to the Future" though. Now that I've got a list that caters to my own sensibilities, I need to stop writing code and blog posts and watch some of these! In the next article, we'll look at incorporating more of ~~my preferences~~ a customer's preferences into the search results for effective personalization. - -P.S. 
I'm a little disappointed I didn't recall Aliens, because yeah, it's perfect 80's Sci-Fi, but that series has gone on so long I had associated it all with "vague timeframe". No one is perfect... right? I should probably watch "Plan 9 From Outer Space" & "Forbidden Planet", even though they are both 3 decades too early. I'm sure they are great! From f3cf188a07519536f246662c471c5a43dcf0d3ba Mon Sep 17 00:00:00 2001 From: SilasMarvin <19626586+SilasMarvin@users.noreply.github.com> Date: Wed, 17 Jul 2024 11:10:55 -0700 Subject: [PATCH 3/4] Small clean ups --- pgml-cms/docs/SUMMARY.md | 5 ++-- .../docs/open-source/korvus/guides/README.md | 2 ++ .../{pgml => korvus}/guides/opensourceai.md | 0 .../docs/open-source/pgml/guides/README.md | 26 +++++++++---------- 4 files changed, 16 insertions(+), 17 deletions(-) rename pgml-cms/docs/open-source/{pgml => korvus}/guides/opensourceai.md (100%) diff --git a/pgml-cms/docs/SUMMARY.md b/pgml-cms/docs/SUMMARY.md index 25c44b815..c0d10d814 100644 --- a/pgml-cms/docs/SUMMARY.md +++ b/pgml-cms/docs/SUMMARY.md @@ -49,7 +49,7 @@ * [Hyperparameter Search](open-source/pgml/api/pgml.train/hyperparameter-search.md) * [Joint Optimization](open-source/pgml/api/pgml.train/joint-optimization.md) * [pgml.tune()](open-source/pgml/api/pgml.tune.md) - * [Guides]() + * [Guides](open-source/pgml/guides/README.md) * [Embeddings](open-source/pgml/guides/embeddings/README.md) * [In-database Generation](open-source/pgml/guides/embeddings/in-database-generation.md) * [Dimensionality Reduction](open-source/pgml/guides/embeddings/dimensionality-reduction.md) @@ -58,10 +58,8 @@ * [Normalization](open-source/pgml/guides/embeddings/vector-normalization.md) * [Search](open-source/pgml/guides/improve-search-results-with-machine-learning.md) * [Chatbots](open-source/pgml/guides/chatbots/README.md) - * [Example Application](TODO/chatbots.md) * [Supervised Learning](open-source/pgml/guides/supervised-learning.md) * [Unified 
RAG](open-source/pgml/guides/unified-rag.md) - * [OpenSourceAI](open-source/pgml/guides/opensourceai.md) * [Natural Language Processing](open-source/pgml/guides/natural-language-processing.md) * [Vector database](open-source/pgml/guides/vector-database.md)
