Montana/guides #1569

Merged
merged 4 commits into from
Jul 17, 2024
Original file line number Diff line number Diff line change
Expand Up @@ -120,7 +120,7 @@ LIMIT 5;

## Generating embeddings from natural language text

PostgresML provides a simple interface to generate embeddings from text in your database. You can use the [`pgml.embed`](https://postgresml.org/docs/guides/transformers/embeddings) function to generate embeddings for a column of text. The function takes a transformer name and a text value. The transformer will automatically be downloaded and cached on your connection process for reuse. You can see a list of potential good candidate models to generate embeddings on the [Massive Text Embedding Benchmark leaderboard](https://huggingface.co/spaces/mteb/leaderboard).
PostgresML provides a simple interface to generate embeddings from text in your database. You can use the [`pgml.embed`](https://postgresml.org/docs/open-source/pgml/guides/transformers/embeddings) function to generate embeddings for a column of text. The function takes a transformer name and a text value. The transformer will automatically be downloaded and cached on your connection process for reuse. You can see a list of potential good candidate models to generate embeddings on the [Massive Text Embedding Benchmark leaderboard](https://huggingface.co/spaces/mteb/leaderboard).

Since the documents in our corpus (movie reviews) are all relatively short and similar in style, we don't need a large model. [`Alibaba-NLP/gte-base-en-v1.5`](https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5) will be a good first attempt. The great thing about PostgresML is you can always regenerate your embeddings later to experiment with different embedding models.

Expand Up @@ -210,7 +210,7 @@ We have truncated the output to two items

!!!

We also have asynchronous versions of the `create` and `create_stream` functions, named `create_async` and `create_stream_async` respectively. Check out [our documentation](https://postgresml.org/docs/guides/opensourceai) for a complete guide to the open-source AI SDK, including guides on how to specify custom models.
We also have asynchronous versions of the `create` and `create_stream` functions, named `create_async` and `create_stream_async` respectively. Check out [our documentation](https://postgresml.org/docs/open-source/pgml/guides/opensourceai) for a complete guide to the open-source AI SDK, including guides on how to specify custom models.

PostgresML is free and open source. To run the above examples yourself [create an account](https://postgresml.org/signup), install korvus, and get running!

2 changes: 1 addition & 1 deletion pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md
Expand Up @@ -152,7 +152,7 @@ SELECT '[1,2,3]'::vector <=> '[2,3,4]'::vector;

!!!

Other distance functions have similar formulas and provide convenient operators as well. It may be worth testing other operators to see which performs better for your use case. For more information on the other distance functions, take a look at our [Embeddings guide](https://postgresml.org/docs/guides/embeddings/vector-similarity).
Other distance functions have similar formulas and provide convenient operators as well. It may be worth testing other operators to see which performs better for your use case. For more information on the other distance functions, take a look at our [Embeddings guide](https://postgresml.org/docs/open-source/pgml/guides/embeddings/vector-similarity).
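
For readers following the hunk context, the `<=>` query shown in the hunk header above (`'[1,2,3]'::vector <=> '[2,3,4]'::vector`) computes a cosine distance. The sketch below reproduces that arithmetic in plain Python. It is an illustration only, not part of this pull request, and the function name `cosine_distance` is a hypothetical helper rather than anything provided by PostgresML or pgvector.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


# Same two vectors as the SQL example in the hunk header.
distance = cosine_distance([1, 2, 3], [2, 3, 4])
print(round(distance, 4))  # 0.0074
```

A distance near 0 means the vectors point in almost the same direction, which is why the two sample vectors score as very similar.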

Going back to our search example, we can compute the cosine distance between our query embedding and our documents:

129 changes: 83 additions & 46 deletions pgml-cms/docs/SUMMARY.md
Expand Up @@ -6,11 +6,16 @@
* [Getting started](introduction/getting-started/README.md)
* [Create your database](introduction/getting-started/create-your-database.md)
* [Connect your app](introduction/getting-started/connect-your-app.md)
* [Import your data](introduction/getting-started/import-your-data/README.md)
* [Logical replication](introduction/getting-started/import-your-data/logical-replication/README.md)
* [Foreign Data Wrappers](introduction/getting-started/import-your-data/foreign-data-wrappers.md)
* [Move data with COPY](introduction/getting-started/import-your-data/copy.md)
* [Migrate with pg_dump](introduction/getting-started/import-your-data/pg-dump.md)
* [Import your data](introduction/import-your-data/README.md)
* [Logical replication](introduction/import-your-data/logical-replication/README.md)
* [Foreign Data Wrappers](introduction/import-your-data/foreign-data-wrappers.md)
* [Move data with COPY](introduction/import-your-data/copy.md)
* [Migrate with pg_dump](introduction/import-your-data/pg-dump.md)
* [Storage & Retrieval](introduction/import-your-data/storage-and-retrieval/README.md)
* [Documents](introduction/import-your-data/storage-and-retrieval/documents.md)
* [Partitioning](introduction/import-your-data/storage-and-retrieval/partitioning.md)
* [LLM based pipelines with PostgresML and dbt (data build tool)](introduction/import-your-data/storage-and-retrieval/llm-based-pipelines-with-postgresml-and-dbt-data-build-tool.md)
* [FAQ](introduction/faq.md)

## Open Source

Expand Down Expand Up @@ -44,6 +49,61 @@
* [Hyperparameter Search](open-source/pgml/api/pgml.train/hyperparameter-search.md)
* [Joint Optimization](open-source/pgml/api/pgml.train/joint-optimization.md)
* [pgml.tune()](open-source/pgml/api/pgml.tune.md)
* [Guides](open-source/pgml/guides/README.md)
* [Embeddings](open-source/pgml/guides/embeddings/README.md)
* [In-database Generation](open-source/pgml/guides/embeddings/in-database-generation.md)
* [Dimensionality Reduction](open-source/pgml/guides/embeddings/dimensionality-reduction.md)
* [Aggregation](open-source/pgml/guides/embeddings/vector-aggregation.md)
* [Similarity](open-source/pgml/guides/embeddings/vector-similarity.md)
* [Normalization](open-source/pgml/guides/embeddings/vector-normalization.md)
* [Search](open-source/pgml/guides/improve-search-results-with-machine-learning.md)
* [Chatbots](open-source/pgml/guides/chatbots/README.md)
* [Supervised Learning](open-source/pgml/guides/supervised-learning.md)
* [Unified RAG](open-source/pgml/guides/unified-rag.md)
* [Natural Language Processing](open-source/pgml/guides/natural-language-processing.md)
* [Vector database](open-source/pgml/guides/vector-database.md)
<!--
* [Search]()
* [Keyword Search]()
* [Vector Search]()
* [Hybrid Search]()
* [Ranking]()
* [Transformers & LLMs]()
* [Text Generation]()
* [Prompt Engineering]()
* [Unified RAG]()
* [Personalization]()
* [Recommendations]()
* [Forecasting]()
* [Time series]()
* [Events]()
* [Fraud Detection]()
* [Incentive Optimization]()
* [Machine Learning]()
* [Feature Engineering]()
* [Regression]()
* [Classification]()
* [Clustering]()
* [Matrix Decomposition]()
* [Natural Language Processing]()
* [Tokenization]()
* [Chunking]()
* [Text Generation]()
* [Sentiment Analysis]()
* [Summarization]()
-->
* [Developers](open-source/pgml/developers/README.md)
* [Local Docker Development](open-source/pgml/developers/quick-start-with-docker.md)
* [Installation](open-source/pgml/developers/installation.md)
* [Contributing](open-source/pgml/developers/contributing.md)
* [Distributed Training](open-source/pgml/developers/distributed-training.md)
* [GPU Support](open-source/pgml/developers/gpu-support.md)
* [Self-hosting](open-source/pgml/developers/self-hosting/README.md)
* [Pooler](open-source/pgml/developers/self-hosting/pooler.md)
* [Building from source](open-source/pgml/developers/self-hosting/building-from-source.md)
* [Replication](open-source/pgml/developers/self-hosting/replication.md)
* [Backups](open-source/pgml/developers/self-hosting/backups.md)
* [Running on EC2](open-source/pgml/developers/self-hosting/running-on-ec2.md)
* [Korvus](open-source/korvus/README.md)
* [API](open-source/korvus/api/README.md)
* [Collections](open-source/korvus/api/collections.md)
Expand All @@ -53,6 +113,7 @@
* [RAG](open-source/korvus/guides/rag.md)
* [Vector Search](open-source/korvus/guides/vector-search.md)
* [Document Search](open-source/korvus/guides/document-search.md)
* [OpenSourceAI](open-source/korvus/guides/opensourceai.md)
* [Example Apps](open-source/korvus/example-apps/README.md)
* [Semantic Search](open-source/korvus/example-apps/semantic-search.md)
* [RAG with OpenAI](open-source/korvus/example-apps/rag-with-openai.md)
Expand All @@ -69,48 +130,24 @@
* [Enterprise](cloud/enterprise/README.md)
* [Teams](cloud/enterprise/teams.md)
* [VPC](cloud/enterprise/vpc.md)
* [Privacy Policy](cloud/privacy-policy.md)
* [Terms of Service](cloud/terms-of-service.md)

## Guides

* [Embeddings](guides/embeddings/README.md)
* [In-database Generation](guides/embeddings/in-database-generation.md)
* [Dimensionality Reduction](guides/embeddings/dimensionality-reduction.md)
* [Aggregation](guides/embeddings/vector-aggregation.md)
* [Similarity](guides/embeddings/vector-similarity.md)
* [Normalization](guides/embeddings/vector-normalization.md)
* [Search](guides/improve-search-results-with-machine-learning.md)
* [Chatbots](guides/chatbots/README.md)
* [Example Application](use-cases/chatbots.md)
* [Supervised Learning](guides/supervised-learning.md)
* [Unified RAG](guides/unified-rag.md)
* [OpenSourceAI](guides/opensourceai.md)
* [Natural Language Processing](guides/natural-language-processing.md)
* [Vector database](guides/vector-database.md)

## Resources
<!--
## TODO

-- Merge into Introduction > Overview
* [Architecture](resources/architecture/README.md)
* [Why PostgresML?](resources/architecture/why-postgresml.md)
* [FAQs](resources/faqs.md)
* [Data Storage & Retrieval](resources/data-storage-and-retrieval/README.md)
* [Documents](resources/data-storage-and-retrieval/documents.md)
* [Partitioning](resources/data-storage-and-retrieval/partitioning.md)
* [LLM based pipelines with PostgresML and dbt (data build tool)](resources/data-storage-and-retrieval/llm-based-pipelines-with-postgresml-and-dbt-data-build-tool.md)
* [Benchmarks](resources/benchmarks/postgresml-is-8-40x-faster-than-python-http-microservices.md)
* [PostgresML is 8-40x faster than Python HTTP microservices](resources/benchmarks/postgresml-is-8-40x-faster-than-python-http-microservices.md)
* [Scaling to 1 Million Requests per Second](resources/benchmarks/million-requests-per-second.md)
* [MindsDB vs PostgresML](resources/benchmarks/mindsdb-vs-postgresml.md)
* [GGML Quantized LLM support for Huggingface Transformers](resources/benchmarks/ggml-quantized-llm-support-for-huggingface-transformers.md)
* [Making Postgres 30 Percent Faster in Production](resources/benchmarks/making-postgres-30-percent-faster-in-production.md)
* [Developer Docs](resources/developer-docs/README.md)
* [Local Docker Development](resources/developer-docs/quick-start-with-docker.md)
* [Installation](resources/developer-docs/installation.md)
* [Contributing](resources/developer-docs/contributing.md)
* [Distributed Training](resources/developer-docs/distributed-training.md)
* [GPU Support](resources/developer-docs/gpu-support.md)
* [Self-hosting](resources/developer-docs/self-hosting/README.md)
* [Pooler](resources/developer-docs/self-hosting/pooler.md)
* [Building from source](resources/developer-docs/self-hosting/building-from-source.md)
* [Replication](resources/developer-docs/self-hosting/replication.md)
* [Backups](resources/developer-docs/self-hosting/backups.md)
* [Running on EC2](resources/developer-docs/self-hosting/running-on-ec2.md)

## Reference

* [SQL]()
* [Explain plans]()
* [Composition]()
* [LLMs]()
* [LLama]()
* [GPT]()
* [Facon]()
* [Glossary]()
-->
File renamed without changes.
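
The three blog link edits in this diff all follow one pattern: the `/docs/guides/` prefix becomes `/docs/open-source/pgml/guides/`. A minimal sketch of that rewrite is below, assuming the prefix rule is the only change needed. The helper `rewrite_guide_link` and the two constants are hypothetical names used for illustration and are not part of the pull request.

```python
# Hypothetical constants: the old and new prefixes as they appear in the diff.
OLD_PREFIX = "https://postgresml.org/docs/guides/"
NEW_PREFIX = "https://postgresml.org/docs/open-source/pgml/guides/"


def rewrite_guide_link(url: str) -> str:
    """Map an old guide URL to its new location; leave other URLs untouched."""
    if url.startswith(OLD_PREFIX):
        return NEW_PREFIX + url[len(OLD_PREFIX):]
    return url


# The three links changed in the blog posts above.
old_links = [
    "https://postgresml.org/docs/guides/transformers/embeddings",
    "https://postgresml.org/docs/guides/opensourceai",
    "https://postgresml.org/docs/guides/embeddings/vector-similarity",
]
for link in old_links:
    print(rewrite_guide_link(link))
```

Note that `SUMMARY.md` in this same diff lists OpenSourceAI under `open-source/korvus/guides/`, so the prefix rule mirrors only what the blog link edits do and may not match the final location of every page.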