Commit e909ac5

Merge branch 'master' into set-omp-num-threads
2 parents 2a1853c + c322729 commit e909ac5

68 files changed: +1027 -132 lines changed

pgml-cms/blog/SUMMARY.md

Lines changed: 4 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -1,13 +1,13 @@
11
# Table of contents
22

33
* [Home](README.md)
4-
* [Introducing the OpenAI Switch Kit: Move from closed to open-source AI in minutes](introducing-the-openai-switch-kit-move-from-closed-to-open-source-ai-in-minutes.md)
5-
* [Speeding up vector recall 5x with HNSW](speeding-up-vector-recall-5x-with-hnsw.md)
6-
* [How-to Improve Search Results with Machine Learning](how-to-improve-search-results-with-machine-learning.md)
74
* [Meet us at the 2024 Postgres Conference!](meet-us-at-the-2024-postgres-conference.md)
85
* [The 1.0 SDK is Here](the-1.0-sdk-is-here.md)
96
* [Using PostgresML with Django and embedding search](using-postgresml-with-django-and-embedding-search.md)
107
* [PostgresML is going multicloud](postgresml-is-going-multicloud.md)
8+
* [Introducing the OpenAI Switch Kit: Move from closed to open-source AI in minutes](introducing-the-openai-switch-kit-move-from-closed-to-open-source-ai-in-minutes.md)
9+
* [Speeding up vector recall 5x with HNSW](speeding-up-vector-recall-5x-with-hnsw.md)
10+
* [How-to Improve Search Results with Machine Learning](how-to-improve-search-results-with-machine-learning.md)
1111
* [pgml-chat: A command-line tool for deploying low-latency knowledge-based chatbots](pgml-chat-a-command-line-tool-for-deploying-low-latency-knowledge-based-chatbots-part-i.md)
1212
* [Announcing Support for AWS us-east-1 Region](announcing-support-for-aws-us-east-1-region.md)
1313
* [LLM based pipelines with PostgresML and dbt (data build tool)](llm-based-pipelines-with-postgresml-and-dbt-data-build-tool.md)
@@ -30,3 +30,4 @@
3030
* [Postgres Full Text Search is Awesome!](postgres-full-text-search-is-awesome.md)
3131
* [Oxidizing Machine Learning](oxidizing-machine-learning.md)
3232
* [Data is Living and Relational](data-is-living-and-relational.md)
33+
* [Sentiment Analysis using Express JS and PostgresML](sentiment-analysis-using-express-js-and-postgresml.md)

pgml-cms/blog/announcing-support-for-aws-us-east-1-region.md

Lines changed: 7 additions & 3 deletions
@@ -1,8 +1,8 @@
11
---
2-
description: >-
3-
We added aws us east 1 to our list of support aws regions.
42
featured: false
5-
tags: [product]
3+
tags:
4+
- product
5+
description: We added aws us east 1 to our list of support aws regions.
66
---
77

88
# Announcing Support for AWS us-east-1 Region
@@ -27,8 +27,12 @@ To demonstrate the impact of moving the data closer to your application, we've c
2727

2828
<figure><img src=".gitbook/assets/image (8).png" alt=""><figcaption></figcaption></figure>
2929

30+
\\
31+
3032
<figure><img src=".gitbook/assets/image (9).png" alt=""><figcaption></figcaption></figure>
3133

34+
\\
35+
3236
## Using the New Region
3337

3438
To take advantage of latency savings, you can [deploy a dedicated PostgresML database](https://postgresml.org/signup) in `us-east-1` today. We make it as simple as filling out a very short form and clicking "Create database".

pgml-cms/blog/generating-llm-embeddings-with-open-source-models-in-postgresml.md

Lines changed: 9 additions & 7 deletions
@@ -1,9 +1,9 @@
11
---
2+
image: .gitbook/assets/blog_image_generating_llm_embeddings.png
3+
features: true
24
description: >-
35
How to use the pgml.embed(...) function to generate embeddings with free and
46
open source models in your own database.
5-
image: ".gitbook/assets/blog_image_generating_llm_embeddings.png"
6-
features: true
77
---
88

99
# Generating LLM embeddings with open source models in PostgresML
@@ -18,14 +18,14 @@ Montana Low
1818

1919
April 21, 2023
2020

21-
PostgresML makes it easy to generate embeddings from text in your database using a large selection of state-of-the-art models with one simple call to `pgml.embed(model_name, text)`. Prove the results in this series to your own satisfaction, for free, by signing up for a GPU accelerated database.
21+
PostgresML makes it easy to generate embeddings from text in your database using a large selection of state-of-the-art models with one simple call to **`pgml.embed`**`(model_name, text)`. Prove the results in this series to your own satisfaction, for free, by signing up for a GPU accelerated database.
2222

2323
This article is the first in a multipart series that will show you how to build a post-modern semantic search and recommendation engine, including personalization, using open source models.
2424

25-
1. [Generating LLM Embeddings with HuggingFace models](generating-llm-embeddings-with-open-source-models-in-postgresml.md)
26-
2. [Tuning vector recall with pgvector](tuning-vector-recall-while-generating-query-embeddings-in-the-database.md)
27-
3. [Personalizing embedding results with application data](personalize-embedding-results-with-application-data-in-your-database.md)
28-
4. [Optimizing semantic results with an XGBoost ranking model](/docs/use-cases/improve-search-results-with-machine-learning)
25+
1. Generating LLM Embeddings with HuggingFace models
26+
2. Tuning vector recall with pgvector
27+
3. Personalizing embedding results with application data
28+
4. Optimizing semantic results with an XGBoost ranking model - coming soon!
2929

3030
## Introduction
3131

@@ -216,6 +216,8 @@ For comparison, it would cost about $299 to use OpenAI's cheapest embedding mode
216216
| GPU | 17ms | $72 | 6 hours |
217217
| OpenAI | 300ms | $299 | millennia |
218218

219+
\\
220+
219221
You can also find embedding models that outperform OpenAI's `text-embedding-ada-002` model across many different tests on the [leaderboard](https://huggingface.co/spaces/mteb/leaderboard). It's always best to do your own benchmarking with your data, models, and hardware to find the best fit for your use case.
220222

221223
> _HTTP requests to a different datacenter cost more time and money for lower reliability than co-located compute and storage._

pgml-cms/blog/meet-us-at-the-2024-postgres-conference.md

Lines changed: 9 additions & 6 deletions
@@ -12,26 +12,29 @@ description: Announcing our sponsorship of the Postgres Conference in San Jose A
1212

1313
Cassandra Stumer
1414

15-
March 20, 2024
15+
March 20, 2023
1616

17-
Hey database aficionados, mark your calendars because something big is coming your way! We're thrilled to announce that we will be sponsoring the[ 2024 Postgres Conference](https://postgresconf.org/conferences/2024) – the marquee PostgreSQL conference event for North America.&#x20;
17+
Hey database aficionados, mark your calendars because something big is coming your way! We're thrilled to announce that we will be sponsoring the[ 2024 Postgres Conference](https://postgresconf.org/conferences/2024) – the marquee PostgreSQL conference event for North America.
1818

1919
Why should you care? It's not every day you get to dive headfirst into the world of Postgres with folks who eat, sleep, and breathe data. We're talking hands-on workshops, lightning talks, and networking galore. Whether you're itching to sharpen your SQL skills or keen to explore the frontier of machine learning in the database, we've got you covered.
2020

2121
{% hint style="info" %}
2222
Save 25% on your ticket with our discount code: 2024\_POSTGRESML\_25
2323
{% endhint %}
2424

25-
PostgresML CEO and founder, Montana Low, will kick off the event on April 17th with a keynote about navigating the confluence of hardware evolution and machine learning technology.&#x20;
25+
\
26+
PostgresML CEO and founder, Montana Low, will kick off the event on April 17th with a keynote about navigating the confluence of hardware evolution and machine learning technology.
2627

27-
We’ll also be hosting a masterclass in retrieval augmented generation (RAG) on April 18th. Our own Silas Marvin will give hands-on guidance to equip you with the ability to implement RAG directly within your database.&#x20;
28+
We’ll also be hosting a masterclass in retrieval augmented generation (RAG) on April 18th. Our own Silas Marvin will give hands-on guidance to equip you with the ability to implement RAG directly within your database.
2829

29-
But wait, there's more! Our senior team will be at our booth at all hours to get to know you, talk shop, and answer any questions you may have. Whether it's about PostgresML, machine learning, or all the sweet merch we’ll have on deck.&#x20;
30+
But wait, there's more! Our senior team will be at our booth at all hours to get to know you, talk shop, and answer any questions you may have. Whether it's about PostgresML, machine learning, or all the sweet merch we’ll have on deck.
3031

3132
{% hint style="info" %}
32-
If you’d like some 1:1 time with our team at PgConf [contact us here](https://postgresml.org/contact). We’d be happy to prep something special for you.&#x20;
33+
If you’d like some 1:1 time with our team at PgConf [contact us here](https://postgresml.org/contact). We’d be happy to prep something special for you.
3334
{% endhint %}
3435

3536
So, why sit on the sidelines when you could be right in the thick of it, soaking up knowledge, making connections, and maybe even stumbling upon your next big breakthrough? Clear your schedule, grab your ticket, and get ready to geek out with us in San Jose.
3637

3738
See you there!
39+
40+
\\

pgml-cms/blog/mindsdb-vs-postgresml.md

Lines changed: 5 additions & 2 deletions
@@ -47,6 +47,8 @@ Both Projects integrate several dozen machine learning algorithms, including the
4747
| Full Text Search | - ||
4848
| Geospatial Search | - ||
4949

50+
\\
51+
5052
Both MindsDB and PostgresML support many classical machine learning algorithms to do classification and regression. They are both able to load ~~the latest LLMs~~ some models from Hugging Face, supported by underlying implementations in libtorch. I had to cross that out after exploring all the caveats in the MindsDB implementations. PostgresML supports the models released immediately as long as underlying dependencies are met. MindsDB has to release an update to support any new models, and their current model support is extremely limited. New algorithms, tasks, and models are constantly released, so it's worth checking the documentation for the latest list.
5153

5254
Another difference is that PostgresML also supports embedding models, and closely integrates them with vector search inside the database, which is well beyond the scope of MindsDB, since it's not a database at all. PostgresML has direct access to all the functionality provided by other Postgres extensions, like vector indexes from [pgvector](https://github.com/pgvector/pgvector) to perform efficient KNN & ANN vector recall, or [PostGIS](http://postgis.net/) for geospatial information as well as built in full text search. Multiple algorithms and extensions can be combined in compound queries to build state-of-the-art systems, like search and recommendations or fraud detection that generate an end to end result with a single query, something that might take a dozen different machine learning models and microservices in a more traditional architecture.
@@ -68,8 +70,7 @@ The architectural implementations for these projects is significantly different.
6870
| On Premise |||
6971
| Web UI |||
7072

71-
\
72-
73+
\\
7374

7475
The difference in architecture leads to different tradeoffs and challenges. There are already hundreds of ways to get data into and out of a Postgres database, from just about every other service, language and platform that makes PostgresML highly compatible with other application workflows. On the other hand, the MindsDB Python service accepts connections from specifically supported clients like `psql` and provides a pseudo-SQL interface to the functionality. The service will parse incoming MindsDB commands that look similar to SQL (but are not), for tasks like configuring database connections, or doing actual machine learning. These commands typically have what looks like a sub-select, that will actually fetch data over the wire from configured databases for Machine Learning training and inference.
7576

@@ -297,6 +298,8 @@ PostgresML is the clear winner in terms of performance. It seems to me that it c
297298
| translation\_en\_to\_es | t5-base | 1573 | 1148 | 294 |
298299
| summarization | sshleifer/distilbart-cnn-12-6 | 4289 | 3450 | 479 |
299300

301+
\\
302+
300303
There is a general trend, the larger and slower the model is, the more work is spent inside libtorch, the less the performance of the rest matters, but for interactive models and use cases there is a significant difference. We've tried to cover the most generous use case we could between these two. If we were to compare XGBoost or other classical algorithms, that can have sub millisecond prediction times in PostgresML, the 20ms Python service overhead of MindsDB just to parse the incoming query would be hundreds of times slower.
301304

302305
## Clouds

pgml-cms/blog/personalize-embedding-results-with-application-data-in-your-database.md

Lines changed: 4 additions & 4 deletions
@@ -22,10 +22,10 @@ PostgresML makes it easy to generate embeddings using open source models from Hu
2222

2323
This article is the third in a multipart series that will show you how to build a post-modern semantic search and recommendation engine, including personalization, using open source models. You may want to start with the previous articles in the series if you aren't familiar with PostgresML's capabilities.
2424

25-
1. [Generating LLM Embeddings with HuggingFace models](generating-llm-embeddings-with-open-source-models-in-postgresml.md)
26-
2. [Tuning vector recall with pgvector](tuning-vector-recall-while-generating-query-embeddings-in-the-database.md)
27-
3. [Personalizing embedding results with application data](personalize-embedding-results-with-application-data-in-your-database.md)
28-
4. [Optimizing semantic results with an XGBoost ranking model](/docs/use-cases/improve-search-results-with-machine-learning)
25+
1. Generating LLM Embeddings with HuggingFace models
26+
2. Tuning vector recall with pgvector
27+
3. Personalizing embedding results with application data
28+
4. Optimizing semantic results with an XGBoost ranking model - coming soon!
2929

3030
<figure><img src=".gitbook/assets/image (24).png" alt=""><figcaption><p>Embeddings can be combined into personalized perspectives when stored as vectors in the database.</p></figcaption></figure>
3131

Lines changed: 153 additions & 0 deletions
@@ -0,0 +1,153 @@
1+
---
2+
description: >-
3+
An example application for an easy and scalable way to get started with
4+
machine learning in Express
5+
---
6+
7+
# Sentiment Analysis using Express JS and PostgresML
8+
9+
<div align="left">
10+
11+
<figure><img src=".gitbook/assets/daniel.jpg" alt="Author" width="125"><figcaption><p>Daniel Illenberger</p></figcaption></figure>
12+
13+
</div>
14+
15+
Daniel Illenberger
16+
17+
March 26, 2024
18+
19+
Traditional MLOps requires continuously moving data between models and storage. Projects of every size pay for this in time, cost, and complexity. PostgresML simplifies and streamlines MLOps by performing machine learning directly where your data resides.
20+
21+
Express is a mature JS backend framework touted as being fast and flexible. It is a popular choice for JS developers who want to quickly develop an API or a full-fledged website. Since it is in the JS ecosystem, there's an endless number of open source projects you can use to add functionality.
22+
23+
### Application Overview
24+
25+
Sentiment analysis is a valuable tool for understanding the emotional polarity of text. You can determine if the text is positive, negative, or neutral. Common use cases include understanding product reviews, survey questions, and social media posts.
26+
27+
In this application, we'll be applying sentiment analysis to note taking. Note taking and journaling can be an excellent practice for work efficiency and self improvement. However, if you are like me, it quickly becomes impossible to find and make use of anything you've written down. Notes that are useful must be easy to navigate. With this motivation, let's create a demo that can record notes throughout the day. Each day will have a summary and sentiment score. That way, if I'm looking for that time a few weeks ago when we were frustrated with our old MLOps platform, it will be easy to find.
28+
29+
We will perform all the Machine Learning heavy lifting with the pgml extension function `pgml.transform()`. This brings Hugging Face Transformers into our data layer.
30+
31+
### Follow Along
32+
33+
You can see the full code on [GitHub](https://github.com/postgresml/example-expressjs). Follow the Readme to get the application up and running on your local machine.
34+
35+
### The Code
36+
37+
This app is composed of three main parts: reading and writing to a database, performing sentiment analysis on entries, and creating a summary.
38+
39+
We are going to use [postgresql-client](https://www.npmjs.com/package/postgresql-client) to connect to our DB.
40+
41+
When the application builds, we ensure we have two tables, one for notes and one for the daily summary and sentiment score.
42+
43+
```javascript
44+
const notes = await connection.execute(`
45+
CREATE TABLE IF NOT EXISTS notes (
46+
id BIGSERIAL PRIMARY KEY,
47+
note VARCHAR,
48+
score FLOAT,
49+
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
50+
);`
51+
)
52+
53+
const day = await connection.execute(`
54+
CREATE TABLE IF NOT EXISTS days (
55+
id BIGSERIAL PRIMARY KEY,
56+
summary VARCHAR,
57+
score FLOAT,
58+
created_at DATE NOT NULL UNIQUE DEFAULT DATE(NOW())
59+
);`
60+
)
61+
```
62+
63+
We also have three endpoints to hit:
64+
65+
* `app.get("/", async (req, res, next)` which returns all the notes for that day and the daily summary.
66+
* `app.post("/add", async (req, res, next)` which accepts a new note entry and performs a sentiment analysis. We simplify the score by converting it to 1, 0, or -1 for positive, neutral, or negative, and save it in our notes table.
67+
68+
```sql
69+
WITH note AS (
70+
SELECT pgml.transform(
71+
inputs => ARRAY['${req.body.note}'],
72+
task => '{"task": "text-classification", "model": "finiteautomata/bertweet-base-sentiment-analysis"}'::JSONB
73+
) AS market_sentiment
74+
),
75+
76+
score AS (
77+
SELECT
78+
CASE
79+
WHEN (SELECT market_sentiment FROM note)[0]::JSONB ->> 'label' = 'POS' THEN 1
80+
WHEN (SELECT market_sentiment FROM note)[0]::JSONB ->> 'label' = 'NEG' THEN -1
81+
ELSE 0
82+
END AS score
83+
)
84+
85+
INSERT INTO notes (note, score) VALUES ('${req.body.note}', (SELECT score FROM score));
86+
87+
```
88+
89+
* `app.get("/analyze", async (req, res, next)` which takes the daily entries, produces a summary and total sentiment score, and places that into our days table.
90+
91+
```sql
92+
WITH day AS (
93+
SELECT
94+
note,
95+
score
96+
FROM notes
97+
WHERE DATE(created_at) = DATE(NOW())),
98+
99+
sum AS (
100+
SELECT pgml.transform(
101+
task => '{"task": "summarization", "model": "sshleifer/distilbart-cnn-12-6"}'::JSONB,
102+
inputs => array[(SELECT STRING_AGG(note, '\n') FROM day)],
103+
args => '{"min_length" : 20, "max_length" : 70}'::JSONB
104+
) AS summary
105+
)
106+
107+
INSERT INTO days (summary, score)
108+
VALUES ((SELECT summary FROM sum)[0]::JSONB ->> 'summary_text', (SELECT SUM(score) FROM day))
109+
ON CONFLICT (created_at) DO UPDATE SET summary=EXCLUDED.summary, score=EXCLUDED.score
110+
RETURNING score;
111+
```
112+
113+
And that is all that is required!
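
One caveat about the `/add` query above: it interpolates `${req.body.note}` straight into the SQL text, so a note containing an apostrophe (such as "Leia's message" in the test run) would break the statement, and arbitrary input could inject SQL. A minimal sketch of the same query using a bound parameter follows. `buildAddNoteQuery` is a hypothetical helper that is not part of the example repository, and the `::TEXT` casts are an assumption added so Postgres can infer the parameter type.

```javascript
// Sketch: build the /add query with a bound parameter ($1) instead of
// interpolating the note into the SQL string.
// buildAddNoteQuery is a hypothetical helper, not code from the example repo.
function buildAddNoteQuery(note) {
  // "$1" is safe inside a template literal; only "${" starts an interpolation.
  const sql = `
    WITH note AS (
      SELECT pgml.transform(
        inputs => ARRAY[$1::TEXT],
        task   => '{"task": "text-classification", "model": "finiteautomata/bertweet-base-sentiment-analysis"}'::JSONB
      ) AS market_sentiment
    ),
    score AS (
      SELECT CASE
        WHEN (SELECT market_sentiment FROM note)[0]::JSONB ->> 'label' = 'POS' THEN 1
        WHEN (SELECT market_sentiment FROM note)[0]::JSONB ->> 'label' = 'NEG' THEN -1
        ELSE 0
      END AS score
    )
    INSERT INTO notes (note, score)
    VALUES ($1::TEXT, (SELECT score FROM score));`;

  // The note travels separately from the SQL text, so quotes need no escaping.
  return { sql, params: [note] };
}

const query = buildAddNoteQuery("Bought droids, found Leia's message.");
console.log(query.params);
```

The returned object would then be handed to the database client. With postgresql-client that is presumably something like `connection.query(query.sql, { params: query.params })`, but treat that call shape as an assumption and confirm it against the library's documentation.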
114+
115+
### Test Run
116+
117+
Let's imagine a day in the life of a boy destined to save the galaxy. Throughout his day he records the following notes:
118+
119+
```
120+
Woke to routine chores. Bought droids, found Leia's message. She pleads for help from Obi-Wan Kenobi. Intrigued, but uncertain.
121+
```
122+
123+
```
124+
Frantically searched for R2-D2, encountered Sand People. Saved by Obi-Wan. His presence is a glimmer of hope in this desolate place.
125+
```
126+
127+
```
128+
Returned home to find it destroyed by stormtroopers. Aunt and uncle gone. Rage and despair fill me. Empire's cruelty knows no bounds.
129+
```
130+
131+
```
132+
Left Tatooine with Obi-Wan, droids. Met Han Solo and Chewbacca in Mos Eisley. Sense of purpose grows despite uncertainty. Galaxy awaits.
133+
```
134+
135+
```
136+
On our way to Alderaan. With any luck we will find the princess soon.
137+
```
138+
139+
When we analyze this info we get a score of 2 and our summary is:
140+
141+
```
142+
Returned home to find it destroyed by stormtroopers . Bought droids, found Leia's message . Met Han Solo and Chewbacca in Mos Eisley . Sense of purpose grows despite uncertainty .
143+
```
144+
145+
Not bad for less than an hour of coding.
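
To make the daily score concrete: each note contributes 1, 0, or -1, and the day's score is their sum. The sketch below mirrors the SQL `CASE` expression and `SUM(score)` in plain JavaScript. The function names are hypothetical, and the labels are made up for illustration, since the post does not report which label each note received.

```javascript
// Map a text-classification label to the -1 / 0 / 1 score stored in the notes table.
function labelToScore(label) {
  if (label === "POS") return 1;
  if (label === "NEG") return -1;
  return 0; // NEU, or anything unexpected, counts as neutral
}

// Sum the per-note scores, as SUM(score) does in the /analyze query.
function dailyScore(labels) {
  return labels.reduce((total, label) => total + labelToScore(label), 0);
}

// Hypothetical labels for five notes; not the model's actual output.
const hypotheticalLabels = ["NEU", "POS", "NEG", "POS", "POS"];
console.log(dailyScore(hypotheticalLabels)); // 2
```

One difference from the SQL version: `SUM(score)` over a day with no notes yields `NULL`, whereas this sketch returns 0.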
146+
147+
### Final Thoughts
148+
149+
This app is far from complete but does show an easy and scalable way to get started with ML in Express. From here I encourage you to head over to our [docs](https://postgresml.org/docs/api/sql-extension/) and see what other features could be added.
150+
151+
If SQL is not your thing, no worries. Check out our [JS SDK](https://postgresml.org/docs/api/client-sdk/getting-started) to streamline all our best practices with simple JavaScript.
152+
153+
We love hearing from you. Please reach out to us on [Discord](https://discord.gg/DmyJP3qJ7U) or simply [Contact Us](https://postgresml.org/contact) here if you have any questions or feedback.
