postgresml
diff --git a/‎pgml-cms/blog/.gitbook/assets/daniel.jpg
49.1 KB b/‎pgml-cms/blog/.gitbook/assets/daniel.jpg
diff --git a/‎pgml-cms/blog/SUMMARY.md
Lines changed: 4 additions & 3 deletions b/‎pgml-cms/blog/SUMMARY.md
diff --git a/‎pgml-cms/blog/announcing-support-for-aws-us-east-1-region.md
Lines changed: 7 additions & 3 deletions b/‎pgml-cms/blog/announcing-support-for-aws-us-east-1-region.md
diff --git a/‎pgml-cms/blog/generating-llm-embeddings-with-open-source-models-in-postgresml.md
Lines changed: 9 additions & 7 deletions b/‎pgml-cms/blog/generating-llm-embeddings-with-open-source-models-in-postgresml.md
diff --git a/‎pgml-cms/blog/meet-us-at-the-2024-postgres-conference.md
Lines changed: 9 additions & 6 deletions b/‎pgml-cms/blog/meet-us-at-the-2024-postgres-conference.md
diff --git a/‎pgml-cms/blog/mindsdb-vs-postgresml.md
Lines changed: 5 additions & 2 deletions b/‎pgml-cms/blog/mindsdb-vs-postgresml.md
diff --git a/‎pgml-cms/blog/personalize-embedding-results-with-application-data-in-your-database.md
Lines changed: 4 additions & 4 deletions b/‎pgml-cms/blog/personalize-embedding-results-with-application-data-in-your-database.md
diff --git a/‎pgml-cms/blog/sentiment-analysis-using-express-js-and-postgresml.md
Lines changed: 153 additions & 0 deletions b/‎pgml-cms/blog/sentiment-analysis-using-express-js-and-postgresml.md
@@ -1,13 +1,13 @@
 # Table of contents
 
 * [Home](README.md)
-* [Introducing the OpenAI Switch Kit: Move from closed to open-source AI in minutes](introducing-the-openai-switch-kit-move-from-closed-to-open-source-ai-in-minutes.md)
-* [Speeding up vector recall 5x with HNSW](speeding-up-vector-recall-5x-with-hnsw.md)
-* [How-to Improve Search Results with Machine Learning](how-to-improve-search-results-with-machine-learning.md)
 * [Meet us at the 2024 Postgres Conference!](meet-us-at-the-2024-postgres-conference.md)
 * [The 1.0 SDK is Here](the-1.0-sdk-is-here.md)
 * [Using PostgresML with Django and embedding search](using-postgresml-with-django-and-embedding-search.md)
 * [PostgresML is going multicloud](postgresml-is-going-multicloud.md)
+* [Introducing the OpenAI Switch Kit: Move from closed to open-source AI in minutes](introducing-the-openai-switch-kit-move-from-closed-to-open-source-ai-in-minutes.md)
+* [Speeding up vector recall 5x with HNSW](speeding-up-vector-recall-5x-with-hnsw.md)
+* [How-to Improve Search Results with Machine Learning](how-to-improve-search-results-with-machine-learning.md)
 * [pgml-chat: A command-line tool for deploying low-latency knowledge-based chatbots](pgml-chat-a-command-line-tool-for-deploying-low-latency-knowledge-based-chatbots-part-i.md)
 * [Announcing Support for AWS us-east-1 Region](announcing-support-for-aws-us-east-1-region.md)
 * [LLM based pipelines with PostgresML and dbt (data build tool)](llm-based-pipelines-with-postgresml-and-dbt-data-build-tool.md)
@@ -30,3 +30,4 @@
 * [Postgres Full Text Search is Awesome!](postgres-full-text-search-is-awesome.md)
 * [Oxidizing Machine Learning](oxidizing-machine-learning.md)
 * [Data is Living and Relational](data-is-living-and-relational.md)
+* [Sentiment Analysis using Express JS and PostgresML](sentiment-analysis-using-express-js-and-postgresml.md)
@@ -1,8 +1,8 @@
 ---
-description: >-
-  We added aws us east 1 to our list of support aws regions.
 featured: false
-tags: [product]
+tags:
+  - product
+description: We added aws us east 1 to our list of support aws regions.
 ---
 
 # Announcing Support for AWS us-east-1 Region
@@ -27,8 +27,12 @@ To demonstrate the impact of moving the data closer to your application, we've c
 
 <figure><img src=".gitbook/assets/image (8).png" alt=""><figcaption></figcaption></figure>
 
+\\
+
 <figure><img src=".gitbook/assets/image (9).png" alt=""><figcaption></figcaption></figure>
 
+\\
+
 ## Using the New Region
 
 To take advantage of latency savings, you can [deploy a dedicated PostgresML database](https://postgresml.org/signup) in `us-east-1` today. We make it as simple as filling out a very short form and clicking "Create database".
 
@@ -1,9 +1,9 @@
 ---
+image: .gitbook/assets/blog_image_generating_llm_embeddings.png
+features: true
 description: >-
   How to use the pgml.embed(...) function to generate embeddings with free and
   open source models in your own database.
-image: ".gitbook/assets/blog_image_generating_llm_embeddings.png"
-features: true
 ---
 
 # Generating LLM embeddings with open source models in PostgresML
@@ -18,14 +18,14 @@ Montana Low
 
 April 21, 2023
 
-PostgresML makes it easy to generate embeddings from text in your database using a large selection of state-of-the-art models with one simple call to `pgml.embed(model_name, text)`. Prove the results in this series to your own satisfaction, for free, by signing up for a GPU accelerated database.
+PostgresML makes it easy to generate embeddings from text in your database using a large selection of state-of-the-art models with one simple call to **`pgml.embed`**`(model_name, text)`. Prove the results in this series to your own satisfaction, for free, by signing up for a GPU accelerated database.
 
 This article is the first in a multipart series that will show you how to build a post-modern semantic search and recommendation engine, including personalization, using open source models.
 
-1. [Generating LLM Embeddings with HuggingFace models](generating-llm-embeddings-with-open-source-models-in-postgresml.md)
-2. [Tuning vector recall with pgvector](tuning-vector-recall-while-generating-query-embeddings-in-the-database.md)
-3. [Personalizing embedding results with application data](personalize-embedding-results-with-application-data-in-your-database.md)
-4. [Optimizing semantic results with an XGBoost ranking model](/docs/use-cases/improve-search-results-with-machine-learning)
+1. Generating LLM Embeddings with HuggingFace models
+2. Tuning vector recall with pgvector
+3. Personalizing embedding results with application data
+4. Optimizing semantic results with an XGBoost ranking model - coming soon!
 
 ## Introduction
 
@@ -216,6 +216,8 @@ For comparison, it would cost about $299 to use OpenAI's cheapest embedding mode
 | GPU       | 17ms    | $72  | 6 hours   |
 | OpenAI    | 300ms   | $299 | millennia |
 
+\\
+
 You can also find embedding models that outperform OpenAI's `text-embedding-ada-002` model across many different tests on the [leaderboard](https://huggingface.co/spaces/mteb/leaderboard). It's always best to do your own benchmarking with your data, models, and hardware to find the best fit for your use case.
 
 > _HTTP requests to a different datacenter cost more time and money for lower reliability than co-located compute and storage._
 
@@ -12,26 +12,29 @@ description: Announcing our sponsorship of the Postgres Conference in San Jose A
 
 Cassandra Stumer
 
-March 20, 2024
+March 20, 2023
 
-Hey database aficionados, mark your calendars because something big is coming your way! We're thrilled to announce that we will be sponsoring the[ 2024 Postgres Conference](https://postgresconf.org/conferences/2024) – the marquee PostgreSQL conference event for North America.&#x20;
+Hey database aficionados, mark your calendars because something big is coming your way! We're thrilled to announce that we will be sponsoring the [2024 Postgres Conference](https://postgresconf.org/conferences/2024) – the marquee PostgreSQL conference event for North America.
 
 Why should you care? It's not every day you get to dive headfirst into the world of Postgres with folks who eat, sleep, and breathe data. We're talking hands-on workshops, lightning talks, and networking galore. Whether you're itching to sharpen your SQL skills or keen to explore the frontier of machine learning in the database, we've got you covered.
 
 {% hint style="info" %}
 Save 25% on your ticket with our discount code: 2024\_POSTGRESML\_25
 {% endhint %}
 
-PostgresML CEO and founder, Montana Low, will kick off the event on April 17th with a keynote about navigating the confluence of hardware evolution and machine learning technology.&#x20;
+\
+PostgresML CEO and founder, Montana Low, will kick off the event on April 17th with a keynote about navigating the confluence of hardware evolution and machine learning technology.
 
-We’ll also be hosting a masterclass in retrieval augmented generation (RAG) on April 18th. Our own Silas Marvin will give hands-on guidance to equip you with the ability to implement RAG directly within your database.&#x20;
+We’ll also be hosting a masterclass in retrieval augmented generation (RAG) on April 18th. Our own Silas Marvin will give hands-on guidance to equip you with the ability to implement RAG directly within your database.
 
-But wait, there's more! Our senior team will be at our booth at all hours to get to know you, talk shop, and answer any questions you may have. Whether it's about PostgresML, machine learning, or all the sweet merch we’ll have on deck.&#x20;
+But wait, there's more! Our senior team will be at our booth at all hours to get to know you, talk shop, and answer any questions you may have. Whether it's about PostgresML, machine learning, or all the sweet merch we’ll have on deck.
 
 {% hint style="info" %}
-If you’d like some 1:1 time with our team at PgConf [contact us here](https://postgresml.org/contact). We’d be happy to prep something special for you.&#x20;
+If you’d like some 1:1 time with our team at PgConf [contact us here](https://postgresml.org/contact). We’d be happy to prep something special for you.
 {% endhint %}
 
 So, why sit on the sidelines when you could be right in the thick of it, soaking up knowledge, making connections, and maybe even stumbling upon your next big breakthrough? Clear your schedule, grab your ticket, and get ready to geek out with us in San Jose.
 
 See you there!
+
+\\
@@ -47,6 +47,8 @@ Both Projects integrate several dozen machine learning algorithms, including the
 | Full Text Search  | -       | ✅          |
 | Geospatial Search | -       | ✅          |
 
+\\
+
 Both MindsDB and PostgresML support many classical machine learning algorithms to do classification and regression. They are both able to load ~~the latest LLMs~~ some models from Hugging Face, supported by underlying implementations in libtorch. I had to cross that out after exploring all the caveats in the MindsDB implementations. PostgresML supports the models released immediately as long as underlying dependencies are met. MindsDB has to release an update to support any new models, and their current model support is extremely limited. New algorithms, tasks, and models are constantly released, so it's worth checking the documentation for the latest list.
 
 Another difference is that PostgresML also supports embedding models, and closely integrates them with vector search inside the database, which is well beyond the scope of MindsDB, since it's not a database at all. PostgresML has direct access to all the functionality provided by other Postgres extensions, like vector indexes from [pgvector](https://github.com/pgvector/pgvector) to perform efficient KNN & ANN vector recall, or [PostGIS](http://postgis.net/) for geospatial information as well as built in full text search. Multiple algorithms and extensions can be combined in compound queries to build state-of-the-art systems, like search and recommendations or fraud detection that generate an end to end result with a single query, something that might take a dozen different machine learning models and microservices in a more traditional architecture.
@@ -68,8 +70,7 @@ The architectural implementations for these projects is significantly different.
 | On Premise    | ✅             | ✅          |
 | Web UI        | ✅             | ✅          |
 
-\
-
+\\
 
 The difference in architecture leads to different tradeoffs and challenges. There are already hundreds of ways to get data into and out of a Postgres database, from just about every other service, language and platform that makes PostgresML highly compatible with other application workflows. On the other hand, the MindsDB Python service accepts connections from specifically supported clients like `psql` and provides a pseudo-SQL interface to the functionality. The service will parse incoming MindsDB commands that look similar to SQL (but are not), for tasks like configuring database connections, or doing actual machine learning. These commands typically have what looks like a sub-select, that will actually fetch data over the wire from configured databases for Machine Learning training and inference.
 
@@ -297,6 +298,8 @@ PostgresML is the clear winner in terms of performance. It seems to me that it c
 | translation\_en\_to\_es | t5-base                                   | 1573    | 1148           | 294            |
 | summarization           | sshleifer/distilbart-cnn-12-6             | 4289    | 3450           | 479            |
 
+\\
+
 There is a general trend, the larger and slower the model is, the more work is spent inside libtorch, the less the performance of the rest matters, but for interactive models and use cases there is a significant difference. We've tried to cover the most generous use case we could between these two. If we were to compare XGBoost or other classical algorithms, that can have sub millisecond prediction times in PostgresML, the 20ms Python service overhead of MindsDB just to parse the incoming query would be hundreds of times slower.
 
 ## Clouds
 
@@ -22,10 +22,10 @@ PostgresML makes it easy to generate embeddings using open source models from Hu
 
 This article is the third in a multipart series that will show you how to build a post-modern semantic search and recommendation engine, including personalization, using open source models. You may want to start with the previous articles in the series if you aren't familiar with PostgresML's capabilities.
 
-1. [Generating LLM Embeddings with HuggingFace models](generating-llm-embeddings-with-open-source-models-in-postgresml.md)
-2. [Tuning vector recall with pgvector](tuning-vector-recall-while-generating-query-embeddings-in-the-database.md)
-3. [Personalizing embedding results with application data](personalize-embedding-results-with-application-data-in-your-database.md)
-4. [Optimizing semantic results with an XGBoost ranking model](/docs/use-cases/improve-search-results-with-machine-learning)
+1. Generating LLM Embeddings with HuggingFace models
+2. Tuning vector recall with pgvector
+3. Personalizing embedding results with application data
+4. Optimizing semantic results with an XGBoost ranking model - coming soon!
 
 <figure><img src=".gitbook/assets/image (24).png" alt=""><figcaption><p>Embeddings can be combined into personalized perspectives when stored as vectors in the database.</p></figcaption></figure>
 
 
@@ -0,0 +1,153 @@
+---
+description: >-
+  An example application for an easy and scalable way to get started with
+  machine learning in Express
+---
+
+# Sentiment Analysis using Express JS and PostgresML
+
+<div align="left">
+
+<figure><img src=".gitbook/assets/daniel.jpg" alt="Author" width="125"><figcaption><p>Daniel Illenberger</p></figcaption></figure>
+
+</div>
+
+Daniel Illenberger
+
+March 26, 2024
+
+Traditional MLOps requires continuously moving data between models and storage. Projects both small and large pay for this in time, cost, and complexity. PostgresML simplifies and streamlines MLOps by performing machine learning directly where your data resides.
+
+Express is a mature JS backend framework touted as being fast and flexible. It is a popular choice for JS developers wanting to quickly develop an API or a full-fledged website. Since it is in the JS ecosystem, there's an endless number of open source projects you can use to add functionality.
+
+### Application Overview
+
+Sentiment analysis is a valuable tool for understanding the emotional polarity of text. You can determine if the text is positive, negative, or neutral. Common use cases include understanding product reviews, survey questions, and social media posts.
+
+In this application, we'll be applying sentiment analysis to note taking. Note taking and journaling can be an excellent practice for work efficiency and self improvement. However, if you are like me, it quickly becomes impossible to find and make use of anything you've written down. Notes that are useful must be easy to navigate. With this motivation, let's create a demo that can record notes throughout the day. Each day will have a summary and sentiment score. That way, if I'm looking for that time a few weeks ago when we were frustrated with our old MLOps platform, it will be easy to find.
+
+We will perform all the Machine Learning heavy lifting with the pgml extension function `pgml.transform()`. This brings Hugging Face Transformers into our data layer.
+
+### Follow Along
+
+You can see the full code on [GitHub](https://github.com/postgresml/example-expressjs). Follow the Readme to get the application up and running on your local machine.
+
+### The Code
+
+This app is composed of three main parts: reading and writing to a database, performing sentiment analysis on entries, and creating a summary.
+
+We are going to use [postgresql-client](https://www.npmjs.com/package/postgresql-client) to connect to our DB.
+
+When the application builds we ensure we have two tables, one for notes and one for the daily summary and sentiment score.
+
+```javascript
+const notes = await connection.execute(`
+  CREATE TABLE IF NOT EXISTS notes ( 
+    id BIGSERIAL PRIMARY KEY, 
+    note VARCHAR, 
+    score FLOAT, 
+    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
+  );`
+)
+
+const day = await connection.execute(`
+  CREATE TABLE IF NOT EXISTS days ( 
+    id BIGSERIAL PRIMARY KEY, 
+    summary VARCHAR, 
+    score FLOAT, 
+    created_at DATE NOT NULL UNIQUE DEFAULT DATE(NOW())
+  );`
+) 
+```
+
+We also have three endpoints to hit:
+
+* `app.get("/", async (req, res, next)` which returns all the notes for that day and the daily summary.
+* `app.post("/add", async (req, res, next)` which accepts a new note entry and performs a sentiment analysis. We simplify the score by converting it to 1, 0, or -1 for positive, neutral, or negative, and save it in our notes table.
+
+```sql
+WITH note AS (
+  SELECT pgml.transform(
+    inputs => ARRAY['${req.body.note}'],
+    task => '{"task": "text-classification", "model": "finiteautomata/bertweet-base-sentiment-analysis"}'::JSONB
+  ) AS market_sentiment
+), 
+
+score AS (
+  SELECT 
+    CASE 
+      WHEN (SELECT market_sentiment FROM note)[0]::JSONB ->> 'label' = 'POS' THEN 1
+      WHEN (SELECT market_sentiment FROM note)[0]::JSONB ->> 'label' = 'NEG' THEN -1
+      ELSE 0
+    END AS score
+)
+
+INSERT INTO notes (note, score) VALUES ('${req.body.note}', (SELECT score FROM score));
+
+```
+
+* `app.get("/analyze", async (req, res, next)` which takes the daily entries, produces a summary and total sentiment score, and places that into our days table.
+
+```sql
+WITH day AS (
+  SELECT 
+    note,
+    score
+  FROM notes 
+  WHERE DATE(created_at) = DATE(NOW())),
+
+  sum AS (
+    SELECT pgml.transform(
+      task => '{"task": "summarization", "model": "sshleifer/distilbart-cnn-12-6"}'::JSONB,
+      inputs => array[(SELECT STRING_AGG(note, '\n') FROM day)],
+      args => '{"min_length" : 20, "max_length" : 70}'::JSONB
+    ) AS summary
+  )
+
+  INSERT INTO days (summary, score) 
+  VALUES ((SELECT summary FROM sum)[0]::JSONB ->> 'summary_text', (SELECT SUM(score) FROM day))
+  ON CONFLICT (created_at) DO UPDATE SET summary = EXCLUDED.summary, score = EXCLUDED.score
+  RETURNING score;
+```
+
+And this is all that is required!
+
+### Test Run
+
+Let's imagine a day in the life of a boy destined to save the galaxy. Throughout his day he records the following notes:
+
+```
+Woke to routine chores. Bought droids, found Leia's message. She pleads for help from Obi-Wan Kenobi. Intrigued, but uncertain.
+```
+
+```
+Frantically searched for R2-D2, encountered Sand People. Saved by Obi-Wan. His presence is a glimmer of hope in this desolate place.
+```
+
+```
+Returned home to find it destroyed by stormtroopers. Aunt and uncle gone. Rage and despair fill me. Empire's cruelty knows no bounds.
+```
+
+```
+Left Tatooine with Obi-Wan, droids. Met Han Solo and Chewbacca in Mos Eisley. Sense of purpose grows despite uncertainty. Galaxy awaits.
+```
+
+```
+On our way to Alderaan. With any luck we will find the princess soon.
+```
+
+When we analyze this info we get a score of 2 and our summary is:
+
+```
+Returned home to find it destroyed by stormtroopers . Bought droids, found Leia's message . Met Han Solo and Chewbacca in Mos Eisley . Sense of purpose grows despite uncertainty .
+```
+
+Not bad for less than an hour of coding.
+
+### Final Thoughts
+
+This app is far from complete but does show an easy and scalable way to get started with ML in Express. From here I encourage you to head over to our [docs](https://postgresml.org/docs/api/sql-extension/) and see what other features could be added.
+
+If SQL is not your thing, no worries. Check out our [JS SDK](https://postgresml.org/docs/api/client-sdk/getting-started) to streamline all our best practices with simple JavaScript.
+
+We love hearing from you. Please reach out to us on [Discord](https://discord.gg/DmyJP3qJ7U) or simply [Contact Us](https://postgresml.org/contact) if you have any questions or feedback.
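
A note on the `/add` query in the new post: it splices `req.body.note` directly into the SQL text, so a note containing a single quote (such as "Leia's message") could break the statement or open the endpoint to SQL injection. Below is a minimal sketch of a parameterized variant. The helper name `buildAddNoteQuery` is hypothetical, and the way the result would be passed to the driver (shown only in a comment) assumes postgresql-client accepts bind values through a `params` option, which should be checked against its documentation.

```javascript
// Hypothetical helper (not part of the example repo): builds the /add
// statement with a $1 placeholder instead of interpolating the note text.
function buildAddNoteQuery(note) {
  // "$1" is a bind placeholder; it is not a JS template substitution,
  // so the note text never becomes part of the SQL string itself.
  const sql = `
    WITH note AS (
      SELECT pgml.transform(
        inputs => ARRAY[$1::TEXT],
        task => '{"task": "text-classification", "model": "finiteautomata/bertweet-base-sentiment-analysis"}'::JSONB
      ) AS market_sentiment
    ),
    score AS (
      SELECT
        CASE
          WHEN (SELECT market_sentiment FROM note)[0]::JSONB ->> 'label' = 'POS' THEN 1
          WHEN (SELECT market_sentiment FROM note)[0]::JSONB ->> 'label' = 'NEG' THEN -1
          ELSE 0
        END AS score
    )
    INSERT INTO notes (note, score) VALUES ($1, (SELECT score FROM score));`;

  // The raw note travels separately from the SQL text as a bind value.
  return { sql, params: [note] };
}

// Assumed usage inside the route handler (driver call is an assumption):
//   const { sql, params } = buildAddNoteQuery(req.body.note);
//   await connection.query(sql, { params });
```

The SQL itself mirrors the statement in the post. Only the two `'${req.body.note}'` interpolations are replaced by `$1`, with an explicit `::TEXT` cast so the array element type is unambiguous.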