
Added blog post semantic search in postgres in 15 minutes #1535


Merged
merged 21 commits into from
Jun 18, 2024


SilasMarvin
Contributor

No description provided.

SilasMarvin requested review from montanalow and levkk June 17, 2024 21:45

We used the [pgml.embed](/docs/api/sql-extension/pgml.embed) PostgresML function to generate an embedding of the sentence "Generating embeddings in Postgres is fun!" using the [mixedbread-ai/mxbai-embed-large-v1](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) model from mixedbread.ai.

The output size of the vector varies per model, and in _mxbai-embed-large-v1_ outputs vectors with 1024 dimensions: each vector contains 1024 floating point numbers.
Contributor


Suggested change
The output size of the vector varies per model, and in _mxbai-embed-large-v1_ outputs vectors with 1024 dimensions: each vector contains 1024 floating point numbers.
The output size of the vector varies per model, and in `mxbai-embed-large-v1` outputs vectors with 1024 dimensions: each vector contains 1024 floating point numbers.

I think we should use backticks for all identifiers, rather than italics.

Contributor

levkk, Jun 18, 2024


They don't render very nicely in our docs imo:

[Screenshot: inline code rendering in the docs]

A few options here:

  1. Increase the line height, which may help with the rendering.
  2. Remove the border from the code blocks, making them less jarring.
  3. Stop using inline code blocks altogether, since they don't really add to readability at the moment.

Contributor


I agree that we could make some design improvements here. It'd be good to bring this up with the design team. We should still consistently use semantic markup, so that when the style is fixed, we get the benefits.


!!!

Now trying our search engine again:
Contributor


This would be better illustrated with a graph at 10, 100, 1,000, 10,000, 100k, and 1M. Show the Big O.

SilasMarvin and others added 13 commits June 18, 2024 08:52
Co-authored-by: Montana Low <montanalow@users.noreply.github.com> (7 commits)
SilasMarvin merged commit 3096657 into master Jun 18, 2024
1 check passed
SilasMarvin deleted the silas-semantic-search-in-postgres-in-15-minutes branch June 18, 2024 22:12

If you have any questions, or just have an idea on how to make PostgresML better, we'd love to hear from you in our [Discord](https://discord.com/invite/DmyJP3qJ7U). We’re open source, and welcome contributions from the community, especially when it comes to the rapidly evolving ML/AI landscape.

## Closing thoughts / why PostgresQL?
Contributor


PostgresQL → PostgreSQL


3 participants