Added blog post semantic search in postgres in 15 minutes #1535
We used the [pgml.embed](/docs/api/sql-extension/pgml.embed) PostgresML function to generate an embedding of the sentence "Generating embeddings in Postgres is fun!" using the [mixedbread-ai/mxbai-embed-large-v1](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) model from mixedbread.ai.

The output size of the vector varies per model, and in _mxbai-embed-large-v1_ outputs vectors with 1024 dimensions: each vector contains 1024 floating point numbers.
Suggested change:
- The output size of the vector varies per model, and in _mxbai-embed-large-v1_ outputs vectors with 1024 dimensions: each vector contains 1024 floating point numbers.
+ The output size of the vector varies per model, and in `mxbai-embed-large-v1` outputs vectors with 1024 dimensions: each vector contains 1024 floating point numbers.
I think we should use backticks for all identifiers, rather than italics.
I agree that we could make some design improvements here. It'd be good to bring this up with the design team. We should still consistently use semantic markup, so that when the style is fixed, we get the benefits.
!!!

Now trying our search engine again:
This would be better illustrated with a graph at 10, 100, 1000, 10000, 100k, 1M. Show that Big O.
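One way to collect the data points for such a graph is sketched below. It is only a stand-in: it times an in-memory sequential scan in plain Python rather than the Postgres queries from the post, the function names (`cosine_similarity`, `brute_force_search`, `time_search`) are hypothetical, and the 16-dimension vectors are an arbitrary choice to keep the run short (the model discussed above outputs 1024). Without an index, the time should grow roughly linearly with the row count.

```python
import math
import random
import time


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_search(query, vectors):
    """Return the index of the vector most similar to `query`.

    Every vector is compared against the query, so the work is O(n).
    """
    return max(range(len(vectors)), key=lambda i: cosine_similarity(query, vectors[i]))


def time_search(n_rows, dims=16, seed=0):
    """Time one brute-force search over `n_rows` random vectors, in seconds."""
    rng = random.Random(seed)
    vectors = [[rng.gauss(0, 1) for _ in range(dims)] for _ in range(n_rows)]
    query = [rng.gauss(0, 1) for _ in range(dims)]
    start = time.perf_counter()
    brute_force_search(query, vectors)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Sizes from the review comment; 100k and 1M are left out to keep the
    # pure-Python run short, and can be added to the tuple if desired.
    for n in (10, 100, 1_000, 10_000):
        print(f"{n:>7} rows: {time_search(n) * 1000:.2f} ms")
```

The printed timings can be plotted on log-log axes. Repeating the same measurement against an indexed table would give the second curve for comparison.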
Co-authored-by: Montana Low <montanalow@users.noreply.github.com>
If you have any questions, or just have an idea on how to make PostgresML better, we'd love to hear from you in our [Discord](https://discord.com/invite/DmyJP3qJ7U). We’re open source, and welcome contributions from the community, especially when it comes to the rapidly evolving ML/AI landscape.

## Closing thoughts / why PostgresQL?
Suggested change: `PostgresQL` -> `PostgreSQL`
No description provided.