Reorganized the SDK directory #954

Merged
merged 16 commits into from
Aug 28, 2023
2 changes: 2 additions & 0 deletions .github/workflows/javascript-sdk.yml
@@ -29,6 +29,8 @@ jobs:
with:
command: version
- name: Do build
env:
TYPESCRIPT_DECLARATION_FILE: "javascript/index.d.ts"
run: |
npm i
npm run build-release
6 changes: 6 additions & 0 deletions .github/workflows/python-sdk.yml
@@ -49,11 +49,13 @@ jobs:
if: github.event.inputs.deploy_to_pypi == 'false'
env:
MATURIN_PYPI_TOKEN: ${{ secrets.TEST_PYPI_API_TOKEN }}
PYTHON_STUB_FILE: "python/pgml/pgml.pyi"
run: maturin publish -r testpypi -i python3.7 -i python3.8 -i python3.9 -i python3.10 -i python3.11 --skip-existing -F python
- name: Build and deploy wheels to PyPI
if: github.event.inputs.deploy_to_pypi == 'true'
env:
MATURIN_PYPI_TOKEN: ${{ secrets.PYPI_API_TOKEN }}
PYTHON_STUB_FILE: "python/pgml/pgml.pyi"
run: maturin publish -i python3.7 -i python3.8 -i python3.9 -i python3.10 -i python3.11 --skip-existing -F python

deploy-python-sdk-mac:
@@ -83,11 +85,13 @@ jobs:
if: github.event.inputs.deploy_to_pypi == 'false'
env:
MATURIN_PYPI_TOKEN: ${{ secrets.TEST_PYPI_API_TOKEN }}
PYTHON_STUB_FILE: "python/pgml/pgml.pyi"
run: maturin publish -r testpypi -i python3.8 -i python3.9 -i python3.10 -i python3.11 --skip-existing -F python
- name: Build and deploy wheels to PyPI
if: github.event.inputs.deploy_to_pypi == 'true'
env:
MATURIN_PYPI_TOKEN: ${{ secrets.PYPI_API_TOKEN }}
PYTHON_STUB_FILE: "python/pgml/pgml.pyi"
run: maturin publish -i python3.8 -i python3.9 -i python3.10 -i python3.11 --skip-existing -F python

deploy-python-sdk-windows:
@@ -119,9 +123,11 @@ jobs:
if: github.event.inputs.deploy_to_pypi == 'false'
env:
MATURIN_PYPI_TOKEN: ${{ secrets.TEST_PYPI_API_TOKEN }}
PYTHON_STUB_FILE: "python/pgml/pgml.pyi"
run: maturin publish -r testpypi -i python3.8 -i python3.9 -i python3.10 -i python3.11 --skip-existing -F python
- name: Build and deploy wheels to PyPI
if: github.event.inputs.deploy_to_pypi == 'true'
env:
MATURIN_PYPI_TOKEN: ${{ secrets.PYPI_API_TOKEN }}
PYTHON_STUB_FILE: "python/pgml/pgml.pyi"
run: maturin publish -i python3.8 -i python3.9 -i python3.10 -i python3.11 --skip-existing -F python
File renamed without changes.
37 changes: 26 additions & 11 deletions pgml-sdks/rust/pgml/Cargo.lock → pgml-sdks/pgml/Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

7 changes: 6 additions & 1 deletion pgml-sdks/rust/pgml/Cargo.toml → pgml-sdks/pgml/Cargo.toml
@@ -2,14 +2,19 @@
name = "pgml"
version = "0.9.0"
edition = "2021"
authors = ["PostgresML <team@postgresml.org>"]
homepage = "https://postgresml.org/"
repository = ""
license = "MIT"
keywords = ["postgres", "machine learning", "vector databases", "embeddings"]

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[lib]
name = "pgml"
crate-type = ["lib", "cdylib"]

[dependencies]
pgml-macros = {path = "../pgml-macros"}
rust_bridge = {path = "../rust-bridge/rust-bridge", version = "0.1.0"}
sqlx = { version = "0.6", features = [ "runtime-tokio-rustls", "postgres", "json", "time", "uuid", "chrono"] }
serde_json = "1.0.9"
anyhow = "1.0.9"
14 changes: 14 additions & 0 deletions pgml-sdks/pgml/README.md
@@ -0,0 +1,14 @@
# Open Source Alternative for Building End-to-End Vector Search Applications without OpenAI & Pinecone

# Supported Languages

We support a number of different languages:
- [Python](python)
- [JavaScript](javascript)
- [Rust](#rust)

Our SDK is written completely in Rust and translated by Rust to our other supported languages. See each individual language for an overview and specification on how to use the SDK.

# Rust

More information about our methodologies and Rust SDK coming soon.
File renamed without changes.
@@ -7,7 +7,6 @@
- [Usage](#usage)
- [Examples](./examples/README.md)
- [Developer setup](#developer-setup)
- [API Reference](#api-reference)
- [Roadmap](#roadmap)

# Overview
@@ -469,7 +468,7 @@ Removing a Pipeline deletes it and all associated data from the database. Remove

## Developer Setup

This javascript library is generated from our core rust-sdk. Please check [rust-sdk documentation](../../README.md) for developer setup.
This javascript library is generated from our core rust-sdk. Please check [rust-sdk documentation](../README.md) for developer setup.

## Roadmap

@@ -1,16 +1,25 @@
## Examples
# Examples

### [Semantic Search](./semantic_search.js)
## Prerequisites
Before running any examples first install dependencies and set the DATABASE_URL environment variable:
```
npm i
export DATABASE_URL={YOUR DATABASE URL}
```

Optionally, configure a .env file containing a DATABASE_URL variable.

## [Semantic Search](./semantic_search.js)
This is a basic example of semantic search over a collection of documents. Embeddings are created using the `intfloat/e5-small` model. The results are documents semantically similar to the query. Finally, the collection is archived.

### [Question Answering](./question_answering.js)
## [Question Answering](./question_answering.js)
This example finds documents in a collection that are relevant to a question. The query is passed to vector search to retrieve documents that match closely in the embedding space. A score is returned with each search result.

### [Question Answering using Instructor Model](./question_answering_instructor.js)
## [Question Answering using Instructor Model](./question_answering_instructor.js)
In this example, we will use `hknlp/instructor-base` model to build text embeddings instead of the default `intfloat/e5-small` model.

### [Extractive Question Answering](./extractive_question_answering.js)
## [Extractive Question Answering](./extractive_question_answering.js)
In this example, we will show how to use `vector_recall` result as a `context` to a HuggingFace question answering model. We will use `Builtins.transform()` to run the model on the database.

### [Summarizing Question Answering](./summarizing_question_answering.js)
## [Summarizing Question Answering](./summarizing_question_answering.js)
This is an example to find documents relevant to a question from the collection of documents and then summarize those documents.
File renamed without changes.
File renamed without changes.
@@ -7,7 +7,6 @@
- [Usage](#usage)
- [Examples](./examples/README.md)
- [Developer setup](#developer-setup)
- [API Reference](#api-reference)
- [Roadmap](#roadmap)

# Overview
@@ -152,8 +151,7 @@ Continuing within `async def main():`
Call `main` function in an async loop.

```python
if __name__ == "__main__":
asyncio.run(main())
asyncio.run(main())
```
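As a rough illustration of the entry-point pattern this hunk touches, the sketch below runs a coroutine with `asyncio.run`. The body of `main` is a placeholder (an assumption for illustration), not the tutorial's actual SDK calls.

```python
import asyncio


async def main() -> str:
    # Placeholder body: the tutorial's main() awaits SDK calls here instead.
    await asyncio.sleep(0)
    return "done"


# asyncio.run creates an event loop, runs main() to completion, then closes the loop.
result = asyncio.run(main())
print(result)
```

Wrapping the call in an `if __name__ == "__main__":` guard, as the removed lines did, keeps `main` from running when the file is imported as a module; calling `asyncio.run(main())` at top level is shorter and works the same when the file is executed directly.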

**Running the Code**
@@ -482,7 +480,7 @@ Removing a Pipeline deletes it and all associated data from the database. Remove

## Developer Setup

This Python library is generated from our core rust-sdk. Please check [rust-sdk documentation](../../README.md) for developer setup.
This Python library is generated from our core rust-sdk. Please check [rust-sdk documentation](../README.md) for developer setup.

## Roadmap

@@ -1,20 +1,28 @@
## Examples
# Examples

### [Semantic Search](./semantic_search.py)
## Prerequisites
Before running any examples first install dependencies and set the DATABASE_URL environment variable:
```
pip install -r requirements.txt
export DATABASE_URL={YOUR DATABASE URL}
```

Optionally, configure a .env file containing a DATABASE_URL variable.

## [Semantic Search](./semantic_search.py)
This is a basic example of semantic search over a collection of documents. It loads the Quora dataset, creates a collection in a PostgreSQL database, upserts documents, generates chunks and embeddings, and then performs a vector search on a query. Embeddings are created using the `intfloat/e5-small` model. The results are documents semantically similar to the query. Finally, the collection is archived.

### [Question Answering](./question_answering.py)
## [Question Answering](./question_answering.py)
This example finds documents in a collection that are relevant to a question. It loads the Stanford Question Answering Dataset (SQuAD) into the database and generates chunks and embeddings. The query is passed to vector search to retrieve documents that match closely in the embedding space. A score is returned with each search result.

### [Question Answering using Instructor Model](./question_answering_instructor.py)
## [Question Answering using Instructor Model](./question_answering_instructor.py)
In this example, we will use `hknlp/instructor-base` model to build text embeddings instead of the default `intfloat/e5-small` model.

### [Extractive Question Answering](./extractive_question_answering.py)
## [Extractive Question Answering](./extractive_question_answering.py)
In this example, we will show how to use `vector_recall` result as a `context` to a HuggingFace question answering model. We will use `Builtins.transform()` to run the model on the database.

### [Table Question Answering](./table_question_answering.py)
In this example, we will use [Open Table-and-Text Question Answering (OTT-QA)
](https://github.com/wenhuchen/OTT-QA) dataset to run queries on tables. We will use `deepset/all-mpnet-base-v2-table` model that is trained for embedding tabular data for retrieval tasks.
## [Table Question Answering](./table_question_answering.py)
In this example, we will use [Open Table-and-Text Question Answering (OTT-QA)](https://github.com/wenhuchen/OTT-QA) dataset to run queries on tables. We will use `deepset/all-mpnet-base-v2-table` model that is trained for embedding tabular data for retrieval tasks.

### [Summarizing Question Answering](./summarizing_question_answering.py)
## [Summarizing Question Answering](./summarizing_question_answering.py)
This is an example to find documents relevant to a question from the collection of documents and then summarize those documents.
36 changes: 36 additions & 0 deletions pgml-sdks/pgml/python/examples/requirements.txt
@@ -0,0 +1,36 @@
aiohttp==3.8.5
aiosignal==1.3.1
async-timeout==4.0.3
attrs==23.1.0
certifi==2023.7.22
charset-normalizer==3.2.0
datasets==2.14.4
dill==0.3.7
filelock==3.12.3
frozenlist==1.4.0
fsspec==2023.6.0
huggingface-hub==0.16.4
idna==3.4
markdown-it-py==3.0.0
mdurl==0.1.2
multidict==6.0.4
multiprocess==0.70.15
numpy==1.25.2
packaging==23.1
pandas==2.0.3
pgml==0.9.0
pyarrow==13.0.0
Pygments==2.16.1
python-dateutil==2.8.2
python-dotenv==1.0.0
pytz==2023.3
PyYAML==6.0.1
requests==2.31.0
rich==13.5.2
six==1.16.0
tqdm==4.66.1
typing_extensions==4.7.1
tzdata==2023.3
urllib3==2.0.4
xxhash==3.3.0
yarl==1.9.2
6 changes: 6 additions & 0 deletions pgml-sdks/pgml/python/manual-build-deploy.sh
@@ -0,0 +1,6 @@
#!/bin/bash

echo "Make sure and set the environment variable MATURIN_PYPI_TOKEN to your PyPI token."

cd ..
PYTHON_STUB_FILE="python/pgml/pgml.pyi" maturin publish -r $1 -i python3.8 -i python3.9 -i python3.10 -i python3.11 --skip-existing -F python
5 changes: 5 additions & 0 deletions pgml-sdks/pgml/python/pgml/pgml.pyi
@@ -0,0 +1,5 @@

def py_init_logger(level: Optional[str] = "", format: Optional[str] = "") -> None: ...

Json = Any
DateTime = int
@@ -1,21 +1,20 @@
use pgml_macros::{custom_derive, custom_methods};
use rust_bridge::{alias, alias_methods};
use sqlx::Row;
use tracing::instrument;

#[derive(custom_derive, Debug, Clone)]
/// Provides access to builtin database methods
#[derive(alias, Debug, Clone)]
pub struct Builtins {
pub database_url: Option<String>,
}

use crate::{get_or_initialize_pool, query_runner::QueryRunner, types::Json};

#[cfg(feature = "javascript")]
use crate::languages::javascript::*;

#[cfg(feature = "python")]
use crate::{languages::python::*, query_runner::QueryRunnerPython, types::JsonPython};
use crate::{query_runner::QueryRunnerPython, types::JsonPython};

#[custom_methods(new, query, transform)]
#[alias_methods(new, query, transform)]
impl Builtins {
pub fn new(database_url: Option<String>) -> Self {
Self { database_url }
@@ -1,7 +1,7 @@
use anyhow::Context;
use indicatif::MultiProgress;
use itertools::Itertools;
use pgml_macros::{custom_derive, custom_methods};
use rust_bridge::{alias, alias_methods};
use sqlx::postgres::PgPool;
use sqlx::Executor;
use sqlx::PgConnection;
@@ -15,14 +15,8 @@ use crate::{
splitter::Splitter, types::DateTime, types::Json, utils,
};

#[cfg(feature = "javascript")]
use crate::languages::javascript::*;

#[cfg(feature = "python")]
use crate::{
languages::python::*, pipeline::PipelinePython, query_builder::QueryBuilderPython,
types::JsonPython,
};
use crate::{pipeline::PipelinePython, query_builder::QueryBuilderPython, types::JsonPython};

/// Our project tasks
#[derive(Debug, Clone)]
@@ -91,7 +85,7 @@ pub(crate) struct CollectionDatabaseData {
}

/// A collection of documents
#[derive(custom_derive, Debug, Clone)]
#[derive(alias, Debug, Clone)]
pub struct Collection {
pub name: String,
pub database_url: Option<String>,
@@ -103,7 +97,7 @@ pub struct Collection {
pub(crate) database_data: Option<CollectionDatabaseData>,
}

#[custom_methods(
#[alias_methods(
new,
upsert_documents,
get_documents,
@@ -320,7 +320,7 @@ mod tests {
assert_eq!(
sql,
format!(
r##"SELECT "id" FROM "test_table" WHERE ("test_table"."metadata"#>>'{{id}}')::bigint {} 1 AND ("test_table"."metadata"#>>'{{id2,id3}}')::bigint {} 1"##,
r##"SELECT "id" FROM "test_table" WHERE ("test_table"."metadata"#>>'{{id}}')::float8 {} 1 AND ("test_table"."metadata"#>>'{{id2,id3}}')::float8 {} 1"##,
operator, operator
)
);
@@ -344,7 +344,7 @@ mod tests {
assert_eq!(
sql,
format!(
r##"SELECT "id" FROM "test_table" WHERE ("test_table"."metadata"#>>'{{id}}')::bigint {} (1) AND ("test_table"."metadata"#>>'{{id2,id3}}')::bigint {} (1)"##,
r##"SELECT "id" FROM "test_table" WHERE ("test_table"."metadata"#>>'{{id}}')::float8 {} (1) AND ("test_table"."metadata"#>>'{{id2,id3}}')::float8 {} (1)"##,
operator, operator
)
);
Expand Down