add a new example #14
Conversation
from pgml.exceptions import PgMLException
from pgml.sql import q

def flatten(S):
Are you sure this won't blow the stack on a large dataset?
It’s called per row, and I haven’t seen datasets with more than 4D arrays.
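To make the stack-depth point concrete, here is a minimal sketch of a recursive flatten. Only the signature `flatten(S)` appears in the diff; the body below is an assumption for illustration, not the code from this PR. It shows that recursion depth follows the nesting depth of a row, not the number of elements or rows.

```python
def flatten(S):
    """Flatten arbitrarily nested lists into one flat list.

    Hypothetical body: only the name and signature come from the PR.
    """
    # A non-list value is a leaf; wrap it so callers can always extend().
    if not isinstance(S, list):
        return [S]
    result = []
    for item in S:
        # One extra stack frame per nesting level, not per element.
        result.extend(flatten(item))
    return result


# A wide but shallow row: 100,000 elements, nesting depth 2.
wide_row = [[float(i)] for i in range(100_000)]
flat = flatten(wide_row)
```

With this shape, a 4D array needs only a handful of frames per call, so Python's default recursion limit is not a concern unless the input is nested hundreds of levels deep.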
SELECT models.*
FROM pgml.models
WHERE project_id = {q(project.id)}
ORDER by models.metrics->>{q(metric)} DESC
Why not flatten/normalize the structure into the table?
Relevant metrics are different depending on the objective. We could have another join table to hold just metrics per model, but that seems like overkill just yet.
I was just thinking of making them nullable and only filling in the relevant columns for the model being trained.
---
--- Predict
---
CREATE OR REPLACE FUNCTION pgml.predict(project_name TEXT, VARIADIC features DOUBLE PRECISION[])
CREATE OR REPLACE FUNCTION pgml.predict(project_name TEXT, features NUMERIC[])
I'm wondering about this because variadic allows us to pass columns as arguments, e.g.:
SELECT pgml.predict('Red Wine Quality', quality_wine_red.acidity, quality_wine_red.color, ...)
FROM quality_wine_red
WHERE ...
I see this to be the more likely use case than passing in some raw numbers.
That’s true, but you can always put those columns in an array just like this, and having the features as a single param will allow us to extend the API with additional parameters in the future if we need.
Bump the version in |
pgml is not compatible with plpython: if both pgml and plpython are used in the same session, PostgreSQL will crash. Minimal reproducible code:

```sql
SELECT pgml.embed('intfloat/e5-small', 'hi mom');
create or replace function pyudf() returns int as $$ return 0 $$ language 'plpython3u';
```

The call stack:

```
Stack trace of thread 161970:
#0  0x00007efc1429edb8 PyImport_Import (libpython3.9.so.1.0 + 0x9edb8)
#1  0x00007efc1429f125 PyImport_ImportModule (libpython3.9.so.1.0 + 0x9f125)
#2  0x00007efb04b0f496 n/a (plpython3.so + 0x10496)
#3  0x00007efb04b1039d plpython3_validator (plpython3.so + 0x1139d)
#4  0x0000559d0cdbc5c2 OidFunctionCall1Coll (postgres + 0x6465c2)
#5  0x0000559d0c9d68bb ProcedureCreate (postgres + 0x2608bb)
#6  0x0000559d0ca5030c CreateFunction (postgres + 0x2da30c)
#7  0x0000559d0ce1c730 n/a (postgres + 0x6a6730)
#8  0x0000559d0cc5a030 standard_ProcessUtility (postgres + 0x4e4030)
#9  0x0000559d0cc545ed n/a (postgres + 0x4de5ed)
#10 0x0000559d0cc546e7 n/a (postgres + 0x4de6e7)
#11 0x0000559d0cc54beb PortalRun (postgres + 0x4debeb)
#12 0x0000559d0cc55249 n/a (postgres + 0x4df249)
#13 0x0000559d0cc576f0 PostgresMain (postgres + 0x4e16f0)
#14 0x0000559d0cbc3e9c n/a (postgres + 0x44de9c)
#15 0x0000559d0cbc50aa PostmasterMain (postgres + 0x44f0aa)
#16 0x0000559d0c8ce7d2 main (postgres + 0x1587d2)
#17 0x00007efc18427cd0 n/a (libc.so.6 + 0x27cd0)
#18 0x00007efc18427d8a __libc_start_main (libc.so.6 + 0x27d8a)
#19 0x0000559d0c8cee15 _start (postgres + 0x158e15)
```

This happens because PostgreSQL is using dlopen(RTLD_GLOBAL). Some of the symbols are then resolved to the previously opened .so file, while the others use a relative offset in pgml.so, which causes a null-pointer crash.

This commit hides all symbols except the UDF symbols (ending with `_wrapper`) and the magic symbols (`_PG_init`, `Pg_magic_func`), so dlopen(RTLD_GLOBAL) resolves the symbols to the correct position.