Unable to train any data · Issue #1007 · postgresml/postgresml · GitHub

Unable to train any data #1007


Closed
EVAUTOAI opened this issue Sep 12, 2023 · 4 comments

@EVAUTOAI

EVAUTOAI commented Sep 12, 2023

I am trying to train a model on the iris dataset by running this command in the Postgres database that contains pgml:
SELECT * FROM pgml.train(project_name => 'test1c_extra_trees', task => 'classification',
relation_name => 'public.iris_new', y_column_name => 'Class');

I am getting the below error:

INFO: Snapshotting table "public.iris_new", this may take a little while...
INFO: Dataset { num_features: 4, num_labels: 1, num_distinct_labels: 4, num_rows: 150, num_train_rows: 112, num_test_rows: 38 }
INFO: Column "Class": Statistics { min: 1.0, max: 3.0, max_abs: 3.0, mean: 1.6607143, median: 2.0, mode: 2.0, variance: 0.43845627, std_dev: 0.6621603, missing: 0, distinct: 3, histogram: [50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 50, 0, 0, 0, 0, 0, 0, 0, 0, 12], ventiles: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0], categories: Some({"Iris-setosa": Category { value: 1.0, members: 50 }, "Iris-virginica": Category { value: 3.0, members: 12 }, "Iris-versicolor": Category { value: 2.0, members: 50 }, "NULL": Category { value: 0.0, members: 0 }}) }
INFO: Column "Sepallength": Statistics { min: 4.3, max: 7.6, max_abs: 7.6, mean: 5.5866075, median: 5.5, mode: 5.0, variance: 0.5275885, std_dev: 0.7263529, missing: 0, distinct: 32, histogram: [4, 5, 2, 11, 19, 4, 7, 12, 7, 7, 8, 2, 8, 5, 4, 2, 2, 1, 1, 1], ventiles: [4.6, 4.8, 4.9, 5.0, 5.0, 5.1, 5.1, 5.2, 5.4, 5.5, 5.6, 5.7, 5.8, 6.0, 6.1, 6.3, 6.4, 6.6, 6.9], categories: None }
INFO: Column "Sepalwidth": Statistics { min: 2.0, max: 4.4, max_abs: 4.4, mean: 3.0776787, median: 3.0, mode: 3.0, variance: 0.21280527, std_dev: 0.4613082, missing: 0, distinct: 23, histogram: [1, 2, 4, 3, 6, 10, 6, 10, 17, 8, 13, 10, 6, 3, 7, 2, 1, 1, 1, 1], ventiles: [2.3, 2.5, 2.6, 2.7, 2.8, 2.9, 2.9, 3.0, 3.0, 3.0, 3.1, 3.2, 3.2, 3.3, 3.4, 3.4, 3.5, 3.7, 3.9], categories: None }
INFO: Column "Petallength": Statistics { min: 1.0, max: 6.6, max_abs: 6.6, mean: 3.1633925, median: 3.7, mode: 1.5, variance: 2.645713, std_dev: 1.6265649, missing: 0, distinct: 36, histogram: [4, 33, 11, 2, 0, 0, 0, 1, 4, 2, 9, 9, 15, 9, 4, 1, 1, 4, 2, 1], ventiles: [1.3, 1.4, 1.4, 1.4, 1.5, 1.5, 1.6, 1.7, 3.0, 3.6, 4.0, 4.1, 4.2, 4.4, 4.5, 4.6, 4.8, 5.1, 5.8], categories: None }
INFO: Column "Petalwidth": Statistics { min: 0.1, max: 2.5, max_abs: 2.5, mean: 0.91785717, median: 1.0, mode: 0.2, variance: 0.43753833, std_dev: 0.6614668, missing: 0, distinct: 20, histogram: [34, 7, 7, 1, 1, 0, 0, 7, 3, 18, 7, 10, 3, 2, 6, 1, 2, 1, 0, 2], ventiles: [0.1, 0.2, 0.2, 0.2, 0.2, 0.2, 0.3, 0.4, 1.0, 1.0, 1.2, 1.3, 1.3, 1.4, 1.4, 1.5, 1.6, 1.8, 2.0], categories: None }
INFO: Training Model { id: 233, task: classification, algorithm: linear, runtime: python }
INFO: Hyperparameter searches: 1, cross validation folds: 1
INFO: Hyperparams: {}

ERROR: assertion failed: (left == right)
left: 4,
right: 2

SQL state: XX000

I even tried running the below command given in git:
SELECT pgml.transform(
task => 'text-classification',
inputs => ARRAY[
'I love how amazingly simple ML has become!',
'I hate doing mundane and thankless tasks. ☹️'
]
) AS positivity;

This threw the error:
ERROR: Lazy instance has previously been poisoned

SQL state: XX000

@montanalow
Contributor

montanalow commented Sep 12, 2023

There are a few issues here.

  1. The iris dataset is ordered by class, so when we split into train/test all of class 1 goes into train, all of class 3 goes into test, and class 2 is split in between them.
  2. When we calculate metrics for classification, there are only 2 labels present, which fails an assertion that the test data contains at least 1 example of each label. That's the first error, the failed assertion that aborts training. The fix is to create a view of the iris data ordered by random, e.g. create view randomized_iris as select * from public.iris_new order by random();
  3. When training fails the assertion, a lock on the model cache is poisoned which prevents additional models from being accessed.
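
To illustrate how a non-shuffled split interacts with class-ordered data, here is a minimal Rust sketch. It is not pgml's code: `split_last`, `distinct`, and `ordered_iris_labels` are hypothetical helpers, and the "last 25% of rows, rounded up" rule is an assumption chosen only because it reproduces the 112/38 row counts in the log above. With 150 rows sorted by class, the training slice in this simulation holds 50, 50, and 12 rows of the three classes, which matches the logged category counts, and the test slice does not contain every class.

```rust
use std::collections::BTreeSet;

/// Split `labels` into (train, test), taking the last `test_fraction`
/// of rows as the test set. Illustrative only; the rounding rule is an
/// assumption, not pgml's documented behavior.
fn split_last(labels: &[u8], test_fraction: f64) -> (&[u8], &[u8]) {
    let n_test = (labels.len() as f64 * test_fraction).ceil() as usize;
    let n_test = n_test.min(labels.len());
    labels.split_at(labels.len() - n_test)
}

/// Collect the distinct label values present in a slice.
fn distinct(labels: &[u8]) -> BTreeSet<u8> {
    labels.iter().copied().collect()
}

/// 150 labels ordered by class: 50 of class 1, 50 of class 2, 50 of class 3.
fn ordered_iris_labels() -> Vec<u8> {
    (0u8..150).map(|i| i / 50 + 1).collect()
}

fn main() {
    let labels = ordered_iris_labels();
    let (train, test) = split_last(&labels, 0.25);
    println!("train rows: {}, classes: {:?}", train.len(), distinct(train));
    println!("test rows:  {}, classes: {:?}", test.len(), distinct(test));
}
```

Shuffling the rows before the split (which is what the randomized view is meant to achieve) makes it far more likely that every class appears on both sides.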

Proposed fix:

  • remove assertions from the code, and replace with error!() that include actionable messages
  • validate that error! does not poison the lock
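
The difference between a failed assertion and a returned error can be sketched with the standard library alone. This is an analogy under stated assumptions, not pgml's implementation: it uses `std::sync::Mutex` as a stand-in for the model cache lock (the reported message mentions a `Lazy` instance, which is a different type), and it returns a plain `Result` rather than calling the `error!()` macro, whose interaction with the lock is exactly what the second bullet proposes to validate. All function names here are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

type Cache = Arc<Mutex<HashMap<u32, String>>>;

/// Validation with an assertion: a mismatch panics while the guard is
/// held, which marks the mutex as poisoned.
fn train_with_assert(cache: &Cache, test_labels: usize, expected: usize) {
    let mut guard = cache.lock().unwrap();
    assert_eq!(expected, test_labels);
    guard.insert(1, "model".to_string());
}

/// Validation with a returned error: the guard is dropped normally on
/// the early return, so the mutex stays usable.
fn train_with_result(cache: &Cache, test_labels: usize, expected: usize) -> Result<(), String> {
    let mut guard = cache.lock().map_err(|e| e.to_string())?;
    if test_labels != expected {
        return Err(format!(
            "test set contains {test_labels} distinct labels, expected {expected}; \
             try shuffling the rows before training"
        ));
    }
    guard.insert(1, "model".to_string());
    Ok(())
}

/// Returns whether the cache is poisoned after an assertion failure.
fn poisoned_after_panic() -> bool {
    let cache: Cache = Arc::new(Mutex::new(HashMap::new()));
    let worker = Arc::clone(&cache);
    // Run on another thread so the panic does not unwind this one.
    let result = thread::spawn(move || train_with_assert(&worker, 2, 4)).join();
    assert!(result.is_err());
    cache.is_poisoned()
}

/// Returns whether the cache is poisoned after a returned error.
fn poisoned_after_error() -> bool {
    let cache: Cache = Arc::new(Mutex::new(HashMap::new()));
    let outcome = train_with_result(&cache, 2, 4);
    assert!(outcome.is_err());
    cache.is_poisoned()
}

fn main() {
    println!("poisoned after assert: {}", poisoned_after_panic());
    println!("poisoned after Err:    {}", poisoned_after_error());
}
```

In this sketch, later callers of the poisoned cache would fail on `lock()` even though their own inputs are valid, which is the same shape of failure as the second error in the report.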

@montanalow
Contributor

Thanks for the report!

@EVAUTOAI
Author

[image]

Hi, I tried creating a classification model on the randomised data and I am still getting the same error. I also tried this in a new session.

@kczimm
Contributor

kczimm commented Sep 14, 2023

Are you on master? What version of pgml are you using? SELECT * FROM pgml.version();

Fetched URL: https://github.com/postgresml/postgresml/issues/1007
