
remove static linking #620


Merged
merged 2 commits into from
May 6, 2023
3 changes: 0 additions & 3 deletions pgml-extension/build.rs

This file was deleted.

6 changes: 6 additions & 0 deletions pgml-extension/examples/image_classification.sql
@@ -65,6 +65,12 @@ SELECT * FROM pgml.train('Handwritten Digits', algorithm => 'xgboost', hyperpara
-- Histogram Gradient Boosting is too expensive for normal tests on even a toy dataset
-- SELECT * FROM pgml.train('Handwritten Digits', algorithm => 'hist_gradient_boosting', hyperparams => '{"max_iter": 2}');

-- runtimes
SELECT * FROM pgml.train('Handwritten Digits', algorithm => 'linear', runtime => 'python');
SELECT * FROM pgml.train('Handwritten Digits', algorithm => 'linear', runtime => 'rust');

SELECT * FROM pgml.train('Handwritten Digits', algorithm => 'xgboost', runtime => 'python', hyperparams => '{"n_estimators": 10}');
SELECT * FROM pgml.train('Handwritten Digits', algorithm => 'xgboost', runtime => 'rust', hyperparams => '{"n_estimators": 10}');

-- check out all that hard work
SELECT trained_models.* FROM pgml.trained_models
6 changes: 6 additions & 0 deletions pgml-extension/examples/regression.sql
@@ -85,6 +85,12 @@ SELECT * FROM pgml.train('Diabetes Progression', algorithm => 'xgboost', hyperpa
-- Histogram Gradient Boosting is too expensive for normal tests on even a toy dataset
-- SELECT * FROM pgml.train('Diabetes Progression', algorithm => 'hist_gradient_boosting', hyperparams => '{"max_iter": 10}');

-- runtimes
SELECT * FROM pgml.train('Diabetes Progression', algorithm => 'linear', runtime => 'python');
SELECT * FROM pgml.train('Diabetes Progression', algorithm => 'linear', runtime => 'rust');

SELECT * FROM pgml.train('Diabetes Progression', algorithm => 'xgboost', runtime => 'python', hyperparams => '{"n_estimators": 10}');
SELECT * FROM pgml.train('Diabetes Progression', algorithm => 'xgboost', runtime => 'rust', hyperparams => '{"n_estimators": 10}');

-- check out all that hard work
SELECT trained_models.* FROM pgml.trained_models
3 changes: 2 additions & 1 deletion pgml-extension/examples/transformers.sql
@@ -2,6 +2,8 @@
\set ON_ERROR_STOP true
\timing on

SELECT pgml.embed('intfloat/e5-small', 'hi mom');


SELECT pgml.transform(
'translation_en_to_fr',
@@ -88,4 +90,3 @@ SELECT pgml.transform(
]
) AS answer;


3 changes: 2 additions & 1 deletion pgml-extension/tests/test.sql
@@ -27,4 +27,5 @@ SELECT pgml.load_dataset('wine');
\i examples/multi_classification.sql
\i examples/regression.sql
\i examples/vectors.sql

-- transformers are generally too slow to run in the test suite
--\i examples/transformers.sql
Contributor Author

@levkk can you run tests in this file including transformers on AMD? You'll need to uncomment this line.

psql -h localhost -p 28815 -d pgml -f tests/test.sql -P pager

Contributor

@levkk levkk May 5, 2023

Even if it passes, it won't necessarily prove we fixed anything, since we added that line explicitly to "fix" linking issues we've had in the past. Maybe we should just move openblas to be dynamically linked? We will need to fork the crate and change the option.

Scikit is dynamically linking against openblas (or cblas) [1], I believe, so if we have both static and dynamic linking in the same library, I think we'll continue to have this issue.

[1] Scikit uses blas to solve linear regression problems.
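
For readers following the static-versus-dynamic question, here is a minimal sketch of how a Cargo build script can request either link mode. The contents of the deleted `pgml-extension/build.rs` are not shown in this diff, so the library name `openblas` and the helper `link_directive` are assumptions for illustration only; `cargo:rustc-link-lib=static=...` and `cargo:rustc-link-lib=dylib=...` are the standard Cargo directives.

```rust
// Hypothetical build.rs sketch; this is NOT the file removed by this PR.

/// Build the Cargo directive that links `lib` statically or dynamically.
/// `link_directive` is an illustrative helper, not part of any crate.
fn link_directive(lib: &str, static_link: bool) -> String {
    let kind = if static_link { "static" } else { "dylib" };
    format!("cargo:rustc-link-lib={}={}", kind, lib)
}

fn main() {
    // Dynamic linking is what the thread proposes, to avoid mixing a
    // statically linked and a dynamically linked BLAS in one process.
    println!("{}", link_directive("openblas", false));
}
```

Passing `true` instead would emit the `static=` form, which is the kind of explicit override the thread is debating.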

Contributor Author

Is there any reference to what "linking issues" we've had? A lot has changed since this was introduced (maybe even including our linker).

Contributor

@levkk levkk May 5, 2023

Yeah, we would segfault inside scikit when trying to train a logistic regression model (or maybe it was a linear regression). One of those uses cblas. It would happen on my machine but not on yours, and sometimes vice versa.

Contributor Author

We default to Rust now for linear regression.

Contributor Author

I added some more tests. No issues on Intel when specifying the runtime as python or rust for linear regression, logistic regression, or xgboost.

Contributor Author

I think what may have been happening a year ago is that scikit has had many bugs with CBLAS, which is our preferred link target. My guess is that the crashes were happening on machines with old versions of scikit installed. They largely removed CBLAS in favor of Cython BLAS in 2019.

Contributor

We had a newer version of Scikit than 2019.

Contributor Author

@montanalow montanalow May 5, 2023

Ubuntu 20.04 LTS (Focal) was using an older, unpatched version of scikit, which likely would have been popular when we started developing this package. I think we should go back to the default dynamic linking behavior, although I'd love to dig further into @eeeebbbbrrr's setup to understand which blas dependency is installed on his machine:

On my machine:

$ apt show libblas-dev -a
Package: libblas-dev
Version: 3.10.0-2ubuntu1
Priority: optional
Section: libdevel
Source: lapack
Origin: Ubuntu
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Debian Science Team <debian-science-maintainers@lists.alioth.debian.org>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 1084 kB
Provides: libblas.so
Depends: libblas3 (= 3.10.0-2ubuntu1)
Suggests: liblapack-doc
Homepage: https://www.netlib.org/lapack/
Download-Size: 164 kB
APT-Sources: http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages
Description: Basic Linear Algebra Subroutines 3, static library
 This package is a binary incompatible upgrade to the blas-dev
 package. Several minor changes to the C interface have been
 incorporated.
 .
 BLAS (Basic Linear Algebra Subroutines) is a set of efficient
 routines for most of the basic vector and matrix operations.
 They are widely used as the basis for other high quality linear
 algebra software, for example lapack and linpack.  This
 implementation is the Fortran 77 reference implementation found
 at netlib.
 .
 This package contains a static version of the library.
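
The package listing shows what is installed, not what a process actually loaded. One rough way to check the latter is to scan the process memory map. The sketch below assumes Linux and the `/proc/<pid>/maps` format; the function name `blas_libraries` and the substring match on `blas` are illustrative choices, not part of this PR.

```rust
use std::collections::BTreeSet;
use std::fs;

/// Return the distinct mapped file paths whose file name contains "blas".
/// Illustrative helper; the "blas" substring match is a heuristic.
fn blas_libraries(maps: &str) -> Vec<String> {
    let mut found = BTreeSet::new();
    for line in maps.lines() {
        // When a mapping is file-backed, the path is the last field.
        if let Some(path) = line.split_whitespace().last() {
            if path.starts_with('/') {
                let name = path.rsplit('/').next().unwrap_or(path);
                if name.to_lowercase().contains("blas") {
                    found.insert(path.to_string());
                }
            }
        }
    }
    found.into_iter().collect()
}

fn main() {
    // Inspects this process; read /proc/<pid>/maps of a Postgres backend
    // instead to see what the extension and scikit pulled in there.
    let maps = fs::read_to_string("/proc/self/maps").unwrap_or_default();
    for lib in blas_libraries(&maps) {
        println!("{}", lib);
    }
}
```

A statically linked BLAS will not show up in this listing, since it is part of the extension's own shared object rather than a separate mapping.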
