remove static linking #620
```
@@ -27,4 +27,5 @@ SELECT pgml.load_dataset('wine');
\i examples/multi_classification.sql
\i examples/regression.sql
\i examples/vectors.sql

-- transformers are generally too slow to run in the test suite
--\i examples/transformers.sql
```
@levkk can you run tests in this file including transformers on AMD? You'll need to uncomment this line.
```shell
psql -h localhost -p 28815 -d pgml -f tests/test.sql -P pager
```
Even if it passes, it won't necessarily prove we fixed anything, since we added that line explicitly to "fix" linking issues we've had in the past. Maybe we should just move openblas to be dynamically linked? We will need to fork the crate and change the option.
I believe Scikit dynamically links against OpenBLAS (or CBLAS), so if we have both static and dynamic linking in the same library, I think we'll continue to have this issue.
Scikit uses BLAS to solve linear regression problems.
Is there any reference to what "linking issues" we've had? A lot has changed since this was introduced (maybe even including our linker).
Yeah, we would segfault inside scikit when trying to train a logistic regression model (or maybe it was a linear regression). One of those uses CBLAS. It would happen on my machine but not on yours, and sometimes vice versa.
We default to Rust now for linear regression.
I added some more tests. There are no issues on Intel with the runtime set to either Python or Rust, for linear regression, logistic regression, and XGBoost.
I think what may have been happening a year ago is that scikit has actually had many bugs with CBLAS, which is our preferred link target. My guess is that the crashes were happening on machines with old versions of scikit installed. They largely removed CBLAS in favor of Cython BLAS in 2019.
We had a version of Scikit newer than 2019.
Ubuntu 20.04 LTS (Focal) was using an older, unpatched version of scikit, which likely would have been the common one when we started developing this package. I think we should go back to the default dynamic linking behavior, although I'd love to dig further into @eeeebbbbrrrr's setup to understand which BLAS dependency is installed on his machine:
On my machine:
```
$ apt show libblas-dev -a
Package: libblas-dev
Version: 3.10.0-2ubuntu1
Priority: optional
Section: libdevel
Source: lapack
Origin: Ubuntu
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Debian Science Team <debian-science-maintainers@lists.alioth.debian.org>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 1084 kB
Provides: libblas.so
Depends: libblas3 (= 3.10.0-2ubuntu1)
Suggests: liblapack-doc
Homepage: https://www.netlib.org/lapack/
Download-Size: 164 kB
APT-Sources: http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages
Description: Basic Linear Algebra Subroutines 3, static library
This package is a binary incompatible upgrade to the blas-dev
package. Several minor changes to the C interface have been
incorporated.
.
BLAS (Basic Linear Algebra Subroutines) is a set of efficient
routines for most of the basic vector and matrix operations.
They are widely used as the basis for other high quality linear
algebra software, for example lapack and linpack. This
implementation is the Fortran 77 reference implementation found
at netlib.
.
This package contains a static version of the library.
```
Funny enough, Google only knows one source for this config, which happens to be @levkk's blog post. It documents that I was the original reporter of a crash that I'd completely forgotten about and hadn't really understood, because he was the one doing the Python/Rust transition at the time: https://postgresml.org/blog/backwards-compatible-or-bust-python-inside-rust-inside-postgres
We know that old versions of scikit crash when using CBLAS, and I know I was on 18.04 LTS when that post was written. So it makes sense that I started hitting this crash when @levkk gave Python-inside-Rust access to my dynamically linked CBLAS installation via PyO3. My system Python was not otherwise configured with access to CBLAS, so it was unaffected.
This line forces the openblas-src crate to skip the ["cblas", "system"] features, which dynamically link to the system CBLAS, and instead statically link the bundled OpenBLAS. That keeps scikit away from the dynamically linked CBLAS that crashes older versions, which "fixed" my original issue. Now, though, we're seeing issues with the statically linked OpenBLAS on @eeeebbbbrrrr's machine.
I think the "right fix" is to upgrade scikit to a version newer than 2019 and dynamically link CBLAS, which is the standard, rather than statically linking OpenBLAS.
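The difference between the two setups described above shows up in a library's dynamic dependencies. A dynamically linked CBLAS/OpenBLAS is listed by `ldd`. A statically linked OpenBLAS is copied into the object itself and is not listed. The helper below is a minimal sketch for comparing machines. The `blas_deps` name is hypothetical and is not part of this PR or of pgml.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): print the BLAS/LAPACK shared
# libraries that a binary or shared object is dynamically linked against.
# A statically linked OpenBLAS is copied into the object itself, so it
# will NOT appear here; only dynamic dependencies are listed.
blas_deps() {
  local target="$1"
  if [ ! -e "$target" ]; then
    echo "not found: $target" >&2
    return 2
  fi
  local deps
  # ldd prints one "name => path (address)" line per dynamic dependency.
  # "|| true" keeps a no-match grep from failing under `set -e`.
  deps=$(ldd "$target" 2>/dev/null | grep -Ei 'blas|lapack' || true)
  if [ -n "$deps" ]; then
    echo "$deps"
  else
    echo "no dynamic BLAS/LAPACK dependency in $target"
  fi
}
```

For example, you could run `blas_deps "$(pg_config --pkglibdir)/pgml.so"`. The `pgml.so` filename is an assumption about where the extension is installed. With the default dynamic linking, this would be expected to list a system `libblas` or `libopenblas`. With the bundled static OpenBLAS, it would be expected to list none. Separately, `python3 -c 'import numpy; numpy.show_config()'` reports which BLAS NumPy was built against on the same machine.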
I smell another blog post incoming.
#617 fix