Skip to content

remove static linking #620

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 6, 2023
Merged

remove static linking #620

merged 2 commits into from
May 6, 2023

Conversation

montanalow
Copy link
Contributor

#617 fix

@@ -27,4 +27,5 @@ SELECT pgml.load_dataset('wine');
\i examples/multi_classification.sql
\i examples/regression.sql
\i examples/vectors.sql

-- transformers are generally too slow to run in the test suite
--\i examples/transformers.sql
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@levkk can you run tests in this file including transformers on AMD? You'll need to uncomment this line.

psql -h localhost -p 28815 -d pgml -f tests/test.sql -P pager

Copy link
Contributor

@levkk levkk May 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if it passes, it won't necessarily prove we fixed anything, since we added that line explicitly to "fix" linking issues we've had in the past. Maybe we should just move openblas to be dynamically linked? We will need to fork the crate and change the option.

Scikit is dynamically linking against openblas (or cblas)1 I believe, so if we have both static and dynamic linking in the same library, I think we'll continue to have this issue.

Scikit uses blas to solve linear regression problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reference to what "linking issues" we've had? A lot has changed since this was introduced (maybe even including our linker).

Copy link
Contributor

@levkk levkk May 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we would segfault inside scikit when trying to train a logistic regression model (or maybe it was a linear regression). One of those uses clbas. It would happen on my machine but not on yours, and sometimes vice versa.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We default to Rust now for linear regression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some more tests. No issues on intel specifying the runtime as python or rust for both linear and logistic regression, or xgboost.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what may have been happening a year ago, was that scikit actually has had many bugs with CBLAS, which is our preferred link target. My guess is crashes were happening on machines with old versions of scikit installed. They largely removed CBLAS in favor of cython blas in 2019.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a newer version of Scikit than 2019.

Copy link
Contributor Author

@montanalow montanalow May 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ubuntu 20.04LTS Focal was using an older unpatched version of scikit, which likely would have been popular when we started developing this package. I think we should go back to the default dynamic linking behavior, although I'd love to dig further into @eeeebbbbrrr's setup to understand which blas dependency is installed on his machine:

On my machine:

$ apt show libblas-dev -a
Package: libblas-dev
Version: 3.10.0-2ubuntu1
Priority: optional
Section: libdevel
Source: lapack
Origin: Ubuntu
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Debian Science Team <debian-science-maintainers@lists.alioth.debian.org>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 1084 kB
Provides: libblas.so
Depends: libblas3 (= 3.10.0-2ubuntu1)
Suggests: liblapack-doc
Homepage: https://www.netlib.org/lapack/
Download-Size: 164 kB
APT-Sources: http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages
Description: Basic Linear Algebra Subroutines 3, static library
 This package is a binary incompatible upgrade to the blas-dev
 package. Several minor changes to the C interface have been
 incorporated.
 .
 BLAS (Basic Linear Algebra Subroutines) is a set of efficient
 routines for most of the basic vector and matrix operations.
 They are widely used as the basis for other high quality linear
 algebra software, for example lapack and linpack.  This
 implementation is the Fortran 77 reference implementation found
 at netlib.
 .
 This package contains a static version of the library.

@montanalow
Copy link
Contributor Author

Funny enough, google only knows 1 source for this config, which happens to be @levkk's blog post, which documents that I was the original reporter of a crash, that I'd completely forgotten about, and hadn't really understood, because he was the one doing the Python/Rust transition at the time.

https://postgresml.org/blog/backwards-compatible-or-bust-python-inside-rust-inside-postgres

We know that old versions of scikit crash using CBLAS, and I know that I was on a 18.04LTS when this post was written, so it makes sense I started exercising this crash when @levkk gave Python in Rust access to my dynamically linked CBLAS installation with py03, but my system Python was not otherwise configured with access to CBLAS, so it was unaffected.

This line forces openblas-src crate to skip the ["cblas", "system"] features which dynamic links to system CBLAS, and instead statically link to the bundled OpenBLAS. This prevents scikit from having access to a dynamically linked CBLAS crash in older versions, which "fixed" my original issue, but now we're seeing issues w/ statically linked OpenBLAS on @eeeebbbbrrrr's machine. I think the "right fix" is to upgrade scikit to a version new than 2019, and dynamically link CBLAS, which is the standard, not to statically link OpenBLAS.

Copy link
Contributor

@levkk levkk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I smell another blog post incoming.

@montanalow montanalow merged commit 907c2cc into master May 6, 2023
@montanalow montanalow deleted the montana/link branch May 6, 2023 19:51
SilasMarvin pushed a commit that referenced this pull request Oct 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy