np.fromfile accepts a gzip file object but silently returns corrupt data #10866

Open
akors opened this issue Apr 9, 2018 · 5 comments

akors commented Apr 9, 2018

Hi, I was trying to read directly from a compressed file:

import gzip
import numpy as np

data = np.array([9, 8, 7, 6, 5, 4, 3, 2, 1])
dt = data.dtype
with gzip.open("datafile.gz", "wb") as outfile:
    outfile.write(data.tobytes())

with gzip.open("datafile.gz", "rb") as infile:
    data = np.fromfile(infile, dtype=dt)
    print(data)

Unfortunately, this returns garbage data:
[6542475788951259935 7594864974085029634 1008947487324530028 3113290099057413416 691954333390038676 6865743131250406218]
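A quick check, not part of the original report, suggests where the garbage comes from. Assuming the array was little-endian int64 (the default on the 64-bit Linux system listed below), the first value decodes as the start of a gzip header: magic bytes, compression method, flags and modification time. That is consistent with np.fromfile reading the compressed bytes from the underlying file rather than going through the gzip object's read() method, although the thread itself does not confirm the mechanism. The field layout follows the gzip format (RFC 1952); the variable names are illustrative.

```python
import struct
from datetime import datetime, timezone

# First "garbage" value printed in the report above.
first_value = 6542475788951259935

# Reinterpret the int64 as the 8 raw bytes it was built from (little-endian assumed).
raw = first_value.to_bytes(8, "little")

# gzip header prefix: 2-byte magic, compression method, flags, 4-byte mtime.
magic, method, flags, mtime = struct.unpack("<2sBBI", raw)
written = datetime.fromtimestamp(mtime, tz=timezone.utc)

print(magic.hex())      # gzip magic number is 1f8b
print(method)           # 8 means deflate
print(bool(flags & 8))  # FNAME flag: original file name is stored
print(written.date())   # date the archive was written
```

The decoded modification date falls on the day the issue was opened, which makes the header interpretation hard to dismiss as coincidence.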

The workaround is to load the data into a buffer first, and then let np.frombuffer read it:

with gzip.open("datafile.gz", "rb") as infile:
    data = np.frombuffer(infile.read(), dtype=dt)
    print(data)

Returns: [9 8 7 6 5 4 3 2 1] as expected.

I think silently returning corrupt data is pretty much the worst possible behaviour in this case. Any of these options would be an improvement:

  • Teach np.fromfile() to read from gzip (and possibly other compression formats) file objects
  • Raise an error explaining that NumPy can't deal with this kind of file object
  • At least clearly document what kind of "file object" np.fromfile() expects

Right now, the file parameter of np.fromfile is documented as:

file : file or str
Open file object or filename.

The object returned by gzip.open is a file object, but apparently not the right kind. That's pretty confusing and should at least be clarified in the documentation.
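Until one of the suggestions above lands, a user-side stopgap could look like the sketch below. The helper name fromfile_like and its signature are hypothetical, not a NumPy API. It only calls the object's read() method, so it should work for any binary file-like object (gzip, in-memory buffers, and so on), at the cost of holding the raw bytes in memory first.

```python
import gzip
import os
import tempfile

import numpy as np


def fromfile_like(file, dtype=float, count=-1):
    """Hypothetical helper (not a NumPy API): read raw array data from any
    binary file-like object through its read() method."""
    dtype = np.dtype(dtype)
    # count < 0 means "read to end of stream", like np.fromfile's default.
    nbytes = -1 if count < 0 else count * dtype.itemsize
    buf = file.read(nbytes)
    if len(buf) % dtype.itemsize:
        raise ValueError(
            f"got {len(buf)} bytes, not a multiple of itemsize {dtype.itemsize}"
        )
    # frombuffer on bytes gives a read-only view; copy() makes it writable.
    return np.frombuffer(buf, dtype=dtype).copy()


original = np.array([9, 8, 7, 6, 5, 4, 3, 2, 1])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "datafile.gz")
    with gzip.open(path, "wb") as outfile:
        outfile.write(original.tobytes())
    with gzip.open(path, "rb") as infile:
        restored = fromfile_like(infile, dtype=original.dtype)

print(restored)
```

The size check raises on a truncated stream instead of returning a partial or misaligned array, which is the "fail loudly" behaviour asked for above.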

Operating System: Fedora 27, 64-Bit
Python version: 3.6.4
NumPy version: 1.13.3

@pv
Member

pv commented Apr 9, 2018 via email

@mattip
Member

mattip commented Apr 18, 2018

related to #7989 "numpy.load cannot read from tar archive"

Edit: add issue description

@Teque5

Teque5 commented Jul 14, 2019

This same behavior is observed with the lzma module.
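The read-then-frombuffer workaround from the original report should carry over to lzma unchanged, since it only relies on the file object's read() method. A minimal sketch, with an illustrative temporary file name:

```python
import lzma
import os
import tempfile

import numpy as np

original = np.array([9, 8, 7, 6, 5, 4, 3, 2, 1])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "datafile.xz")
    # Write the raw array bytes through the lzma compressor.
    with lzma.open(path, "wb") as outfile:
        outfile.write(original.tobytes())
    # Decompress fully via read(), then reinterpret the bytes as an array.
    with lzma.open(path, "rb") as infile:
        restored = np.frombuffer(infile.read(), dtype=original.dtype)

print(restored)
```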

@kolmodin

Thanks @akors for suggesting the alternative method.

@andyfaff
Member

andyfaff commented Jun 3, 2025

Argh, this just bit me.
