bpo-29659: Expose copyfileobj() length arg for public use #328

Closed
wants to merge 3 commits into from

40 changes: 30 additions & 10 deletions Doc/library/shutil.rst
@@ -39,16 +39,16 @@ Directory and files operations

 .. function:: copyfileobj(fsrc, fdst[, length])

-   Copy the contents of the file-like object *fsrc* to the file-like object *fdst*.
-   The integer *length*, if given, is the buffer size. In particular, a negative
-   *length* value means to copy the data without looping over the source data in
-   chunks; by default the data is read in chunks to avoid uncontrolled memory
-   consumption. Note that if the current file position of the *fsrc* object is not
-   0, only the contents from the current file position to the end of the file will
-   be copied.
+   Copy the contents of the file-like object *fsrc* to the file-like object
+   *fdst*. Only the contents from the current file position to the end of
+   the file will be copied.
+
+   The integer *length*, if given, is the buffer size; the default value
+   in bytes is 16 KiB. A negative *length* value means to copy the data without
+   looping over the source data in chunks; by default the data is read in
+   chunks to avoid uncontrolled memory consumption.
Member

"A negative length value means to copy the data without looping over the source data in chunks"

I dislike this definition. In practice, negative means "unlimited" buffer size: the whole input file is loaded into memory.

I'm not sure that it's a good practice to try to load files of unknown size into memory.

I suggest removing this feature, which seems more like a side effect than a carefully designed API.

If you want a fast copy, pass a very large length, like 1 GB. But if Python starts to load 1 TB into memory, it's likely to crash the system, or at least slow it down a lot.
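To make the memory concern concrete, the sketch below drives the stock `shutil.copyfileobj()` (which already takes *length* as its third argument) with a hypothetical `CountingReader` wrapper that records the size of every chunk `read()` returns. A positive *length* bounds each chunk; a negative one makes the first `read(-1)` return the entire input, so the whole file is held in memory at once.

```python
import io
import shutil


class CountingReader:
    """Hypothetical wrapper that records the size of each chunk read() returns."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)
        self.chunks = []

    def read(self, size=-1):
        chunk = self._buf.read(size)
        self.chunks.append(len(chunk))
        return chunk


def chunk_sizes(data, length):
    """Copy `data` with shutil.copyfileobj() and return the sizes of the chunks read."""
    src = CountingReader(data)
    dst = io.BytesIO()
    shutil.copyfileobj(src, dst, length)
    assert dst.getvalue() == data  # the copy itself is correct either way
    return src.chunks


payload = b'x' * 100
print(chunk_sizes(payload, 20))  # [20, 20, 20, 20, 20, 0]
print(chunk_sizes(payload, -1))  # [100, 0]: the whole input in a single buffer
```

With a real file the second case means the peak buffer is as large as the file itself, which is exactly the unbounded behavior described above.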


-.. function:: copyfile(src, dst, *, follow_symlinks=True)
+.. function:: copyfile(src, dst, *, follow_symlinks=True, length=None)

    Copy the contents (no metadata) of the file named *src* to a file named
    *dst* and return *dst*. *src* and *dst* are path names given as strings.
@@ -65,6 +65,9 @@ Directory and files operations
    a new symbolic link will be created instead of copying the
    file *src* points to.

+   The integer *length*, if given, is the in-memory buffer size; the default
+   value in bytes is 16 KiB (see :func:`shutil.copyfileobj`).
+
    .. versionchanged:: 3.3
       :exc:`IOError` used to be raised instead of :exc:`OSError`.
       Added *follow_symlinks* argument.
@@ -74,6 +77,9 @@ Directory and files operations
       Raise :exc:`SameFileError` instead of :exc:`Error`. Since the former is
       a subclass of the latter, this change is backward compatible.

+   .. versionchanged:: 3.7
+      Added *length* parameter.
+

 .. exception:: SameFileError

@@ -141,7 +147,7 @@ Directory and files operations
    .. versionchanged:: 3.3
       Added *follow_symlinks* argument and support for Linux extended attributes.

-.. function:: copy(src, dst, *, follow_symlinks=True)
+.. function:: copy(src, dst, *, follow_symlinks=True, length=None)

    Copies the file *src* to the file or directory *dst*. *src* and *dst*
    should be strings. If *dst* specifies a directory, the file will be
@@ -153,6 +159,9 @@ Directory and files operations
    is true and *src* is a symbolic link, *dst* will be a copy of
    the file *src* refers to.

+   The integer *length*, if given, is the in-memory buffer size; the default
+   value in bytes is 16 KiB (see :func:`shutil.copyfileobj`).
+
    :func:`~shutil.copy` copies the file data and the file's permission
    mode (see :func:`os.chmod`). Other metadata, like the
    file's creation and modification times, is not preserved.
@@ -163,7 +172,11 @@ Directory and files operations
       Added *follow_symlinks* argument.
       Now returns path to the newly created file.

-.. function:: copy2(src, dst, *, follow_symlinks=True)
+   .. versionchanged:: 3.7
+      Added *length* parameter.
+
+
+.. function:: copy2(src, dst, *, follow_symlinks=True, length=None)

    Identical to :func:`~shutil.copy` except that :func:`copy2`
    also attempts to preserve all file metadata.
@@ -176,6 +189,9 @@ Directory and files operations
    unavailable, :func:`copy2` will preserve all the metadata
    it can; :func:`copy2` never returns failure.

+   The integer *length*, if given, is the in-memory buffer size; the default
+   value in bytes is 16 KiB (see :func:`shutil.copyfileobj`).
+
    :func:`copy2` uses :func:`copystat` to copy the file metadata.
    Please see :func:`copystat` for more information
    about platform support for modifying symbolic link metadata.
@@ -185,6 +201,10 @@ Directory and files operations
       file system attributes too (currently Linux only).
       Now returns path to the newly created file.

+   .. versionchanged:: 3.7
+      Added *length* parameter.
+
+
 .. function:: ignore_patterns(\*patterns)

    This factory function creates a function that can be used as a callable for
29 changes: 21 additions & 8 deletions Lib/shutil.py
@@ -73,8 +73,14 @@ class RegistryError(Exception):
     and unpacking registries fails"""


-def copyfileobj(fsrc, fdst, length=16*1024):
-    """copy data from file-like object fsrc to file-like object fdst"""
+def copyfileobj(fsrc, fdst, length=None):
Contributor

@AraHaan, Nov 20, 2017

    def copyfileobj(fsrc, fdst, length=16*1024):

I personally prefer this way in my code, to condense these two lines of code into one:

    if not length:
        length = 16 * 1024

It looks a little nicer, not to mention it saves lines of code that basically do the same thing.
And yes, I am also guilty of doing it this way, and then people started hating me for it and calling me a bad programmer for it.
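One likely reason the patch uses a `None` sentinel rather than the signature default suggested above: `copyfile()`, `copy()` and `copy2()` in this patch always forward `length=length`, so when a caller omits it, `None` is passed explicitly and a default in `copyfileobj()`'s signature would never take effect. The sketch below illustrates this with hypothetical functions (`resolve_with_signature_default`, `resolve_with_sentinel`, `forwarding_caller`), not code from the patch.

```python
DEFAULT_LENGTH = 16 * 1024  # the default buffer size documented by the patch


def resolve_with_signature_default(length=DEFAULT_LENGTH):
    # Hypothetical: the default applies only when the argument is omitted.
    return length


def resolve_with_sentinel(length=None):
    # Hypothetical: mirrors the patch's `if not length: length = 16 * 1024`.
    if not length:
        length = DEFAULT_LENGTH
    return length


def forwarding_caller(resolver, length=None):
    # Like the patched copyfile(), always forwards `length`, even when it is None.
    return resolver(length=length)


print(forwarding_caller(resolve_with_signature_default))  # None
print(forwarding_caller(resolve_with_sentinel))           # 16384
```

The `if not length` check also maps `0` to the default, which avoids a silent no-op: `read(0)` returns an empty result, so a copy loop given `length=0` would stop before copying anything.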

+    """Copy data from file-like object `fsrc` to file-like object `fdst`.
+
+    An in-memory buffer size in bytes can be set with `length`; the default is
+    16 KiB.
+    """
+    if not length:
+        length = 16 * 1024
     while 1:
Member

I'm not sure that it's OK to use a loop if the length is negative. I suggest adding a special case for a negative value that calls read() (with no argument) only once.
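A minimal sketch of the reviewer's suggestion, written as a standalone, hypothetical `copyfileobj_sketch()` rather than the patched `shutil` code: a negative *length* triggers a single argument-less `read()` and skips the chunked loop entirely.

```python
import io

DEFAULT_LENGTH = 16 * 1024  # default buffer size documented by the patch


def copyfileobj_sketch(fsrc, fdst, length=None):
    """Hypothetical copyfileobj(): a negative length reads everything in one call."""
    if not length:
        length = DEFAULT_LENGTH
    if length < 0:
        # Special case suggested in review: one unbounded read, no loop.
        # The whole remaining source is held in memory at once.
        buf = fsrc.read()
        if buf:
            fdst.write(buf)
        return
    # Normal case: copy in bounded chunks until read() returns empty.
    while True:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)


src = io.BytesIO(b'abc' * 10)
dst = io.BytesIO()
copyfileobj_sketch(src, dst, length=-1)
print(dst.getvalue() == b'abc' * 10)  # True
```

This makes the unbounded read explicit in the code, but it keeps the behavior the first review comment objects to: the whole source is still loaded into memory.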

         buf = fsrc.read(length)
         if not buf:
@@ -93,12 +99,14 @@ def _samefile(src, dst):
     return (os.path.normcase(os.path.abspath(src)) ==
             os.path.normcase(os.path.abspath(dst)))

-def copyfile(src, dst, *, follow_symlinks=True):
+def copyfile(src, dst, *, follow_symlinks=True, length=None):
     """Copy data from src to dst.

     If follow_symlinks is not set and src is a symbolic link, a new
     symlink will be created instead of copying the file it points to.

+    An in-memory buffer size in bytes can be set with `length`; the default is
+    16 KiB.
     """
     if _samefile(src, dst):
         raise SameFileError("{!r} and {!r} are the same file".format(src, dst))
@@ -119,7 +127,7 @@ def copyfile(src, dst, *, follow_symlinks=True):
     else:
         with open(src, 'rb') as fsrc:
             with open(dst, 'wb') as fdst:
-                copyfileobj(fsrc, fdst)
+                copyfileobj(fsrc, fdst, length=length)
     return dst

 def copymode(src, dst, *, follow_symlinks=True):
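Since this PR was closed, released versions of `shutil.copyfile()` do not accept a *length* keyword. Assuming that, the sketch below reproduces only the forwarding pattern of the patched function with a hypothetical `copyfile_with_length()`: it opens both files in binary mode and hands an explicit buffer size to the real `shutil.copyfileobj()`. The same-file check and symlink handling of the real function are omitted.

```python
import os
import shutil
import tempfile

DEFAULT_LENGTH = 16 * 1024  # default buffer size documented by the patch


def copyfile_with_length(src, dst, length=None):
    """Hypothetical stand-in for the patched copyfile(): forward `length`."""
    # Resolve the default here so an int is always passed to copyfileobj().
    if not length:
        length = DEFAULT_LENGTH
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
        shutil.copyfileobj(fsrc, fdst, length)
    return dst


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'src.bin')
    dst = os.path.join(tmp, 'dst.bin')
    data = os.urandom(100_000)
    with open(src, 'wb') as f:
        f.write(data)
    # A 1 MiB buffer: the 100 kB file is read in a single chunk.
    returned = copyfile_with_length(src, dst, length=1024 * 1024)
    with open(dst, 'rb') as f:
        copied_ok = f.read() == data
    returned_dst = returned == dst

print(copied_ok, returned_dst)  # True True
```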
@@ -224,7 +232,7 @@ def lookup(name):
                 raise
     _copyxattr(src, dst, follow_symlinks=follow)

-def copy(src, dst, *, follow_symlinks=True):
+def copy(src, dst, *, follow_symlinks=True, length=None):
     """Copy data and mode bits ("cp src dst"). Return the file's destination.

     The destination may be a directory.
@@ -235,14 +243,17 @@ def copy(src, dst, *, follow_symlinks=True):
     If source and destination are the same file, a SameFileError will be
     raised.

+    An in-memory buffer size in bytes can be set with `length`; the default is
+    16 KiB.
+
     """
     if os.path.isdir(dst):
         dst = os.path.join(dst, os.path.basename(src))
-    copyfile(src, dst, follow_symlinks=follow_symlinks)
+    copyfile(src, dst, follow_symlinks=follow_symlinks, length=length)
     copymode(src, dst, follow_symlinks=follow_symlinks)
     return dst

-def copy2(src, dst, *, follow_symlinks=True):
+def copy2(src, dst, *, follow_symlinks=True, length=None):
     """Copy data and all stat info ("cp -p src dst"). Return the file's
     destination."

@@ -251,10 +262,12 @@ def copy2(src, dst, *, follow_symlinks=True):
     If follow_symlinks is false, symlinks won't be followed. This
     resembles GNU's "cp -P src dst".

+    An in-memory buffer size in bytes can be set with `length`; the default is
+    16 KiB.
     """
     if os.path.isdir(dst):
         dst = os.path.join(dst, os.path.basename(src))
-    copyfile(src, dst, follow_symlinks=follow_symlinks)
+    copyfile(src, dst, follow_symlinks=follow_symlinks, length=length)
     copystat(src, dst, follow_symlinks=follow_symlinks)
     return dst

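The `copy()` and `copy2()` hunks follow one pattern: resolve a directory destination, copy the data with the given *length*, then copy either the permission bits (`shutil.copymode()`) or all metadata (`shutil.copystat()`). The hypothetical `copy_with_length()` below sketches both variants behind a `preserve_metadata` flag, with its own data-copy helper so it runs on a stock interpreter; it is an illustration, not the patched functions.

```python
import os
import shutil
import stat
import tempfile


def _copy_data(src, dst, length=None):
    # Hypothetical helper: copy bytes with an explicit buffer size.
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
        shutil.copyfileobj(fsrc, fdst, length or 16 * 1024)


def copy_with_length(src, dst, *, length=None, preserve_metadata=False):
    """Hypothetical: behaves like copy() by default, like copy2() if preserve_metadata."""
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    _copy_data(src, dst, length)
    if preserve_metadata:
        shutil.copystat(src, dst)  # like copy2(): mode bits, times, flags
    else:
        shutil.copymode(src, dst)  # like copy(): permission bits only
    return dst


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'data.txt')
    out_dir = os.path.join(tmp, 'out')
    os.mkdir(out_dir)
    with open(src, 'wb') as f:
        f.write(b'x' * 100)
    os.chmod(src, 0o640)
    os.utime(src, (1_000_000, 1_000_000))
    dst = copy_with_length(src, out_dir, length=20, preserve_metadata=True)
    st = os.stat(dst)
    result = (os.path.basename(dst), stat.S_IMODE(st.st_mode), st.st_mtime)

print(result)  # on POSIX: ('data.txt', 416, 1000000.0)
```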
9 changes: 9 additions & 0 deletions Lib/test/test_shutil.py
@@ -1403,6 +1403,14 @@ def test_copyfile_same_file(self):
         # But Error should work too, to stay backward compatible.
         self.assertRaises(Error, shutil.copyfile, src_file, src_file)

+    def test_copy_w_different_length(self):
+        # copy and copy2 both accept an alternate buffer `length`
+        for fn in (shutil.copy, shutil.copy2):
+            with tempfile.NamedTemporaryFile() as src:
+                with tempfile.NamedTemporaryFile() as dst:
+                    write_file(src.name, b'x' * 100, binary=True)
+                    fn(src.name, dst.name, length=20)
+
     def test_copytree_return_value(self):
         # copytree returns its destination path.
         src_dir = self.mkdtemp()
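The test added above only checks that the calls do not raise. A stricter sketch, assuming the patched `shutil` is the one under test, also compares the copied bytes, and it uses `tempfile.TemporaryDirectory()` because the `tempfile` documentation notes that a `NamedTemporaryFile` generally cannot be reopened by name on Windows while it is still open. On an unpatched interpreter the hypothetical `CopyLengthTests` case skips itself, since stock `copy()` and `copy2()` have no *length* parameter.

```python
import inspect
import os
import shutil
import tempfile
import unittest


class CopyLengthTests(unittest.TestCase):
    def test_copy_w_different_length(self):
        # copy and copy2 both accept an alternate buffer `length` (patched shutil only)
        for fn in (shutil.copy, shutil.copy2):
            with self.subTest(fn=fn.__name__):
                if 'length' not in inspect.signature(fn).parameters:
                    self.skipTest(f'{fn.__name__}() has no length parameter')
                with tempfile.TemporaryDirectory() as tmp:
                    src = os.path.join(tmp, 'src')
                    dst = os.path.join(tmp, 'dst')
                    with open(src, 'wb') as f:
                        f.write(b'x' * 100)
                    # 100 bytes with a 20-byte buffer forces several reads.
                    fn(src, dst, length=20)
                    with open(dst, 'rb') as f:
                        self.assertEqual(f.read(), b'x' * 100)
```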
@@ -1830,6 +1838,7 @@ def test_move_dir_caseinsensitive(self):
         finally:
             os.rmdir(dst_dir)

+
 class TermsizeTests(unittest.TestCase):
     def test_does_not_crash(self):
         """Check if get_terminal_size() returns a meaningful value.