Commit e77d428

frenzymadness, hroncok and vstinner authored
bpo-40495: compileall option to hardlink duplicate pyc files (GH-19901)
compileall is now able to use hardlinks to prevent duplicates in a case when .pyc files for different optimization levels have the same content.

Co-authored-by: Miro Hrončok <miro@hroncok.cz>
Co-authored-by: Victor Stinner <vstinner@python.org>
1 parent 7443d42 commit e77d428

6 files changed: +285 -15 lines changed

Doc/library/compileall.rst

Lines changed: 16 additions & 5 deletions
@@ -113,6 +113,11 @@ compile Python sources.
 
    Ignore symlinks pointing outside the given directory.
 
+.. cmdoption:: --hardlink-dupes
+
+   If two ``.pyc`` files with different optimization level have
+   the same content, use hard links to consolidate duplicate files.
+
 .. versionchanged:: 3.2
    Added the ``-i``, ``-b`` and ``-h`` options.
 
@@ -125,7 +130,7 @@ compile Python sources.
    Added the ``--invalidation-mode`` option.
 
 .. versionchanged:: 3.9
-   Added the ``-s``, ``-p``, ``-e`` options.
+   Added the ``-s``, ``-p``, ``-e`` and ``--hardlink-dupes`` options.
    Raised the default recursion limit from 10 to
    :py:func:`sys.getrecursionlimit()`.
    Added the possibility to specify the ``-o`` option multiple times.
@@ -143,7 +148,7 @@ runtime.
 Public functions
 ----------------
 
-.. function:: compile_dir(dir, maxlevels=sys.getrecursionlimit(), ddir=None, force=False, rx=None, quiet=0, legacy=False, optimize=-1, workers=1, invalidation_mode=None, \*, stripdir=None, prependdir=None, limit_sl_dest=None)
+.. function:: compile_dir(dir, maxlevels=sys.getrecursionlimit(), ddir=None, force=False, rx=None, quiet=0, legacy=False, optimize=-1, workers=1, invalidation_mode=None, \*, stripdir=None, prependdir=None, limit_sl_dest=None, hardlink_dupes=False)
 
    Recursively descend the directory tree named by *dir*, compiling all :file:`.py`
    files along the way. Return a true value if all the files compiled successfully,
@@ -193,6 +198,9 @@ Public functions
    the ``-s``, ``-p`` and ``-e`` options described above.
    They may be specified as ``str``, ``bytes`` or :py:class:`os.PathLike`.
 
+   If *hardlink_dupes* is true and two ``.pyc`` files with different optimization
+   level have the same content, use hard links to consolidate duplicate files.
+
    .. versionchanged:: 3.2
       Added the *legacy* and *optimize* parameter.
 
@@ -219,9 +227,9 @@ Public functions
       Setting *workers* to 0 now chooses the optimal number of cores.
 
    .. versionchanged:: 3.9
-      Added *stripdir*, *prependdir* and *limit_sl_dest* arguments.
+      Added *stripdir*, *prependdir*, *limit_sl_dest* and *hardlink_dupes* arguments.
 
-.. function:: compile_file(fullname, ddir=None, force=False, rx=None, quiet=0, legacy=False, optimize=-1, invalidation_mode=None, \*, stripdir=None, prependdir=None, limit_sl_dest=None)
+.. function:: compile_file(fullname, ddir=None, force=False, rx=None, quiet=0, legacy=False, optimize=-1, invalidation_mode=None, \*, stripdir=None, prependdir=None, limit_sl_dest=None, hardlink_dupes=False)
 
    Compile the file with path *fullname*. Return a true value if the file
    compiled successfully, and a false value otherwise.
@@ -257,6 +265,9 @@ Public functions
    the ``-s``, ``-p`` and ``-e`` options described above.
    They may be specified as ``str``, ``bytes`` or :py:class:`os.PathLike`.
 
+   If *hardlink_dupes* is true and two ``.pyc`` files with different optimization
+   level have the same content, use hard links to consolidate duplicate files.
+
    .. versionadded:: 3.2
 
    .. versionchanged:: 3.5
@@ -273,7 +284,7 @@ Public functions
       The *invalidation_mode* parameter's default value is updated to None.
 
    .. versionchanged:: 3.9
-      Added *stripdir*, *prependdir* and *limit_sl_dest* arguments.
+      Added *stripdir*, *prependdir*, *limit_sl_dest* and *hardlink_dupes* arguments.
 
 .. function:: compile_path(skip_curdir=True, maxlevels=0, force=False, quiet=0, legacy=False, optimize=-1, invalidation_mode=None)
 
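
Reviewer note: the documented *hardlink_dupes* behaviour can be checked with a short script. This is a minimal sketch, assuming Python 3.9 or later and a filesystem that supports hard links. The helper name `shared_pyc_report` and the file name `mod.py` are hypothetical and not part of this commit.

```python
import compileall
import importlib.util
import os
import tempfile


def shared_pyc_report(source_text):
    """Compile one module at levels 0, 1 and 2 with hardlink_dupes=True.

    Returns (ok, shared), where shared[(a, b)] tells whether the .pyc files
    for optimization levels a and b are the same file on disk.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")  # hypothetical module name
        with open(src, "w") as f:
            f.write(source_text)

        ok = compileall.compile_dir(
            tmp, quiet=1, optimize=[0, 1, 2], hardlink_dupes=True
        )

        # Level 0 has no "opt-N" tag in its cache file name.
        pyc = {
            level: importlib.util.cache_from_source(
                src, optimization="" if level == 0 else level
            )
            for level in (0, 1, 2)
        }
        # os.path.samefile() is true when both paths refer to one inode.
        shared = {
            (a, b): os.path.samefile(pyc[a], pyc[b])
            for a, b in ((0, 1), (1, 2))
        }
        return bool(ok), shared
```

Calling `shared_pyc_report("x = 1\nassert x == 1\n")` would be expected to report levels 1 and 2 as shared and level 0 as separate, since only level 0 keeps the assertion.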

Doc/whatsnew/3.9.rst

Lines changed: 10 additions & 0 deletions
@@ -245,6 +245,16 @@ that schedules a shutdown for the default executor that waits on the
 Added :class:`asyncio.PidfdChildWatcher`, a Linux-specific child watcher
 implementation that polls process file descriptors. (:issue:`38692`)
 
+compileall
+----------
+
+Added new possibility to use hardlinks for duplicated ``.pyc`` files: *hardlink_dupes* parameter and --hardlink-dupes command line option.
+(Contributed by Lumír 'Frenzy' Balhar in :issue:`40495`.)
+
+Added new options for path manipulation in resulting ``.pyc`` files: *stripdir*, *prependdir*, *limit_sl_dest* parameters and -s, -p, -e command line options.
+Added the possibility to specify the option for an optimization level multiple times.
+(Contributed by Lumír 'Frenzy' Balhar in :issue:`38112`.)
+
 concurrent.futures
 ------------------
 

Lib/compileall.py

Lines changed: 35 additions & 7 deletions
@@ -15,6 +15,7 @@
 import importlib.util
 import py_compile
 import struct
+import filecmp
 
 from functools import partial
 from pathlib import Path
@@ -47,7 +48,7 @@ def _walk_dir(dir, maxlevels, quiet=0):
 def compile_dir(dir, maxlevels=None, ddir=None, force=False,
                 rx=None, quiet=0, legacy=False, optimize=-1, workers=1,
                 invalidation_mode=None, *, stripdir=None,
-                prependdir=None, limit_sl_dest=None):
+                prependdir=None, limit_sl_dest=None, hardlink_dupes=False):
     """Byte-compile all modules in the given directory tree.
 
     Arguments (only dir is required):
@@ -70,6 +71,7 @@ def compile_dir(dir, maxlevels=None, ddir=None, force=False,
                after stripdir
     limit_sl_dest: ignore symlinks if they are pointing outside of
                    the defined path
+    hardlink_dupes: hardlink duplicated pyc files
     """
     ProcessPoolExecutor = None
     if ddir is not None and (stripdir is not None or prependdir is not None):
@@ -104,22 +106,24 @@ def compile_dir(dir, maxlevels=None, ddir=None, force=False,
                                            invalidation_mode=invalidation_mode,
                                            stripdir=stripdir,
                                            prependdir=prependdir,
-                                           limit_sl_dest=limit_sl_dest),
+                                           limit_sl_dest=limit_sl_dest,
+                                           hardlink_dupes=hardlink_dupes),
                                    files)
             success = min(results, default=True)
     else:
         for file in files:
             if not compile_file(file, ddir, force, rx, quiet,
                                 legacy, optimize, invalidation_mode,
                                 stripdir=stripdir, prependdir=prependdir,
-                                limit_sl_dest=limit_sl_dest):
+                                limit_sl_dest=limit_sl_dest,
+                                hardlink_dupes=hardlink_dupes):
                 success = False
     return success
 
 def compile_file(fullname, ddir=None, force=False, rx=None, quiet=0,
                  legacy=False, optimize=-1,
                  invalidation_mode=None, *, stripdir=None, prependdir=None,
-                 limit_sl_dest=None):
+                 limit_sl_dest=None, hardlink_dupes=False):
     """Byte-compile one file.
 
     Arguments (only fullname is required):
@@ -140,6 +144,7 @@ def compile_file(fullname, ddir=None, force=False, rx=None, quiet=0,
                after stripdir
     limit_sl_dest: ignore symlinks if they are pointing outside of
                    the defined path.
+    hardlink_dupes: hardlink duplicated pyc files
     """
 
     if ddir is not None and (stripdir is not None or prependdir is not None):
@@ -176,6 +181,14 @@ def compile_file(fullname, ddir=None, force=False, rx=None, quiet=0,
     if isinstance(optimize, int):
         optimize = [optimize]
 
+    # Use set() to remove duplicates.
+    # Use sorted() to create pyc files in a deterministic order.
+    optimize = sorted(set(optimize))
+
+    if hardlink_dupes and len(optimize) < 2:
+        raise ValueError("Hardlinking of duplicated bytecode makes sense "
+                          "only for more than one optimization level")
+
     if rx is not None:
         mo = rx.search(fullname)
         if mo:
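
Reviewer note: the guard in this hunk rejects *hardlink_dupes* whenever fewer than two distinct optimization levels remain after `sorted(set(optimize))`. The following is a minimal sketch of how that surfaces to callers, assuming Python 3.9 or later. The helper name `hardlink_request_is_rejected` and the file name `mod.py` are hypothetical.

```python
import compileall
import os
import tempfile


def hardlink_request_is_rejected(optimize):
    """Return True if compile_file() raises ValueError for these levels."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")  # hypothetical module name
        with open(src, "w") as f:
            f.write("x = 1\n")
        try:
            compileall.compile_file(
                src, quiet=1, optimize=optimize, hardlink_dupes=True
            )
        except ValueError:
            # Raised when fewer than two distinct levels were requested.
            return True
        return False
```

A single integer such as `0` and a list with repeated values such as `[1, 1]` both collapse to one level, so both would be rejected, while `[0, 1]` would be accepted.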
@@ -220,10 +233,16 @@ def compile_file(fullname, ddir=None, force=False, rx=None, quiet=0,
                 if not quiet:
                     print('Compiling {!r}...'.format(fullname))
                 try:
-                    for opt_level, cfile in opt_cfiles.items():
+                    for index, opt_level in enumerate(optimize):
+                        cfile = opt_cfiles[opt_level]
                         ok = py_compile.compile(fullname, cfile, dfile, True,
                                                 optimize=opt_level,
                                                 invalidation_mode=invalidation_mode)
+                        if index > 0 and hardlink_dupes:
+                            previous_cfile = opt_cfiles[optimize[index - 1]]
+                            if filecmp.cmp(cfile, previous_cfile, shallow=False):
+                                os.unlink(cfile)
+                                os.link(previous_cfile, cfile)
                 except py_compile.PyCompileError as err:
                     success = False
                     if quiet >= 2:
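
Reviewer note: the deduplication step added in this hunk can be read in isolation. This is a minimal sketch using the same three calls as the diff (`filecmp.cmp`, `os.unlink`, `os.link`). The helper name `link_if_identical` is hypothetical and does not exist in `compileall`.

```python
import filecmp
import os


def link_if_identical(previous_path, current_path):
    """Replace current_path with a hard link to previous_path if both
    files have the same content. Return True when a link was created."""
    # shallow=False compares file contents, not only the os.stat() signature.
    if filecmp.cmp(current_path, previous_path, shallow=False):
        os.unlink(current_path)
        os.link(previous_path, current_path)
        return True
    return False
```

Because each file is compared only with the one for the previous optimization level, identical files at levels 0, 1 and 2 end up chained onto a single inode.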
@@ -352,6 +371,9 @@ def main():
                               'Python interpreter itself (specified by -O).'))
     parser.add_argument('-e', metavar='DIR', dest='limit_sl_dest',
                         help='Ignore symlinks pointing outsite of the DIR')
+    parser.add_argument('--hardlink-dupes', action='store_true',
+                        dest='hardlink_dupes',
+                        help='Hardlink duplicated pyc files')
 
     args = parser.parse_args()
     compile_dests = args.compile_dest
@@ -371,6 +393,10 @@ def main():
     if args.opt_levels is None:
         args.opt_levels = [-1]
 
+    if len(args.opt_levels) == 1 and args.hardlink_dupes:
+        parser.error(("Hardlinking of duplicated bytecode makes sense "
+                      "only for more than one optimization level."))
+
     if args.ddir is not None and (
         args.stripdir is not None or args.prependdir is not None
     ):
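
Reviewer note: the command-line check in this hunk can be exercised by running the module as a subprocess. This is a minimal sketch, assuming Python 3.9 or later. The helper name `run_compileall` is hypothetical, while the flags `-q`, `-o` and `--hardlink-dupes` are the ones defined by `compileall` itself.

```python
import subprocess
import sys


def run_compileall(*cli_args):
    """Run `python -m compileall` with the given arguments and
    return the CompletedProcess (exit code, stdout, stderr)."""
    return subprocess.run(
        [sys.executable, "-m", "compileall", *cli_args],
        capture_output=True,
        text=True,
    )
```

Passing `--hardlink-dupes` without at least two `-o` options should make `parser.error()` print the message from the diff to stderr and exit with a non-zero status. Adding `-o 0 -o 1` should let the run proceed.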
@@ -404,7 +430,8 @@ def main():
                                         stripdir=args.stripdir,
                                         prependdir=args.prependdir,
                                         optimize=args.opt_levels,
-                                        limit_sl_dest=args.limit_sl_dest):
+                                        limit_sl_dest=args.limit_sl_dest,
+                                        hardlink_dupes=args.hardlink_dupes):
                         success = False
                 else:
                     if not compile_dir(dest, maxlevels, args.ddir,
@@ -414,7 +441,8 @@ def main():
                                        stripdir=args.stripdir,
                                        prependdir=args.prependdir,
                                        optimize=args.opt_levels,
-                                       limit_sl_dest=args.limit_sl_dest):
+                                       limit_sl_dest=args.limit_sl_dest,
+                                       hardlink_dupes=args.hardlink_dupes):
                         success = False
             return success
         else:
