Description
A couple of years ago, git added the concept of partial clones, which adds a --filter option to various commands, including git clone and git submodule update, and maybe more. A partial clone can fetch only the "blobs" (file contents) needed for a particular checkout, or only part of a tree.
Rather than repeat some good explanations here, take a look at these writeups:
https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/
https://about.gitlab.com/blog/2020/03/13/partial-clone-for-massive-repositories/
We could use --filter=blob:none to do what we are now doing with shallow clones of submodules. The big advantage of partial clones over shallow clones is that they're transparent: whatever is needed gets fetched on demand. A shallow clone cannot deal at all with commits before the shallow boundary, whereas in a partial clone those commits are accessible; their file contents are just not fetched until needed.
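To see the on-demand behavior without touching the network, here is a minimal sketch that builds a throwaway local repository and makes a blobless clone of it. The directory names, file name, and demo identity are made up for illustration, and the two uploadpack settings are only there so that a local repository will serve a partial clone.

```shell
#!/bin/sh
# Local demonstration of a blobless partial clone; no network access needed.
set -eu

work=$(mktemp -d)

# Build a tiny "upstream" repository with two commits that change one file.
git init -q "$work/upstream"
cd "$work/upstream"
git config user.name "Demo"
git config user.email "demo@example.invalid"
# The serving side must allow filtered (partial) fetches.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true
echo one > file.txt
git add file.txt
git commit -q -m "first"
echo two > file.txt
git commit -q -am "second"

# A file:// URL uses the normal transport, so the filter is honored.
git clone -q --filter=blob:none "file://$work/upstream" "$work/partial"
cd "$work/partial"
tip=$(git rev-parse HEAD)

# The whole commit history is present, unlike in a --depth=1 shallow clone.
commits=$(git rev-list --count "$tip")

# Blobs that the current checkout did not need are absent;
# rev-list --missing=print marks them with a leading "?".
missing_before=$(git rev-list --objects --missing=print "$tip" | grep -c '^?' || true)

# Checking out the older commit fetches its blob on demand.
git checkout -q HEAD~1
missing_after=$(git rev-list --objects --missing=print "$tip" | grep -c '^?' || true)

echo "commits=$commits missing_before=$missing_before missing_after=$missing_after"
```

The script should report two commits, one missing blob before the checkout of the older commit, and none afterwards, which is the transparency described above.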
This can be used for circuitpython itself and also its submodules. Try git clone --filter=blob:none https://github.com/adafruit/circuitpython and you'll see that it's fast, and that it gets you a checkout equivalent to that of a --depth=1 shallow clone. Then try git checkout 7.3.3, and see that it fetches what it needs. These filters can be passed down to a recursive clone with submodules, or git submodule update can take a --filter option. The --filter option to git submodule update was added somewhere between git 2.34 and 2.38; I'm not sure exactly where. At least git 2.36 is good, because a partial recursive clone from the top properly propagates the --filter down to the submodule fetching.
I think this means that make fetch-submodules could be switched to use partial clones. We could also recommend a different workflow, in which a top-level clone recursively fetches everything.
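As a rough sketch of what such a switch might look like, the helper below uses a partial clone for submodules when git is new enough and otherwise falls back to a shallow fetch. The function names git_at_least and fetch_submodules are hypothetical, the 2.36 threshold is only the version observed to work above, and the --depth=1 fallback is an assumed stand-in for whatever make fetch-submodules currently does.

```shell
#!/bin/sh
# git_at_least MAJOR MINOR [VERSION]
# Succeeds when VERSION (default: the installed git) is >= MAJOR.MINOR.
git_at_least() {
    ver=${3:-$(git version | awk '{print $3}')}
    major=${ver%%.*}      # text before the first dot
    rest=${ver#*.}        # text after the first dot
    minor=${rest%%.*}     # second component
    [ "$major" -gt "$1" ] || { [ "$major" -eq "$1" ] && [ "$minor" -ge "$2" ]; }
}

# Hypothetical replacement for a shallow submodule fetch.
# Assumption: git 2.36 or later accepts --filter on "git submodule update --init".
fetch_submodules() {
    if git_at_least 2 36; then
        # Blobless partial clone: full history, file contents fetched on demand.
        git submodule update --init --filter=blob:none
    else
        # Older git: fall back to a shallow clone of each submodule.
        git submodule update --init --depth=1
    fi
}
```

Running fetch_submodules from the top of a checkout would then pick the strategy automatically; an optional third argument to git_at_least lets the version check be exercised without depending on the installed git.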
I tried partial and shallow clones of the huge rpi-firmware submodule in various ways:
$ time git clone --filter=blob:none https://github.com/raspberrypi/rpi-firmware.git
[...]
real 0m46.067s
user 0m6.522s
sys 0m2.545s
$ time git clone --filter=tree:0 https://github.com/raspberrypi/rpi-firmware.git
[...]
real 0m10.298s
user 0m3.437s
sys 0m1.641s
$ time git clone --depth=1 https://github.com/raspberrypi/rpi-firmware.git
[...]
real 0m7.453s
user 0m3.259s
sys 0m1.492s
For that particular submodule, we don't really need the full tree. But for many other submodules, it's convenient to have the full tree to allow experimentation and updating in place, without manually re-cloning.
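For repeating this kind of comparison on other submodules, something like the following sketch could be used. The function name compare_clones is made up, the timing has only whole-second resolution, and the result will of course depend on the network and the server.

```shell
#!/bin/sh
# compare_clones URL
# Clone URL three ways into a scratch directory and print, for each strategy,
# the option used, the elapsed wall-clock seconds, and the on-disk size.
compare_clones() {
    url=$1
    scratch=$(mktemp -d)
    n=0
    for opt in --filter=blob:none --filter=tree:0 --depth=1; do
        n=$((n + 1))
        dest="$scratch/clone$n"
        start=$(date +%s)
        git clone -q "$opt" "$url" "$dest"
        end=$(date +%s)
        size=$(du -sk "$dest" | cut -f1)
        printf '%s\t%ss\t%sKiB\n' "$opt" "$((end - start))" "$size"
    done
    # Remove the scratch clones once they have been measured.
    rm -rf "$scratch"
}
```

For example, compare_clones https://github.com/raspberrypi/rpi-firmware.git would redo the three measurements above in one go.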
Another interesting development is scalar, a command now shipped with git, which optimizes git for use with large repos. I have only begun to look at that.