
Consider using git "partial clone" #7225

@dhalbert

A couple of years ago, git added the concept of partial clones, which add a --filter option to several commands, including git clone, git submodule update, and maybe more. A partial clone can fetch only the "blobs" (file contents) needed for a particular checkout, or only part of the tree.

Rather than repeat some good explanations here, take a look at these writeups:
https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/
https://about.gitlab.com/blog/2020/03/13/partial-clone-for-massive-repositories/

We could use --filter=blob:none to do what we are now doing with shallow clones of submodules. The big advantage of partial clones over shallow clones is that they're transparent: whatever is missing is fetched on demand. With a shallow clone, commits before the shallow boundary can't be dealt with at all. With a partial clone, the older history is still accessible, and the objects that were filtered out are fetched only when something needs them.
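The difference can be seen without touching the network. The following is a minimal sketch, not the project's workflow: it builds a throwaway two-commit repo in a temp directory (the names server, partial, and shallow are made up for the demo), clones it both ways over file://, and then asks each clone about the older commit. The uploadpack settings are only there so the local "server" accepts filtered requests.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

# Build a tiny "server" repo with two commits so there is older history.
git init -q server
git -C server config user.name demo
git -C server config user.email demo@example.com
echo "version one" > server/file.txt
git -C server add file.txt
git -C server commit -q -m "first"
echo "version two" > server/file.txt
git -C server commit -q -am "second"
# Let clients request filtered (partial) packs from this local repo.
git -C server config uploadpack.allowFilter true
git -C server config uploadpack.allowAnySHA1InWant true

# Partial clone: full commit history, blobs only for the checked-out commit.
git clone -q --filter=blob:none "file://$work/server" partial
# Shallow clone: only the newest commit exists locally.
git clone -q --depth=1 "file://$work/server" shallow

partial_commits=$(git -C partial rev-list --count HEAD)
shallow_commits=$(git -C shallow rev-list --count HEAD)
echo "partial sees $partial_commits commits, shallow sees $shallow_commits"

# Reading the old file contents triggers an on-demand fetch in the partial clone.
old_content=$(git -C partial show HEAD~1:file.txt)
echo "old content from partial clone: $old_content"

# The shallow clone cannot even name the parent commit.
if git -C shallow rev-parse -q --verify HEAD~1 >/dev/null 2>&1; then
    shallow_has_parent=yes
else
    shallow_has_parent=no
fi
echo "shallow clone can reach parent commit: $shallow_has_parent"
```

The partial clone answers the question about HEAD~1 by quietly fetching one blob; the shallow clone has no way to answer it short of deepening or re-cloning.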

This can be used for circuitpython itself and also its submodules. Try git clone --filter=blob:none https://github.com/adafruit/circuitpython. It's fast, and gives you a checkout equivalent to a --depth=1 shallow clone. Then try git checkout 7.3.3, and see that it fetches what it needs to. These filters can be passed down to a recursive clone with submodules, or git submodule update can take a --filter option. The --filter option to git submodule update was added somewhere between git 2.34 and 2.38: I'm not sure where. At least git 2.36 is good, because a partial recursive clone from the top properly propagates the --filter down to the submodule fetching.
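Because the exact version that introduced --filter for git submodule update is uncertain, a script could guard on the version that is known to work. A minimal sketch, assuming 2.36 as the threshold (taken from the observation above, not from the git release notes); version_at_least is a hypothetical helper name, and it compares only the major and minor components.

```shell
#!/bin/sh
# Succeeds when version $1 is >= version $2, comparing major.minor only.
version_at_least() {
    have_major=${1%%.*}
    have_rest=${1#*.}
    have_minor=${have_rest%%.*}
    want_major=${2%%.*}
    want_rest=${2#*.}
    want_minor=${want_rest%%.*}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# "git version 2.39.2" -> "2.39.2"
git_version=$(git --version | awk '{print $3}')

# 2.36 is an assumed threshold, based on the version reported to work above.
if version_at_least "$git_version" 2.36; then
    echo "git $git_version: new enough to try filtered submodule updates"
else
    echo "git $git_version: older than 2.36, keep using shallow clones"
fi
```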

I think what this means is that make fetch-submodules could be switched to use partial clones. We could also recommend a different workflow, with a top-level clone that recursively fetches everything.

I tried partial and shallow clones of the huge rpi-firmware submodule in various ways:

$ time git clone --filter=blob:none https://github.com/raspberrypi/rpi-firmware.git
[...]
real	0m46.067s
user	0m6.522s
sys	0m2.545s
$ time git clone --filter=tree:0 https://github.com/raspberrypi/rpi-firmware.git
[...]
real	0m10.298s
user	0m3.437s
sys	0m1.641s
$ time git clone --depth=1 https://github.com/raspberrypi/rpi-firmware.git
[...]
real	0m7.453s
user	0m3.259s
sys	0m1.492s

For that particular submodule, we don't really need the full tree. But for many other submodules, it's convenient to have the full tree to allow experimentation and updating in place, without manually re-cloning.

Another interesting development is the scalar command that now ships with git, which optimizes git for use with large repos. I have only begun to look at that.
