Commit a8bbe16

bk2204 and chrisd8088 committed
git: improve sparse file support
When we invoke `git ls-files` to try to find all LFS files, we don't honour sparse file paths or exclusions. While we should never actually traverse excluded files, using the `--exclude-standard` option can avoid loading some data with filtered clones, which may result in less data being downloaded.

In addition, we can honour sparse checkouts, since this code path is only used to handle the working tree and we know that the only files we need to consider are those Git actually put in the working tree.

The `--sparse` option is new in 2.35, but we already require 2.42 above, so we can use it unconditionally.

Co-authored-by: Chris Darroch <[email protected]>
1 parent 3990c7a commit a8bbe16
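
The version reasoning in the commit message can be illustrated with a small sketch. This is not the Git LFS implementation: `parseGitVersion` and `atLeast` are hypothetical helpers written for this example, and the sample version strings are assumptions about what `git --version` prints on different platforms. It only shows why a check for 2.42 already implies that the 2.35 `--sparse` option is available.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// versionRE matches the first dotted version number in a string,
// for example "2.42.0" in "git version 2.42.0.windows.1".
var versionRE = regexp.MustCompile(`(\d+)\.(\d+)(?:\.(\d+))?`)

// parseGitVersion is a hypothetical helper: it extracts major, minor and
// patch numbers from the output of `git --version`.
func parseGitVersion(s string) ([3]int, error) {
	m := versionRE.FindStringSubmatch(s)
	if m == nil {
		return [3]int{}, fmt.Errorf("no version number in %q", s)
	}
	var v [3]int
	for i := 0; i < 3; i++ {
		if m[i+1] == "" {
			continue // a missing patch level counts as 0
		}
		n, err := strconv.Atoi(m[i+1])
		if err != nil {
			return [3]int{}, err
		}
		v[i] = n
	}
	return v, nil
}

// atLeast reports whether version v is greater than or equal to want.
func atLeast(v, want [3]int) bool {
	for i := range v {
		if v[i] != want[i] {
			return v[i] > want[i]
		}
	}
	return true
}

func main() {
	sparseMin := [3]int{2, 35, 0}     // "--sparse", per the commit message
	objectTypeMin := [3]int{2, 42, 0} // "%(objecttype)", per the test comments

	for _, out := range []string{"git version 2.34.1", "git version 2.42.0.windows.1"} {
		v, err := parseGitVersion(out)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q: objecttype=%v sparse=%v\n",
			out, atLeast(v, objectTypeMin), atLeast(v, sparseMin))
	}
}
```

Any version that passes the 2.42 check also passes the 2.35 check, which is why the commit can add `--sparse` without a second version guard.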

3 files changed: 131 additions & 0 deletions

git/git.go

Lines changed: 2 additions & 0 deletions
@@ -329,7 +329,9 @@ func LsFilesLFS() (*subprocess.BufferedCmd, error) {
 	return gitNoLFSBuffered(
 		"ls-files",
 		"--cached",
+		"--exclude-standard",
 		"--full-name",
+		"--sparse",
 		"-z",
 		"--format=%(objectmode) %(objecttype) %(objectname) %(objectsize)\t%(path)",
 		":(top,attr:filter=lfs)",

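To make the output shape of this command concrete, here is a minimal sketch of parsing NUL-terminated records in the `--format` layout shown above. It is not the parser Git LFS uses: `lsFilesEntry` and `parseLsFiles` are hypothetical names, and the sample record values are made up. It assumes that with a sparse index and `--sparse`, Git may list a sparse directory as a `tree` entry with a trailing slash and `-` as its size.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// lsFilesEntry is a hypothetical type for this sketch; it mirrors the fields
// of "%(objectmode) %(objecttype) %(objectname) %(objectsize)\t%(path)".
type lsFilesEntry struct {
	Mode string
	Type string
	OID  string
	Size int64 // -1 when Git prints "-" (trees and commits)
	Path string
}

// parseLsFiles splits "-z" output into records and parses each one.
func parseLsFiles(out string) ([]lsFilesEntry, error) {
	var entries []lsFilesEntry
	for _, rec := range strings.Split(out, "\x00") {
		if rec == "" {
			continue // trailing NUL produces an empty final record
		}
		// The metadata and the path are separated by the first tab.
		meta, path, ok := strings.Cut(rec, "\t")
		if !ok {
			return nil, fmt.Errorf("missing tab in record %q", rec)
		}
		fields := strings.Fields(meta)
		if len(fields) != 4 {
			return nil, fmt.Errorf("expected 4 fields, got %d in %q", len(fields), rec)
		}
		size := int64(-1)
		if fields[3] != "-" {
			n, err := strconv.ParseInt(fields[3], 10, 64)
			if err != nil {
				return nil, fmt.Errorf("bad size in %q: %w", rec, err)
			}
			size = n
		}
		entries = append(entries, lsFilesEntry{
			Mode: fields[0], Type: fields[1], OID: fields[2], Size: size, Path: path,
		})
	}
	return entries, nil
}

func main() {
	// Made-up sample: one blob inside the cone, one sparse directory outside it.
	sample := "100644 blob 0123456789abcdef0123456789abcdef01234567 127\tin/b.dat\x00" +
		"040000 tree 89abcdef0123456789abcdef0123456789abcdef -\tout/\x00"

	entries, err := parseLsFiles(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range entries {
		fmt.Printf("%s type=%s size=%d\n", e.Path, e.Type, e.Size)
	}
}
```

A caller that only cares about LFS pointer files could skip every entry whose type is not `blob`, so that sparse directory entries, if present, never reach the pointer scanner.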
t/t-checkout.sh

Lines changed: 65 additions & 0 deletions
@@ -280,3 +280,68 @@ begin_test "checkout: GIT_WORK_TREE"
   [ "$contents" = "$(cat "$reponame/file1.dat")" ]
 )
 end_test
+
+begin_test "checkout with partial clone and sparse checkout and index"
+(
+  set -e
+
+  # Only test with Git version 2.42.0 as it introduced support for the
+  # "objecttype" format option to the "git ls-files" command, which our
+  # code requires.
+  ensure_git_version_isnt "$VERSION_LOWER" "2.42.0"
+
+  reponame="checkout-partial-clone"
+  setup_remote_repo "$reponame"
+
+  clone_repo "$reponame" "$reponame"
+
+  git lfs track "*.dat"
+
+  contents1="a"
+  contents1_oid=$(calc_oid "$contents1")
+  contents2="b"
+  contents2_oid=$(calc_oid "$contents2")
+  contents3="c"
+  contents3_oid=$(calc_oid "$contents3")
+
+  mkdir in out
+  printf "%s" "$contents1" > a.dat
+  printf "%s" "$contents2" > in/b.dat
+  printf "%s" "$contents3" > out/c.dat
+  git add .
+  git commit -m "add files"
+
+  git push origin main
+
+  assert_server_object "$reponame" "$contents1_oid"
+  assert_server_object "$reponame" "$contents2_oid"
+  assert_server_object "$reponame" "$contents3_oid"
+
+  # Create a partial clone with a cone-mode sparse checkout of one directory
+  # and a sparse index, which is important because otherwise the "git ls-files"
+  # command ignores the --sparse option and lists all LFS files.
+  cd ..
+  git clone --filter=tree:0 --depth=1 --no-checkout \
+    "$GITSERVER/$reponame" partial
+
+  cd partial
+  git sparse-checkout init --cone --sparse-index
+  git sparse-checkout set in
+  GIT_LFS_SKIP_SMUDGE=1 git checkout main
+  git lfs fetch origin main
+
+  # This was downloaded by `git lfs fetch`.
+  delete_local_object "$contents3_oid"
+
+  assert_local_object "$contents1_oid" 1
+  assert_local_object "$contents2_oid" 1
+  refute_local_object "$contents3_oid" 1
+
+  git lfs checkout
+  test -f "out/c.dat" && exit 1
+
+  assert_local_object "$contents1_oid" 1
+  assert_local_object "$contents2_oid" 1
+  refute_local_object "$contents3_oid" 1
+)
+end_test
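
The assertions in this test depend on which paths a cone-mode sparse checkout includes: `a.dat` and `in/b.dat` are expected in the working tree, while `out/c.dat` is not. The following sketch models that rule as documented for cone mode (files directly under the root and under ancestors of a selected directory, plus everything recursively under the selected directory). `inCone` is a hypothetical helper for illustration only; neither Git nor Git LFS exposes a function by this name.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// inCone reports whether the repository-relative file path p would be
// included by a cone-mode sparse checkout of the given directories.
// It is a simplified model, not Git's actual pattern matcher.
func inCone(p string, dirs []string) bool {
	parent := path.Dir(p) // "." for files at the repository root
	if parent == "." {
		return true // root-level files are always included in cone mode
	}
	for _, d := range dirs {
		d = strings.Trim(d, "/")
		// Everything recursively under a selected directory is included.
		if parent == d || strings.HasPrefix(parent, d+"/") {
			return true
		}
		// Files directly inside an ancestor of a selected directory are included.
		if strings.HasPrefix(d, parent+"/") {
			return true
		}
	}
	return false
}

func main() {
	cone := []string{"in"} // matches "git sparse-checkout set in" in the test
	for _, p := range []string{"a.dat", "in/b.dat", "out/c.dat"} {
		fmt.Printf("%-10s in cone: %v\n", p, inCone(p, cone))
	}
}
```

With the cone set to `in`, the model includes `a.dat` and `in/b.dat` and excludes `out/c.dat`, which matches the objects the test expects to be present or absent locally.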

t/t-pull.sh

Lines changed: 64 additions & 0 deletions
@@ -392,3 +392,67 @@ begin_test "pull with empty file doesn't modify mtime"
   diff -u foo.mtime foo.mtime2
 )
 end_test
+
+begin_test "pull with partial clone and sparse checkout"
+(
+  set -e
+
+  # Only test with Git version 2.42.0 as it introduced support for the
+  # "objecttype" format option to the "git ls-files" command, which our
+  # code requires.
+  ensure_git_version_isnt "$VERSION_LOWER" "2.42.0"
+
+  reponame="pull-sparse"
+  setup_remote_repo "$reponame"
+
+  clone_repo "$reponame" "$reponame"
+
+  git lfs track "*.dat"
+
+  contents1="a"
+  contents1_oid=$(calc_oid "$contents1")
+  contents2="b"
+  contents2_oid=$(calc_oid "$contents2")
+  contents3="c"
+  contents3_oid=$(calc_oid "$contents3")
+
+  mkdir in out
+  printf "%s" "$contents1" > a.dat
+  printf "%s" "$contents2" > in/b.dat
+  printf "%s" "$contents3" > out/c.dat
+  git add .
+  git commit -m "add files"
+
+  git push origin main
+
+  assert_server_object "$reponame" "$contents1_oid"
+  assert_server_object "$reponame" "$contents2_oid"
+  assert_server_object "$reponame" "$contents3_oid"
+
+  # Create a partial clone with a cone-mode sparse checkout of one directory
+  # and a sparse index, which is important because otherwise the "git ls-files"
+  # command ignores the --sparse option and lists all Git LFS files.
+  cd ..
+  git clone --filter=tree:0 --depth=1 --no-checkout \
+    "$GITSERVER/$reponame" "${reponame}-partial"
+
+  cd "${reponame}-partial"
+  git sparse-checkout init --cone --sparse-index
+  git sparse-checkout set in
+  git checkout main
+
+  [ -d "in" ]
+  [ ! -e "out" ]
+
+  assert_local_object "$contents1_oid" 1
+  assert_local_object "$contents2_oid" 1
+  refute_local_object "$contents3_oid"
+
+  git lfs pull 2>&1 | tee pull.log
+  grep -q "Downloading LFS objects" pull.log && exit 1
+
+  # Git LFS objects associated with files outside of the sparse cone
+  # should not have been pulled.
+  refute_local_object "$contents3_oid"
+)
+end_test
