aboutsummaryrefslogtreecommitdiff
path: root/t/t1450-fsck.sh
diff options
context:
space:
mode:
authorJeff King <peff@peff.net>2017-01-17 16:32:57 -0500
committerJunio C Hamano <gitster@pobox.com>2017-01-17 14:23:20 -0800
commit3e3f8bd608e6682a2ad4f6cef52ed8ec45b8ad59 (patch)
treea48bf3463fbac800b2defe8292c1a360603e4595 /t/t1450-fsck.sh
parentb4584e4f665f59f51572f479db6baf1a1cdbc03a (diff)
downloadgit-3e3f8bd608e6682a2ad4f6cef52ed8ec45b8ad59.tar.gz
git-3e3f8bd608e6682a2ad4f6cef52ed8ec45b8ad59.tar.xz
fsck: prepare dummy objects for --connectivity-check
Normally fsck makes a pass over all objects to check their integrity, and then follows up with a reachability check to make sure we have all of the referenced objects (and to know which ones are dangling). The latter checks for the HAS_OBJ flag in obj->flags to see if we found the object in the first pass. Commit 02976bf85 (fsck: introduce `git fsck --connectivity-only`, 2015-06-22) taught fsck to skip the initial pass, and to fallback to has_sha1_file() instead of the HAS_OBJ check. However, it converted only one HAS_OBJ check to use has_sha1_file(). But there are many other places in builtin/fsck.c that assume that the flag is set (or that lookup_object() will return an object at all). This leads to several bugs with --connectivity-only: 1. mark_object() will not queue objects for examination, so recursively following links from commits to trees, etc, did nothing. I.e., we were checking the reachability of hardly anything at all. 2. When a set of heads is given on the command-line, we use lookup_object() to see if they exist. But without the initial pass, we assume nothing exists. 3. When loading reflog entries, we do a similar lookup_object() check, and complain that the reflog is broken if the object doesn't exist in our hash. So in short, --connectivity-only is broken pretty badly, and will claim that your repository is fine when it's not. Presumably nobody noticed for a few reasons. One is that the embedded test does not actually test the recursive nature of the reachability check. All of the missing objects are still in the index, and we directly check items from the index. This patch modifies the test to delete the index, which shows off breakage (1). Another is that --connectivity-only just skips the initial pass for loose objects. So on a real repository, the packed objects were still checked correctly. But on the flipside, it means that "git fsck --connectivity-only" still checks the sha1 of all of the packed objects, nullifying its original purpose of being a faster git-fsck. And of course the final problem is that the bug only shows up when there _is_ corruption, which is rare. So anybody running "git fsck --connectivity-only" proactively would assume it was being thorough, when it was not. One possibility for fixing this is to find all of the spots that rely on HAS_OBJ and tweak them for the connectivity-only case. But besides the risk that we might miss a spot (and I found three already, corresponding to the three bugs above), there are other parts of fsck that _can't_ work without a full list of objects. E.g., the list of dangling objects. Instead, let's make the connectivity-only case look more like the normal case. Rather than skip the initial pass completely, we'll do an abbreviated one that sets up the HAS_OBJ flag for each object, without actually loading the object data. That's simple and fast, and we don't have to care about the connectivity_only flag in the rest of the code at all. While we're at it, let's make sure we treat loose and packed objects the same (i.e., setting up dummy objects for both and skipping the actual sha1 check). That makes the connectivity-only check actually fast on a real repo (40 seconds versus 180 seconds on my copy of linux.git). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 't/t1450-fsck.sh')
-rwxr-xr-xt/t1450-fsck.sh29
1 files changed, 27 insertions, 2 deletions
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index e88ec7747..4d1c3ba66 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -523,9 +523,21 @@ test_expect_success 'fsck --connectivity-only' '
touch empty &&
git add empty &&
test_commit empty &&
+
+ # Drop the index now; we want to be sure that we
+ # recursively notice the broken objects
+ # because they are reachable from refs, not because
+ # they are in the index.
+ rm -f .git/index &&
+
+ # corrupt the blob, but in a way that we can still identify
+ # its type. That lets us see that --connectivity-only is
+ # not actually looking at the contents, but leaves it
+ # free to examine the type if it chooses.
empty=.git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 &&
- rm -f $empty &&
- echo invalid >$empty &&
+ blob=$(echo unrelated | git hash-object -w --stdin) &&
+ mv $(sha1_file $blob) $empty &&
+
test_must_fail git fsck --strict &&
git fsck --strict --connectivity-only &&
tree=$(git rev-parse HEAD:) &&
@@ -537,6 +549,19 @@ test_expect_success 'fsck --connectivity-only' '
)
'
+test_expect_success 'fsck --connectivity-only with explicit head' '
+ rm -rf connectivity-only &&
+ git init connectivity-only &&
+ (
+ cd connectivity-only &&
+ test_commit foo &&
+ rm -f .git/index &&
+ tree=$(git rev-parse HEAD^{tree}) &&
+ remove_object $(git rev-parse HEAD:foo.t) &&
+ test_must_fail git fsck --connectivity-only $tree
+ )
+'
+
remove_loose_object () {
sha1="$(git rev-parse "$1")" &&
remainder=${sha1#??} &&