aboutsummaryrefslogtreecommitdiff
path: root/sha1_file.c
Commit message (Collapse)AuthorAge
* Merge branch 'jc/shared-literally'Junio C Hamano2009-04-06
|\ | | | | | | | | | | | | | | | | * jc/shared-literally: t1301: loosen test for forced modes set_shared_perm(): sometimes we know what the final mode bits should look like move_temp_to_file(): do not forget to chmod() in "Coda hack" codepath Move chmod(foo, 0444) into move_temp_to_file() "core.sharedrepository = 0mode" should set, not loosen
| * set_shared_perm(): sometimes we know what the final mode bits should look likeJunio C Hamano2009-03-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | adjust_shared_perm() first obtains the mode bits from lstat(2), expecting to find what the result of applying user's umask is, and then tweaks it as necessary. When the file to be adjusted is created with mkstemp(3), however, the mode thusly obtained does not have anything to do with user's umask, and we would need to start from 0444 in such a case and there is no point running lstat(2) for such a path. This introduces a new API set_shared_perm() to bypass the lstat(2) and instead force setting the mode bits to the desired value directly. adjust_shared_perm() becomes a thin wrapper to the function. Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * move_temp_to_file(): do not forget to chmod() in "Coda hack" codepathJunio C Hamano2009-03-28
| | | | | | | | | | | | | | | | Now move_temp_to_file() is responsible for doing everything that is necessary to turn a tempfile in $GIT_DIR into its final form, it must make sure "Coda hack" codepath correctly makes the file read-only. Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * Move chmod(foo, 0444) into move_temp_to_file()Johan Herland2009-03-27
| | | | | | | | | | | | | | | | | | | | | | | | When writing out a loose object or a pack (index), move_temp_to_file() is called to finalize the resulting file. These files (loose files and packs) should all have permission mode 0444 (modulo adjust_shared_perm()). Therefore, instead of doing chmod(foo, 0444) explicitly from each callsite (or even forgetting to chmod() at all), do the chmod() call from within move_temp_to_file(). Signed-off-by: Johan Herland <johan@herland.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * "core.sharedrepository = 0mode" should set, not loosenJunio C Hamano2009-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes the behaviour of octal notation to how it is defined in the documentation, while keeping the traditional "loosen only" semantics intact for "group" and "everybody". Three main points of this patch are: - For an explicit octal notation, the internal shared_repository variable is set to a negative value, so that we can tell "group" (which is to "OR" in 0660) and 0660 (which is to "SET" to 0660); - git-init did not set shared_repository variable early enough to affect the initial creation of many files, notably copied templates and the configuration. We set it very early when a command-line option specifies a custom value. - Many codepaths create files inside $GIT_DIR by various ways that all involve mkstemp(), and then call move_temp_to_file() to rename it to its final destination. We can add adjust_shared_perm() call here; for the traditional "loosen-only", this would be a no-op for many codepaths because the mode is already loose enough, but with the new behaviour it makes a difference. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Merge branch 'jc/maint-1.6.0-keep-pack'Junio C Hamano2009-04-01
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * jc/maint-1.6.0-keep-pack: pack-objects: don't loosen objects available in alternate or kept packs t7700: demonstrate repack flaw which may loosen objects unnecessarily Remove --kept-pack-only option and associated infrastructure pack-objects: only repack or loosen objects residing in "local" packs git-repack.sh: don't use --kept-pack-only option to pack-objects t7700-repack: add two new tests demonstrating repacking flaws Conflicts: t/t7700-repack.sh
| * | Remove --kept-pack-only option and associated infrastructureBrandon Casey2009-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This option to pack-objects/rev-list was created to improve the -A and -a options of repack. It was found to be lacking in that it did not provide the ability to differentiate between local and non-local kept packs, and found to be unnecessary since objects residing in local kept packs can be filtered out by the --honor-pack-keep option. Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | Merge branch 'maint'Junio C Hamano2009-03-24
|\ \ \ | | |/ | |/| | | | | | | | | | | | | * maint: Increase the size of the die/warning buffer to avoid truncation close_sha1_file(): make it easier to diagnose errors avoid possible overflow in delta size filtering computation
| * | Merge branch 'maint-1.6.1' into maintJunio C Hamano2009-03-24
| |\ \ | | | | | | | | | | | | | | | | | | | | * maint-1.6.1: close_sha1_file(): make it easier to diagnose errors avoid possible overflow in delta size filtering computation
| | * \ Merge branch 'maint-1.6.0' into maint-1.6.1Junio C Hamano2009-03-24
| | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | * maint-1.6.0: close_sha1_file(): make it easier to diagnose errors avoid possible overflow in delta size filtering computation
| | | * | close_sha1_file(): make it easier to diagnose errorsLinus Torvalds2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A bug report with "unable to write sha1 file" made us realize that we do not have enough information to guess why close() is failing. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | | | Merge branch 'jc/maint-1.6.0-keep-pack'Junio C Hamano2009-03-11
|\ \ \ \ \ | |/ / / / |/| | | / | | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | * jc/maint-1.6.0-keep-pack: is_kept_pack(): final clean-up Simplify is_kept_pack() Consolidate ignore_packed logic more has_sha1_kept_pack(): take "struct rev_info" has_sha1_pack(): refactor "pretend these packs do not exist" interface git-repack: resist stray environment variable
| * | | is_kept_pack(): final clean-upJunio C Hamano2009-02-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now is_kept_pack() is just a member lookup into a structure, we can write it as such. Also rewrite the sole caller of has_sha1_kept_pack() to switch on the criteria the callee uses (namely, revs->kept_pack_only) between calling has_sha1_kept_pack() and has_sha1_pack(), so that these two callees do not have to take a pointer to struct rev_info as an argument. This removes the header file dependency issue temporarily introduced by the earlier commit, so we revert changes associated to that as well. Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | Simplify is_kept_pack()Junio C Hamano2009-02-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This removes --unpacked=<packfile> parameter from the revision parser, and rewrites its use in git-repack to pass a single --kept-pack-only option instead. The new --kept-pack-only option means just that. When this option is given, is_kept_pack() that used to say "not on the --unpacked=<packfile> list" now says "the packfile has corresponding .keep file". Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | Consolidate ignore_packed logic moreJunio C Hamano2009-02-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This refactors three loops that check if a given packfile is on the ignore_packed list into a function is_kept_pack(). The function returns false for a pack on the list, and true for a pack not on the list, because this list is solely used by "git repack" to pass list of packfiles that do not have corresponding .keep files, i.e. a packfile not on the list is "kept". Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | has_sha1_kept_pack(): take "struct rev_info"Junio C Hamano2009-02-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Its "ignore_packed" parameter always comes from struct rev_info. This patch makes the function take a pointer to the surrounding structure, so that the refactoring in the next patch becomes easier to review. There is an unfortunate header file dependency and the easiest workaround is to temporarily move the function declaration from cache.h to revision.h; this will be moved back to cache.h once the function loses this "ignore_packed" parameter altogether in the later part of the series. Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | has_sha1_pack(): refactor "pretend these packs do not exist" interfaceJunio C Hamano2009-02-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of the callers of this function except only one pass NULL to its last parameter, ignore_packed. Introduce has_sha1_kept_pack() function that has the function signature and the semantics of this function, and convert the sole caller that does not pass NULL to call this new function. All other callers and has_sha1_pack() lose the ignore_packed parameter. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | | sha1_file.c: fix typoFelipe Contreras2009-02-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | it's != its Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | | Merge branch 'maint'Junio C Hamano2009-02-11
|\ \ \ \ | | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * maint: Prepare for 1.6.1.4. Make repack less likely to corrupt repository fast-export: ensure we traverse commits in topological order Clear the delta base cache if a pack is rebuilt Conflicts: RelNotes
| * | | Merge branch 'maint-1.6.0' into maintJunio C Hamano2009-02-11
| |\ \ \ | | | |/ | | |/| | | | | | | | | | | | | | | | | * maint-1.6.0: Make repack less likely to corrupt repository fast-export: ensure we traverse commits in topological order Clear the delta base cache if a pack is rebuilt
| | * | Clear the delta base cache if a pack is rebuiltShawn O. Pearce2009-02-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is some risk that re-opening a regenerated pack file with different offsets could leave stale entries within the delta base cache that could be matched up against other objects using the same "struct packed_git*" and pack offset. Throwing away the entire delta base cache in this case is safer, as we don't have to worry about a recycled "struct packed_git*" matching to the wrong base object, resulting in delta apply errors while unpacking an object. Suggested-by: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | | Merge branch 'maint'Junio C Hamano2009-02-10
|\ \ \ \ | |/ / / | | | | | | | | | | | | * maint: Clear the delta base cache during fast-import checkpoint
| * | | Merge branch 'maint-1.6.0' into maintJunio C Hamano2009-02-10
| |\ \ \ | | |/ / | | | | | | | | | | | | * maint-1.6.0: Clear the delta base cache during fast-import checkpoint
| | * | Clear the delta base cache during fast-import checkpointShawn O. Pearce2009-02-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise we may reuse the same memory address for a totally different "struct packed_git", and a previously cached object from the prior occupant might be returned when trying to unpack an object from the new pack. Found-by: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | Merge branch 'lt/maint-wrap-zlib' into maintJunio C Hamano2009-02-05
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * lt/maint-wrap-zlib: Wrap inflate and other zlib routines for better error reporting Conflicts: http-push.c http-walker.c sha1_file.c
* | \ \ \ Sync with 1.6.1.2Junio C Hamano2009-01-29
|\ \ \ \ \ | |/ / / / | | | | | | | | | | Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | Merge branch 'maint-1.6.0' into maintJunio C Hamano2009-01-28
| |\ \ \ \ | | | |/ / | | |/| | | | | | | | | | | | * maint-1.6.0: avoid 31-bit truncation in write_loose_object
| | * | | avoid 31-bit truncation in write_loose_objectJeff King2009-01-28
| | | |/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The size of the content we are adding may be larger than 2.1G (i.e., "git add gigantic-file"). Most of the code-path to do so uses size_t or unsigned long to record the size, but write_loose_object uses a signed int. On platforms where "int" is 32-bits (which includes x86_64 Linux platforms), we end up passing malloc a negative size. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | | Merge branch 'lt/maint-wrap-zlib'Junio C Hamano2009-01-21
|\ \ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * lt/maint-wrap-zlib: Wrap inflate and other zlib routines for better error reporting Conflicts: http-push.c http-walker.c sha1_file.c
| * | | Wrap inflate and other zlib routines for better error reportingLinus Torvalds2009-01-11
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | R. Tyler Ballance reported a mysterious transient repository corruption; after much digging, it turns out that we were not catching and reporting memory allocation errors from some calls we make to zlib. This one _just_ wraps things; it doesn't do the "retry on low memory error" part, at least not yet. It is an independent issue from the reporting. Some of the errors are expected and passed back to the caller, but we die when zlib reports it failed to allocate memory for now. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | sha1_file: make "read_object" staticChristian Couder2009-01-13
| |/ |/| | | | | | | | | | | | | | | | | | | This function is only used from "sha1_file.c". And as we want to add a "replace_object" hook in "read_sha1_file", we must not let people bypass the hook using something other than "read_sha1_file". Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Make 'index_path()' use 'strbuf_readlink()'Linus Torvalds2008-12-17
| | | | | | | | | | | | | | | | This makes us able to properly index symlinks even on filesystems where st_size doesn't match the true size of the link. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Merge branch 'maint'Junio C Hamano2008-12-11
|\ \ | |/ | | | | | | | | * maint: fsck: reduce stack footprint make sure packs to be replaced are closed beforehand
| * make sure packs to be replaced are closed beforehandNicolas Pitre2008-12-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Especially on Windows where an opened file cannot be replaced, make sure pack-objects always close packs it is about to replace. Even on non Windows systems, this could save potential bad results if ever objects were to be read from the new pack file using offset from the old index. This should fix t5303 on Windows. Signed-off-by: Nicolas Pitre <nico@cam.org> Tested-by: Johannes Sixt <j6t@kdbg.org> (MinGW) Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * Merge branch 'bc/maint-keep-pack' into maintJunio C Hamano2008-12-02
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * bc/maint-keep-pack: repack: only unpack-unreachable if we are deleting redundant packs t7700: test that 'repack -a' packs alternate packed objects pack-objects: extend --local to mean ignore non-local loose objects too sha1_file.c: split has_loose_object() into local and non-local counterparts t7700: demonstrate mishandling of loose objects in an alternate ODB builtin-gc.c: use new pack_keep bitfield to detect .keep file existence repack: do not fall back to incremental repacking with [-a|-A] repack: don't repack local objects in packs with .keep file pack-objects: new option --honor-pack-keep packed_git: convert pack_local flag into a bitfield and add pack_keep t7700: demonstrate mishandling of objects in packs with a .keep file
* | \ Merge branch 'maint'Junio C Hamano2008-11-27
|\ \ \ | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * maint: sha1_file.c: resolve confusion EACCES vs EPERM sha1_file: avoid bogus "file exists" error message git checkout: don't warn about unborn branch if -f is already passed bash: offer refs instead of filenames for 'git revert' bash: remove dashed command leftovers git-p4: fix keyword-expansion regex fast-export: use an unsorted string list for extra_refs Add new testcase to show fast-export does not always exports all tags
| * | sha1_file.c: resolve confusion EACCES vs EPERMSam Vilain2008-11-27
| | | | | | | | | | | | | | | | | | | | | | | | An earlier commit 916d081 (Nicer error messages in case saving an object to db goes wrong, 2006-11-09) confused EACCES with EPERM, the latter of which is an unlikely error from mkstemp(). Signed-off-by: Sam Vilain <sam@vilain.net>
| * | sha1_file: avoid bogus "file exists" error messageJoey Hess2008-11-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This avoids the following misleading error message: error: unable to create temporary sha1 filename ./objects/15: File exists mkstemp can fail for many reasons, one of which, ENOENT, can occur if the directory for the temp file doesn't exist. create_tmpfile tried to handle this case by always trying to mkdir the directory, even if it already existed. This caused errno to be clobbered, so one cannot tell why mkstemp really failed, and it truncated the buffer to just the directory name, resulting in the strange error message shown above. Note that in both occasions that I've seen this failure, it has not been due to a missing directory, or bad permissions, but some other, unknown mkstemp failure mode that did not occur when I ran git again. This code could perhaps be made more robust by retrying mkstemp, in case it was a transient failure. Signed-off-by: Joey Hess <joey@kitenet.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | sha1_file: avoid bogus "file exists" error messageJoey Hess2008-11-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This avoids the following misleading error message: error: unable to create temporary sha1 filename ./objects/15: File exists mkstemp can fail for many reasons, one of which, ENOENT, can occur if the directory for the temp file doesn't exist. create_tmpfile tried to handle this case by always trying to mkdir the directory, even if it already existed. This caused errno to be clobbered, so one cannot tell why mkstemp really failed, and it truncated the buffer to just the directory name, resulting in the strange error message shown above. Note that in both occasions that I've seen this failure, it has not been due to a missing directory, or bad permissions, but some other, unknown mkstemp failure mode that did not occur when I ran git again. This code could perhaps be made more robust by retrying mkstemp, in case it was a transient failure. Signed-off-by: Joey Hess <joey@kitenet.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | Fix handle leak in sha1_file/unpack_objects if there were damaged object dataAlex Riesen2008-11-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | In the case of bad packed object CRC, unuse_pack wasn't called after check_pack_crc which calls use_pack. Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | Merge branch 'jk/commit-v-strip'Junio C Hamano2008-11-16
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | * jk/commit-v-strip: status: show "-v" diff even for initial commit wt-status: refactor initial commit printing define empty tree sha1 as a macro
| * | | define empty tree sha1 as a macroJeff King2008-11-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This can potentially be used in a few places, so let's make it available to all parts of the code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | | Merge branch 'np/pack-safer'Junio C Hamano2008-11-12
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * np/pack-safer: t5303: fix printf format string for portability t5303: work around printf breakage in dash pack-objects: don't leak pack window reference when splitting packs extend test coverage for latest pack corruption resilience improvements pack-objects: allow "fixing" a corrupted pack without a full repack make find_pack_revindex() aware of the nasty world make check_object() resilient to pack corruptions make packed_object_info() resilient to pack corruptions make unpack_object_header() non fatal better validation on delta base object offsets close another possibility for propagating pack corruption
| * | | | make find_pack_revindex() aware of the nasty worldNicolas Pitre2008-11-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It currently calls die() whenever given offset is not found thinking that such thing should never happen. But this offset may come from a corrupted pack whych _could_ happen and not be found. Callers should deal with this possibility gracefully instead. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | make packed_object_info() resilient to pack corruptionsNicolas Pitre2008-11-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the same spirit as commit 8eca0b47ff, let's try to survive a pack corruption by making packed_object_info() able to fall back to alternate packs or loose objects. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | make unpack_object_header() non fatalNicolas Pitre2008-11-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is possible to have pack corruption in the object header. Currently unpack_object_header() simply die() on them instead of letting the caller deal with that gracefully. So let's have unpack_object_header() return an error instead, and find a better name for unpack_object_header_gently() in that context. All callers of unpack_object_header() are ready for it. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | better validation on delta base object offsetsNicolas Pitre2008-11-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In one case, it was possible to have a bad offset equal to 0 effectively pointing a delta onto itself and crashing git after too many recursions. In the other cases, a negative offset could result due to off_t being signed. Catch those. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * | | | close another possibility for propagating pack corruptionNicolas Pitre2008-11-02
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Abstract -------- With index v2 we have a per object CRC to allow quick and safe reuse of pack data when repacking. This, however, doesn't currently prevent a stealth corruption from being propagated into a new pack when _not_ reusing pack data as demonstrated by the modification to t5302 included here. The Context ----------- The Git database is all checksummed with SHA1 hashes. Any kind of corruption can be confirmed by verifying this per object hash against corresponding data. However this can be costly to perform systematically and therefore this check is often not performed at run time when accessing the object database. First, the loose object format is entirely compressed with zlib which already provide a CRC verification of its own when inflating data. Any disk corruption would be caught already in this case. Then, packed objects are also compressed with zlib but only for their actual payload. The object headers and delta base references are not deflated for obvious performance reasons, however this leave them vulnerable to potentially undetected disk corruptions. Object types are often validated against the expected type when they're requested, and deflated size must always match the size recorded in the object header, so those cases are pretty much covered as well. Where corruptions could go unnoticed is in the delta base reference. Of course, in the OBJ_REF_DELTA case, the odds for a SHA1 reference to get corrupted so it actually matches the SHA1 of another object with the same size (the delta header stores the expected size of the base object to apply against) are virtually zero. In the OBJ_OFS_DELTA case, the reference is a pack offset which would have to match the start boundary of a different base object but still with the same size, and although this is relatively much more "probable" than in the OBJ_REF_DELTA case, the probability is also about zero in absolute terms. Still, the possibility exists as demonstrated in t5302 and is certainly greater than a SHA1 collision, especially in the OBJ_OFS_DELTA case which is now the default when repacking. Again, repacking by reusing existing pack data is OK since the per object CRC provided by index v2 guards against any such corruptions. What t5302 failed to test is a full repack in such case. The Solution ------------ As unlikely as this kind of stealth corruption can be in practice, it certainly isn't acceptable to propagate it into a freshly created pack. But, because this is so unlikely, we don't want to pay the run time cost associated with extra validation checks all the time either. Furthermore, consequences of such corruption in anything but repacking should be rather visible, and even if it could be quite unpleasant, it still has far less severe consequences than actively creating bad packs. So the best compromize is to check packed object CRC when unpacking objects, and only during the compression/writing phase of a repack, and only when not streaming the result. The cost of this is minimal (less than 1% CPU time), and visible only with a full repack. Someone with a stats background could provide an objective evaluation of this, but I suspect that it's bad RAM that has more potential for data corruptions at this point, even in those cases where this extra check is not performed. Still, it is best to prevent a known hole for corruption when recreating object data into a new pack. What about the streamed pack case? Well, any client receiving a pack must always consider that pack as untrusty and perform full validation anyway, hence no such stealth corruption could be propagated to remote repositoryes already. It is therefore worthless doing local validation in that case. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | | Merge branch 'bc/maint-keep-pack'Junio C Hamano2008-11-12
|\ \ \ \ | |/ / / |/| | / | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | * bc/maint-keep-pack: t7700: test that 'repack -a' packs alternate packed objects pack-objects: extend --local to mean ignore non-local loose objects too sha1_file.c: split has_loose_object() into local and non-local counterparts t7700: demonstrate mishandling of loose objects in an alternate ODB builtin-gc.c: use new pack_keep bitfield to detect .keep file existence repack: do not fall back to incremental repacking with [-a|-A] repack: don't repack local objects in packs with .keep file pack-objects: new option --honor-pack-keep packed_git: convert pack_local flag into a bitfield and add pack_keep t7700: demonstrate mishandling of objects in packs with a .keep file
| * | sha1_file.c: split has_loose_object() into local and non-local counterpartsBrandon Casey2008-11-12
| | | | | | | | | | | | | | | Signed-off-by: Brandon Casey <drafnel@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>