aboutsummaryrefslogtreecommitdiff
path: root/diff.c
Commit message (Collapse)AuthorAge
* Move diff.renamelimit out of default configuration.Junio C Hamano2005-11-21
| | | | | | | Otherwise we would end up linking all the unneeded stuff into git-daemon only to link with git_default_config. Signed-off-by: Junio C Hamano <junkio@cox.net>
* rename/copy score parsing updates.H. Peter Anvin2005-11-21
| | | | | | | | | | | | | | | | Better variant, which handles stuff like "4.5%" and rejects "192.168.0.1". Additionally, make sure numbers are unsigned (I'm making them unsigned long just for the hell of it), to make sure that artificial wraparound scenarios don't cause harm. -hpa [jc: with this, -M100 changes its meaning back to 10%. People wanting to say "pure renames only" should now say -M100% or -M1.0; sounds a bit like an earthquake, but arguably things are more consistent this way ;-)] Signed-off-by: Junio C Hamano <junkio@cox.net>
* rename detection with -M100 means "exact renames only".Junio C Hamano2005-11-21
| | | | | | | | | | | When the user is interested in pure renames, there is no point doing the similarity scores. This changes the score argument parsing to special case -M100 (otherwise, it is a precision scaled value 0 <= v < 1 and would mean 0.1, not 1.0 --- if you do mean 0.1, you can say -M1), and optimizes the diffcore_rename transformation to only look at pure renames in that case. Signed-off-by: Junio C Hamano <junkio@cox.net>
* diff: --full-indexJunio C Hamano2005-11-16
| | | | | | | | | A new option, --full-index, is introduced to diff family. This causes the full object name of pre- and post-images to appear on the index line of patch formatted output, to be used in conjunction with --allow-binary-replacement option of git-apply. Signed-off-by: Junio C Hamano <junkio@cox.net>
* diff: make default rename detection limit configurable.Junio C Hamano2005-11-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A while ago, a rename-detection limit logic was implemented as a response to this thread: http://marc.theaimsgroup.com/?l=git&m=112413080630175 where gitweb was found to be using a lot of time and memory to detect renames on huge commits. git-diff family takes -l<num> flag, and if the number of paths that are rename destination candidates (i.e. new paths with -M, or modified paths with -C) are larger than that number, skips rename/copy detection even when -M or -C is specified on the command line. This commit makes the rename detection limit easier to use. You can have: [diff] renamelimit = 30 in your .git/config file to specify the default rename detection limit. You can override this from the command line; giving 0 means 'unlimited': git diff -M -l0 We might want to change the default behaviour, when you do not have the configuration, to limit it to say 20 paths or so. This would also help the diffstat generation after a big 'git pull'. Signed-off-by: Junio C Hamano <junkio@cox.net>
* Split up tree diff functions into tree-diff.c libraryLinus Torvalds2005-10-22
| | | | | | | | | | | | | | | | | | | | | | This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Handle "-" at beginning of filenames, part 3Linus Torvalds2005-10-18
| | | | | | | | | This fixes the default built-in exec() of "diff" to add a "--" before the filenames, so that if a filename starts with a "-", the diff program won't think it's an option. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Update git-diff-* to use C-style quoting for funny pathnames.Junio C Hamano2005-10-17
| | | | Signed-off-by: Junio C Hamano <junkio@cox.net>
* Show original and resulting blob object info in diff output.Junio C Hamano2005-10-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds more cruft to diff --git header to record the blob SHA1 and the mode the patch/diff is intended to be applied against, to help the receiving end fall back on a three-way merge. The new header looks like this: diff --git a/apply.c b/apply.c index 7be5041..8366082 100644 --- a/apply.c +++ b/apply.c @@ -14,6 +14,7 @@ // files that are being modified, but doesn't apply the patch // --stat does just a diffstat, and doesn't actually apply +// --show-index-info shows the old and new index info for... ... Upon receiving such a patch, if the patch did not apply cleanly to the target tree, the recipient can try to find the matching old objects in her object database and create a temporary tree, apply the patch to that temporary tree, and attempt a 3-way merge between the patched temporary tree and the target tree using the original temporary tree as the common ancestor. The patch lifts the code to compute the hash for an on-filesystem object from update-index.c and makes it available to the diff output routine. Signed-off-by: Junio C Hamano <junkio@cox.net>
* Consolidate null_sha1[].Junio C Hamano2005-09-30
| | | | Signed-off-by: Junio C Hamano <junio@twinsun.com>
* Diff: --name-status output format.Junio C Hamano2005-09-24
| | | | | | The new output format shows only the status letter and paths. Signed-off-by: Junio C Hamano <junkio@cox.net>
* Diff: -l<num> to limit rename/copy detection.Junio C Hamano2005-09-24
| | | | | | | | When many paths are modified, rename detection takes a lot of time. The new option -l<num> can be used to disable rename detection when more than <num> paths are possibly created as renames. Signed-off-by: Junio C Hamano <junkio@cox.net>
* Diff clean-up.Junio C Hamano2005-09-24
| | | | | | | This is a long overdue clean-up to the code for parsing and passing diff options. It also tightens some constness issues. Signed-off-by: Junio C Hamano <junkio@cox.net>
* Retire diff-helper.Junio C Hamano2005-09-22
| | | | | | | The textual diff generation with built-in '-p' in diff-* brothers has proven to be useful enough that git-diff-helper outlived its usefulness. Signed-off-by: Junio C Hamano <junkio@cox.net>
* Plug diff leaks.Junio C Hamano2005-09-15
| | | | | | | | | | | | | | | | It is a bit embarrassing that it took this long for a fix since the problem was first reported on Aug 13th. Message-ID: <87y876gl1r.wl@mail2.atmark-techno.com> From: Yasushi SHOJI <yashi@atmark-techno.com> Newsgroups: gmane.comp.version-control.git Subject: [patch] possible memory leak in diff.c::diff_free_filepair() Date: Sat, 13 Aug 2005 19:58:56 +0900 This time I used valgrind to make sure that it does not overeagerly discard memory that is still being used. Signed-off-by: Junio C Hamano <junkio@cox.net>
* Revert "[PATCH] plug memory leak in diff.c::diff_free_filepair()"Junio C Hamano2005-09-14
| | | | This reverts 068eac91ce04b9aca163acb1927c3878c45d1a07 commit.
* [PATCH] Fix alloc_filespec() initializationLinus Torvalds2005-09-14
| | | | | | | | | | This simplifies and fixes the initialization of a "diff_filespec" when allocated. The old code would not initialize "sha1_valid". Noticed by valgrind. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Fix copy marking from diffcore-rename.Junio C Hamano2005-09-10
| | | | | | | | | | | When (A,B) ==> (B,C) rename-copy was detected, we incorrectly said that C was created by copying B. This is because we only check if the path of rename/copy source still exists in the resulting tree to see if the file is renamed out of existence. In this case, the new B is created by copying or renaming A, so the original B is lost and we should say C is a rename of B not a copy of B. Signed-off-by: Junio C Hamano <junkio@cox.net>
* Retire support for old environment variables.Junio C Hamano2005-09-09
| | | | | | | | | | We have deprecated the old environment variable names for quite a while and now it's time to remove them. Gone are: SHA1_FILE_DIRECTORIES AUTHOR_DATE AUTHOR_EMAIL AUTHOR_NAME COMMIT_AUTHOR_EMAIL COMMIT_AUTHOR_NAME SHA1_FILE_DIRECTORY Signed-off-by: Junio C Hamano <junkio@cox.net>
* Fix compilation warnings.Junio C Hamano2005-08-29
| | | | | | ... found by compiling them with gcc 2.95. Signed-off-by: Junio C Hamano <junkio@cox.net>
* Fix ?: statements.Jason Riedy2005-08-23
| | | | | | | | | Omitting the first branch in ?: is a GNU extension. Cute, but not supported by other compilers. Replaced mostly by explicit tests. Calls to getenv() simply are repeated on non-GNU compilers. Signed-off-by: Jason Riedy <ejr@cs.berkeley.edu>
* [PATCH] possible memory leak in diff.c::diff_free_filepair()Yasushi SHOJI2005-08-21
| | | | Here is a patch to fix the problem in the simplest way.
* [PATCH] plug memory leak in diff.c::diff_free_filepair()Yasushi SHOJI2005-08-13
| | | | | | | | | | | | | | | When I run git-diff-tree on big change, it seems the command eats so much memory. so I just put git under valgrind to see what's going on. diff_free_filespec_data() doesn't free diff_filespec itself. [jc: I ended up doing things slightly differently from Yasushi's patch. The original idea was to use free_filespec_data() only to free the data portion and keep useing the filespec itself, but no existing code seems to do things that way, so I just yanked that part out.] Signed-off-by: Yasushi SHOJI <yashi@atmark-techno.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
* A bit more format warning squelching.Junio C Hamano2005-08-09
| | | | | | | | Inspired by patch from Timo Sirainen. Most of them are not strictly necessary but making warnings less chatty would help spot real bugs later. Signed-off-by: Junio C Hamano <junkio@cox.net>
* [PATCH] git: use git_mkstemp() instead of mkstemp() for diff generation.Holger Eitzenberger2005-08-05
| | | | | | | This lets you run git diff in a repository otherwise read-only to you. Signed-off-by: Junio C Hamano <junkio@cox.net>
* [PATCH] mmap error handlingPavel Roskin2005-07-29
| | | | | | | | | | | | | | | | | | I have reviewed all occurrences of mmap() in git and fixed three types of errors/defects: 1) The result is not checked. 2) The file descriptor is closed if mmap() succeeds, but not when it fails. 3) Various casts applied to -1 are used instead of MAP_FAILED, which is specifically defined to check mmap() return value. [jc: This is a second round of Pavel's patch. He fixed up the problem that close() potentially clobbering the errno from mmap, which the first round had.] Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Use symbolic constants for diff-raw status indicators.Junio C Hamano2005-07-25
| | | | | | | | | Both Cogito and StGIT prefer to see 'A' for new files. The current 'N' is visually harder to distinguish from 'M', which is used for modified files. Prepare the internals to use symbolic constants to make the change easier. Signed-off-by: Junio C Hamano <junkio@cox.net>
* Split up "diff_format" into "format" and "line_termination".Linus Torvalds2005-07-14
| | | | | | | | | | | This removes the separate "formats" for name and name-with-zero- termination. It also removes the difference between HUMAN and MACHINE formats, and they both become DIFF_FORMAT_RAW, with the difference being just in the line and inter-filename termination. It also makes the code easier to understand.
* [PATCH] git-diff-*: --name-only and --name-only-z.Junio C Hamano2005-07-13
| | | | | | | | | | | Porcelain layers often want to find only names of changed files, and even with diff-raw output format they end up having to pick out only the filename. Support --name-only (and --name-only-z for xargs -0 and cpio -0 users that want to treat filenames with embedded newlines sanely) flag to help them. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Make sq_expand() available as sq_quote().Junio C Hamano2005-07-08
| | | | | | | | | A useful shell safety helper sq_expand() was hidden as a static function in diff.c. Extract it out and make it available as sq_quote(). Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Enhance sha1_file_size() into sha1_object_info()Junio C Hamano2005-06-27
| | | | | | | | This lets us eliminate one use of map_sha1_file() outside sha1_file.c, to bring us one step closer to the packed GIT. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Rework -B output.Junio C Hamano2005-06-19
| | | | | | | | | | | | | | | | Patch for a completely rewritten file detected by the -B flag was shown as a pair of creation followed by deletion in earlier versions. This was an misguided attempt to make reviewing such a complete rewrite easier, and unnecessarily ended up confusing git-apply. Instead, show the entire contents of old version prefixed with '-', followed by the entire contents of new version prefixed with '+'. This gives the same easy-to-review for human consumer while keeping it a single, regular modification patch for machine consumption, something that even GNU patch can grok. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Add --diff-filter= output restriction to diff-* family.Junio C Hamano2005-06-12
| | | | | | | | | | | | | | | | | | | | | | | | | | This is a halfway between debugging aid and a helper to write an ultra-smart merge scripts. The new option takes a string that consists of a list of "status" letters, and limits the diff output to only those classes of changes, with two exceptions: - A broken pair (aka "complete rewrite"), does not match D (deleted) or N (created). Use B to look for them. - The letter "A" in the diff-filter string does not match anything itself, but causes the entire diff that contains selected patches to be output (this behaviour is similar to that of --pickaxe-all for the -S option). For example, $ git-rev-list HEAD | git-diff-tree --stdin -s -v -B -C --diff-filter=BCR shows a list of commits that have complete rewrite, copy, or rename. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Fix rename/copy when dealing with temporarily broken pairs.Junio C Hamano2005-06-12
| | | | | | | | | | | When rename/copy uses a file that was broken by diffcore-break as the source, and the broken filepair gets merged back later, the output was mislabeled as a rename. In this case, the source file ends up staying in the output, so we should label it as a copy instead. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Re-Fix SIGSEGV on unmerged files in git-diff-files -pJunio C Hamano2005-06-12
| | | | | | | | | | When an unmerged path was fed via diff_unmerged() into diffcore, it eventually called run_diff() with "one" and "two" parameters with NULL, but run_diff() was not written carefully enough to notice this situation. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* diff 'rename' format change.Linus Torvalds2005-06-05
| | | | | | | | | Clearly even Junio felt git "rename" header lines should say "from/to" instead of "old/new", since he wrote the documentation that way. This way it also matches "copy". git-apply will accept both versions, at least for a while.
* [PATCH] diff.c: -B argument passing fix.Junio C Hamano2005-06-05
| | | | | | | | | This fixes a bug that was preventing non-default parameter to -B option to be passed correctly; you could not give more than 50% break score. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] diff.c: locate_size_cache() fix.Junio C Hamano2005-06-05
| | | | | | | | | | | | | | | | This fixes two bugs. - declaration of auto variable "cmp" was preceeded by a statement, causing compilation error on real C compilers; noticed and patch given by Yoichi Yuasa. - the function's calling convention was overloading its size parameter to mean "largest possible value means do not add entry", which was a bad taste. Brought up during a discussion with Peter Baudis. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] diff: Update -B heuristics.Junio C Hamano2005-06-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As Linus pointed out on the mailing list discussion, -B should break a files that has many inserts even if it still keeps enough of the original contents, so that the broken pieces can later be matched with other files by -M or -C. However, if such a broken pair does not get picked up by -M or -C, we would want to apply different criteria; namely, regardless of the amount of new material in the result, the determination of "rewrite" should be done by looking at the amount of original material still left in the result. If you still have the original 97 lines from a 100-line document, it does not matter if you add your own 13 lines to make a 110-line document, or if you add 903 lines to make a 1000-line document. It is not a rewrite but an in-place edit. On the other hand, if you did lose 97 lines from the original, it does not matter if you added 27 lines to make a 30-line document or if you added 997 lines to make a 1000-line document. You did a complete rewrite in either case. This patch introduces a post-processing phase that runs after diffcore-rename matches up broken pairs diffcore-break creates. The purpose of this post-processing is to pick up these broken pieces and merge them back into in-place modifications. For this, the score parameter -B option takes is changed into a pair of numbers, and it takes "-B99/80" format when fully spelled out. The first number is the minimum amount of "edit" (same definition as what diffcore-rename uses, which is "sum of deletion and insertion") that a modification needs to have to be broken, and the second number is the minimum amount of "delete" a surviving broken pair must have to avoid being merged back together. It can be abbreviated to "-B" to use default for both, "-B9" or "-B9/" to use 90% for "edit" but default (80%) for merge avoidance, or "-B/75" to use default (99%) "edit" and 75% for merge avoidance. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] diff: Clean up diff_scoreopt_parse().Junio C Hamano2005-06-03
| | | | | | | | | | | This cleans up diff_scoreopt_parse() function that is used to parse the fractional notation -B, -C and -M option takes. The callers are modified to check for errors and complain. Earlier they silently ignored malformed input and falled back on the default. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Find size of SHA1 object without inflating everything.Junio C Hamano2005-06-02
| | | | | | | | | This adds sha1_file_size() helper function and uses it in the rename/copy similarity estimator. The helper function handles deltified object as well. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] diff: mode bits fixesJunio C Hamano2005-06-01
| | | | | | | | | | | | | | | | | | | The core GIT repository has trees that record regular file mode in 0664 instead of normalized 0644 pattern. Comparing such a tree with another tree that records the same file in 0644 pattern without content changes with git-diff-tree causes it to feed otherwise unmodified pairs to the diff_change() routine, which triggers a sanity check routine and barfs. This patch fixes the problem, along with the fix to another caller that uses unnormalized mode bits to call diff_change() routine in a similar way. Without this patch, you will see "fatal error" from diff-tree when you run git-deltafy-script on the core GIT repository itself. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Show dissimilarity index for D and N case.Junio C Hamano2005-05-30
| | | | | | | | | | | | The way broken deletes and creates are shown in the -p (diff-patch) output format has become consistent with how rename/copy edits are shown. They will show "dissimilarity index" value, immediately following the "deleted file mode" and "new file mode" lines. The git-apply is taught to grok such an extended header. Signed-off-by: Junio C Hamano <junkio@cox.net>
* [PATCH] Add -O<orderfile> option to diff-* brothers.Junio C Hamano2005-05-30
| | | | | | | | | | | | | | | | | | | | A new diffcore filter diffcore-order is introduced. This takes a text file each of whose line is a shell glob pattern. Patches that match a glob pattern on an earlier line in the file are output before patches that match a later line, and patches that do not match any glob pattern are output last. A typical orderfile for git project probably should look like this: README Makefile Documentation *.h *.c Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Add -B flag to diff-* brothers.Junio C Hamano2005-05-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A new diffcore transformation, diffcore-break.c, is introduced. When the -B flag is given, a patch that represents a complete rewrite is broken into a deletion followed by a creation. This makes it easier to review such a complete rewrite patch. The -B flag takes the same syntax as the -M and -C flags to specify the minimum amount of non-source material the resulting file needs to have to be considered a complete rewrite, and defaults to 99% if not specified. As the new test t4008-diff-break-rewrite.sh demonstrates, if a file is a complete rewrite, it is broken into a delete/create pair, which can further be subjected to the usual rename detection if -M or -C is used. For example, if file0 gets completely rewritten to make it as if it were rather based on file1 which itself disappeared, the following happens: The original change looks like this: file0 --> file0' (quite different from file0) file1 --> /dev/null After diffcore-break runs, it would become this: file0 --> /dev/null /dev/null --> file0' file1 --> /dev/null Then diffcore-rename matches them up: file1 --> file0' The internal score values are finer grained now. Earlier maximum of 10000 has been raised to 60000; there is no user visible changes but there is no reason to waste available bits. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] diff: fix the culling of unneeded delete record.Junio C Hamano2005-05-30
| | | | | | | | | | | | | | The commit 15d061b435a7e3b6bead39df3889f4af78c4b00a [PATCH] Fix the way diffcore-rename records unremoved source. still leaves unneeded delete records in its output stream by mistake, which was covered up by having an extra check to turn such a delete into a no-op downstream. Fix the check in the diffcore-rename to simplify the output routine. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] diff: further cleanup.Junio C Hamano2005-05-30
| | | | | | | | | | | | | | | | | When preparing data to feed the external diff, we should give the mode we obtained from the caller, even when we are dealing with a file with 0{40} SHA1 (i.e. the caller said "look at the filesystem"), since the mode passed by the caller via diff_addremove() or diff_change() is always trustworthy. This is _not_ a bugfix --- the existing code stat() on the file ifself and does the same computation on st.st_mode to compute the mode the same way the caller did to give the original mode. We cannot remove the stat() call from here, but the extra computation to create the mode value is unnecessary. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] diff: code clean-up and removal of rename hack.Junio C Hamano2005-05-30
| | | | | | | | | | | A new macro, DIFF_PAIR_RENAME(), is introduced to distinguish a filepair that is a rename/copy (the definition of which is src and dst are different paths, of course). This removes the hack used in the record_rename_pair() to always put a non-zero value in the score field. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] diff: consolidate various calls into diffcore.Junio C Hamano2005-05-30
| | | | | | | | | | | The three diff-* brothers had a sequence of calls into diffcore that were almost identical. Introduce a new diffcore_std() function that takes all the necessary arguments to consolidate it. This will make later enhancements and changing the order of diffcore application simpler. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Optimize diff-tree -[CM] --stdinJunio C Hamano2005-05-29
| | | | | | | | | | | | | | | | | | | | | | | | This attempts to optimize "diff-tree -[CM] --stdin", which compares successible tree pairs. This optimization does not make much sense for other commands in the diff-* brothers. When reading from --stdin and using rename/copy detection, the patch makes diff-tree to read the current index file first. This is done to reuse the optimization used by diff-cache in the non-cached case. Similarity estimator can avoid expanding a blob if the index says what is in the work tree has an exact copy of that blob already expanded. Another optimization the patch makes is to check only file sizes first to terminate similarity estimation early. In order for this to work, it needs a way to tell the size of the blob without expanding it. Since an obvious way of doing it, which is to keep all the blobs previously used in the memory, is too costly, it does so by keeping the filesize for each object it has already seen in memory. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>