aboutsummaryrefslogtreecommitdiff
path: root/xdiff
Commit message (Collapse)AuthorAge
* Merge branch 'dl/xdiff'Junio C Hamano2008-11-12
|\ | | | | | | | | * dl/xdiff: xdiff: give up scanning similar lines early
| * xdiff: give up scanning similar lines earlyDavide Libenzi2008-11-08
| | | | | | | | | | | | | | | | | | In a corner case of large files whose lines do not match uniquely, the loop to eliminate a line that matches multiple locations adjacent to a run of lines that do not uniquely match wasted too much cycles. Fix this by giving up early after scanning 100 lines in both direction. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Allow alternate "low-level" emit function from xdl_diffBrian Downing2008-10-25
|/ | | | | | | | | | | | | | | | | For some users (e.g. git blame), getting textual patch output is just extra work, as they can get all the information they need from the low- level diff structures. Allow for an alternate low-level emit function to be defined to allow bypassing the textual patch generation; set xemitconf_t's emit_func member to enable this. The (void (*)()) type is pretty ugly, but the alternative would be to include most of the private xdiff headers in xdiff.h to get the types required for the "proper" function prototype. Also, a (void *) won't work, as ANSI C doesn't allow a function pointer to be cast to an object pointer. Signed-off-by: Brian Downing <bdowning@lavos.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* xmerge.c: "diff3 -m" style clips merge reduction level to EAGER or lessJunio C Hamano2008-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When showing a conflicting merge result, and "--diff3 -m" style is asked for, this patch makes sure that the merge reduction level does not exceed XDL_MERGE_EAGER. This is because "diff3 -m" style output would not make sense for anything more aggressive than XDL_MERGE_EAGER, because of the way how the merge reduction works. "git merge-file" no longer has to force MERGE_EAGER when "--diff3" is asked for because of this change. Suppose a common ancestor (shared preimage) is modified to postimage #1 and #2 (each letter represents one line): ##### postimage#1: 1234ABCDE789 | / | / preimage: 123456789 | \ postimage#2: 1234AXYE789 #### XDL_MERGE_MINIMAL and XDL_MERGE_EAGER would: (1) find the s/56/ABCDE/ done on one side and s/56/AXYE/ done on the other side, (2) notice that they touch an overlapping area, and (3) mark it as a conflict, "ABCDE vs AXYE". The difference between the two algorithms is that EAGER drops the hunk altogether if the postimages match (i.e. both sides modified the same way), while MINIMAL keeps it. There is no other operation performed to the hunk. As the result, lines marked with "#" in the above picure will be in the RCS merge style output like this (letters <, = and > represent conflict marker lines): output: 1234<ABCDE=AXYE>789 ; with MINIMAL/EAGER The part from the preimage that corresponds to these conflicting changes is "56", which is what "diff3 -m" style output adds to it: output: 1234<ABCDE|56=AXYE>789 ; in "diff3 -m" style Now, XDL_MERGE_ZEALOUS looks at the differences between the changes two postimages made in order to reduce the number of lines in the conflicting regions. It notices that both sides start their new contents with "A", and excludes it from the output (it also excludes "E" for the same reason). The conflict that used to be "ABCDE vs AXYE" is now "BCD vs XY": output: 1234A<BCD=XY>E789 ; with ZEALOUS There could even be matching parts between two postimages in the middle. Instead of one side rewriting the shared "56" to "ABCDE" and the other side to "AXYE", imagine the case where the postimages are "ABCDE" and "AXCYE", in which case instead of having one conflicted hunk "BCD vs XY", you would have two conflicting hunks "B vs X" and "D vs Y". In either case, once you reduce "ABCDE vs AXYE" to "BCD vs XY" (or "ABCDE vs AXCYE" to "B vs X" and "D vs Y"), there is no part from the preimage that corresponds to the conflicting change made in both postimages anymore. In other words, conflict reduced by ZEALOUS algorithm cannot be expressed in "diff3 -m" style. Representing the last illustration like this is misleading to say the least: output: 1234A<BCD|56=XY>E789 ; broken "diff3 -m" style because the preimage was not ...4A56E... to begin with. "A" and "E" are common only between the postimages. Even worse, once a single conflicting hunk is split into multiple ones (recall the example of breaking "ABCDE vs AXCYE" to "B vs X" and "D vs Y"), there is no sane way to distribute the preimage text across split conflicting hunks. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* xmerge.c: minimum readability fixupsJunio C Hamano2008-08-30
| | | | | | | | This replaces hardcoded magic constants with symbolic ones for readability, and swaps one if/else blocks to better match the order in which 0/1/2 variables are handled to nearby codepath. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* xdiff-merge: optionally show conflicts in "diff3 -m" styleJunio C Hamano2008-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When showing conflicting merges, we traditionally followed RCS's merge output format. The output shows: <<<<<<< postimage from one side; ======= postimage of the other side; and >>>>>>> Some poeple find it easier to be able to understand what is going on when they can view the common ancestor's version, which is used by "diff3 -m", which shows: <<<<<<< postimage from one side; ||||||| shared preimage; ======= postimage of the other side; and >>>>>>> This is an initial step to bring that as an optional feature to git. Only "git merge-file" has been converted, with "--diff3" option. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* xdl_fill_merge_buffer(): separate out a too deeply nested functionJunio C Hamano2008-08-30
| | | | | | | This simply moves code around to make a separate function that prepares a single conflicted hunk with markers into the buffer. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* xdl_merge(): introduce XDL_MERGE_ZEALOUS_ALNUMJohannes Schindelin2008-02-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a merge conflicts, there are often common lines that are not really common, such as empty lines or lines containing a single curly bracket. With XDL_MERGE_ZEALOUS_ALNUM, we use the following heuristics: when a hunk does not contain any letters or digits, it is treated as conflicting. In other words, a conflict which used to look like this: <<<<<<< a = 1; ======= output(); >>>>>>> } } } <<<<<<< output(); ======= b = 1; >>>>>>> will look like this with ZEALOUS_ALNUM: <<<<<<< a = 1; } } } output(); ======= output(); } } } b = 1; >>>>>>> To demonstrate this, git-merge-file has been switched from XDL_MERGE_ZEALOUS to XDL_MERGE_ZEALOUS_ALNUM. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* xdl_merge(): make XDL_MERGE_ZEALOUS output simplerJohannes Schindelin2008-02-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a merge conflicts, there are often less than three common lines between two conflicting regions. Since a conflict takes up as many lines as are conflicting, plus three lines for the commit markers, the output will be shorter (and thus, simpler) in this case, if the common lines will be merged into the conflicting regions. This patch merges up to three common lines into the conflicts. For example, what looked like this before this patch: <<<<<<< if (a == 1) ======= if (a != 0) >>>>>>> { int i; <<<<<<< a = 0; ======= a = !a; >>>>>>> will now look like this: <<<<<<< if (a == 1) { int i; a = 0; ======= if (a != 0) { int i; a = !a; >>>>>>> Suggested Linus (based on ideas by "Voltage Spike" -- if that name is real, it is mighty cool). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Remove unreachable statementsGuido Ostkamp2007-11-15
| | | | | | | Solaris Workshop Compiler found a few unreachable statements. Signed-off-by: Guido Ostkamp <git@ostkamp.fastmail.fm> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Per-path attribute based hunk header selection.Junio C Hamano2007-07-06
| | | | | | | | | | | | | | | | | | | | | | | | This makes"diff -p" hunk headers customizable via gitattributes mechanism. It is based on Johannes's earlier patch that allowed to define a single regexp to be used for everything. The mechanism to arrive at the regexp that is used to define hunk header is the same as other use of gitattributes. You assign an attribute, funcname (because "diff -p" typically uses the name of the function the patch is about as the hunk header), a simple string value. This can be one of the names of built-in pattern (currently, "java" is defined) or a custom pattern name, to be looked up from the configuration file. (in .gitattributes) *.java funcname=java *.perl funcname=perl (in .git/config) [funcname] java = ... # ugly and complicated regexp to override the built-in one. perl = ... # another ugly and complicated regexp to define a new one. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Missing statics.Pierre Habouzit2007-06-08
| | | | | Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* War on whitespaceJunio C Hamano2007-06-07
| | | | | | | | | This uses "git-apply --whitespace=strip" to fix whitespace errors that have crept in to our source files over time. There are a few files that need to have trailing whitespaces (most notably, test vectors). The results still passes the test, and build result in Documentation/ area is unchanged. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* xdiff/xutils.c(xdl_hash_record): factor out whitespace handlingJohannes Schindelin2007-03-19
| | | | | | | | | | Since in at least one use case, xdl_hash_record() takes over 15% of the CPU time, it makes sense to even micro-optimize it. For many cases, no whitespace special handling is needed, and in these cases we should not even bother to check for whitespace in _every_ iteration of the loop. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
* teach diff machinery about --ignore-space-at-eolJohannes Schindelin2007-02-13
| | | | | | | | `git diff --ignore-space-at-eol` will ignore whitespace at the line ends. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Fix yet another subtle xdl_merge() bugJohannes Schindelin2006-12-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In very obscure cases, a merge can hit an unexpected code path (where the original code went as far as saying that this was a bug). This failing merge was noticed by Alexandre Juillard. The problem is that the original file contains something like this: -- snip -- two non-empty lines before two empty lines after two empty lines -- snap -- and this snippet is reduced to _one_ empty line in _both_ new files. However, it is ambiguous as to which hunk takes the empty line: the first or the second one? Indeed in Alexandre's example files, the xdiff algorithm attributes the empty line to the first hunk in one case, and to the second hunk in the other case. (Trimming down the example files _changes_ that behaviour!) Thus, the call to xdl_merge_cmp_lines() has no chance to realize that the change is actually identical in both new files. Therefore, xdl_refine_conflicts() finds an empty diff script, which was not expected there, because (the original author of xdl_merge() thought) xdl_merge_cmp_lines() would catch that case earlier. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
* xdl_merge(): fix a segmentation fault when refining conflictsJohannes Schindelin2006-12-28
| | | | | | | | | | | | | | | The function xdl_refine_conflicts() tries to break down huge conflicts by doing a diff on the conflicting regions. However, this does not make sense when one side is empty. Worse, when one side is not only empty, but after EOF, the code accessed unmapped memory. Noticed by Luben Tuikov, Shawn Pearce and Alexandre Julliard, the latter providing a test case. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Merge branch 'master' into js/mergeJunio C Hamano2006-12-12
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * master: (42 commits) git-svn: correctly handle packed-refs in refs/remotes/ add test case for recursive merge git-svn: correctly display fatal() error messages git-svn: allow dcommit to take an alternate head git-svn: enable logging of information not supported by git Clarify fetch error for missing objects. Move Fink and Ports check to after config file shortlog: fix segfault on empty authorname shortlog: remove "[PATCH]" prefix from shortlog output Make sure the empty tree exists when needed in merge-recursive. Don't use memcpy when source and dest. buffers may overlap no need to install manpages as executable Documentation: simpler shared repository creation shortlog: fix segfault on empty authorname Add branch.*.merge warning and documentation update Fix perl/ build. git-svn: use do_switch for --follow-parent if the SVN library supports it Fix documentation copy&paste typo git-svn: extra error check to ensure we open a file correctly Documentation: update git-clone man page with new behavior ...
| * diff -b: ignore whitespace at end of lineJohannes Schindelin2006-12-04
| | | | | | | | | | | | | | | | This is _not_ the same as "treat eol as whitespace", since that would mean that multiple empty lines would be treated as equal to e.g. a space. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
* | xdl_merge(): fix and simplify conflict handlingJohannes Schindelin2006-12-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Suppose you have changes in new1 to the original lines 10-20, and changes in new2 to the original lines 15-25, then the changes to 10-25 conflict. But it is possible that the next changes in new1 still overlap with this change to new2. So, in the next iteration we have to look at the same change to new2 again. The old code tried to be a bit too clever. The new code is shorter and more to the point: do not fiddle with the ranges at all. Also, xdl_append_merge() tries harder to combine conflicts. This is necessary, because with the above simplification, some conflicts would not be recognized as conflicts otherwise: In the above scenario, it is possible that there is no other change to new1. Absent the combine logic, the change in new2 would be recorded _again_, but as a non-conflict. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
* | xdl_merge(): fix thinkoJohannes Schindelin2006-12-02
| | | | | | | | | | | | | | | | | | If one side's block (of changed lines) ends later than the other side's block, the former should be tested against the next block of the other side, not vice versa. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
* | xdl_merge(): fix an off-by-one bugJohannes Schindelin2006-12-02
| | | | | | | | | | | | | | The line range is i1 .. (i1 + chg1 - 1), not i1 .. (i1 + chg1). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
* | xmerge: make return value from xdl_merge() more usable.Junio C Hamano2006-12-02
| | | | | | | | | | | | | | The callers would want to know if the resulting merge is clean; do not discard that information away after calling xdl_do_merge(). Signed-off-by: Junio C Hamano <junkio@cox.net>
* | xdiff: add xdl_merge()Johannes Schindelin2006-12-02
|/ | | | | | | | | | | | | | | | | | | | | | | | This new function implements the functionality of RCS merge, but in-memory. It returns < 0 on error, otherwise the number of conflicts. Finding the conflicting lines can be a very expensive task. You can control the eagerness of this algorithm: - a level value of 0 means that all overlapping changes are treated as conflicts, - a value of 1 means that if the overlapping changes are identical, it is not treated as a conflict. - If you set level to 2, overlapping changes will be analyzed, so that almost identical changes will not result in huge conflicts. Rather, only the conflicting lines will be shown inside conflict markers. With each increasing level, the algorithm gets slower, but more accurate. Note that the code for level 2 depends on the simple definition of mmfile_t specific to git, and therefore it will be harder to port that to LibXDiff. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Increase length of function name bufferAndy Parkins2006-11-23
| | | | | | | | | | | | | In xemit.c:xdl_emit_diff() a buffer for showing the function name as commentary is allocated; this buffer was 40 characters. This is a bit small; particularly for C++ function names where there is often an identical prefix (like void LongNamespace::LongClassName) on multiple functions, which makes the context the same everywhere. In other words the context is useless. This patch increases that buffer to 80 characters - which may still not be enough, but is better Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
* xdiff: Match GNU diff behaviour when deciding hunk comment worthiness of linesPetr Baudis2006-10-25
| | | | | | | | | | This removes the '#' and '(' tests and adds a '$' test instead although I have no idea what it is actually good for - but hey, if that's what GNU diff does... Pasky only went and did as Junio sayeth. Signed-off-by: Petr Baudis <pasky@suse.cz> Signed-off-by: Junio C Hamano <junkio@cox.net>
* xdiff/xemit.c (xdl_find_func): Elide trailing white space in a context header.Jim Meyering2006-10-23
| | | | | | | | | | | | | This removes trailing blanks from git-generated diff headers the same way a similar patch did that for GNU diff: http://article.gmane.org/gmane.comp.gnu.utils.bugs/13839 That is, it removes trailing blanks on the hunk header line that shows the function name. Signed-off-by: Jim Meyering <jim@meyering.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Merge branch 'maint'Junio C Hamano2006-10-16
|\ | | | | | | | | * maint: Fix hash function in xdiff library
| * Fix hash function in xdiff libraryv1.4.2.4Linus Torvalds2006-10-16
| | | | | | | | | | | | | | | | | | | | | | | | | | Jim Mayering noticed that xdiff library took insanely long time when comparing files with many identical lines. This was because the hash function used in the library is broken on 64-bit architectures and caused too many collisions. http://thread.gmane.org/gmane.comp.version-control.git/28962/focus=28994 Acked-by: Davide Libenzi <davidel@xmaliserver.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* | diff: fix 2 whitespace issuesJohannes Schindelin2006-10-12
|/ | | | | | | | | | | | | | | | When whitespace or whitespace change was ignored, the function xdl_recmatch() returned memcmp() style differences, which is wrong, since it should return 0 on non-match. Also, there were three horrible off-by-one bugs, even leading to wrong hashes in the whitespace special handling. The issue was noticed by Ray Lehtiniemi. For good measure, this commit adds a test. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Merge branch 'lt/merge-tree'Junio C Hamano2006-07-12
|\ | | | | | | | | | | | | * lt/merge-tree: Improved three-way blob merging code Prepare "git-merge-tree" for future work xdiff: generate "anti-diffs" aka what is common to two files
| * xdiff: generate "anti-diffs" aka what is common to two filesLinus Torvalds2006-06-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fairly trivial patch adds a new XDL_EMIT_xxx flag to tell libxdiff that we don't want to generate the _diff_ between two files, we want to see the lines that are _common_ to two files. So when you set XDL_EMIT_COMMON, xdl_diff() will do everything exactly like it used to do, but the output records it generates just contain the lines that aren't part of the diff. This is for doing things like generating the common base case for a file that was added in both branches. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* | Fix more typos, primarily in the codePavel Roskin2006-07-10
|/ | | | | | | | | The only visible change is that git-blame doesn't understand "--compability" anymore, but it does accept "--compatibility" instead, which is already documented. Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Teach diff about -b and -w flagsJohannes Schindelin2006-06-23
| | | | | | | | | | | | This adds -b (--ignore-space-change) and -w (--ignore-all-space) flags to diff. The main part of the patch is teaching libxdiff about it. [jc: renamed xdl_line_match() to xdl_recmatch() since the former is used for different purposes in xpatchi.c which is in the parts of the upstream source we do not use.] Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
* xdiff: minor changes to match libxdiff-0.21Junio C Hamano2006-06-19
| | | | | | | | | This reformats the change 621c53cc082299eaf69e9f2dc0274547c7d87fb0 introduced to match what upstream author implemented in libxdiff-0.21 without changing any logic (hopefully ;-). This is to help keep us in sync with the upstream. Signed-off-by: Junio C Hamano <junkio@cox.net>
* xdiff: post-process hunks to make them consistent.Davide Libenzi2006-04-13
|
* xdiff/xdiffi.c: fix warnings about possibly uninitialized variablesMarco Roeland2006-04-08
| | | | | | | | | | | | | | | | | | | | | Compiling this module gave the following warnings (some double dutch!): xdiff/xdiffi.c: In functie 'xdl_recs_cmp': xdiff/xdiffi.c:298: let op: 'spl.i1' may be used uninitialized in this function xdiff/xdiffi.c:298: let op: 'spl.i2' may be used uninitialized in this function xdiff/xdiffi.c:219: let op: 'fbest1' may be used uninitialized in this function xdiff/xdiffi.c:219: let op: 'bbest1' may be used uninitialized in this function A superficial tracking of their usage, without deeper knowledge about the algorithm, indeed confirms that there are code paths on which these variables will be used uninitialized. In practice these code paths might never be reached, but then these fixes will not change the algorithm. If these code paths are ever reached we now at least have a predictable outcome. And should the very small performance impact of these initializations be noticeable, then they should at least be replaced by comments why certain code paths will never be reached. Some extra initializations in this patch now fix the warnings.
* Clean-up trivially redundant diff.Davide Libenzi2006-04-04
| | | | | Also corrects the line numbers in unified output when using zero lines context.
* xdiff: Show function names in hunk headers.Mark Wooding2006-03-27
| | | | | | | | | | | | | | | | | | | | The speed of the built-in diff generator is nice; but the function names shown by `diff -p' are /really/ nice. And I hate having to choose. So, we hack xdiff to find the function names and print them. xdiff has grown a flag to say whether to dig up the function names. The builtin_diff function passes this flag unconditionally. I suppose it could parse GIT_DIFF_OPTS, but it doesn't at the moment. I've also reintroduced the `function name' into the test suite, from which it was removed in commit 3ce8f089. The function names are parsed by a particularly stupid algorithm at the moment: it just tries to find a line in the `old' file, from before the start of the hunk, whose first character looks plausible. Still, it's most definitely a start. Signed-off-by: Mark Wooding <mdw@distorted.org.uk> Signed-off-by: Junio C Hamano <junkio@cox.net>
* built-in diff: minimum tweaksJunio C Hamano2006-03-25
| | | | | | | | | | | | | | | | | | | This fixes up a couple of minor issues with the real built-in diff to be more usable: - Omit ---/+++ header unless we emit diff output; - Detect and punt binary diff like GNU does; - Honor GIT_DIFF_OPTS minimally (only -u<number> and --unified=<number> are currently supported); - Omit line count of 1 from "@@ -l,k +m,n @@" hunk header (i.e. when k == 1 or n == 1) - Adjust testsuite for the lack of -p support. Signed-off-by: Junio C Hamano <junkio@cox.net>
* builtin-diff: \No newline at end of file.Linus Torvalds2006-03-25
| | | | Signed-off-by: Junio C Hamano <junkio@cox.net>
* Use a *real* built-in diff generatorLinus Torvalds2006-03-25
This uses a simplified libxdiff setup to generate unified diffs _without_ doing fork/execve of GNU "diff". This has several huge advantages, for example: Before: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m24.818s user 0m13.332s sys 0m8.664s After: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m4.563s user 0m2.944s sys 0m1.580s and the fact that this should be a lot more portable (ie we can ignore all the issues with doing fork/execve under Windows). Perhaps even more importantly, this allows us to do diffs without actually ever writing out the git file contents to a temporary file (and without any of the shell quoting issues on filenames etc etc). NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the current "diff-core" code actually will always write the temp-files, because it used to be something that you simply had to do. So this current one actually writes a temp-file like before, and then reads it into memory again just to do the diff. Stupid. But if this basic infrastructure is accepted, we can start switching over diff-core to not write temp-files, which should speed things up even further, especially when doing big tree-to-tree diffs. Now, in the interest of full disclosure, I should also point out a few downsides: - the libxdiff algorithm is different, and I bet GNU diff has gotten a lot more testing. And the thing is, generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to "validate" the libxdiff output by just comparing it against GNU diff. - GNU diff does some nice eye-candy, like trying to figure out what the last function was, and adding that information to the "@@ .." line. libxdiff doesn't do that. - The libxdiff thing has some known deficiencies. In particular, it gets the "\No newline at end of file" case wrong. So this is currently for the experimental branch only. I hope Davide will help fix it. That said, I think the huge performance advantage, and the fact that it integrates better is definitely worth it. But it should go into a development branch at least due to the missing newline issue. Technical note: this is based on libxdiff-0.17, but I did some surgery to get rid of the extraneous fat - stuff that git doesn't need, and seriously cutting down on mmfile_t, which had much more capabilities than the diff algorithm either needed or used. In this version, "mmfile_t" is just a trivial <pointer,length> tuple. That said, I tried to keep the differences to simple removals, so that you can do a diff between this and the libxdiff origin, and you'll basically see just things getting deleted. Even the mmfile_t simplifications are left in a state where the diffs should be readable. Apologies to Davide, whom I'd love to get feedback on this all from (I wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old complex format had a helper function for that, but I did my surgery with the goal in mind that eventually we _should_ just do mmfile_t mf; buf = read_sha1_file(sha1, type, &size); mf->ptr = buf; mf->size = size; .. use "mf" directly .. which was really a nightmare with the old "helpful" mmfile_t, and really is that easy with the new cut-down interfaces). [ Btw, as any hawk-eye can see from the diff, this was actually generated with itself, so it is "self-hosting". That's about all the testing it has gotten, along with the above kernel diff, which eye-balls correctly, but shows the newline issue when you double-check it with "git-apply" ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>