author: Jeff King <peff@peff.net>, 2012-02-13 17:37:33 -0500
committer: Junio C Hamano <gitster@pobox.com>, 2012-02-13 15:57:07 -0800
commit: a0b676aaee29446388cd57fc555a740f9d26eea3
tree: c425e36d6970dfdd17487878a286e4467b645423 /contrib/diff-highlight/README
parent: 34d9819e0a387be6d49cffe67458036450d6d0d5
diff-highlight: document some non-optimal cases
The diff-highlight script works on heuristics, so it can be wrong.
Let's document some of the wrong-ness in case somebody feels like
working on it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'contrib/diff-highlight/README')
-rw-r--r-- contrib/diff-highlight/README | 93
1 files changed, 93 insertions, 0 deletions
diff --git a/contrib/diff-highlight/README b/contrib/diff-highlight/README
index 4a5857977..502e03b30 100644
--- a/contrib/diff-highlight/README
+++ b/contrib/diff-highlight/README
@@ -57,3 +57,96 @@ following in your git configuration:
show = diff-highlight | less
diff = diff-highlight | less
---------------------------------------------
+
+Bugs
+----
+
+Because diff-highlight relies on heuristics to guess which parts of
+changes are important, there are some cases where the highlighting is
+more distracting than useful. Fortunately, these cases are rare in
+practice, and when they do occur, the worst case is simply a little
+extra highlighting. This section documents some cases known to be
+sub-optimal, in case somebody feels like working on improving the
+heuristics.
+
+1. Two changes on the same line get highlighted in a blob. For example,
+ highlighting:
+
+----------------------------------------------
+-foo(buf, size);
++foo(obj->buf, obj->size);
+----------------------------------------------
+
+ yields (where the inside of "+{}" would be highlighted):
+
+----------------------------------------------
+-foo(buf, size);
++foo(+{obj->buf, obj->}size);
+----------------------------------------------
+
+ whereas a more semantically meaningful output would be:
+
+----------------------------------------------
+-foo(buf, size);
++foo(+{obj->}buf, +{obj->}size);
+----------------------------------------------
+
+ Note that doing this right would probably involve a set of
+ content-specific boundary patterns, similar to word-diff. Otherwise
+ you get junk like:
+
+-----------------------------------------------------
+-this line has some -{i}nt-{ere}sti-{ng} text on it
++this line has some +{fa}nt+{a}sti+{c} text on it
+-----------------------------------------------------
+
+ which is less readable than the current output.
+
+2. The multi-line matching assumes that lines in the pre- and post-image
+ match by position. This is often the case, but can be fooled when a
+ line is removed from the top and a new one added at the bottom (or
+ vice versa). Unless the lines in the middle are also changed, diffs
+ will show this as two hunks, and it will not get highlighted at all
+ (which is good). But if the lines in the middle are changed, the
+ highlighting can be misleading. Here's a pathological case:
+
+-----------------------------------------------------
+-one
+-two
+-three
+-four
++two 2
++three 3
++four 4
++five 5
+-----------------------------------------------------
+
+ which gets highlighted as:
+
+-----------------------------------------------------
+-one
+-t-{wo}
+-three
+-f-{our}
++two 2
++t+{hree 3}
++four 4
++f+{ive 5}
+-----------------------------------------------------
+
+ because it matches "two" to "three 3", and so forth. It would be
+ nicer as:
+
+-----------------------------------------------------
+-one
+-two
+-three
+-four
++two +{2}
++three +{3}
++four +{4}
++five 5
+-----------------------------------------------------
+
+ which would probably involve pre-matching the lines into pairs
+ according to some heuristic.
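
Note on the two cases above (not part of the patch): the following Python sketch may help anyone who wants to experiment with them. It is not the diff-highlight script itself. It assumes a common-prefix/common-suffix heuristic, which is consistent with the "current" outputs shown in the README, and all function names (`mark`, `mark_tokens`, `pair_by_position`, `pair_by_similarity`, `highlight`) are hypothetical. The regex token boundaries, the 0.6 similarity threshold, and the use of `difflib` are likewise assumptions chosen for illustration, and lines are passed without their leading "-" or "+" marker.

```python
import re
from difflib import SequenceMatcher


def mark(old, new):
    """Assumed current heuristic: mark everything between the common
    prefix and the common suffix of the two lines."""
    limit = min(len(old), len(new))
    p = 0
    while p < limit and old[p] == new[p]:
        p += 1
    s = 0  # the suffix must not overlap the prefix
    while s < limit - p and old[-1 - s] == new[-1 - s]:
        s += 1
    if p == 0 and s == 0:
        return old, new  # nothing in common: leave both lines unmarked

    def wrap(line, sign):
        mid = line[p:len(line) - s]
        if not mid:
            return line
        return line[:p] + sign + "{" + mid + "}" + line[len(line) - s:]

    return wrap(old, "-"), wrap(new, "+")


def mark_tokens(old, new):
    """Case 1 idea: diff word-like tokens rather than raw characters."""
    a = re.findall(r"\w+|\W", old)
    b = re.findall(r"\w+|\W", new)
    out_old, out_new = [], []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        gone, came = "".join(a[i1:i2]), "".join(b[j1:j2])
        if tag == "equal":
            out_old.append(gone)
            out_new.append(came)
        else:
            if gone:
                out_old.append("-{" + gone + "}")
            if came:
                out_new.append("+{" + came + "}")
    return "".join(out_old), "".join(out_new)


def pair_by_position(removed, added):
    """Assumed current behaviour: line i is matched with line i."""
    return {i: i for i in range(min(len(removed), len(added)))}


def pair_by_similarity(removed, added, threshold=0.6):
    """Case 2 idea: greedily pre-match each removed line with the most
    similar later added line, keeping the pairs in order."""
    pairs, start = {}, 0
    for i, old in enumerate(removed):
        best_j, best = None, threshold
        for j in range(start, len(added)):
            score = SequenceMatcher(None, old, added[j]).ratio()
            if score > best:
                best_j, best = j, score
        if best_j is not None:
            pairs[i] = best_j
            start = best_j + 1
    return pairs


def highlight(removed, added, pairs):
    """Apply mark() to every (removed, added) index pair."""
    old_out, new_out = list(removed), list(added)
    for i, j in pairs.items():
        old_out[i], new_out[j] = mark(removed[i], added[j])
    return old_out, new_out
```

With `pair_by_position`, the pathological hunk from case 2 produces the same misleading "t-{wo}" / "t+{hree 3}" marking the README shows. With `pair_by_similarity`, "two", "three" and "four" are matched to their edited counterparts and "five 5" stays unpaired. This sketch marks the inserted space as well (for example "two+{ 2}"), so it only approximates the "nicer" output above. The greedy matching is the simplest thing that works on this example and is not claimed to be optimal.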