diff-highlight: don't highlight whole lines

If you have a change like: -foo +bar we end up highlighting the entirety of both lines (since the whole thing is changed). But the point of diff highlighting is to pinpoint the specific change in a pair of lines that are mostly identical. In this case, the highlighting is just noise, since there is nothing to pinpoint, and we are better off doing nothing. The implementation looks for "interesting" pairs by checking to see whether they actually have a matching prefix or suffix that does not simply consist of colorization and whitespace. However, the implementation makes it easy to plug in other heuristics, too, like: 1. Depending on the source material, the set of "boring" characters could be tweaked to include language-specific stuff (like braces or semicolons for C). 2. Instead of saying "an interesting line has at least one character of prefix or suffix", we could require that less than N percent of the line be highlighted. The simple "ignore whitespace, and highlight if there are any matched characters" implemented by this patch seems to give good results on git.git. I'll leave experimentation with other heuristics to somebody who has a dataset that does not look good with the current code. Based on an original idea and implementation by Michał Kiedrowicz. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
author: Jeff King <peff@peff.net> 2012-02-13 17:32:47 -0500
committer: Junio C Hamano <gitster@pobox.com> 2012-02-13 15:57:06 -0800
commit: 097128d1bce7194702f336e30c5e228aa8ba774f (patch)
tree: 2ca8a1f734ebc257394df4d79d16a41076ef6617 /contrib/diff-highlight
parent: 2b21008d3c43d41588e84b45907bb8c2e86278f6 (diff)
download: git-097128d1bce7194702f336e30c5e228aa8ba774f.tar.gz
git-097128d1bce7194702f336e30c5e228aa8ba774f.tar.xz
1 files changed, 26 insertions, 2 deletions
diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight
index c3302dd81..0d8df84a2 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -8,6 +8,7 @@ use strict;
 my $HIGHLIGHT   = "\x1b[7m";
 my $UNHIGHLIGHT = "\x1b[27m";
 my $COLOR = qr/\x1b\[[0-9;]*m/;
+my $BORING = qr/$COLOR|\s/;
 
 my @window;
 
@@ -104,8 +105,14 @@ sub show_pair {
 		}
 	}
 
-	print highlight(\@a, $pa, $sa);
-	print highlight(\@b, $pb, $sb);
+	if (is_pair_interesting(\@a, $pa, $sa, \@b, $pb, $sb)) {
+		print highlight(\@a, $pa, $sa);
+		print highlight(\@b, $pb, $sb);
+	}
+	else {
+		print join('', @a);
+		print join('', @b);
+	}
 }
 
 sub split_line {
@@ -125,3 +132,20 @@ sub highlight {
 		@{$line}[($suffix+1)..$#$line]
 	);
 }
+
+# Pairs are interesting to highlight only if we are going to end up
+# highlighting a subset (i.e., not the whole line). Otherwise, the highlighting
+# is just useless noise. We can detect this by finding either a matching prefix
+# or suffix (disregarding boring bits like whitespace and colorization).
+sub is_pair_interesting {
+	my ($a, $pa, $sa, $b, $pb, $sb) = @_;
+	my $prefix_a = join('', @$a[0..($pa-1)]);
+	my $prefix_b = join('', @$b[0..($pb-1)]);
+	my $suffix_a = join('', @$a[($sa+1)..$#$a]);
+	my $suffix_b = join('', @$b[($sb+1)..$#$b]);
+
+	return $prefix_a !~ /^$COLOR*-$BORING*$/ ||
+	       $prefix_b !~ /^$COLOR*\+$BORING*$/ ||
+	       $suffix_a !~ /^$BORING*$/ ||
+	       $suffix_b !~ /^$BORING*$/;
+}
author	Jeff King <peff@peff.net>	2012-02-13 17:32:47 -0500
committer	Junio C Hamano <gitster@pobox.com>	2012-02-13 15:57:06 -0800
commit	097128d1bce7194702f336e30c5e228aa8ba774f (patch)
tree	2ca8a1f734ebc257394df4d79d16a41076ef6617 /contrib/diff-highlight
parent	2b21008d3c43d41588e84b45907bb8c2e86278f6 (diff)
download	git-097128d1bce7194702f336e30c5e228aa8ba774f.tar.gz git-097128d1bce7194702f336e30c5e228aa8ba774f.tar.xz