aboutsummaryrefslogtreecommitdiff
path: root/diffcore-pickaxe.c
Commit message (Collapse)AuthorAge
* pickaxe: use textconv for -S countingJeff King2012-10-28
| | | | | | | | | | | | | | | We currently just look at raw blob data when using "-S" to pickaxe. This is mostly historical, as pickaxe predates the textconv feature. If the user has bothered to define a textconv filter, it is more likely that their search string will be on the textconv output, as that is what they will see in the diff (and we do not even provide a mechanism for them to search for binary needles that contain NUL characters). This patch teaches "-S" to use textconv, just as we already do for "-G". Signed-off-by: Jeff King <peff@peff.net>
* pickaxe: hoist empty needle checkJeff King2012-10-28
| | | | | | | | | | | If we are given an empty pickaxe needle like "git log -S ''", it is impossible for us to find anything (because no matter what the content, the count will always be 0). We currently check this at the lowest level of contains(). Let's hoist the logic much earlier to has_changes(), so that it is simpler to return our answer before loading any blob data. Signed-off-by: Jeff King <peff@peff.net>
* diff_grep: use textconv buffers for add/deleted filesJeff King2012-10-28
| | | | | | | | | | | | | | | | | | | | | | | If you use "-G" to grep a diff, we will apply a configured textconv filter to the data before generating the diff. However, if the diff is an addition or deletion, we do not bother running the diff at all, and just look for the token in the added (or removed) content. This works because we know that the diff must contain every line of content. However, while we used the textconv-derived buffers in the regular diff, we accidentally passed the original unmodified buffers to regexec when checking the added or removed content. This could lead to an incorrect answer. Worse, in some cases we might have a textconv buffer but no original buffer (e.g., if we pulled the textconv data from cache, or if we reused a working tree file when generating it). In that case, we could actually feed NULL to regexec and segfault. Reported-by: Peter Oberndorfer <kumbayo84@arcor.de> Signed-off-by: Jeff King <peff@peff.net>
* pickaxe: allow -i to search in patch case-insensitivelyJunio C Hamano2012-02-28
| | | | | | | | | | | | | | | | | | | "git log -S<string>" is a useful way to find the last commit in the codebase that touched the <string>. As it was designed to be used by a porcelain script to dig the history starting from a block of text that appear in the starting commit, it never had to look for anything but an exact match. When used by an end user who wants to look for the last commit that removed a string (e.g. name of a variable) that he vaguely remembers, however, it is useful to support case insensitive match. When given the "--regexp-ignore-case" (or "-i") option, which originally was designed to affect case sensitivity of the search done in the commit log part, e.g. "log --grep", the matches made with -S/-G pickaxe search is done case insensitively now. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* pickaxe: factor out pickaxeRené Scharfe2011-10-07
| | | | | | | | Move the duplicate diff queue loop into its own function that accepts a match function: has_changes() for -S and diff_grep() for -G. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* pickaxe: give diff_grep the same signature as has_changesRené Scharfe2011-10-07
| | | | | | | | | Change diff_grep() to match the signature of has_changes() as a preparation for the next patch that will use function pointers to the two. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* pickaxe: pass diff_options to contains and has_changesRené Scharfe2011-10-07
| | | | | | | | | | | | | | | | Remove the unused parameter needle from contains() and has_changes(). Also replace the parameter len with a pointer to the diff_options. We can use its member pickaxe to check if the needle is an empty string and use the kwsmatch structure to find out the length of the match instead. This change is done as a preparation to unify the signatures of has_changes() and diff_grep(), which will be used in the patch after the next one to factor out common code. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* pickaxe: factor out has_changesRené Scharfe2011-10-07
| | | | | | | Move duplicate if/else construct into its own helper function. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* pickaxe: plug regex/kws leakRené Scharfe2011-10-07
| | | | | | | | | With -S... --pickaxe-all, free the regex or the kws before returning even if we found a match. Also get rid of the variable has_changes, as we can simply break out of the loop. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* pickaxe: plug regex leakRené Scharfe2011-10-07
| | | | | | | | | With -G... --pickaxe-all, free the regex before returning even if we found a match. Also get rid of the variable has_changes, as we can simply break out of the loop. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* pickaxe: plug diff filespec leak with empty needleRené Scharfe2011-10-07
| | | | | | | | Check first for the unlikely case of an empty needle string and only then populate the filespec, lest we leak it. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Use kwset in pickaxeFredrik Kuivinen2011-08-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Benchmarks in the hot cache case: before: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 47,092,744 cache-misses # 2.825 M/sec ( +- 1.607% ) 123,368,389 cache-references # 7.400 M/sec ( +- 0.812% ) 330,040,998 branch-misses # 3.134 % ( +- 0.257% ) 10,530,896,750 branches # 631.663 M/sec ( +- 0.121% ) 62,037,201,030 instructions # 1.399 IPC ( +- 0.142% ) 44,331,294,321 cycles # 2659.073 M/sec ( +- 0.326% ) 96,794 page-faults # 0.006 M/sec ( +- 11.952% ) 25 CPU-migrations # 0.000 M/sec ( +- 25.266% ) 1,424 context-switches # 0.000 M/sec ( +- 0.540% ) 16671.708650 task-clock-msecs # 0.997 CPUs ( +- 0.343% ) 16.728692052 seconds time elapsed ( +- 0.344% ) after: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 51,385,522 cache-misses # 4.619 M/sec ( +- 0.565% ) 129,177,880 cache-references # 11.611 M/sec ( +- 0.219% ) 319,222,775 branch-misses # 6.946 % ( +- 0.134% ) 4,595,913,233 branches # 413.086 M/sec ( +- 0.112% ) 31,395,042,533 instructions # 1.062 IPC ( +- 0.129% ) 29,558,348,598 cycles # 2656.740 M/sec ( +- 0.204% ) 93,224 page-faults # 0.008 M/sec ( +- 4.487% ) 19 CPU-migrations # 0.000 M/sec ( +- 10.425% ) 950 context-switches # 0.000 M/sec ( +- 0.360% ) 11125.796039 task-clock-msecs # 0.997 CPUs ( +- 0.239% ) 11.164216599 seconds time elapsed ( +- 0.240% ) So the kwset code is about 33% faster. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* diffcore-pickaxe.c: a void function shouldn't try to return somethingBrandon Casey2010-10-06
| | | | | Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'maint'Junio C Hamano2010-10-06
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * maint: Documentation/git-clone: describe --mirror more verbosely do not depend on signed integer overflow work around buggy S_ISxxx(m) implementations xdiff: cast arguments for ctype functions to unsigned char init: plug tiny one-time memory leak diffcore-pickaxe.c: remove unnecessary curly braces t3020 (ls-files-error-unmatch): remove stray '1' from end of file setup: make sure git dir path is in a permanent buffer environment.c: remove unused variable git-svn: fix processing of decorated commit hashes git-svn: check_cherry_pick should exclude commits already in our history Documentation/git-svn: discourage "noMetadata"
| * diffcore-pickaxe.c: remove unnecessary curly bracesBrandon Casey2010-10-05
| | | | | | | | | | Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | git log/diff: add -G<regexp> that greps in the patch textJunio C Hamano2010-08-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | diff: pass the entire diff-options to diffcore_pickaxe()Junio C Hamano2010-08-31
|/ | | | | | | That would make it easier to give enhanced feature to the pickaxe transformation. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Add a macro DIFF_QUEUE_CLEAR.Bo Yang2010-05-07
| | | | | | | | Refactor the diff_queue_struct code, this macro help to reset the structure. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* pickaxe: count regex matches only onceRené Scharfe2009-03-17
| | | | | | | | | | | | | | | | When --pickaxe-regex is used, forward past the end of matches instead of advancing to the byte after their start. This way matches count only once, even if the regular expression matches their tail -- like in the fixed-string fork of the code. E.g.: /.*/ used to count the number of bytes instead of the number of lines. /aa/ resulted in a count of two in "aaa" instead of one. Also document the fact that regexec() needs a NUL-terminated string as its second argument by adding an assert(). Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* diffcore-pickaxe: use memmem()René Scharfe2009-03-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use memmem() instead of open-coding it. The system libraries usually have a much faster version than the memcmp()-loop here. Even our own fall-back in compat/, which is used on Windows, is slightly faster. The following commands were run in a Linux kernel repository and timed, the best of five results is shown: $ STRING='Ensure that the real time constraints are schedulable.' $ git log -S"$STRING" HEAD -- kernel/sched.c >/dev/null On Ubuntu 8.10 x64, before (v1.6.2-rc2): 8.09user 0.04system 0:08.14elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+30952minor)pagefaults 0swaps And with the patch: 1.50user 0.04system 0:01.54elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+30645minor)pagefaults 0swaps On Fedora 10 x64, before: 8.34user 0.05system 0:08.39elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+29268minor)pagefaults 0swaps And with the patch: 1.15user 0.05system 0:01.20elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+32253minor)pagefaults 0swaps On Windows Vista x64, before: real 0m9.204s user 0m0.000s sys 0m0.000s And with the patch: real 0m8.470s user 0m0.000s sys 0m0.000s Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* War on whitespaceJunio C Hamano2007-06-07
| | | | | | | | | This uses "git-apply --whitespace=strip" to fix whitespace errors that have crept in to our source files over time. There are a few files that need to have trailing whitespaces (most notably, test vectors). The results still passes the test, and build result in Documentation/ area is unchanged. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* diff -S: release the image after looking for needle in itJunio C Hamano2007-05-07
| | | | Signed-off-by: Junio C Hamano <junkio@cox.net>
* diffcore-pickaxe: fix infinite loop on zero-length needleJeff King2007-01-25
| | | | | | | | | | | | | | | | | | | | | | | The "contains" algorithm runs into an infinite loop if the needle string has zero length. The loop could be modified to handle this, but it makes more sense to simply have an empty needle return no matches. Thus, a command like git log -S produces no output. We place the check at the top of the function so that we get the same results with or without --pickaxe-regex. Note that until now, git log -S --pickaxe-regex would match everything, not nothing. Arguably, an empty pickaxe string should simply produce an error message; however, this is still a useful assertion to add to the algorithm at this layer of the code. Noticed by Bill Lear. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
* simplify inclusion of system header files.Junio C Hamano2006-12-20
| | | | | | | | | | | | | | | | | | | | This is a mechanical clean-up of the way *.c files include system header files. (1) sources under compat/, platform sha-1 implementations, and xdelta code are exempt from the following rules; (2) the first #include must be "git-compat-util.h" or one of our own header file that includes it first (e.g. config.h, builtin.h, pkt-line.h); (3) system headers that are included in "git-compat-util.h" need not be included in individual C source files. (4) "git-compat-util.h" does not have to include subsystem specific header files (e.g. expat.h). Signed-off-by: Junio C Hamano <junkio@cox.net>
* On some platforms, certain headers need to be included before regex.hJohannes Schindelin2006-04-04
| | | | | | | | Happily, these are already included in cache.h, which is included anyway... so: change the order of includes. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Support for pickaxe matching regular expressionsPetr Baudis2006-04-04
| | | | | | | | | | | | | git-diff-* --pickaxe-regex will change the -S pickaxe to match POSIX extended regular expressions instead of fixed strings. The regex.h library is a rather stupid interface and I like pcre too, but with any luck it will be everywhere we will want to run Git on, it being POSIX.2 and all. I'm not sure if we can expect platforms like AIX to conform to POSIX.2 or if win32 has regex.h. We might add a flag to Makefile if there is a portability trouble potential. Signed-off-by: Petr Baudis <pasky@suse.cz>
* [PATCH] diffcore-pickaxe: switch to "counting" behaviour.Junio C Hamano2005-07-23
| | | | | | | | | | | | | Instead of finding old/new pair that one side has and the other side does not have the specified string, find old/new pair that contains the specified string as a substring different number of times. This would still not catch a case where you introduce two static variable declarations and remove two static function definitions from a file with -S"static", but would make it behave a bit more intuitively. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Do not include unused header files.Junio C Hamano2005-05-29
| | | | | | | | Some source files were including "delta.h" without actually needing it. Remove them. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Optimize diff-tree -[CM] --stdinJunio C Hamano2005-05-29
| | | | | | | | | | | | | | | | | | | | | | | | This attempts to optimize "diff-tree -[CM] --stdin", which compares successible tree pairs. This optimization does not make much sense for other commands in the diff-* brothers. When reading from --stdin and using rename/copy detection, the patch makes diff-tree to read the current index file first. This is done to reuse the optimization used by diff-cache in the non-cached case. Similarity estimator can avoid expanding a blob if the index says what is in the work tree has an exact copy of that blob already expanded. Another optimization the patch makes is to check only file sizes first to terminate similarity estimation early. In order for this to work, it needs a way to tell the size of the blob without expanding it. Since an obvious way of doing it, which is to keep all the blobs previously used in the memory, is too costly, it does so by keeping the filesize for each object it has already seen in memory. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Add --pickaxe-all to diff-* brothers.Junio C Hamano2005-05-29
| | | | | | | | | | When --pickaxe-all is given in addition to -S, pickaxe shows the entire diffs contained in the changeset, not just the diffs for the filepair that touched the sought-after string. This is useful to see the changes in context. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Introduce diff_free_filepair() funcion.Junio C Hamano2005-05-29
| | | | | | | | This introduces a new function to free a common data structure, and plugs some leaks. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Performance fix for pickaxe.Junio C Hamano2005-05-23
| | | | | | | | The pickaxe was expanding the blobs and searching in them even when it should have already known that both sides are the same. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Rename/copy detection fix.Junio C Hamano2005-05-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rename/copy detection logic in earlier round was only good enough to show patch output and discussion on the mailing list about the diff-raw format updates revealed many problems with it. This patch fixes all the ones known to me, without making things I want to do later impossible, mostly related to patch reordering. (1) Earlier rename/copy detector determined which one is rename and which one is copy too early, which made it impossible to later introduce diffcore transformers to reorder patches. This patch fixes it by moving that logic to the very end of the processing. (2) Earlier output routine diff_flush() was pruning all the "no-change" entries indiscriminatingly. This was done due to my false assumption that one of the requirements in the diff-raw output was not to show such an entry (which resulted in my incorrect comment about "diff-helper never being able to be equivalent to built-in diff driver"). My special thanks go to Linus for correcting me about this. When we produce diff-raw output, for the downstream to be able to tell renames from copies, sometimes it _is_ necessary to output "no-change" entries, and this patch adds diffcore_prune() function for doing it. (3) Earlier diff_filepair structure was trying to be not too specific about rename/copy operations, but the purpose of the structure was to record one or two paths, which _was_ indeed about rename/copy. This patch discards xfrm_msg field which was trying to be generic for this wrong reason, and introduces a couple of fields (rename_score and rename_rank) that are explicitly specific to rename/copy logic. One thing to note is that the information in a single diff_filepair structure _still_ does not distinguish renames from copies, and it is deliberately so. This is to allow patches to be reordered in later stages. (4) This patch also adds some tests about diff-raw format output and makes sure that necessary "no-change" entries appear on the output. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Diffcore updates.Junio C Hamano2005-05-22
| | | | | | | | | | | | | | This moves the path selection logic from individual programs to a new diffcore transformer (diff-tree still needs to have its own for performance reasons). Also the header printing code in diff-tree was tweaked not to produce anything when pickaxe is in effect and there is nothing interesting to report. An interesting example is the following in the GIT archive itself: $ git-whatchanged -p -C -S'or something in a real script' Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] The diff-raw format updates.Junio C Hamano2005-05-21
| | | | | | | | | | | | | | | | | Update the diff-raw format as Linus and I discussed, except that it does not use sequence of underscore '_' letters to express nonexistence. All '0' mode is used for that purpose instead. The new diff-raw format can express rename/copy, and the earlier restriction that -M and -C _must_ be used with the patch format output is no longer necessary. The patch makes -M and -C flags independent of -p flag, so you need to say git-whatchanged -M -p to get the diff/patch format. Updated are both documentations and tests. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Prepare diffcore interface for diff-tree header supression.Junio C Hamano2005-05-21
| | | | | | | | This does not actually supress the extra headers when pickaxe is used, but prepares enough support for diff-tree to implement it. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Introducing software archaeologist's tool "pickaxe".Junio C Hamano2005-05-21
This steals the "pickaxe" feature from JIT and make it available to the bare Plumbing layer. From the command line, the user gives a string he is intersted in. Using the diff-core infrastructure previously introduced, it filters the differences to limit the output only to the diffs between <src> and <dst> where the string appears only in one but not in the other. For example: $ ./git-rev-list HEAD | ./git-diff-tree -Sdiff-tree-helper --stdin -M would show the diffs that touch the string "diff-tree-helper". In real software-archaeologist application, you would typically look for a few to several lines of code and see where that code came from. The "pickaxe" module runs after "rename/copy detection" module, so it even crosses the file rename boundary, as the above example demonstrates. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>