diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2007-10-25 11:23:26 -0700 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2007-10-26 23:18:06 -0700 |
commit | 9027f53cb5051bf83a0254e7f8aeb5d1a206de0b (patch) | |
tree | 3ee3d48a47f89d7c41c5a9e4430f0db6507f4d9e /hash.h | |
parent | 644797119d7a3b7a043a51a9cccd8758f8451f91 (diff) | |
download | git-9027f53cb5051bf83a0254e7f8aeb5d1a206de0b.tar.gz git-9027f53cb5051bf83a0254e7f8aeb5d1a206de0b.tar.xz |
Do linear-time/space rename logic for exact renames
This implements a smarter rename detector for exact renames, which
rather than doing a pairwise comparison (time O(m*n)) will just hash the
files into a hash-table (size O(n+m)), and only do pairwise comparisons
to renames that have the same hash (time O(n+m) except for unrealistic
hash collissions, which we just cull aggressively).
Admittedly the exact rename case is not nearly as interesting as the
generic case, but it's an important case none-the-less. A similar general
approach should work for the generic case too, but even then you do need
to handle the exact renames/copies separately (to avoid the inevitable
added cost factor that comes from the _size_ of the file), so this is
worth doing.
In the expectation that we will indeed do the same hashing trick for the
general rename case, this code uses a generic hash-table implementation
that can be used for other things too. In fact, we might be able to
consolidate some of our existing hash tables with the new generic code
in hash.[ch].
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'hash.h')
-rw-r--r-- | hash.h | 43 |
1 files changed, 43 insertions, 0 deletions
@@ -0,0 +1,43 @@ +#ifndef HASH_H +#define HASH_H + +/* + * These are some simple generic hash table helper functions. + * Not necessarily suitable for all users, but good for things + * where you want to just keep track of a list of things, and + * have a good hash to use on them. + * + * It keeps the hash table at roughly 50-75% free, so the memory + * cost of the hash table itself is roughly + * + * 3 * 2*sizeof(void *) * nr_of_objects + * + * bytes. + * + * FIXME: on 64-bit architectures, we waste memory. It would be + * good to have just 32-bit pointers, requiring a special allocator + * for hashed entries or something. + */ +struct hash_table_entry { + unsigned int hash; + void *ptr; +}; + +struct hash_table { + unsigned int size, nr; + struct hash_table_entry *array; +}; + +extern void *lookup_hash(unsigned int hash, struct hash_table *table); +extern void **insert_hash(unsigned int hash, void *ptr, struct hash_table *table); +extern int for_each_hash(struct hash_table *table, int (*fn)(void *)); +extern void free_hash(struct hash_table *table); + +static inline void init_hash(struct hash_table *table) +{ + table->size = 0; + table->nr = 0; + table->array = NULL; +} + +#endif |