aboutsummaryrefslogtreecommitdiff
path: root/csum-file.c
Commit message (Collapse)AuthorAge
* pack-objects: use fixup_pack_header_footer()'s validation modeNicolas Pitre2008-08-29
| | | | | | | | | When limiting the pack size, a new header has to be written to the pack and a new SHA1 computed. Make sure that the SHA1 of what is being read back matches the SHA1 of what was written. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Make pack creation always fsync() the resultLinus Torvalds2008-05-31
| | | | | | | | | | This means that we can depend on packs always being stable on disk, simplifying a lot of the object serialization worries. And unlike loose objects, serializing pack creation IO isn't going to be a performance killer. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* remove dead code from the csum-file interfaceNicolas Pitre2007-11-05
| | | | | | | | | | | | The provided name argument is always constant and valid in every caller's context, so no need to have an array of PATH_MAX chars to copy it into when a simple pointer will do. Unfortunately that means getting rid of wascally wabbits too. The 'error' field is also unused. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* make display of total transferred more accurateNicolas Pitre2007-11-05
| | | | | | | | | | | | | | | The throughput display needs a delay period before accounting and displaying anything. Yet it might be called after some amount of data has already been transferred. The display of total data is therefore accounted late and therefore smaller than the reality. Let's call display_throughput() with an absolute amount of transferred data instead of a relative number, and let the throughput code find the relative amount of data by itself as needed. This way the displayed total is always exact. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* add throughput display to git-pushNicolas Pitre2007-10-30
| | | | | | | | | This one triggers only when git-pack-objects is called with --all-progress and --stdout which is the combination used by git-push. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* pack-objects.c: fix some global variable abuse and memory leaksNicolas Pitre2007-10-17
| | | | | | | | | | | To keep things well layered, sha1close() now returns the file descriptor when it doesn't close the file. An ugly cast was added to the return of write_idx_file() to avoid a warning. A proper fix will come separately. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* More staticJunio C Hamano2007-06-13
| | | | | | There still are quite a few symbols that ought to be static. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Alter sha1close() 3rd argument to request flush onlyDana L. How2007-05-20
| | | | | | | | | update=0 suppressed writing the final SHA-1 but was not used. Now final=0 suppresses SHA-1 finalization, SHA-1 writing, and closing -- in other words, only flush the buffer. Signed-off-by: Dana L. How <danahow@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Custom compression levels for objects and packsDana How2007-05-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add config variables pack.compression and core.loosecompression , and switch --compression=level to pack-objects. Loose objects will be compressed using core.loosecompression if set, else core.compression if set, else Z_BEST_SPEED. Packed objects will be compressed using --compression=level if seen, else pack.compression if set, else core.compression if set, else Z_DEFAULT_COMPRESSION. This is the "pack compression level". Loose objects added to a pack undeltified will be recompressed to the pack compression level if it is unequal to the current loose compression level by the preceding rules, or if the loose object was written while core.legacyheaders = true. Newly deltified loose objects are always compressed to the current pack compression level. Previously packed objects added to a pack are recompressed to the current pack compression level exactly when their deltification status changes, since the previous pack data cannot be reused. In either case, the --no-reuse-object switch from the first patch below will always force recompression to the current pack compression level, instead of assuming the pack compression level hasn't changed and pack data can be reused when possible. This applies on top of the following patches from Nicolas Pitre: [PATCH] allow for undeltified objects not to be reused [PATCH] make "repack -f" imply "pack-objects --no-reuse-object" Signed-off-by: Dana L. How <danahow@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
* compute a CRC32 for each object as stored in a packNicolas Pitre2007-04-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The most important optimization for performance when repacking is the ability to reuse data from a previous pack as is and bypass any delta or even SHA1 computation by simply copying the raw data from one pack to another directly. The problem with this is that any data corruption within a copied object would go unnoticed and the new (repacked) pack would be self-consistent with its own checksum despite containing a corrupted object. This is a real issue that already happened at least once in the past. In some attempt to prevent this, we validate the copied data by inflating it and making sure no error is signaled by zlib. But this is still not perfect as a significant portion of a pack content is made of object headers and references to delta base objects which are not deflated and therefore not validated when repacking actually making the pack data reuse still not as safe as it could be. Of course a full SHA1 validation could be performed, but that implies full data inflating and delta replaying which is extremely costly, which cost the data reuse optimization was designed to avoid in the first place. So the best solution to this is simply to store a CRC32 of the raw pack data for each object in the pack index. This way any object in a pack can be validated before being copied as is in another pack, including header and any other non deflated data. Why CRC32 instead of a faster checksum like Adler32? Quoting Wikipedia: Jonathan Stone discovered in 2001 that Adler-32 has a weakness for very short messages. He wrote "Briefly, the problem is that, for very short packets, Adler32 is guaranteed to give poor coverage of the available bits. Don't take my word for it, ask Mark Adler. :-)" The problem is that sum A does not wrap for short messages. The maximum value of A for a 128-byte message is 32640, which is below the value 65521 used by the modulo operation. An extended explanation can be found in RFC 3309, which mandates the use of CRC32 instead of Adler-32 for SCTP, the Stream Control Transmission Protocol. In the context of a GIT pack, we have lots of small objects, especially deltas, which are likely to be quite small and in a size range for which Adler32 is dimed not to be sufficient. Another advantage of CRC32 is the possibility for recovery from certain types of small corruptions like single bit errors which are the most probable type of corruptions. OK what this patch does is to compute the CRC32 of each object written to a pack within pack-objects. It is not written to the index yet and it is obviously not validated when reusing pack data yet either. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Convert memcpy(a,b,20) to hashcpy(a,b).Shawn Pearce2006-08-23
| | | | | | | | | | | | | | | | | | | | | | | This abstracts away the size of the hash values when copying them from memory location to memory location, much as the introduction of hashcmp abstracted away hash value comparsion. A few call sites were using char* rather than unsigned char* so I added the cast rather than open hashcpy to be void*. This is a reasonable tradeoff as most call sites already use unsigned char* and the existing hashcmp is also declared to be unsigned char*. [jc: Splitted the patch to "master" part, to be followed by a patch for merge-recursive.c which is not in "master" yet. Fixed the cast in the latter hunk to combine-diff.c which was wrong in the original. Also converted ones left-over in combine-diff.c, diff-lib.c and upload-pack.c ] Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Make sha1flush void and remove conditional return.David Rientjes2006-08-14
| | | | | Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Make zlib compression level configurable, and change default.Joachim B Haga2006-07-03
| | | | | | | | | | | | | | | | With the change in default, "git add ." on kernel dir is about twice as fast as before, with only minimal (0.5%) change in object size. The speed difference is even more noticeable when committing large files, which is now up to 8 times faster. The configurability is through setting core.compression = [-1..9] which maps to the zlib constants; -1 is the default, 0 is no compression, and 1..9 are various speed/size tradeoffs, 9 being slowest. Signed-off-by: Joachim B Haga (cjhaga@fys.uio.no) Acked-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* Remove all void-pointer arithmetic.Florian Forster2006-06-20
| | | | | | | | ANSI C99 doesn't allow void-pointer arithmetic. This patch fixes this in various ways. Usually the strategy that required the least changes was used. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* xread/xwrite: do not worry about EINTR at calling sites.Junio C Hamano2005-12-19
| | | | | | | | | | We had errno==EINTR check after read(2)/write(2) sprinkled all over the places, always doing continue. Consolidate them into xread()/xwrite() wrapper routines. Credits for suggestion goes to HPA -- bugs are mine. Signed-off-by: Junio C Hamano <junkio@cox.net>
* [PATCH] Plug memory leak in sha1close()Sergey Vlasov2005-08-08
| | | | | | | | sha1create() and sha1fd() malloc the returned struct sha1file; sha1close() should free it. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net>
* [PATCH] Let umask do its work upon filesystem object creation.Junio C Hamano2005-07-06
| | | | | | | | IIRC our strategy was to let the users' umask take care of the final mode bits. This patch fixes places that deviate from it. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* csum-file: add "sha1fd()" to create a SHA1 csum file from an existing file ↵Linus Torvalds2005-06-28
| | | | | | descriptor We'll use this soon to write pack-files to stdout.
* csum-file: fix missing buf pointer updateLinus Torvalds2005-06-27
| | | | This would create broken pack archives for anything nontrivial.
* csum-file interface updates: return resulting SHA1Linus Torvalds2005-06-26
| | | | | | | | | | | Also, make the writing of the SHA1 as a end-header be conditional: not every user will necessarily want to write the SHA1 to the file itself, even though current users do (but we migh end up using the same helper functions for the object files themselves, that don't do this). This also makes the packed index file contain the SHA1 of the packed data file at the end (just before its own SHA1). That way you can validate the pairing of the two if you want to.
* git-pack-objects: write the pack files with a SHA1 csumLinus Torvalds2005-06-26
We want to be able to check their integrity later, and putting the sha1-sum of the contents at the end is a good thing. The writing routines are generic, so we could try to re-use them for the index file, instead of having the same logic duplicated. Update unpack-objects to know about the extra 20 bytes at the end of the index.