org/kernel/git.git - forked from git/git

	Commit message (Collapse)	Author	Age
*	Merge branch 'nd/stream-pack-objects'	Junio C Hamano	2012-06-28
\|\ \| \| \| \| \| \| \| \|	"pack-objects" learned to read large loose blobs using the streaming API, without the need to hold everything in core at once.
\| *	pack-objects: use streaming interface for reading large loose blobs	Nguyễn Thái Ngọc Duy	2012-05-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git usually streams large blobs directly to packs. But there are cases where git can create large loose blobs (unpack-objects or hash-object over pipe). Or they can come from other git implementations. core.bigfilethreshold can also be lowered down and introduce a new wave of large loose blobs. Use streaming interface to read/compress/write these blobs in one go. Fall back to normal way if somehow streaming interface cannot be used. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* \|	index-pack: use streaming interface on large blobs (most of the time)	Nguyễn Thái Ngọc Duy	2012-05-23
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
*	archive-zip: streaming for deflated files	René Scharfe	2012-05-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After an entry has been streamed out, its CRC and sizes are written as part of a data descriptor. For simplicity, we make the buffer for the compressed chunks twice as big as for the uncompressed ones, to be sure the result fit in even if deflate makes them bigger. t5000 verifies output. t1050 makes sure the command always respects core.bigfilethreshold Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
*	archive-zip: streaming for stored files	René Scharfe	2012-05-03
\| \| \| \| \| \| \| \| \| \| \| \| \|	Write a data descriptor containing the CRC of the entry and its sizes after streaming it out. For simplicity, do that only if we're storing files (option -0) for now. t5000 verifies output. t1050 makes sure the command always respects core.bigfilethreshold Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
*	archive-tar: stream large blobs to tar file	Nguyễn Thái Ngọc Duy	2012-05-03
\| \| \| \| \| \| \| \|	t5000 verifies output while t1050 makes sure the command always respects core.bigfilethreshold Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
*	update-server-info: respect core.bigfilethreshold	Nguyễn Thái Ngọc Duy	2012-03-07
\| \| \| \| \| \| \| \| \| \|	This command indirectly calls check_sha1_signature() (add_info_ref -> deref_tag -> parse_object -> ..) , which may put whole blob in memory if the blob's size is under core.bigfilethreshold. As config is not read, the threshold is always 512MB. Respect user settings here. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
*	show: use streaming API for showing blobs	Nguyễn Thái Ngọc Duy	2012-03-07
\| \| \| \| \|	Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
*	cat-file: use streaming API to print blobs	Nguyễn Thái Ngọc Duy	2012-03-07
\| \| \| \| \|	Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
*	Add more large blob test cases	Nguyễn Thái Ngọc Duy	2012-03-07
\| \| \| \| \| \| \| \| \| \| \| \| \|	New test cases list commands that should work when memory is limited. All memory allocation functions () learn to reject any allocation larger than $GIT_ALLOC_LIMIT if set. () Not exactly all. Some places do not use x* functions, but malloc/calloc directly, notably diff-delta. These code path should never be run on large blobs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
*	bulk-checkin: replace fast-import based implementation	Junio C Hamano	2011-12-01
\| \| \| \| \| \| \| \| \|	This extends the earlier approach to stream a large file directly from the filesystem to its own packfile, and allows "git add" to send large files directly into a single pack. Older code used to spawn fast-import, but the new bulk-checkin API replaces it. Signed-off-by: Junio C Hamano <gitster@pobox.com>
*	Bigfile: teach "git add" to send a large file straight to a pack	Junio C Hamano	2011-05-13
	When adding a new content to the repository, we have always slurped the blob in its entirety in-core first, and computed the object name and compressed it into a loose object file. Handling large binary files (e.g. video and audio asset for games) has been problematic because of this design. At the middle level of "git add" callchain is an internal API index_fd() that takes an open file descriptor to read from the working tree file being added with its size. Teach it to call out to fast-import when adding a large blob. The write-out codepath in entry.c::write_entry() should be taught to stream, instead of reading everything in core. This should not be so hard to implement, especially if we limit ourselves only to loose object files and non-delta representation in packfiles. Signed-off-by: Junio C Hamano <gitster@pobox.com>