logmsg_reencode: lazily load missing commit buffers

Usually a commit that makes it to logmsg_reencode will have been parsed, and the commit->buffer struct member will be valid. However, some code paths will free commit buffers after having used them (for example, the log traversal machinery will do so to keep memory usage down). Most of the time this is fine; log should only show a commit once, and then exits. However, there are some code paths where this does not work. At least two are known: 1. A commit may be shown as part of a regular ref, and then it may be shown again as part of a submodule diff (e.g., if a repo contains refs to both the superproject and subproject). 2. A notes-cache commit may be shown during "log --all", and then later used to access a textconv cache during a diff. Lazily loading in logmsg_reencode does not necessarily catch all such cases, but it should catch most of them. Users of the commit buffer tend to be either parsing for structure (in which they will call parse_commit, and either we will already have parsed, or we will load commit->buffer lazily there), or outputting (either to the user, or fetching a part of the commit message via format_commit_message). In the latter case, we should always be using logmsg_reencode anyway (and typically we do so via the pretty-print machinery). If there are any cases that this misses, we can fix them up to use logmsg_reencode (or handle them on a case-by-case basis if that is inappropriate). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
author: Jeff King <peff@peff.net> 2013-01-26 04:44:28 -0500
committer: Junio C Hamano <gitster@pobox.com> 2013-01-26 13:28:22 -0800
commit: be5c9fb9049ed470e7005f159bb923a5f4de1309 (patch)
tree: 41dbf31b64bb82400bad0c20ec3cee0f1e3d9c90 /pretty.c
parent: dd0d388c44c28ebc021a24eeddc60287d4ea249c (diff)
download: git-be5c9fb9049ed470e7005f159bb923a5f4de1309.tar.gz
git-be5c9fb9049ed470e7005f159bb923a5f4de1309.tar.xz
1 files changed, 49 insertions, 8 deletions
diff --git a/pretty.c b/pretty.c
index c67534943..eae57ad9d 100644
--- a/pretty.c
+++ b/pretty.c
@@ -592,18 +592,59 @@ char *logmsg_reencode(const struct commit *commit,
 	char *msg = commit->buffer;
 	char *out;
 
+	if (!msg) {
+		enum object_type type;
+		unsigned long size;
+
+		msg = read_sha1_file(commit->object.sha1, &type, &size);
+		if (!msg)
+			die("Cannot read commit object %s",
+			    sha1_to_hex(commit->object.sha1));
+		if (type != OBJ_COMMIT)
+			die("Expected commit for '%s', got %s",
+			    sha1_to_hex(commit->object.sha1), typename(type));
+	}
+
 	if (!output_encoding || !*output_encoding)
 		return msg;
 	encoding = get_header(commit, msg, "encoding");
 	use_encoding = encoding ? encoding : utf8;
-	if (same_encoding(use_encoding, output_encoding))
-		if (encoding) /* we'll strip encoding header later */
-			out = xstrdup(commit->buffer);
-		else
-			return msg; /* nothing to do */
-	else
-		out = reencode_string(commit->buffer,
-				      output_encoding, use_encoding);
+	if (same_encoding(use_encoding, output_encoding)) {
+		/*
+		 * No encoding work to be done. If we have no encoding header
+		 * at all, then there's nothing to do, and we can return the
+		 * message verbatim (whether newly allocated or not).
+		 */
+		if (!encoding)
+			return msg;
+
+		/*
+		 * Otherwise, we still want to munge the encoding header in the
+		 * result, which will be done by modifying the buffer. If we
+		 * are using a fresh copy, we can reuse it. But if we are using
+		 * the cached copy from commit->buffer, we need to duplicate it
+		 * to avoid munging commit->buffer.
+		 */
+		out = msg;
+		if (out == commit->buffer)
+			out = xstrdup(out);
+	}
+	else {
+		/*
+		 * There's actual encoding work to do. Do the reencoding, which
+		 * still leaves the header to be replaced in the next step. At
+		 * this point, we are done with msg. If we allocated a fresh
+		 * copy, we can free it.
+		 */
+		out = reencode_string(msg, output_encoding, use_encoding);
+		if (out && msg != commit->buffer)
+			free(msg);
+	}
+
+	/*
+	 * This replacement actually consumes the buffer we hand it, so we do
+	 * not have to worry about freeing the old "out" here.
+	 */
 	if (out)
 		out = replace_encoding_header(out, output_encoding);
author	Jeff King <peff@peff.net>	2013-01-26 04:44:28 -0500
committer	Junio C Hamano <gitster@pobox.com>	2013-01-26 13:28:22 -0800
commit	be5c9fb9049ed470e7005f159bb923a5f4de1309 (patch)
tree	41dbf31b64bb82400bad0c20ec3cee0f1e3d9c90 /pretty.c
parent	dd0d388c44c28ebc021a24eeddc60287d4ea249c (diff)
download	git-be5c9fb9049ed470e7005f159bb923a5f4de1309.tar.gz git-be5c9fb9049ed470e7005f159bb923a5f4de1309.tar.xz