#+TITLE: Git Resurrecting History
#+DESCRIPTION: Resurrect Lost History from the Information Manager from Hell
#+TAGS: Git
#+TAGS: Tips and Tricks
#+TAGS: How-to
#+DATE: 2016-09-04
#+SLUG: git-resurrecting-history
#+LINK: man-git-reflog https://www.kernel.org/pub/software/scm/git/docs/git-reflog.html
#+LINK: git-scm-book-internals https://git-scm.com/book/en/v2/Git-Internals-Git-References
#+LINK: man-git-fsck https://www.kernel.org/pub/software/scm/git/docs/git-fsck.html
#+LINK: kb-git-in-reverse https://kennyballou.com/blog/2016/01/git-in-reverse/
#+LINK: man-git-show https://www.kernel.org/pub/software/scm/git/docs/git-show.html
#+LINK: man-git-cat-file https://www.kernel.org/pub/software/scm/git/docs/git-cat-file.html
#+LINK: man-git-reset https://www.kernel.org/pub/software/scm/git/docs/git-reset.html
#+LINK: man-git-rebase https://www.kernel.org/pub/software/scm/git/docs/git-rebase.html
#+LINK: xkcd-git https://xkcd.com/1597
#+BEGIN_PREVIEW
We all make mistakes. They are inevitable. We must accept that we make them
and move on. But mistakes in Git seem overly complex to resolve, and most
people simply resort to cloning anew, copying over the working tree (or some
subset of it), and moving on. To me, however, this seems like a waste of
bandwidth, as most issues resulting in broken history are in fact quite easy
to resolve, especially once the necessary tools are known.
#+END_PREVIEW

** Git Reflog
:PROPERTIES:
:CUSTOM_ID: git-reflog
:END:

#+BEGIN_QUOTE
Reference logs, or "reflogs", record when the tips of branches and other
references were updated in the local repository.

--[[man-git-reflog][~git-reflog(1)~]]
#+END_QUOTE

That is, the reference log is the (meta)log of the actions against branches
(tips) and other [[git-scm-book-internals][references]]. Every time we commit, merge, change branches, or
perform /any/ action that might alter the commit a reference points to, this
change is stored in the reflog of the current repository.
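
To see the reflog at work, a throwaway repository is enough. The following is
a minimal sketch, assuming a POSIX shell and a reasonably recent Git; the file
name, commit messages, and identity are made up for illustration. Each reflog
entry gets a ~HEAD@{n}~ name, where ~n~ counts back from the most recent
update, so after "losing" a commit with ~git reset --hard~, ~HEAD@{1}~ still
names it:

#+BEGIN_SRC shell
set -eu

repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git config user.name 'Example User'        # hypothetical identity
git config user.email 'user@example.com'

echo 1 > foo && git add foo && git commit -q -m 'first'
echo 2 > foo && git commit -q -am 'second'
lost=$(git rev-parse HEAD)                 # kept only to check the result below

git reset -q --hard HEAD~1                 # "lose" the second commit

# Newest entry first: the reset, then both commits.
git reflog
# HEAD@{1} is where HEAD pointed just before the reset, i.e. "second".
test "$(git rev-parse 'HEAD@{1}')" = "$lost"

git reset -q --hard 'HEAD@{1}'             # move the branch back to "second"
git log --oneline
#+END_SRC

Because ~HEAD@{n}~ counts reference updates rather than commits, the same
entry gets a new index after every further action, so it is worth re-reading
~git reflog~ right before relying on one of these shortcuts.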

For a freshly cloned repository, the reflog will be quite boring, i.e., a
single entry for the initial clone. However, after working on a project for a
while, the reflog will have quite the history of actions performed. For
example, here are the first 24 lines of the reflog for this blog's repository:

#+BEGIN_EXAMPLE
a1bbd00 HEAD@{0}: checkout: moving from master to git_resurrection
a1bbd00 HEAD@{1}: commit: Update paths of SSL certificate and key
d7fd8f8 HEAD@{2}: commit: Add all targets to phony
f639cbe HEAD@{3}: commit: Add phony target list
8f3bba4 HEAD@{4}: commit: Add build to deploy dependency
5331695 HEAD@{5}: merge elixir_releases: Fast-forward
1a27df5 HEAD@{6}: checkout: moving from elixir_functional_fib to master
61f755b HEAD@{7}: checkout: moving from master to elixir_functional_fib
1a27df5 HEAD@{8}: checkout: moving from elixir_releases to master
5331695 HEAD@{9}: rebase -i (finish): returning to refs/heads/elixir_releases
5331695 HEAD@{10}: rebase -i (squash): Add Elixir OTP Releases Post
07f3995 HEAD@{11}: rebase -i (squash): # This is a combination of 4 commits.
9b7bc7b HEAD@{12}: rebase -i (squash): # This is a combination of 3 commits.
06414a7 HEAD@{13}: rebase -i (squash): # This is a combination of 2 commits.
cb59962 HEAD@{14}: rebase -i (start): checkout HEAD~5
bf8836f HEAD@{15}: commit: WIP: elixir otp releases
34bc98a HEAD@{16}: commit: WIP: update ends
00fc016 HEAD@{17}: commit: WIP: elixir otp releases
e859353 HEAD@{18}: commit: WIP: elixir otp release post
cb59962 HEAD@{19}: commit: WIP: elixir otp releases post
1a27df5 HEAD@{20}: checkout: moving from master to elixir_releases
1a27df5 HEAD@{21}: checkout: moving from elixir_functional_fib to master
61f755b HEAD@{22}: commit: WIP: some post about fib
4137e6e HEAD@{23}: checkout: moving from master to elixir_functional_fib
#+END_EXAMPLE

The first column is the SHA-1 of the commit that is the /result/ of the
action. The second column provides a shortcut reference that can be used
anywhere a regular reference can be. The third column is the action, e.g.,
~checkout~, ~commit~, or ~merge~, followed by a short description of the
action. In the case of commits, the description is the summary line of the
commit message.

From the reflog, we can see that I recently made a branch for this post;
before that, I made several commits against the ~master~ branch; and before
that, I performed a fast-forward merge of the local ~elixir_releases~ branch
into the ~master~ branch. And so on. This is powerful information for digging
into the history of the repository, and the reflog is indispensable for
working out how to recover lost changes.

** Git Fsck
:PROPERTIES:
:CUSTOM_ID: git-fsck
:END:

[[man-git-reflog][~git-reflog(1)~]] is a very useful tool, but history can also be lost by
becoming "unreachable". This is where [[man-git-fsck][~git-fsck(1)~]] can help!
[[man-git-fsck][~git-fsck(1)~]] searches the Git object store and reports objects that are
dangling or unreachable from a named reference. This way, we can find
commits, or even blobs, that have been lost to us because no reference leads
to them through Git's directed acyclic graph (DAG) of history, even though
they /do/ still exist in the object store itself.
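
A small sketch shows how an object ends up in this state and how the reflog
affects what ~git fsck~ reports. It assumes a POSIX shell and a throwaway
repository with a made-up identity and commit messages. Rewording a commit
with ~git commit --amend~ writes a /new/ commit object; the old one is then
referenced only by the reflog, so plain ~git fsck~ stays quiet, while
~git fsck --no-reflogs~ (which ignores the reflogs) reports it as dangling:

#+BEGIN_SRC shell
set -eu

repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git config user.name 'Example User'        # hypothetical identity
git config user.email 'user@example.com'

echo 1 > foo && git add foo && git commit -q -m 'first'
old=$(git rev-parse HEAD)
git commit -q --amend -m 'first, reworded' # same tree, new commit object

# The old commit is still recorded in the reflog, so plain fsck treats it
# as reachable and should report nothing here.
git fsck 2>/dev/null

# With the reflogs ignored, nothing points at the old commit any more,
# so this should list it as a dangling commit.
git fsck --no-reflogs 2>/dev/null
echo "the amended-away commit was $old"
#+END_SRC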

For example, running ~git fsck~ on this blog's repository yields the following
output:

#+BEGIN_EXAMPLE
± git fsck
Checking object directories: 100% (256/256), done.
Checking objects: 100% (150/150), done.
dangling commit 16f6063abde9dcd8279fb2a7ddd4998aaf44acc7
#+END_EXAMPLE

Now, if we add another option, namely ~--unreachable~, we get the following:

#+BEGIN_EXAMPLE
± git fsck --unreachable
unreachable blob 20c1e21948ab5d9553c11fa8a7230d73055c207e
unreachable commit 16f6063abde9dcd8279fb2a7ddd4998aaf44acc7
unreachable commit 41a324739bc3f1d265ecc474c58256e3a4ad4982
unreachable blob c4131dc6d091b1c16943554fa2396f5d405e8537
#+END_EXAMPLE

Furthermore, objects listed in the reflog are considered "reachable", but may
still elude our search. Adding ~--no-reflogs~ to [[man-git-fsck][~git-fsck(1)~]] makes these
objects visible as well:

#+BEGIN_EXAMPLE
± git fsck --unreachable --no-reflogs
unreachable commit 00fc0164a78fe6b46e56781d434fdbb893f11534
unreachable blob 18a484273f75e4a3dcac75cb5229a614f6090be0
unreachable commit 1cdc30ebd6ebbaba4a8c28fb35457a8d5cb4326f
unreachable blob 27c4af632030e3d794181024fba120c6db44eef5
unreachable commit 31a0e98166bc48bf1f725a657e27632c99568da0
unreachable commit 34bc98ae27f3db69df82b186cf2ef8a86b42ea12
unreachable commit 8f08be163f185dd130a86d67daf61639632c4e20
unreachable commit bf8836f2e435ee241ebe53f0eae4ee98bd887082
unreachable commit 06414a75d58cee81fb2035b8af45a543c6bb09ef
unreachable blob 1f853af2881919bc62321b536bfc0de6e9602db6
unreachable blob 20c1e21948ab5d9553c11fa8a7230d73055c207e
unreachable commit 54cd8b9b5c58409ce3f509e74d5a7a7ac4a73309
unreachable commit a9693871e765355b6d9a57a612a76f454b177da0
unreachable commit ad45856329ff97bd35ac17325952c21e53d51b28
unreachable blob b8154e42d08b74ae6b9817e12b7764d55760c86e
unreachable commit cb599620e2d364e2ab44ada45f16df05c5fe3f51
unreachable commit e859353ddc681177141d84a0053b9b8ecad1151e
unreachable blob fed50bb1d7c749767de7589cc8ef0acf8caf8226
unreachable blob 056a7e48130d8d22227367ae9753cb5c9afe2d39
unreachable commit 16f6063abde9dcd8279fb2a7ddd4998aaf44acc7
unreachable commit 54def8ee3ea0c7043767185e5900480d24ddb351
unreachable commit 65d2a1553e3c1dd745afa318135a5957e50dd6ef
unreachable commit 741afdc2f13e76bd0c48e1df7419b37e57733de3
unreachable commit 7bb6b449ced0493f2d3cc975157aefa84b082e04
unreachable commit 7e067ad694538a410f98732ce3052546aadc0240
unreachable commit 809e9d1f131f54701325357199643505773f5d25
unreachable blob 8802d6dcac8b14399ca4082987a76be4b179333c
unreachable blob 8b82ffa1eb05ef3306ab62e1120f77a80a887d94
unreachable commit 9af67536e6852fe928934ba0950809597d73a173
unreachable blob b23eefdac6b2056e25c748679958179bdbd8f81f
unreachable blob b66ef50f82242ec929141cf3246278c6160e230a
unreachable blob c2fa5a98fe1010a1255f032ba34a612e404c7062
unreachable blob dd42939b3f6cf542064eb011b74749195c951957
unreachable commit 07f39952cd161438ff4b208b6cb10b287881db85
unreachable blob 1c0327c6a73923e932eb4f4bf877f660bd13a7b0
unreachable commit 41a324739bc3f1d265ecc474c58256e3a4ad4982
unreachable commit 74671b411e2cf1209bc681f0349e24ef7fe00f19
unreachable commit 9437cbb0500b22a57a62e2cf0a512b1b56ce6a96
unreachable commit 9a0f5f8c63c184cd5082f27dbe513b3e683bc1ad
unreachable commit 9b7bc7bf0f01a84621e23bfa02e0a09f63da1747
unreachable commit bce7c8dbcc56e6935015a5fb2c74224bb8d9f768
unreachable blob c4131dc6d091b1c16943554fa2396f5d405e8537
unreachable blob c69782e19aee6d89de4f6bcf9ed14813f72c8c10
unreachable blob d79fb0b95796290c33d6f3dee004235dad7d8893
unreachable commit dabb01b3df1371602f3f0689d25359597db54423
unreachable blob ec2ba85be58685070a44727bc2591b9a32eb6457
#+END_EXAMPLE

Using these hashes, we can inspect the objects with other [[kb-git-in-reverse][familiar tools]],
namely [[man-git-show][~git-show(1)~]] and [[man-git-cat-file][~git-cat-file(1)~]], to figure out whether they are worth
resurrecting, or whether they are even the objects we want to resurrect.
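
Going through such a list by hand gets tedious. The following is a minimal
sketch of automating the first pass, assuming a POSIX shell and a throwaway
repository in which a hypothetical ~topic/doomed~ branch has just been
deleted. It walks the ~git fsck --unreachable --no-reflogs~ output,
summarizes each commit with ~git show -s~, prints the size of each blob with
~git cat-file -s~, and skips trees:

#+BEGIN_SRC shell
set -eu

repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git config user.name 'Example User'        # hypothetical identity
git config user.email 'user@example.com'

echo 1 > foo && git add foo && git commit -q -m 'keep me'
git checkout -q -b topic/doomed            # hypothetical branch we will delete
echo 2 >> foo && git commit -q -am 'doomed work'
git checkout -q -                          # back to the previous branch
git branch -q -D topic/doomed

# Each line of output looks like "unreachable <type> <sha>".
git fsck --unreachable --no-reflogs 2>/dev/null |
while read -r state type sha; do
    case $type in
        commit) git show -s --format='commit %h %s' "$sha" ;;
        blob)   printf 'blob   %s %s bytes\n' \
                    "$(git rev-parse --short "$sha")" "$(git cat-file -s "$sha")" ;;
        *)      ;;                         # trees are skipped
    esac
done
#+END_SRC

Blobs carry no file name, so for any blob that looks interesting,
~git cat-file -p~ on its hash is the natural next step.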

** Resurrection Example
:PROPERTIES:
:CUSTOM_ID: resurrection-example
:END:

Now that we have some tools, let's examine a situation where a change to the
history was made that needs to be corrected: deleting branch references.

Let's assume we are working on a topic branch for some new awesome feature.
However, after some development, we discover this solution might not be worth
pursuing anymore. In a fit of rage over our wasted effort, we dump the branch.

Perhaps several days go by, and we discover we want to look back at something
we did in that previous branch, for some reason or another, but we certainly
don't remember the commit hash of that branch.

For concreteness, let's create a repository that will demonstrate this
problem:

#+BEGIN_EXAMPLE
$ cd $(mktemp -d)
$ git init foobar
$ cd foobar
± touch foo
± git add foo
± git commit -m 'initial commit'
± touch bar
± git add bar
± git commit -m 'add bar'
± git log --oneline
1cf706a add bar
11d3501 initial commit
#+END_EXAMPLE

#+BEGIN_QUOTE
I created this example repository in a temporary directory because it's not
likely to be useful after the demonstration of this problem. Feel free to
create the repository wherever you please, provided you are following along.
#+END_QUOTE

From here, we may decide to branch and start working on our epic topic branch:

#+BEGIN_EXAMPLE
± git checkout -b topic/epic_feature
± echo 1 >> foo
± git commit -am 'update foo: add 1'
± echo 2 >> bar
± git commit -am 'update bar: add 2'
± touch foobar
± git add foobar
± git commit -m 'add foobar'
± git log --oneline
2e0bcc6 add foobar
f2239ca update bar: add 2
32d8e6d update foo: add 1
1cf706a add bar
11d3501 initial commit
#+END_EXAMPLE

From here, we decide that the ~topic/epic_feature~ branch is going anywhere
but the ~master~ branch. Therefore, we swiftly dump it into the ether:

#+BEGIN_EXAMPLE
± git checkout master
Switched to branch 'master'
± git branch -D topic/epic_feature
Deleted branch topic/epic_feature (was 2e0bcc6).
#+END_EXAMPLE

Several days pass: we perform other commits on other branches, merge them into
~master~, and decide on some other things to work on. But eventually, we are
reminded that our old ~topic/epic_feature~ branch had something similar to
what we are doing now. It would be nice to recover it and its changes for
examination. However, we have likely lost the commit hash of the branch.

*** Solution
:PROPERTIES:
:CUSTOM_ID: solution
:END:

If we take a quick look at our ~git-fsck~ output, we might see something that
leads us to our commit hash:

#+BEGIN_EXAMPLE
± git fsck
Checking object directories: 100% (256/256), done.
#+END_EXAMPLE

Well, that was less than helpful. What happened? It turns out, as mentioned
above, that ~git-fsck~ considers objects "reachable" if they are pointed to by
a reference /or/ are in the reflog. Let's add the ~--no-reflogs~ flag:

#+BEGIN_EXAMPLE
± git fsck --no-reflogs
Checking object directories: 100% (256/256), done.
dangling commit 2e0bcc62122f2d7bf895958ac8fed1ec05d4d904
#+END_EXAMPLE

This looks more promising! Let's check out this hash and inspect it:

#+BEGIN_EXAMPLE
± git checkout 2e0bcc62122f2d7bf895958ac8fed1ec05d4d904
Note: checking out '2e0bcc62122f2d7bf895958ac8fed1ec05d4d904'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 2e0bcc6... add foobar
± git log --oneline
2e0bcc6 add foobar
f2239ca update bar: add 2
32d8e6d update foo: add 1
1cf706a add bar
11d3501 initial commit
#+END_EXAMPLE

This indeed looks like the branch we created several days before. Git's
interface, as a helpful reminder, explains how to (re)create this point as a
reference (branch).
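
The whole episode can be replayed end to end. The following is a sketch under
the same assumptions as before (POSIX shell, throwaway repository, made-up
identity and file contents). Instead of checking out the dangling commit, it
recreates the branch directly with ~git branch~, picking the tip out of the
~git fsck --no-reflogs~ output. That shortcut relies on the tip being the
only dangling commit in this tiny repository; the older branch commit is
unreachable but not dangling, because the tip still refers to it as its
parent:

#+BEGIN_SRC shell
set -eu

repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git config user.name 'Example User'        # hypothetical identity
git config user.email 'user@example.com'

touch foo && git add foo && git commit -q -m 'initial commit'
base=$(git symbolic-ref --short HEAD)      # master or main, depending on config

git checkout -q -b topic/epic_feature
echo 1 >> foo && git commit -q -am 'update foo: add 1'
touch foobar && git add foobar && git commit -q -m 'add foobar'

git checkout -q "$base"
git branch -q -D topic/epic_feature        # dump it into the ether

# Only the old branch tip is dangling; its parent is reachable through it.
tip=$(git fsck --no-reflogs 2>/dev/null |
      awk '$1 == "dangling" && $2 == "commit" { print $3 }')

git branch topic/epic_feature "$tip"       # recreate the reference
git log --oneline topic/epic_feature
#+END_SRC

In a real repository there may be several dangling commits, so rather than
trusting a one-line filter, each candidate deserves a look with ~git show~
first.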
It is thus our choice whether to examine the branch as-is or to recreate the
reference for later inspection.

** Another Resurrection Example
:PROPERTIES:
:CUSTOM_ID: another-resurrection-example
:END:

For another example, let's examine what happens when we create a branch and
then rewrite the commit it was branched from. We will start with some commands
that create and initialize the repository into an initial state, that is,
before any mistakes are made:

#+BEGIN_EXAMPLE
$ cd $(mktemp -d)
$ git init foobar
$ cd foobar
± touch foo
± git add foo
± git commit -m 'initial commit'
± touch bar
± git add bar
± git commit -m 'add bar'
± echo 1 >> foo
± git commit -am 'update foo: add 1'
± git checkout -b topic/foobar
± echo 1 >> bar
± git commit -am 'update bar: add 1'
#+END_EXAMPLE

#+BEGIN_QUOTE
Notice, again, I've created this repository in a temporary directory for my
own system's tidiness. Furthermore, note that ~mktemp -d~ will create a
/different/ temporary directory each time. As such, the ~foobar~ project from
this example and the previous example /will/ be different.
#+END_QUOTE

From here, our one-line log should look similar to the following:

#+BEGIN_EXAMPLE
± git log --oneline
3de2659 update bar: add 1
5e6dd5f update foo: add 1
9640abb add bar
31d2347 initial commit
#+END_EXAMPLE

Furthermore, here is an image that describes the state of the repository.

#+ATTR_HTML: :align center
#+HTML:
#+NAME: fig: Example Repository State 1
[[file:/media/git-repo-state-1.svg]]
#+HTML:

Next, we will create a few more commits, but instead of doing things properly,
we are going to (intentionally) make a mistake. We will merge our
~topic/foobar~ branch into ~master~, create a new file, ~foobar~, and create a
branch, ~topic/bad~, from ~topic/foobar~. In the ~topic/bad~ branch, we will
create some new commits, but then, when squashing them, we will reach one
commit too far back and squash them into the commit /before/ them as well, a
commit that is already on ~master~.

Let's begin issuing commands against our repository:

#+BEGIN_EXAMPLE
± git checkout master
± git merge --ff-only topic/foobar
± touch foobar
± git add foobar
± git commit -m 'add foobar'
± git checkout -b topic/bad topic/foobar
± echo 2 >> foo
± git commit -am 'update foo: add 2'
± echo 2 >> bar
± git commit -am 'update bar: add 2'
#+END_EXAMPLE

Thus, our repository should look similar to the following image:

#+ATTR_HTML: :align center
#+HTML:
#+NAME: fig: Example Repository State 2
[[file:/media/git-repo-state-2.svg]]
#+HTML:

Now, for the mistake:

#+BEGIN_EXAMPLE
± git rebase -i HEAD~3
(squash the previous commits)
pick 3de26
squash 4babf
squash 7647f
#+END_EXAMPLE

This should result in a repository that looks like the following:

#+ATTR_HTML: :align center
#+HTML:
#+NAME: fig: Example Repository State 3
[[file:/media/git-repo-state-3.svg]]
#+HTML:

Assuming we didn't recognize the mistake, we might attempt to merge the branch:

#+BEGIN_EXAMPLE
± git checkout master
± git merge --ff-only topic/bad
fatal: Not possible to fast-forward, aborting.
#+END_EXAMPLE

Well, of course: the two branches have diverged. ~master~ has two commits that
~topic/bad~ lacks, and ~topic/bad~ has one commit that ~master~ lacks. We can
see this by viewing the logs going from ~master~ to ~topic/bad~ and then
vice versa:

#+BEGIN_EXAMPLE
± git log --oneline master..topic/bad
3b71666 update bar: add 1
± git log --oneline topic/bad..master
7387d60 add foobar
3de2659 update bar: add 1
#+END_EXAMPLE

But another issue emerges from viewing these log outputs with our
mistake-ignorant brains: two of the commits look the same, i.e., they have the
same commit message. Not only have we combined two of our changes from
~topic/bad~, but we have combined them with a commit that was /already/ merged
into the ~master~ branch. Assuming ~master~ is a stable and "branchable"
branch, we will not be able to simply rebase one way and return; the commits
are too intermingled.

#+BEGIN_QUOTE
Branchable, in this context, means the branch is safe to base work on: no one
on our team (or ourselves, if we practice proper discipline) will come behind
us and change the history of this branch. This is an important assumption in
/any/ distributed workflow. Every project should have (at least) one
"branchable" reference; many choose the ~master~ branch for this role.
#+END_QUOTE

*** Solutions
:PROPERTIES:
:CUSTOM_ID: solutions
:END:

One way we can fix this is to simply not care. But that's not what we are
about: we like clean history, and neither this situation nor such a solution
is clean! Therefore, we will have to return the ~topic/bad~ branch to a clean
state before continuing with merging the work done in the branch.

Let's start by examining the reflog:

#+BEGIN_EXAMPLE
± git reflog
7387d60 HEAD@{0}: checkout: moving from topic/bad to master
3b71666 HEAD@{1}: rebase -i (finish): returning to refs/heads/topic/bad
3b71666 HEAD@{2}: rebase -i (fixup): update bar: add 1
4cc10e9 HEAD@{3}: rebase -i (fixup): # This is a combination of 2 commits.
3de2659 HEAD@{4}: rebase -i (start): checkout HEAD~3
7647f9c HEAD@{5}: commit: update bar: add 2
4babfe7 HEAD@{6}: commit: update foo: add 2
3de2659 HEAD@{7}: checkout: moving from master to topic/bad
7387d60 HEAD@{8}: commit: add foobar
3de2659 HEAD@{9}: checkout: moving from topic/bad to master
3de2659 HEAD@{10}: checkout: moving from master to topic/bad
3de2659 HEAD@{11}: merge topic/foobar: Fast-forward
5e6dd5f HEAD@{12}: checkout: moving from topic/foobar to master
3de2659 HEAD@{13}: commit: update bar: add 1
5e6dd5f HEAD@{14}: checkout: moving from master to topic/foobar
5e6dd5f HEAD@{15}: commit: update foo: add 1
9640abb HEAD@{16}: commit: add bar
31d2347 HEAD@{17}: commit (initial): initial commit
#+END_EXAMPLE

Examining ~HEAD@{5}~, we see the commit of ~topic/bad~ /before/ we attempted
to rebase the three commits. If we start there, we may be able to salvage the
history.

#+BEGIN_EXAMPLE
± git checkout topic/bad
± git reset --hard 7647f9c
± git log --oneline
7647f9c update bar: add 2
4babfe7 update foo: add 2
3de2659 update bar: add 1
5e6dd5f update foo: add 1
9640abb add bar
31d2347 initial commit
#+END_EXAMPLE

#+BEGIN_QUOTE
Obligatory notice: blindly using ~git reset --hard~ can lead to dark, scary
places. Like deleting a branch in the first example of this post,
~git reset --hard~ is a way to lose commits, and an even more subtle one.
Pause before pressing enter /every time/ you type ~git reset --hard~.
#+END_QUOTE

Perfect, we are back to the state of the branch as seen in the following
image:

#+ATTR_HTML: :align center
#+HTML:
#+NAME: fig: Example Repository State Before Mistake
[[file:/media/git-repo-state-2.svg]]
#+HTML:
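
The recovery can also be rehearsed in a throwaway repository. This sketch
condenses the scenario (there is no ~master~ merge here) and, so that it runs
without an editor, stands in ~git reset --soft HEAD~3~ plus a new commit for
the interactive squash; file contents, messages, and identity are made up.
It also uses the branch's own reflog, ~topic/bad@{n}~, instead of
~HEAD@{n}~: that log only records updates to ~topic/bad~, so the tip from
before the mistake is easier to spot:

#+BEGIN_SRC shell
set -eu

repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git config user.name 'Example User'        # hypothetical identity
git config user.email 'user@example.com'

touch foo bar && git add foo bar && git commit -q -m 'initial commit'
git checkout -q -b topic/bad
echo 1 >> bar && git commit -q -am 'update bar: add 1'
echo 2 >> foo && git commit -q -am 'update foo: add 2'
echo 2 >> bar && git commit -q -am 'update bar: add 2'
good=$(git rev-parse HEAD)                 # kept only to check the result below

# The mistake: squash the last three commits into one
# (a stand-in for the interactive rebase in the text).
git reset -q --soft HEAD~3
git commit -q -m 'update bar: add 1'

# Newest first: the squashed commit, the reset, then the tip we want back.
git reflog show -n 3 topic/bad
git reset -q --hard 'topic/bad@{2}'
test "$(git rev-parse HEAD)" = "$good"
git log --oneline
#+END_SRC

The ~@{2}~ index depends on exactly how many times the branch moved, which is
why the script lists the branch's reflog before resetting; with a real
interactive rebase, the pre-rebase tip would typically be ~topic/bad@{1}~,
since the rebase updates the branch only once when it finishes.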

From here, we can merge the two branches however we please: rebase and
fast-forward, or a regular old merge commit. The first way of merging the two
branches may proceed as follows:

#+BEGIN_EXAMPLE
± git checkout topic/bad
± git rebase master
First, rewinding head to replay your work on top of it...
Applying: update foo: add 2
Applying: update bar: add 2
± git checkout master
Switched to branch 'master'
± git merge --ff-only topic/bad
Updating 7387d60..577aa0b
Fast-forward
 bar | 1 +
 foo | 1 +
 2 files changed, 2 insertions(+)
#+END_EXAMPLE

Afterwards, our repository will look like the following figure:

#+ATTR_HTML: :align center
#+HTML:
#+NAME: fig: Example Repository State After Rebase Fast-Forward Merge
[[file:/media/git-repo-state-4.svg]]
#+HTML:

#+BEGIN_QUOTE
If we wanted to squash the two commits from ~topic/bad~ together, we could
easily have done so /right/ before switching to the ~master~ branch.
#+END_QUOTE

A regular merge commit would proceed similarly to the following:

#+BEGIN_EXAMPLE
± git checkout master
Switched to branch 'master'
± git merge --no-ff topic/bad -m 'merge branch "topic/bad"'
Merge made by the 'recursive' strategy.
 bar | 1 +
 foo | 1 +
 2 files changed, 2 insertions(+)
#+END_EXAMPLE

Afterwards, our repository will look like the following figure:

#+ATTR_HTML: :align center
#+HTML:
#+NAME: fig: Example Repository State After Merge Commit
[[file:/media/git-repo-state-5.svg]]
#+HTML:
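
Both strategies can be compared side by side. The sketch below assumes a
POSIX shell and a throwaway repository, and uses hypothetical branch names
~linear~ and ~merged~ for two copies of the main line (plus a hypothetical
~topic/rebased~ copy of the topic branch). It rebases and fast-forwards onto
~linear~, and merges ~topic/bad~ with ~--no-ff~ into ~merged~. Both end up
with the same files; only the shape of the history differs:

#+BEGIN_SRC shell
set -eu

repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git config user.name 'Example User'        # hypothetical identity
git config user.email 'user@example.com'

touch foo bar && git add foo bar && git commit -q -m 'initial commit'
base=$(git symbolic-ref --short HEAD)      # master or main, depending on config

git checkout -q -b topic/bad
echo 2 >> foo && git commit -q -am 'update foo: add 2'
echo 2 >> bar && git commit -q -am 'update bar: add 2'

git checkout -q "$base"
touch foobar && git add foobar && git commit -q -m 'add foobar'

# Two copies of the main line, one per strategy (hypothetical names).
git branch linear
git branch merged

# Strategy 1: rebase a copy of the topic branch, then fast-forward.
git checkout -q -b topic/rebased topic/bad
git rebase -q linear
git checkout -q linear
git merge -q --ff-only topic/rebased

# Strategy 2: --no-ff always records a merge commit.
git checkout -q merged
git merge -q --no-ff -m 'merge branch "topic/bad"' topic/bad

git log --oneline --graph linear
git log --oneline --graph merged
#+END_SRC

~git rev-list --merges --count~ is a quick way to confirm the difference: it
should report no merge commits on ~linear~ and one on ~merged~.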

** Summary
:PROPERTIES:
:CUSTOM_ID: summary
:END:

The best way to fix Git repository history is not to make mistakes in the
first place. However, since mistakes are inevitable, we must learn the tools
to discover, recover, and return to the appropriate state to correct our
mistakes. More importantly, we must find the courage to make mistakes,
knowing we have an escape route. This way, we can avoid keeping around a
~git.txt~ file ([[xkcd-git][xkcd]]) for when our repository eventually melts down.

** References

- [[man-git-reflog][~git-reflog(1)~]]
- [[git-scm-book-internals][Git SCM book, Internals Chapter]]
- [[man-git-fsck][~git-fsck(1)~]]
- [[kb-git-in-reverse][Git in Reverse]]
- [[man-git-show][~git-show(1)~]]
- [[man-git-cat-file][~git-cat-file(1)~]]
- [[man-git-reset][~git-reset(1)~]]
- [[man-git-rebase][~git-rebase(1)~]]
- [[xkcd-git][XKCD: Git]]