---
title: "The Art of Manually Editing Hunks"
description: "How to edit hunk diffs"
tags:
  - "Git"
  - "How-to"
  - "Tips and Tricks"
date: "2015-10-24"
updated: "2015-10-24"
categories:
  - "Development"
slug: "art-manually-edit-hunks"
---

There's a certain art to editing hunks, seemingly arcane. Hunks are blocks of
changes typically found in unified diff patch files, or, more commonly today,
found in Git patches.

Git uses its own variant of the [unified diff format][1], but the differences
between the two are usually not significant. The patch files created with
[`git-show`][4] or [`git-diff`][2] are consumable by the usual tools: `patch`,
`git`, `vimdiff`, etc.

## Short Introduction to Unified Diff ##

A unified diff looks something like the following (freely copied from the
`diffutils` manual):

    --- lao	2002-02-21 23:30:39.942229878 -0800
    +++ tzu	2002-02-21 23:30:50.442260588 -0800
    @@ -1,7 +1,6 @@
    -The Way that can be told of is not the eternal Way;
    -The name that can be named is not the eternal name.
     The Nameless is the origin of Heaven and Earth;
    -The Named is the mother of all things.
    +The named is the mother of all things.
    +
     Therefore let there always be non-being,
       so we may see their subtlety,
     And let there always be being,
    @@ -9,3 +8,6 @@
     The two are the same,
     But after they are produced,
       they have different names.
    +They both may be called deep and profound.
    +Deeper and more profound,
    +The door of all subtleties!

The first two lines name the files given to the `diff` program: the first,
`lao`, is the "source" file and the second, `tzu`, is the "new" file. The
leading `---` and `+++` mark which header line belongs to which file.

`+` denotes a line that will be added to the first file and `-` denotes a line
that will be removed from the first file. Lines with no changes are preceded by
a single space.

The `@@ -1,7 +1,6 @@` and `@@ -9,3 +8,6 @@` lines are the hunk headers. That
is, diff hunks are the blocks introduced by
`@@ -line number[,context] +line number[,context] @@` in the diff format. The
`context` number is optional: it may be left out when it equals 1, and both
`diff` and `git-diff` omit it in that case. The line number is the line on
which the hunk begins in each file. The context number is the number of lines
the hunk spans in that file, and it often differs between the two files. In the
first hunk of the example above, the context numbers are `7` and `6`,
respectively. That is, the lines starting with a `-` or a space add up to 7.
Similarly, the lines starting with a `+` or a space add up to 6.

> Lines starting with a space count towards the context of both files.
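
To make the counting concrete, here is a minimal Python sketch. Python and the
helper names `parse_header` and `count_lines` are my own choices for
illustration; they are not part of `diff` or Git. It parses a hunk header and
recounts the first hunk of the example above:

```python
import re

# Matches the `@@ -start[,count] +start[,count] @@` shape of a hunk header.
HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

def parse_header(line):
    """Return (old_start, old_count, new_start, new_count); a missing count means 1."""
    m = HEADER.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = m.groups()
    return (int(old_start), int(old_count or 1), int(new_start), int(new_count or 1))

def count_lines(body):
    """Count how many lines the hunk body spans in the old and the new file."""
    old = sum(1 for line in body if line[:1] in (" ", "-"))
    new = sum(1 for line in body if line[:1] in (" ", "+"))
    return old, new

# First hunk of the lao/tzu example, without the header line.
hunk = [
    "-The Way that can be told of is not the eternal Way;",
    "-The name that can be named is not the eternal name.",
    " The Nameless is the origin of Heaven and Earth;",
    "-The Named is the mother of all things.",
    "+The named is the mother of all things.",
    "+",
    " Therefore let there always be non-being,",
    "   so we may see their subtlety,",
    " And let there always be being,",
]

print(parse_header("@@ -1,7 +1,6 @@"))  # (1, 7, 1, 6)
print(count_lines(hunk))                # (7, 6)
```

The counts computed from the body agree with the numbers in the header, which
is exactly the property an edited hunk has to preserve.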

Since the second file has the smaller context number, we are removing one more
line than we are adding. To `diff`, updating a line is the same as removing the
old line and adding a new line (with the changes).
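
The "update is a removal plus an addition" behaviour is easy to observe. The
following sketch uses Python's standard `difflib` module rather than the `diff`
program itself, with two made-up three-line inputs:

```python
import difflib

old = ["alpha", "beta", "gamma"]
new = ["alpha", "BETA", "gamma"]  # only the middle line was "updated"

# lineterm="" because the input lines carry no trailing newlines.
diff = list(difflib.unified_diff(old, new, fromfile="old", tofile="new", lineterm=""))
print("\n".join(diff))
# --- old
# +++ new
# @@ -1,3 +1,3 @@
#  alpha
# -beta
# +BETA
#  gamma
```

The single changed line shows up as one `-` line followed by one `+` line, and
both context numbers stay at 3.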

Armed with this information, we can start editing hunks that can be cleanly
applied.

## Motivation ##

What might be the motivation for even wanting to edit hunks? The biggest I see
is `git-add --patch`, particularly when the changes run together and cannot be
split apart automatically. We can see this in the diff above.

The trivial case is staging a single hunk of the above diff: nothing has to be
done to stage the hunks separately other than using the `--patch` option.

However, staging separate changes inside a hunk becomes slightly more
complicated. Often, if the changes are separated by even a single unchanged
line, the hunk can be split. When they run together, it becomes more difficult
to do.
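
The splitting rule described above can be approximated in a few lines. This is
a rough Python sketch with hypothetical helper names (`change_groups`,
`can_split`), not Git's actual implementation: it treats a hunk as splittable
when at least one unchanged line separates two runs of changes.

```python
def change_groups(body):
    """Count runs of consecutive '+'/'-' lines; context lines separate the runs."""
    groups = 0
    in_change = False
    for line in body:
        is_change = line[:1] in ("+", "-")
        if is_change and not in_change:
            groups += 1
        in_change = is_change
    return groups

def can_split(body):
    """A hunk with a single run of changes offers nothing to split on."""
    return change_groups(body) > 1

# Two changes separated by the unchanged line " c".
separated = [" a", "-b", "+B", " c", "-d", "+D", " e"]
# The same two changes running together.
run_together = [" a", "-b", "-c", "+B", "+C", " d"]

print(can_split(separated))     # True
print(can_split(run_together))  # False
```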

Of course, one way to solve this problem is to manually back out the changes (a
series of "undos"), save the file, stage it, and play the changes back (a
series of "redos", perhaps). This is very error prone: if you make any other
change between the undo and the redo, you may lose the changes. Manually
editing the specific hunk into the right shape, on the other hand, loses no
changes.

## Hunk Editing Example ##

Let's walk through an example of staging some changes, and manually editing a
hunk to stage them into the patches we want.

Create a temporary Git repository; it only needs to hold some basic content for
testing.

    % cd /tmp
    % git init foo
    % cd foo

> From here on, we will assume the working directory to be `/tmp/foo`.

Inside this new Git repository, add a new file, `quicksort.exs`:

    defmodule Quicksort do

      def sort(list) do
        _sort(list)
      end

      defp _sort([]), do: []
      defp _sort(list = [h|t]) do
        _sort(Enum.filter(list, &(&1 < h))) ++ [h] ++ _sort(Enum.filter(list, &(&1 > h)))
      end

    end

Perform the usual actions, `git-add` and `git-commit`:

    % git add quicksort.exs
    % git commit -m 'initial commit'

Now, let's make some changes. For one, there's a compiler warning about the
unused variable `t`, and the actual sorting seems a bit dense. Let's fix the
warning and break up the sorting:

    defmodule Quicksort do

      def sort(list) do
        _sort(list)
      end

      defp _sort([]), do: []
      defp _sort(list = [h|_]) do
        (list |> Enum.filter(&(&1 < h)) |> _sort)
        ++ [h] ++
        (list |> Enum.filter(&(&1 > h)) |> _sort)
      end

    end

Saving this version of the file should produce a diff similar to the following:

    diff --git a/quicksort.exs b/quicksort.exs
    index 97b60b4..ed2446b 100644
    --- a/quicksort.exs
    +++ b/quicksort.exs
    @@ -5,8 +5,10 @@ defmodule Quicksort do
       end

       defp _sort([]), do: []
    -  defp _sort(list = [h|t]) do
    -    _sort(Enum.filter(list, &(&1 < h))) ++ [h] ++ _sort(Enum.filter(list, &(&1 > h)))
    +  defp _sort(list = [h|_]) do
    +    (list |> Enum.filter(&(&1 < h)) |> _sort)
    +    ++ [h] ++
    +    (list |> Enum.filter(&(&1 > h)) |> _sort)
       end

     end

However, since these are, arguably, two different changes, they should live in
two commits. Let's stage the change of `t` to `_`:

    % git add --patch

We will be presented with the diff from before:

    diff --git a/quicksort.exs b/quicksort.exs
    index 97b60b4..ed2446b 100644
    --- a/quicksort.exs
    +++ b/quicksort.exs
    @@ -5,8 +5,10 @@ defmodule Quicksort do
       end

       defp _sort([]), do: []
    -  defp _sort(list = [h|t]) do
    -    _sort(Enum.filter(list, &(&1 < h))) ++ [h] ++ _sort(Enum.filter(list, &(&1 > h)))
    +  defp _sort(list = [h|_]) do
    +    (list |> Enum.filter(&(&1 < h)) |> _sort)
    +    ++ [h] ++
    +    (list |> Enum.filter(&(&1 > h)) |> _sort)
       end

     end
    Stage this hunk [y,n,q,a,d,/,e,?]?

The first thing we might want to try is the `split(s)` option. However, it is
not a valid choice here (note that `s` is missing from the prompt) because Git
does not know how to split this hunk; choosing it anyway just presents the
available options and the hunk again. The option we want instead is `edit(e)`.

We will be dropped into our default editor, as set by the `$EDITOR` environment
variable or Git's `core.editor` setting. There, we will be presented with
something like the following:

    # Manual hunk edit mode -- see bottom for a quick guide
    @@ -5,8 +5,10 @@ defmodule Quicksort do
       end

       defp _sort([]), do: []
    -  defp _sort(list = [h|t]) do
    -    _sort(Enum.filter(list, &(&1 < h))) ++ [h] ++ _sort(Enum.filter(list, &(&1 > h)))
    +  defp _sort(list = [h|_]) do
    +    (list |> Enum.filter(&(&1 < h)) |> _sort)
    +    ++ [h] ++
    +    (list |> Enum.filter(&(&1 > h)) |> _sort)
       end

     end
    # ---
    # To remove '-' lines, make them ' ' lines (context).
    # To remove '+' lines, delete them.
    # Lines starting with # will be removed.
    #
    # If the patch applies cleanly, the edited hunk will immediately be
    # marked for staging. If it does not apply cleanly, you will be given
    # an opportunity to edit again. If all lines of the hunk are removed,
    # then the edit is aborted and the hunk is left unchanged.

From here, we want to turn the second removal into a context line by replacing
its leading minus with a space, and delete the last three additions.

That is, we want the diff to look like:

    @@ -5,8 +5,10 @@ defmodule Quicksort do
       end

       defp _sort([]), do: []
    -  defp _sort(list = [h|t]) do
         _sort(Enum.filter(list, &(&1 < h))) ++ [h] ++ _sort(Enum.filter(list, &(&1 > h)))
    +  defp _sort(list = [h|_]) do
       end

     end

After we save and close the editor, Git stages the desired diff. We can check
the staged changes via `git-diff`:

    % git diff --cached
    diff --git a/quicksort.exs b/quicksort.exs
    index 97b60b4..94a5101 100644
    --- a/quicksort.exs
    +++ b/quicksort.exs
    @@ -5,8 +5,8 @@ defmodule Quicksort do
       end

       defp _sort([]), do: []
    -  defp _sort(list = [h|t]) do
         _sort(Enum.filter(list, &(&1 < h))) ++ [h] ++ _sort(Enum.filter(list, &(&1 > h)))
    +  defp _sort(list = [h|_]) do
       end

     end

Notice that Git updated the context numbers in the hunk header to match the
edited changes.
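
The two editing rules and the resulting header can be reproduced outside of
Git. The following Python sketch is only an illustration under my own
assumptions: `unstage_lines` and `header` are hypothetical helpers, and the
header they build leaves out the trailing `defmodule Quicksort do` text that
Git appends.

```python
def unstage_lines(body, skip):
    """Apply the edit-mode rules to the body lines whose indexes are in `skip`:
    a '-' line becomes a context line, a '+' line is deleted."""
    edited = []
    for i, line in enumerate(body):
        if i in skip and line.startswith("-"):
            edited.append(" " + line[1:])
        elif i in skip and line.startswith("+"):
            continue
        else:
            edited.append(line)
    return edited

def header(body, old_start, new_start):
    """Rebuild the hunk header by recounting the body lines."""
    old = sum(1 for line in body if line[:1] in (" ", "-"))
    new = sum(1 for line in body if line[:1] in (" ", "+"))
    return f"@@ -{old_start},{old} +{new_start},{new} @@"

# The hunk as first presented by `git add --patch` (blank context lines are " ").
original = [
    "   end",
    " ",
    "   defp _sort([]), do: []",
    "-  defp _sort(list = [h|t]) do",
    "-    _sort(Enum.filter(list, &(&1 < h))) ++ [h] ++ _sort(Enum.filter(list, &(&1 > h)))",
    "+  defp _sort(list = [h|_]) do",
    "+    (list |> Enum.filter(&(&1 < h)) |> _sort)",
    "+    ++ [h] ++",
    "+    (list |> Enum.filter(&(&1 > h)) |> _sort)",
    "   end",
    " ",
    " end",
]

# Keep the second removal as context and drop the last three additions.
edited = unstage_lines(original, skip={4, 6, 7, 8})

print(header(original, 5, 5))  # @@ -5,8 +5,10 @@
print(header(edited, 5, 5))    # @@ -5,8 +5,8 @@
```

The recounted numbers match the header shown by `git diff --cached` above,
which is why the header in the editor can be left alone.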

From here, commit the first change, and then add and commit the second change.

Something to watch out for is overzealously removing changed lines. For
example, in the Elixir quicksort example we just did, if we deleted the second
`-` line from the diff entirely, the patch would never apply cleanly, even
after manually updating the hunk header. Therefore, be especially careful when
removing `-` lines.
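
The reason is that the old side of a hunk (its context and `-` lines) has to
match the file being patched. Here is a deliberately strict Python sketch of
that check, using a hypothetical `applies` helper and a made-up four-line file.
Git's own matching is more involved, so treat this as an illustration only.

```python
def applies(file_lines, old_start, body):
    """True if the hunk's old side (context and '-' lines) matches the file
    exactly at the 1-based line `old_start`."""
    expected = [line[1:] for line in body if line[:1] in (" ", "-")]
    start = old_start - 1
    return file_lines[start:start + len(expected)] == expected

file_lines = ["a", "b", "c", "d"]

# Original hunk: replaces both "b" and "c".
full = [" a", "-b", "-c", "+B", "+C", " d"]
# Correct edit: "-c" turned into context, "+C" deleted.
kept_as_context = [" a", "-b", "+B", " c", " d"]
# Overzealous edit: "-c" deleted outright, so "c" vanishes from the old side.
deleted_removal = [" a", "-b", "+B", " d"]

print(applies(file_lines, 1, full))             # True
print(applies(file_lines, 1, kept_as_context))  # True
print(applies(file_lines, 1, deleted_removal))  # False
```

No header arithmetic can rescue the last hunk: it claims that `d` directly
follows `b`, which is not true of the file.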

[1]: https://www.gnu.org/software/diffutils/manual/html_node/Unified-Format.html

[2]: https://www.kernel.org/pub/software/scm/git/docs/git-diff.html

[3]: https://www.gnu.org/licenses/fdl.html

[4]: https://www.kernel.org/pub/software/scm/git/docs/git-show.html