#+TITLE: The Art of Manually Editing Hunks
#+DESCRIPTION: How to edit hunk diffs
#+TAGS: Git
#+TAGS: How-to
#+TAGS: Tips and Tricks
#+DATE: 2015-10-24
#+SLUG: art-manually-edit-hunks
#+LINK: udiff https://www.gnu.org/software/diffutils/manual/html_node/Unified-Format.html
#+LINK: git-show https://www.kernel.org/pub/software/scm/git/docs/git-show.html
#+LINK: git-diff https://www.kernel.org/pub/software/scm/git/docs/git-diff.html

#+BEGIN_PREVIEW
There's a certain art to editing hunks, seemingly arcane.  Hunks are blocks of
changes typically found in unified diff patch files, or, more commonly today,
found in Git patches.
#+END_PREVIEW

Git uses its own variant of the [[udiff][unified diff format]], but the
differences between the two are usually not significant.  Patch files created
with [[git-show][git-show]] or [[git-diff][git-diff]] are consumable by the
usual tools: ~patch~, ~git~, ~vimdiff~, etc.

** Short Introduction to Unified Diff

A unified diff may look something like the following (freely copied from the
~diffutils~ manual):

#+BEGIN_SRC diff
    --- lao 2002-02-21 23:30:39.942229878 -0800
    +++ tzu 2002-02-21 23:30:50.442260588 -0800
    @@ -1,7 +1,6 @@
    -The Way that can be told of is not the eternal Way;
    -The name that can be named is not the eternal name.
     The Nameless is the origin of Heaven and Earth;
    -The Named is the mother of all things.
    +The named is the mother of all things.
    +
     Therefore let there always be non-being,
       so we may see their subtlety,
     And let there always be being,
    @@ -9,3 +8,6 @@
     The two are the same,
     But after they are produced,
       they have different names.
    +They both may be called deep and profound.
    +Deeper and more profound,
    +The door of all subtleties!
#+END_SRC

The first two lines name the files that were input to the ~diff~ program: the
first, ~lao~, is the "source" file and the second, ~tzu~, is the "new" file.
The leading ~---~ and ~+++~ mark the header line of each, respectively.

~+~ denotes a line that will be added to the first file and ~-~ denotes a line
that will be removed from the first file.  Lines with no changes are preceded
by a single space.

The ~@@ -1,7 +1,6 @@~ and ~@@ -9,3 +8,6 @@~ lines are the hunk identifiers.
That is, diff hunks are the blocks identified by ~@@ -line number[,context]
+line number[,context] @@~ in the diff format.  The ~context~ number is
optional: the format allows it to be omitted when it equals 1.  In practice,
~git-diff~ output nearly always includes it, since a hunk with surrounding
lines spans more than one line.  The line number defines the line at which the
hunk begins in each file.  The context number defines the number of lines the
hunk covers in that file, and it often differs between the two files.  In the
first hunk of the example above, the context numbers are ~7~ and ~6~,
respectively.  That is, the lines starting with a ~-~ or a space add up to 7.
Similarly, the lines starting with a ~+~ or a space add up to 6.

#+BEGIN_QUOTE
  Lines starting with a space count towards the context of both files.
#+END_QUOTE

Since the second file has the smaller context, we are removing one more line
than we are adding.  To ~diff~, updating a line is the same as removing the old
line and adding a new line (with the changes).
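
The counting rule above can be checked mechanically.  The following is a
minimal sketch in Python, not part of ~diff~ or Git; the names ~parse_header~
and ~count_body~ are made up for illustration, and a completely empty line is
assumed to be a context line whose trailing space was stripped.

#+BEGIN_SRC python
import re

# Matches "@@ -start[,count] +start[,count] @@"; anything after the closing
# "@@" (such as the function context Git appends) is ignored.
HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_header(line):
    """Return (old_start, old_count, new_start, new_count) for a hunk header."""
    match = HEADER.match(line)
    if match is None:
        raise ValueError("not a hunk header: %r" % line)
    old_start, old_count, new_start, new_count = match.groups()
    # An omitted count means the hunk covers exactly one line.
    return (
        int(old_start),
        1 if old_count is None else int(old_count),
        int(new_start),
        1 if new_count is None else int(new_count),
    )


def count_body(body):
    """Count the lines a hunk body covers in the old and in the new file."""
    old = sum(1 for line in body if line[:1] in ("-", " ", ""))
    new = sum(1 for line in body if line[:1] in ("+", " ", ""))
    return old, new


first_hunk = [
    "-The Way that can be told of is not the eternal Way;",
    "-The name that can be named is not the eternal name.",
    " The Nameless is the origin of Heaven and Earth;",
    "-The Named is the mother of all things.",
    "+The named is the mother of all things.",
    "+",
    " Therefore let there always be non-being,",
    "   so we may see their subtlety,",
    " And let there always be being,",
]

print(parse_header("@@ -1,7 +1,6 @@"))  # (1, 7, 1, 6)
print(count_body(first_hunk))           # (7, 6)
#+END_SRC

The two results agree, which is exactly the property a hand-edited hunk has to
keep.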

Armed with this information, we can start editing hunks that can be cleanly
applied.

** Motivation

What might be the motivation for even wanting to edit hunks?  The biggest I
see is ~git-add --patch~, particularly when the changes run together and cannot
be split apart automatically.  We can see this in the diff above.

The trivial case is staging a single hunk of the above diff: nothing has to be
done to stage the hunks separately other than using the ~--patch~ option.

However, staging separate changes inside a hunk becomes slightly more
complicated.  Often, if the changes are separated by even a single unchanged
line, the hunk can be split.  When the changes run together, it becomes more
difficult.
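
As a rough model of that rule, the sketch below counts the runs of changed
lines in a hunk body; more than one run means there is an unchanged line to
split at.  This is an approximation in Python, not Git's actual
implementation, and ~change_runs~ and ~can_split~ are hypothetical names.  The
single-letter lines stand in for real file content.

#+BEGIN_SRC python
def change_runs(body):
    """Count maximal runs of consecutive '+'/'-' lines in a hunk body."""
    runs = 0
    in_run = False
    for line in body:
        changed = line[:1] in ("+", "-")
        if changed and not in_run:
            runs += 1
        in_run = changed
    return runs


def can_split(body):
    """Assume a hunk can be split only if unchanged lines separate changes."""
    return change_runs(body) > 1


separated = ["-a", " b", "-c", "+C", " d"]
together = [" a", "-b", "-c", "+B", "+C", " d"]

print(can_split(separated))  # True
print(can_split(together))   # False
#+END_SRC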

Of course, one way to solve this problem is to manually back out the changes
(a series of "undos"), save the file, stage it, and play the changes back (a
series of "redos", perhaps).  This can be very error prone: if you make any
other changes between undo and redo, you may lose the changes.  By manually
editing the specific hunk into the right shape instead, no changes are lost.

** Hunk Editing Example

Let's walk through an example of staging some changes, and manually editing a
hunk to stage them into the patches we want.

Create a temporary Git repository; this will just hold some basic stuff for
testing.

#+BEGIN_SRC sh
    % cd /tmp
    % git init foo
    % cd foo
#+END_SRC

#+BEGIN_QUOTE
  From here on, we will assume the working directory to be ~/tmp/foo~.
#+END_QUOTE

Inside this new Git repository, add a new file, ~quicksort.exs~:

#+BEGIN_SRC elixir
    defmodule Quicksort do

      def sort(list) do
        _sort(list)
      end

      defp _sort([]), do: []
      defp _sort(list = [h|t]) do
        _sort(Enum.filter(list, &(&1 < h))) ++ [h] ++ _sort(Enum.filter(list, &(&1 > h)))
      end

    end
#+END_SRC

Perform the usual actions, ~git-add~ and ~git-commit~:

#+BEGIN_SRC sh
    % git add quicksort.exs
    % git commit -m 'initial commit'
#+END_SRC

Now, let's make some changes.  For one, there's a compiler warning about the
unused variable ~t~, and the actual sorting seems a bit dense.  Let's fix the
warning and break up the sorting:

#+BEGIN_SRC elixir
    defmodule Quicksort do

      def sort(list) do
        _sort(list)
      end

      defp _sort([]), do: []
      defp _sort(list = [h|_]) do
        (list |> Enum.filter(&(&1 < h)) |> _sort)
        ++ [h] ++
        (list |> Enum.filter(&(&1 > h)) |> _sort)
      end

    end
#+END_SRC

Saving this version of the file should produce a diff similar to the following:

#+BEGIN_SRC diff
    diff --git a/quicksort.exs b/quicksort.exs
    index 97b60b4..ed2446b 100644
    --- a/quicksort.exs
    +++ b/quicksort.exs
    @@ -5,8 +5,10 @@ defmodule Quicksort do
       end

       defp _sort([]), do: []
    -  defp _sort(list = [h|t]) do
    -    _sort(Enum.filter(list, &(&1 < h))) ++ [h] ++ _sort(Enum.filter(list, &(&1 > h)))
    +  defp _sort(list = [h|_]) do
    +    (list |> Enum.filter(&(&1 < h)) |> _sort)
    +    ++ [h] ++
    +    (list |> Enum.filter(&(&1 > h)) |> _sort)
       end

     end
#+END_SRC

However, since these are, arguably, two different changes, they should live in
two commits.  Let's stage the change of ~t~ to ~_~:

#+BEGIN_SRC sh
    % git add --patch
#+END_SRC

We will be presented with the diff from before:

#+BEGIN_SRC sh
    diff --git a/quicksort.exs b/quicksort.exs
    index 97b60b4..ed2446b 100644
    --- a/quicksort.exs
    +++ b/quicksort.exs
    @@ -5,8 +5,10 @@ defmodule Quicksort do
       end

       defp _sort([]), do: []
    -  defp _sort(list = [h|t]) do
    -    _sort(Enum.filter(list, &(&1 < h))) ++ [h] ++ _sort(Enum.filter(list, &(&1 > h)))
    +  defp _sort(list = [h|_]) do
    +    (list |> Enum.filter(&(&1 < h)) |> _sort)
    +    ++ [h] ++
    +    (list |> Enum.filter(&(&1 > h)) |> _sort)
       end

     end
    Stage this hunk [y,n,q,a,d,/,e,?]?
#+END_SRC

The first thing to try is the ~split(s)~ option.  However, it is not a valid
choice here: Git does not know how to split this hunk, so we are simply
presented with the available options and the hunk again.  The option we want
instead is ~edit(e)~.

We will be dropped into our default editor, as set by the ~$EDITOR~
environment variable or Git's ~core.editor~ setting.  There, we will be
presented with something like the following:

#+BEGIN_SRC diff
    # Manual hunk edit mode -- see bottom for a quick guide
    @@ -5,8 +5,10 @@ defmodule Quicksort do
       end

       defp _sort([]), do: []
    -  defp _sort(list = [h|t]) do
    -    _sort(Enum.filter(list, &(&1 < h))) ++ [h] ++ _sort(Enum.filter(list, &(&1 > h)))
    +  defp _sort(list = [h|_]) do
    +    (list |> Enum.filter(&(&1 < h)) |> _sort)
    +    ++ [h] ++
    +    (list |> Enum.filter(&(&1 > h)) |> _sort)
       end

     end
    # ---
    # To remove '-' lines, make them ' ' lines (context).
    # To remove '+' lines, delete them.
    # Lines starting with # will be removed.
    #
    # If the patch applies cleanly, the edited hunk will immediately be
    # marked for staging. If it does not apply cleanly, you will be given
    # an opportunity to edit again. If all lines of the hunk are removed,
    # then the edit is aborted and the hunk is left unchanged.
#+END_SRC
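
The two rules in that guide are easy to get backwards, so here is a small
Python sketch of them.  It is only a model of what we do by hand in the
editor; ~unstage~ is a hypothetical helper, and the single-letter lines stand
in for real file content.

#+BEGIN_SRC python
def unstage(body, skip):
    """Return a copy of a hunk body without the changes at the indexes in skip.

    A skipped '-' line becomes a context line, because the line is still in
    the file being patched.  A skipped '+' line is simply dropped, because the
    line is not in that file yet.
    """
    edited = []
    for index, line in enumerate(body):
        if index not in skip:
            edited.append(line)
        elif line.startswith("-"):
            edited.append(" " + line[1:])
        elif not line.startswith("+"):
            raise ValueError("line %d is not a change" % index)
    return edited


body = [" a", "-b", "+B", " c", "+d"]
print(unstage(body, {4}))     # [' a', '-b', '+B', ' c']
print(unstage(body, {1, 2}))  # [' a', ' b', ' c', '+d']
#+END_SRC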

From here, we want to replace the leading minus of the second removal (the
long ~_sort(...)~ line) with a space, delete the last three additions, and
move the remaining ~+~ line up so that it directly follows the ~-~ line it
replaces.  Without that last step, the new ~defp~ line would be staged below
the function body.

That is, we want the diff to look like:

#+BEGIN_SRC diff
    @@ -5,8 +5,10 @@ defmodule Quicksort do
       end

       defp _sort([]), do: []
    -  defp _sort(list = [h|t]) do
    +  defp _sort(list = [h|_]) do
         _sort(Enum.filter(list, &(&1 < h))) ++ [h] ++ _sort(Enum.filter(list, &(&1 > h)))
       end

     end
#+END_SRC

After saving and closing the editor, Git will have staged the desired change.
We can check the staged changes via ~git diff --cached~, whose output should
contain the following hunk:

#+BEGIN_SRC diff
    @@ -5,7 +5,7 @@ defmodule Quicksort do
       end

       defp _sort([]), do: []
    -  defp _sort(list = [h|t]) do
    +  defp _sort(list = [h|_]) do
         _sort(Enum.filter(list, &(&1 < h))) ++ [h] ++ _sort(Enum.filter(list, &(&1 > h)))
       end

#+END_SRC

Notice that Git updated the hunk's context numbers to match the new changes.

From here, commit the first change, and then add and commit the second change.

Something to watch out for is overzealously removing changed lines.  For
example, in the Elixir quicksort example we have just done, if we entirely
removed the second ~-~ line from the diff /and/ manually updated the hunk
header, the patch would never apply cleanly: the deleted line still exists in
the file being patched, so the hunk no longer matches it.  Therefore, be
especially careful when removing ~-~ lines.
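
To see why, consider a bare-bones model of applying a hunk.  The Python sketch
below is an illustration under simplifying assumptions, not how ~git-apply~ is
implemented: it ignores offsets and fuzz, and ~apply_hunk~ is a made-up name.
Every context and ~-~ line has to match the file being patched, line for line.

#+BEGIN_SRC python
def apply_hunk(lines, old_start, body):
    """Apply one hunk body to a list of lines, starting at 1-based old_start."""
    pos = old_start - 1
    out = lines[:pos]
    for entry in body:
        tag, text = entry[:1], entry[1:]
        if tag in (" ", "-"):
            # Context and removed lines must both exist in the original file.
            if pos >= len(lines) or lines[pos] != text:
                raise ValueError("hunk does not apply at line %d" % (pos + 1))
            if tag == " ":
                out.append(text)
            pos += 1
        elif tag == "+":
            out.append(text)
        else:
            raise ValueError("unexpected hunk line: %r" % entry)
    return out + lines[pos:]


original = ["a", "b", "c"]
print(apply_hunk(original, 1, [" a", "-b", "+B", " c"]))  # ['a', 'B', 'c']

# Turning '-b' into context keeps the hunk in step with the file:
print(apply_hunk(original, 1, [" a", " b", "+B", " c"]))  # ['a', 'b', 'B', 'c']

# Deleting the '-b' line outright breaks the match:
try:
    apply_hunk(original, 1, [" a", "+B", " c"])
except ValueError as error:
    print(error)  # hunk does not apply at line 2
#+END_SRC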