#+TITLE: Releasing Elixir/OTP applications to the World
#+DESCRIPTION: The perils of releasing OTP applications in the wild
#+TAGS: Erlang/OTP
#+TAGS: Elixir
#+TAGS: Phoenix
#+TAGS: Docker
#+TAGS: How-to
#+TAGS: Tips and Tricks
#+DATE: 2016-05-27
#+SLUG: elixir-otp-releases
#+LINK: erlang-docs-systools http://erlang.org/doc/man/systools.html
#+LINK: erlang-docs-reltool http://erlang.org/doc/man/reltool.html
#+LINK: rebar https://github.com/erlang/rebar3/releases
#+LINK: relx https://github.com/erlware/relx
#+LINK: erlang-docs-release http://erlang.org/doc/design_principles/release_structure.html
#+LINK: exrm https://github.com/bitwalker/exrm
#+LINK: wiki-autotools https://en.wikipedia.org/wiki/GNU_Build_System
#+LINK: phoenix-docs-releases http://www.phoenixframework.org/docs/advanced-deployment
#+LINK: erlang-solutions-homepage https://erlang-solutions.com
#+LINK: alpine-linux http://alpinelinux.org
#+LINK: hex-comeonin https://hex.pm/packages/comeonin
#+LINK: docker https://docker.com
#+LINK: docker-hub https://hub.docker.com/explore/
#+LINK: kb-docker-elixir-centos https://github.com/kennyballou/docker-elixir-centos
#+LINK: gnu-automake-crosscompile https://www.gnu.org/software/automake/manual/html_node/Cross_002dCompilation

#+BEGIN_PREVIEW
Developing Elixir/OTP applications is an enlightening, mind-boggling, and
ultimately enjoyable experience.  There are so many features of the language
that change the very way we as developers think about concurrency and program
structure.  From writing pure functional code, to using message passing to
coordinate complex systems, it is one of the best languages for the SMP
revolution that has been slowly boiling under our feet.
#+END_PREVIEW

However, /releasing/ Elixir and OTP applications is an entirely different and
seemingly seldom discussed topic.

The distribution tool chain of Erlang and OTP is a complicated one.  There's
[[erlang-docs-systools][~systools~]], [[erlang-docs-reltool][~reltool~]],
[[rebar][~rebar(?3)~]], and [[relx][~relx~]] just to name a few that all
ultimately help in creating an Erlang/OTP [[erlang-docs-release]["release"]].
Similar to ~rebar3~, [[exrm][~exrm~]] takes the high-level abstraction approach
to combining ~reltool~ and ~relx~ into a single tool chain for creating
releases of Elixir projects.  Of course, we can also borrow from the collection
of [[wiki-autotools][autotools]].

There are plenty of articles and posts discussing how and why to use ~exrm~.  I
feel many of them, however, fail to /truly/ discuss /how/ to do this
effectively.  Most will mention the surface of the issue, but never give the
issue any real attention.  For any developer who wants to eventually /ship/
code, this is entirely too frustrating to leave alone.

There are "ways" of deploying OTP code relatively simply.  However, these
methods generally sidestep good continuous integration/continuous deployment
practice, e.g., "build the OTP application /on/ the target system" or simply
use ~mix run~.

I cannot speak for everyone, but my general goal is to /not/ have such a manual
step in my release pipeline.  Beyond that, having a possibly full autotools
chain and Erlang/Elixir stack on the production system is slightly unnerving
for its own set of reasons.

** Problem

Here are some selected quotes; I'm not trying to pick on anyone in particular
or the community at large, but I'm trying to show a representation of why this
very topic is an issue in the first place.

#+BEGIN_QUOTE
  We need to be sure that the architectures for both our build and hosting
  environments are the same, e.g. 64-bit Linux -> 64-bit Linux.  If the
  architectures don't match, our application might not run when deployed.
  Using a virtual machine that mirrors our hosting environment as our build
  environment is an easy way to avoid that problem.
  [[phoenix-docs-releases][Phoenix Exrm Releases]].
#+END_QUOTE

And another, similar quote:

#+BEGIN_QUOTE
  One important thing to note, however: /you must use the same architecture for
  building your release that the release is getting deployed to./ If your
  development machine is OS X and you're deploying to a Linux server, you need
  a Linux machine to build your exrm release or it isn't going to work, or you
  can just build on the same server you're going to be running everything on.
  [[erlang-solutions-homepage][Brandon Richey]].
#+END_QUOTE

Unfortunately, these miss a lot of the more subtle issues.  Dependency hell is
real, and we're about to really dive into it.

There are a few examples where "same architecture" isn't enough, and this is
where we will spend the majority of our time.

For these examples, we will assume our host machine is running GNU/Linux,
specifically Arch Linux, and our target machine is running CentOS 7.2.  Both
machines use the ~AMD64~ instruction set; the architectures are the
/same/.

*** Shared Objects

Let's start with the simplest issue: different versions of shared objects.

Arch Linux is a rolling release distribution that is generally /right/ on the
bleeding edge of packages; upstream is usually the development sources
themselves.  When ~ncurses~ moves to version 6, Arch isn't far behind in
putting it in the stable package repository (and rebuilding a number of
packages that depend on ~ncurses~).  CentOS, on the other hand, is not so
aggressive.
Therefore, when using the default ~relx~ configuration with ~exrm~, the Erlang
runtime system (ERTS) bundled with the release /will/ be incompatible with the
target system.

When the OTP application is started, an obscure linking error is emitted
complaining that ERTS cannot find a ~ncurses.so.6~ file, and the application
promptly fails.

Worse, after possibly "fixing" this issue, ~ncurses~ is only one of a few
shared objects Erlang needs to run, depending on what was enabled when Erlang
was built or what features the deployed application needs.
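
One way to see the whole list at once, rather than one failed start at a time,
is to ask the dynamic linker directly.  The following is only a minimal sketch:
~missing_libs~ is a hypothetical helper that filters ~ldd~ output, and the
release path in the usage comment is an assumed install location.

#+BEGIN_EXAMPLE sh
    # Print the names of shared objects that ldd reports as "not found".
    # Reads ldd output on stdin, so saved output can be checked as well.
    missing_libs() {
        awk '/=> not found/ { print $1 }'
    }

    # Hypothetical usage on the target host (the path is an assumption):
    #   ldd /opt/my_app/erts-7.3/bin/beam.smp | missing_libs
#+END_EXAMPLE

An empty result only means the linker is satisfied; it says nothing about
whether the library versions behave the same.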

*** Erlang Libraries

We may try to resolve this issue by adding a particular ~rel/relx.config~ file
to our Elixir project.  Specifically, we will /not/ bundle ERTS, opting to use
the target's ERTS instead.

#+BEGIN_EXAMPLE erlang
    {include_erts, false}.
#+END_EXAMPLE

This seems like a promising approach, until another error message is emitted at
startup, namely, ERTS cannot find ~stdlib-2.8~ in the ~/usr/lib/erlang/lib~
folder.

Did I mention that our current build system is Arch and our target is CentOS?
Arch may have the /newest/ version of Erlang in the repository and CentOS is
still at whatever it was at before: R16B unless the
[[erlang-solutions-homepage][Erlang Solutions]] release is being used.

Since Erlang applications lock dependency versions down to the patch number,
applications in the dependency tree will need to match exactly.  It's also
guaranteed that any and all OTP applications depend on at least the Erlang
kernel and the Erlang standard library.  These are at least two OTP
applications /our/ application is going to need that are /no longer packaged
when ~relx~ doesn't bundle ERTS/.

Even if we specify another option to ~relx~, namely, ~{system_libs, true}.~, we
are left with the same lack of Erlang system libraries.

That's correct and there are some sensible reasons for this.  If we ask ~exrm~,
and therefore ~relx~, to not include the build system's ERTS, we are /also/
excluding the standard Erlang libraries from the release.  Asking to include
the standard libraries of the build system's ERTS could run into the /very/
same issues as above, for a whole host of other reasons.
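
To see the mismatch concretely, it can help to check whether the exact
application version the release expects is present on the target.  A small
sketch under stated assumptions: ~has_otp_app~ is a hypothetical helper, and
the library directory and version are the ones from the error message above.

#+BEGIN_EXAMPLE sh
    # Succeed only if lib_dir ($1) contains the exact app-vsn directory.
    has_otp_app() {
        [ -d "$1/$2-$3" ]
    }

    # Hypothetical usage on the target host:
    #   has_otp_app /usr/lib/erlang/lib stdlib 2.8 || echo "stdlib-2.8 missing"
    # The installed library directory can be printed with:
    #   erl -noshell -eval 'io:format("~s~n", [code:lib_dir()]), halt().'
#+END_EXAMPLE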

We are left to attempt more solutions.

*** Docker or Virtualization

Next, since we do want to ultimately get our build running in a CI/CD
environment, we may look toward virtualization/containerization.  Being
sensible people, we try to use a small image, maybe basing our image on
[[alpine-linux][Alpine Linux]] so as to be nice to our precious ~/var~ or SSD
space.  We may even go so far as to build Erlang and Elixir ourselves in these
images to make sure we have as much control over them as we can.  Furthermore,
since we are building everything ourselves, shipping the built ERTS seems like
a good idea too, so we can delete the ~rel/relx.config~ file.

This seems promising.  However, we have shared object problems again.  Since we
are building Erlang and Elixir ourselves, we decided to disable ~termcap~
support thus no longer requiring the ~ncurses~ library altogether.  We hope
that the ~openssl~ libraries are the same, so we don't have to worry about that
mess, and we move on.

This time, when we attempt to deploy the application, we get a different,
obscure error: something about the ~musl~ C library not being found on the
target system.  Right, because we are trying to create a small image, we opted
to use the ~musl~ C library because of its size and because it is easily
supported in the Alpine Linux container.  Trying to use the GNU C library is
too cumbersome and would only inflate the image beyond any gains we would
achieve by using Alpine in the first place.

That's not going to work.
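
A quick way to catch this kind of mismatch before deploying is to look at the
ELF program interpreter that the bundled ~beam.smp~ requests.  This is a rough
sketch rather than a complete check: ~libc_flavor~ and ~elf_interp~ are
hypothetical helpers, and the release path in the usage comment is an
assumption.

#+BEGIN_EXAMPLE sh
    # Classify a C library by the dynamic loader path a binary requests.
    libc_flavor() {
        case "$1" in
            */ld-musl-*) echo musl ;;
            */ld-linux*) echo glibc ;;
            *)           echo unknown ;;
        esac
    }

    # Extract the loader path from `readelf -l` output read on stdin.
    elf_interp() {
        sed -n 's/.*program interpreter: \(.*\)\]/\1/p'
    }

    # Hypothetical usage against a built release:
    #   libc_flavor "$(readelf -l rel/my_app/erts-7.3/bin/beam.smp | elf_interp)"
#+END_EXAMPLE

If the build container reports ~musl~ and the target only ships ~glibc~, the
bundled ERTS will not start there.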

*** OTP as Project Dependency

Another option we might try is to make Erlang a build dependency of our Elixir
application.  This /could/ be achieved via the following structure:

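#+BEGIN_EXAMPLE elixir
    # Entry for the deps list in mix.exs: fetch and build OTP itself.
    {:otp,
     "~> 18.3.2",
     github: "erlang/otp",
     tag: "OTP-18.3.2",
     only: :prod,
     # Every step ends with ";" so the concatenated string stays a valid
     # shell command.  DESTDIR stages the install under /tmp/erlang
     # (beneath the configured prefix).
     compile: "./otp_build autoconf;" <>
              "./configure --without-termcap --without-javac;" <>
              "make -j4;" <>
              "DESTDIR=/tmp/erlang make install"
    }
#+END_EXAMPLE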

We would then point ~rel/relx.config~ at the result:

#+BEGIN_EXAMPLE erlang
    {include_erts, "/tmp/erlang"}.
#+END_EXAMPLE

This /may/ turn out to work, assuming the build server and the target system
have the same shared objects for OpenSSL and others that may be enabled by
default.

#+BEGIN_QUOTE
  However, I didn't follow this idea all the way to the end as I wasn't
  entirely happy with it, and it would fall to some later issues.
#+END_QUOTE

Notably, though, this will inflate production build times drastically, since
our ~mix deps.get~ and ~mix deps.compile~ steps will stall while building
Erlang itself.

However, again, we will likely run into issues with the C library used by the
build system/container.  Going this route doesn't allow us to use Alpine Linux
either.

Worse, there's another issue that hasn't even shown itself but is lying in
wait: natively implemented functions (NIFs).

If our project has a dependency that builds a NIF as part of its build
(Elixir's [[hex-comeonin][comeonin]] is a good example of this), unless the NIF
is statically compiled, we are back to square one and shared objects are not
our friends.  Furthermore, if we are using a different standard library
implementation, i.e., ~musl~ vs ~glibc~, the dependency will likely complain
about it as well.
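
To find out whether a release carries any NIFs at all, one option is to look
for shared objects under the ~priv~ directories of the bundled applications.
A minimal sketch, where ~list_nifs~ is a hypothetical helper and the release
directory in the usage comment is an assumption:

#+BEGIN_EXAMPLE sh
    # List shared objects bundled under any priv/ directory of a release.
    list_nifs() {
        find "$1" -type f -path '*/priv/*' -name '*.so' | sort
    }

    # Hypothetical usage: check every bundled NIF for unresolved libraries.
    #   list_nifs rel/my_app | xargs -n1 ldd
#+END_EXAMPLE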

** Non-Solution Solutions

Of course, all of the above issues can be solved by "just building on the
target machine" or by simply using ~mix run~ on the target instead.  However, I
personally find these solutions unacceptable.

I'm not overly fond of requiring my target hosts, my production machines, to
run a full development tool chain.  Before this is dismissed as a personal
issue, remember that our dependency tree may contain NIFs outside of our
control.  Therefore, it's not just Erlang/Elixir that are required to be on the
machine, but a C standard library and autotools too.

This solution doesn't give the impression of an architecture that scales,
either.  If a new release needs to be deployed, each server will now need to
spare some load for building the project and its dependencies before any real,
actual upgrading can continue.

** Solutions(?)

What are we to do? How are we to build Erlang/Elixir/OTP applications as part
of our CI/CD pipeline? Particularly, how are we to build our applications on a
CI/CD system and /not/ the production box(es) themselves?

If any of the above problems tell us anything, it's that the build system must
be either the /exact same/ machine or a clone of it with build tools.
Thankfully, we can achieve a "clone" without too much work using
[[docker][Docker]] and the [[docker-hub][official image registries]].

By using the official CentOS image and a specific tag, we can match our target
system almost exactly.  Furthermore, building the Erlang/Elixir stack from
source is a relatively small order for a Docker container too, making
versioning completely within reach.  Moreover, since the build host and the
target host are nearly identical, bundling ERTS should be a non-issue.

#+BEGIN_QUOTE
  This is the observed result of using
  [[kb-docker-elixir-centos][docker-elixir-centos]] for a base image for CI
  builds.
#+END_QUOTE
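
In practice, the CI step can be as small as running the release task inside
such a container.  The following is a minimal sketch, assuming an image that
already contains Erlang and Elixir: the image name and the ~/build~ mount
point are placeholders, and ~DOCKER~ is overridable only so the command can be
dry-run with ~echo~.

#+BEGIN_EXAMPLE sh
    # Container client to invoke; set DOCKER=echo to print instead of run.
    DOCKER=${DOCKER:-docker}

    # Build an exrm release of the project in $1 inside the image $2.
    build_release() {
        "$DOCKER" run --rm -v "$1:/build" -w /build -e MIX_ENV=prod "$2" \
            sh -c 'mix deps.get && mix release'
    }

    # Hypothetical usage from the project root (the image name is made up):
    #   build_release "$PWD" example/elixir-centos:7.2
#+END_EXAMPLE

Because the project directory is mounted into the container, the release ends
up in the usual ~rel/~ directory on the CI host, ready to be archived.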

Another possible solution is to ship Docker containers as the artifact of the
build.  However, doing this well requires a decent Docker-capable
infrastructure and deployment process.  Furthermore, going this route, it's
unlikely that ~exrm~ is even necessary at all.  It is likely more appropriate
to simply use ~mix run~ or whatever the project's equivalent is.  Another thing
lost here is [[erlang-docs-release][relups]], which is essentially the whole
reason for wanting to use ~exrm~ in the first place.

As such, if using ~exrm~ is desired, setting up a build server will be
imperative to building reliably and without building on production.  Scaling
from a solid build foundation will be much easier than building and "deploying"
on the production farm itself.

** Moving Forward

Releasing software isn't in a particularly hard class of problems, but it does
have its challenges.  Some languages attempt to solve this challenge in their
artifact/build result.  Other languages, unfortunately, don't attempt to solve
this problem at all.  Still, I can see it being possible to eventually reach a
goal of being able to create binary releases with steps as simple as
~./configure && make && make install && tar~.

But we aren't there yet.

But we are close.

The current way Erlang/OTP applications want to be deployed includes shipping
/with/ the runtime; this is a great starting point.

To move to a better, easier release cycle, we need a few things:

- The ability to (natively) cross-compile to different architectures and
  different versions of ERTS /and/ cross-compile Erlang code itself.

- The ability to easily statically compile ERTS and bundle the result for the
  specified architecture.

Cross-compiling to different versions of ERTS is likely a harder problem to
tackle.  But being able to cross-compile the ERTS itself is likely much easier
since this is already a [[gnu-automake-crosscompile][feature]] of GCC.

Thus, our problem is now how to add and/or expose the facility of customizing
the appropriate build flags to our projects and dependencies, so that we can
cross-compile a static ERTS and any NIFs and bundle these into a solid OTP
release.