#+TITLE: NixOS Setup and Configuration #+DESCRIPTION: NixOS setup with RAID, LUKS, and LVM #+TAGS: GNU/Linux #+TAGS: nixos #+TAGS: nix #+TAGS: md #+TAGS: luks #+TAGS: lvm #+DATE: 2019-07-23 #+SLUG: nixos-md-luks-lvm-setup #+LINK: arch-dm-crypt-dev-enc https://wiki.archlinux.org/index.php/Dm-crypt/Device_encryption #+LINK: arch-dm-crypt-prep https://wiki.archlinux.org/index.php/Dm-crypt/Drive_preparation #+LINK: arch-linux https://www.archlinux.org/ #+LINK: arch-lvm https://wiki.archlinux.org/index.php/LVM #+LINK: bios https://en.wikipedia.org/wiki/BIOS #+LINK: cfg.nix.git https://git.devnulllabs.io/cfg.nix.git #+LINK: docker https://www.docker.com/ #+LINK: fhs http://www.pathname.com/fhs/ #+LINK: fralef-docker-iptables https://fralef.me/docker-and-iptables.html #+LINK: gentoo https://gentoo.org/ #+LINK: gentoo-lvm https://wiki.gentoo.org/wiki/LVM #+LINK: glibc https://www.gnu.org/software/libc/ #+LINK: gnu https://www.gnu.org #+LINK: grub2 https://www.gnu.org/software/grub/ #+LINK: hivestream-gentoo http://www.hivestream.de/gentoo-installation-with-raid-lvm-luks-and-systemd.html #+LINK: linux https://www.kernel.org/ #+LINK: luks https://gitlab.com/cryptsetup/cryptsetup/blob/master/README.md #+LINK: lvm https://www.sourceware.org/lvm2/ #+LINK: md http://neil.brown.name/blog/mdadm #+LINK: moby-nftables-issue https://github.com/moby/moby/issues/26824 #+LINK: nftables https://wiki.nftables.org/wiki-nftables/index.php/Main_Page #+LINK: nix https://nixos.org/nix/ #+LINK: nix-paper https://www.usenix.org/legacy/events/lisa04/tech/full_papers/dolstra/dolstra.pdf #+LINK: nixos https://nixos.org/ #+LINK: nixos-manual https://nixos.org/nixos/manual/index.html #+LINK: nixos-paper https://nixos.org/~eelco/pubs/nixos-icfp2008-final.pdf #+LINK: nixos-post https://kennyballou.com/blog/2019/07/nixos #+LINK: stephank-docker-nftables https://stephank.nl/p/2017-06-05-ipv6-on-production-docker.html #+LINK: uefi https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface #+LINK: wiki-btrfs https://en.wikipedia.org/wiki/Btrfs #+LINK: wiki-ext4 https://en.wikipedia.org/wiki/Ext4 #+LINK: wiki-luks https://en.wikipedia.org/wiki/Linux_Unified_Key_Setup #+LINK: wiki-lvm https://en.wikipedia.org/wiki/Logical_Volume_Manager_%28Linux%29 #+LINK: wiki-md https://en.wikipedia.org/wiki/Mdadm #+LINK: wiki-raid https://en.wikipedia.org/wiki/RAID #+LINK: wiki-serpent https://en.wikipedia.org/wiki/Serpent_(cipher) #+LINK: wiki-twofish https://en.wikipedia.org/wiki/Twofish #+LINK: wiki-xfs https://en.wikipedia.org/wiki/XFS #+BEGIN_PREVIEW A brief overview (read instructions) on setting up a new [[nixos][NixOS]] system with [[lvm][LVM]] on [[luks][LUKS]] on [[md][md]]. We will go through drive preparation, basic [[nixos][NixOS]] installation instructions, and slight modifications to the instructions for installing a new system from configuration. #+END_PREVIEW ** NixOS [[nixos-post][Previously]], we introduced the concepts and ideas behind [[nixos][NixOS]] and by extension [[nix][nix]], the package manager. We, therefore, will not be reiterating the discussion here. ** System Installation and Configuration Installing [[nixos][NixOS]] is fairly straight forward. However, that is a relative term. My experience is with [[arch-linux][Arch Linux]] and more recently [[gentoo-linux][Gentoo]], both not known for having forgiving installations. #+begin_quote I don't want to replace the [[nixos-manual][manual]], I only want to really supplement it with the steps where I deviated from its path or add information for *my* personal configuration/preferences. #+end_quote That said, we will focus on disk preparation and partitioning as that is the most complicated portion of our installation. We will walk through the installation of two machines, first, will be my current laptop with two SSD's, second, my main desktop with six hard drives. Since we are doing two setups, we will also have a chance to do both [[bios][BIOS]] and [[uefi][UEFI]] partitioning schemes. #+begin_quote I am assuming the use of the [[nixos][NixOS]] [[nixos-live][live installation]] medium. #+end_quote *** Laptop **** Disk Preparation #+begin_quote I am assuming the use of the [[nixos][NixOS]] [[nixos-live][live installation]] medium. #+end_quote Since this will be an encrypted everything (sans ~/boot~), we will need to securely erase all the drives. For each drive, ~${device}~, perform the following: #+begin_quote Lines beginning with ~#~ are commands to be executed as the ~root~ user. #+end_quote #+begin_example # cryptsetup open --type plain --key-file=/dev/urandom ${device} wipe-me # dd if=/dev/zero of=/dev/mapper/wipe-me status=progress # cryptsetup close #+end_example #+begin_quote For large hard drives, this step can take a considerable amount of time. This can be done in parallel by using different identifiers than ~wipe-me~. This probably *cannot* be parallelized if using the more paranoid random source ~/dev/random~ device instead of ~/dev/urandom~ as there will likely not be enough entropy for more than one device. #+end_quote Concretely, this may look like: #+begin_example # cryptsetup open --type plain --key-file=/dev/urandom /dev/sda wipe-me # dd if=/dev/zero of=/dev/mapper/wipe-me status=progress # cryptsetup close #+end_example After securely erasing each hard drive to be used, we will next setup the various partitions for each drive. Since we will be using [[lvm][~LVM~]] on a [[luks][~LUKS~]] container, residing on a [[wiki-raid][RAID 1]] pair of hard drives, our partitioning scheme will be pretty simple. Since [[nixos][NixOS]], by default, uses [[grub2][Grub2]], we will need to create a 2 MB first partition for [[bios][BIOS]] systems. After partitioning the disk, the partition table should look similar to the following: #+begin_example Device Start End Sectors Size Type /dev/sda1 2048 6143 4096 2M BIOS boot /dev/sda2 6144 1054719 1048576 512M Linux filesystem /dev/sda3 1054720 1953525134 1952470415 931G Linux RAID #+end_example Perform or replicate the partition table to the second disk. After which, we will begin the configuration of the mirror. #+begin_quote Certainly, it's possible to securely erase one disk, partition it, then copy it to the other disk via ~dd if=/dev/sda of=/dev/sdb status=progress~. #+end_quote We will create two mirrors for this configuration, one for the ~/boot~ partition and another for the [[luks][~LUKS~]] container: #+begin_example # mdadm --create /dev/md1 --level=mirror --raid-devices=2 /dev/sda2 /dev/sdb2 # mdadm --create /dev/md2 --level=mirror --raid-devices=2 /dev/sda3 /dev/sdb3 #+end_example After creating the mirrors, we need to create the [[luks][~LUKS~]] container and the format the ~/boot~ partition. Boot Partition: #+begin_example # mkfs.ext4 -L boot /dev/md1 #+end_example [[luks][~LUKS~]] Container: #+begin_quote When configuring encrypted containers, there are lot of different options and parameters to choose from. For example, there are various cryptography schemes and modes to choose from. ~AES-XTS-PLAIN64~ is a solid choice since most CPU's will have extensions for doing ~AES~, increasing the throughput. I personally, have been looking into the other ~AES~ finalists such as [[wiki-twofish][Twofish]] and [[wiki-serpent][Serpent]]. #+end_quote #+begin_example # cryptsetup -v \ --type luks \ --cipher twofish-xts-plain64 \ --key-size 512 \ --hash sha512 \ --iter-time 5000 \ --use-random \ --verify-passphrase \ luksFormat \ /dev/md2 #+end_example Once the [[luks][~LUKS~]] container is created, open it: #+begin_example # cryptsetup open /dev/md2 cryptroot #+end_example Now, we can begin creating the [[lvm][~LVM~]] volumes: #+begin_example # pvcreate /dev/mapper/cryptroot # vgcreate vg0 /dev/mapper/cryptroot # lvcreate -L 1G vg0 -n root # lvcreate -L 10G vg0 -n var # lvcreate -L 20G vg0 -n opt # lvcreate -L 32G vg0 -n swap # lvcreate -L 100G vg0 -n nix # lvcreate -L 100G vg0 -n home # lvcreate -L 100G vg0 -n docker #+end_example Notice, there is no ~/usr~ in our [[lvm][~LVM~]] configuration. Furthermore, notice ~/~ is particularly small. [[nixos][NixOS]] is particularly different when it comes [[fhs][Filesystem Hierarchy]]. Notably, there is a large portion of the volume set aside for ~/nix~. The majority of the "system" will be in this directory. Now we need to format the volumes: #+begin_example # mkfs.ext4 -L root /dev/mapper/vg0-root # mkfs.ext4 -L var /dev/mapper/vg0-var # mkfs.ext4 -L opt /dev/mapper/vg0-opt # mkswap /dev/mapper/vg0-swap # mkfs.xfs -L nix /dev/mapper/vg0-nix # mkfs.xfs -L home /dev/mapper/vg0-home # mkfs.btrfs -L docker /dev/mapper/vg0-docker #+end_example Most volumes will be formatted with the [[wiki-ext4][~ext4~ filesystem]], typical for standard [[gnu][GNU]]/[[linux][Linux]] systems. However, we will use [[wiki-xfs][~XFS~]] for ~/nix~ and ~/home~. [[wiki-xfs][~XFS~]] is particularly well suited for purposes of these directories. Furthermore, since [[docker][~Docker~]] is an (unfortunate) necessity, creating a proper [[wiki-cow][COW]] filesystem using [[wiki-btrfs][~Btrfs~]], we get better management of [[docker][Docker]] images. Next, we will mount these volumes into various folders to begin the installation, creating the folder trees as necessary to mount: #+begin_example # mount /dev/mapper/vg0-root /mnt/ # mkdir -p /mnt/{var,nix,home,boot,opt} # mount /dev/md1 /mnt/boot # mount /dev/mapper/vg0-opt /mnt/opt # mount /dev/mapper/vg0-var /mnt/var # mount /dev/mapper/vg0-home /mnt/home # mount /dev/mapper/vg0-nix /mnt/nix # mkdir -p /mnt/var/lib/docker # mount /dev/mapper/vg0-docker /mnt/var/lib/docker #+end_example *** Desktop The desktop preparation and configuration are very similar to the laptop. However, as noted above, the complication comes from the fact that instead of a single pair of drives, we will have 3 pairs of drives. Everything else is essentially the same. **** Disk Preparation We first start by securely erasing all the devices: #+begin_example # cryptsetup open --type plain --key-file /dev/urandom /dev/nvme0n1 wipe-me # dd if=/dev/zero of=/dev/mapper/wipe-me # cryptsetup close wipe-me #+end_example #+begin_quote Remember, we don't _have_ to securely erase _every_ device since we will be mirroring several of them together. This does require that each drive are *identical*. If they are not identical, it is likely safer to erase every drive. #+end_quote Next, we will begin by partitioning each of the devices: #+begin_example # gdisk /dev/nvme0n1 Command (? for help): n Partition number (1-128, default 1): 1 First sector: Last sector: +512M Hex code or GUID: EF00 Command (? for help): n First sector: Last sector: Hex code or GUID: FD00 Command (? for help): w #+end_example This will create the boot ~EFI~ system partition and the first encrypted container partition. We do essentially the same thing for each of the pairs. However, the next two only need a single partition for the [[md][~md~]] container. Unlike the secure erasing above, we _do_ need to create the partition tables for *each* device. After partitioning the drives, we will construct the [[wiki-raid][mirrors]]: #+begin_example # mdadm --create /dev/md1 --level=mirror --raid-devices=2 --metadata 1.0 /dev/nvme0n1p1 /dev/nvme1n1p1 # mdadm --create /dev/md2 --level=mirror --raid-devices=2 /dev/nvme0n1p2 /dev/nvme1n1p2 # mdadm --create /dev/md3 --level=mirror --raid-devices=2 /dev/sda1 /dev/sdb1 # mdadm --create /dev/md4 --level=mirror --raid-devices=2 /dev/sdd1 /dev/sde1 #+end_example We need to create the ~/boot~ mirror with ~metadata 1.0~ so that the super blocks are put at the end of the RAID such that the ~UEFI~ does not get confused when attempting to boot the system. Otherwise, we use the default for all other mirrors. To monitor the progress of the mirror synchronization, use the following command: #+begin_example # watch cat /proc/mdstat #+end_example It's not vitally important that the mirrors are synchronized before continuing. Although, from a reliability perspective, it is "safer". #+begin_quote It's also possible to specify the second device as ~missing~ in each of the above commands. This way, the synchronization process can effectively be deferred until the end. #+end_quote After creating each of the mirrors, we need to format the ~/boot~ ~EFI~ system partition. This is a ~UEFI~ system, therefore, we will be using ~vfat~ for the filesystem. #+begin_example # mkfs.vfat -n boot /dev/md1 #+end_example Now, we must create the various [[luks][~LUKS~]] containers: #+begin_example # cryptsetup -v \ --type luks \ --cipher twofish-xts-plain64 \ --key-size 512 \ --hash sha512 \ --iter-time 5000 \ --use-random \ --verify-passphrase \ luksFormat \ /dev/md2 # cryptsetup -v \ --type luks \ --cipher twofish-xts-plain64 \ --key-size 512 \ --hash sha512 \ --iter-time 5000 \ --use-random \ --verify-passphrase \ luksFormat \ /dev/md3 # cryptsetup -v \ --type luks \ --cipher twofish-xts-plain64 \ --key-size 512 \ --hash sha512 \ --iter-time 5000 \ --use-random \ --verify-passphrase \ luksFormat \ /dev/md4 #+end_example Next, we will open and start creating our [[lvm][~LVM~]] volumes: #+begin_example # cryptsetup open /dev/md2 cvg0 # cryptsetup open /dev/md3 cvg1 # cryptsetup open /dev/md4 cvg2 #+end_example Now the [[lvm][~LVM~]] setup: #+begin_example # pvcreate /dev/mapper/cvg0 # vgcreate vg0 /dev/mapper/cvg0 # pvcreate /dev/mapper/cvg1 # vgcreate vg1 /dev/mapper/cvg1 # pvcreate /dev/mapper/cvg2 # vgcreate vg2 /dev/mapper/cvg2 #+end_example Now that the volume groups are created, we will start creating the actual logical volumes: #+begin_example # lvcreate -L 1G -n root vg0 # lvcreate -L 100G -n nix vg0 # lvcreate -L 15G -n opt vg0 # lvcreate -L 20G -n var vg1 # lvcreate -L 100G -n docker vg1 # lvcreate -L 64G -n swap vg1 # lvcreate -L 1T -n home vg2 #+end_example Finally, we can format each of the partitions: #+begin_example # mkfs.ext4 -L root /dev/mapper/vg0-root # mkfs.ext4 -L opt /dev/mapper/vg0-opt # mkfs.xfs -L nix /dev/mapper/vg0-nix # mkfs.ext4 -L var /dev/mapper/vg1-var # mkfs.btrfs -L docker /dev/mapper/vg1-docker # mkfs.xfs -L home /dev/mapper/vg2-home # mkswap /dev/mapper/vg1-swap #+end_example Before moving onto the next step, we first need to mount each of volumes in the desired path: #+begin_example # mount /dev/mapper/vg0-root /mnt # mkdir -p /mnt/{boot,home,nix,var,opt} # mount /dev/md1 /mnt/boot # mount /dev/mapper/vg0-nix /mnt/nix # mount /dev/mapper/vg0-opt /mnt/opt # mount /dev/mapper/vg1-var /mnt/var # mkdir -p /mnt/var/lib/docker # mount /dev/mapper/vg1-docker /mnt/docker # mount /dev/mapper/vg2-home /mnt/home #+end_example *** NixOS Configuration and Installation Once the disk preparation is complete, we can follow the steps from the [[nixos-manual][NixOS Manual]] to create the initial configuration: #+begin_example # nixos-generate-config --root /mnt #+end_example After this is done, we can move onto configuring the system the way we want. However, this is where we will deviate slightly from the manual. First, we will need to install ~git~ so we can pull down our configuration. #+begin_quote The following steps are very personal. You're free to use my [[cfg.nix.git][configuration]] if you do not have your own, or if you would like to try it out. However, you will likely want different things from _your_ system. Change the following steps as necessary. #+end_quote #+begin_example # nix-env -i git # cd /mnt/etc/ # mv nixos nixos.bak # git clone git://git.devnulllabs.io/cfg.nix.git nixos # cd nixos # cp ../nixos.bak/hardware-configuration.nix . #+end_example My set of [[nix][Nix]] [[cfg.nix.git][configuration]] includes subfolders for each machine. To setup a new machine, I soft link ("symlink") the machine's ~configuration.nix~ into the ~[/mnt]/etc/nixos~ folder. If this is a new machine or a rebuild, I typically merge the differences between the ~hardware-configuration.nix~ files. After which, I perform the regular installation. #+begin_example nixos-install --no-root-passwd #+end_example Once this finishes, the installation and configuration is done. Reboot the machine, remove the installation/live media, use the freshly installed machine as if it was always there. **** UEFI Notes Aside from learning about the ~mdadm~ metadata placement being an issue for [[wiki-uefi][UEFI]] systems to boot, I also had played around with the various settings for [[grub][GRUB]] to install correctly without errors and warnings. Here's the full [[grub][GRUB]] configuration: #+begin_src nix boot.loader.systemd-boot = { enable = true; editor = false; }; boot.loader.efi = { canTouchEfiVariables = false; }; boot.loader.grub = { enable = true; copyKernels = true; efiInstallAsRemovable = true; efiSupport = true; fsIdentifier = "uuid"; splashMode = "stretch"; version = 2; device = "nodev"; extraEntries = '' menuentry "Reboot" { reboot } menuentry "Poweroff" { halt } ''; }; #+end_src Of particular importance are the following variables: - ~boot.loader.systemd-boot.enable~ - ~boot.loader.efi.canTouchEfiVariables~ - ~boot.loader.grub.efiInstallAsRemovable~ - ~boot.loader.grub.device~ Ideally, ~boot.loader.grub.efiSupport~ would be sufficient to tell [[grub][GRUB]] to install the [[wiki-uefi][UEFI]] payload instead. However, as it turns out, there is a few more settings required to ensure proper booting in [[wiki-uefi][UEFI]] environments, particularly when using [[raid][RAID]]. According to the manual, it's required to set ~boot.loader.systemd-boot.enable~ to ~true~. Setting ~boot.loader.grub.device~ or ~boot.loader.grub.devices~ to anything other than ~"nodev"~ or ~[ "nodev" ]~ disables ~boot.loader.grub.efiSupport~. Moreover, with ~boot.loader.efi.canTouchEfiVariables~, the installation/build process attempts to run ~efibootmgr~ to modify the NVRAM of the motherboard, setting the boot targets, this fails when used with ~boot.loader.grub.device = "nodev"~. Therefore, it is required to set ~boot.loader.efi.canTouchEfiVariables = false~ and ~boot.loader.grub.efiInstallAsRemovable~ such that installation process simply places the [[grub][GRUB]] [[wiki-uefi][UEFI]] payload in the "default" search location for the motherboard, consulted before the NVRAM settings. **** Docker, ~nftables~, and NixOS Notes In developing the system configuration, I came across some issues with respect to [[docker][Docker]] and [[nftables][~nftables~]]. The [[nftables][~nftables~]] project became standard in the [[linux][Linux]] kernel in version 3.13 and replaces the myriad of existing ~{ip,ip6,arp,eb}_tables~ tools and (kernel) code. Specifically, any [[linux][Linux]] kernel above 3.13, ~iptables~ and friends are now simply a user-space front-end to the [[nftables][~nftables~]] kernel backend. However, [[docker][Docker]] still does not support [[nftables][~nftables~]] directly; there's an [[moby-nftables-issue][issue]] from 2016. With some [[stephank-docker-nftables][digging]] and [[fralef-docker-iptables][work]], there's a way to get [[nftables][~nftables~]] and [[docker][Docker]] to work nicely with each other. Specifically, we configure [[docker][Docker]] to not modify the ~iptables~ rules using the ~--iptables=false~ configuration flag for the daemon. In this configuration, we can tightly control the firewall with whatever tool we wish, in this case, [[nftables][~nftables~]]. This comes with the added benefit of bound ports are not automatically opened to the world. However, when using [[nixos][NixOS]], any modification to the [[nftables][~nftables~]] ruleset will require a reload. However, with [[docker][Docker]] loaded as well, this reload process can actually bring down the firewall completely since [[docker][Docker]] (even with ~--iptables=false~) will attempt to load the ~iptables~ kernel module, blocking the resulting ~nftables~ module load. When using a system such as [[gentoo][Gentoo]] this was never an issue, since the configuration completely ignore the ~iptables~ subsystem (since it was compiled out). In [[nixos][NixOS]], there's a bit more dance involved for the time being. This is really a minor annoyance as the firewall rules are only seldom changed.