
Resurrecting a backup btrfs snapshot in a NixOS container

Do you ever have an idea that sounds too insane to actually work, and then you try it, and against all expectations, it does work? That is pretty much what happened to me while I was trying to recover some corrupted files on my Nextcloud instance.

The problem

I am running a Nextcloud instance on a NixOS server, and I’m using btrfs snapshots for incremental backups. I once did some tinkering with the filesystem layout - I set up a couple of bind-mounts in the /var/lib/nextcloud/data directory to divide backups into a “less important” and a “more important” btrfs subvolume. Of course, as always happens, I messed up the bind-mounts, and Nextcloud ran on an empty data directory with its old database for a couple of hours.

Luckily, only one user was affected - they reported that they could not download any files anymore. Apparently, Nextcloud keeps track of the user files in its SQL database too, and when it can’t find a file on the filesystem, it deletes the database record of that file. I quickly fixed the bind-mounts, and the user reported that they had lost nothing of importance. Well, I guess I got lucky this time.

Except that I hadn’t. Two weeks later, said user reported a really important file missing. A quick check revealed that it was still present on the filesystem, but not in the database. And of course I couldn’t just send the file to the user - I have server-side encryption enabled. AND Nextcloud server-side encryption is per-file, and really complicated with tons of stuff factoring into the keys. Well, well. If it isn’t the consequences of my own actions.

The solution

After some tinkering with the server-side encryption keys, I quickly realized that the process of manually decrypting the file would be pretty complicated and at least a weekend project that would’ve included reading up on a zillion AES modes and reading lots of Nextcloud source code. Manually copying the old file and encryption keys into a new file that was present in the database also didn’t work.

So what now? I couldn’t just roll back a server with federated services to a two-week-old snapshot. My next idea was to set up some kind of secondary VM with the old snapshot - the user could then download the file via the web interface of the Nextcloud instance on the VM. Under normal circumstances, that probably would’ve been a weekend project too. But with NixOS and btrfs, it took me a mere 20 minutes.

Preparing the btrfs subvolume

My backup snapshots are read-only. (For a good reason: you don’t want your incremental backups to be inconsistent.) So the first thing I did was to create a snapshot of the snapshot and make sure it was read-write:

# btrfs subvolume snapshot /persist/.snapshots/11921/snapshot/ /tmp/tmp-vm
# btrfs property set -ts /tmp/tmp-vm/ ro false

Now I had the relevant parts[1] of the two-week-old server filesystem in a writable subvolume under /tmp/tmp-vm. The next step was to set up a VM that ran from that snapshot.

NixOS container

Then, I realized that I didn’t need a full VM. I could just use the NixOS container feature. Basically, it sets up systemd-nspawn containers which you can manage via your NixOS configuration. That’s a HUGE advantage, because now I could literally just reuse the Nextcloud config of my server for the temporary container.

First of all, I added the container config to my NixOS server config:

  containers.tmp-vm = {
    config = { config, lib, pkgs, ... }: {
        # (Container NixOS configuration here)
    };

    autoStart = true;
    privateNetwork = true;
    hostAddress = "192.168.113.1";
    localAddress = "192.168.113.2";
    tmpfs = [ "/" ];

    bindMounts = {
      "/var/lib/nextcloud" = {
        hostPath = "/tmp/tmp-vm/var/lib/nextcloud";
        isReadOnly = false;
      };
      "/var/lib/mysql" = {
        hostPath = "/tmp/tmp-vm/var/lib/mysql";
        isReadOnly = false;
      };
      "/var/lib/secrets" = {
        hostPath = "/tmp/tmp-vm/var/lib/secrets";
        isReadOnly = false;
      };
    };
  };

The config attribute takes a NixOS module. This is the NixOS configuration from which the container is built. In the lines after it, I created the virtual networking interface (obviously, I wanted the host to be able to talk to the container). Also, the container is temporary, so I just made / a tmpfs. Finally, I bind-mounted the relevant parts of the read-write snapshot into the container.
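The three bind-mounts all follow the same pattern: a path inside the container maps to the same path below the writable snapshot. As a minimal sketch (not what I actually deployed), they could be generated from a list with lib.genAttrs, used in place of the explicit bindMounts block. The names snapshotRoot and statePaths are made up for illustration:

```nix
# Sketch of a host module fragment; snapshotRoot and statePaths are
# illustrative names, not part of the original configuration.
{ lib, ... }:
let
  # Writable snapshot created earlier.
  snapshotRoot = "/tmp/tmp-vm";
  # State directories the container needs from the snapshot.
  statePaths = [ "/var/lib/nextcloud" "/var/lib/mysql" "/var/lib/secrets" ];
in
{
  # One read-write bind-mount per state path, keyed by the path
  # as seen inside the container.
  containers.tmp-vm.bindMounts = lib.genAttrs statePaths (path: {
    hostPath = snapshotRoot + path;
    isReadOnly = false;
  });
}
```

With this shape, pointing the container at a different snapshot would only mean changing snapshotRoot.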

Inside the NixOS container

Next up: the NixOS configuration for the container. My NixOS config is modularized, so I just imported my Nextcloud module. It only needed some minor tweaks: the container doesn’t need a firewall. Also, I didn’t have the patience to set up NAT for an ACME cert renewal inside the container, so I set the container Nextcloud instance to HTTP only. Finally, the NixOS state version and the user uids should match those of the host.

# (Container NixOS configuration)
imports = [ ../../modules/nextcloud ];
networking.firewall.enable = false;
services.nextcloud.https = lib.mkForce false;

system.stateVersion = "22.11";
users.users.nextcloud.uid = 991;
users.users.mysql.uid = 84;

The container Nextcloud should be accessible under a different subdomain from my main cloud, so I overrode that too:

# (Container NixOS configuration)
services.nextcloud.hostName = lib.mkForce "cloud2.aidoskyneen.eu";

Now, I just needed to configure the host Nginx instance to reverse-proxy cloud2.aidoskyneen.eu[2] into the container:

# (Host NixOS configuration)
services.nginx.virtualHosts."cloud2.aidoskyneen.eu" = {
  forceSSL = true;
  enableACME = true;
  locations."/" = {
    proxyPass = "http://192.168.113.2";
  };
};
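One thing the snippet above leaves implicit: by default, Nginx does not pass the original Host header on to a proxyPass target, and the Nextcloud in the container presumably needs to see cloud2.aidoskyneen.eu rather than the container IP. The following is a sketch under the assumption that the host does not already enable this elsewhere; NixOS has a global switch for the usual proxy headers:

```nix
# (Host NixOS configuration)
# Assumption: not already enabled elsewhere in the host config.
# Adds the usual proxy headers (Host, X-Forwarded-For, X-Forwarded-Proto, ...)
# to proxied locations.
services.nginx.recommendedProxySettings = true;
```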

nixos-rebuild and done. The user could now log into the two-week-old container with their usual credentials, download the missing file, and the problem was solved without affecting the host at all. That was stupidly easy.


  1. I am using NixOS impermanence.

  2. Wildcard subdomains are cool.