Do you ever have an idea that sounds like it’s too insane to actually work out, and then you try it, and against all expectations, it does work out? Pretty much this happened to me while I was trying to recover some corrupted files on my Nextcloud instance.
I am running a Nextcloud instance on a NixOS server, and I’m using btrfs snapshots for incremental backups. I once did some tinkering with the filesystem layout - I set up a couple of bind-mounts in the /var/lib/nextcloud/data directory to divide backups into a “less important” and a “more important” btrfs subvolume. Of course, as it always happens, I messed up the bind-mounts and Nextcloud ran on an empty data directory with its old database for a couple of hours.
Luckily, only one user was affected - they reported that they could not download any files anymore. Apparently, Nextcloud is keeping track of the user files in its SQL database too, and when it can’t find a file on the filesystem, it deletes the database record of that file. I quickly fixed the bind-mounts, and the user reported that they had lost nothing of importance. Well, guess I had luck this time.
Except that I hadn’t. Two weeks later, said user reported a really important file missing. A quick check revealed that it was still present on the filesystem, but not in the database. And of course I couldn’t just send the file to the user - I have server-side encryption enabled. AND Nextcloud server-side encryption is per-file, and really complicated with tons of stuff factoring into the keys. Well, well. If it isn’t the consequences of my own actions.
After some tinkering with the server-side encryption keys, I quickly realized that the process of manually decrypting the file would be pretty complicated and at least a weekend project that would’ve included reading up on a zillion AES modes and reading lots of Nextcloud source code. Manually copying the old file and encryption keys into a new file that was present in the database also didn’t work.
So what now? I couldn’t just roll back a server with federated services to a two-week-old snapshot. My next idea was to set up some kind of secondary VM with the old snapshot - the user could then download the file via the web interface of the Nextcloud instance on the VM. Under normal circumstances, that probably would’ve been a weekend project too. But with NixOS and btrfs, it took me a mere 20 minutes.
My backup snapshots are read-only. (For a good reason: you don’t want your incremental backups to be inconsistent.) So the first thing I did was to create a snapshot of the snapshot and mark it read-write:
# btrfs subvolume snapshot /persist/.snapshots/11921/snapshot/ /tmp/tmp-vm
# btrfs property set -ts /tmp/tmp-vm/ ro false
Now I had the relevant parts[1] of the two-week-old server filesystem in a writable subvolume under /tmp/tmp-vm. The next step was to set up a VM that ran from that snapshot.
Then, I realized that I didn’t need a full VM. I could just use the NixOS container feature. Basically, it sets up systemd-nspawn containers which you can manage via your NixOS configuration. That’s a HUGE advantage, because now I could literally just reuse the Nextcloud config of my server for the temporary container.
First of all, I added the container config to my NixOS server config:
containers.tmp-vm = {
  config = { config, lib, pkgs, ... }: {
    # (Container NixOS configuration here)
  };
  autoStart = true;
  privateNetwork = true;
  hostAddress = "192.168.113.1";
  localAddress = "192.168.113.2";
  tmpfs = [ "/" ];
  bindMounts = {
    "/var/lib/nextcloud" = {
      hostPath = "/tmp/tmp-vm/var/lib/nextcloud";
      isReadOnly = false;
    };
    "/var/lib/mysql" = {
      hostPath = "/tmp/tmp-vm/var/lib/mysql";
      isReadOnly = false;
    };
    "/var/lib/secrets" = {
      hostPath = "/tmp/tmp-vm/var/lib/secrets";
      isReadOnly = false;
    };
  };
};
The config attribute takes a NixOS module. This is the NixOS configuration from which the container is built. In the following lines, I created the virtual networking interface (obviously, I wanted the host to be able to talk to the container). Also, the container is temporary, so I just made / a tmpfs. Finally, I bind-mounted the relevant parts from the rw snapshot into the container.
Next up: the NixOS configuration for the container. My NixOS config is modularized, so I just imported my Nextcloud module. It only needed some minor tweaks: The container doesn’t need a firewall. Also, I didn’t have the patience to set up NAT for an ACME cert renewal inside the container, so I set the container Nextcloud instance to HTTP only. Finally, the NixOS state version and the user uids have to match those of the host, because the bind-mounted data is owned by those uids.
# (Container NixOS configuration)
imports = [ ../../modules/nexcloud ];
networking.firewall.enable = false;
services.nextcloud.https = lib.mkForce false;
system.stateVersion = "22.11";
users.users.nextcloud.uid = 991;
users.users.mysql.uid = 84;
The container Nextcloud should be accessible under a different subdomain than my main cloud, so I overrode that too:
# (Container NixOS configuration)
services.nextcloud.hostName = lib.mkForce "cloud2.aidoskyneen.eu";
Now, I just needed to configure the host Nginx instance to reverse-proxy cloud2.aidoskyneen.eu[2] into the container:
# (Host NixOS configuration)
services.nginx.virtualHosts."cloud2.aidoskyneen.eu" = {
  forceSSL = true;
  enableACME = true;
  locations."/" = {
    proxyPass = "http://192.168.113.2";
  };
};
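Since the container fragments are spread over several snippets, here is a rough sketch of how they might fit together in a single host-side module. The values are the ones from this post. The use of lib.genAttrs to generate the three bind mounts is my own shorthand for the explicit bindMounts block and is not how the original config was written; the surrounding `{ lib, ... }:` module wrapper is likewise an assumption about where this lives in the host configuration.

```nix
# Sketch only: assembles the container fragments from this post into one module.
{ lib, ... }:
{
  containers.tmp-vm = {
    autoStart = true;

    # Private veth pair between host and container.
    privateNetwork = true;
    hostAddress = "192.168.113.1";
    localAddress = "192.168.113.2";

    # The container is throwaway, so its root filesystem is a tmpfs.
    tmpfs = [ "/" ];

    # Assumption: generate the bind mounts from a list of paths instead of
    # spelling each one out. Each path is mounted read-write from the
    # writable snapshot under /tmp/tmp-vm.
    bindMounts = lib.genAttrs
      [ "/var/lib/nextcloud" "/var/lib/mysql" "/var/lib/secrets" ]
      (path: {
        hostPath = "/tmp/tmp-vm" + path;
        isReadOnly = false;
      });

    # NixOS configuration of the container itself.
    config = { config, lib, pkgs, ... }: {
      imports = [ ../../modules/nexcloud ];
      networking.firewall.enable = false;
      services.nextcloud.https = lib.mkForce false;
      services.nextcloud.hostName = lib.mkForce "cloud2.aidoskyneen.eu";
      system.stateVersion = "22.11";
      users.users.nextcloud.uid = 991;
      users.users.mysql.uid = 84;
    };
  };
}
```

The lib.mkForce wrappers are only there because the imported Nextcloud module already sets those options for the main server; without them, the two definitions would conflict.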
nixos-rebuild and done. The user could now log into the two-week-old container with their usual credentials, download the missing file, and the problem was solved without affecting the host at all. That was stupidly easy.
[1] I am using NixOS impermanence.
[2] Wildcard subdomains are cool.