Rogal Dorn Gets a New Body: What NixOS Makes Trivial Would Take a Weekend Otherwise
A hardware migration story in which the configuration survives unscathed, the router briefly forgets who lives here, and a search index has an existential crisis about its own version number.
The Patient
Rogal Dorn, for those unfamiliar with my increasingly elaborate Warhammer 40K homelab taxonomy, is the main home server. The backbone. The machine running Jellyfin, Gitea, Immich, Authelia, Karakeep, and a frankly irresponsible number of other services that my household has quietly come to depend on. Named after the Primarch of the Imperial Fists, who was famously recalled to Terra to fortify it against an existential threat. The parallel felt apt.
Until recently, Rogal Dorn ran on a Lenovo ThinkPad P53: Intel i7-9850H, 16GB DDR4-2666 ECC, an Nvidia Quadro T1000M that mostly sat there being a Quadro. It was, by all accounts, a machine that should not have been used as a server. It was a workstation-class laptop doing a server's job in a server rack it was never designed to fit into. It was also, by the standards of improvised homelab hardware, genuinely excellent at it.
Then it died. Slowly, then all at once, in the way that only genuinely cursed hardware can manage.
The P53 had a battery problem. Several cells had deteriorated to the point where the firmware decided, quite reasonably from a fire-prevention standpoint, that it would no longer charge the battery. The consequence of this decision was not "replace the battery," but rather: the machine could run until the battery depleted, then had to sit idle until enough charge crept back in for another session, then run again until depleted, and so on. A laptop battery dying in a laptop that was being used as a server, which is to say a machine that was supposed to run continuously without interruption, is a particular kind of irony.
The firmware could not report how far the deterioration had gone. It knew something was wrong with charging, so it stopped charging, but it had no mechanism to communicate "your battery is dying and this will keep getting worse." The options were: replace the laptop, or continue running a machine in a state where, if left unattended long enough, the battery would eventually reach a state of chemistry that tends to express itself as heat. Not ideal for something sitting in a cabinet in a living space.
So: new hardware. No fanfare, no kernel panic, no helpful last words in the journal. Services went dark. Jellyfin went quiet. My family noticed within approximately four minutes.
The replacement was a Minisforum MS-A2: AMD Ryzen 9 9955HX, integrated Radeon 890M, a machine about the size of a thick paperback that draws a fraction of what a workstation laptop idles at and, crucially, has four ethernet ports. Four. On a device you could mistake for an Apple TV if you weren't paying attention. The Adeptus Mechanicus would approve of the form factor, even if they'd require seventeen rituals before acknowledging the silicon was blessed.
What followed was either a testament to declarative infrastructure or a moderately stressful afternoon, depending on your outlook. Probably both.
The Transplant
Here is the thing about NixOS that sounds like marketing until you actually need it: the configuration is not on the machine. It's in a git repository. The machine's entire intended state, every service, every user, every firewall rule, every cronjob, all of it, lives in version-controlled Nix files that could be applied to any compatible piece of hardware.
What is on the machine is the data: the NVMe drive, containing the filesystems, the databases, the media, the state that actually accumulates over time and cannot be trivially regenerated from a config file. That was intact. The P53's failure was confined to its battery and charging; the drive was fine.
So the transplant procedure was this:
- Remove the NVMe from the dead P53.
- Install it in the MS-A2.
- Boot.
That's it. The machine came up. The old configuration was there, the old filesystems were there, all the services attempted to start. It was running the P53's config on MS-A2 hardware, which is not exactly a supported configuration, but it booted and most things worked. This is the part where NixOS earns its reputation: the configuration being separate from the hardware means hardware failure becomes a logistics problem rather than a data recovery problem. You're not reinstalling an OS and then trying to remember what you had configured. You're just pointing different hardware at the same instructions.
A traditional setup would have involved reinstalling the OS, reinstalling each service, recovering configuration from wherever you'd last backed it up (and hoping that was recently), and then spending the better part of a weekend wondering if you'd missed anything. Here, the concern was simpler: update the configuration to match the new hardware, deploy it, done.
The filesystem UUIDs, incidentally, did not change. The root partition, the boot partition, the swap device: all still identified by the same UUIDs as when the P53 had them. Because UUIDs identify filesystems, not hardware. The new machine simply mounted the same drives with the same identifiers. This felt almost suspicious in how straightforwardly it worked.
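For anyone who has not looked inside a generated hardware configuration, by-UUID mounts usually look something like the sketch below. The UUID strings and the ext4/vfat filesystem types are placeholders I made up for illustration, not the values from this machine.

```nix
# Sketch of by-UUID mounts as nixos-generate-config typically writes them.
# Every UUID and fsType here is a placeholder (assumption), not real data.
{
  fileSystems."/" = {
    device = "/dev/disk/by-uuid/00000000-0000-0000-0000-000000000000";
    fsType = "ext4";
  };

  fileSystems."/boot" = {
    device = "/dev/disk/by-uuid/0000-0000";
    fsType = "vfat";
  };

  swapDevices = [
    { device = "/dev/disk/by-uuid/11111111-1111-1111-1111-111111111111"; }
  ];
}
```

Nothing in these entries names a controller, a slot, or a chassis, which is why the same lines keep working when the drive moves.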
The Router Wasn't Told
The first sign that the universe hadn't entirely given up on making this difficult was the IP address. Rogal
Dorn lives at *.*.*.95, maintained by a DHCP reservation in the router tied to the MAC
address of whichever network interface asks for it.
The MS-A2's MAC address is, self-evidently, not the P53's MAC address. The router had no idea who this new
machine was. It handed out *.*.*.43, which is what routers do when they encounter
unfamiliar hardware: they assign something arbitrary from the pool and wait for you to notice.
Diagnosing this required exactly one command:
ip link show
Which revealed, amongst other things, that the active interface was enp3s0
with MAC address 38:05:25:36:aa:05, that the machine had four ethernet ports total (the MS-A2
is, again, a device with four ethernet ports, a fact that continues to delight me), and that the old
configuration was trying to bring up enp0s31f6, which doesn't exist on
this hardware and never will.
This diagnostic took about thirty seconds. The preceding forty-four and a half minutes were spent on less productive endeavours, chief amongst them: inserting an ethernet cable into an SFP+ port. The MS-A2 has both RJ45 ports and SFP+ ports in relatively close proximity, and in a moment of confidence that I will not be repeating, I confirmed that an ethernet cable will physically enter an SFP+ cage if you apply the appropriate amount of optimism. It will not, however, establish a network connection. The port sits there. The cable sits in it. Nothing happens. The diagnosis of this situation required more time than it should have.
I am not a network engineer. I have made peace with this.
Update the ARP/DHCP reservation in the router with the correct MAC. Plug the cable into an actual RJ45 port. Then:
sudo nmcli networking off && sudo nmcli networking on
Back on .95. The router, newly informed of who lived here now, cooperated. This is
not a NixOS problem. This is just how networks work when you're willing to physically insert cables
into ports that were not designed to receive them, and it is slightly annoying every time regardless.
What Actually Needed Changing
The interesting part of a hardware migration on NixOS is the accounting exercise: what in the configuration was actually about the hardware, and what was just configuration? The answer, in this case, was less than you might expect.
The Kernel Modules
The hardware configuration file, generated by nixos-generate-config when the system was first
installed, listed the kernel modules appropriate for P53 hardware. The MS-A2 is AMD throughout, with
different storage controllers and no SD card reader. The updated hardware configuration ended up
considerably simpler:
boot.initrd.availableKernelModules = [ "xhci_pci" "nvme" ];
boot.initrd.kernelModules = [ ];
boot.kernelModules = [ "kvm-amd" ];
hardware.cpu.amd.updateMicrocode =
  lib.mkDefault config.hardware.enableRedistributableFirmware;
The P53 configuration had rtsx_pci_sdmmc for its SD card reader, and
hardware.cpu.intel.updateMicrocode for its Intel CPU. The MS-A2 has neither an SD card reader
nor an Intel CPU, which made those lines somewhat aspirational. Removed.
The Network Interface
One line in vars/networking.nix:
rogaldorn = {
  # Minisforum MS-A2 - Main home server
  iface = "enp3s0"; # was: "enp0s31f6"
  ipv4 = "*.*.*.95";
};
This propagated automatically through every place the interface name was used, because the vars system generates interface configuration from a single source of truth. Except for one place: the NetAlertX network scanner configuration, which had the interface name embedded as a literal string in a config file template. That required a separate change. One line. Noted for the future.
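To make the single-source-of-truth idea concrete, here is a minimal sketch of a module reading the interface name from the vars file. It assumes vars/networking.nix evaluates to a plain attribute set; the import path, the use of networking.interfaces, and the port number are illustrative guesses rather than the actual layout of this repository.

```nix
# Hypothetical consumer of vars/networking.nix. Only `rogaldorn.iface`
# comes from the post; everything else here is an assumption.
{ ... }:
let
  vars = import ./vars/networking.nix;
  host = vars.rogaldorn;
in
{
  # The interface name is looked up, never typed, so swapping
  # "enp0s31f6" for "enp3s0" happens in exactly one file.
  networking.interfaces.${host.iface}.useDHCP = true;

  # Per-interface firewall rules can reuse the same attribute.
  # Port 443 is an arbitrary example.
  networking.firewall.interfaces.${host.iface}.allowedTCPPorts = [ 443 ];
}
```

Anything that takes the name as a Nix value gets updated for free. Anything that carries it as a literal string inside a template, as the NetAlertX config did, does not.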
The Laptop-Specific Configuration
This is where the P53's history as a laptop-disguised-as-server came back to haunt things. The configuration had accumulated several blocks of code that made sense for a machine with a battery and a lid:
- A lid switch handler telling logind to ignore the lid closing. The MS-A2 does not have a lid. Configuring what to do when it closes is, philosophically speaking, a solved problem.
- A battery guard service: a systemd timer that polled /sys/class/power_supply/BAT0/capacity every two minutes and initiated a clean shutdown if the battery dropped below 5% while unplugged. The MS-A2 does not have a battery. The service would have simply done nothing on every poll, exited cleanly, and wasted everyone's time.
- acpid, enabled for AC connect/disconnect event logging. Mini PCs are permanently plugged in. The events never come.
- thermald, Intel's dynamic thermal management daemon. AMD CPUs do not use thermald. It would have installed, started, found no Intel thermal zones, and sat there consuming memory while contributing nothing.
- A service to disable Intel turbo boost by writing to /sys/devices/system/cpu/intel_pstate/no_turbo. The MS-A2 has an AMD CPU. The intel_pstate driver does not exist. The service would have checked for the file, found nothing, and exited. Harmless, pointless, and slightly embarrassing.
All of it removed. The configuration went from managing a laptop's power anxiety to managing a box that is simply plugged in and expected to work. Which is, frankly, a relief.
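One way to keep this kind of cruft from accumulating again is to put the laptop-only options behind a single switch. The sketch below is not what the original configuration did: homelab.isLaptop is a hypothetical option name, and only three of the removed items are shown.

```nix
# Sketch only: `homelab.isLaptop` is a hypothetical option, not part of
# the original configuration. The three services are standard NixOS ones.
{ config, lib, ... }:
{
  options.homelab.isLaptop =
    lib.mkEnableOption "laptop-specific power and thermal handling";

  config = lib.mkIf config.homelab.isLaptop {
    # Keep running with the lid shut. Newer NixOS releases may expose
    # this under a different logind option path.
    services.logind.lidSwitch = "ignore";

    # Log AC connect/disconnect events.
    services.acpid.enable = true;

    # Intel-only thermal management daemon.
    services.thermald.enable = true;
  };
}
```

A host that leaves the flag at its default of false gets none of these, so the next move to lidless hardware needs no cleanup pass.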
The GPU Stack
The P53 had an Nvidia Quadro T1000M. This was used for CUDA acceleration in Ollama, the local LLM service, while hardware video transcoding in Jellyfin went through VAAPI on the Intel integrated graphics. Both required configuration. Both required packages. The Nvidia driver configuration alone was:
services.xserver.videoDrivers = [ "nvidia" ];
hardware.nvidia = {
  modesetting.enable = true;
  open = false;
};
Plus the Ollama service itself, configured with pkgs.ollama-cuda and loaded models including
deepseek-r1, llama3.2, and codellama. The MS-A2 has no discrete GPU. It has an integrated Radeon 890M,
which is an RDNA3.5 integrated graphics unit and is fine for display output and hardware video decode but
is not, by any stretch, the place you want to run large language models.
Ollama on the MS-A2 would have meant CPU inference. Slow, warm, and unnecessary given that a Mac Mini is taking over that responsibility. So: Ollama removed entirely. The Nvidia driver configuration, the CUDA packages, the model loading, the Open WebUI frontend: gone. The configuration is shorter. This counts as a win.
Jellyfin's VAAPI configuration replaced the Intel stack with AMD's equivalent. The Radeon 890M uses Mesa's radeonsi driver for hardware video decode, which is considerably less ceremony than Intel's setup required:
hardware.graphics = {
  enable = true;
  extraPackages = with pkgs; [
    mesa # includes radeonsi VA-API for AMD
    libva-vdpau-driver
    libvdpau-va-gl
    rocmPackages.clr # OpenCL for AMD (tonemapping, subtitle burn-in)
  ];
};
The Intel configuration had required overriding the vaapiIntel package to enable hybrid
codec support. AMD does not require this. You add Mesa. Mesa includes the driver. Hardware transcoding
works. The contrast in complexity is notable.
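For comparison, the Intel side usually looks something like the commonly documented pattern below. This is a sketch from memory of that pattern, not a copy of the removed configuration, and the attribute names depend on the nixpkgs release (newer ones call the package intel-vaapi-driver).

```nix
# Assumed shape of the Intel VA-API setup; not taken from the original
# config. Package attribute names vary between nixpkgs releases.
{ pkgs, ... }:
{
  # Rebuild the i965 VA-API driver with hybrid codec support enabled.
  nixpkgs.config.packageOverrides = pkgs: {
    vaapiIntel = pkgs.vaapiIntel.override { enableHybridCodec = true; };
  };

  hardware.graphics = {
    enable = true;
    extraPackages = with pkgs; [
      intel-media-driver # iHD driver for newer Intel GPUs
      vaapiIntel         # i965 driver, picked up with the override above
    ];
  };
}
```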
One More Thing, Obviously
The deploy completed. Services came up. A brief survey of systemctl status revealed one
casualty: karakeep-meilisearch.service, in a restart loop, producing this with metronomic
regularity:
error=Your database version (1.38.2) is incompatible
with your current engine version (1.41.0).
MeiliSearch, the search engine used by Karakeep for bookmark indexing, had been updated in nixpkgs at some point between the last deployment and this one. The binary was now 1.41.0. The database on disk was written by 1.38.2. MeiliSearch considers this an insurmountable philosophical difference and refuses to start rather than attempting to read data it considers beneath it.
To be clear about what MeiliSearch actually contains in this context: search index data. Derived state.
A representation of the bookmarks that already exist in Karakeep's SQLite database, optimised for full-text
search. Nothing that cannot be regenerated. The fix was to stop the service, remove
/var/lib/karakeep/meilisearch/*, and start it again. MeiliSearch initialised a fresh 1.41.0
database. Karakeep re-indexed everything. Done in five minutes.
The annoying part is not the fix. The fix is trivial. The annoying part is that the fix requires knowing what the error means, finding where the data lives, and performing a manual operation that is, fundamentally, "delete some files." In a system that is supposed to be deployable without manual intervention, "delete some files" is a failure mode.
Making It Structural
The solution is not complicated. MeiliSearch writes its version to a file called VERSION in
the data directory when it initialises. The binary version is available from meilisearch --version.
If those two things disagree, the data directory is stale and should be cleared before starting.
This is now an ExecStartPre script in the karakeep-meilisearch systemd service:
ExecStartPre = pkgs.writeShellScript "meilisearch-version-check" ''
  DATA_DIR="/var/lib/karakeep/meilisearch"
  VERSION_FILE="$DATA_DIR/VERSION"
  if [ -f "$VERSION_FILE" ]; then
    # Version that wrote the on-disk database
    DB_VERSION=$(cat "$VERSION_FILE")
    # Version of the binary that is about to start
    BINARY_VERSION=$(${pkgs.meilisearch}/bin/meilisearch --version \
      | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)
    # Wipe only if the binary version was parsed and actually differs
    if [ -n "$BINARY_VERSION" ] && [ "$DB_VERSION" != "$BINARY_VERSION" ]; then
      echo "MeiliSearch version mismatch (db: $DB_VERSION, binary: $BINARY_VERSION) -- wiping index"
      rm -rf "$DATA_DIR"/*
    fi
  fi
'';
A few lines of shell. The script runs before MeiliSearch starts, checks the version, wipes the directory if there's a mismatch, and exits. MeiliSearch then starts against a clean data directory and initialises normally. The next deploy that includes a MeiliSearch version bump will handle itself. No manual intervention, no restart loops in production, no five-minute detour into a service's data directory.
This pattern is specific to MeiliSearch because MeiliSearch is a search index: its data is entirely derived from primary sources and can be regenerated without loss. It would be the wrong pattern for a PostgreSQL instance or anything containing data that cannot be reconstructed. Those services either get explicit version pinning or their own proper migration handling. MeiliSearch gets the wipe script because wiping MeiliSearch is correct.
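For the other category, explicit pinning can be as small as naming the package. A minimal sketch, assuming a PostgreSQL service managed by the standard NixOS module; the major version is picked for illustration and does not come from this setup.

```nix
# Sketch of explicit version pinning for a stateful service.
# postgresql_16 is an illustrative choice, not the version used here.
{ pkgs, ... }:
{
  services.postgresql = {
    enable = true;
    # Pinned so a nixpkgs update cannot change the major version
    # underneath an existing data directory.
    package = pkgs.postgresql_16;
  };
}
```

With the package pinned, a major upgrade only happens when someone edits this line on purpose and runs the migration that goes with it.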
The Honest Conclusion
Total downtime: the time it took to move a screwdriver and wait for a Colmena deploy. The actual configuration work was under an hour. The networking section took forty-five minutes, of which approximately forty-four were attributable to my personally testing the hypothesis that ethernet cables work in SFP+ ports. They do not. The configuration changes were roughly forty lines across six files: the hardware configuration, the networking vars, the configuration.nix cleanup, the Jellyfin VAAPI update, the removal of Ollama, the NetAlertX interface update.
The services that were running on the P53 are running on the MS-A2. The data that was on the P53's NVMe is still on the NVMe, now in a different chassis. The filesystem UUIDs are the same. The service configurations are the same, minus the hardware-specific code that no longer applied. Everything that was not about the P53's specific hardware survived the transplant without modification.
This is what declarative configuration is actually for. Not the theoretical elegance of reproducible builds, not the philosophical appeal of treating infrastructure as code, not the bragging rights at conferences. The point is that when hardware dies, which it does, the system's intended state exists somewhere other than the machine that just became a paperweight. You pull the drive, you put it in something else, you update the twelve lines that were about the old hardware, and you deploy.
The alternative is a weekend and a list of things you've forgotten you configured. I've done that. It is less fun.
Rogal Dorn is back online. The Primarch has a new body. The services run. The router knows who lives here. MeiliSearch will handle its own upgrades from now on. This is, by the standards of unexpected hardware failure on a Monday, a remarkably satisfactory outcome.
The full configuration for this setup is in the nix-templates repository, if you want to see what forty lines of change looks like across a NixOS homelab migration.