The SurfSense NixOS Saga: From Pure Nix Dreams to Pragmatic Solutions
My comprehensive journey through packaging modern AI applications on NixOS, featuring shell script battles, Docker compromises, and the quest for true automation
Introduction: The Promise and the Problem
When I first discovered SurfSense, a self-hosted alternative to NotebookLM and Perplexity, I was excited. An open-source AI-powered knowledge management system that respects privacy? Perfect for my NixOS setup. As a NixOS enthusiast, I couldn't resist the challenge: could I deploy this cutting-edge AI application using pure, declarative Nix configuration?
This is the story of my two parallel attempts that converged on a simple truth: sometimes the best NixOS solution isn't the purest one.
Part I: My Pure Nix Ambition
Day 1: Naive Optimism
My first attempt embodied the NixOS philosophy perfectly. I'd create a module that would:
python = pkgs.python3.withPackages (ps: with ps; [
fastapi uvicorn pydantic sqlalchemy
# ... surely all AI packages exist in nixpkgs, right?
]);
Reality Check #1: ModuleNotFoundError: No module named 'chonkie'
The AI/ML Python ecosystem moves faster than nixpkgs maintainers can package. Libraries like
chonkie, litellm, and rerankers simply didn't exist in the Nix ecosystem.
Day 2: The Custom Package Rabbit Hole
Undeterred, I dove into packaging these dependencies myself:
chonkie = super.buildPythonPackage rec {
pname = "chonkie";
version = "1.0.6";
src = super.fetchPypi {
inherit pname version;
hash = "sha256-0000000000000000000000000000000000000000000=";
};
buildInputs = [ poetry-core ]; # Modern Python uses Poetry
};
This triggered a cascade of issues:
- Hash mismatches requiring manual verification
- Poetry build system incompatibilities
- Dependency version conflicts
- Native binary requirements (static-ffmpeg)
- Packages depending on other unpackaged libraries
The Dependency Tree of Doom
I mapped out what I was actually trying to package:
SurfSense
├── Standard Web Stack (✅ In nixpkgs)
│ ├── fastapi
│ ├── uvicorn
│ └── pydantic
└── AI/ML Stack (❌ Missing from nixpkgs)
├── chonkie (custom embeddings)
├── litellm (LLM abstraction)
├── rerankers (AI ranking models)
├── langgraph (LangChain ecosystem)
└── static-ffmpeg (media processing)
Each missing package had its own dependencies, creating an exponential packaging burden.
Day 3: The Poetry2nix False Hope
Poetry2nix promised salvation, automatic conversion of Poetry projects to Nix. But it too struggled with:
- Complex native dependencies
- Packages requiring network access during build
- AI model downloads during installation
- Build systems that assumed mutable environments
The Moment of Clarity
After three days of fighting Python packaging, I had an epiphany: I was solving the wrong problem.
The SurfSense developers had already solved dependency management. They provided tested Docker images. Why was I recreating months of their work?
Part II: My Shell Script Nightmare
While exploring the pure Nix approach, I also attempted the "practical" route using Docker/Podman with shell scripts.
The @ Symbol That Broke Everything
My first enemy appeared innocent enough:
[email protected]
Nix's parser choked on the @ symbol:
error: syntax error, unexpected '@', expecting '}'
This forced me into increasingly absurd workarounds:
EMAIL_AT="@"
echo "admin''${EMAIL_AT}surfsense.com" > .env
Heredoc Inception Hell
Embedding shell scripts that create files with heredocs inside Nix expressions created a quoting nightmare:
writeShellScript "setup" ''
cat > docker-compose.yml << 'COMPOSE_EOF'
version: '3.8'
services:
db:
environment:
POSTGRES_PASSWORD: ''${PASSWORD} # Which context is this?
COMPOSE_EOF
''
Three layers of string interpolation made debugging nearly impossible. I spent hours trying to figure out which quotes belonged to which context.
The Port Conflict Dance
Error: cannot listen on TCP port: listen tcp4 :5432: bind: address already in use
Of course, my NixOS system already ran PostgreSQL. But changing ports triggered cascading configuration updates across multiple files and services.
Part III: The Convergence, Docker as a Pragmatic Solution
Both my attempts eventually converged on the same conclusion: use Docker/Podman for the application while leveraging NixOS for infrastructure.
The Hybrid Architecture
┌─────────────────────────────────────┐
│ NixOS Management │
├─────────────────────────────────────┤
│ • PostgreSQL with pgvector │
│ • Systemd service orchestration │
│ • Firewall rules │
│ • User/permission management │
│ • Environment configuration │
└─────────────────────────────────────┘
↕
┌─────────────────────────────────────┐
│ Docker/Podman Containers │
├─────────────────────────────────────┤
│ • SurfSense backend (Python + AI) │
│ • SurfSense frontend (React) │
│ • pgAdmin interface │
└─────────────────────────────────────┘
This gave me the best of both worlds:
- NixOS declarative configuration for infrastructure
- Container isolation for complex dependencies
- Maintainable and upgradeable system
Part IV: The Arion Revolution
Just when I thought shell scripts were my only option, I discovered Arion, "Docker Compose with help from Nix."
Before Arion: String Escaping Hell
# Nightmare fuel
script = ''
cat > .env << 'EOF'
POSTGRES_PASSWORD=$(echo "''${PASSWORD//\"/\\\"}")
[email protected] # This still breaks
EOF
'';
After Arion: Nix-Native Bliss
services.backend = {
service.image = "ghcr.io/modsetter/surfsense-backend:latest";
service.environment = {
DATABASE_URL = "postgresql+asyncpg://postgres:postgres@db:5432/surfsense";
BACKEND_URL = "http://localhost:8000";
};
service.ports = [ "8000:8000" ];
};
Clean. Declarative. No escaping nightmares.
Part V: The Final Boss, Cache Invalidation
Victory seemed complete until I discovered systemd's aggressive caching. Changes to my configuration weren't taking effect without manual intervention:
# The ritual after every config change
sudo rm -f /var/lib/surfsense/.tmp-arion*
sudo systemctl restart surfsense-setup
sudo systemctl restart surfsense
The Automated Solution
I forced fresh configurations on every service start:
systemd.services.surfsense = {
serviceConfig = {
ExecStartPre = [
# Force setup to re-run
"${pkgs.systemd}/bin/systemctl stop surfsense-setup.service || true"
"${pkgs.systemd}/bin/systemctl start surfsense-setup.service"
# Wait for completion
"${pkgs.coreutils}/bin/sleep 2"
];
ExecStart = "${pkgs.arion}/bin/arion up -d";
};
};
Lessons I Learned
1. The Right Tool for the Right Job
- NixOS excels at: System configuration, service management, infrastructure
- Containers excel at: Complex dependency management, bleeding-edge ecosystems
- Use both: Don't be a purist when pragmatism delivers better results
2. The AI/ML Ecosystem Exception
The AI/ML Python ecosystem is uniquely challenging for Nix:
- Rapid release cycles (daily/weekly updates)
- Complex native dependencies (CUDA, specialised libraries)
- Large model downloads during installation
- Assumption of mutable environments
Until this stabilises, containers are the pragmatic choice.
3. String Escaping is a Code Smell
If you're fighting with multiple levels of string escaping, you're probably using the wrong approach. Arion eliminated my escaping nightmares by staying in Nix-land.
4. Cache Invalidation Remains Hard
Even in declarative systems, cache invalidation is tricky. I learnt to design for it from the start:
- Force fresh reads of configuration
- Clear temporary files explicitly
- Use service dependencies wisely
- Test configuration updates thoroughly
5. Documentation Lies by Omission
Both attempts suffered from my assumptions:
- Repository structures aren't always obvious
- "Docker compatible" has asterisks
- Directory names matter
- Default ports conflict
Always explore the actual codebase.
Conclusion: Pragmatic Purity
My SurfSense journey taught me that NixOS's greatest strength isn't enforcing purity, it's enabling pragmatic, maintainable solutions. By combining:
- NixOS's declarative infrastructure management
- Container isolation for complex applications
- Arion's Nix-native orchestration
- Careful attention to automation details
I achieved a solution that's both maintainable and functional. One command, nixos-rebuild switch
, deploys a complete AI knowledge management system.
The pure Nix approach would have been theoretically elegant but practically unmaintainable. The shell script approach was flexible but fragile. The hybrid solution with Arion is just right, leveraging each technology's strengths while avoiding their weaknesses.
Sometimes the best pure Nix solution is knowing when not to use pure Nix. And sometimes the best automation requires fighting cache invalidation until it's truly automatic.
Epilogue: The Community Wisdom
This journey exemplifies a broader pattern I'm seeing in the NixOS community. We're adopting similar hybrid approaches for:
- Kubernetes: NixOS manages nodes, K8s manages containers
- Development environments: Nix provides tools, containers provide services
- Machine learning: Nix manages CUDA/drivers, containers provide frameworks
The future isn't pure Nix or pure containers, it's intelligent integration.
Want to try SurfSense on NixOS? The complete configuration is available at [link-to-config].
Have your own NixOS packaging war stories? Share them, we're all learning together in this declarative adventure.
Special thanks to the NixOS community for Arion, and to the SurfSense team for excellent container images that made this integration possible.