# Nelson Home — Operational Knowledge Base
Durable patterns, API details, and gotchas. Organized by topic, not by date. Agents: add new sections when you discover patterns worth preserving.
## Public Exposure & Domain Strategy

- Owned Domains:
  - `tudhopenelson.duckdns.org` (Dynamic DNS)
  - `tudhopenelson.com`
  - `palladiumresearch.com`
  - `tanzolabs.com`
- Active Public Endpoints:
  - `proxmox.tudhopenelson.duckdns.org` → `nelson-pve:8006` (SSL: Let's Encrypt via NPM)
- Strategy Checklist:
  - [ ] Determine which internal services (Nextcloud, Vaultwarden, etc.) should be exposed via `*.tudhopenelson.com`.
  - [ ] Configure DNS-01 challenge for wildcard SSL if moving to Caddy.
  - [ ] Implement Access Lists in NPM/Caddy for sensitive public endpoints. [Gemini, 2026-02-19 22:10]
## Identity & SSH Hardening (Deterministic Identity)

To prevent "unreachable" errors and authentication drift, Nelson Home uses a Deterministic Identity Architecture:

- Semaphore Identity: A dedicated SSH key pair (`semaphore@nelson.home`) is the sole identity for automation.
- IaC Source of Truth: The public half of this key is stored in `ansible/group_vars/all/common.yml` as `automation_public_key`.
- Enforcement: The `ansible/harden_ssh_keys.yml` playbook ensures this exact key is authorized on every node in the lab.
- Bootstrap Pattern: When adding a new node or recovering from an auth failure, run `.ops/bootstrap_identity.sh` from your MacBook. This script uses standard SSH to "push" the Semaphore identity to all nodes without requiring Ansible on your workstation. [Gemini, 2026-03-07]
## Robust Audit Strategy

To ensure audits run across heterogeneous nodes (different OS versions, missing Python libraries, restricted API access), the following patterns are now preferred in Nelson Home:

- Proxmox Infrastructure: Use native `pvesh` on the Proxmox host (`hosts: proxmox_nodes`) instead of external API calls. This avoids the need for external API tokens and handles authentication natively via SSH.
- Docker Audits: Use a shell-based fallback (`docker inspect`) when the `community.docker.docker_container_info` module fails due to a missing `requests` library on the target node (common on minimal LXC/edge installs).
- Multi-Node Aggregation: For audits covering multiple hosts, collect information into host-specific facts, then use a `run_once` task on `localhost` to aggregate them into a single markdown report. This prevents race conditions and ensures a complete view of the lab. [Gemini, 2026-03-07]
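The aggregation pattern above can be sketched as a two-play playbook. This is a minimal, hedged illustration — the fact name, report path, and the per-host data collected are all hypothetical, not taken from the actual audit playbooks:

```yaml
# Hypothetical sketch of the multi-node aggregation pattern.
- hosts: all
  tasks:
    - name: Collect per-host audit data into a host-specific fact
      ansible.builtin.set_fact:
        audit_line: "{{ inventory_hostname }}: {{ ansible_distribution | default('unknown') }}"

- hosts: localhost
  tasks:
    - name: Aggregate all hosts into a single markdown report
      run_once: true
      ansible.builtin.copy:
        dest: /tmp/audit_report.md
        content: |
          # Lab Audit
          {% for h in groups['all'] %}
          - {{ hostvars[h].audit_line | default('no data') }}
          {% endfor %}
```

Because only the `localhost` play writes the report, concurrent per-host tasks never race on the output file.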
## UniFi + MongoDB

- The `linuxserver/unifi-network-application:latest` image now requires the `unifi` MongoDB user to have `dbOwner` on three databases: `unifi`, `unifi_stat`, and `unifi_audit`.
- The community `init-mongo.sh` script only creates `unifi` and `unifi_stat` — it is missing `unifi_audit`.
- Newer image versions (pulled 2026-02-19) added the `unifi_audit` requirement. Tomcat fails to start with error 13 (Unauthorized) if the permission is missing.
- Live fix (no data loss):
  `docker exec unifi-db mongosh --username root --password example --authenticationDatabase admin --eval 'db.getSiblingDB("admin").grantRolesToUser("unifi", [{ role: "dbOwner", db: "unifi_audit" }])'`
- `init-mongo.sh` has been patched to include `unifi_audit` for future fresh installs. [Claude Code, 2026-02-19]
## UniFi Device Adoption (Force Re-Adopt)

When a UniFi device gets stuck adopting after the controller moves to a new IP:

- SSH into the device (e.g., USG at 192.168.1.1): `ssh btnelson@192.168.1.1`
  - Password: stored in Vaultwarden (do NOT put here — this file is in git)
- Enter the UniFi management console: `mca-cli`
- Point the device at the controller's inform URL: `set-inform http://<controller-ip>:8080/inform`
  - Current controller: `192.168.1.11:8080` (ubuntu-server, migrate to nelson-manager)
- The device should appear as "Pending Adoption" in the controller UI within ~30 seconds
- Click Adopt in the controller UI

Applies to: USG, APs, switches — any UniFi device that loses track of the controller.
## Docker-in-LXC: sysctl net.ipv4.ip_unprivileged_port_start Permission Denied

- Symptom: Docker containers (NPM, AdGuard Home) in a Proxmox LXC fail to start with error: `open sysctl net.ipv4.ip_unprivileged_port_start file: reopen fd 8: permission denied`.
- Root Cause: Newer Docker/runc versions (e.g., 28.x) attempt to set certain sysctls during container initialization. Even in a privileged LXC, the default AppArmor profile can block these calls.
- Fix: Set `lxc.apparmor.profile: unconfined` in the LXC configuration file on the Proxmox host (`/etc/pve/lxc/<vmid>.conf`) and restart the LXC.
- Prevention: Always use `lxc.apparmor.profile: unconfined` for LXCs hosting Docker workloads in Nelson Home, especially when using newer Docker versions. [Gemini, 2026-02-19 21:50]
## AdGuard Home API

- REST API runs on port 80 (same as dashboard). Port 3000 is the initial setup wizard ONLY.
- Auth: HTTP Basic Auth (`force_basic_auth: yes` in Ansible).
- List rewrites: `GET /control/rewrite/list` — returns `[{domain, answer}]`.
- Add rewrite: `POST /control/rewrite/add` with `{domain, answer}` body.
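A hedged sketch of an idempotent "add rewrite" flow against the two endpoints above, stdlib only. The host IP and the `ADGUARD_USER`/`ADGUARD_PASSWORD` placeholders are illustrative — real values come from `.env.local`:

```python
import base64
import json
import urllib.request

def missing_rewrites(existing, desired):
    """Return the desired {domain, answer} entries not already present."""
    have = {(r["domain"], r["answer"]) for r in existing}
    return [d for d in desired if (d["domain"], d["answer"]) not in have]

def adguard_call(base, path, auth, payload=None):
    """GET (payload=None) or POST JSON to the control API with Basic Auth."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(base + path, data=data,
                                 headers={"Content-Type": "application/json"})
    token = base64.b64encode(f"{auth[0]}:{auth[1]}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        return json.loads(body) if body else None

if __name__ == "__main__":
    base, auth = "http://192.168.1.2", ("ADGUARD_USER", "ADGUARD_PASSWORD")
    existing = adguard_call(base, "/control/rewrite/list", auth)
    desired = [{"domain": "ops.nelson.home", "answer": "192.168.1.2"}]
    for entry in missing_rewrites(existing, desired):
        adguard_call(base, "/control/rewrite/add", auth, payload=entry)
```

Diffing against `GET /control/rewrite/list` first keeps repeated runs from piling up duplicate rewrites.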
## Nginx Proxy Manager API

- Auth: `POST /api/tokens` with `{identity, secret}` — returns `{token}`.
- Use `Authorization: Bearer <token>` for all subsequent calls.
- List hosts: `GET /api/nginx/proxy-hosts` — array of proxy host objects.
- Create: `POST` (status 201). Update: `PUT /api/nginx/proxy-hosts/<id>` (status 200).
- Idempotency pattern: Fetch all hosts, build a dict keyed by `domain_names[0]`. For each desired host: not in map = create, in map + changed = update, otherwise skip.
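The idempotency pattern above can be isolated as a pure planning step before any API calls are made. This is a sketch: field names other than `domain_names` and `id` (e.g., `forward_host`) are illustrative, and applying the plan would still require the POST/PUT calls described above:

```python
def plan_changes(existing, desired):
    """existing/desired: lists of proxy-host dicts, each with domain_names.
    Returns (to_create, to_update, skipped) per the idempotency pattern."""
    by_domain = {h["domain_names"][0]: h for h in existing}
    create, update, skip = [], [], []
    for want in desired:
        have = by_domain.get(want["domain_names"][0])
        if have is None:
            create.append(want)                        # not in map -> POST
        elif any(have.get(k) != v for k, v in want.items()):
            update.append({**want, "id": have["id"]})  # changed -> PUT (needs id)
        else:
            skip.append(want)                          # unchanged -> no-op
    return create, update, skip
```

Keeping the diff logic pure makes it trivial to unit-test without a live NPM instance.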
## Semaphore Template Configuration

When configuring a new template in Semaphore, ensure the following are set correctly to avoid execution failures:

- Variable Group: Every template must have a Variable Group assigned (e.g., "Default" or "Unifi Secrets") to provide required variables like `unifi_password`, `npm_user`, etc.
- Vaults: Even if a playbook does not explicitly use secrets, you must assign the "default" vault key if the repository contains any encrypted files (like `ansible/group_vars/all/secrets.yml`). Ansible decrypts all group variables at startup; without the vault key, the task will fail with "Attempting to decrypt but no vault secrets found".
- Specific Example (Audit Network):
  - Variable Group: `Unifi Secrets` (contains `unifi_password` for the `ansible` user).
  - Vaults: `default` (to allow Ansible to load the encrypted `secrets.yml`).
## Service Credentials (.env.local)

- All service credentials are stored in `.env.local` in the repo root (gitignored).
- Agents MUST read `.env.local` to get API credentials before calling service APIs.
- Variables: `SEMAPHORE_USER`, `SEMAPHORE_PASSWORD`, `SEMAPHORE_URL`, `NPM_USER`, `NPM_PASSWORD`, `ADGUARD_USER`, `ADGUARD_PASSWORD`, `PROXMOX_PASSWORD`, `ANSIBLE_VAULT_PASSWORD`.
- `.env.local` is the Mac-local credential store. It is NOT committed to git.
- If a credential doesn't work, it may have been rotated — check Vaultwarden. [Claude Code, 2026-03-08 09:00]
## Semaphore API

- Credentials: Read from `.env.local` (`SEMAPHORE_USER`, `SEMAPHORE_PASSWORD`, `SEMAPHORE_URL`).
- Auth: `POST /api/auth/login` with a cookie jar to get a session.
- List templates: `GET /api/project/1/templates`.
- Run task: `POST /api/project/1/tasks` with `{template_id}`.
- Poll: `GET /api/project/1/tasks/<id>` (check `status == "success"`).
- Output: `GET /api/project/1/tasks/<id>/output`.
- Template update gotcha: the PUT body MUST include `"id": <template_id>` matching the URL, or Semaphore returns 400.
## Semaphore Networking

- Semaphore runs in Docker with its own network namespace. `127.0.0.1` inside Semaphore = container loopback, NOT the host.
- Always use the LAN IP (currently `192.168.1.30`) to reach host services from playbooks running `hosts: localhost` in Semaphore.
## Ansible + Semaphore Vault

- Even if a playbook doesn't reference vault variables, Ansible still decrypts `group_vars/all/secrets.yml` when loading group vars.
- Semaphore templates MUST have `vault_key_id` assigned or they fail with "Attempting to decrypt but no vault secrets found".
## hostvars Key Names

- The `hostvars` dict is keyed by the inventory hostname, not alias.
- `hostvars['nelson-edge']['ansible_host']` works (LXC).
- `hostvars['nelson-pve']['ansible_host']` works (Proxmox).
- The ubuntu-server host's inventory key is its IP `192.168.1.11`, not the string `ubuntu-server`.
## Homepage Dashboard

- Homepage watches its config directory (mounted from the repo).
- A `git pull` on the server is sufficient to update — no container restart needed.
- Use `sync_repo.yml` for config-only changes instead of `deploy_stack.yml`.
## Proxmox Operations

- Always use `delegate_to: nelson-pve` for tasks on Proxmox, never raw SSH shell commands.
- Backup drive: 5TB WD Elements at `/mnt/nelson-backups` (UUID: `03575663-f377-4557-8d64-8bc9f161916e`).
- nelson-manager is a Privileged LXC (required for Docker sysctl + AppArmor). nelson-identity was decommissioned 2026-02-19.
## Node Network Interfaces

### nelson-edge (Proxmox LXC, 192.168.1.2)

- Primary entrypoint for internal DNS and SSL routing.
- Runs Docker workloads inside the LXC (requires the `unconfined` AppArmor profile).
- Legacy Pi still on network: A physical Raspberry Pi (MAC `dc:a6:32:a5:4d:fb`, hostname "nelson-edge") appears at 192.168.1.247 (wired) and 192.168.1.90 (wireless) in network audits. This is the old edge node from before the migration to the Proxmox LXC. It is not the active edge node. Do not confuse it with the LXC at 192.168.1.2.
## SSH & Git

- Always ensure `origin` is set to SSH (`git@github.com`) on controller nodes to prevent auth hangs.
- Never overwrite existing SSH keys unless specifically requested; reuse existing keys.
## Tailscale DNS & Split DNS

- Tailnet: `tadpole-dory.ts.net` with MagicDNS enabled.
- Split DNS: `.nelson.home` queries are routed to `100.77.163.93` (nelson-edge's Tailscale IP → AdGuard Home).
- Global nameservers: Cloudflare (1.1.1.1) + Google (8.8.8.8) for everything else. "Override DNS servers" is ON.
- Gotcha (2026-03-08): The old global nameserver `100.75.196.25` (nelson-pi) had stale DNS records pointing `.nelson.home` services to `192.168.1.11` (monolith) instead of `192.168.1.2` (nelson-edge/NPM). This broke all `.nelson.home` URLs on Tailscale-connected devices. Fix: replaced with a split DNS rule pointing `.nelson.home` → nelson-edge's AdGuard.
- Prevention: When changing DNS infrastructure, verify the Tailscale nameserver config matches — Tailscale DNS overrides local network DNS when "Override DNS servers" is enabled. [Claude Code, 2026-03-08 00:15]
## Observability Architecture

Nelson Home uses a two-tier alerting strategy:

Grafana (resource thresholds):

- Datasource: Prometheus (`uid: Prometheus`, URL: `http://prometheus:9090`)
- Custom dashboard: "Nelson Home Overview" (`uid: nelson-home-overview`) — set as home dashboard
- Alert rules (folder: `Nelson Home Alerts`, group: `infrastructure`):
  - Node Down: `up{job="node"} < 1` for 2m → critical
  - High CPU: > 90% for 5m → warning
  - High Memory: > 90% for 5m → warning
  - Disk Critical: > 85% for 5m → critical
- Contact point: Telegram bot `nelson-home` (chat ID: `8150264504`)
- Notification policy: group by `alertname`, wait 30s, repeat 4h

Uptime Kuma (service availability):

- 13 monitors: Semaphore, Vaultwarden, AdGuard, UniFi, Proxmox, Grafana, Prometheus, NPM, Ender 3, + 4 node pings
- API key stored in the Semaphore Default variable group as `uptime_kuma_api_key`
- Telegram notification: same bot, applied to all monitors as default

Why both: Grafana depends on Prometheus — if Prometheus dies, Grafana can't alert. Kuma is standalone and will still catch service outages. They don't duplicate: Grafana watches performance, Kuma watches reachability.

Semaphore variables for monitoring:

- `grafana_admin_user` / `grafana_admin_password` — Grafana login
- `uptime_kuma_api_key` — Kuma automation key (expires 2027-03-08)
- Telegram bot token stored in the Grafana contact point config (not in Semaphore)

[Claude Code, 2026-03-08 09:40]
## Grafana Provisioning: UID Behavior

- If a datasource is first provisioned without a `uid`, Grafana auto-generates one (e.g., `PBFA97CFB590B2093`). Adding `uid: Prometheus` later does NOT update the existing datasource — Grafana matches by name and keeps the old UID.
- Fix: Add a `deleteDatasources` block to the provisioning YAML to force deletion and re-creation with the correct UID on next startup.
- Dashboard panels referencing `"uid": "Prometheus"` will show "No data" if the actual datasource UID differs. [Claude Code, 2026-03-08 09:10]
## Docker Compose Bind Mounts & Redeploy

- `docker compose up -d` with `state: present` does NOT recreate containers when only bind-mounted config files change. The container keeps the old cached file.
- Fix: Use `recreate: always` in `community.docker.docker_compose_v2`, or the `--force-recreate` flag. Alternatively, use the Prometheus lifecycle API (`POST /-/reload`) for Prometheus-specific config reloads — but this only works if the bind mount itself is refreshed. [Claude Code, 2026-03-08 09:05]
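A hedged sketch of the `recreate: always` fix as an Ansible task; the project path and task name are illustrative, not the actual playbook contents:

```yaml
# Hypothetical task: force recreation so bind-mounted configs are re-read.
- name: Redeploy monitoring stack (force recreate)
  community.docker.docker_compose_v2:
    project_src: /home/btnelson/nelson-server-config/docker-compose/monitoring
    state: present
    recreate: always   # equivalent of `docker compose up -d --force-recreate`
```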
## Uptime Kuma API

- Kuma uses Socket.IO, not REST. It cannot be automated via curl — use the `uptime-kuma-api` Python library instead.
- Must run from a host that can reach Kuma directly (localhost on nelson-manager, not from the Mac unless the port is exposed).
- API key format: `uk1_*`, created via `api.add_api_key(name=..., expires=..., active=True)`. The key is shown once — store it immediately.
- Monitor types: `MonitorType.HTTP`, `MonitorType.PING`, `MonitorType.KEYWORD` (for self-signed HTTPS with `ignoreTls=True`). [Claude Code, 2026-03-08 09:15]
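A hedged sketch of driving Kuma with the `uptime-kuma-api` library mentioned above. The Kuma URL/port, credentials, and monitor names are assumptions; run it from nelson-manager, where Kuma is reachable:

```python
def monitor_defs(services):
    """Turn a {name: url} mapping into sorted HTTP-monitor specs."""
    return [{"name": name, "url": url} for name, url in sorted(services.items())]

if __name__ == "__main__":
    # Third-party library; only importable where uptime-kuma-api is installed.
    from uptime_kuma_api import UptimeKumaApi, MonitorType

    api = UptimeKumaApi("http://localhost:3001")  # port is an assumption
    api.login("admin", "PASSWORD_FROM_SEMAPHORE")
    for spec in monitor_defs({"Grafana": "http://192.168.1.30:3000"}):
        api.add_monitor(type=MonitorType.HTTP, name=spec["name"], url=spec["url"])
    api.disconnect()
```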
## Moonraker Prometheus Support

- The Moonraker `[prometheus]` component requires a version newer than v0.10.0. On Raspbian Bullseye (32-bit), the Update Manager may not offer newer versions.
- Symptom: "Unparsed config section [prometheus]" warning + "failed to load component" error in the Mainsail UI.
- Workaround: Monitor the printer via an Uptime Kuma HTTP check on the `/server/info` endpoint instead. [Claude Code, 2026-03-08 09:20]
## Nelson Ops: External CDN Dependencies

- Rule: Never use CDN-hosted JS libraries (unpkg.com, cdnjs, etc.) in Nelson Ops. AdGuard or local DNS may block or fail to resolve external CDN domains.
- Pattern: Download the library and serve it from `/public/` instead. Example: `vis-network.min.js` (689KB) is served locally at `/vis-network.min.js`. [Claude Code, 2026-03-08 00:15]
## Nelson Ops — Dev Workflow

Nelson Ops (`docker-compose/nelson-ops/`) is a web application, not infrastructure config. It has its own dev loop — do NOT use Semaphore or deploy_stack for changes to this app.

Stack: Node.js/Express, server-rendered HTML, LCARS CSS, no build step.

Dev loop (edit → deploy in ~5 seconds):

```sh
# 1. Edit files locally in docker-compose/nelson-ops/app/
# 2. Run tests
cd docker-compose/nelson-ops/app && node --test lib/*.test.js
# 3. Commit + push
git add docker-compose/nelson-ops/ && git commit -m "..." && git push
# 4. Deploy — one command, nodemon auto-reloads
ssh btnelson@192.168.1.30 "git -C ~/nelson-server-config pull"
```

Why no Semaphore: `deploy_stack.yml` uses rsync with `--delete-after` and `become: true`, which causes permission issues and deletes app files. Nelson Ops is a simple app — `git pull` is sufficient because the container volume-mounts the repo directory and nodemon watches for changes.

Container details:

- Runs on nelson-manager (`192.168.1.30:3020`), proxied at `ops.nelson.home`
- Docker compose mounts `./app:/app:rw` and `/home/btnelson/nelson-server-config:/repo:rw`
- Command: `npm install && npx -y nodemon -w /app server.js`
- After `git pull`, nodemon detects file changes and restarts automatically

Known issue: `deploy_stack.yml` rsync can delete `docker-compose.yml` and app files on nelson-manager due to permission conflicts. If the container won't start after a Semaphore deploy, run `git checkout -- docker-compose/nelson-ops/` on nelson-manager to restore.

[Claude Code, 2026-03-08 22:15]
## Nelson Ops: Adding New npm Dependencies

- Symptom: After a `git pull` with a new dependency in `package.json`, nodemon detects the file changes and restarts the app — but the app crashes with `Cannot find module '<new-package>'` because `npm install` hasn't run yet.
- Root Cause: The container command is `npm install && npx -y nodemon`. On initial start, `npm install` runs. But nodemon restarts only `node server.js` — it does NOT re-run `npm install`.
- Fix: `docker restart nelson-ops` — this re-runs the full entrypoint (`npm install && nodemon`), installing the new dependency.
- Prevention: After pushing a commit that adds a new npm dependency, always follow up with `ssh btnelson@192.168.1.30 "docker restart nelson-ops"`. [Claude Code, 2026-03-08 23:45]
## cAdvisor Docker API Version Compatibility

- cAdvisor v0.49.1 uses Docker API client v1.41. If the Docker daemon requires a minimum of API v1.44+ (Docker 27+), cAdvisor fails to register the Docker container factory and falls back to systemd cgroup monitoring.
- Symptom: Container metrics have `id` labels like `/system.slice/docker-xxx.scope` instead of `name` labels. Dashboard queries filtering `name!=""` return empty.
- Fix: Upgrade to cAdvisor v0.51.0+, which supports newer Docker API versions.
- Prevention: When deploying cAdvisor via shell (`docker run`), always add `docker pull <image>` before `docker run` to avoid using a cached old image. The `deploy_node_exporter.yml` playbook now does this. [Claude Code, 2026-03-08 10:00]
## Home Assistant Reverse Proxy (Trusted Proxies)

- HA rejects requests with non-matching `Host` headers unless `http.trusted_proxies` is configured.
- Symptom: 400 Bad Request when accessing HA through the NPM proxy (e.g., `ha.nelson.home`), but direct access (`192.168.1.11:8123`) works.
- Fix: Add to `configuration.yaml`:

  ```yaml
  http:
    use_x_forwarded_for: true
    trusted_proxies:
      - 192.168.1.2
  ```

- The HA config lives at `/opt/docker-data/homeassistant/configuration.yaml` on ubuntu-server (NOT `/home/btnelson/homeassistant/`). [Claude Code, 2026-03-08 11:15]
## Unpoller (UniFi Monitoring)

- Unpoller v2.34.0 deployed on nelson-manager as part of the monitoring compose stack.
- Connects to the UniFi Controller API at `192.168.1.30:8443` with a read-only `unpoller` user.
- Exports Prometheus metrics on `:9130` with namespace `unifipoller`.
- InfluxDB: Enabled by default even if not used — set `UP_INFLUXDB_DISABLE=true` to suppress connection-refused errors.
- Credentials: `unpoller_password` stored in the Semaphore Default environment (not Ansible Vault). Templated into `.env` on deploy.
- Dashboards: 6 community dashboards imported (IDs 11310-11315) + a UniFi summary row on Nelson Home Overview. [Claude Code, 2026-03-08 10:30]
## Semaphore + Ansible Vault: secrets.yml in .gitignore

- `ansible/group_vars/all/secrets.yml` is in `.gitignore`. All secrets are managed via Semaphore environment variables (Default environment), NOT Ansible Vault files in the repo.
- If `secrets.yml` is accidentally committed, Semaphore deploys fail with "Attempting to decrypt but no vault secrets found" because the template doesn't have a vault key linked.
- Rule: Add new secrets to the Semaphore Default environment (`PUT /api/project/1/environment/4`), not to `secrets.yml`. [Claude Code, 2026-03-08 10:15]
## Naming Standard

- Official name: Nelson Home
- Internal domain: `nelson.home`
- Infrastructure slug: `nelson-home`
- Node convention: `nelson-<role>` (e.g., nelson-edge, nelson-manager, nelson-apps)
- Tailnet: `tadpole-dory.ts.net`
## Version Pinning (Standing Rule)

Never use `latest` tags for infrastructure services. Pin to a specific version in docker-compose files.

- Why: On 2026-02-19, `linuxserver/unifi-network-application:latest` pulled a breaking change that required a new MongoDB permission (`unifi_audit`) and showed a setup wizard despite intact data. Recovery took 7 commits.
- Rule: After deploying a new service or updating an image, immediately pin the working version tag in the compose file.
- Exception: Dashboard-only services (Homepage, Glances) that don't hold state can use `latest` with caution.
- Current pinned versions: UniFi `9.0.114`, MongoDB `8.0`, Postgres `15`. [Claude Code, 2026-02-19 16:30]
## Atomic Documentation Updates (Standing Rule)

When you change an IP, rename a node, move a service, or decommission anything, update all references in the same session. Files to check:

1. `ansible/inventory/hosts.ini` — host entries and group names
2. `ansible/group_vars/all/common.yml` — URLs, IPs, NPM hosts, DNS rewrites
3. `.ops/PROTOCOL.md` — architecture section, inventory groups table, Semaphore URL
4. `.ops/ROADMAP.md` — phase descriptions and resource budget
5. `.ops/RUNBOOKS.md` — any URLs or IPs in scheduled runbooks
6. `docker-compose/*/docker-compose.yml` — any hardcoded IPs
7. `GEMINI.md` / dashboard URLs
8. Homepage config files
9. `~/.claude/projects/.../memory/MEMORY.md` — safety rules block (Semaphore IP, critical nodes)

A stale IP in PROTOCOL.md is worse than no documentation — it gives agents false confidence and causes silent failures. [Claude Code, 2026-02-19 16:30 | updated 2026-02-20: added item 9 (MEMORY.md)]
## Credential Hygiene

Known plaintext credentials in git history (as of 2026-02-19):

- `docker-compose/unifi/docker-compose.yml` — MongoDB root `example`, unifi user `unifi111`
- `docker-compose/semaphore/docker-compose.yml` — admin password `admin`
- `docker-compose/pulse/docker-compose.yml` — `password123`
- `.env.local` — all service credentials (gitignored but modified locally)

These cannot be removed from git history without a force-push/rewrite. The mitigation strategy is:

1. Template credentials into Ansible Vault variables (SPRINT.md tracks this)
2. Rotate all exposed passwords after templating
3. Never add new plaintext credentials to committed files

[Claude Code, 2026-02-19 16:30]