Nelson Home — Changelog

Standard: Nelson Home | Internal Domain: nelson.home | Tailnet: tadpole-dory.ts.net


2026-03-08 (Doc Audit — Node Type Corrections)

  • FIX [Claude Code]: Corrected nelson-edge node type across 6 files — docs said "Raspberry Pi 5" but live Proxmox audit confirms it is LXC 301 on nelson-pve (192.168.1.2, 2 GB RAM). A physical Pi (MAC dc:a6:32, .247/.90) remains on the network as a legacy device but is not the active edge node.
  • FIX [Claude Code]: Corrected nelson-manager RAM in README.md resource budget (2 GB → 4 GB, confirmed by live audit).
  • FIX [Claude Code]: Updated ROADMAP.md resource budget — nelson-manager 3 GB → 4 GB, nelson-edge Pi 5/8 GB/4-core → LXC 301/2 GB/1-core.
  • FIX [Claude Code]: Updated GEMINI.md — NPM local URL was pointing to ubuntu-server (192.168.1.11:81), corrected to nelson-edge (192.168.1.2:81); AdGuard Tailscale IP was stale nelson-pi address (100.75.196.25), corrected to nelson-edge LXC (100.77.163.93); Homepage port corrected (3000 → 3030).
  • FIX [Claude Code]: Updated KNOWLEDGE.md hostvars comment — "(Pi)" → "(LXC)".
  • FIX [Claude Code]: Updated SPRINT.md DNS resilience task — "Pi 5 failure" → "nelson-edge LXC failure".

2026-03-08 (Late Night — Architecture Visualization)

  • FEAT [Claude Code]: Built interactive architecture visualization at /architecture — vis.js network graph showing all hosts, containers, proxies, VMs, LXCs, and network devices pulled from audit reports.
  • FEAT [Claude Code]: Added config drift detection — compares intended state (common.yml, containers.yml, hosts.ini) against discovered state (audit reports) and flags missing, unmanaged, and mismatched items.
  • FEAT [Claude Code]: Added Prometheus live metrics overlay — CPU, RAM, disk usage displayed on node tooltips with health-based border colors (green/amber/red).
  • FEAT [Claude Code]: Created 5 audit report parsers (Docker, NPM, Proxmox, Network, Resilience) and 3 config parsers (common.yml, containers.yml, hosts.ini) with 57 unit tests.
  • FEAT [Claude Code]: Added Uptime Kuma and Home Assistant links to ops dashboard sidebar.
  • DOCS [Claude Code]: Created architecture visualization design doc and implementation plan.
  • FIX [Claude Code]: Served vis-network.js locally — CDN (unpkg.com) was unreachable on local network.
  • FIX [Claude Code]: Deduplicated architecture device node IDs — two UAP-AC-Lite APs with same name caused vis.js crash.
  • FIX [Claude Code]: Roadmap parser now counts plain bullet items as tasks; COMPLETE phases show all items done.
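
The drift detection entry above describes three outcomes: missing, unmanaged, and mismatched items. A minimal sketch of that comparison follows, assuming both the intended state and the discovered state have already been parsed into `{name: attributes}` mappings. The names `DriftReport` and `detect_drift` are hypothetical and are not taken from the Nelson Ops codebase, which is a Node.js app; Python is used here only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DriftReport:
    missing: list = field(default_factory=list)     # intended but not discovered
    unmanaged: list = field(default_factory=list)   # discovered but not intended
    mismatched: dict = field(default_factory=dict)  # in both, attributes differ


def detect_drift(intended: dict, discovered: dict) -> DriftReport:
    """Compare two {name: attributes} mappings and classify the differences."""
    report = DriftReport()
    # Items declared in config but absent from the audit reports.
    report.missing = sorted(intended.keys() - discovered.keys())
    # Items found by the audit but not declared anywhere in config.
    report.unmanaged = sorted(discovered.keys() - intended.keys())
    # Items present on both sides: record (wanted, found) per differing key.
    for name in sorted(intended.keys() & discovered.keys()):
        want, have = intended[name], discovered[name]
        diffs = {k: (want[k], have.get(k)) for k in want if have.get(k) != want[k]}
        if diffs:
            report.mismatched[name] = diffs
    return report
```

Only keys declared on the intended side are compared, so extra attributes reported by an audit do not count as drift in this sketch.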

2026-03-08 (Evening — Monitoring Expansion)

  • FEAT [Claude Code]: Deployed Unpoller v2.34.0 to monitoring stack — full UniFi network observability (44 clients, 2 APs, 1 USG, 1 switch). 6 community Grafana dashboards imported + UniFi summary row on Nelson Home Overview.
  • FIX [Claude Code]: Upgraded cAdvisor v0.49.1 → v0.51.0 on ubuntu-server — Docker API version mismatch broke container name labels.
  • FIX [Claude Code]: Added explicit docker pull to cAdvisor deploy playbook to prevent cached old image on redeploy.
  • FIX [Claude Code]: Restored btnelson UniFi admin role via MongoDB db.privilege.updateOne() — role had been changed to readonly.
  • FIX [Claude Code]: Fixed Home Assistant crash loop — updated stale image (simplejson ImportError) and corrected volume mount path to /opt/docker-data/homeassistant.
  • FIX [Claude Code]: Added http.trusted_proxies to HA config for NPM reverse proxy (was returning 400 Bad Request).
  • FEAT [Claude Code]: Added homeassistant.nelson.home NPM proxy host + AdGuard DNS rewrite.
  • FEAT [Claude Code]: Added Home Assistant monitor to Uptime Kuma.
  • FEAT [Claude Code]: Added unpoller_password to Semaphore Default environment.
  • DOCS [Claude Code]: Created Unpoller design doc and implementation plan.

2026-03-08

  • FEAT [Claude Code]: Added home dashboard to Nelson Ops — sprint stats, roadmap progress, audit report status, recent crew activity, and quick access links to all services.
  • FEAT [Claude Code]: Added About page rendering README.md via markdown-it.
  • FEAT [Claude Code]: Added BRIDGE nav group to sidebar with Home, About, and prominent Vaultwarden link.
  • FIX [Claude Code]: Diagnosed and fixed .nelson.home DNS resolution failure — stale Tailscale global nameserver (nelson-pi) was overriding AdGuard. Added split DNS rule in Tailscale admin routing .nelson.home to nelson-edge's AdGuard.
  • FEAT [Claude Code]: Built Nelson Ops dashboard — Node.js/Express web app at ops.nelson.home with LCARS Star Trek theme. Views: Sprint board (interactive checkboxes), Standup, Roadmap (progress bars), Audit Reports, Crew Activity, Knowledge/Runbooks/Changelog docs, Archive.
  • FEAT [Claude Code]: Sprint board checkbox toggle commits and pushes changes via git automatically.
  • FEAT [Claude Code]: LCARS design system — amber/lavender/periwinkle/peach/ice-blue palette, collapsible sections, SVG favicon, stardate display, GitHub link.
  • FEAT [Claude Code]: Font size controls (A-/A+) with localStorage persistence, --font-scale CSS variable (0.75x to 1.6x).
  • FEAT [Claude Code]: Archive viewer reads .ops/archive/{sprints,retrospectives,reports}/ and renders as collapsible markdown cards.
  • FEAT [Claude Code]: Deployed to nelson-manager with nodemon hot-reload — git pull auto-restarts app, no Semaphore needed.
  • DOCS [Claude Code]: Documented nelson-ops dev workflow in KNOWLEDGE.md, PROTOCOL.md, CLAUDE.md — SSH deploy permitted for app dev, distinct from IaC Semaphore workflow.
  • FIX [Claude Code]: Fixed deploy_stack rsync permission issues on nelson-manager — rsync with --delete-after and become: true deletes app files and changes ownership. Documented workaround (git checkout -- docker-compose/nelson-ops/).
  • FIX [Gemini CLI]: Corrected Prometheus scrape targets in monitoring stack (localhost → node-exporter:9100).
  • FIX [Gemini CLI]: Fixed Grafana dashboard datasource linking by defining static "Prometheus" UID.
  • FIX [Gemini CLI]: Added missing dependencies (rsync, python3-docker) to deploy_monitoring.yml playbook.
  • FEAT [Gemini CLI]: Verified Uptime Kuma and Monitoring stacks are ready for active service checks.
  • FIX [Claude Code]: Resolved Grafana datasource UID mismatch — added deleteDatasources directive to force re-provision with correct UID Prometheus. All dashboard panels now resolve correctly.
  • FIX [Claude Code]: Fixed cAdvisor Prometheus scrape target (port 8082 → 8080 internal).
  • FIX [Claude Code]: Added recreate: always to deploy_monitoring.yml so bind-mounted config changes take effect on Semaphore redeploy.
  • REMOVED [Claude Code]: Dropped moonraker Prometheus scrape target — Moonraker v0.10.0 (Bullseye) lacks [prometheus] component support.
  • FEAT [Claude Code]: Created 12 Uptime Kuma monitors via API — 8 HTTP service checks (Semaphore, Grafana, Prometheus, Vaultwarden, AdGuard, NPM, UniFi, Moonraker) + 4 node ping checks (manager, edge, ubuntu-server, pve).
  • FEAT [Claude Code]: Created Uptime Kuma API key (semaphore-automation, expires 2027-03-08) and stored in Semaphore Default variable group as uptime_kuma_api_key.
  • FEAT [Claude Code]: Created custom "Nelson Home Overview" Grafana dashboard — node status, CPU/RAM/disk gauges, container metrics, network traffic. Set as home dashboard.
  • FIX [Claude Code]: Fixed node-exporter on nelson-manager — switched to network_mode: host + pid: host + hostname: nelson-manager so Grafana dashboards show correct host labels instead of container IDs.
  • FEAT [Claude Code]: Configured Grafana unified alerting with Telegram contact point (nelson-home bot). Created 4 alert rules: Node Down, High CPU, High Memory, Disk Critical.
  • FEAT [Claude Code]: Configured Uptime Kuma Telegram notifications — applied to all 13 monitors as default.
  • DOCS [Claude Code]: Updated PROTOCOL.md architecture with full observability stack details and alerting strategy. Updated ROADMAP.md Phase 2.3 to COMPLETE. Added comprehensive observability architecture section to KNOWLEDGE.md.
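
The font size controls listed above keep `--font-scale` between 0.75x and 1.6x. The clamping logic can be sketched as follows. The step size of 0.05 and the function name `adjust_font_scale` are assumptions, since the changelog states only the bounds, and the real control runs in the browser rather than in Python.

```python
MIN_SCALE = 0.75  # lower bound from the changelog entry
MAX_SCALE = 1.6   # upper bound from the changelog entry
STEP = 0.05       # assumed step size; the changelog does not state one


def adjust_font_scale(current: float, direction: int) -> float:
    """Return the scale after one A- (direction=-1) or A+ (direction=+1) press."""
    # Round to two decimals so repeated presses do not accumulate float error.
    candidate = round(current + direction * STEP, 2)
    # Clamp to the documented range.
    return max(MIN_SCALE, min(MAX_SCALE, candidate))
```

The returned value is what would be persisted and applied as the `--font-scale` CSS variable.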

2026-03-07

  • FIX [Gemini]: Resolved audit_master.yml failure by removing the archived sync_gemini_knowledge.yml import.
  • REFACTOR [Gemini]: Redesigned audit_docker.yml to target all active nodes (manager_nodes, edge_nodes, ubuntu-server) and aggregate reports in a non-destructive manner.
  • REFACTOR [Gemini]: Updated audit_npm.yml to correctly target edge_nodes (nelson-edge) for proxy host audits.
  • VERIFIED [Gemini]: Successfully ran the full audit_master.yml suite via Semaphore API.
  • FIX [Gemini]: Implemented a shell-based fallback (docker inspect) in audit_docker.yml for environments without the requests library (e.g., nelson-edge).
  • SUCCESS [Gemini]: Validated the Proxmox audit using native pvesh on nelson-pve.
  • TASK [Gemini]: Updated SPRINT.md; the final Semaphore cron schedule is ready for the user to set.
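
The docker inspect fallback noted above can be illustrated roughly as follows. This is a sketch of the general pattern, not the contents of audit_docker.yml, which is an Ansible playbook. The function names and the injectable `run` parameter are hypothetical; the `docker ps -aq` and `docker inspect` commands and the Docker SDK calls are standard.

```python
import json
import subprocess


def inspect_containers(run=subprocess.run) -> list:
    """List containers via the docker CLI and return name/image/state summaries."""
    ids = run(["docker", "ps", "-aq"], capture_output=True, text=True, check=True).stdout.split()
    if not ids:
        return []
    raw = run(["docker", "inspect", *ids], capture_output=True, text=True, check=True).stdout
    return [
        {
            "name": c["Name"].lstrip("/"),  # docker inspect prefixes names with "/"
            "image": c["Config"]["Image"],
            "state": c["State"]["Status"],
        }
        for c in json.loads(raw)
    ]


def audit_docker(run=subprocess.run) -> list:
    """Prefer the Docker SDK; fall back to the CLI when it cannot be imported."""
    try:
        import docker  # the SDK depends on the requests library
    except ImportError:
        return inspect_containers(run)
    client = docker.from_env()
    return [
        {"name": c.name, "image": c.attrs["Config"]["Image"], "state": c.status}
        for c in client.containers.list(all=True)
    ]
```

Passing `run` as a parameter keeps the CLI path testable on a machine without Docker installed.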

2026-02-20

  • FIX [Claude Code]: Redesigned MEMORY.md as thin pointer — removed drifting PROTOCOL.md summary, now contains only mandatory session-start instruction + critical safety rules. Solves stale-copy problem at root.
  • FIX [Claude Code]: Corrected stale Semaphore IP in GEMINI.md (.20 → .30).
  • CLEANUP [Claude Code]: Archived GEM_SYSTEM_PROMPT.md and ansible/sync_gemini_knowledge.yml (both v9 Google Drive era) to .ops/archive/.
  • FEAT [Claude Code]: Added MEMORY.md as item 9 in KNOWLEDGE.md atomic documentation update checklist to prevent future drift.
  • FEAT [Claude Code]: Added PreToolUse hook blocking direct secrets.yml edits and Stop hook reminding about post-work checklist when infra files are modified (.claude/hooks/).
  • FEAT [Claude Code]: Added /post-work and /standup skills to .claude/skills/.

2026-02-19

  • RETRO [Claude Code]: First repository retrospective — covered full history (Apr 2025 — Feb 2026). 7 lessons extracted, all actioned. Stored in .ops/archive/retrospectives/.
  • FEAT [Claude Code]: Filled all 5 event runbooks in .ops/RUNBOOKS.md from real incidents (were stubs). Added retrospective to scheduled runbooks.
  • FIX [Claude Code]: Updated stale references in .ops/PROTOCOL.md (architecture section, inventory groups, Semaphore URL), ROADMAP.md phases, and KNOWLEDGE.md IPs.
  • FEAT [Claude Code]: Added 3 new KNOWLEDGE.md standing rules — version pinning, atomic documentation updates, credential hygiene.
  • FEAT [Claude Code]: Added retrospective cadence, commit quality convention, and runbook-consultation step to PROTOCOL.md.
  • SUCCESS [Gemini]: Configured public proxy host for Proxmox Web UI (proxmox.tudhopenelson.duckdns.org) with Let's Encrypt SSL on nelson-edge.
  • DOCS [Gemini]: Integrated owned domains (tudhopenelson.com, palladiumresearch.com, tanzolabs.com) into architecture documentation and created a task to define the public exposure strategy.
  • SUCCESS [Gemini]: Updated UniFi Port Forwarding rules to point HTTP (80) and HTTPS (443) to the new nelson-edge node (.2), completing the proxy migration.
  • FIX [Gemini]: Resolved Docker-in-LXC startup failure (sysctl net.ipv4.ip_unprivileged_port_start permission denied) by setting lxc.apparmor.profile: unconfined in Proxmox LXC configuration and adding keyctl=1 feature.
  • DECOMMISSIONED [Gemini]: Stopped and destroyed nelson-identity LXC (ID 200, .20) after a full snapshot backup to Proxmox storage (NelsonBackups).
  • SUCCESS [Gemini]: Deployed and configured Nginx Proxy Manager and AdGuard Home on new nelson-edge node (.2).
  • FIX [Gemini]: Resolved AppArmor access denied issues for Docker containers in LXC by setting security_opt: [apparmor:unconfined].
  • REFACTOR [Gemini]: Resolved port 80 conflict on Edge by moving AdGuard dashboard to port 3000.
  • SUCCESS [Gemini]: Automated configuration of 13 Proxy Hosts and 17 DNS rewrites via Semaphore.
  • FIX [Gemini]: Fixed UniFi Network Audit playbook — updated login endpoint to /api/login and corrected cookie property usage (cookies_string).
  • REFACTOR [Gemini]: Updated common.yml and dashboard.yml to reflect new service locations on nelson-manager and nelson-edge.
  • DOCS [Gemini]: Added "Semaphore Template Configuration" guidelines to KNOWLEDGE.md regarding Variable Groups and Vault requirements.
  • FIX [Claude Code]: Diagnosed and resolved UniFi outage — linuxserver/unifi-network-application:latest pulled 2026-02-19 requires unifi_audit MongoDB permission not provisioned by original init script. Granted role live (no data loss), restored service.
  • FIX [Claude Code]: Pinned UniFi image to lscr.io/linuxserver/unifi-network-application:9.0.114 after new image version showed setup wizard despite intact data (schema incompatibility). Controller fully restored.
  • PATCH [Claude Code]: Updated docker-compose/unifi/initdb/init-mongo.sh to include unifi_audit dbOwner role for future fresh installs.
  • DOCS [Claude Code]: Added UniFi + MongoDB section to KNOWLEDGE.md — breaking change, live fix command, and patch details.
  • DOCS [Claude Code]: Added UniFi force re-adopt runbook to KNOWLEDGE.md (mca-cli / set-inform procedure).
  • TASK [Claude Code]: Added Vaultwarden password audit checklist to SPRINT.md — USG SSH, MongoDB credentials, NPM, Semaphore, Proxmox, Vault passphrase, DuckDNS token.
  • DECISION [Claude Code]: SSL strategy — defer to Caddy on nelson-gateway. Vaultwarden accessible once NPM moves to nelson-edge.

2026-02-18

  • MIGRATED [Gemini]: Moved Semaphore + Postgres from monolith to nelson-identity (192.168.1.20). Clean database restore. Old monolith instance stopped.
  • REFACTOR [Gemini]: Updated common.yml and NPM proxy hosts to route semaphore.nelson.home to identity node.
  • FIX [Gemini]: Recreated nelson-identity as a Privileged LXC to resolve Docker sysctl/AppArmor issues for Postgres.
  • FEAT [Claude Code]: Created configure_adguard_dns.yml and configure_npm_hosts.yml for automated DNS/NPM routing management.
  • FEAT [Claude Code]: Full Homepage dashboard redesign with improved grouping and widgets.
  • REFACTOR [Claude Code]: Hardened 8 Ansible playbooks — fixed host group bugs, undefined variables, credential leaks, non-idempotent regex.
  • REFACTOR [Claude Code]: Created group_vars/all/common.yml as single source for shared variables.
  • FEAT [Claude Code]: Created CLAUDE.md operating protocol for cross-session context.
  • FEAT [Claude Code]: Architecture review — identified ghost infra (bolt-claw VM, dead NPM rules), expanded roadmap through Phase 4.
  • REFACTOR [Claude Code]: Restructured project management into .ops/ directory (ROADMAP, SPRINT, KNOWLEDGE, STANDUP) replacing flat .gemini/ files.
  • VERIFIED [Gemini]: Restored and ran full audit suite via Semaphore — 100% pass.
  • SUCCESS [Gemini]: Provisioned and bootstrapped nelson-identity LXC on Proxmox.
  • SUCCESS [Gemini]: Automated Vaultwarden and UniFi backups to Proxmox storage.
  • SUCCESS [Gemini]: Configured AdGuard Home with DNS resilience (Quad9/Cloudflare).
  • SUCCESS [Gemini]: Formatted and mounted 5TB WD HDD at /mnt/nelson-backups on Proxmox.
  • SUCCESS [Gemini]: Established Nelson Home naming and documentation standard.
  • REFACTOR [Gemini]: Centralized Proxmox API identifiers in Vault.

2026-02-17

  • INIT [Gemini]: Initialized Gemini memory system — created GEMINI.md, .gemini/TASKS.md, .gemini/MEMORY.md.
  • DOCS [Gemini]: Added operator station setup (MacBook Pro SSH config) to README.