NELSON HOME OPS CONSOLE

Archive

Archived Sprints

2026 02 18 bootstrap

Sprint Archive: Bootstrap (2026-02-17 — 2026-02-18)

Phase 1 complete. Phase 2.2 partially complete.

Completed Tasks

  • [x] Initialized Gemini memory system and operational files [Gemini, 2026-02-17]
  • [x] Bootstrapped nelson-edge lifeboat (Ansible, Git, Pip, SSH, Tailscale) [Gemini, 2026-02-18]
  • [x] Deployed AdGuard Home on nelson-edge with DNS resilience [Gemini, 2026-02-18]
  • [x] Formatted and mounted 5TB WD backup drive on Proxmox [Gemini, 2026-02-18]
  • [x] Automated Vaultwarden SQLite backup to Proxmox with retention [Gemini, 2026-02-18]
  • [x] Automated UniFi config backup sync to Proxmox with retention [Gemini, 2026-02-18]
  • [x] Provisioned nelson-identity LXC (Ubuntu 24.04, ID 200, Privileged) [Gemini, 2026-02-18]
  • [x] Migrated Semaphore + Postgres from monolith to nelson-identity [Gemini, 2026-02-18]
  • [x] Restored and verified master audit suite (100% green) [Gemini, 2026-02-18]
  • [x] Centralized Proxmox API credentials in Vault [Gemini, 2026-02-18]
  • [x] Hardened 8 Ansible playbooks (host groups, variables, idempotency, credential leaks) [Claude Code, 2026-02-18]
  • [x] Created shared variable conventions in common.yml [Claude Code, 2026-02-18]
  • [x] Created CLAUDE.md operating protocol [Claude Code, 2026-02-18]
  • [x] Created DNS + NPM automation playbooks [Claude Code, 2026-02-18]
  • [x] Full Homepage dashboard redesign [Claude Code, 2026-02-18]
  • [x] Architecture review and roadmap expansion through Phase 4 [Claude Code, 2026-02-18]
  • [x] Established Nelson Home naming standard [Gemini, 2026-02-18]
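
The two backup tasks above mention retention without stating the policy. A minimal sketch of a keep-newest-N prune follows. The date-stamped file names and the keep count are hypothetical and are not taken from the actual playbooks.

```python
from pathlib import Path


def select_expired(backups: list[Path], keep: int) -> list[Path]:
    """Return the backups to delete, keeping the `keep` newest.

    Assumes file names embed an ISO date (for example
    vaultwarden-2026-02-18.sqlite3), so sorting by name is also sorting
    by age. That naming scheme is an assumption for illustration.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    ordered = sorted(backups, key=lambda p: p.name, reverse=True)  # newest first
    return ordered[keep:]


if __name__ == "__main__":
    # Nine hypothetical nightly dumps, Feb 10 through Feb 18.
    sample = [Path(f"vaultwarden-2026-02-{day:02d}.sqlite3") for day in range(10, 19)]
    for path in select_expired(sample, keep=7):
        print("would delete", path.name)
```

Returning the list instead of deleting inside the function keeps the policy testable and lets the caller log or dry-run before removing anything.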

Retrospectives

2026 02 19 bootstrap to migration

Retrospective: Bootstrap to Migration (2025-04-21 — 2026-02-19)

First retrospective. Covers the full repository history from initial creation through the migration to distributed Proxmox architecture. [Claude Code, 2026-02-19 16:30]


Timeline & Eras

Era 1: Manual Docker Tinkering (Apr 21 — May 3, 2025)

  • 15 commits over 2 weeks, all by Bryan (human)
  • Docker-compose files only: Glances, Code Server, Dashy, UniFi, Plex, Jellyfin, Nextcloud
  • No automation, no IaC patterns. Casual commit messages ("fix", "small update", "added")
  • Repo went dormant for 9 months

Era 2: The Ansible Awakening (Feb 14 — 16, 2026)

  • Bryan returns and introduces Ansible + Semaphore
  • Audit playbooks appear: Proxmox, Network (UniFi), Docker, NPM
  • "Ansible Robot" author appears — automated commits from Semaphore runs
  • Heavy fix-fix-fix cycles, especially on Docker audit formatting (5+ commits)
  • Vault-encrypted secrets introduced

Era 3: Gemini Enters (Feb 17, 2026)

  • Gemini CLI starts committing (74 total commits across 3 days)
  • Memory system initialized, edge node bootstrapping, backup automation
  • Rapid feature velocity but many small fixup commits

Era 4: Two-Agent Collaboration (Feb 18, 2026)

  • Claude Code joins. Both agents working the same codebase in a single day
  • Gemini: Semaphore migration, LXC provisioning, audit verification, NPM routing
  • Claude Code: .ops/ restructure, protocol extraction, playbook hardening, common.yml, Homepage redesign, DNS/NPM automation
  • 43 commits — most productive day in repo history

Era 5: The Big Migration (Feb 19, 2026)

  • Architecture shifts from nelson-identity (.20) to nelson-manager (.30)
  • NPM + AdGuard move to the Pi (nelson-edge)
  • UniFi breaks twice (MongoDB permission + schema incompatibility) — recovered
  • nelson-identity decommissioned, Tailscale deployed everywhere

Commit Statistics

Author                  Commits  Role
Bryan Nelson (human)         92  Initial setup, manual era, early Ansible
bryan-tanzo (Gemini)         74  Infrastructure provisioning, migrations
Ansible Robot                69  Automated audit reports from Semaphore
Total                       235  Over ~10 active days

Architecture Evolution

Apr 2025:  [ubuntu-server] — everything on one box, manual docker-compose

Feb 2026:  [ubuntu-server] + [nelson-pve] — Proxmox added, Ansible introduced

Feb 18:    [ubuntu-server] + [nelson-pve] + [nelson-identity LXC] + [nelson-edge Pi]

Feb 19:    [ubuntu-server draining] + [nelson-pve] + [nelson-manager LXC] + [nelson-edge Pi]

Major Milestones

  1. Repo creation (Apr 2025) — Docker compose collection
  2. Ansible + Semaphore adoption (Feb 14) — IaC begins
  3. Automated audit suite (Feb 14-16) — Proxmox, Docker, Network, NPM, Resilience
  4. AI agent integration (Feb 17) — Gemini CLI, then Claude Code
  5. .ops/ restructure (Feb 18) — Professional project management framework
  6. common.yml centralization (Feb 18) — Single source of truth for variables
  7. First service migration (Feb 18) — Semaphore to nelson-identity
  8. Infrastructure shift to nelson-manager (Feb 19) — New control plane
  9. Edge network deployment (Feb 19) — NPM + AdGuard on Pi
  10. nelson-identity decommission (Feb 19) — First successful node lifecycle

What Went Well

  1. Audit suite was highest-ROI investment — Automated Proxmox/Docker/Network/NPM/Resilience audits via Semaphore cron gave both agents and the operator real-time visibility. Caught stale NPM rules and ad-blocking gaps automatically.

  2. Two-agent model delivered complementary strengths — Gemini excels at provisioning and migration execution; Claude Code at architecture, restructuring, and debugging. Feb 18 was the most productive day because both worked in parallel on non-overlapping tasks.

  3. .ops/ framework created clean information architecture — Separating ROADMAP (strategic) from SPRINT (tactical) from KNOWLEDGE (durable patterns) from STANDUP (daily state) let both agents navigate without stepping on each other.

  4. common.yml centralization eliminated a class of bugs — Before it, IPs and paths were hardcoded in individual playbooks. After, all playbooks reference the same source of truth.

  5. KNOWLEDGE.md incident format works — The UniFi + MongoDB entry (symptom/cause/fix/prevention) was immediately useful when the issue recurred in a different form. Pattern documentation pays dividends fast.

  6. The 9-month gap was beneficial — Bryan returned with better tools (Ansible) and clearer vision. Sometimes stepping away is the best architectural decision.


What Went Wrong

  1. Fix-fix-fix commit chains — The UniFi deploy had 7 sequential fix commits. Every audit playbook went through multiple formatting fix rounds. Deploy-and-see-what-breaks is expensive even with AI agents.

  2. Documentation drifted within hours — After just 2 days, PROTOCOL.md had a stale Semaphore URL (.20 instead of .30), an architecture section describing nodes that no longer exist, and inventory groups that had been renamed.

  3. Naming changes cascaded everywhere — The rename from nelson-identity to nelson-manager touched inventory, compose files, common.yml, Homepage, DNS, NPM, and multiple .ops/ files. Several references were missed.

  4. Plaintext credentials accumulated as persistent debt — Noted as a problem on Feb 18, still present on Feb 19. Committed compose files still contain the MongoDB credentials (example/unifi111), the Semaphore admin password, and the Pulse password (password123).

  5. Runbooks were written as stubs, never filled in — The real runbook content was being produced by the migrations themselves. That knowledge was recorded in CHANGELOG.md and STANDUP.md but never flowed back into RUNBOOKS.md.

  6. ROADMAP.md drifted from reality — Still describes nelson-identity and nelson-gateway as the V2 architecture. Actual work diverged significantly (nelson-manager replaced both roles, edge took networking).


Lessons Learned (Actionable)

L1: Test before commit, not after

Fix-fix-fix chains are the biggest time sink. Use --check / dry-run / local validation before committing. One tested commit is worth more than seven fixups. Action: Added to PROTOCOL.md as a convention.
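
One way to make the dry-run habit concrete is a small wrapper that runs a syntax check and then a check-mode pass before anything is committed. This is a sketch only: the playbook and inventory paths are hypothetical, and the repo's actual convention in PROTOCOL.md may differ.

```python
import shlex
import subprocess


def validation_commands(playbook: str, inventory: str) -> list[list[str]]:
    """Build the checks to run before committing a playbook change."""
    return [
        # Parse-only pass: catches YAML and playbook structure errors.
        ["ansible-playbook", "-i", inventory, "--syntax-check", playbook],
        # Dry run: reports what would change without applying it.
        ["ansible-playbook", "-i", inventory, "--check", "--diff", playbook],
    ]


def validate(playbook: str, inventory: str) -> bool:
    """Run each check in order; all() stops at the first failing command."""
    return all(
        subprocess.run(cmd).returncode == 0
        for cmd in validation_commands(playbook, inventory)
    )


if __name__ == "__main__":
    # Hypothetical paths; this only prints the commands that would run.
    for cmd in validation_commands("site.yml", "inventory.ini"):
        print(shlex.join(cmd))
```

Check mode is not a full simulation. Tasks built on the command or shell modules are skipped by default, so a clean dry run lowers the odds of a fixup chain without ruling one out.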

L2: Documentation updates must be atomic with the change

If you change an IP, rename a node, or move a service, update every reference in the same commit/session. Don't batch doc updates for "later." Action: Added to post-work checklist in PROTOCOL.md. Added as a RUNBOOKS.md checklist item.
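
A rename or re-IP is easier to keep atomic when leftover references can be listed mechanically. The sketch below walks a checkout and reports every line that still mentions a retired name or address. The file suffixes and the sample address are assumptions; the document only gives abbreviated host octets such as .20.

```python
import re
from pathlib import Path


def find_stale_references(
    root: Path,
    old_terms: list[str],
    suffixes: tuple[str, ...] = (".md", ".yml", ".yaml"),
) -> list[tuple[str, int, str]]:
    """Return (relative file, line number, term) for each leftover mention."""
    # Boundaries stop "192.0.2.20" from matching inside "192.0.2.200".
    patterns = {
        term: re.compile(rf"(?<![\w.]){re.escape(term)}(?!\w)") for term in old_terms
    }
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        lines = path.read_text(errors="ignore").splitlines()
        for lineno, line in enumerate(lines, start=1):
            for term, pattern in patterns.items():
                if pattern.search(line):
                    hits.append((str(path.relative_to(root)), lineno, term))
    return hits


if __name__ == "__main__":
    # 192.0.2.20 is a documentation-range placeholder, not the real address.
    for name, lineno, term in find_stale_references(Path("."), ["nelson-identity", "192.0.2.20"]):
        print(f"{name}:{lineno}: still references {term}")
```

Running a check like this before the commit that performs the rename turns "update every reference" into a list that has to reach zero.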

L3: Pin versions for infrastructure services

The UniFi outage was caused by the latest image tag pulling in a breaking change, and recovery took 7 commits. Action: Added to KNOWLEDGE.md as a standing rule.
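
The pinning rule can be checked automatically. The sketch below scans compose text for image references that carry no tag or the floating latest tag. It uses a line-based regex instead of a YAML parser to stay dependency-free, so it is a heuristic; the image names in the example are made up.

```python
import re

# Matches `image: <ref>` lines, with or without quotes around the reference.
IMAGE_LINE = re.compile(r"^\s*image:\s*[\"']?([^\s\"'#]+)")


def unpinned_images(compose_text: str) -> list[str]:
    """Return image references with no tag or with the `latest` tag."""
    flagged = []
    for line in compose_text.splitlines():
        match = IMAGE_LINE.match(line)
        if not match:
            continue
        ref = match.group(1)
        if "@sha256:" in ref:
            continue  # pinned by digest
        # Only a colon in the last path component is a tag separator;
        # an earlier colon belongs to a registry port.
        last = ref.rsplit("/", 1)[-1]
        tag = last.split(":", 1)[1] if ":" in last else ""
        if tag in ("", "latest"):
            flagged.append(ref)
    return flagged


if __name__ == "__main__":
    sample = "services:\n  app:\n    image: example/app:latest\n"
    print(unpinned_images(sample))
```

An untagged reference is flagged too, because Docker resolves it to latest.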

L4: Write runbooks from incidents, not in advance

Empty stubs stay empty. Real incidents generate real checklists. Action: Filled in RUNBOOKS.md from actual nelson-identity decommission, UniFi migration, and DNS/routing changes.

L5: Credential hygiene is Phase 1, not Phase 4

Plaintext creds committed to git are effectively permanent, because they stay in history even after the file is cleaned up. Action: Elevated in SPRINT.md priority. Documented in KNOWLEDGE.md.
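
A lightweight scan can keep new plaintext values from landing while the migration is pending. The sketch below flags secret-looking keys whose value is a literal instead of an injected variable. The keyword list and the sample values are assumptions for illustration and would need tuning against the real compose files.

```python
import re

# Heuristic: key names that usually carry secrets, in `KEY=value` or `KEY: value` form.
SECRET_KEY = re.compile(
    r"^\s*-?\s*([A-Za-z0-9_]*(?:PASSWORD|PASSWD|SECRET|TOKEN|API_KEY)[A-Za-z0-9_]*)"
    r"\s*[:=]\s*(.+?)\s*$",
    re.IGNORECASE,
)

# Value prefixes that mean "resolved at deploy time", not stored in the file.
INJECTED_PREFIXES = ("$", "{{", "!vault")


def plaintext_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line number, key) for secret-looking keys with a literal value."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = SECRET_KEY.match(line)
        if not match:
            continue
        key = match.group(1)
        value = match.group(2).strip("\"'")
        if value.startswith(INJECTED_PREFIXES):
            continue
        findings.append((lineno, key))
    return findings


if __name__ == "__main__":
    sample = "environment:\n  - DB_PASSWORD=hunter2\n  - API_TOKEN=${API_TOKEN}\n"
    for lineno, key in plaintext_secrets(sample):
        print(f"line {lineno}: {key} has a literal value")
```

A finding only covers the working tree. A value that was ever committed still needs to be rotated, since removing it from the file does not remove it from history.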

L6: Run retrospectives regularly

This is the first retrospective after 235 commits. Too many learnings were implicit. Action: Added retrospective cadence to PROTOCOL.md and RUNBOOKS.md.

L7: Keep ROADMAP.md honest

When reality diverges from the plan, update the plan. A stale roadmap is worse than no roadmap — it gives false confidence. Action: Updated ROADMAP.md to reflect actual architecture.


Current State (as of 2026-02-19)

  • Control plane (Semaphore, Vaultwarden, UniFi, Homepage): nelson-manager (.30)
  • Network edge (DNS, reverse proxy): nelson-edge Pi (.2)
  • Legacy monolith (.11): still running HA, Plex, Nextcloud, n8n, Pulse, Glances, DuckDNS
  • Known debt: stale docs (fixed this session), plaintext creds, port forwarding still at .11, 3 dead NPM rules
  • No blockers

Archived Reports

2026 02 20

2026-02-20

Done

  • MEMORY.md Redesign: Replaced drifting PROTOCOL.md summary with a thin pointer. Now contains only the mandatory session-start instruction + critical safety rules. Eliminates the stale-copy problem at root. [Claude Code, 2026-02-20]
  • Stale URLs Fixed: Corrected Semaphore IP in GEMINI.md (.20 → .30). Fixed wrong inventory group and Semaphore IP in old MEMORY.md. [Claude Code, 2026-02-20]
  • Zombie Files Archived: GEM_SYSTEM_PROMPT.md and ansible/sync_gemini_knowledge.yml (both v9 Google Drive era) moved to .ops/archive/. [Claude Code, 2026-02-20]
  • Drift Prevention: Added MEMORY.md as item 9 in KNOWLEDGE.md atomic documentation update checklist. [Claude Code, 2026-02-20]
  • Hooks: Added PreToolUse hook blocking direct secrets.yml edits and Stop hook reminding about post-work checklist when infra files are modified. [Claude Code, 2026-02-20]
  • Skills: Added /post-work and /standup skills to .claude/skills/. [Claude Code, 2026-02-20]
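
The secrets.yml guard from the Hooks item could look roughly like the sketch below. It rests on several assumptions that should be checked against the Claude Code hooks documentation: that the hook receives the pending tool call as JSON on stdin, that the payload has tool_name and tool_input.file_path fields, that the editing tools are named Edit, Write, and MultiEdit, and that exit status 2 signals a block. The actual hook in the repo may be implemented differently.

```python
import json
import sys
from pathlib import PurePosixPath

PROTECTED_NAMES = {"secrets.yml"}            # file named in the report
EDIT_TOOLS = {"Edit", "Write", "MultiEdit"}  # assumed tool names


def should_block(event: dict) -> bool:
    """True when the pending tool call would write to a protected file."""
    if event.get("tool_name") not in EDIT_TOOLS:
        return False
    file_path = (event.get("tool_input") or {}).get("file_path", "")
    return PurePosixPath(file_path).name in PROTECTED_NAMES


def main() -> int:
    """Read one hook event from stdin and return the exit status to use."""
    event = json.load(sys.stdin)
    if should_block(event):
        print("Direct edits to secrets.yml are blocked.", file=sys.stderr)
        return 2  # assumed: status 2 tells the agent the call was rejected
    return 0
```

To use it as a hook script, the entry point would call sys.exit(main()). Keeping the decision in should_block makes the rule testable without piping JSON through stdin.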

In Progress

  • None.

Next Actions

  1. Secret Migration: Populate Vaultwarden with all homelab secrets and template Docker Compose files to remove plaintext.
  2. Deploy Logging & Monitoring stack (Grafana, Prometheus, etc.) on nelson-manager.
  3. Decommission monolith services (.11).

Blockers

  • None.

2026 02 19

Nelson Home — Daily Report: 2026-02-19

2026-02-19

(Session Snapshot)

Done

  • Incident Recovery: Resolved NPM and AdGuard Home outage on nelson-edge. Fixed Docker-in-LXC startup failure by setting lxc.apparmor.profile: unconfined and keyctl=1 on Proxmox host. [Gemini, 2026-02-19 21:55]

  • Edge Migration: Successfully pointed UniFi 80/443 port forwarding to nelson-edge (.2). [Gemini, 2026-02-19 22:05]

  • Public Entrypoint: Configured proxmox.tudhopenelson.duckdns.org with Let's Encrypt SSL for secure remote hypervisor access. [Gemini, 2026-02-19 22:15]

  • Vaultwarden Fix: Restored HTTPS access to Vaultwarden by correcting proxy forward scheme from HTTPS to HTTP. [Gemini, 2026-02-19 22:25]

  • Architecture: Documented all owned domains and established a public exposure strategy task. [Gemini, 2026-02-19 22:10]

  • Retrospective: First repo retrospective completed — full history (Apr 2025 — Feb 2026), 7 actionable lessons extracted. [Claude Code, 2026-02-19 16:30]

  • Runbooks: All 5 event runbooks written from real incidents (were stubs since Feb 18). Added retrospective schedule. [Claude Code, 2026-02-19 16:30]

  • Stale Docs Fixed: Updated PROTOCOL.md (architecture, inventory groups, Semaphore URL), ROADMAP.md (reconciled phases with reality), KNOWLEDGE.md (stale IPs, new standing rules). [Claude Code, 2026-02-19 16:30]

  • Framework Additions: Added retrospective cadence, commit quality convention, atomic doc-update rule, and runbook-consultation step to PROTOCOL.md post-work checklist. [Claude Code, 2026-02-19 16:30]

In Progress

  • Secret Migration: Identified 6+ plaintext passwords in codebase; ready for Vaultwarden migration.

Next Actions

  1. Secret Migration: Populate Vaultwarden with all homelab secrets and template Docker Compose files to remove plaintext.
  2. Deploy Logging & Monitoring stack (Grafana, Prometheus, etc.) on nelson-manager.
  3. Decommission monolith services (.11).

Blockers

  • None current.