Archive
Archived Sprints
Sprint Archive: Bootstrap (2026-02-17 — 2026-02-18)
Phase 1 complete. Phase 2.2 partially complete.
Completed Tasks
- [x] Initialized Gemini memory system and operational files [Gemini, 2026-02-17]
- [x] Bootstrapped nelson-edge lifeboat (Ansible, Git, Pip, SSH, Tailscale) [Gemini, 2026-02-18]
- [x] Deployed AdGuard Home on nelson-edge with DNS resilience [Gemini, 2026-02-18]
- [x] Formatted and mounted 5TB WD backup drive on Proxmox [Gemini, 2026-02-18]
- [x] Automated Vaultwarden SQLite backup to Proxmox with retention [Gemini, 2026-02-18]
- [x] Automated UniFi config backup sync to Proxmox with retention [Gemini, 2026-02-18]
- [x] Provisioned nelson-identity LXC (Ubuntu 24.04, ID 200, Privileged) [Gemini, 2026-02-18]
- [x] Migrated Semaphore + Postgres from monolith to nelson-identity [Gemini, 2026-02-18]
- [x] Restored and verified master audit suite (100% green) [Gemini, 2026-02-18]
- [x] Centralized Proxmox API credentials in Vault [Gemini, 2026-02-18]
- [x] Hardened 8 Ansible playbooks (host groups, variables, idempotency, credential leaks) [Claude Code, 2026-02-18]
- [x] Created shared variable conventions in common.yml [Claude Code, 2026-02-18]
- [x] Created CLAUDE.md operating protocol [Claude Code, 2026-02-18]
- [x] Created DNS + NPM automation playbooks [Claude Code, 2026-02-18]
- [x] Full Homepage dashboard redesign [Claude Code, 2026-02-18]
- [x] Architecture review and roadmap expansion through Phase 4 [Claude Code, 2026-02-18]
- [x] Established Nelson Home naming standard [Gemini, 2026-02-18]
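The sprint log says the Vaultwarden SQLite and UniFi backups sync to Proxmox "with retention" but does not describe the mechanism. Below is a minimal Python sketch of one way to handle the SQLite case. It assumes a keep-newest-N policy and timestamped file names. The function names, the `vaultwarden-` prefix, and the directory layout are hypothetical, not the playbook's actual implementation. It uses the standard library's `sqlite3` online backup API, so the copy stays consistent while the database is in use.

```python
"""Hypothetical sketch: snapshot a live SQLite DB and prune old snapshots.

Assumptions (not from the sprint notes): the backup directory layout,
the file-name prefix, and a keep-newest-N retention policy.
"""
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def snapshot_sqlite(db_path: Path, backup_dir: Path, prefix: str = "vaultwarden") -> Path:
    """Copy a live SQLite database into a timestamped file via the online backup API."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    # UTC timestamp that sorts lexicographically in chronological order.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    target = backup_dir / f"{prefix}-{stamp}.sqlite3"
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(target)
    try:
        # Connection.backup copies pages consistently even while the DB is in use.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return target


def prune_backups(backup_dir: Path, prefix: str, keep: int) -> list[Path]:
    """Delete all but the `keep` newest snapshots and return the removed paths."""
    # Newest first, ordered by the timestamp embedded in the file name.
    snapshots = sorted(backup_dir.glob(f"{prefix}-*.sqlite3"), key=lambda p: p.name, reverse=True)
    removed = snapshots[keep:]
    for path in removed:
        path.unlink()
    return removed
```

Sorting on the embedded UTC timestamp rather than on file mtime keeps pruning stable even if snapshots are later copied or touched.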
Retrospectives
Retrospective: Bootstrap to Migration (2025-04-21 — 2026-02-19)
First retrospective. Covers the full repository history from initial creation through the migration to distributed Proxmox architecture. [Claude Code, 2026-02-19 16:30]
Timeline & Eras
Era 1: Manual Docker Tinkering (Apr 21 — May 3, 2025)
- 15 commits over 2 weeks, all by Bryan (human)
- Docker-compose files only: Glances, Code Server, Dashy, UniFi, Plex, Jellyfin, Nextcloud
- No automation, no IaC patterns. Casual commit messages ("fix", "small update", "added")
- Repo went dormant for 9 months
Era 2: The Ansible Awakening (Feb 14 — 16, 2026)
- Bryan returns and introduces Ansible + Semaphore
- Audit playbooks appear: Proxmox, Network (UniFi), Docker, NPM
- "Ansible Robot" author appears — automated commits from Semaphore runs
- Heavy fix-fix-fix cycles, especially on Docker audit formatting (5+ commits)
- Vault-encrypted secrets introduced
Era 3: Gemini Enters (Feb 17, 2026)
- Gemini CLI starts committing (74 total commits across 3 days)
- Memory system initialized, edge node bootstrapping, backup automation
- Rapid feature velocity but many small fixup commits
Era 4: Two-Agent Collaboration (Feb 18, 2026)
- Claude Code joins. Both agents working the same codebase in a single day
- Gemini: Semaphore migration, LXC provisioning, audit verification, NPM routing
- Claude Code: .ops/ restructure, protocol extraction, playbook hardening, common.yml, Homepage redesign, DNS/NPM automation
- 43 commits — most productive day in repo history
Era 5: The Big Migration (Feb 19, 2026)
- Architecture shifts from nelson-identity (.20) to nelson-manager (.30)
- NPM + AdGuard move to the Pi (nelson-edge)
- UniFi breaks twice (MongoDB permission + schema incompatibility) — recovered
- nelson-identity decommissioned, Tailscale deployed everywhere
Commit Statistics
| Author | Commits | Role |
|---|---|---|
| Bryan Nelson (human) | 92 | Initial setup, manual era, early Ansible |
| bryan-tanzo (Gemini) | 74 | Infrastructure provisioning, migrations |
| Ansible Robot | 69 | Automated audit reports from Semaphore |
| Total | 235 | Over ~10 active days |
Architecture Evolution
Apr 2025: [ubuntu-server] — everything on one box, manual docker-compose
Feb 2026: [ubuntu-server] + [nelson-pve] — Proxmox added, Ansible introduced
Feb 18: [ubuntu-server] + [nelson-pve] + [nelson-identity LXC] + [nelson-edge Pi]
Feb 19: [ubuntu-server draining] + [nelson-pve] + [nelson-manager LXC] + [nelson-edge Pi]
Major Milestones
- Repo creation (Apr 2025) — Docker compose collection
- Ansible + Semaphore adoption (Feb 14) — IaC begins
- Automated audit suite (Feb 14-16) — Proxmox, Docker, Network, NPM, Resilience
- AI agent integration (Feb 17) — Gemini CLI, then Claude Code
- `.ops/` restructure (Feb 18) – Professional project management framework
- `common.yml` centralization (Feb 18) – Single source of truth for variables
- First service migration (Feb 18) – Semaphore to nelson-identity
- Infrastructure shift to nelson-manager (Feb 19) — New control plane
- Edge network deployment (Feb 19) — NPM + AdGuard on Pi
- nelson-identity decommission (Feb 19) — First successful node lifecycle
What Went Well
- Audit suite was the highest-ROI investment: automated Proxmox/Docker/Network/NPM/Resilience audits on a Semaphore cron schedule gave both agents and the operator continuous visibility, and caught stale NPM rules and ad-blocking gaps automatically.
- The two-agent model delivered complementary strengths: Gemini excels at provisioning and migration execution; Claude Code at architecture, restructuring, and debugging. Feb 18 was the most productive day because both worked in parallel on non-overlapping tasks.
- The `.ops/` framework created a clean information architecture: separating ROADMAP (strategic) from SPRINT (tactical), KNOWLEDGE (durable patterns), and STANDUP (daily state) let both agents navigate without stepping on each other.
- `common.yml` centralization eliminated a class of bugs: before it, IPs and paths were hardcoded in individual playbooks; now every playbook references the same source of truth.
- The KNOWLEDGE.md incident format works: the UniFi + MongoDB entry (symptom/cause/fix/prevention) was immediately useful when the issue recurred in a different form. Pattern documentation pays dividends fast.
- The 9-month gap was beneficial: Bryan returned with better tools (Ansible) and a clearer vision. Sometimes stepping away is the best architectural decision.
What Went Wrong
- Fix-fix-fix commit chains: the UniFi deploy took 7 sequential fix commits, and every audit playbook went through multiple rounds of formatting fixes. Deploy-and-see-what-breaks is expensive, even with AI agents.
- Documentation drifted almost immediately: within just 2 days, PROTOCOL.md had a stale Semaphore URL (`.20` instead of `.30`), an architecture section describing nodes that no longer existed, and inventory groups that had since been renamed.
- Naming changes cascaded everywhere: the rename from `nelson-identity` to `nelson-manager` touched inventory, compose files, `common.yml`, Homepage, DNS, NPM, and multiple `.ops/` files. Several references were missed.
- Plaintext credentials accumulated as persistent debt: flagged as a problem on Feb 18 and still present on Feb 19. MongoDB `example`/`unifi111`, Semaphore `admin`, and Pulse `password123` sit in committed compose files.
- Runbooks were written as stubs and never filled in: meanwhile, the actual migrations were producing exactly the content those runbooks needed. That knowledge existed in CHANGELOG.md and STANDUP.md but never flowed back into RUNBOOKS.md.
- ROADMAP.md drifted from reality: it still describes `nelson-identity` and `nelson-gateway` as the V2 architecture, while actual work diverged significantly (nelson-manager replaced both roles; the edge node took over networking).
Lessons Learned (Actionable)
L1: Test before commit, not after
Fix-fix-fix chains are the biggest time sink. Use --check / dry-run / local validation before committing. One tested commit is worth more than seven fixups.
Action: Added to PROTOCOL.md as a convention.
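One way to make "test before commit" mechanical is a small gate that runs the cheapest checks first. The sketch below assumes `ansible-playbook` is on PATH and that the playbook and inventory paths are passed on the command line. `--syntax-check`, `--check`, and `--diff` are standard `ansible-playbook` flags, but the wrapper itself is hypothetical and not part of the repo.

```python
"""Hypothetical pre-commit gate: validate an Ansible playbook before committing."""
import shutil
import subprocess
import sys


def validation_commands(playbook: str, inventory: str) -> list[list[str]]:
    """Return the validation commands to run, cheapest first."""
    base = ["ansible-playbook", "-i", inventory, playbook]
    return [
        base + ["--syntax-check"],      # parse only; no hosts are contacted
        base + ["--check", "--diff"],   # dry run against the real hosts
    ]


def validate(playbook: str, inventory: str) -> bool:
    """Run each command in order and stop at the first failure."""
    if shutil.which("ansible-playbook") is None:
        print("ansible-playbook not found on PATH", file=sys.stderr)
        return False
    for cmd in validation_commands(playbook, inventory):
        if subprocess.run(cmd).returncode != 0:
            print("FAILED:", " ".join(cmd), file=sys.stderr)
            return False
    return True


if __name__ == "__main__":
    # Usage: python validate_playbook.py <playbook> <inventory>
    sys.exit(0 if validate(sys.argv[1], sys.argv[2]) else 1)
```

Check mode is not a full guarantee. `command`/`shell` tasks are skipped in check mode by default, so anything that depends on their registered output may behave differently. This narrows the fix-up loop rather than eliminating it.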
L2: Documentation updates must be atomic with the change
If you change an IP, rename a node, or move a service, update every reference in the same commit or session. Don't batch doc updates for "later." Action: Added to the post-work checklist in PROTOCOL.md and as a RUNBOOKS.md checklist item.
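The atomic-update rule is easier to keep with a mechanical check for retired names. This Python sketch scans text files for tokens that this retrospective records as retired (`nelson-identity`, `nelson-gateway`). The token list, file suffixes, and excluded history paths are illustrative assumptions, not an existing script in the repo.

```python
"""Hypothetical drift check: flag references to retired node names."""
import re
import sys
from pathlib import Path

# Illustrative: names this retrospective records as retired.
STALE = {
    "nelson-identity": "renamed to nelson-manager",
    "nelson-gateway": "role absorbed by nelson-manager and nelson-edge",
}
SUFFIXES = {".md", ".yml", ".yaml", ".j2"}
# History is allowed to mention old names, so skip it.
SKIP_DIRS = {".git", "archive"}
SKIP_FILES = {"CHANGELOG.md"}


def find_stale(text: str, stale: dict[str, str] = STALE) -> list[tuple[int, str, str]]:
    """Return (line number, token, note) for each stale reference in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for token, note in stale.items():
            if re.search(rf"\b{re.escape(token)}\b", line):
                hits.append((lineno, token, note))
    return hits


def scan_repo(root: Path) -> int:
    """Print every stale reference under root and return how many were found."""
    count = 0
    for path in sorted(root.rglob("*")):
        if (
            not path.is_file()
            or path.suffix not in SUFFIXES
            or path.name in SKIP_FILES
            or SKIP_DIRS.intersection(path.relative_to(root).parts)
        ):
            continue
        for lineno, token, note in find_stale(path.read_text(errors="replace")):
            print(f"{path}:{lineno}: '{token}' ({note})")
            count += 1
    return count


if __name__ == "__main__":
    # Non-zero exit when stale references remain, so it can gate a commit.
    sys.exit(1 if scan_repo(Path(sys.argv[1] if len(sys.argv) > 1 else ".")) else 0)
```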
L3: Pin versions for infrastructure services
The UniFi outage was caused by the `latest` tag pulling in a breaking change; recovery took 7 commits.
Action: Added to KNOWLEDGE.md as a standing rule.
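A standing rule is easier to enforce when something flags violations. This sketch uses only the standard library and looks for `image:` lines in a Compose file that use `latest` or no tag at all. It is a line-based heuristic, not a YAML parser, and the function name is hypothetical.

```python
import re

# `image:` lines, optionally quoted; a line-based heuristic, not a YAML parser.
IMAGE_LINE = re.compile(r"""^\s*image:\s*['"]?([^'"\s#]+)""")


def unpinned_images(compose_text: str) -> list[tuple[int, str]]:
    """Return (line number, image reference) for images on `latest` or no tag."""
    problems = []
    for lineno, line in enumerate(compose_text.splitlines(), start=1):
        match = IMAGE_LINE.match(line)
        if not match:
            continue
        ref = match.group(1)
        if "@sha256:" in ref:
            continue  # digest-pinned: immutable, so acceptable
        # Only look for a tag after the last "/", so "registry:5000/app"
        # is not mistaken for an image tagged "5000/app".
        last = ref.rsplit("/", 1)[-1]
        tag = last.split(":", 1)[1] if ":" in last else None
        if tag in (None, "latest"):
            problems.append((lineno, ref))
    return problems
```

A version tag such as `mongo:7.0` can still be re-pushed upstream. Pinning by digest (`image@sha256:...`) is the stricter option, which is why the sketch treats digests as pinned.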
L4: Write runbooks from incidents, not in advance
Empty stubs stay empty. Real incidents generate real checklists. Action: Filled in RUNBOOKS.md from the actual nelson-identity decommission, UniFi migration, and DNS/routing changes.
L5: Credential hygiene is Phase 1, not Phase 4
Once plaintext creds are committed to git history, they are effectively permanent, so every day of delay makes cleanup harder. Action: Elevated in SPRINT.md priority. Documented in KNOWLEDGE.md.
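To find plaintext credentials before they land in another commit, a crude scanner over compose files can flag secret-looking environment keys that hold literal values instead of `${VAR}` interpolation. The key pattern below is an assumption. It will miss secrets stored under unconventional names and may flag some false positives, so treat it as a tripwire rather than a full secret scanner. The example key names in the test are hypothetical.

```python
import re

# Env keys that look like secrets; an assumed, non-exhaustive pattern.
SECRET_KEY = r"[A-Z0-9_]*(?:PASS(?:WORD)?|SECRET|TOKEN|API_KEY)[A-Z0-9_]*"
# Matches list form `- KEY=value` (optionally quoted) and mapping form `KEY: value`.
ASSIGNMENT = re.compile(rf"""^\s*(?:-\s*['"]?)?({SECRET_KEY})\s*[=:]\s*['"]?([^'"\s#]*)""")


def plaintext_secrets(compose_text: str) -> list[tuple[int, str]]:
    """Return (line number, key) for secret-looking keys with literal values."""
    findings = []
    for lineno, line in enumerate(compose_text.splitlines(), start=1):
        match = ASSIGNMENT.match(line)
        if not match:
            continue
        key, value = match.group(1), match.group(2)
        # `${VAR}` means the value is interpolated at deploy time, not committed.
        if value and not value.startswith("${"):
            findings.append((lineno, key))
    return findings
```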
L6: Run retrospectives regularly
This is the first retrospective after 235 commits. Too many learnings were implicit. Action: Added retrospective cadence to PROTOCOL.md and RUNBOOKS.md.
L7: Keep ROADMAP.md honest
When reality diverges from the plan, update the plan. A stale roadmap is worse than no roadmap — it gives false confidence. Action: Updated ROADMAP.md to reflect actual architecture.
Current State (as of 2026-02-19)
- Control plane (Semaphore, Vaultwarden, UniFi, Homepage): nelson-manager (.30)
- Network edge (DNS, reverse proxy): nelson-edge Pi (.2)
- Legacy monolith (.11): still running HA, Plex, Nextcloud, n8n, Pulse, Glances, DuckDNS
- Known debt: stale docs (fixed this session), plaintext creds, port forwarding still at .11, 3 dead NPM rules
- No blockers
Archived Reports
2026-02-20
Done
- MEMORY.md Redesign: Replaced drifting PROTOCOL.md summary with a thin pointer. Now contains only the mandatory session-start instruction + critical safety rules. Eliminates the stale-copy problem at root. [Claude Code, 2026-02-20]
- Stale URLs Fixed: Corrected Semaphore IP in GEMINI.md (`.20` → `.30`). Fixed wrong inventory group and Semaphore IP in old MEMORY.md. [Claude Code, 2026-02-20]
- Zombie Files Archived: `GEM_SYSTEM_PROMPT.md` and `ansible/sync_gemini_knowledge.yml` (both from the v9 Google Drive era) moved to `.ops/archive/`. [Claude Code, 2026-02-20]
- Drift Prevention: Added `MEMORY.md` as item 9 in the KNOWLEDGE.md atomic documentation update checklist. [Claude Code, 2026-02-20]
- Hooks: Added a `PreToolUse` hook blocking direct `secrets.yml` edits and a `Stop` hook reminding about the post-work checklist when infra files are modified. [Claude Code, 2026-02-20]
- Skills: Added `/post-work` and `/standup` skills to `.claude/skills/`. [Claude Code, 2026-02-20]
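The `PreToolUse` hook described above could be as small as the following Python script. This is a hedged sketch, not the repo's actual hook. It assumes that Claude Code passes the pending tool call as JSON on stdin with `tool_name` and `tool_input.file_path` fields, that the file-editing tools are named `Edit`, `MultiEdit`, and `Write`, and that exiting with status 2 blocks the call and feeds stderr back to the agent. Verify those details against the current Claude Code hooks documentation. The `ansible-vault edit` suggestion assumes `secrets.yml` is an Ansible Vault file.

```python
"""Hypothetical PreToolUse hook: refuse direct edits to secrets.yml."""
import json
import sys
from pathlib import PurePath
from typing import Optional

GUARDED_NAMES = {"secrets.yml"}
EDIT_TOOLS = {"Edit", "MultiEdit", "Write"}  # assumed tool names


def block_reason(event: dict) -> Optional[str]:
    """Return a message if this tool call should be blocked, else None."""
    if event.get("tool_name") not in EDIT_TOOLS:
        return None
    path = (event.get("tool_input") or {}).get("file_path", "")
    if PurePath(path).name in GUARDED_NAMES:
        return f"Direct edits to {path} are blocked; use `ansible-vault edit` instead."
    return None


def main() -> None:
    event = json.load(sys.stdin)
    reason = block_reason(event)
    if reason:
        print(reason, file=sys.stderr)  # assumed to be surfaced back to the agent
        sys.exit(2)  # assumed: exit code 2 blocks the tool call
    sys.exit(0)


if __name__ == "__main__":
    main()
```

To try it locally, pipe in a sample payload, for example `echo '{"tool_name": "Edit", "tool_input": {"file_path": "ansible/secrets.yml"}}' | python block_secrets.py`. The script name is hypothetical. It should exit with status 2.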
In Progress
- None.
Next Actions
- Secret Migration: Populate Vaultwarden with all homelab secrets and template Docker Compose files to remove plaintext.
- Deploy Logging & Monitoring stack (Grafana, Prometheus, etc.) on `nelson-manager`.
- Decommission monolith services (.11).
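For the templating half of the secret migration, one possible approach is to replace literal values with Compose `${VAR}` interpolation and move the values into an untracked `.env` file that is generated at deploy time from Vaultwarden. The sketch below covers only the text rewrite and the `.env` rendering. How values get into and out of Vaultwarden is left open. The key names in the example are hypothetical, and the sketch assumes unquoted `KEY=value` or `KEY: value` lines without trailing comments.

```python
"""Hypothetical sketch: move literal secrets out of a compose file."""
import re


def externalize(compose_text: str, keys: list[str]) -> tuple[str, dict[str, str]]:
    """Replace literal values for `keys` with ${KEY}; return (new text, extracted values)."""
    extracted: dict[str, str] = {}
    out_lines = []
    for line in compose_text.splitlines():
        for key in keys:
            # Group 1 keeps indentation, optional list dash, key, and separator.
            match = re.match(rf"^(\s*(?:-\s*)?{re.escape(key)}\s*[=:]\s*)(\S+)\s*$", line)
            if match and not match.group(2).startswith("${"):
                extracted[key] = match.group(2)
                line = match.group(1) + "${" + key + "}"
                break
        out_lines.append(line)
    return "\n".join(out_lines) + "\n", extracted


def render_env(values: dict[str, str]) -> str:
    """Render KEY=value lines for an untracked .env file."""
    return "".join(f"{k}={v}\n" for k, v in sorted(values.items()))
```

By default, Docker Compose reads a `.env` file in the project directory for interpolation. The rewritten compose file can therefore stay in git while `.env` is listed in `.gitignore`.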
Blockers
- None.
Nelson Home — Daily Report: 2026-02-19
2026-02-19
(Session Snapshot)
Done
- Incident Recovery: Resolved NPM and AdGuard Home outage on `nelson-edge`. Fixed Docker-in-LXC startup failure by setting `lxc.apparmor.profile: unconfined` and `keyctl=1` on the Proxmox host. [Gemini, 2026-02-19 21:55]
- Edge Migration: Pointed UniFi 80/443 port forwarding to `nelson-edge` (.2). [Gemini, 2026-02-19 22:05]
- Public Entrypoint: Configured `proxmox.tudhopenelson.duckdns.org` with Let's Encrypt SSL for secure remote hypervisor access. [Gemini, 2026-02-19 22:15]
- Vaultwarden Fix: Restored HTTPS access to Vaultwarden by changing the proxy's forward scheme from HTTPS to HTTP (TLS terminates at the proxy; the backend serves plain HTTP). [Gemini, 2026-02-19 22:25]
- Architecture: Documented all owned domains and created a task to define a public exposure strategy. [Gemini, 2026-02-19 22:10]
- Retrospective: First repo retrospective completed, covering the full history (Apr 2025 to Feb 2026); 7 actionable lessons extracted. [Claude Code, 2026-02-19 16:30]
- Runbooks: All 5 event runbooks written from real incidents (they had been stubs since Feb 18). Added retrospective schedule. [Claude Code, 2026-02-19 16:30]
- Stale Docs Fixed: Updated PROTOCOL.md (architecture, inventory groups, Semaphore URL), ROADMAP.md (reconciled phases with reality), KNOWLEDGE.md (stale IPs, new standing rules). [Claude Code, 2026-02-19 16:30]
- Framework Additions: Added retrospective cadence, commit quality convention, atomic doc-update rule, and runbook-consultation step to the PROTOCOL.md post-work checklist. [Claude Code, 2026-02-19 16:30]
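The Docker-in-LXC fix from the incident-recovery entry can be captured so it survives a rebuild. This Python sketch idempotently merges `keyctl=1` into a container's `features:` line and sets the raw `lxc.apparmor.profile: unconfined` key. It assumes the standard Proxmox config format in `/etc/pve/lxc/<vmid>.conf`, where snapshot sections begin with a `[name]` header and only the main section before them should change. The function name is hypothetical. The container needs a restart for the change to take effect.

```python
"""Hypothetical sketch: idempotently apply Docker-in-LXC settings to a Proxmox CT config."""


def ensure_docker_lxc_settings(conf_text: str) -> str:
    lines = conf_text.splitlines()
    # The main config ends where the first snapshot section ("[name]") begins.
    end = next((i for i, line in enumerate(lines) if line.startswith("[")), len(lines))
    main, rest = lines[:end], lines[end:]
    while main and not main[-1].strip():
        main.pop()  # drop the blank separator; it is re-added below

    # Merge keyctl=1 into an existing comma-separated `features:` value, or add one.
    for i, line in enumerate(main):
        if line.startswith("features:"):
            items = [item.strip() for item in line.split(":", 1)[1].split(",")]
            opts = dict(item.split("=", 1) for item in items if "=" in item)
            opts["keyctl"] = "1"
            main[i] = "features: " + ",".join(f"{k}={v}" for k, v in opts.items())
            break
    else:
        main.append("features: keyctl=1")

    # Raw LXC key: remove any existing AppArmor profile line, then set unconfined.
    main = [line for line in main if not line.startswith("lxc.apparmor.profile:")]
    main.append("lxc.apparmor.profile: unconfined")

    body = main + ([""] + rest if rest else [])
    return "\n".join(body) + "\n"
```

Running unconfined removes the AppArmor layer of isolation between the container and the host, so it is worth recording as a deliberate trade-off rather than a default.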
In Progress
- Secret Migration: Identified 6+ plaintext passwords in codebase; ready for Vaultwarden migration.
Next Actions
- Secret Migration: Populate Vaultwarden with all homelab secrets and template Docker Compose files to remove plaintext.
- Deploy Logging & Monitoring stack (Grafana, Prometheus, etc.) on `nelson-manager`.
- Decommission monolith services (.11).
Blockers
- None current.