Infra Monorepo Migration
The secret is to do it early, with standards.
This weekend of fun follows many earlier ones, plus plenty of weekdays. It's nice when a small team can manage a lot of resources through a single pane of glass: tmux, RS-232, IPMI, Ansible and BigNetwork.
More details later... sleepy.
# NeoCortex Standards, Repo Storage, ZFS Dataset, and M70 QAT Fast-Track Plan
## Summary
Implement a non-destructive NeoCortex follow-up that locks these policies into code, tests, and docs:
- Repos live under /opt/repos/<repo>; worktrees live under /opt/repos/worktrees/<repo>/<branch>.
- /root is not a repo/worktree destination; /root/OMGFUCKED is temporary quarantine only.
- New repos must be created from yukon-systems/YukonSYS-Standard-Definitions.
- Every repo must use Git LFS and the standards repo .gitattributes byte-for-byte.
- ZFS dataset creation for /opt/repos becomes reusable Ansible/policy logic, based on /root/bin/ZFS-M70-Forge_zroot-dataset_opt-repos.sh.
- M70 QAT rollout uses canary first: build OpenSSL+QAT and OpenZFS/QAT binpkgs, validate on M70 canary, then promote.
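The placement rules above can be sketched as a small path classifier. This is an illustration only, not the planned audit script: the function name `classify_repo_path` and its four labels are hypothetical, and it assumes a branch occupies exactly one path component under the worktree root.

```python
from pathlib import PurePosixPath

REPO_ROOT = PurePosixPath("/opt/repos")
WORKTREE_ROOT = REPO_ROOT / "worktrees"
QUARANTINE_ROOT = PurePosixPath("/root/OMGFUCKED")


def classify_repo_path(path: str) -> str:
    """Return 'canonical', 'worktree', 'quarantine', or 'forbidden' (hypothetical labels)."""
    p = PurePosixPath(path)
    # /opt/repos/worktrees/<repo>/<branch>: exactly two components below the worktree root.
    if p.parent.parent == WORKTREE_ROOT:
        return "worktree"
    # /opt/repos/<repo>: one component below the repo root, excluding the worktree root itself.
    if p.parent == REPO_ROOT and p != WORKTREE_ROOT:
        return "canonical"
    # /root/OMGFUCKED and anything below it is temporary quarantine only.
    if p == QUARANTINE_ROOT or QUARANTINE_ROOT in p.parents:
        return "quarantine"
    # Everything else, including /root/<repo>, is not an approved destination.
    return "forbidden"
```

Note that `PurePosixPath` does not resolve `..` or symlinks, so a real check would likely need to normalize paths first.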
## Key Changes
- Standards and repo-placement policy
  - Update NeoCortex root policy docs and tests to forbid new repo/worktree paths under /root, except documented temporary quarantine paths under /root/OMGFUCKED.
  - Update tools/import-source-repos.py defaults from /root/<repo> to /opt/repos/<repo>, with /root/OMGFUCKED/<repo> accepted only as an explicit transitional override.
  - Add a repo-location audit script/test that fails active operational docs/scripts using /root/<repo> or /root/.config/superpowers/worktrees as live paths.
- Git LFS and .gitattributes
  - Replace NeoCortex root .gitattributes with the standards repo version byte-for-byte.
  - Add a validation script that checks:
    - git lfs version works,
    - filter.lfs.required=true,
    - lfs.<url>.locksverify=true for the origin LFS endpoint,
    - root .gitattributes hash equals standards source,
    - LFS pre-push hook exists.
  - Document that non-compliant repos must be retrofitted before migration/import.
- ZFS dataset automation
  - Convert /root/bin/ZFS-M70-Forge_zroot-dataset_opt-repos.sh into reusable NeoCortex Ansible/defaults for managed datasets.
  - First managed dataset profile: zroot/opt/repos mounted at /opt/repos with the script’s options, including compression=zstd-fast-1, dedup=sha256,verify, recordsize=256K, xattr=sa, POSIX ACLs, atime=off, and reservation=32M.
  - Add a second reusable pattern for per-user home datasets, but do not block on the NAS/NFS 777 investigation; include a permission preflight that creates a test directory with umask 0027 and fails if mode becomes 0777.
- M70 QAT binpkg fast-track
  - Extend the existing M70 canary stage4/binpkg lane to build a QAT package set:
    - QAT firmware/kernel module readiness package policy,
    - OpenSSL QAT engine/provider package path,
    - OpenZFS package built with QAT support,
    - rollback stock OpenSSL/OpenZFS binpkgs.
  - Add package/use/profile fragments under the existing Gentoo stage4 profile area instead of ad hoc host edits.
  - Add canary validation:
    - lspci -nnk sees QAT device,
    - qat_c3xxx and intel_qat load,
    - OpenZFS module parameters expose QAT controls,
    - openssl speed before/after artifacts are captured,
    - root pool imports and mounts after reboot,
    - rollback binpkg path is tested before wider promotion.
  - Do not promote to all M70 nodes until canary evidence is committed.
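The permission preflight described under ZFS dataset automation could look roughly like the following. It is a sketch under stated assumptions: the function name `umask_preflight` and the probe directory naming are hypothetical, and the only failure condition is the one the plan names, a resulting mode of 0777.

```python
import os
import stat
import uuid


def umask_preflight(parent: str, umask: int = 0o027) -> int:
    """Create a probe directory under `parent` with the given umask and return its mode.

    Raises RuntimeError if the directory comes back as 0777, which would mean
    the filesystem ignored the umask.
    """
    probe = os.path.join(parent, f".umask-preflight-{uuid.uuid4().hex}")
    old_umask = os.umask(umask)
    try:
        # Request 0777 so the umask is the only thing narrowing the mode.
        os.mkdir(probe, 0o777)
        mode = stat.S_IMODE(os.stat(probe).st_mode)
    finally:
        os.umask(old_umask)
        if os.path.isdir(probe):
            os.rmdir(probe)
    if mode == 0o777:
        raise RuntimeError(f"{parent}: probe directory was created 0777 despite umask {umask:04o}")
    return mode
```

On a filesystem that honors the umask, the permission bits of the returned mode should be 0750. Running the probe against the actual home-dataset mount is what would surface the NAS/NFS behavior.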
## Test Plan
- Local static checks:
  - tests/shell/run-tests.sh
  - new repo-location standards test
  - new Git LFS standards test
  - new ZFS dataset defaults test
  - new M70 QAT package-policy test
  - YAML parse for all workflows/playbooks
  - git diff --check
- Runtime/dry-run checks:
  - Ansible syntax-check for dataset roles/playbooks.
  - Dry-run dataset plan for /opt/repos confirms current zroot/opt/repos matches expected options.
  - M70 canary dry-run emits QAT binpkg build plan without mutating packages.
- Canary acceptance:
  - QAT binpkgs build successfully.
  - M70 canary reboots with root pool intact.
  - QAT modules and OpenZFS QAT controls are present.
  - Benchmark and rollback evidence are recorded under NeoCortex audit docs.
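The dry-run dataset check above amounts to comparing the expected profile against what the pool reports. A minimal sketch, assuming the input is the tab-separated output of `zfs get -H -o property,value ... zroot/opt/repos`; the names `EXPECTED_PROFILE`, `parse_zfs_get`, and `dataset_drift` are hypothetical, and the `acltype` value is an assumption (some OpenZFS releases report `posixacl` rather than `posix`).

```python
# Expected properties taken from the plan's first managed dataset profile.
EXPECTED_PROFILE = {
    "mountpoint": "/opt/repos",
    "compression": "zstd-fast-1",
    "dedup": "sha256,verify",
    "recordsize": "256K",
    "xattr": "sa",
    "acltype": "posix",  # assumption: may be reported as "posixacl"
    "atime": "off",
    "reservation": "32M",
}


def parse_zfs_get(text: str) -> dict[str, str]:
    """Parse 'property<TAB>value' lines into a dict, skipping blank lines."""
    props = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition("\t")
        props[name.strip()] = value.strip()
    return props


def dataset_drift(expected: dict[str, str], actual: dict[str, str]) -> dict[str, tuple]:
    """Return {property: (expected, actual)} for every mismatch; empty means compliant."""
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }
```

An empty result means the live dataset already matches the profile, so the dry-run has nothing to change.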
## Assumptions
- Canonical repo clone layout is /opt/repos/<repo>.
- Worktree layout is /opt/repos/worktrees/<repo>/<branch>.
- /root/OMGFUCKED remains temporary quarantine, not an approved steady-state repo location.
- QAT rollout gate is canary first, not all-M70 immediate.
- OpenZFS QAT implementation details must be verified against current OpenZFS/Gentoo package behavior during implementation; source basis includes OpenZFS QAT docs and module-parameter docs:
  - https://openzfs.org/wiki/ZFS_Hardware_Acceleration_with_QAT
  - https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Module%20Parameters.html
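Because the QAT details still have to be verified on the canary, a discovery-style check is safer than hard-coding parameter names. The sketch below lists whatever `zfs_qat_`-prefixed parameters the loaded module exposes. The prefix follows names such as `zfs_qat_compress_disable` in the OpenZFS module-parameter docs, but the function name `qat_module_params` is hypothetical and the prefix must be confirmed against the packaged OpenZFS version.

```python
from pathlib import Path


def qat_module_params(param_dir: str = "/sys/module/zfs/parameters") -> dict[str, str]:
    """Return {name: value} for module parameters whose name starts with 'zfs_qat_'.

    Returns an empty dict if the directory does not exist (module not loaded).
    """
    root = Path(param_dir)
    if not root.is_dir():
        return {}
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(root.iterdir())
        if entry.name.startswith("zfs_qat_")
    }
```

An empty result on the canary would suggest the module was built without QAT support, or that the parameter names differ from this assumption.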
─ Worked for 22m 41s ───────────────────────────────────────────────────────────
› 1) We also need to include the control plane fixes and validations so that multi-worker comms are fixed. 2) We need to leverage the SLURM cluster nodes and
additional distcc build hosts, 3) X12again & FMT2 are effectively idle and can be blasted into submission to fast track all of these changes. 4) Once the
M70-canary passes, we need to roll it out to K10 (tear-it-down AAA) and validate that type of OS netbooting. 5) Add those aspects to the Plans and we're
looking better. 6) Finally, wrap up ALL changes necessary to complete the NeoCortex monorepo migration.
• I’ll revise the plan around the expanded end-state, but first I’m doing a read-only pass over NeoCortex’s control-plane, SLURM/distcc, X12/FMT2/K10, and
migration-status artifacts so the replacement plan is grounded instead of generic.
• Ran set -euo pipefail
│ NEO=/root/.config/superpowers/worktrees/YukonSYS-NeoCortex/feat-source-repo-imports
│ cd "$NEO"
│ … +6 lines
└ --- migration docs/status ---
control-plane/forge/docs/MIGRATION-INVENTORY.md
… +259 lines (ctrl + t to view transcript)
git diff --check
• Ran set -euo pipefail
│ NEO=/root/.config/superpowers/worktrees/YukonSYS-NeoCortex/feat-source-repo-imports
│ cd "$NEO"
│ … +6 lines
└ --- control plane comms files ---
control-plane/forge/agents/registry.fcp.json
… +394 lines (ctrl + t to view transcript)
control-plane/forge/docs/FCP-COMMUNICATION-SECURITY.md:44:classes unless a narrower standing approval is explicitly recorded.
control-plane/forge/docs/FCP-COMMUNICATION-SECURITY.md:49:agent action and human approval.
• Ran set -euo pipefail
│ NEO=/root/.config/superpowers/worktrees/YukonSYS-NeoCortex/feat-source-repo-imports
│ cd "$NEO"
│ … +6 lines
└ --- slurm/distcc/build host files ---
docs/audits/neocortex-live-validation/fmt2-ephemeral-readiness-20260530.md
… +574 lines (ctrl + t to view transcript)
platforms/gentoo/stage4-llvm/tests/shell/test_sunrise_critical_path.sh:32:require_grep 'tun-fmt2' "${SCRIPT}"
platforms/gentoo/stage4-llvm/tests/shell/test_sunrise_critical_path.sh:41:require_grep 'FMT2 transport' "${DOC}"
• Ran set -euo pipefail
│ NEO=/root/.config/superpowers/worktrees/YukonSYS-NeoCortex/feat-source-repo-imports
│ cd "$NEO"
│ … +7 lines
└ {"headRefOid":"f8118b56046f135292cce2be73a6308261d60589","isDraft":false,"mergeStateStatus":"BLOCKED","mergeable":"MERGEABLE","state":"OPEN","statusCheckRollup":[{"__typename":"CheckRun","completedAt":"2026-05-31T00:34:51Z","conclusion":"SUCCESS","detailsUrl":"https://github.com/yukon-systems/
… +341 lines (ctrl + t to view transcript)
exit "${fail}"
• Context compacted
• Proposed Plan
# NeoCortex Standards-Complete Monorepo Migration Plan
## Summary
- Make /opt/repos/<repo> the only canonical repo location; no new checkouts or random repo files under any home directory, including /root.
- Finish standards adoption: all new repos must originate from YukonSYS-Standard-Definitions; NeoCortex gets retrofitted instead of restarted unless
retrofit verification fails.
- Complete NeoCortex monorepo migration with standards baseline, Git LFS policy, control-plane comms fixes, ZFS dataset policy, SLURM/distcc acceleration,
M70 QAT canary, FMT2/X12again build capacity, and K10 netboot AAA rollout validation.
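One concrete piece of retrofit verification is the byte-for-byte `.gitattributes` requirement stated in the earlier plan. A minimal sketch of that comparison follows; the helper names `sha256_file` and `gitattributes_matches` are hypothetical, and the two arguments are assumed to be local checkouts of the repo under test and the standards repo.

```python
import hashlib
from pathlib import Path


def sha256_file(path) -> str:
    """Hash the raw bytes of a file; no newline or encoding normalization."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def gitattributes_matches(repo_root, standards_root) -> bool:
    """True only if both root .gitattributes files are identical byte-for-byte."""
    return sha256_file(Path(repo_root) / ".gitattributes") == sha256_file(
        Path(standards_root) / ".gitattributes"
    )
```

Hashing raw bytes means a CRLF conversion or a trailing-whitespace change counts as a mismatch, which is the point of a byte-for-byte rule.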
## Key Changes
- Repository standards
  - Merge standards policy PR work, then make NeoCortex root .gitattributes byte-for-byte identical to the standards repo version.
  - Update active NeoCortex import/migration tooling to default source paths to /opt/repos/<repo>.
  - Preserve historical /root/... references only in immutable audit/provenance docs; active scripts, runbooks, tests, and templates must not depend on
    home-directory repos.
  - Use worktrees under /opt/repos/worktrees/<repo>/<branch>.
- NeoCortex monorepo completion
  - Finish all migration blockers already tracked in NeoCortex: M70 stage4/QCOW validation, Kata/container runtime, FMT2 ephemeral VM lane, Gentoo/OpenRC
    replacement for temporary Rocky bootstrap, Thor replacement path, and optional history graft/freeze decision.
  - Treat NeoCortex as authoritative after completion: source repos become frozen compatibility references or are history-grafted into NeoCortex with
    documented provenance.
  - Add/repair migration validation so PR completion proves standards baseline, Git LFS, repo location policy, shell tests, YAML parsing, and migration
    manifests.
- Control-plane and multi-worker comms
  - Bring all control-plane fixes into NeoCortex and validate Forge/FCP worker communication end to end.
  - Add tests for FCP envelope validation, replay/nonce handling, approval ceilings, routing scope, transport policy, ntfy bridge behavior, and
    multi-worker request/response flows.
  - Replace temporary Rocky bootstrap control-plane host with the compliant Gentoo/OpenRC image once that image passes validation.
- ZFS, NFS, and storage policy
  - Use the provided /opt/repos ZFS dataset options as the canonical dataset profile for repo storage.
  - Add reusable dataset/playbook logic for repo datasets and per-user home datasets.
  - Add NFS/floating-home validation that detects the NAS NFSv4.2 0777 directory-mode bug under expected 0027 umask and blocks insecure rollout.
- Build acceleration and hardware rollout
  - Use SLURM plus distcc for M70 stage4/QAT/OpenZFS builds.
  - Treat X12again and FMT2 as available fast-track build/validation capacity.
  - Build binpkg path for OpenSSL + QAT + OpenZFS on M70 canary first; non-QAT nodes may consume the same binpkgs with QAT unused.
  - After M70 canary passes, roll the validated path to K10 and run destructive AAA/netboot validation with rollback evidence captured.
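The replay/nonce handling item in the control-plane tests can be illustrated with a generic replay guard. This is not the FCP implementation: the class name `ReplayGuard`, the `nonce` and `sent_at` fields, and the acceptance window are all hypothetical, chosen only to show what such a test would exercise.

```python
class ReplayGuard:
    """Reject messages whose nonce was already seen or whose timestamp is out of window."""

    def __init__(self, window_seconds: float = 300.0):
        self.window = window_seconds
        self._seen = {}  # nonce -> sent_at timestamp

    def accept(self, nonce: str, sent_at: float, now: float) -> bool:
        cutoff = now - self.window
        # Forget nonces that are too old to be accepted again anyway.
        self._seen = {n: t for n, t in self._seen.items() if t >= cutoff}
        # Stale or far-future timestamps are rejected outright.
        if sent_at < cutoff or sent_at > now + self.window:
            return False
        # A nonce seen inside the window is a replay.
        if nonce in self._seen:
            return False
        self._seen[nonce] = sent_at
        return True
```

Expiring nonces by their `sent_at` keeps memory bounded without reopening a replay gap, since any message old enough to have been forgotten also fails the timestamp check.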
## Test Plan
- Run NeoCortex shell/YAML validation, including migration, standards-baseline, Git LFS, .gitattributes, and no-active-home-repo-path checks.
- Run control-plane/FCP tests for envelope schema, replay rejection, worker routing, ntfy bridge, and multi-worker comms.
- Run Ansible/playbook syntax checks for ZFS datasets, NFS home policy, SLURM/distcc build orchestration, M70 canary, FMT2/X12again lanes, and K10 netboot
AAA.
- Execute staged hardware validation:
  1. SLURM/distcc dry-run.
  2. M70 canary build and QCOW/stage validation.
  3. FMT2/X12again accelerated build/ephemeral VM smoke.
  4. K10 netboot/AAA destructive validation after M70 success.
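The staged validation above is a gate sequence: a later stage should not run once an earlier one fails. A minimal sketch, assuming each stage is a callable returning True on success. The name `run_stages` is hypothetical, and treating the order as strictly sequential is stricter than the plan requires, which only states that K10 follows M70 success.

```python
from typing import Callable, Sequence


def run_stages(stages: Sequence[tuple[str, Callable[[], bool]]]) -> list[tuple[str, str]]:
    """Run stages in order; after the first failure, mark the rest as skipped."""
    results = []
    blocked = False
    for name, check in stages:
        if blocked:
            # Never invoke a later (possibly destructive) stage after a failure.
            results.append((name, "skipped"))
            continue
        ok = check()
        results.append((name, "passed" if ok else "failed"))
        blocked = not ok
    return results
```

With this shape, a failed M70 canary leaves the K10 destructive stage recorded as skipped rather than attempted.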
## Assumptions
- Retrofit NeoCortex rather than restart the repo unless standards validation exposes unrecoverable divergence.
- Historical audit files may retain old /root/... paths as provenance; active operational content may not.
- M70 canary is the release gate for QAT/OpenZFS/binpkg rollout.
- K10 destructive AAA/netboot validation is in scope only after M70 canary passes and rollback evidence is captured.
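The split between historical and active content suggests an audit that scans the tree but exempts provenance docs. The sketch below is one possible shape: the function name `find_home_repo_refs` is hypothetical, the repo names must be supplied by the caller, and the `docs/audits/` exemption prefix is an assumption based on a path that appears in the transcript.

```python
from pathlib import Path


def find_home_repo_refs(root, repo_names, exempt_prefixes=("docs/audits/",)):
    """Return [(relative_path, line_number)] for active files that reference /root/<repo>
    or the old /root/.config/superpowers/worktrees location."""
    patterns = [f"/root/{name}" for name in repo_names]
    patterns.append("/root/.config/superpowers/worktrees")
    skip = tuple(exempt_prefixes) + (".git/",)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if rel.startswith(skip):
            continue  # historical audit docs and git internals are exempt
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable files are out of scope for this sketch
        for lineno, line in enumerate(text.splitlines(), 1):
            if any(pattern in line for pattern in patterns):
                hits.append((rel, lineno))
    return hits
```

Paths under /root/OMGFUCKED do not match these patterns, so quarantine references pass; whether they should also be reported is a policy choice this sketch leaves open.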
─ Worked for 2m 18s ───────────────────────────────────────────────────────────