Netgate DNS Cutover Toolkit
I ran a read-only safety check before migrating LAN DNS onto the firewall. It came back NO-GO. Good, that's the gate doing its job. What it found underneath was more interesting than a simple outage: a routing trap quietly breaking a box's own network access, plus a one-line bug in the toolkit's own API client mislabeling a working endpoint as broken. Both got fixed. The gate came back clean. Then the actual cutover ran, live, against a real client.
Why this needed a gate at all
A Pi-hole VM had been doing double duty: ad-blocking, and DNS resolution for every internal hostname. The instability that came with that wasn't "the firewall can't do DNS well." It was that DNS policy had no single owner. Depending on which device you asked, the answer to "what server resolves this name" could be the firewall, the old DNS box, a hardcoded public resolver, or whatever Tailscale happened to be injecting on that client. The fix is one enforced rule at the network layer, not a setting repeated per device. This toolkit makes that provable through the pfSense REST API, under one constraint: nothing destructive happens without explicit confirmation, and every write is preceded by a read-only check that can veto it.
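The "read-only check that can veto every write" constraint can be sketched in a few lines. This is a minimal illustration, not the toolkit's actual code: `PreflightReport`, `run_preflight`, and `apply_changes` are hypothetical names, and the checks are plain callables standing in for real API reads.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PreflightReport:
    """Collected results of the read-only checks (hypothetical structure)."""
    blockers: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)

    @property
    def verdict(self) -> str:
        # Any single blocker is enough to veto the write.
        return "NO-GO" if self.blockers else "GO"


def run_preflight(checks: List[Callable[[PreflightReport], None]]) -> PreflightReport:
    """Run every read-only check; each one may append blockers or notes."""
    report = PreflightReport()
    for check in checks:
        check(report)
    return report


def apply_changes(report: PreflightReport, confirmed: bool,
                  write: Callable[[], None]) -> bool:
    """Perform the write only if preflight is clean AND the operator confirmed."""
    if report.blockers or not confirmed:
        return False
    write()
    return True
```

The point of the shape is that the write path cannot be reached without passing through both conditions: a clean report alone is not enough, and neither is a confirmation flag on its own.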
The NO-GO
One blocker: the DNS fallback host was completely unreachable. Not a script error. Independent ping, raw TCP, and DNS queries against it all timed out identically. The gate stopped before writing anything against a network that wasn't in the state the plan assumed.
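For illustration, the raw TCP part of such an independent check could look roughly like the sketch below. The function name `tcp_reachable` and the default timeout are assumptions, not taken from the toolkit; ping and DNS probes would be separate checks alongside it.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False
```

A check like this only reads network state, so it is safe to run from the gate before any write is attempted.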
That host turned out to be alive. Console access confirmed the OS was up, the interface had the right IP, and the link state was UP. But pings between the host and its own gateway failed in both directions, even as the host stayed reachable over its Tailscale connection. Alive on the tailnet, invisible on the LAN it was physically sitting on.
The routing trap
That split is a specific, well-known Tailscale failure mode. The host had previously been this network's subnet router, advertising the LAN range it physically lived on, and it also had route-accepting enabled (RouteAll: true). Once a different machine (the firewall, which was mid-migration into the subnet-router role) started advertising the same subnet, this host accepted that route and installed a policy route that sent its own LAN traffic, including packets to its own gateway, into the Tailscale interface instead of out its physical NIC. The tunnel only understands tailnet peers, not "the LAN I'm bolted to," so every packet got silently dropped.
A route-table check before and after the fix confirmed it: the self-routing entry disappeared, the gateway ping went from 100% loss to 0%, and DNS/HTTP/SSH from the LAN all came back immediately.
The lesson generalizes past this one box: any device physically on a LAN should not accept Tailscale subnet routes for that same LAN, regardless of which machine is advertising them. The moment anyone advertises a subnet, every tailnet member sitting on it with route-accepting enabled risks routing its own local traffic into a tunnel that can't deliver it.
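That rule is easy to check mechanically. The sketch below flags any accepted subnet route that overlaps a network the host is directly attached to; the function name and the example addresses are hypothetical, and gathering the two input lists (from the interface configuration and from Tailscale's route state) is left out.

```python
import ipaddress
from typing import Iterable, List


def self_routing_risks(local_networks: Iterable[str],
                       accepted_routes: Iterable[str]) -> List[str]:
    """Return accepted subnet routes that overlap a directly attached network."""
    # strict=False lets an interface address like 192.168.1.50/24 stand in
    # for its network, 192.168.1.0/24.
    attached = [ipaddress.ip_network(n, strict=False) for n in local_networks]
    risky = []
    for route in accepted_routes:
        net = ipaddress.ip_network(route, strict=False)
        # overlaps() only compares networks of the same IP version.
        if any(net.version == lan.version and net.overlaps(lan) for lan in attached):
            risky.append(route)
    return risky
```

Any non-empty result means the host is a candidate for the trap described above and should not be accepting that route.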
The bug in the gate itself
With that resolved, preflight still reported a second blocker: "cannot read DHCP config via API," even though the same endpoint answered fine from a plain curl. The difference: the shared API helper sent Content-Type: application/json on every request, including bodyless GETs. When that header is present, pfSense's REST API looks for request parameters such as id=lan in the JSON body instead of the query string, finds nothing there, and fails with MODEL_REQUIRES_ID, even though the ?id=lan it ignored in the query string would have worked fine.
One-line fix: only attach Content-Type when there's an actual JSON payload, on PATCH and POST, never on a plain GET. Reverified across several runs before trusting it again. With both issues resolved, the now-working DHCP read also surfaced the real misconfiguration this project exists to fix: the firewall's DHCP scope was handing clients a public resolver plus the now-unreachable fallback host, never its own resolver at all.
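The Content-Type fix described above might look something like the following. This is a sketch under assumptions rather than the toolkit's real helper: `build_request` is a hypothetical name, and the caller is assumed to pass in whatever base headers (authentication and so on) the API needs.

```python
import json
from typing import Any, Dict, Optional, Tuple


def build_request(base_headers: Dict[str, str],
                  payload: Optional[Dict[str, Any]] = None
                  ) -> Tuple[Dict[str, str], Optional[bytes]]:
    """Return (headers, body), declaring a JSON body only when one exists."""
    headers = dict(base_headers)  # copy so the caller's dict is not mutated
    body = None
    if payload is not None:
        # Only PATCH/POST-style calls with a real payload get the header.
        headers["Content-Type"] = "application/json"
        body = json.dumps(payload).encode("utf-8")
    return headers, body
```

A bodyless GET built this way carries no Content-Type at all, so its parameters stay in the query string where the server will look for them.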
Clean GO, then the real cutover
Re-running preflight: six sections green, zero blockers, two informational notes describing exactly what the apply step would change. Then the actual write, gated behind an explicit flag:
Verified against a live client immediately after, not just trusted on the API's word. DNS server, internal hostname resolution, and public resolution all checked on a real machine:
All three passed. The firewall is now the DNS server DHCP hands out, internal hostnames resolve through it, and public resolution still works.
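As one way to automate the first of those checks, the sketch below extracts the configured resolvers from resolv.conf-style text and compares them to the expected firewall address. It assumes a client that exposes its resolvers in that format, which may not match the machine actually used here; the function names and the example address are hypothetical.

```python
from typing import List


def nameservers(resolv_conf_text: str) -> List[str]:
    """Extract nameserver addresses from resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop trailing comments, then split on whitespace.
        parts = line.split("#", 1)[0].split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def uses_only(resolv_conf_text: str, expected: str) -> bool:
    """True when the client has exactly one resolver and it is the expected one."""
    return nameservers(resolv_conf_text) == [expected]
```

Requiring an exact match rather than mere presence is deliberate: a leftover public resolver next to the firewall would otherwise pass unnoticed.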
The network after
One device decides routing, DHCP, and DNS policy. The DNS box that used to make that decision quietly, alongside Tailscale and a hardcoded public resolver, now does exactly one job.
What's next
What's staying manual on purpose: WAN-facing firewall policy stays human-reviewed, and ad-blocking stays with a dedicated box doing only that, once it's no longer also trying to be a router and a DNS authority at the same time.
Status
DHCP-only cutover is live and verified against a real client. Full-policy apply (system DNS, resolver, DHCP, and host overrides together) is dry-run clean and ready, held until a full DHCP lease-renewal cycle has been observed network-wide without intervention. The fallback DNS host stays up as a safety net until then.
Stack