March: PKI before the mid-2026 deadline, an agentic engineering setup, and infra hygiene at scale

March was about catching up on three things I had been putting off.

Public CAs, private CAs, and the mid-2026 cutover

By June 2026, Chrome’s root program will require any certificate hierarchy in its public trust store to be dedicated to TLS server authentication only. Public certificate authorities will no longer be allowed to issue certs that carry both the `serverAuth` and `clientAuth` Extended Key Usages (EKUs): the two roles are being split into separate PKIs. A lot of organizations have been issuing public certs and quietly using them for internal client-auth scenarios (VPN onboarding, Wi-Fi 802.1X, service-to-service auth between microservices, SSO endpoint certs). The cutover forces a question that nothing was forcing before: where should those internal certificates come from?

Property	Public CA	Private CA
Trust store	Browser/OS, automatic	Internal, explicit
Allowed EKUs	`serverAuth` only (Jun 2026)	Whatever you need
Cost per cert	Free (Let’s Encrypt et al.)	Operational (HSM, ops)
Audit / control	External rules, browser policy	Yours

Public CAs were never designed for internal authentication workflows. They are tuned for the public web, with audits and policy constraints that match that purpose. The path forward for internal use cases is a private CA, run by you, with certificate profiles tuned to the workflows you actually have: control over issuance, revocation, EKU sets, SAN shapes, renewal cadence, integration with ACME or EST or SCEP for automation. The setup work is real (HSM-backed keys, an auditable signing endpoint, automated renewal that doesn’t page someone at 2am), but every part of it is the kind of decision a senior platform team should be able to make on a whiteboard.

The deeper lesson is that SSL/TLS, X.509 fields, EKU semantics, and key hygiene are not niche specialties. They are the substrate that almost every other security control sits on top of. A team fluent in the difference between a CA, an intermediate, an EKU, and a SAN, with the muscle memory to read `openssl x509 -text` output on a bad day, navigates the mid-2026 transition without drama. A team without that fluency meets the deadline as a surprise.
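As a rough illustration of that kind of triage, the sketch below pulls the EKU list out of `openssl x509 -text` output and sorts a certificate into a bucket. The EKU names are the ones OpenSSL prints; the function names and the bucket labels are hypothetical, and the parser assumes the usual layout where the EKU values sit on the line after the extension header.

```python
SERVER_AUTH = "TLS Web Server Authentication"  # serverAuth, as printed by OpenSSL
CLIENT_AUTH = "TLS Web Client Authentication"  # clientAuth, as printed by OpenSSL


def extended_key_usages(openssl_text: str) -> set[str]:
    """Return the EKU names listed in `openssl x509 -text` output."""
    lines = openssl_text.splitlines()
    for i, line in enumerate(lines):
        # The values are expected on the line right after the extension header.
        if "X509v3 Extended Key Usage" in line and i + 1 < len(lines):
            return {name.strip() for name in lines[i + 1].split(",") if name.strip()}
    return set()


def triage(ekus: set[str]) -> str:
    """Sort a publicly issued cert into a (hypothetical) migration bucket."""
    if SERVER_AUTH in ekus and CLIENT_AUTH in ekus:
        return "split-before-cutover"   # mixed EKUs: the case the deadline removes
    if CLIENT_AUTH in ekus:
        return "move-to-private-ca"     # client auth belongs on an internal CA
    if SERVER_AUTH in ekus:
        return "public-ok"              # plain TLS server cert, unaffected
    return "review-manually"            # no EKU found, or something unusual
```

Feeding it the text dump of every certificate in an inventory gives a first-pass list of what has to move before the cutover; anything in the manual-review bucket still needs a human to look at it.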

A real agentic engineering setup

Running several coding agents on the same repository at the same time is doable, but only with proper isolation. Process-level isolation is not enough; in-place edits from two agents collide on the working tree the first time they both touch the same file. Worktrees are the right unit of isolation, because they map directly onto an agent’s mental model of “the repo”: one branch, one filesystem, one task.

~/work/
├── repo/                       # primary working tree
├── repo-wt-feature-auth/       # parallel agent A
├── repo-wt-bugfix-cache/       # parallel agent B
└── repo-wt-experiment-llm/     # parallel agent C

I settled on a `<repo>-wt-<task>` naming convention, with one branch per worktree and a clean teardown when the task ends. The adjacent change that paid off immediately was per-project MCP scoping. The default MCP loadout was schema-noisy enough to evict useful context on every turn: every tool definition carries a JSON schema, and a pile of unused tools still sits in the prompt. Trimming the loadout to what each project actually needs cut per-turn schema overhead noticeably and freed budget for the work itself.
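The worktree convention above can be sketched as a small planner that derives the directory, the branch, and the setup and teardown commands for one task. This is a minimal sketch: the function name `worktree_plan` is hypothetical, and reusing the task name as the branch name is an assumption, not part of the convention itself. It only builds the `git` argument lists so they can be inspected before anything runs.

```python
from pathlib import Path


def worktree_plan(repo: Path, task: str) -> dict:
    """Plan one isolated worktree for one agent task (<repo>-wt-<task>)."""
    path = repo.parent / f"{repo.name}-wt-{task}"
    branch = task  # assumption: one branch per worktree, named after the task
    return {
        "path": path,
        "branch": branch,
        # Create the sibling directory and a fresh branch in one step.
        "setup": ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)],
        # Clean teardown: drop the worktree, then the (merged) branch.
        "teardown": [
            ["git", "-C", str(repo), "worktree", "remove", str(path)],
            ["git", "-C", str(repo), "branch", "-d", branch],
        ],
    }
```

Each command list can be handed to `subprocess.run(cmd, check=True)` as is. Note that `git branch -d` refuses to delete an unmerged branch, which is a useful safety net when an agent's task was abandoned halfway.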

Multi-agent code review needs the same discipline. Different agents have different default biases (one leans on type-system invariants and barely notices concurrency bugs, another flags everything that vaguely resembles a race condition and skips style); without a way to weigh those voices, the noise compounds and people stop reading the comments. The fix is a paired reviewer-fingerprint file per agent: what it reliably catches, and where it tends to false-positive. With that file in hand, each agent’s review becomes a recognizable voice you can read at a glance instead of a wall of text.
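One possible shape for such a fingerprint, sketched as a data structure rather than a file format: the class name, the field names, the category strings, and the three-way `weigh` result are all hypothetical, chosen only to show the two lists side by side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewerFingerprint:
    """What one review agent is good at, and where it tends to cry wolf."""
    agent: str
    reliably_catches: frozenset[str]
    false_positive_prone: frozenset[str]


def weigh(fingerprint: ReviewerFingerprint, category: str) -> str:
    """Decide how much attention a comment in this category deserves."""
    if category in fingerprint.reliably_catches:
        return "trust"    # this agent is usually right here
    if category in fingerprint.false_positive_prone:
        return "verify"   # known noisy area: confirm before acting
    return "neutral"      # no track record either way


# Example entry for an agent that is strong on types and noisy on races.
typed_reviewer = ReviewerFingerprint(
    agent="agent-a",
    reliably_catches=frozenset({"type-invariants"}),
    false_positive_prone=frozenset({"concurrency"}),
)
```

Sorting or collapsing review comments by the `weigh` result is one way to turn the fingerprint into something readers actually see in the review UI.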

Infra hygiene at scale

A decade ago, a few hundred config values in a shared store felt like a lot. Today, thousands across multiple regions is normal. The per-account, per-region throttle that nobody notices at small scale becomes a self-inflicted denial-of-service the first time a burst of workloads cold-starts at once and every one of them reaches for the same store. The lesson generalizes: capacity planning at scale is mostly about what gets called at the worst possible moment, not what gets called on the happy path.
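A minimal sketch of the usual client-side defence, assuming a generic `fetch` callable that raises a throttling error when the store pushes back: cache values in-process with a TTL, and retry throttled reads with exponential backoff and full jitter so a cold-start storm spreads itself out. `CachedParams`, `ThrottledError`, and the default numbers are hypothetical, not any particular SDK's API.

```python
import random
import time
from typing import Callable


class ThrottledError(Exception):
    """Stand-in for whatever throttling error the real client raises."""


class CachedParams:
    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 300.0,
                 max_attempts: int = 5, base_delay: float = 0.2,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch, self._ttl = fetch, ttl_seconds
        self._max_attempts, self._base_delay = max_attempts, base_delay
        self._sleep, self._clock = sleep, clock
        self._cache: dict[str, tuple[str, float]] = {}  # name -> (value, expiry)

    def get(self, name: str) -> str:
        hit = self._cache.get(name)
        if hit is not None and hit[1] > self._clock():
            return hit[0]  # fresh cache hit: no call to the store at all
        for attempt in range(self._max_attempts):
            try:
                value = self._fetch(name)
            except ThrottledError:
                if attempt == self._max_attempts - 1:
                    raise  # out of retries: surface the throttle
                # Full jitter: wait a random time up to base * 2^attempt.
                self._sleep(random.uniform(0, self._base_delay * 2 ** attempt))
            else:
                self._cache[name] = (value, self._clock() + self._ttl)
                return value
        raise RuntimeError("max_attempts must be at least 1")
```

The jitter matters more than the backoff: without it, every workload that was throttled together retries together, and the storm simply repeats on a schedule.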

Bottleneck	Bites when	Mitigation
Param store throttles	Cold-start storm	Caching, lower TPS at startup
Stale resources	After environment churn	Scheduled IaC-reference scan
Tag drift	Console edits	Continuous compliance scan
Long-lived CI keys	Credential leak	OIDC token exchange

Letting state accumulate without a sweep policy is the next pattern. An audit of an established parameter store will turn up orphans almost immediately: leftovers from deleted environments, stale credentials from rotated services, doubled-up entries from a migration that finished six months ago. None of it costs much per item, but the human cost of “what is this and is anyone reading it” goes up with every entry, and the security cost of leaving a stale credential where a workload can still read it is real. A scheduled discovery pass that flags anything not referenced in IaC is the cheapest preventative; mandatory tags (`Owner`, `Environment`, `CostCenter`, `Criticality`), enforced both at IaC variable-validation time and through account-level policies, turn the same problem into a continuous compliance signal instead of a quarterly cleanup.
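The core of that discovery pass is a set difference plus a tag check. The sketch below assumes the two inputs already exist as plain collections (names listed from the store, names extracted from IaC); how they are gathered depends on the store and the IaC tool, and the function names are hypothetical.

```python
from typing import Iterable, Mapping

# The four mandatory tags named above.
REQUIRED_TAGS = frozenset({"Owner", "Environment", "CostCenter", "Criticality"})


def find_orphans(store_names: Iterable[str], iac_references: Iterable[str]) -> list[str]:
    """Names present in the store that no IaC definition refers to."""
    return sorted(set(store_names) - set(iac_references))


def missing_tags(tags: Mapping[str, str]) -> list[str]:
    """Required tags that are absent or empty on one resource."""
    return sorted(tag for tag in REQUIRED_TAGS if not tags.get(tag))
```

An orphan flagged this way is a candidate for review, not for automatic deletion: a parameter can be read by a workload without ever appearing in IaC, which is exactly the kind of thing the sweep is meant to surface.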

The same shape applies to credential management at scale. Long-lived access keys in CI secrets are obsolete; OIDC-based token exchange is the boring, correct default, with trust policies scoped to a specific repository and ideally a specific branch, and one role per environment so a misconfigured pipeline cannot accidentally apply to production. The day-one cost is real (one role per environment, one trust policy per role, one set of permissions to right-size); the savings from day two onward are larger than they look, and only become obvious once you have lived through the alternative.
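To make the scoping concrete, here is a sketch of one such trust policy, assuming AWS IAM as the cloud and GitHub Actions as the CI system (neither is named above, so treat both as illustrative). The account ID, repository, and branch are placeholders, and `trust_policy` is a hypothetical helper that just builds the JSON document.

```python
import json

# Assumption: GitHub Actions is the OIDC issuer; other CI systems use other hosts.
OIDC_HOST = "token.actions.githubusercontent.com"


def trust_policy(account_id: str, repo: str, branch: str) -> dict:
    """Trust policy letting exactly one repo and branch assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{OIDC_HOST}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                f"{OIDC_HOST}:aud": "sts.amazonaws.com",
                # Exact match on repository and branch, no wildcards.
                f"{OIDC_HOST}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
            }},
        }],
    }


# Placeholder values: one policy for the production role, tied to main only.
prod_policy = trust_policy("123456789012", "example-org/example-repo", "main")
policy_json = json.dumps(prod_policy, indent=2)
```

Using `StringEquals` on the `sub` claim rather than a wildcard match is the design choice that keeps a pipeline on a feature branch from assuming the production role.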

Long-form writeups on the PKI and agentic threads are in progress. They will land in the /writing section when ready.