How One Server Can Run Whitelisted Commands on Another Without Sharing SSH Keys

Sharing root SSH keys was the obvious answer. We picked something smarter.

At Tacavar, we run a two-droplet setup. The primary handles production traffic, Caddy, and the main API. The secondary (Bailian) runs batch jobs, media pipelines, and agent orchestration. The problem: Bailian needs to trigger deploys, post Instagram reels, and write video briefs on the primary. Cross-server RPC is unavoidable. The naive fix is to copy root’s private key to Bailian and call it a day. That works until it doesn’t.

The Problem: Two Droplets, One Deployment Pipeline

Every distributed system eventually needs one machine to command another. In our case, Bailian finishes a video render and needs the primary to update the hub dashboard. Or a domain scanner on Bailian finds a pending drop and needs the primary to run a renewal check. These are discrete, well-defined operations. They are not “open a shell and figure it out.”

The standard playbook says: generate an SSH key pair, drop the public key into primary:/root/.ssh/authorized_keys, and let Bailian ssh in as root. That gives you full shell access, passwordless, with no granularity. Any process on Bailian that can read that key can run any command on the primary. It is the fastest path and the worst architecture.

Why SSH Key Sharing Is a Blast Radius Disaster

SSH key sharing violates least privilege at the infrastructure level. Once a key is on a secondary server, you have expanded the trust boundary to include every process, cron job, and compromised container on that machine. If Bailian gets hit, the attacker inherits root on the primary by default. That is not a security model. It is a liability structure.

The blast radius is total. There is no command filtering, no rate limiting, and no audit trail beyond SSH’s own logs, which are noisy and non-semantic. By default, /var/log/auth.log records that a key logged in, not what it ran, so you cannot tell whether someone ran git pull or rm -rf /. For DevOps security, that opacity is unacceptable. You need to know what ran, when, and why.

The Whitelist Dispatcher Pattern

We built deploy_gate.sh on the primary. It runs as a dedicated deploy user with passwordless sudo for a specific set of whitelisted commands. Bailian connects over SSH using a dedicated key that is not root’s key and has no shell access. It sends command names, not raw bash. The dispatcher maps each name to a predefined, auditable action.
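The article does not say how the deploy key is denied a shell. One common way, assumed here, is an OpenSSH forced command in the deploy user's authorized_keys file. The sketch below only builds such an entry; the install path of deploy_gate.sh and the function name gate_authorized_keys_entry are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: build an authorized_keys entry that pins a key to the dispatcher.
# ASSUMPTIONS: the install path of deploy_gate.sh and the function name are
# hypothetical; the article does not say how shell access is disabled.

gate_authorized_keys_entry() {
  local pubkey="$1"                                 # full public key line
  local gate="${2:-/usr/local/bin/deploy_gate.sh}"  # hypothetical path
  # restrict   : disables PTY allocation and port, agent, and X11 forwarding
  # command="" : sshd runs this program no matter what the client asked for;
  #              the client's request is exposed as $SSH_ORIGINAL_COMMAND
  printf 'restrict,command="%s" %s\n' "$gate" "$pubkey"
}
```

With an entry like this in place, whatever the secondary types after the hostname never reaches a shell directly. sshd starts the dispatcher, and the dispatcher decides what, if anything, to run.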

The current whitelist includes instagram_post, domain_pending, video_brief_write, deploy_restart, and the pattern pip_install_*. Each entry maps to a script or a tightly scoped sudo invocation. If Bailian sends instagram_post, the dispatcher runs the exact handler we wrote. If it sends rm -rf /, the dispatcher returns “not in whitelist” and exits. The secondary cannot escape the predefined surface.
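The name-to-handler mapping can be sketched as a case statement. This is a minimal illustration, not the actual deploy_gate.sh: the handler directory /opt/gate/handlers, the one-script-per-command layout, and the function name resolve_handler are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch of a whitelist lookup. ASSUMPTIONS: the handler directory, the
# one-script-per-command layout, and the function name are hypothetical.

HANDLER_DIR="${HANDLER_DIR:-/opt/gate/handlers}"

# Print the handler path for a whitelisted command name; return 1 otherwise.
resolve_handler() {
  local name="$1"
  # Accept only letters, digits, '_', '.', and '-'. Anything containing
  # spaces or shell metacharacters is rejected before the whitelist lookup.
  case "$name" in
    ''|*[!A-Za-z0-9_.-]*) echo "not in whitelist" >&2; return 1 ;;
  esac
  case "$name" in
    instagram_post|domain_pending|video_brief_write|deploy_restart)
      printf '%s/%s.sh\n' "$HANDLER_DIR" "$name" ;;
    pip_install_?*)
      # Wildcard entry: every pip_install_<suffix> shares one handler.
      printf '%s/pip_install.sh\n' "$HANDLER_DIR" ;;
    *)
      echo "not in whitelist" >&2; return 1 ;;
  esac
}
```

The input is only ever compared against patterns; it is never passed to eval or interpolated into a command line. If the key is pinned with an OpenSSH forced command (again an assumption), the name would come from $SSH_ORIGINAL_COMMAND.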

This is cross-server RPC with a contract. The whitelist is the API schema. The SSH layer is just transport. The security boundary is the dispatcher, not the key.

The Audit Log That Doubles as Documentation

Every call to deploy_gate.sh appends a line to /var/log/deploy-gate.log with a timestamp, the command name, the originating IP, and the exit code. That log is not an afterthought. It is the source of truth for what the secondary actually did to the primary.
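A log append along these lines would capture the four fields. Only the log path and the field list come from the description above; the key=value layout, the field order, and the function name log_gate_call are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the audit-log append. ASSUMPTIONS: the key=value layout, the field
# order, and the function name are hypothetical; only the four fields and the
# log path come from the article.

GATE_LOG="${GATE_LOG:-/var/log/deploy-gate.log}"

# Append one line: UTC timestamp, command name, originating IP, exit code.
log_gate_call() {
  local name exit_code="$2" ip
  # Replace anything outside a safe character set so a hostile command name
  # cannot inject spaces or newlines into the log.
  name="$(printf '%s' "$1" | tr -c 'A-Za-z0-9_.-' '?')"
  # sshd sets SSH_CONNECTION to "client_ip client_port server_ip server_port".
  ip="${SSH_CONNECTION%% *}"
  printf '%s cmd=%s ip=%s exit=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$name" "${ip:-unknown}" "$exit_code" \
    >> "$GATE_LOG"
}
```

Logging rejected names as well as accepted ones is a design choice worth making: a run of "not in whitelist" entries is often the first sign that something on the secondary is probing the gate.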

More importantly, it doubles as documentation. A new engineer can read the whitelist and the log and understand the entire cross-machine contract without reading a wiki page. When we added instagram_post on 2026-04-07, the change was a single line in the whitelist and a corresponding handler script. The log immediately showed the first production call, confirming the integration worked. No separate docs needed updating. The system documents itself.

Adding New Commands Explicitly

There is no dynamic command injection. Adding a capability requires editing the whitelist and writing a handler. That friction is intentional. It forces every new cross-server operation to pass a human review. You cannot accidentally expose a new surface because the dispatcher has no fallback to shell evaluation.

This is the opposite of “move fast and break things.” It is “move deliberately and know exactly what broke.” For infrastructure that runs revenue-generating services, that is the correct tradeoff.

What Happens If the Secondary Server Gets Compromised

This is the test that matters. If Bailian is compromised, the attacker gets the deploy key. They can connect to the primary’s dispatcher. They can run exactly the commands on the whitelist. Nothing else.

They cannot get a shell, as long as no handler passes caller-supplied text to one. They cannot read arbitrary files. They cannot pivot to the private network bridge or access the Caddy configuration. The blast radius is bounded by the whitelist. Every action they take is logged in real time. If they spam instagram_post, the log shows it, and we can revoke the key in seconds.
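Revoking the key can be as small as deleting one line from the deploy user's authorized_keys file. The sketch below assumes the key is identified by its trailing comment (for example bailian-deploy); that convention and the function name revoke_key are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of key revocation. ASSUMPTIONS: the key is identified by its trailing
# comment (for example "bailian-deploy"), and the function name is hypothetical.

# Remove every authorized_keys line whose last field equals the given comment.
revoke_key() {
  local keys_file="$1" comment="$2" tmp
  # Write to a temp file in the same directory so the final mv is a rename.
  tmp="$(mktemp "${keys_file}.XXXXXX")" || return 1
  # Keep only lines whose final whitespace-separated field is not the comment.
  awk -v c="$comment" '$NF != c' "$keys_file" > "$tmp" || { rm -f "$tmp"; return 1; }
  chmod 600 "$tmp" && mv "$tmp" "$keys_file"
}
```

Removing the entry blocks new logins with that key. It does not terminate a session that is already open, so a full response would also end any live connections from the secondary.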

Compare that to a shared root key. Compromise the secondary, and you own the primary. No boundary. No log granularity. No recovery path except rotating root keys across every machine.

The whitelist dispatcher is not perfect security. It is bounded security. In practice, that is what keeps you alive.

Tacavar designs secure automation pipelines for distributed infrastructure. See our DevOps capabilities at tacavar.com.