OpenClaw on Oracle Cloud Free Tier: Lobster in the shell

Image: AI-rendered cartoon lobster perched on Luciano's shoulder while he eats dinner, with a small pixel-art orange Claude Code mascot sitting on the table in the foreground; the lobster wears an OpenClaw badge.
Caption: Adolfo riding shotgun, with the Claude Code mascot finally pulled out of the fallback chain.

TL;DR. OpenClaw is open source. Oracle’s VM.Standard.A1.Flex is free forever. Stitched together with Terraform, you get a personal multi-channel AI assistant that bills you exactly $0.00 a day. The pitch ends there. The reality is that you have now adopted a small AI that needs to live on a leash, and the leash is more interesting than the model.

The assistant is named Adolfo (Project Hail Mary’s Rocky, with a faded Johnny Silverhand overlay). It answers DMs on Telegram and Discord, hangs out in four WhatsApp groups, and sends me a daily summary at 7 a.m. The lobster is a process. The shell is the VM. After living with it for a few months, the strange conclusion I keep landing on is that this pile of middleware and spaghetti .md files is, structurally, what the current wave of “AI” actually is. Models are the engine. The leash, the harness, and the config telling the engine when to shut up are the product.

The shell that costs nothing but isn’t free

Three constraints fall out of “Always Free” the moment you take it seriously. Tailscale-only ingress removes the public-internet attack surface, the cert dance, and the question of whether any given probe is hostile or curious. Single VM means there is nothing to scale out because there is no second box, which forces every feature to fit inside a fixed process budget. And Always-Free reaps any instance that goes idle for seven days, so the cron jobs that make the assistant useful (morning summary, daily work log, group icebreakers) double as the heartbeat that keeps the host alive.

shell
├── VM.Standard.A1.Flex   4 OCPU / 24 GB
├── Tailscale only        no public ports
├── Terraform IaC         single source of truth for the box
└── single VM             everything colocated

What you are paying instead of dollars is attention. The free tier is a contract: stay inside the box, keep the box warm, do not paint yourself into a corner that requires a bigger box to get out of. None of these are difficult on their own. Together they shape the architecture more than any technical choice would. The whole stack is provisioned in Terraform, so if the box ever does get reaped, a terraform apply brings it back and an idempotent deploy script puts the assistant back on top.

The provider cascade

The hard part of running an assistant on free-tier infrastructure is not the infrastructure. It is the inference. Models are the actual recurring cost, and “free” inference comes with rate limits, randomly-dropped sessions, and the occasional 410 with no body and no log line.

Tier	What it is	Used for
Reasoning	A frontier-model CLI behind subscription OAuth	DM thinking, hard cron tasks
Default chat	Free-tier hosted models (Kimi K2.5, GLM-5.1, Qwen3-Coder, Nemotron via NVIDIA NIM)	WhatsApp groups, fast turns
Cheap fast	Gemini 2.5 Flash and Pro	Cron icebreakers, summaries
Last resort	Cerebras Qwen-3-235B, Groq Llama-4-Scout	When everything else times out

The reasoning tier deliberately does not run through an API key. Going through the CLI’s subscription OAuth means inference counts against my flat-rate plan, which is what I am already paying for. The split also lets me reserve the heaviest model for DM conversations, where multi-turn context preservation matters most, and keep group replies on cheaper free-tier models that ship answers in under three seconds.

The mechanism that makes this stack actually work is the fallback chain. Each cron and each agent has an explicit, ordered list of LLM backends; the runtime walks it any time the current candidate errors, times out, or returns garbage. The four critical crons all carry the same eight-deep chain: one frontier-model CLI tier, four free-tier hosted models, two Google variants, and a specialty fast-inference provider as the last resort. Empirically Kimi K2.5 returns a 410 at least once a day, and the chain catches it inside a minute.
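A minimal sketch of the chain walk described above, assuming each backend is a plain callable that takes a prompt and returns text. The names `Backend`, `run_chain`, `looks_like_garbage`, and the backend labels are hypothetical and are not OpenClaw's actual API. Timeouts are assumed to surface as exceptions raised by the backend itself, and "garbage" is assumed to mean an empty reply.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Backend:
    name: str                    # label only, e.g. "kimi-k2.5"
    call: Callable[[str], str]   # hypothetical: prompt in, reply text out


class ChainExhausted(RuntimeError):
    """Raised when every backend in the chain failed."""


def looks_like_garbage(reply: Optional[str]) -> bool:
    # Assumption: "garbage" means a missing or whitespace-only reply.
    return reply is None or not reply.strip()


def run_chain(chain: list[Backend], prompt: str) -> tuple[str, str]:
    """Walk the ordered chain; return (backend_name, reply) from the first success."""
    failures = []
    for backend in chain:
        try:
            reply = backend.call(prompt)
        except Exception as exc:  # errors and timeouts both land here
            failures.append(f"{backend.name}: {exc!r}")
            continue
        if looks_like_garbage(reply):
            failures.append(f"{backend.name}: empty reply")
            continue
        return backend.name, reply
    raise ChainExhausted("; ".join(failures))


def gone(prompt: str) -> str:
    # Stand-in for a free-tier session that drops with a 410.
    raise ConnectionError("410 Gone")


chain = [
    Backend("flaky-hosted", gone),
    Backend("silent-hosted", lambda p: "   "),
    Backend("last-resort", lambda p: f"echo: {p}"),
]
winner, reply = run_chain(chain, "good morning")
print(winner, reply)
```

The caller only ever sees the first usable reply; the failure list matters only when the whole chain is exhausted, which is the case worth alerting on.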

With the chain, the bot is more reliable than any single LLM behind it.

If “free-tier infrastructure with paid inference” sounds a little like cheating, fair. The point is not that nothing has costs. The point is that costs are visible, capped, and live where the value is, in the model rather than the shell.

The leash is the product

The model itself is not doing anything special in this story. The reasoning model that drafts my morning email rollup is the same reasoning model anyone with a subscription can talk to in any browser. What makes Adolfo Adolfo is roughly twelve kilobytes of system prompt across IDENTITY.md, TOOLS.md, and per-group rules. It is the cron schedule. It is the eight-deep fallback chain. It is the exec policy that says the assistant can read but not write outside its workspace directory. It is hundreds of refusal-pattern test cases that taught the bot to decline an off-color request without sounding like an HR memo. Iterated by hand, in markdown, in a directory I version like code.
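The exec policy is the easiest piece of the leash to show in code. The sketch below is one way such a rule could be written, not the policy file OpenClaw actually reads: the `WORKSPACE` path and the `is_allowed` function are hypothetical, and the operation names are illustrative. Reads are allowed anywhere, writes only inside the workspace, and anything else is denied.

```python
from pathlib import Path

WORKSPACE = Path("/srv/assistant/workspace")  # hypothetical location


def is_allowed(operation: str, target: str, workspace: Path = WORKSPACE) -> bool:
    """Reads are allowed anywhere; writes only inside the workspace."""
    if operation == "read":
        return True
    if operation == "write":
        # resolve() collapses ".." segments and follows symlinks, so a path
        # that escapes the workspace is judged by where it really points.
        resolved = Path(target).resolve()
        return resolved.is_relative_to(workspace.resolve())
    return False  # unknown operations are denied by default


print(is_allowed("read", "/etc/hostname"))
print(is_allowed("write", "/srv/assistant/workspace/notes.md"))
print(is_allowed("write", "/srv/assistant/workspace/../../etc/passwd"))
```

Resolving the path before comparing is the part that matters: a plain string-prefix check would accept the third path. `Path.is_relative_to` needs Python 3.9 or newer.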

Models are the engine. The leash is the product.

The clearest place to feel this is in group chats. Adding the bot to a friends’ group is a multiplier on day one: it ice-breaks dead threads, follows up on running jokes, references inside-baseball from previous conversations because it actually has memory of them. Then a quieter day arrives where the room is doing fine without it, and the bot keeps replying anyway, and you realize you have built something that does not know when to be silent. The fix is not a smarter model. The fix is configuration: per-group reply rates, trigger words that hard-mute the bot for a fixed window, a soft-mute heuristic that goes quiet for the rest of the session if anyone tells it to shut up. Designing the off-switch took more iterations than designing the on-switch.
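The three off-switches fit in a few lines. This is a sketch under stated assumptions: `GroupPolicy`, its field names, the trigger phrases, and the default numbers are all hypothetical, and the caller passes in the clock and the random source so the behavior stays testable.

```python
import random
from dataclasses import dataclass


@dataclass
class GroupPolicy:
    reply_rate: float = 0.3                        # chance of answering an ordinary message
    hard_mute_triggers: tuple = ("adolfo quiet",)  # hypothetical trigger phrase
    soft_mute_phrases: tuple = ("shut up",)
    hard_mute_seconds: float = 3600.0              # the fixed mute window
    muted_until: float = 0.0                       # timestamp when the hard mute expires
    soft_muted: bool = False                       # stays set until the session resets

    def should_reply(self, text: str, now: float, rng: random.Random) -> bool:
        lowered = text.lower()
        if any(t in lowered for t in self.hard_mute_triggers):
            self.muted_until = now + self.hard_mute_seconds
            return False
        if any(p in lowered for p in self.soft_mute_phrases):
            self.soft_muted = True
            return False
        if self.soft_muted or now < self.muted_until:
            return False
        return rng.random() < self.reply_rate

    def reset_session(self) -> None:
        self.soft_muted = False


policy = GroupPolicy(reply_rate=1.0)
rng = random.Random(0)
print(policy.should_reply("anyone up for dinner?", 0.0, rng))
print(policy.should_reply("Adolfo quiet please", 10.0, rng))
print(policy.should_reply("so, dinner?", 100.0, rng))
```

Both mute checks run before the reply-rate roll, so a muted bot stays silent regardless of how the dice land.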

The hype around AI right now is largely model launches, but the value people experience is shaped by middleware that nobody has named yet. OpenClaw is one such piece of middleware. So are Cursor, Aider, n8n, and the various agent frameworks. The model is the steam engine. The locomotive, the rails, the timetable, and the conductor’s whistle are the product you actually buy a ticket for, and we are not yet very good at any of those.

What free-tier teaches that paid infrastructure doesn’t

Constraints make every decision visible. Always-Free forces the colocation question early. Tailscale-only forces the access question early. Subscription-rate inference forces the fallback chain early. Each of these is a question you would answer eventually anyway. The difference is whether you answer it before deploying or after the bot has gone mute in the family chat for four hours and your mother has texted you twice.

The lobster is fine. The shell is fine. They both cost zero. The attention I pay instead is, in some sense, the fee. The side effect of paying it is that I now believe the future of “AI” is going to look much more like middleware than like model releases. The engine is impressive. You just cannot ride it without the harness.

Update, May 2026: Sonnet collapsed the chain

The eight-deep free-tier fallback chain collapsed about two weeks ago. Sonnet 4.6 at max thinking is now the default workhorse; Opus 4.7 max stays pinned to Gmail triage and Hermes escalations. The free-tier providers above are still wired in, but they sit at the bottom of the chain now, not the top.

A few months of living with the eight-deep chain made one thing clear: most “default chat” turns did not need a different model, they needed a reliable one. Subscription-rate Sonnet through the CLI’s OAuth tier matches Opus on structured, bounded work at roughly half the latency, and counts against a flat monthly fee I am already paying. So the four free-tier hosted models that used to sit at positions two through five (Kimi K2.5, GLM-5.1, Qwen3-Coder, NVIDIA NIM Nemotron) got demoted to the back of the chain, behind the subscription tiers. Gemini 2.5 Flash and Pro sit below them. Cerebras and Groq are still the very last resort.

Tier	What it is	Used for
Reasoning (rare, important)	Opus 4.7 max through subscription OAuth	Gmail triage, Hermes escalations, repair runs
Default workhorse	Sonnet 4.6 max through subscription OAuth	DM thinking, group replies, scheduled crons
Fallback hosted	Kimi K2.5, GLM-5.1, Qwen3-Coder, Nemotron via NIM	When the subscription tier rate-limits or errors
Last resort	Gemini 2.5 Flash/Pro, Cerebras Qwen, Groq Llama	When everything above has refused

Two exclusions stayed off Sonnet on purpose. Hermes (the escalation surface my assistant routes through when a turn looks emotionally heavy or unfamiliar) stays on Opus 4.7 max because the cost of a tone-deaf reply there is higher than the cost of the extra latency. Gmail triage stays on Opus 4.7 max for the same reason: misclassifying an inbound from a hiring manager as a newsletter is the kind of mistake the slower model catches and the cheaper one does not. The rule is roughly “anything where a wrong reply embarrasses the person reading it stays on the heaviest model”; everything else is Sonnet by default.
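The routing rule reduces to a small lookup. In the sketch below the task names, model labels, and `chain_for` function are hypothetical, and one detail is a pure assumption: that a heavy task falls back to Sonnet before it reaches the free-tier providers. The fallback order follows the description above.

```python
# Hypothetical task names for the two exclusions described above.
HEAVY_TASKS = {"gmail_triage", "hermes_escalation"}

# Illustrative labels, ordered as described: hosted free tier, then Gemini,
# then Cerebras and Groq as the very last resort.
FALLBACKS = [
    "kimi-k2.5", "glm-5.1", "qwen3-coder", "nemotron-nim",
    "gemini-2.5-flash", "gemini-2.5-pro",
    "cerebras-qwen", "groq-llama",
]


def chain_for(task: str) -> list[str]:
    """Return the ordered backend chain for a task."""
    if task in HEAVY_TASKS:
        # Assumption: heavy tasks try Sonnet next rather than dropping
        # straight to the free tier.
        return ["opus-4.7-max", "sonnet-4.6-max", *FALLBACKS]
    return ["sonnet-4.6-max", *FALLBACKS]


print(chain_for("gmail_triage")[:2])
print(chain_for("group_reply")[0])
```

Keeping the exclusions in one set means the "who gets the heaviest model" decision lives in a single reviewable place instead of being scattered across cron definitions.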

When subscription inference catches the free tier on latency at frontier quality, the chain stops being a search and starts being a guard.

The broader lesson is about chains, not about which model won this month. A fallback chain whose top entry is a paid frontier model behaves very differently from one whose top entry is a free hosted one. With free at the top, the chain is exploratory: every cron is hunting for any backend that will reply. With subscription at the top, the chain is defensive: most days only the top entry runs, and the rest is a quiet insurance policy that exists for the half-hour a month when the primary provider has a bad incident. Same code, different posture. The hype around model launches will keep telling you which engine is fastest. The middleware question is whether your harness knows it has been upgraded.
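One way to check which posture a chain is actually in is to log the chain position that served each run and look at how often the top entry won. This is a sketch, not something the setup above is known to do: the `posture` function and its 95% threshold are hypothetical.

```python
from collections import Counter


def posture(served_positions: list[int], threshold: float = 0.95) -> str:
    """Classify a chain from the 0-based chain position that served each run."""
    if not served_positions:
        raise ValueError("need at least one run")
    # Share of runs answered by the first entry in the chain.
    top_share = Counter(served_positions)[0] / len(served_positions)
    return "defensive" if top_share >= threshold else "exploratory"


print(posture([0] * 99 + [3]))        # top entry served 99 of 100 runs
print(posture([0, 2, 1, 4, 0, 3]))    # runs scattered across the chain
```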