Infrastructure · 2026-05-21 · 9 min read

How a Series A AI startup cut a $14k/mo AWS bill to $2,400/mo in one weekend

Repatriation is the right call for 70% of B2B SaaS workloads at Series A. Here's the math, the runbook, and the answer to "but what about elasticity?"

By Ijeoma Eze · Head of Infrastructure

Two-region private cloud topology with DeSoto and Flint regions

The customer was a Series A AI infrastructure startup running on AWS us-east-1. Their monthly bill had climbed from $4k to $14k over 18 months as their RAG workload scaled. They had two engineers spending roughly 25% of their time on cost management — Reserved Instance math, NAT-gateway optimization, S3 lifecycle rules, CloudWatch retention tuning.

We started with their last three AWS invoices, shared under NDA. The monthly line-item breakdown: $4,200 EC2 (with Reserved Instances applied), $2,800 RDS, $1,900 NAT gateway plus inter-AZ data transfer, $1,600 S3 and other storage, $1,400 CloudWatch and GuardDuty, $1,100 Bedrock and embedding APIs, and $1,000 miscellaneous, for $14,000 in total.

The proposal: Ultiblob Pro tier × 3 (sharded by service) + L40S inference node + Pure FlashArray slice. Monthly total: $2,400. Egress, monitoring, managed support, and Anthropic Claude integration with prompt caching all included.

The savings break down as follows. $1,900 from eliminating inter-AZ data transfer and NAT-gateway charges (Ultiblob's dedicated tenancy and private VLAN architecture removes this category entirely). $4,800 from dropping managed-service surcharges (RDS, ELB, and CloudWatch, replaced by Ultiblob-operated equivalents). $3,200 from moving off Bedrock to direct Anthropic Claude integration with prompt caching (50-70% token reduction). $1,700 from rightsizing compute (most of their EC2 fleet ran at 40% utilization). Net: $11,600/mo in savings, which matches the drop from $14,000 to $2,400.
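The two breakdowns can be checked against each other. Below is a minimal Python sketch that uses only the figures quoted in this post. The savings are grouped by category rather than by invoice line, so only the totals, not the individual categories, are expected to line up. The dictionary keys and the `reconcile` helper are illustrative names, not part of any Ultiblob tooling.

```python
# Monthly AWS line items quoted in the post (USD/mo).
AWS_LINE_ITEMS = {
    "EC2 (with RIs applied)": 4_200,
    "RDS": 2_800,
    "NAT gateway + inter-AZ data": 1_900,
    "S3 + storage": 1_600,
    "CloudWatch + GuardDuty": 1_400,
    "Bedrock + embedding API": 1_100,
    "Misc": 1_000,
}

# Savings categories as the post attributes them (USD/mo).
SAVINGS = {
    "inter-AZ + NAT gateway removed": 1_900,
    "managed-service surcharges (RDS, ELB, CloudWatch)": 4_800,
    "Bedrock -> direct Claude with prompt caching": 3_200,
    "compute rightsizing": 1_700,
}

ULTIBLOB_MONTHLY = 2_400  # proposed monthly total from the post


def reconcile(line_items: dict, savings: dict, new_monthly: int) -> tuple:
    """Return (old bill, sum of savings categories, old bill minus new bill)."""
    old_monthly = sum(line_items.values())
    total_savings = sum(savings.values())
    return old_monthly, total_savings, old_monthly - new_monthly


old, saved, delta = reconcile(AWS_LINE_ITEMS, SAVINGS, ULTIBLOB_MONTHLY)
print(f"old bill ${old:,}/mo, categorized savings ${saved:,}/mo, bill delta ${delta:,}/mo")
```

If the categorized savings and the bill delta ever disagree, the proposal has double-counted or missed something.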

The migration ran on a Saturday night between 11pm and 4am. Database cutover was the critical path: PostgreSQL logical replication to the Ultiblob Pro tier database, traffic flipped via Cloudflare DNS, AWS resources kept running in read-only mode for 72 hours as the rollback path. Total downtime: 14 minutes.
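The critical path hinges on knowing when the new database has caught up before DNS is flipped. Below is a minimal sketch of that gate. It assumes writes on the AWS side have already been stopped (the read-only mode described above), and that you read the publisher's position with PostgreSQL's `pg_current_wal_lsn()` and the subscription's position from the `replay_lsn` column of `pg_stat_replication`. PostgreSQL prints these `pg_lsn` values as text such as `16/B374D848`. The helper names and the zero-lag default are hypothetical, not the runbook used in this migration. Server-side, `pg_wal_lsn_diff()` performs the same subtraction.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a pg_lsn text value like '16/B374D848' into an absolute byte position."""
    high, low = lsn.split("/")
    # The part before the slash is the upper 32 bits and the part after is the lower 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def replication_lag_bytes(publisher_lsn: str, replayed_lsn: str) -> int:
    """Bytes of WAL the subscriber has not yet replayed."""
    return parse_lsn(publisher_lsn) - parse_lsn(replayed_lsn)


def safe_to_flip_dns(publisher_lsn: str, replayed_lsn: str, max_lag_bytes: int = 0) -> bool:
    """Hypothetical cutover gate: only flip DNS once lag is within the threshold.

    With writes frozen on the source, lag should drain to zero; a non-zero
    threshold is an operator judgment call, not a PostgreSQL default.
    """
    return replication_lag_bytes(publisher_lsn, replayed_lsn) <= max_lag_bytes


print(replication_lag_bytes("0/3000060", "0/3000000"))
print(safe_to_flip_dns("0/3000060", "0/3000060"))
```

In practice you would poll both values in a loop and flip DNS only after the gate returns `True`.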

What didn't change. Same application code. Same Docker images (the existing GitHub Actions pipeline now pushes to the Ultiblob registry). Same observability stack (the customer kept Sentry; we offered Grafana Cloud for infra metrics, but they wanted to standardize on Datadog later). Same customer experience: sub-100ms p95 latency on the AI inference endpoint after the move.

What about elasticity? It's the objection we get most often. The answer: this workload ran at a steady state, so they were paying for elasticity they never used. For workloads that genuinely burst (B2C consumer apps, gaming, viral content), public cloud is still the right answer. For B2B SaaS that scales linearly with customer count over months, dedicated tenancy is decisively cheaper.
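The elasticity argument reduces to a single comparison. Dedicated capacity has to be sized for peak demand, while a perfectly elastic cloud bills roughly for average demand. Dedicated therefore wins whenever the peak-to-mean ratio is below the cloud-to-dedicated unit-price ratio. Below is a minimal sketch under those simplifying assumptions. The demand series and the 3x price ratio are made-up illustration values, not figures from this customer.

```python
from statistics import fmean


def dedicated_is_cheaper(hourly_demand: list, cloud_unit_price: float,
                         dedicated_unit_price: float) -> bool:
    """Compare an idealized elastic cloud with dedicated capacity.

    Assumes the cloud bills exactly for mean demand (perfect autoscaling)
    and dedicated hardware must be provisioned for peak demand.
    """
    peak = max(hourly_demand)
    mean = fmean(hourly_demand)
    cloud_cost = mean * cloud_unit_price
    dedicated_cost = peak * dedicated_unit_price
    return dedicated_cost < cloud_cost


# Hypothetical demand curves, in arbitrary capacity units per hour.
steady_b2b = [10, 11, 10, 12, 11, 10]  # peak/mean is about 1.1
bursty_b2c = [2, 2, 2, 40, 2, 2]       # peak/mean is about 4.8

# Hypothetical prices: a cloud unit costs 3x a dedicated unit.
print(dedicated_is_cheaper(steady_b2b, cloud_unit_price=3.0, dedicated_unit_price=1.0))
print(dedicated_is_cheaper(bursty_b2c, cloud_unit_price=3.0, dedicated_unit_price=1.0))
```

Real clouds don't scale perfectly, so this favors the cloud. A steady workload that still comes out ahead on dedicated hardware under this model is a strong candidate for repatriation.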

The runway math. $11,600/mo in savings over a typical 18-month Series A runway comes to $208,800. For a company burning about $70k/mo, that's three additional months of runway, or one additional senior engineer hire, or a marketing budget that didn't exist before. For an early-stage startup, that's a meaningful number, not a nice-to-have.
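The runway figure can be reproduced in a few lines. Below is a minimal sketch that assumes an original burn of $69,600/mo, the rate implied by "$208,800 is three months of runway". That burn is a hypothetical value, not a number from this customer. The second helper also accounts for the banked savings being spent at the new, lower burn rate. Under that assumption, the extension comes out slightly longer than the simple division suggests.

```python
def total_savings(monthly_savings: float, months: int) -> float:
    """Cumulative savings over the runway period."""
    return monthly_savings * months


def extra_runway_simple(monthly_savings: float, months: int, monthly_burn: float) -> float:
    """Cumulative savings divided by the original burn (the framing used above)."""
    return total_savings(monthly_savings, months) / monthly_burn


def extra_runway_at_new_burn(monthly_savings: float, months: int, monthly_burn: float) -> float:
    """Assume cash that lasted `months` at the old burn is now spent at the reduced burn."""
    cash = months * monthly_burn
    return cash / (monthly_burn - monthly_savings) - months


MONTHLY_SAVINGS = 11_600
RUNWAY_MONTHS = 18
MONTHLY_BURN = 69_600  # hypothetical: the burn implied by "three additional months"

print(total_savings(MONTHLY_SAVINGS, RUNWAY_MONTHS))
print(f"{extra_runway_simple(MONTHLY_SAVINGS, RUNWAY_MONTHS, MONTHLY_BURN):.1f}")
print(f"{extra_runway_at_new_burn(MONTHLY_SAVINGS, RUNWAY_MONTHS, MONTHLY_BURN):.1f}")
```

Plug in your own burn rate. The lower the burn, the more runway each saved dollar buys.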

If you want us to run the same analysis on your AWS bill, the offer stands: send us your last three months of invoices under NDA, and we'll return a line-by-line Ultiblob proposal within one business day. No commitment, no sales follow-up sequence; we'll let the math do the talking.

#aws-repatriation #tco #ai-startup
