
Slashing AWS Costs by 70%: From 80+ Lambdas to 15 Without Breaking Scale

April 2026 · 10 min read

When people talk about slashing AWS costs, the conversation usually jumps to reserved capacity, cheaper storage, or turning off idle resources. Those levers matter. In this project, the largest win came earlier in the stack: how the serverless system was shaped—not only how much of it ran.

If you're running a SaaS product or internal platform, this kind of architecture usually shows up in three ways:

  • Your AWS bill grows faster than your usage
  • Latency feels inconsistent under load
  • Your team spends more time maintaining than shipping

This case study is about fixing all three at once. The organizational side of that—shipping before everyone agrees what “good” looks like—shows up in other forms too; I wrote about that pattern in the most expensive mistake I keep seeing in software projects.

This came out of a production system handling real traffic—not a greenfield example.

How Slashing AWS Costs Starts with Architecture, Not Optimization

You can right-size instances and hunt for idle resources all day. If the architecture fragments traffic across dozens of tiny functions, you still pay for complexity: cold starts, operational overhead, and a bill that grows with surface area, not just usage. Slashing AWS costs, in practice, often means redesigning boundaries before you tune knobs.

The API surface had grown to more than eighty Lambdas behind API Gateway—one function per route, more or less. We consolidated around domain boundaries and ended up with roughly fifteen to twenty service-style functions. Infrastructure spend dropped on the order of sixty to seventy-five percent for the relevant slice of the bill, latency became more predictable, and the team stopped drowning in deployments. Your mileage will vary; the point is where the leverage actually was.

The original system: eighty-plus endpoints and growing

Fragmented Lambdas and API Gateway sprawl

The pattern is common: one Lambda per HTTP endpoint. Early on it feels clean. Each function is tiny, isolated, and easy to reason about in isolation.

Then the count climbs:

  • Dozens of functions, each with its own route
  • Matching explosion in API Gateway configuration
  • A CloudWatch log group per function (or equivalent noise)
  • Pipelines or deploy steps multiplied across the same surface

What started as separation became operational sprawl. The system did not fail because serverless was wrong. It failed because every route was treated like its own product.

Cold starts and thin traffic

Traffic was spread across too many cold execution units. Each function saw relatively little volume, so warmth did not accumulate where it mattered. Cold starts showed up often enough that latency felt random—not catastrophic, but inconsistent in a way users and monitors both notice.

Growing the surface area made the system feel slower at the edge, not faster.

Deployment and shared logic

Anything cross-cutting—auth, DB access patterns, validation—had to be duplicated or republished across many packages. A small fix could mean many deploys or subtle version skew. CI time climbed. Fear of touching “the wrong” function climbed with it.

At that point the cost was not only dollars on the invoice. It was fragility and drag.

What was actually wrong: infrastructure sprawl

Too little shared structure

Isolation is a tool. When every handler is a silo, you lose:

  • Sensible reuse of connection setup and middleware-style concerns
  • One place to fix a class of bugs
  • A mental model that matches how the business thinks about the product

Everyone reinvented the same wheel in eighty slightly different ways.

Over-isolation versus practical scale

The mistake was not “using Lambda.” The mistake was equating an HTTP endpoint with a service boundary. Endpoints are not domains. Domains are where cohesion actually lives.

The strategy: slashing AWS costs by consolidating the right way

Service-shaped Lambdas

Instead of eighty-plus one-off handlers, we grouped work into a smaller set of domain services—think user lifecycle, trades, reporting—each owning a family of related operations. Names and boundaries matched how the product actually behaved, not how the router was wired on day one.

Logical boundaries, not technical ones

The change in rule was simple:

Before: one route, one function.
After: one cohesive service, many actions inside it—still modular inside the codebase, still testable, but one deployable unit per domain where it made sense.

That is how you get shared execution context, shared warm behavior for related traffic, and fewer moving parts in the control plane—without pretending the whole API is a single blob.

How the consolidation was implemented

Unified API Gateway entry points

We reduced the number of external routes by routing related work through fewer integration points. Fewer routes means less API Gateway surface, less configuration drift, and less to audit when something breaks.

Action-style routing (query parameters)

For some services we used explicit actions on a stable path—for example, different operations on /user distinguished by a query parameter such as action=getDetails versus action=updateProfile. This is not the only way to structure it. In many cases, HTTP verbs and path-based routing are cleaner. In this system, query-based dispatch matched the existing clients and let us consolidate without breaking contracts.
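As a concrete sketch of that dispatch contract: with the API Gateway Lambda proxy integration, the query string arrives on the event as queryStringParameters, so resolving the action is a one-liner. The function name and the default fallback here are illustrative, not the production code.

```javascript
// Sketch: resolving the action from an API Gateway proxy event.
// `queryStringParameters` is the proxy integration's field; it can be
// null when no query string is present. `defaultAction` is a
// hypothetical fallback chosen for this example.
function resolveAction(event, defaultAction = 'getDetails') {
  const params = event.queryStringParameters || {};
  return params.action || defaultAction;
}
```

Whatever the exact shape, the key is that the contract is explicit and stable, so existing clients keep working while the functions behind the route consolidate.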

The implementation detail matters less than the idea: one Lambda integration can serve a coherent set of operations if the contract is deliberate.

Internal dispatch

Inside each service Lambda, requests were dispatched to small modules—handlers per action, shared utilities for auth and persistence. Think of it as a thin router at the edge of the function, not a random pile of if statements.

function route(action, event) {
  switch (action) {
    case 'getDetails':
      return getUserDetails(event);
    case 'updateProfile':
      return updateProfile(event);
    default:
      return notFound(action);
  }
}

In production this was structured more cleanly (validation, error mapping, logging)—the point is modularity inside the function, not a monolithic script.
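One cleaner shape that scales better than a growing switch is a handler map: each action points at its own module-level function, and unknown actions fall through to a single not-found path. The handler bodies below are stand-ins, not the production logic.

```javascript
// Sketch: the "thin router" as a handler map instead of a switch.
// Handler names mirror the switch example; their bodies are placeholders.
const handlers = {
  getDetails: (payload) => ({ status: 200, body: { id: payload.id } }),
  updateProfile: (payload) => ({ status: 200, body: { updated: true } }),
};

function dispatch(action, payload) {
  const handler = handlers[action];
  if (!handler) {
    // One place to handle every unknown action.
    return { status: 404, body: { error: `unknown action: ${action}` } };
  }
  return handler(payload);
}
```

The map also makes the service's surface self-documenting: the set of supported actions is literally the object's keys.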

Reusing resources inside the service boundary

With related traffic hitting the same execution context more often, connection reuse and shared initialization actually paid off. Auth and data-access code lived in one place per domain. Fewer copies of the same bootstrap logic meant fewer places for drift.
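The mechanism behind that reuse is simple: anything initialized at module scope in a Lambda survives between invocations in the same warm execution environment. A minimal sketch, where createDbClient stands in for whatever real client setup the service uses:

```javascript
// Sketch: module-scope state persists across warm invocations of the
// same execution environment, so expensive setup runs once per cold
// start rather than once per request. `createDbClient` is a stand-in
// for real connection setup.
let dbClient = null;

function getDbClient(createDbClient) {
  if (!dbClient) {
    dbClient = createDbClient(); // runs once per cold start
  }
  return dbClient; // reused on every warm invocation
}
```

With one-function-per-route, each tiny function paid this initialization cost separately and rarely got warm enough to amortize it; with domain services, related traffic concentrates and the cached client actually gets reused.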

Architecture shift: from endpoint sprawl to domain-based service functions

Before and after diagram showing consolidation of 80+ AWS Lambda endpoints into 15–20 domain-based service functions

The difference here is not cosmetic—it's structural.

On the left, each endpoint lives in its own Lambda. Traffic is fragmented, logic is duplicated, and cold starts are more likely because each function handles only a thin slice of requests.

On the right, related operations are grouped into domain-based service functions. Traffic concentrates, shared logic becomes reusable, and the system becomes easier to operate and scale.

This shift—from endpoint-based design to domain-based service functions—is where most of the cost and complexity reduction came from.

Before and after: what actually changed

Area | Before | After
Lambda functions (approx.) | 80+ | 15–20
API routes (approx.) | 80+ | ~20
Deployable units for that surface | One per route | One per domain service
Shared cross-cutting logic | Scattered / duplicated | Centralized per service

Deployments became fewer, faster, and easier to reason about. When something broke, you knew which domain to open—not which of eighty nearly identical entry points got unlucky.

Results: what slashing AWS costs looked like in practice

Bill impact

Consolidation reduced:

  • Per-function overhead and the “many small things” tax across logging, monitoring, and Gateway
  • Pressure to paper over cold starts with expensive knobs where they were not otherwise justified

In this codebase the infrastructure line item we cared about landed roughly sixty to seventy-five percent lower after the change—about 70% in round terms. In practice, this meant going from mid four-figure monthly costs on that layer to something much closer to low four figures. Exact numbers depend on region, traffic shape, and what else shares the account—treat that band as evidence of magnitude, not a promise for every app.

Performance

More related requests hit the same warm pools. P99 latency stopped bouncing as hard. The system felt boring in a good way.

Team velocity

Fifteen services are easier to hold in your head than eighty functions. Debugging, onboarding, and release confidence all improved.

Tradeoffs and what I would do differently

When not to consolidate

Do not merge things that have different scaling profiles, different blast radius, or real independence—for example, a high-risk integration that should stay isolated, or a path that must scale on its own SLAs. Consolidation is a design choice, not a moral imperative.

Avoiding a monolith inside Lambda

Structured consolidation is not “one giant handler for everything.” The rule is clear internal modules, explicit boundaries, and tests so the inside of each service does not turn into spaghetti. If you cannot draw where User ends and Billing begins, you have not designed—you have merged.

Practices that generalize

Group by domain, not by endpoint

Name and structure around User, Trade, Report—not around GetUserLambda as if it were the center of the universe.

Keep functions cohesive but adaptable

You want enough sharing to kill duplication and cold-start noise, and enough separation that one domain’s mess does not own everyone’s outage.

Lessons

Think in systems, not in function counts

Optimizing eighty Lambdas one at a time would have been endless whack-a-mole. The win came from changing the shape of the system so the platform stopped fighting us.

Tie engineering to outcomes

This work was not resume-driven. It cut cost, stabilized latency, and gave the team time back. That is the bar I care about when advising on architecture—whether on AWS or elsewhere. If you are weighing a similar rebuild or consolidation, the way I take on full-stack and platform-leaning work is outlined on my full-stack web development page. On the frontend side, what teams get wrong when they move to Next.js is the same class of problem: expecting the framework to fix unclear structure.

How this maps to SaaS today

Serverless and managed services are still a strong default for many products. As you grow, complexity compounds unless you deliberately collapse what does not need to be separate. The goal is lean and predictable, not “as many Lambdas as possible.”

FAQ: slashing AWS costs and serverless consolidation

What usually drives AWS cost in serverless apps?

Often it is too many discrete units (functions, routes, environments), cold paths you pay for in latency or provisioned workarounds, and data transfer and logging you did not budget as first-class. Architecture shows up on the bill, not only instance sizes.

Does consolidating Lambdas hurt scalability?

Not when domains are honest. You still scale per function; you just stop fragmenting related load across dozens of cold sandboxes.

How many Lambdas should a system have?

There is no magic number. Fewer well-bounded services usually beat many one-route functions when they share logic and traffic patterns.

Is this pattern always right?

No. Highly independent workloads, strict isolation requirements, or very different scaling needs may warrant more separation, not less.

What about API Gateway complexity?

Fewer, intentional routes generally reduce configuration burden and failure modes—as long as your routing contract is clear to clients.

How do you avoid a monolith?

Modules, tests, and domain language inside each service. If “consolidation” means one file with five hundred branches, you have made a different problem.

Conclusion

Slashing AWS costs, in this story, was not about cutting features or refusing to scale. It was about aligning structure with how the product and traffic actually behave—so the cloud bill, the latency charts, and the team’s calendar all moved in the same direction.

If you want a deeper reference from AWS itself, their Lambda best practices guide is a solid complement to any consolidation effort.

Questions about something you read here, or a project you want to move forward? I work with teams on full-stack builds, AWS and serverless consolidation, migrations, and messy systems.

Contact me