Full Stack Optimization: more than just Kubernetes

June 25, 2025

In 2025, Kubernetes dominates the cloud container landscape. It’s the “operating system of the cloud”. Yet many organizations are finding out how expensive it can be to run and scale Kubernetes. And it’s not only about efficiency and costs: achieving good service reliability is another significant challenge. OOM kills, anyone? Overprovisioning to avoid an outage is costing companies a lot more than they budgeted for when they first migrated to containerized applications. While many operations teams focus on right-sizing pods and tuning cluster resources to address this, they are still missing a fundamental layer that is often the root cause of poor reliability and overspending: the language runtime of the application itself. As a result, they are leaving a lot of money on the table and suffering avoidable reliability incidents.

The reality is that modern applications don’t run directly on Kubernetes. They execute on language runtimes like the Java Virtual Machine, the Node.js V8 engine, the .NET CLR, or the Go runtime. None of these can be considered “tuned” right out of the gate. These runtimes have an intimidating number of configuration options for memory allocation, garbage collection, thread management, and compilation, and each setting can make a big difference in performance and resource utilization. Yet this area is often ignored, because the platform teams that deploy the applications are focused on Kubernetes and do not share the concerns a developer or performance engineer would have about the applications themselves.

Java 21 has a staggering 537 configuration options. How many are you actually setting?

The Hidden Costs

The main approach SREs and platform teams use today is tuning pod requests and limits. That is the right place to look, since applications are where most of the waste comes from. However, the approach is incomplete: it focuses exclusively on Kubernetes metrics like CPU utilization and memory consumption at the pod level, and what happens inside the application is left out. To be fair, tuning at that level isn’t easy. The right configuration can be highly specific to each application, and finding it means looking at things in a very granular way. It’s hard! Focusing only on the infrastructure causes at least three common problems, a pattern we see repeatedly at Akamas.

Memory Misconfiguration Leads to Cascading Failures

Consider a typical Java application running in a Kubernetes pod. Infrastructure monitoring might show that the pod consistently uses 2GB of memory, prompting a recommendation to reduce the memory limit from 4GB to 2.5GB for cost savings. However, this recommendation ignores the JVM’s heap configuration, which might be set to use up to 3GB based on the original 4GB limit.

The result? Even a slight traffic increase may push the JVM to grow its heap beyond what the new limit allows, at which point the container is OOM-killed and the pod restarts. These reliability problems aren’t visible in infrastructure metrics until it’s too late.
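
The mismatch described above can be checked with simple arithmetic. The following is a minimal sketch, not a definitive sizing method: the function name `oom_kill_risk` is hypothetical, and the 512 MiB non-heap figure (metaspace, thread stacks, code cache and similar) is an assumed placeholder that varies per application.

```python
def oom_kill_risk(limit_mib: int, max_heap_mib: int, non_heap_mib: int) -> dict:
    """Compare the JVM's worst-case memory footprint with the pod memory limit."""
    worst_case_mib = max_heap_mib + non_heap_mib
    return {
        "worst_case_mib": worst_case_mib,
        "headroom_mib": limit_mib - worst_case_mib,
        "at_risk": worst_case_mib > limit_mib,
    }


# Scenario from the text: heap capped at 3 GB, pod limit cut from 4 GB to 2.5 GB.
# non_heap_mib=512 is an assumption for illustration, not a measured value.
before = oom_kill_risk(limit_mib=4096, max_heap_mib=3072, non_heap_mib=512)
after = oom_kill_risk(limit_mib=2560, max_heap_mib=3072, non_heap_mib=512)

print("before:", before)
print("after: ", after)
```

With these assumed numbers, the original limit leaves some headroom, while the reduced limit sits below the heap ceiling alone, so the pod is at risk as soon as load makes the heap grow.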

[Figure: OOM kill]

Auto-Configuration Creates Performance Cliffs

Java’s runtime is designed to adapt to the resources available to your pod. When you reduce a pod’s resource allocation, the runtime responds by adjusting its internal parameters: unless they are set explicitly, the JVM will shrink its default heap size, may select a different garbage collector, and will resize its internal thread pools.

These automatic adjustments often create performance issues. Even a slightly smaller memory limit may reduce the JVM heap size, which in turn might trigger significantly more frequent garbage collection, consuming CPU cycles and increasing response latency. The Node.js V8 engine exhibits similar behavior, where memory pressure can cause dramatic changes in resource consumption and execution performance.
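
To make the effect concrete, here is a rough sketch under stated assumptions. It assumes the JVM sizes its default maximum heap as 25% of the container memory limit (the usual HotSpot default for `-XX:MaxRAMPercentage` when no explicit `-Xmx` is given; very small containers follow different rules). The collection-interval model, the 500 MiB live set, and the 100 MiB/s allocation rate are deliberate simplifications chosen for illustration.

```python
def default_max_heap_mib(container_limit_mib: float,
                         max_ram_percentage: float = 25.0) -> float:
    """Default max heap, assuming it is a fixed percentage of the container limit."""
    return container_limit_mib * max_ram_percentage / 100.0


def seconds_between_collections(heap_mib: float,
                                live_set_mib: float,
                                alloc_rate_mib_s: float) -> float:
    """Simplified model: a collection is needed each time the free heap fills up."""
    free_mib = heap_mib - live_set_mib
    if free_mib <= 0:
        raise ValueError("live set does not fit in the heap")
    return free_mib / alloc_rate_mib_s


# Same application (assumed 500 MiB live set, 100 MiB/s allocation) under two limits.
for limit_mib in (4096, 2560):
    heap = default_max_heap_mib(limit_mib)
    interval = seconds_between_collections(heap, live_set_mib=500, alloc_rate_mib_s=100)
    print(f"limit={limit_mib} MiB -> default heap={heap:.0f} MiB, "
          f"~{interval:.2f} s between collections")
```

In this toy model a 37.5% cut in the memory limit makes collections several times more frequent, because the live set takes up a much larger share of the smaller heap.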

[Figure: Heap pressure]

Efficiency Gains Require Runtime Tuning

While you can reduce some waste by right-sizing your infrastructure, doing so doesn’t make the applications any more efficient. True efficiency comes from optimizing how the application runtime uses resources. A common problem is that developers set the Java heap far higher than the application actually needs; properly sizing the heap can cut the memory footprint in half with the same performance. Different garbage collection algorithms in the JVM can change CPU usage by 30% for identical workloads. And the default V8 engine settings, tuned for browser environments, are not necessarily right for the high-throughput server workloads that Node.js typically runs.

By tuning the runtime parameters, you can often get the same performance while consuming significantly fewer resources. You can also improve performance with the same resource allocation. Sometimes the results are dramatic and show up as a happy surprise on your monthly cloud invoice. This level of optimization is impossible without understanding the application layer.

[Figure: Cost saving]

The Kubernetes Dependency Challenge

Full stack optimization becomes even more complex once you add Kubernetes autoscaling and consider how it interacts with the application runtimes. Horizontal Pod Autoscaling (HPA) decisions are based on metrics and thresholds you define. If your application runtime is poorly configured and not aligned with pod requests and limits, the autoscaler might scale incorrectly, either creating an excessive number of pods that trigger unnecessary cluster scale-out, or under-provisioning and degrading performance.
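
The scaling rule documented for the Horizontal Pod Autoscaler is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below applies that rule to show how CPU burned by a misconfigured runtime can translate directly into extra pods. The function name and the utilization figures are assumptions for illustration only.

```python
import math


def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float,
                         tolerance: float = 0.1) -> int:
    """Replica count per the documented HPA formula, including the tolerance band."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling
    return math.ceil(current_replicas * ratio)


# Assumed scenario: 10 replicas, target of 60% average CPU utilization.
# A well-tuned runtime averages 55%; the same traffic with heavy garbage
# collection overhead averages 85%.
tuned = hpa_desired_replicas(10, current_metric=55, target_metric=60)
gc_heavy = hpa_desired_replicas(10, current_metric=85, target_metric=60)

print("tuned runtime:   ", tuned, "replicas")
print("GC-heavy runtime:", gc_heavy, "replicas")
```

Under these assumed numbers the autoscaler behaves exactly as configured in both cases; the extra pods come from the runtime, not from a faulty HPA setting.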

There’s more to Kubernetes scaling than meets the eye.

Node selection presents another challenge. Should your workload run on compute-optimized instances with high CPU-to-memory ratios, or on memory-optimized instances with more RAM per core? The answer depends entirely on your workload’s resource shape, which is primarily driven by the application’s runtime configuration. Cluster autoscalers for Kubernetes can match provisioned resources to actual workload requests. However, simply turning on a cluster autoscaler may not give you optimal efficiency: your node group configuration is key to ensuring that node shapes match your workloads.
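
One way to reason about workload resource shape is to compare the aggregate memory-per-CPU ratio of your pod requests with the ratio each node family offers. The sketch below is illustrative only: the node family ratios, the workload figures, and both function names are assumptions, not data from any specific cloud provider or cluster.

```python
# Assumed GiB of memory per vCPU for three illustrative node families.
NODE_SHAPES = {
    "compute-optimized": 2.0,
    "general-purpose": 4.0,
    "memory-optimized": 8.0,
}


def memory_per_cpu(workloads: list[tuple[float, float, int]]) -> float:
    """Aggregate GiB-per-core ratio for (cpu_cores, memory_gib, replicas) requests."""
    total_cpu = sum(cpu * replicas for cpu, _, replicas in workloads)
    total_mem = sum(mem * replicas for _, mem, replicas in workloads)
    return total_mem / total_cpu


def closest_shape(ratio: float) -> str:
    """Node family whose memory-per-CPU ratio is nearest to the workload's."""
    return min(NODE_SHAPES, key=lambda name: abs(NODE_SHAPES[name] - ratio))


# Hypothetical services before tuning, with oversized heaps driving memory requests.
before = [(1.0, 8.0, 10), (2.0, 12.0, 5)]
# The same services after heap right-sizing halves their memory requests.
after = [(cpu, mem / 2, replicas) for cpu, mem, replicas in before]

for label, workloads in (("before", before), ("after", after)):
    ratio = memory_per_cpu(workloads)
    print(f"{label}: {ratio:.1f} GiB per core -> {closest_shape(ratio)}")
```

In this example, the runtime configuration alone moves the best-fitting node family, which is why node group choices made before runtime tuning may need revisiting afterwards.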

Multiply these decisions across hundreds of microservices, each with different runtime requirements and resource patterns. Manual optimization is NOT practical at that scale, and when automated tools ignore the application layer, that blind spot means you WILL overspend and suffer reliability incidents.

Beyond Infrastructure Metrics

We believe “full stack” optimization requires analyzing application performance metrics alongside infrastructure utilization. This means considering response times, throughput rates, error percentages, garbage collection frequency, and more. You need to dive as deep as you can into the application layer metrics to get the whole context that infrastructure metrics alone cannot deliver.

This is why a full stack, or “application-aware”, optimization approach is an essential part of any sound Kubernetes optimization strategy. This approach enables effective collaboration between platform engineering, SREs, and development teams, leading to more efficient and performant applications.

When you have an optimization tool that can consider both layers simultaneously, you will start seeing what you have been missing this whole time. For example, an application might show low CPU utilization at the infrastructure level while suffering high garbage collection overhead that wastes much of the CPU it does consume. A traditional tool might recommend reducing the CPU allocation, but an “application-aware” tuning product will suggest fixing the garbage collection stress instead.

This means you can improve cost efficiency AND application performance. Instead of choosing between saving money and maintaining performance, you can have both.

The Path Forward

Organizations that use Akamas consistently report significant improvements, sometimes dramatic ones. Cost reductions of 20-50% are common when application tuning is combined with infrastructure optimization. Misconfigurations that lead to reliability issues are quickly identified and fixed. Performance improvements often exceed the cost savings, with response time reductions of 30% not uncommon. Often, Akamas pays for itself early in the tuning process.

Rather than constantly fighting resource constraints and performance issues, teams can proactively tune their Kubernetes environment for their own applications and situations. Organizations that embrace application-aware optimization will find themselves with more efficient, reliable, and cost-effective Kubernetes deployments.

Ready to discover how much performance and cost efficiency your Kubernetes applications are leaving on the table? Learn more about implementing full-stack optimization for your organization through the Akamas contact page.

Author: Scott Moore
