Kubernetes is one of the most rapidly adopted standards in the IT industry. As of 2024, over 60% of enterprises had adopted it as their de facto container orchestration solution, and that number is projected to exceed 90% by the end of 2027. While that adoption is impressive, it's important to address the concerns around Kubernetes openly: it also brings unprecedented levels of complexity.
One area impacted is performance engineering. Traditional performance engineering approaches, originally built for static infrastructure, are not adequate for dynamic, distributed environments. A new paradigm is needed.
This blog explores how Akamas and Speedscale offer complementary capabilities that enable a radically new performance engineering approach. This combined solution goes beyond reactive troubleshooting to continuous optimization.
Challenges of Performance Engineering in Cloud-Native Architectures
Modern application architecture is now characterized by microservices and container orchestration with Kubernetes. Traditional methodologies for tuning the applications of the past struggle when applied to containerized applications. Complexity increases because moving from a monolith to microservices simply means more moving parts.
A single microservice often depends on other services, creating dependency chains that are difficult to map and test completely. A performance bottleneck in one service can cascade and show up as an issue in a seemingly unrelated service. This can obscure the real root cause of a problem.
Kubernetes-Specific Pitfalls
Kubernetes has its own set of performance pitfalls that can significantly impact application reliability, user experience, and operational costs. We’ve written about our top 10 performance pitfalls in a previous article. Incorrectly sized pod CPU requests and limits, exhausted node resources (e.g., memory), and inefficient cluster autoscaling are just a few of them. Application runtime tuning presents another layer of complexity.
Java applications running on Kubernetes can be particularly tricky due to the large JVM configuration surface and inefficient default settings. Node.js applications often face performance and reliability issues due to their unique (and often poorly documented) runtime characteristics.
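To make that interplay concrete, here is a minimal sketch using the Kubernetes Python client (assumed installed); the service name, image, and values are hypothetical. The point is that the pod's memory limit and the JVM's heap settings must be sized together, or the container risks being OOMKilled despite a "healthy" heap.

```python
from kubernetes import client

# Hypothetical Java microservice: pod resources and JVM flags must agree.
# -XX:MaxRAMPercentage caps the heap relative to the container's memory
# limit, leaving headroom for metaspace, threads, and native memory.
container = client.V1Container(
    name="orders-service",                            # illustrative name
    image="registry.example.com/orders:1.4.2",        # illustrative image
    env=[client.V1EnvVar(
        name="JAVA_TOOL_OPTIONS",
        value="-XX:MaxRAMPercentage=75.0 -XX:+UseG1GC",
    )],
    resources=client.V1ResourceRequirements(
        requests={"cpu": "500m", "memory": "1Gi"},  # what the scheduler reserves
        limits={"cpu": "1", "memory": "1Gi"},       # hard ceiling: exceeding memory -> OOMKill
    ),
)
print(container.resources.limits)
```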
Balancing performance, reliability, and cost is a fine line that is almost impossible to walk manually.
The Cloud Is Not Magic
Many believe the cloud has “unlimited resources”. While it can auto-scale resources based on demand, this is a double-edged sword for performance testing. Poorly written applications will consume whatever resources they are given, creating the risk of very expensive cloud bills.
Elasticity is great for uptime, but it makes it harder to establish performance baselines. Consistent and reliable test results are often difficult to achieve: the underlying infrastructure for a test run might change between runs, making truly consistent response times and throughput impossible to obtain.
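One practical mitigation is to quantify that noise before trusting any comparison. A minimal sketch, assuming repeated runs of the same test; the measurements and the 10% threshold are illustrative:

```python
import statistics

# p95 latencies (ms) from repeated runs of the same test on shared cloud infra
runs_p95_ms = [212, 238, 205, 291, 224]  # hypothetical measurements

mean = statistics.mean(runs_p95_ms)
cv = statistics.stdev(runs_p95_ms) / mean  # coefficient of variation

# An illustrative rule of thumb: treat the baseline as unstable if
# run-to-run variation exceeds ~10%.
if cv > 0.10:
    print(f"Baseline unstable: CV={cv:.1%}; results may reflect infra noise")
else:
    print(f"Baseline usable: CV={cv:.1%}")
```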
Using the cloud also means the possibility of multi-tenant interference. This noise can affect the accuracy of test results and mask real issues, and it can lead to false positives during peak testing windows. Shared, dynamically allocated cloud resources are fundamentally at odds with the controlled, repeatable conditions traditionally required in performance testing. This makes it difficult to predict real-world performance or justify optimizations.
Testing Environment Bottlenecks
Environments have always been an issue in performance testing projects, even more so with microservices, where complexity drags down productivity. Creating realistic, isolated, and cost-effective test environments that accurately replicate production can be a major bottleneck that hinders the agile development process.
A single microservice can have several dependencies. Testing a single service in isolation requires mocking these dependencies, while integration testing requires a running instance of the entire dependency chain. Traditional mocks rarely reflect the diversity of real data and must be continuously maintained, which can distort results. The manual effort is time-consuming and error-prone, rarely yielding mocks that truly behave like production.
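A typical hand-written mock illustrates the problem. In this minimal sketch (the inventory service and its payload are hypothetical), a single canned response freezes one happy path in time; every upstream schema change must be copied in by hand, and the payload hides the data diversity seen in production.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# A typical hand-written mock: one canned response, frozen in time.
class InventoryMock(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        # Single happy-path payload; real traffic has far more variety.
        self.wfile.write(b'{"sku": "ABC-123", "in_stock": true}')

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), InventoryMock).serve_forever()
```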
Using production data for testing is a terrible idea unless it has been obfuscated. Sensitive data like PII, financial records, and health information carries significant privacy and compliance risks. Copying full production datasets can be impractical due to their massive size, and periodic database dumps quickly become outdated, missing recent schema changes.
Not being able to quickly create and deploy production-like, isolated test environments with realistic data and dependencies causes compromises that lead to unreliable test results. This increases the risk of production failures.
Traditional Load Testing Limitations
Traditional load testing tools and methodologies have to change to meet the demands of cloud-native, microservices-based applications. Some teams don’t have the luxury of writing and maintaining traditional test cases. Tools like JMeter require scripting and several plugins to get useful test results.
Another limitation is the lack of real-world traffic as a model. Synthetic workloads are sometimes a “best guess” based on anecdotal information from the testing team or operations. They may not capture the reality of what’s happening in production. This means that even if tests are run, they might not reflect how the application will behave in the real world.
New Solutions for Next-Generation Cloud-Native Performance Engineering
Akamas: AI-Powered, Autonomous Optimization Platform
Akamas has emerged as a unique solution in the realm of performance engineering, especially for Kubernetes-deployed applications. Its core value lies in its ability to move beyond trial-and-error guesswork and deliver the configurations that yield the best performance and efficiency. Thanks to AI, it does this in a fraction of the time traditional methods used to take.
The platform supports two main scenarios:
- Optimize in staging environments by automating performance tuning with Akamas Offline
- Optimize directly in production by discovering optimization opportunities and providing ready-to-apply recommendations for K8s clusters and workloads with Akamas Insights
This post focuses on Akamas Offline and the Speedscale integration to bring performance testing and tuning to a new level.
The AI-Driven Approach
“AI-powered autonomous optimization” – what does this mean?
At its core, Akamas employs patented reinforcement learning AI. Before the hype of large language models and “agentic” labeling of everything, Akamas was already using advanced AI techniques to intelligently set the best configurations within hours of the first tuning experiments.
In days past, performance engineers would gather in a lab with a production-like environment and use manual methods to create a multitude of load tests, analyze the results, and make small changes to the configuration to determine which one was best. This process could take weeks or even months, involving endless trial-and-error tuning.
Think of the sheer number of combinations for a Java application: 500+ JVM options, K8s pod resources, scaling policies, and more to tweak for each microservice. Manual optimization is practically impossible. Akamas's AI directly addresses this human limitation, transforming a time-consuming, expert-dependent task into an automated, data-driven process.
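Some back-of-the-envelope arithmetic shows why. In this sketch, the parameter list and value counts are hypothetical, and deliberately far smaller than the real JVM configuration surface:

```python
import math

# Hypothetical tunable-parameter space for one Java microservice.
# Each entry: number of plausible values a human might try per parameter.
parameter_values = {
    "jvm_heap_size": 10,
    "jvm_gc_algorithm": 4,
    "jvm_gc_threads": 8,
    "pod_cpu_request": 10,
    "pod_cpu_limit": 10,
    "pod_memory_limit": 10,
    "replica_count": 6,
}

combinations = math.prod(parameter_values.values())
print(f"{combinations:,} combinations for a single service")  # 1,920,000

# At one 30-minute load test per configuration, exhaustively testing
# even this tiny subset of the space would take over a century.
years = combinations * 0.5 / (24 * 365)
print(f"~{years:,.0f} years of wall-clock testing")
```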

Unlike code profiling techniques, Akamas doesn’t require code changes or agents within the application itself. Instead, it leverages existing observability tools to collect KPIs. It can integrate with major vendors like Dynatrace and Datadog, as well as open-source tools like Prometheus. This is crucial for its AI to learn from real-time system behavior, ensuring minimal overhead and no distortion of application behavior.
What do we mean by “autonomous”?
This means Akamas automates the optimization process driven by a goal the user can set, like “improve performance” or “reduce costs while matching my latency SLOs”.
Akamas reconfigures your systems by applying (or recommending) configuration changes and identifies the best configuration to achieve that goal. It can operate in a fully autonomous mode (directly applying changes) or a semi-autonomous mode (providing recommendations that require human approval).
What about “Optimization”?
This involves adjusting parameters across the tech stack (e.g., K8s pod resources, JVM or Node.js garbage collection options, cloud instances, etc.). Akamas's unique full-stack capability allows it to optimize all the layers of the stack at the same time, including infrastructure-level and application-level configurations, adjusting them to minimize or maximize the optimization goal. This is key to ensuring reliable configuration changes, as we explained here.

Goal-Oriented Optimization with SLOs and Constraints
Akamas’s optimization is goal-oriented, allowing users to define specific objectives and critical guardrails. Users can set optimization goals such as maximizing application throughput, minimizing cloud resource usage, or balancing throughput, response time, resilience, and cost. This capability fundamentally aligns performance engineering with business outcomes, moving beyond purely technical metrics.
Akamas continuously learns from system behavior and avoids configuration changes that might harm performance, costs, or availability. An AI optimizing without clear boundaries could lead to undesirable outcomes, such as cost reduction at the expense of severe performance degradation. By defining SLOs, Akamas is instructed to find the optimal configuration within acceptable performance boundaries. It maintains safety policies to prevent failures in production.
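To illustrate the idea of guardrails (this is not Akamas syntax; the metric names, thresholds, and scoring are hypothetical), a goal-with-constraints evaluation can be sketched like this: the optimizer may only pick among configurations that satisfy every SLO, however attractive a violating one looks on the goal metric.

```python
# Minimize cost, but reject any configuration that violates an SLO.
SLOS = {"p99_latency_ms": 500, "error_rate": 0.01}

def score(metrics: dict) -> float | None:
    """Return cost for a candidate config, or None if it breaks an SLO."""
    if metrics["p99_latency_ms"] > SLOS["p99_latency_ms"]:
        return None  # violates latency SLO: never a candidate, however cheap
    if metrics["error_rate"] > SLOS["error_rate"]:
        return None
    return metrics["hourly_cost_usd"]  # lower is better

# The optimizer keeps the cheapest configuration among SLO-compliant ones.
results = [
    {"p99_latency_ms": 320, "error_rate": 0.002, "hourly_cost_usd": 4.10},
    {"p99_latency_ms": 610, "error_rate": 0.001, "hourly_cost_usd": 2.30},  # cheap but too slow
]
valid = [r for r in results if score(r) is not None]
print(min(valid, key=score))
```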
Performance Experiment Orchestration
Akamas runs performance “experiments”. It applies configurations, triggers tests, collects metrics, and evaluates results within a structured optimization loop. You can leverage existing load testing tool investments to generate the workload. Akamas acts as the brain, orchestrating the entire performance optimization cycle. Teams aren’t locked into a single testing methodology, but can leverage the best-of-breed tools for specific tasks. This is where Speedscale comes in.
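Conceptually, each pass through that loop looks like the following sketch. The stub functions stand in for real integrations (applying K8s configs, triggering a traffic replay, querying an observability backend), and the random suggestions stand in for the actual reinforcement learning:

```python
import random

def apply_configuration(config):    # stub: would patch the deployment
    print(f"applying {config}")

def run_load_test_and_collect():    # stub: would replay traffic, pull KPIs
    return {"throughput_rps": random.uniform(100, 300)}

def suggest_next_config(history):   # stub: the real system learns, not guesses
    return {"cpu_limit_millicores": random.choice([500, 1000, 2000])}

history, best = [], None
for _ in range(10):                 # each iteration is one "experiment"
    config = suggest_next_config(history)
    apply_configuration(config)
    metrics = run_load_test_and_collect()
    history.append((config, metrics))
    if best is None or metrics["throughput_rps"] > best[1]["throughput_rps"]:
        best = (config, metrics)

print("best configuration found:", best[0])
```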
Speedscale: Production Traffic Intelligence
Speedscale takes a novel approach to API testing and load testing. It distinguishes itself by its ability to isolate individual microservices so each component can be tested and optimized before release. For a complex application, this kind of tuning lets you ensure the overall experience is strong by building on a solid foundation.
Speedscale operates on a three-part process:
- Observe – Capture API requests and responses to get realistic data sets
- Transform – Prepare the data to be re-run in another environment
- Replay – Execute load tests and mocks against new versions of the code
By capturing and replaying sanitized production traffic, Speedscale brings the messy reality of production into staging (or any test environment), providing deep insights into application behavior under realistic conditions without the typical risks to live systems or sensitive data.
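Conceptually, a replay re-issues recorded requests against a candidate build and records how it responds. A minimal sketch follows; the capture file format is invented for illustration, and in practice Speedscale manages capture and replay automatically:

```python
import json
import requests  # assumes the third-party 'requests' package

TARGET = "http://staging.example.internal"  # hypothetical target environment

# Each line is one recorded request, e.g.
# {"method": "GET", "path": "/api/orders", "headers": {...}, "body": null}
with open("captured_traffic.jsonl") as f:
    for line in f:
        rec = json.loads(line)
        resp = requests.request(
            rec["method"],
            TARGET + rec["path"],
            headers=rec.get("headers", {}),
            data=rec.get("body"),
            timeout=10,
        )
        print(rec["path"], resp.status_code, resp.elapsed.total_seconds())
```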
Realistic Environment Replication and Service Virtualization
Traditional load testing tools cannot easily replicate complex production environment responses along with their dependencies. That requires an additional capability called service virtualization, which is built into Speedscale natively: it can automatically generate mock services from the backend requests and responses captured during observation. To ensure data privacy and compliance during these realistic tests, Speedscale provides robust data sanitization options like PII masking, header filtering, and custom scrubbers using regex or JSONPath-based rules.
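In the spirit of those regex-based rules, here is a minimal scrubber sketch (the patterns and replacement tokens are illustrative only) that masks emails and card numbers before captured traffic leaves production:

```python
import json
import re

# Illustrative sanitization rules: mask obvious PII in captured payloads.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def scrub(payload: str) -> str:
    payload = EMAIL.sub("<EMAIL>", payload)
    payload = CARD.sub("<CARD>", payload)
    return payload

body = json.dumps({"user": "jane@example.com", "card": "4111 1111 1111 1111"})
print(scrub(body))
# {"user": "<EMAIL>", "card": "<CARD>"}
```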

These mocked responses are designed to be as real as production. They are automatically refreshed from running systems and require no application changes or network re-routing. This eliminates common maintenance headaches and potential result distortions associated with traditional mocks.
Speedscale essentially creates an environment replication system that simulates the active parts of the runtime environment. This includes replicating a remote Kubernetes, Amazon Elastic Container Service (ECS), or virtual machine environment.
Service mocking decouples application code from downstream systems, such as large language models (LLMs) or databases. Code can be tested at scale without hitting rate limits or incurring per-transaction costs associated with real services. You can expect all of the traditional features of a full-featured Service Virtualization solution within Speedscale – including the ability to slow down the responses artificially to see how external service delays affect overall performance of the larger application. Advanced load shaping means you can adjust request volume, allowing users to multiply user traffic by factors like 2x or 10x of expected production load.
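As a rough illustration of load multiplication (the target URL, paths, and worker count are hypothetical), every captured request can be replayed N times to simulate, say, 10x expected production volume:

```python
from concurrent.futures import ThreadPoolExecutor
import requests  # assumes the third-party 'requests' package

MULTIPLIER = 10  # simulate 10x the captured production volume
TARGET = "http://staging.example.internal"

captured_paths = ["/api/orders", "/api/cart", "/api/checkout"]  # hypothetical

def fire(path: str) -> int:
    return requests.get(TARGET + path, timeout=10).status_code

# Replay the captured set MULTIPLIER times with concurrent workers.
with ThreadPoolExecutor(max_workers=50) as pool:
    statuses = list(pool.map(fire, captured_paths * MULTIPLIER))

print(f"sent {len(statuses)} requests, {statuses.count(200)} returned 200")
```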
Solving this environment issue saves a lot of setup and configuration time for a load testing project.
Performance Insights and Reporting
The Speedscale Traffic Viewer can provide deep insights into API traffic and service behavior. It visualizes traffic flows in real-time, allowing detailed inspection of API request and response payloads, headers, and metadata for comprehensive debugging and analysis. It also offers powerful filtering and search capabilities to easily locate specific traffic patterns or anomalies.

After a replay, detailed reports provide key SRE golden signals: latency (measured using average, 95th, and 99th percentile), throughput (requests per second/minute), and saturation (CPU and memory utilization from the environment). An entire dashboard is devoted to error information.
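For reference, here is how those latency percentiles relate to the average in a small, hypothetical sample; the mean alone hides the tail that the p95 and p99 expose:

```python
import statistics

latencies_ms = [112, 98, 134, 101, 840, 123, 107, 99, 415, 118]  # hypothetical

q = statistics.quantiles(latencies_ms, n=100)  # percentile cut points
print("avg:", statistics.mean(latencies_ms), "ms")
print("p95:", q[94], "ms")  # 95th percentile
print("p99:", q[98], "ms")  # 99th percentile
# The average (~215 ms) looks fine; the tail percentiles reveal the slow requests.
```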

The Traffic Viewer includes a dynamic Graphical Service Map that visualizes the interactions between services based on actual traffic, providing a clear and intuitive representation of dependencies, bottlenecks, and data flow. This real-time mapping helps pinpoint potential failure points or performance bottlenecks.
While Akamas can pull metrics from the major observability vendors mentioned earlier, Speedscale also integrates with these products when used stand-alone.
Integrating Akamas and Speedscale
The integration of Akamas and Speedscale creates a powerful solution that addresses many of the challenges of performance engineering in cloud-native environments.

Speedscale’s core strength lies in its ability to capture real production traffic and automatically generate realistic test cases and high-fidelity service mocks from it. It provides realistic dependencies without manual maintenance.
Akamas is the intelligent orchestrator of performance experiments. It can initiate Speedscale-driven tests as part of its optimization loop, continuously learning and recommending optimal configurations for Kubernetes resources, JVMs, Node.js applications, and other components.
Speedscale’s contribution here is important. The learning process within Akamas is dependent on accurate, real-world performance data derived from load tests. By replaying sanitized production traffic, Speedscale generates the necessary high-fidelity performance data, measuring critical metrics such as latency, throughput, error rates, and resource utilization (CPU and memory) under conditions that closely mimic live production.
Both products support continuous optimization within a CI/CD pipeline. Every code change can be automatically validated against realistic production traffic, providing immediate feedback to developers on performance regressions or improvements. Speedscale also supports test case versioning and rollback safeguards, preventing merges if replay tests degrade performance. Akamas can automatically apply its AI-recommended optimal configurations as part of the CI/CD pipeline as well.
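As a sketch of what such a pipeline gate might look like (the file names, report fields, and 10% threshold are all hypothetical), a CI step can compare the latest replay report against a stored baseline and block the merge on regression:

```python
import json
import sys

ALLOWED_REGRESSION = 1.10  # fail if p95 latency grows more than 10%

baseline = json.load(open("baseline_report.json"))   # hypothetical report files
current = json.load(open("replay_report.json"))

ratio = current["p95_latency_ms"] / baseline["p95_latency_ms"]
if ratio > ALLOWED_REGRESSION:
    print(f"FAIL: p95 at {ratio:.0%} of baseline")
    sys.exit(1)  # non-zero exit blocks the merge
print(f"PASS: p95 at {ratio:.0%} of baseline")
```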
The Future of Performance Engineering: Autonomous and Intelligent Systems
Performance engineering is clearly moving towards highly autonomous and intelligent systems, driven by advancements in AI and machine learning. While there is still a lot of hype, there are real capabilities behind it, and things are moving so fast that, at some point, reality will catch up with the hype.
This means the future of performance engineering is linked to the adoption of AI. The more complex systems get, the greater the need to go beyond the traditional human pace. AI-powered platforms can identify subtle patterns and automate the generation of complex queries and solutions much faster than humans. This will accelerate analysis and troubleshooting and reduce the dependence on specialized expertise. The result will be a dramatic increase in team productivity, freeing more time for innovation.
Shifting left was the first step – having developers test the performance of individual features with multiple virtual users. The next stage is embedding performance testing and tuning directly into the CI/CD pipeline, so that every code change is continuously validated against requirements using realistic conditions.
A continuous performance feedback loop driven by AI could eliminate manual guesswork and the endless trial-and-error approach. The goal is to make software releases more predictable and reduce the risks in deployment.
In the past, performance engineering was mainly accomplished by highly specialized experts. Low-code/no-code approaches and intuitive visual interfaces are simplifying test creation, allowing a wider range of roles to design and execute performance tests without extensive scripting knowledge.
There will always be a need for performance engineering specialists to apply the context needed to address the full spectrum of performance across the enterprise – but the more roles sharing the responsibility to engineer performance into their part and within their scope – the higher the quality and the better the outcome.
Conclusion
The integration of Akamas and Speedscale offers a compelling solution to these challenges. Combining these two platforms creates a powerful, closed-loop system for continuous performance engineering. Speedscale feeds Akamas with realistic workloads and performance data, enabling Akamas’s AI to make informed optimization decisions. Akamas, in turn, can orchestrate Speedscale tests and apply its recommended configurations directly within CI/CD pipelines, automating the entire performance lifecycle. This integrated approach shifts performance engineering from a reactive, bottleneck-prone activity to a proactive, continuous, and intelligent process.
The future of performance engineering is continuous, autonomous, and intelligent. AI and machine learning will continue to be enablers, democratizing expertise, accelerating analysis, and freeing engineers for higher-value tasks. Organizations that embrace these solutions will gain a significant competitive advantage, delivering superior service quality, enhancing resilience, and achieving substantial cost efficiencies in the dynamic cloud-native era.

