Join Red Hat and Akamas practitioners for a conversation on what it really takes to run GenAI workloads efficiently on Kubernetes.

AI workloads are fundamentally different from the applications we’ve been running on Kubernetes for the past decade. Models consume GPUs that cost thousands of dollars per month. Agents burn through reasoning tokens around the clock. And the complexity of the stack, from model serving engines to GPU resource management, makes traditional performance tuning feel like a warm-up exercise.
This webinar brings together practitioners from Red Hat and Akamas to explore what it takes to run GenAI workloads efficiently on Kubernetes. We won’t pitch products. Instead, we’ll walk through the real optimization challenges that platform engineers and AI engineers face today: how to right-size GPU infrastructure, where to find cost savings in inference serving, and why agent tracing and evaluation frameworks matter more than most teams realize. We’ll also look at the cost-latency-accuracy triangle that makes GenAI optimization fundamentally harder than anything we’ve tuned before.
Whether you’re just deploying your first model on Kubernetes or already running agents in production, you’ll leave with a clearer picture of where the optimization opportunities are and which ones to tackle first.
Meet the speakers
Stefano Doni – CTO | Akamas
Stefano Doni is the CTO and co-founder of Akamas, where he leads the company’s vision for autonomous performance optimization powered by AI. With over 15 years of experience in performance engineering, he has worked on optimization projects for major national and international enterprises. In 2017, he shipped one of the first Kubernetes capacity optimization solutions on the market. He is a frequent speaker at industry conferences, including SREcon and CMG.
Daniele Zonca – Distinguished Engineer and Chief Architect | Red Hat
Daniele Zonca is a Distinguished Engineer and Chief Architect at Red Hat AI Engineering, leading the technical vision and strategy for AI offerings on Kubernetes. He co-authored the O’Reilly book “Generative AI on Kubernetes” and actively contributes to open-source optimization frameworks including vLLM, KServe, TrustyAI, and Kubeflow. Before Red Hat, he led the Big Data development team at UniCredit, designing large-scale analytical engines. He is dedicated to making AI optimized, safe, and reliable for enterprise deployments and is a frequent speaker at tech conferences.
Roland Huß – Distinguished Engineer and Architect | Red Hat
Roland Huß is a Distinguished Engineer and Architect at Red Hat with over 30 years of programming experience. He currently works as an architect within Red Hat AI (RHAI), where he focuses on developing a platform for securely running agentic applications. He is a co-author of “Kubernetes Patterns” and “Generative AI on Kubernetes” (both O’Reilly), sharing his extensive expertise in cloud-native architecture, AI integration, and serverless innovation.
Watch the recording
Missed it live? Fill out the form to watch the full recording of the webinar.