Also, Mercari has embraced a microservices architecture, currently managing over 1000 Deployments, each with its dedicated development team.
To effectively drive FinOps across such a sprawling landscape, it's clear that the platform team cannot individually optimize all services. As a result, they provide a plethora of tools and guidelines to simplify Kubernetes optimization for service owners.
But even with these tools and guidelines, manually optimizing various parameters across different resources, such as resource requests/limits, HPA parameters, and Golang runtime environment variables, presents a substantial challenge.
Furthermore, this optimization demands constant engineering effort from each team: adjustments are necessary whenever a change affects resource usage, which happens frequently. Implementation changes can alter resource consumption patterns, traffic volume fluctuates, and so on.
Therefore, to keep our Kubernetes clusters optimized, it would necessitate mandating all teams to perpetually engage in complex manual optimization processes indefinitely, or until Mercari goes out of business.
To address these challenges, the platform team has embarked on developing Tortoise, an automated solution designed to meet all Kubernetes resource optimization needs.
This approach shifts the optimization responsibility from service owners to the platform team (Tortoises), allowing the platform team to tune comprehensively so that every Tortoise in the cluster adapts to its workload. Service owners, in turn, only need to configure a minimal number of parameters to start autoscaling with Tortoise, which significantly simplifies their involvement.
It happens shockingly often that applications only support running as a single replica, and it is even worse when those applications cannot run concurrently with replicas of themselves, which prevents smooth rolling updates.
IME, if applications tolerate restarts or support concurrent replicas, then scaling up and down to meet demand is absolutely fine.
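For context on what "scaling up and down to meet demand" means mechanically, here is a minimal sketch of the replica calculation described in the Kubernetes HPA documentation (the ratio of current to target metric, with a default tolerance of 0.1). The function name and parameters are mine, chosen for illustration; it leaves out min/max replica bounds, stabilization windows, and pod readiness handling, so treat it as a rough model rather than what the controller actually runs.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Approximate the HPA rule: ceil(currentReplicas * current / target).

    Hypothetical helper for illustration only; the real controller also
    applies min/max replicas and scale-down stabilization.
    """
    ratio = current_metric / target_metric
    # Within the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Example: 4 replicas averaging 90% CPU against a 60% target scale to 6.
print(desired_replicas(4, 90.0, 60.0))
```

The point is that the result depends entirely on the target value and on how the workload's usage tracks traffic, which is why a target that was right last month can be wrong after a code change.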
Also this reads like a cry for help:
> Therefore, to keep our Kubernetes clusters optimized, it would necessitate mandating all teams to perpetually engage in complex manual optimization processes indefinitely, or until Mercari goes out of business.