We’re running Spinnaker 1.11.4 on Kubernetes 1.10.11-gke.1, deployed with Halyard, using the V2 providers and the “HA setup”. We have about 100 Kubernetes v1 and v2 provider accounts in the same system.
Our clouddriver-caching pods are getting OOM killed roughly once an hour during working days.
I’ve been debugging the memory usage of clouddriver-caching for a while and noticed that a 2GB heap is enough for the Java process, but the pod still keeps getting OOM killed by Kubernetes.
I can see that there is one long-running Java process (the main process) and a varying number of kubectl commands and other Java processes. Occasionally the total memory usage goes over the 8GB limit when too many processes are running at the same time.
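For anyone who wants to reproduce the measurement, here is a rough sketch of how the per-process memory can be totalled inside the pod. It assumes Python 3 and a procps-style `ps` are available in the container, which may not be the case for your image. The function names `rss_by_command` and `total_rss_mib` are just illustrative. Summing RSS also counts shared pages more than once, so treat the total as an upper estimate rather than the exact figure the kernel uses for the OOM decision.

```python
import subprocess
from collections import defaultdict


def rss_by_command(ps_output):
    """Sum resident set size (KiB) per command name.

    Expects lines of the form "<rss_kib> <command>", as printed by
    `ps -e -o rss= -o comm=` (assumed: a procps-style ps).
    """
    totals = defaultdict(int)
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        totals[parts[1].strip()] += int(parts[0])
    return dict(totals)


def total_rss_mib(totals):
    """Convert the summed per-command RSS from KiB to MiB."""
    return sum(totals.values()) / 1024


if __name__ == "__main__":
    # Take one snapshot of all processes visible in the container.
    out = subprocess.run(
        ["ps", "-e", "-o", "rss=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    totals = rss_by_command(out)
    # Print the largest consumers first, then the overall total.
    for command, kib in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{command:20s} {kib / 1024:10.1f} MiB")
    print(f"{'TOTAL':20s} {total_rss_mib(totals):10.1f} MiB")
```

Running this in a loop (for example every few seconds) should show whether the spikes come from the short-lived kubectl processes or from the extra Java processes.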
I’ve tried increasing the replica count and giving the pods all the resources I can but nothing’s enough.
What can I do to limit clouddriver-caching memory usage? I’ve been going through the source code to see if there are any useful parameters I could set, but haven’t found anything so far.
Is anyone else experiencing this kind of behaviour? Any suggestions on what to try?
Thanks in advance for any help.