Component Sizing with Halyard


#1

Hi,

I had some trouble deploying Spinnaker to Kubernetes because the host Node ran out of memory (the OOM killer and Kubernetes are not a happy combination!). I have configured my cluster to auto-scale based on resource requests/limits, but it turns out Halyard doesn’t set any of these by default (https://github.com/spinnaker/spinnaker/issues/1311), so after a little while of seemingly healthy operation the Node eventually fell over in a heap.

So I looked for a way to specify resource requests and limits, and found a link to the custom-sizing documentation in the above issue report.

“This is great!” I thought. “I can experiment with various values until I get it right.” So I edited ~/.hal/config, applied appropriate requests and limits for each Spinnaker component exactly as described in the doc, and ran ‘hal deploy apply’. But when I looked at the resulting ReplicaSets, none of them had any resources set, so I’ve ended up in the same “creaking bridge” situation as before.
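
The lack of settings is easy to confirm straight from kubectl. For example (assuming Halyard deployed everything into the default ‘spinnaker’ namespace), something like

  kubectl -n spinnaker describe rs | grep -E 'Name:|Requests:|Limits:'

prints only the Name: lines when no Requests or Limits have been applied to the containers.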

Does anyone know how to apply the config changes as specified in that document? I’ve read the docs many, many times looking for clues but am none the wiser.

Can anyone help?

Thanks,

Simon.


#2

Do you mind pasting the requests you have in your ~/.hal/config file? I’ve set resource requests via (for example)

  deploymentEnvironment:
    customSizing:
      spin-clouddriver:
        requests:
          memory: 1024Mi
          cpu: 150m

on GKE and it’s working for me.


#3

Oh! I found the problem. There was already a customSizing attribute defined in halconfig like this:

customSizing: {}

I just added my own and didn’t remove that one, so my values were overridden. All fixed now!
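
In other words, the relevant part of ~/.hal/config should end up with a single customSizing key under the deployment. A sketch, reusing the example values from the reply above:

  deploymentEnvironment:
    customSizing:
      spin-clouddriver:
        requests:
          memory: 1024Mi
          cpu: 150m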

Incidentally, I’m now getting OOM errors, so my requests and limits clearly aren’t quite right yet. Would you be happy to share the values that work for you, so I can use them as a starting point?

Thanks!

Simon.


#4

I’m happy to share what I have, but these are just numbers I’ve arrived at by looking at the resource usage of the cluster I use for development. They should be sufficient for a single-user development set-up, but will likely be significantly under-resourced for any actual production use:

spin-clouddriver:
  requests:
    memory: 1200Mi
    cpu: 450m
spin-deck:
  requests:
    memory: 64Mi
    cpu: 50m
spin-echo:
  requests:
    memory: 1000Mi
    cpu: 100m
spin-front50:
  requests:
    memory: 650Mi
    cpu: 100m
spin-gate:
  requests:
    memory: 650Mi
    cpu: 100m
spin-igor:
  requests:
    memory: 650Mi
    cpu: 100m
spin-orca:
  requests:
    memory: 1000Mi
    cpu: 200m
spin-redis:
  requests:
    memory: 256Mi
    cpu: 800m
spin-rosco:
  requests:
    memory: 650Mi
    cpu: 100m
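
(For anyone copying these: they go under deploymentEnvironment.customSizing in ~/.hal/config, the same as the snippet further up the thread, and take effect on the next ‘hal deploy apply’.)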

#5

Thanks!

After a lot of experimentation (and with help from Heapster), I found I needed the following settings just to allow everything to start up without being killed:

customSizing:
  spin-front50:
    limits:
      cpu: 250m
      memory: 2Gi
    requests:
      cpu: 100m
      memory: 2Gi
  spin-clouddriver:
    limits:
      cpu: 1
      memory: 2Gi
    requests:
      cpu: 250m
      memory: 2Gi
  spin-clouddriver-bootstrap:
    limits:
      cpu: 1
      memory: 2Gi
    requests:
      cpu: 250m
      memory: 2Gi
  spin-redis:
    limits:
      cpu: 250m
      memory: 128Mi
    requests:
      cpu: 100m
      memory: 128Mi
  spin-deck:
    limits:
      cpu: 250m
      memory: 64Mi
    requests:
      cpu: 100m
      memory: 64Mi
  spin-gate:
    limits:
      cpu: 250m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 1Gi
  spin-orca-bootstrap:
    limits:
      cpu: 1
      memory: 2Gi
    requests:
      cpu: 250m
      memory: 2Gi
  spin-igor:
    limits:
      cpu: 250m
      memory: 768Mi
    requests:
      cpu: 100m
      memory: 768Mi
  spin-rosco:
    limits:
      cpu: 250m
      memory: 768Mi
    requests:
      cpu: 100m
      memory: 768Mi
  spin-echo:
    limits:
      cpu: 250m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 1Gi
  spin-orca:
    limits:
      cpu: 1
      memory: 2Gi
    requests:
      cpu: 250m
      memory: 2Gi

The docs said not to bother setting resources for the orca and clouddriver bootstrap pods, but I found this was necessary because they consume around 1-2GiB each.

I expect the above will need to be tweaked once I actually start using Spinnaker, but hopefully someone else will find it a helpful starting point.
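
If it helps, the quickest way to compare numbers like these against real usage (this is what Heapster was giving me; newer clusters get the same data from metrics-server) is something like:

  kubectl top pods -n spinnaker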

Simon.