Wait for Manifest to Stabilize in Deploy Manifest takes a long time

Hi all, I’m currently using Spinnaker to deploy a Helm chart to Kubernetes, and I’m noticing that the “Deploy Manifest” stage spends the vast majority of its time on “Wait for Manifest to Stabilize”. What exactly is this stage doing? This only seems to happen when I’m redeploying to an existing namespace with a “long” history; deploying to a fresh namespace does not have this issue. Is there any way to reduce the time Spinnaker takes in this stage?

Thanks in advance!

EDIT: I’m also using the “liveManifestsCall” option for my Kubernetes account.

Exactly the same on our side: 12+ minutes per deployment. Upgrading to v1.12.5 didn’t solve the issue.

What type of manifest are you deploying? That stage waits for Kubernetes to report that the manifest is stable; the exact semantics depend on the type of manifest. For a Deployment, for example, it waits until the status.conditions reported by Kubernetes show that the Available condition is true (reason MinimumReplicasAvailable).
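For anyone who wants to see what that condition looks like in the raw status, here is a rough Python sketch. It is not Spinnaker’s actual code: the function name and the sample payloads are made up for illustration, and it only looks at the Available condition mentioned above, assuming the status is shaped like the JSON that Kubernetes returns for a Deployment.

```python
from typing import Any, Dict


def deployment_is_available(deployment: Dict[str, Any]) -> bool:
    """Return True if the Deployment reports Available=True
    with reason MinimumReplicasAvailable (hypothetical helper)."""
    conditions = deployment.get("status", {}).get("conditions", [])
    for condition in conditions:
        if condition.get("type") == "Available":
            return (
                condition.get("status") == "True"
                and condition.get("reason") == "MinimumReplicasAvailable"
            )
    # No Available condition reported yet: treat as not stable.
    return False


# Hypothetical status payloads, shaped like a Deployment's JSON status.
rolling_out = {"status": {"conditions": [
    {"type": "Available", "status": "False",
     "reason": "MinimumReplicasUnavailable"},
]}}
settled = {"status": {"conditions": [
    {"type": "Progressing", "status": "True",
     "reason": "NewReplicaSetAvailable"},
    {"type": "Available", "status": "True",
     "reason": "MinimumReplicasAvailable"},
]}}

print(deployment_is_available(rolling_out))  # False
print(deployment_is_available(settled))      # True
```

A real stability check would likely also compare replica counts and the observed generation; this sketch leaves those out on purpose.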

In general there’s some lag between when Kubernetes reports the update and when Spinnaker’s cache picks it up. If you’re using the liveManifestCalls option, though, the cache is skipped, so you should see updates as soon as Kubernetes reports them. (Note that you spelled the parameter wrong in your post above; I just wanted to check that it was only a typo and that you have it configured correctly.)

This is helpful, thanks @ezimanyi. For others: the liveManifestCalls option skips the 12-minute Spinnaker cache refresh delay. Since the fix and explanation weren’t easy to find by searching Google, below are the steps to enable this Spinnaker feature, introduced in version 1.12.x.

Edit .hal/config and add the liveManifestCalls property to your Kubernetes account:

- name: default
  providers:
    kubernetes:
      accounts:
      - name: ...
        liveManifestCalls: true

Make sure you pull the latest Halyard Docker image so that you are running a Halyard version that supports this option, then apply the change:

hal deploy apply

With Spinnaker 1.17.0 on GKE, the “Force Cache Refresh” step was running for roughly 5 to 12 minutes, and the “Wait For Manifest To Stabilize” step would often fail with the error ‘WaitForManifestStableTask of stage {stage name} timed out after 30 minutes’.

This solution worked like a charm: deployment time is down to ~5 minutes, compared to ~40 minutes earlier (when it managed to succeed at all). Thanks @ezimanyi & @Thomas_Nalevajko!