Canary and "foolproof" install


#1

I am installing Spinnaker with `kubectl apply -f https://spinnaker.io/downloads/kubernetes/quick-install.yml` on our GKE cluster.

Then I am enabling canary using the following guide: https://www.spinnaker.io/setup/canary/

  1. get into hal pod: kubectl exec $(kubectl get po -n spinnaker -l "stack=halyard" -o jsonpath="{.items[0].metadata.name}") -n spinnaker -c halyard-daemon -i -t -- bash -il

  2. run the canary config commands:

     hal config canary enable
     hal config canary edit --default-metrics-store prometheus
     hal config canary prometheus enable
     hal config canary prometheus account add prom --base-url=http://$MY_PROMETHEUS_IP:9090/
     hal config version edit --version canary-preview
     hal deploy apply

  3. set up port forwarding to deck (9080 -> 9000) and gate (8084); see the sketch after this list

  4. I get to see the new Canary menu on localhost:9080, yet when I try to save a configuration I get “There was an error saving your config: 400”
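For reference, the port forwarding in step 3 looks roughly like this on my side (the service names spin-deck / spin-gate are an assumption based on the usual Spinnaker naming; adjust if your quick-install names them differently):

```bash
# deck serves on 9000 inside the pod; expose it locally on 9080
kubectl -n spinnaker port-forward service/spin-deck 9080:9000 &
# gate serves on 8084
kubectl -n spinnaker port-forward service/spin-gate 8084:8084 &
```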

Debugging deeper, I see that:

  1. both POST (when I try to save) and GET (I guess the UI periodically tries to read the config) fail with 400 on http://localhost:9080/gate/v2/canaryConfig
  2. the gate log is full of HTTP 400 errors for http://spin-kayenta.spinnaker:8090/canaryConfig; to pull the actual error body, see the curl sketch below
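To actually see the 400 response body (Deck only shows the status code), something like this helps, assuming the gate port-forward from step 3 and the in-cluster service name from the gate log:

```bash
# ask gate for the canary config list directly and print the full response
curl -i http://localhost:8084/v2/canaryConfig

# or hit kayenta itself from inside the cluster with a one-off curl pod
kubectl -n spinnaker run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -i http://spin-kayenta.spinnaker:8090/canaryConfig
```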

Any idea how to make it save so I can try out the canary functionality?

(One caveat: I didn’t set up storage for Kayenta as per that link; I couldn’t figure out how to make it accept the Minio S3 interface that was set up by the quick-install.)


#2

Maybe the 400 error has nothing to do with storage. Maybe it is the actual metrics config? Here is how the UI looks:

And this is the kind of payload the POST to http://localhost:9080/gate/v2/canaryConfig carries (from Chrome):


#3

So I scrapped the above deployment, created a fresh one, and strictly followed the quick start.

The resulting canary part of the config (on the hal pod) looks like this (and I created /root/gcs-account.json with service account data)

However, now I have another issue. The Kayenta pod has the following error:
java.io.FileNotFoundException: /root/.hal/default/staging/dependencies/276027459-gcs-account.json (Permission denied)

Logging into the Kayenta pod, I see that the /root folder really is locked down to root:
drwx------ 1 root root 4096 Apr 16 18:52 root

Maybe it is related to the following commit in Kayenta, i.e. a spinnaker user is now created instead of running as root, so we can’t really mount dependencies under the /root folder anymore.
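A quick way to confirm which user the Kayenta container actually runs as (the label selector here is an assumption based on the spin-kayenta service name above; adjust if your labels differ):

```bash
# prints the uid/gid of the container user; something like uid=1000(spinnaker)
# would confirm it no longer runs as root
kubectl -n spinnaker exec \
  $(kubectl get po -n spinnaker -l "cluster=spin-kayenta" -o jsonpath="{.items[0].metadata.name}") \
  -- id
```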


#4

Have you resolved this issue?
I’m seeing a similar error.


#5

When you use Prometheus without GCS or AWS, there is no storage provider by default. For testing I configured in-memory storage (https://github.com/spinnaker/kayenta/blob/f704192fa8fb7beaf75d40c74ab85c680a404d33/kayenta-objectstore-memory/src/main/java/com/netflix/kayenta/memory/config/MemoryConfiguration.java).
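A minimal sketch of what that override can look like, assuming the property names from that configuration class (the account name is just a placeholder, and your default storage account then needs to point at it):

```yaml
# e.g. in a kayenta-local.yml Halyard profile; "in-memory-store" is a made-up name
kayenta:
  memory:
    enabled: true
    accounts:
    - name: in-memory-store
      supportedTypes:
      - OBJECT_STORE
      - CONFIGURATION_STORE
```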

This should solve the 400 issue you see on the UI as well.


#6

Yes. Version 1.7.0 was released a couple of days ago and it solved all those issues (a bunch of other services were failing too). Now it works, at least for me on GKE.


#7

Thanks! FYI, got it to work with AWS too.


#8

I have upgraded to Spinnaker version 1.7.1 but am still facing the same 400 Bad Request issue. Any idea?

Here is my canary hal config:

canary:
  serviceIntegrations:
  - name: prometheus
    enabled: true
    accounts:
  - name: aws
    enabled: true
    accounts:
    - name: my-cannary-aws-account
      bucket: spinnaker-debian-repo-khamru
      rootFolder: kayenta
      supportedTypes: []
      s3Enabled: true
  reduxLoggerEnabled: true
  defaultMetricsAccount: prometheus
  defaultStorageAccount: my-cannary-aws-account
  defaultJudge: NetflixACAJudge-v1.0
  defaultMetricsStore: prometheus
  stagesEnabled: true
  templatesEnabled: true
  showAllConfigsEnabled: true

Here is the command I ran to add AWS as the default storage account for canary:

$ hal config canary edit --default-storage-account my-cannary-aws-account --no-validate


#11

Did you not configure a canary metrics account? The command is:
hal config canary edit --default-metrics-account my-prom-canary-account

=== The config will then be:

canary:
  enabled: true

  reduxLoggerEnabled: true
  defaultMetricsAccount: my-prom-canary-account
  defaultStorageAccount: my-cannary-aws-account
  defaultJudge: NetflixACAJudge-v1.0
  defaultMetricsStore: prometheus
  stagesEnabled: true
  templatesEnabled: true
  showAllConfigsEnabled: true

==== Your current config is:

canary:
  enabled: true

  reduxLoggerEnabled: true
  defaultMetricsAccount: prometheus
  defaultStorageAccount: my-cannary-aws-account
  defaultJudge: NetflixACAJudge-v1.0
  defaultMetricsStore: prometheus
  stagesEnabled: true
  templatesEnabled: true
  showAllConfigsEnabled: true


#12

I have updated the defaultMetricsAccount as you mentioned. I’m receiving the logs below along with the 400 Bad Request.

May 02 15:23:49 ip-172-50-1-161 kayenta[6214]: 2018-05-02 15:23:49.353 ERROR 6214 — [ scheduler-2] c.n.k.index.CanaryConfigIndexingAgent : Problem indexing account my-prom-canary-account:
May 02 15:23:49 ip-172-50-1-161 kayenta[6214]: java.util.NoSuchElementException: No value present
May 02 15:23:49 ip-172-50-1-161 kayenta[6214]: at java.util.Optional.get(Optional.java:135) ~[na:1.8.0_162]
May 02 15:23:49 ip-172-50-1-161 kayenta[6214]: at com.netflix.kayenta.index.CanaryConfigIndexingAgent.indexCanaryConfigs(CanaryConfigIndexingAgent.java:128) ~[kayenta-core-0.1.0-20180420132832.jar:0.1.0-20180420132832]
May 02 15:23:49 ip-172-50-1-161 kayenta[6214]: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_162]
May 02 15:23:49 ip-172-50-1-161 kayenta[6214]: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_162]
May 02 15:23:49 ip-172-50-1-161 kayenta[6214]: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_162]
May 02 15:23:49 ip-172-50-1-161 kayenta[6214]: at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_162]
May 02 15:23:49 ip-172-50-1-161 kayenta[6214]: at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:65) [spring-context-4.3.14.RELEASE.jar:4.3.14.RELEASE]
May 02 15:23:49 ip-172-50-1-161 kayenta[6214]: at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) [spring-context-4.3.14.RELEASE.jar:4.3.14.RELEASE]
May 02 15:23:49 ip-172-50-1-161 kayenta[6214]: at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_162]
May 02 15:23:49 ip-172-50-1-161 kayenta[6214]: at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_162]
May 02 15:23:49 ip-172-50-1-161 kayenta[6214]: at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_162]
May 02 15:23:49 ip-172-50-1-161 kayenta[6214]: at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_162]
May 02 15:23:49 ip-172-50-1-161 kayenta[6214]: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_162]
May 02 15:23:49 ip-172-50-1-161 kayenta[6214]: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_162]
May 02 15:23:49 ip-172-50-1-161 kayenta[6214]: at java.lang.Thread.run(Thread.java:748) [na:1.8.0_162]
May 02 15:23:49 ip-172-50-1-161 kayenta[6214]: 2018-05-02 15:23:49.369 INFO 6214 — [ scheduler-2] c.n.k.index.CanaryConfigIndexingAgent : Re-indexed canary configs in PT4.406S.
May 02 15:23:50 ip-172-50-1-161 kayenta[6214]: 2018-05-02 15:23:50.064 INFO 6214 — [ scheduler-9] c.n.k.o.controllers.PipelineController : Health indicators are still reporting DOWN; not starting orca queue processing yet: DOWN {redisHealth=UP {}, canaryConfigIndexingAgent=DOWN {existingByApplicationIndexCount=1, expectedByApplicationIndexCount=2, cyclesInitiated=1, cyclesCompleted=1}, diskSpace=UP {total=83204141056, free=67027787776, threshold=10485760}}
May 02 15:23:55 ip-172-50-1-161 kayenta[6214]: 2018-05-02 15:23:55.113 INFO 6214 — [ scheduler-7] c.n.k.o.controllers.PipelineController : Health indicators are still reporting DOWN; not starting orca queue processing yet: DOWN {redisHealth=UP {}, canaryConfigIndexingAgent=DOWN {existingByApplicationIndexCount=1, expectedByApplicationIndexCount=2, cyclesInitiated=1, cyclesCompleted=1}, diskSpace=UP {total=83204141056, free=67027783680, threshold=10485760}}
May 02 15:24:00 ip-172-50-1-161 kayenta[6214]: 2018-05-02 15:24:00.177 INFO 6214 — [ scheduler-3] c.n.k.o.controllers.PipelineController : Health indicators are still reporting DOWN; not starting orca queue processing yet: DOWN {redisHealth=UP {}, canaryConfigIndexingAgent=DOWN {existingByApplicationIndexCount=1, expectedByApplicationIndexCount=2, cyclesInitiated=1, cyclesCompleted=1}, diskSpace=UP {total=83204141056, free=67027771392, threshold=10485760}}


#13

Sorry, I don’t know either. I have not successfully configured canary. I was told on Slack that you must use AWS or Google to store the canary config, but from my country I cannot access AWS or Google. :sweat:


#14

any update on this?


#15

After a lot of code reading I got minio to work!

aws:
  enabled: true
  accounts:
  - name: minio-s3
    bucket: spinnaker
    rootFolder: kayenta
    endpoint: http://minio-service:9000/
    accessKeyId: <azure-storage-account-name>
    secretAccessKey: <key>
    supportedTypes:
    - CONFIGURATION_STORE
    - OBJECT_STORE

s3:
  enabled: true

Additionally, you need to place your accessKeyId and secretAccessKey in the AWS credentials file:
/root/.aws/credentials

The format for that file is:
[default]
aws_access_key_id = <key_id>
aws_secret_access_key = <secret_key>
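In case it helps anyone: one place a block like this can live is a Halyard custom profile for Kayenta, which gets merged in on hal deploy apply (the path below is the standard Halyard location; adjust to your setup):

```yaml
# ~/.hal/default/profiles/kayenta-local.yml  (Halyard custom profile)
kayenta:
  aws:
    enabled: true
    accounts:
    - name: minio-s3
      # ... same account fields as above ...
  s3:
    enabled: true
```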


#16

My hal version is 1.7.5.
I cannot use your config; fields such as endpoint and accessKeyId are not accepted.


#17

I have resolved this issue.


#18

If you are using GCS, make sure your Google service account has storage.buckets.get access and write access to the existing bucket.

I’m having a similar issue in Spinnaker 1.7.6.
The Spinnaker UI only gives the “There was an error saving your config: 400” message. I then tried using Kayenta’s Swagger docs and got this response:

```json
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "gcs-service-account@project.iam.gserviceaccount.com does not have storage.buckets.get access to spinnaker.",
    "reason" : "forbidden"
  } ],
  "message" : "gcs-service-account@project.iam.gserviceaccount.com does not have storage.buckets.get access to spinnaker."
}
```
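Granting the service account object access plus bucket-metadata read on the existing bucket should clear that error; a rough sketch using the names from the error above (these roles are one option, not the only one):

```bash
# object read/write on the existing bucket
gsutil iam ch \
  serviceAccount:gcs-service-account@project.iam.gserviceaccount.com:roles/storage.objectAdmin \
  gs://spinnaker
# storage.buckets.get comes with the legacy bucket roles (or roles/storage.admin)
gsutil iam ch \
  serviceAccount:gcs-service-account@project.iam.gserviceaccount.com:roles/storage.legacyBucketWriter \
  gs://spinnaker
```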

#19

I must say thank you for saving my life with these hints; it finally works for me on Minio after using your config. One more hint for anyone who has trouble with this on newer Kayenta releases: if placing the key file under /root/.aws doesn’t work, try putting it into your user’s folder, e.g. /home/spinnaker/.aws.
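For what it’s worth, the layout in that case would look roughly like this (paths assumed; newer Kayenta images run as the spinnaker user rather than root):

```bash
# create the AWS credentials file under the spinnaker user's home directory
mkdir -p /home/spinnaker/.aws
cat > /home/spinnaker/.aws/credentials <<'EOF'
[default]
aws_access_key_id = <key_id>
aws_secret_access_key = <secret_key>
EOF
chown -R spinnaker:spinnaker /home/spinnaker/.aws
```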