Gate API Internal Server Error! Attempt to Update Pipelines through Gate API with Authorization on


#1

I’ve been running a script to generate and upload pipelines via Gate’s /pipelines API. I recently updated the script to include the Authorization header and re-ran it to verify that it worked, but I got the following response:

{
	"error": "Internal Server Error",
	"exception": "org.springframework.dao.InvalidDataAccessApiUsageException",
	"message": "MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error.; nested exception is redis.clients.jedis.exceptions.JedisDataException: MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error.",
	"status": 500,
	"timestamp": 1527879670908
}

I tried accessing the UI but am now also getting this message when the redirection happens during our authentication process.

I then SSH’d into the box and tried to run a hal deploy apply (with sudo) and got:

+ Get current deployment
  Success
+ Prep deployment
  Success
Problems in ecom:
- WARNING Version "master-latest-unvalidated" is not a released
  (validated) version of Spinnaker.
? Options include:
  - 1.5.4
  - 1.6.1
  - 1.7.6

Problems in halconfig:
- WARNING There is a newer version of Halyard available (1.2.0),
  please update when possible
? Run 'sudo apt-get update && sudo apt-get install
  spinnaker-halyard -y' to upgrade

+ Preparation complete... deploying Spinnaker
+ Get current deployment
  Success
- Apply deployment
^ Apply deployment
* Apply deployment
_ Apply deployment
- Apply deployment
^ Apply deployment
- Apply deployment
  Failure
Problems in Global:
! ERROR Failed to write config for profile install_packages.sh:
  /tmp/ff67eac8-62b4-45bf-9367-0e9fb6f9d9f0: No space left on device

- This mission is too important for me to allow you to jeopardize
  it.
- Failed to deploy Spinnaker.

I’ve looked at the redis logs as well and see the following output:

A  5776:M 01 Jun 19:36:21.058 * Background saving started by pid 30595 
A  30595:C 01 Jun 19:36:21.063 # Failed opening the RDB file dump.rdb (in server root dir /var/lib/redis) for saving: No space left on device 
A  5776:M 01 Jun 19:36:21.158 # Background saving error 

Is this somehow specific to my instance? I took a snapshot of our startup disk and it’s showing that it still has plenty of space left.

I included the full log above for completeness, and also to note that the version I’m on is currently master-latest-unvalidated, because of some other bugs in Authorization that have been fixed since the 1.7.6 release (namely https://github.com/spinnaker/fiat/pull/225).

I can probably just restart the instance, and hopefully that would fix things, but I would love to first understand what happened and get some guidance on how to fix it.


#2

Are you running Redis on the same machine you are running Halyard on? It looks from the above messages like both Redis and Halyard are unable to write to disk due to space constraints.

The size of a disk snapshot won’t be a reliable indicator of how full the disk is—snapshots are both incremental and compressed so in general a snapshot will be smaller than the amount of space in use on a disk. I’d suggest running df -h to see how full your disk is—if it is full then resizing it should solve the issue, but it may also be a good idea to split some things to a different instance/disk (such as Redis) depending on what’s running there. I’m happy to help dig into this further and look into next steps.
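If it helps to script the check, here is a minimal Python sketch that reports roughly what df -h shows for the two directories named in the errors above (/tmp from the Halyard failure and /var/lib/redis from the Redis log). The helper names and the 90% warning threshold are my own assumptions, and the percentage can differ slightly from df's because df excludes blocks reserved for root.

```python
import os
import shutil

# Directories taken from the error messages earlier in this thread.
PATHS = ["/", "/tmp", "/var/lib/redis"]


def usage_percent(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # some pseudo-filesystems report a size of zero
    return 100.0 * usage.used / usage.total


def report(paths=PATHS, warn_at=90.0):
    """Return (path, percent_used, is_nearly_full) for each path that exists."""
    rows = []
    for path in paths:
        if not os.path.exists(path):
            continue  # e.g. Redis is not installed on this machine
        pct = usage_percent(path)
        rows.append((path, pct, pct >= warn_at))
    return rows


if __name__ == "__main__":
    for path, pct, nearly_full in report():
        flag = "  <-- nearly full" if nearly_full else ""
        print(f"{path:<20} {pct:5.1f}% used{flag}")
```

If the disk turns out to have free space, it may also be worth checking inode usage with df -i, since running out of inodes produces the same "No space left on device" error.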


#3

We are currently running Halyard and Redis on the same instance. We’re planning to move to GKE sometime in the near future. I am using a default 10GB startup disk and had not made any snapshots previously. Unfortunately I had to restart the instance to get it working again and cannot run that command at this time. Will keep it in mind if I run into the issue again. Is a different sized startup disk recommended?


#4

I’m less concerned about the disk space available to Halyard, as it’s fairly lightweight, but I do think that the 10 GB disk (and perhaps also the memory on the machine you’re using) might be a bottleneck for Redis. It looks from the error message above like Redis ran out of disk space trying to snapshot its database to disk.
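One way to confirm this the next time it happens is to look at the output of redis-cli INFO persistence, which includes an rdb_last_bgsave_status field. Below is a rough Python sketch that parses that text and flags the situation described in the MISCONF message. The function names and the sample values are made up for illustration (they are not taken from your instance), and the check is a simplification of what Redis actually does.

```python
def parse_info(text):
    """Parse the `key:value` lines printed by `redis-cli INFO` into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Persistence"
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def writes_blocked_by_failed_snapshot(info, stop_writes_on_bgsave_error=True):
    """Simplified check: Redis rejects writes when the last background save
    failed and the stop-writes-on-bgsave-error option is enabled."""
    return stop_writes_on_bgsave_error and info.get("rdb_last_bgsave_status") == "err"


# Illustrative sample only; real output contains more fields.
SAMPLE = """\
# Persistence
loading:0
rdb_changes_since_last_save:1523
rdb_bgsave_in_progress:0
rdb_last_bgsave_status:err
"""

if __name__ == "__main__":
    info = parse_info(SAMPLE)
    print("last bgsave status:", info.get("rdb_last_bgsave_status"))
    print("writes blocked:", writes_blocked_by_failed_snapshot(info))
```

Once a background save succeeds again (for example after freeing disk space), the status should go back to ok and Redis should accept writes again.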

I don’t have enough experience to say offhand what an ideal disk size/memory for a Redis machine supporting Spinnaker should be (and of course it will depend on the workload, etc.), but a 10GB disk seems small given the overhead of the OS, etc. Others may have a more precise recommendation, but my rough guess would be to start with somewhere around a 50 GB disk and 16 GB of RAM, then monitor how that goes in the actual workload you supply to it.

It’s also worth noting that by default Spinnaker keeps all of your execution history in the Redis cache, so the memory footprint only grows with time. There is some info here on how to configure Spinnaker to delete old executions to keep the memory usage under control:

Also, to make sure I understand your setup: you said Halyard and Redis are running on the same machine. Is that also the machine that all of the other microservices are running on? If so, then they are likely eating into the memory and disk space as well, so my recommendations above should be increased. (Of course this would probably be temporary, as you noted you’ll be moving to Kubernetes.)