Efficient Way of Cleaning up AWS CloudFormation Stacks

A bit of context

There are many cases where we want to clean up CloudFormation stacks. A few of them are:

  • Deleting stacks created by developers in the process of learning new stuff
  • Reducing the cost generated by unused experimental stacks
  • Practicing DR by periodically deleting/creating test environments

Deleting a large number of stacks manually is a time-consuming task. If the stack we are trying to delete has dependent stacks, we need to delete those first. Doing this by hand for an environment/platform consisting of a large set of stacks is not practical.

In my company, we have been following a process of creating ephemeral dev/testing environments. A single environment has more than 500 stacks and the number keeps growing daily. The requirement was to clean up the non-production environment and experimental stuff from the AWS account periodically. We also needed to exclude the core infrastructure components from the clean-up so that the base AWS setup would stay intact and we could just spin up a new environment on top of it.

First, we tried the AWS-NUKE library, which is meant for cleaning a whole AWS account, not just CloudFormation, and covers all regions. It served our purpose for some time. Later, we introduced custom CloudFormation resources that were linked across different AWS accounts and had to be deleted gracefully, so the brute-force strategy used by this library no longer worked for us.

Solution

First, we prepared a dependency tree of all stacks, which gave us more context on how to approach the teardown. We found that many stacks were tightly coupled, and the deletion order had to respect those dependencies. We went for a hybrid solution: a custom script tears down our non-production environment stacks first, and then AWS-NUKE deletes the residual and experimental stuff in the account.
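To make the dependency tree concrete, here is a minimal sketch in Python. It assumes the export and import data has already been collected (for example with CloudFormation's ListExports and ListImports API calls) into two simple structures. The function name build_importers, the input shapes, and the stack names are all made up for illustration; this is not the code we actually ran.

```python
def build_importers(exports, imports_by_export):
    """Map each stack to the set of stacks that import one of its exports.

    exports: iterable of (exporting_stack, export_name) pairs.
    imports_by_export: dict of export_name -> list of importing stack names.
    Both shapes are illustrative assumptions, not the format of any tool.
    """
    importers = {}
    for stack, export_name in exports:
        # Register the exporting stack even if nothing imports from it.
        dependents = importers.setdefault(stack, set())
        for importer in imports_by_export.get(export_name, []):
            # Importing stacks are nodes in the tree as well.
            importers.setdefault(importer, set())
            dependents.add(importer)
    return importers


# Hypothetical staging stacks: the DB and API import from the VPC stack,
# and the API also imports from the DB stack.
exports = [
    ("staging-vpc", "staging-vpc-id"),
    ("staging-db", "staging-db-endpoint"),
    ("staging-api", "staging-api-url"),
]
imports_by_export = {
    "staging-vpc-id": ["staging-db", "staging-api"],
    "staging-db-endpoint": ["staging-api"],
}
importers = build_importers(exports, imports_by_export)
for stack in sorted(importers):
    print(stack, "is imported by", sorted(importers[stack]))
```

A stack whose set is empty has no importers, so nothing blocks its deletion.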

Stack Teardown Strategy

The strategy is simple. We start deleting stacks from the leaf nodes (stacks with zero dependent stacks), which ultimately frees up their parent stacks and makes them eligible for deletion, as shown in the GIF below:

stack delete demo.gif
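The same idea can be sketched offline as a planning step. Given a map of each stack to the stacks that import from it, the hypothetical deletion_waves function below groups stacks into waves: every stack in a wave can be deleted in parallel once the previous wave is gone. The function name and stack names are assumptions for illustration only.

```python
def deletion_waves(importers):
    """Group stacks into leaf-first deletion waves.

    importers: dict of stack name -> set of stacks that import from it.
    Returns a list of waves; each wave is a sorted list of stack names.
    """
    # Copy the sets so the caller's data is not modified.
    remaining = {stack: set(deps) for stack, deps in importers.items()}
    waves = []
    while remaining:
        # Leaves are stacks that no remaining stack imports from.
        leaves = sorted(s for s, deps in remaining.items() if not deps)
        if not leaves:
            # Happens on a cycle, or when an importer is not in the map.
            raise ValueError(f"no deletable stack among: {sorted(remaining)}")
        waves.append(leaves)
        for leaf in leaves:
            del remaining[leaf]
        # Removing the leaves may turn their parents into new leaves.
        for deps in remaining.values():
            deps.difference_update(leaves)
    return waves


# Hypothetical example data.
plan = deletion_waves({
    "staging-vpc": {"staging-db", "staging-api"},
    "staging-db": {"staging-api"},
    "staging-api": set(),
    "staging-queue": set(),
})
for number, wave in enumerate(plan, start=1):
    print(f"wave {number}: {wave}")
```

In this example the API and queue stacks go first, then the DB stack, and the VPC stack goes last.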

To clean up all the stacks that make up the "staging" environment, we took these steps:

  1. Scan all stacks with names starting with staging-.
  2. Scan the imports/exports of all stacks and prepare a dependency tree. Now we know which stacks depend on which.
  3. Send delete requests to all stacks which have no dependent stacks or importers, i.e., delete the leaf nodes as shown in the above GIF.
  4. Wait for some time to allow the stacks to get deleted.
  5. Check the status of the stacks that were in DELETE_IN_PROGRESS and mark the ones that have finished as deleted.
  6. Remove the deleted stacks from the importer lists of the remaining stacks, so that new stacks become eligible for deletion now that the stacks depending on them are gone.
  7. Repeat this process until all stacks are deleted.

Benefits of this approach

  • Fast teardown, as we do not waste time sending delete requests to ineligible stacks and waiting on them.
  • No brute force, so fewer errors.

CFN-Teardown CLI

I have released an open-source CLI that follows the above deletion strategy and adds some extra features.

Check it out and share your thoughts. Feel free to open an issue or contribute. github.com/nirdosh17/cfn-teardown
