We heard a lot of buzz around spot instances at re:Invent in terms of cost optimization, as discussed in our previous post. In this post, we will talk about running your stack on spot instances. It’s no secret that spot instances are about 80~90% cheaper than On-Demand instances. The downside is that the spot market fluctuates with supply and demand, so instances are terminated from time to time. How does our stack cope with those interruptions? We are going to write a three-part series to share our journey.
Let’s take one simple case as an example. It’s a traditional Spring application + ActiveMQ + worker with a front end and a database. We used to host everything on EC2 instances, with the database on RDS alongside other AWS services, as shown in Picture 1.
The stack is a pretty common architecture. Everything was running multi-zone to improve availability. Autoscaling groups can handle any web traffic spike. It has a few drawbacks, though, if we’d like to take cost optimization to the next level.
Three major reasons drove us to decouple the stack into a microservice architecture. One: we can split core services from minor ones. Then we can focus on high availability (HA) for the core services and take minor services offline or online whenever they are needed. Two: all containers on the same host can share computing resources while they are still isolated at the OS level. Three: some components can be replaced by AWS managed services that are much cheaper. There are lots of resources on the internet about dockerizing an application, so we will leave the details to your imagination. Here is the architecture diagram after refactoring (Picture 2).
In addition to containerizing our app, we also replaced some components with AWS managed services. We will talk about them later; it turns out to be a good way not only to save cost but also to reduce the complexity of your stack.
For a quick demo, you may check out the sample Docker files on GitHub. Just follow the README, and you should get your local version working after running docker compose -f docker-compose-blog.yml up
Now that you have your local environment up and running, it’s time to move on to AWS. To save you time, we have done all the automation work in CloudFormation. You can do the same to cut the operational cost of repetitive cloud work tremendously. Check out our CloudFormation template here. Let’s talk about a few cost-saving takeaways from re:Invent 2016 before provisioning the stack.
Instance flexibility is the key in the spot market. In Picture 1, our instance requirement is more or less fixed. We don’t want to go for anything less than m4.xlarge due to computing/bandwidth constraints, or anything more than m4.xlarge due to cost concerns, so we purchased RIs for all of them in the first year. After we converted everything to Docker images hosted on ECS, we can request as many instance types (bigger than xlarge) as we can on the spot market, from any generation or family such as m1, m3, or c4. The stack is only minimally affected when a particular instance type fluctuates in the spot market. Here is the reference from re:Invent talking about instance flexibility.
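To make the idea of instance flexibility concrete, here is a minimal sketch that filters a list of candidate instance types down to the ones that meet a size floor, regardless of family or generation. The function name, the size ordering table, and the inclusive xlarge floor are assumptions made for illustration; they are not taken from our template.

```python
# Hypothetical ordering of EC2 size suffixes, smallest first.
SIZE_ORDER = ["large", "xlarge", "2xlarge", "4xlarge", "8xlarge"]


def eligible_types(candidates, min_size="xlarge"):
    """Keep instance types whose size suffix is at least min_size.

    The family (m1, m3, c4, ...) is ignored on purpose: being flexible
    about family and generation is the point of the exercise.
    """
    floor = SIZE_ORDER.index(min_size)
    keep = []
    for name in candidates:
        _family, _, size = name.partition(".")
        if size in SIZE_ORDER and SIZE_ORDER.index(size) >= floor:
            keep.append(name)
    return keep


# Illustrative candidate list, not a recommendation.
candidates = ["m4.large", "m4.xlarge", "m3.2xlarge", "c4.4xlarge", "m1.xlarge"]
print(eligible_types(candidates))
```

The wider the resulting list, the less the stack depends on the price of any single instance type.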
As we all know, spot instances are 80%~90% cheaper than regular On-Demand instances. Our stack has full flexibility on instance type. As a matter of fact, you can request most spot instance types in the market to join the ECS cluster through your autoscaling group. It’s actually very easy to do in CloudFormation. You just need to add one line to the launch configuration:
SpotPrice: !Ref 'ECSSpotPrice'
Then your entire stack is running on top of spot instances. The autoscaling group will automatically bid on spot instances on your behalf. In the normal case, you don’t have to agonize over the spot bid price. If your bid is above the market price, you only pay the prevailing spot price.
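The billing rule above can be sketched in a few lines. This is a simplified model under stated assumptions: prices are sampled once per hour, the instance is reclaimed as soon as the market price rises above the bid, and the function names and sample prices are hypothetical.

```python
def hourly_charge(bid, market_price):
    """Price paid for one hour, or None if the bid is below the market price."""
    if bid >= market_price:
        # A bid above the market does not cost extra: you pay the market price.
        return market_price
    return None


def total_cost(bid, hourly_market_prices):
    """Sum the charges until the first hour in which the bid is outbid."""
    cost = 0.0
    hours_run = 0
    for price in hourly_market_prices:
        charge = hourly_charge(bid, price)
        if charge is None:
            break  # the instance is reclaimed; this sketch stops here
        cost += charge
        hours_run += 1
    return cost, hours_run


# Hypothetical prices in USD per hour.
cost, hours = total_cost(bid=0.10, hourly_market_prices=[0.05, 0.06, 0.25, 0.05])
print(f"ran {hours} hours for ${cost:.2f}")
```

In this model a generous bid changes how long the instance survives, not what each hour costs.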
Let’s upload the template to CloudFormation in your AWS account and choose a few parameters. It will take about 10 minutes to provision the entire stack, including ECS, SQS, RDS, security groups, etc. The entire stack will probably cost less than $0.50 per hour. You can retrieve the application URL from the ELB endpoint.
We have decoupled the API layer and the worker layer into different autoscaling groups in ECS. These layers can be scaled independently. Some of the services have been replaced with AWS managed services such as SQS, RDS, and CloudFront, to which we will dedicate a future post.
GET /api/version
– sanity check
GET /api/work
– Sends a message to the SQS queue. A worker picks up the message automatically in the backend, as in Picture 4.
GET /api/records
– Queries the total number of records processed by the worker.
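To exercise the three endpoints above from a script, something like the following sketch should work. It uses only the Python standard library; the helper names are ours, the base URL must be replaced with your own ELB endpoint, and the response bodies are returned as raw text because their format is not specified here.

```python
from urllib.request import urlopen

# The three endpoints listed above.
ENDPOINTS = ("/api/version", "/api/work", "/api/records")


def endpoint_url(base_url, path):
    """Join the ELB base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + path


def call(base_url, path, timeout=10):
    """Issue a GET request and return (HTTP status, response body as text)."""
    with urlopen(endpoint_url(base_url, path), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")


# Example usage, with a placeholder host you need to replace:
#
#   for path in ENDPOINTS:
#       print(path, call("http://your-elb-endpoint", path))
```

Calling /api/work a few times and then /api/records is a quick way to confirm that the worker layer is consuming messages from the queue.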
Let’s quantify our savings per month; we will talk about cost measurement later.
We have saved about 80% of the total cost. We were able to provision our entire stack with a CloudFormation template, with approximately 10 minutes of a dev engineer’s time.
There are still some open issues in the stack. What can we do if no spot instances of a specific type are available? How do we wrap up work on a spot instance that will soon be terminated? We may find answers in the subsequent posts.
Please feel free to leave your comments on any of our posts.
Reference: