Managing hundreds of servers for load testing: auto-scaling, custom monitoring, DevOps culture

On average we create about 100 servers to generate the load and around 150 servers for our service. All of these servers need to be created, deleted, configured, and launched. To reduce the amount of manual work, we use the same tools as in production:

  • To create and delete test environments — Terraform scripts;
  • To configure, update, and launch servers — Ansible scripts;
  • For dynamic scaling based on load — in-house Python scripts.

Thanks to the Ansible and Terraform scripts, all operations from creating instances to starting servers take just six commands:

#launch the desired instances in the AWS console
ansible-playbook deploy-config.yml  #update server versions
ansible-playbook start-application.yml  #run our application on these servers
ansible-playbook update-test-scenario.yml --ask-vault-pass  #update the JMeter script if it has changed
terraform apply  #create JMeter servers to generate the load (run from infrastructure-aws-cluster/jmeter_clients)
ansible-playbook start-jmeter-server-cluster.yml  #start the JMeter cluster
ansible-playbook start-stress-test.yml  #run the test

 

Dynamic scaling of servers

During peak hours in production we have more than 20K online users simultaneously, while at other times there may be around 6K. It makes no sense to keep the full fleet of servers running all the time, so we set up auto-scaling for the board servers (the ones that open a board when users enter it) and for the API servers, which handle API requests. Now servers are created and removed whenever necessary.

This mechanism works very well in load testing: by default we keep the minimum number of servers, and during a test run they automatically spin up in the required quantity. At the start we may have 4 board servers, and at the peak up to 40. New servers are not created immediately, but only once the current ones are under load; for example, the criterion for creating new instances can be 50% CPU usage. This avoids slowing down the ramp-up of virtual users in the scenario and avoids creating unnecessary servers.
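The scale-out rule can be sketched as a small pure function, in the spirit of the in-house Python scripts mentioned above. The 50% CPU criterion and the 4-to-40 server range come from the text; the function itself and the scale-in threshold are illustrative assumptions:

```python
def desired_server_count(current: int, avg_cpu: float,
                         minimum: int = 4, maximum: int = 40,
                         scale_out_threshold: float = 50.0) -> int:
    """Decide how many board servers to run, given average CPU usage (%).

    New instances are added only once the current ones are loaded past
    the threshold, so the ramp-up of virtual users is not slowed down
    and no unnecessary servers are created.
    """
    if avg_cpu >= scale_out_threshold:
        return min(current + 1, maximum)   # scale out one server at a time
    if avg_cpu < scale_out_threshold / 2:
        return max(current - 1, minimum)   # well under-loaded: scale in
    return current                         # comfortable band: no change
```

A real autoscaler would also apply a cooldown between decisions; that part is omitted here for brevity.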

A bonus of this approach is that, thanks to dynamic scaling, we learn how much capacity we will need for user counts we have not yet reached in production.

Collecting metrics the same way as in production

There are many approaches and tools for monitoring load tests, but we went our own way.

We monitor production with a standard stack: Logstash, Elasticsearch, Kibana, Grafana, and Prometheus. Our test cluster is similar to production, so we decided to monitor it the same way, with the same metrics. There are two reasons for this:

  • No need to build a monitoring system from scratch; we already have one, complete and ready to use;
  • We additionally test the production monitoring itself: if during a test we find that we lack the data to analyze a problem, then that data will also be missing in production when the same problem appears there.
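Because the test cluster shares the production monitoring stack, pulling a metric for a test window boils down to one call to the Prometheus HTTP API. A minimal sketch, where the host name and the `job` label are assumptions:

```python
from urllib.parse import urlencode

def range_query_url(base: str, promql: str, start: int, end: int,
                    step: str = "15s") -> str:
    """Build a Prometheus range-query URL for a load-test time window.

    The same query works against production and the test cluster,
    since both are monitored by an identical stack.
    """
    params = urlencode({"query": promql, "start": start, "end": end, "step": step})
    return f"{base}/api/v1/query_range?{params}"

# Illustrative: average non-idle CPU of the board servers during a test run
url = range_query_url(
    "http://prometheus.test.local:9090",  # assumed test Prometheus host
    'avg(rate(node_cpu_seconds_total{mode!="idle",job="board-servers"}[1m]))',
    start=1700000000,
    end=1700003600,
)
```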


What we show in the report

 

  • Technical characteristics of the test environment;
  • The scenario, described in words rather than code;
  • A result that is clear to all team members, developers and managers alike;
  • Graphs of the overall state of the test environment;
  • Graphs that show the bottleneck, or whatever was affected by the optimization being checked in the test.

It is important that all results are stored in one place; that makes them easy to compare with each other from run to run.
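A unified report form makes run-to-run comparison mechanical. A sketch of such a comparison, assuming each summary is a flat dictionary of metrics (the metric names and the 5% tolerance are illustrative):

```python
def compare_runs(baseline: dict, candidate: dict, tolerance: float = 0.05) -> dict:
    """Compare two load-test summaries metric by metric.

    Returns the relative change for every metric present in both runs
    and flags regressions that exceed `tolerance` (5% by default).
    Assumes higher values are worse (latency, error rate, etc.).
    """
    result = {}
    for metric, old in baseline.items():
        if metric not in candidate or old == 0:
            continue  # nothing to compare against
        change = (candidate[metric] - old) / old
        result[metric] = {"change": round(change, 3),
                          "regression": change > tolerance}
    return result

# Illustrative summaries of two runs of the same scenario
baseline = {"p99_ms": 480, "error_rate": 0.002}
candidate = {"p99_ms": 530, "error_rate": 0.002}
report = compare_runs(baseline, candidate)
```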

We create the reports in our own product, in the form of a report board.

Generating a report takes a lot of time, so our plan is to have the impersonal data collected automatically with the help of our public API.

Infrastructure as code

In our company, product quality is the responsibility not of QA engineers but of the entire team. Load tests are one of the tools of quality assurance. It is great when a team understands that it is important to test the changes it made under load. For a team to start thinking about this, it needs to be responsible for how production works. Here we are helped by the principles of DevOps culture, which we have started to adopt in our work.

But starting to think about load tests is only the first step. A team cannot design the tests properly without understanding how production works. We ran into this problem when we started setting up the load testing process in teams. At that time, the teams had no way to understand how production was put together, so it was hard for them to work on test design. There were several reasons: the lack of up-to-date documentation, or of a single person who kept the whole picture of production in mind, and the multi-fold growth of the development team.

The Infrastructure-as-code approach, which we began to use in the development teams, can help teams understand the product.

The problems we have started to solve with this approach:

  • Everything is scripted and can be brought up at any time. This significantly reduces production recovery time in case of a data-center outage and lets us keep the right number of up-to-date test environments;
  • Reasonable economy: environments are deployed on OpenStack where it can replace expensive platforms like AWS;
  • Teams create load tests themselves, because they understand how production works;
  • The code replaces documentation, so there is no need to endlessly update it; it is always complete and current;
  • No separate narrow-field expert is needed to solve routine tasks; any engineer can figure it out;
  • With a clear structure of production it is much easier to plan exploratory load tests, such as chaos monkey testing or long-running memory leak tests.

We want to extend this approach beyond infrastructure to the maintenance of various tools. For example, we have fully turned the test database, which I talked about in the previous article, into code. Because of this, instead of pre-prepared environments we have a set of scripts with which, in 7 minutes, we get a configured environment in a completely empty AWS account and can proceed to testing. For the same reason, we are now taking a close look at Gatling, which its creators position as a tool for "Load testing as code".

The Infrastructure-as-code approach entails a similar approach to testing it, and to the scripts teams write to bring up the infrastructure of new features. All of this should be covered by tests. There are various frameworks for such testing, for example Molecule. There are also tools for chaos monkey testing: for AWS there are paid tools, for Docker there is Pumba, and so on. They help solve different kinds of problems:

  • When one of the instances on AWS goes down, check whether the load is redistributed to the remaining servers and whether the service survives such a sharp redirection of requests;
  • How to simulate a slow network, interruptions, and other technical problems that must not break the logic of the service's infrastructure.

Solving such problems is in our immediate plans.
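The first scenario, losing one instance and checking whether the rest cope, can even be estimated on paper before any chaos tool is run. A back-of-the-envelope sketch, where the capacity and headroom figures are assumptions:

```python
def survives_instance_loss(total_rps: float, instances: int,
                           capacity_per_instance_rps: float,
                           headroom: float = 0.8) -> bool:
    """Check whether the fleet still copes after losing one instance.

    Each surviving server absorbs total_rps / (instances - 1); we require
    that to stay under `headroom` * capacity so the sharp redirection of
    requests does not tip the remaining servers over.
    """
    if instances <= 1:
        return False  # nothing left to redirect the traffic to
    load_per_survivor = total_rps / (instances - 1)
    return load_per_survivor <= headroom * capacity_per_instance_rps

# Illustrative numbers: 4 instances serving 900 rps, each rated for 400 rps
ok = survives_instance_loss(total_rps=900, instances=4,
                            capacity_per_instance_rps=400)
```

A chaos test then verifies what the arithmetic cannot: that the load balancer actually redistributes the traffic in time.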

Insights

 

  1. Don't waste time on manual orchestration of the test infrastructure; it is better to automate these steps so that all environments, including production, are managed reliably;
  2. Dynamic scaling significantly reduces the cost of maintaining production and a large test environment, and reduces the human factor in scaling;
  3. You don't need a separate monitoring system for the tests; you can take the one from production;
  4. It is important that load test reports are automatically gathered in a single place and have a unified form. This makes them easy to compare and makes changes easy to analyze;
  5. Load testing becomes an established process in the company when the teams feel responsible for how production works;
  6. Load tests are infrastructure tests. If a load test passed, it is possible that it was designed incorrectly. Validating the correctness of a test requires a deep understanding of how production works. Teams should be able to understand how production is set up; we solve this problem with the Infrastructure-as-code approach;
  7. The scripts that prepare the infrastructure also require testing, like any other code.