Chaos engineering is about running experiments that reveal the weaknesses in a system. We know that sooner or later, a complex system will fail. By injecting failure in a controlled way, we can gain confidence in our system.
When we inject a failure, we can check how the system behaves. We can observe whether the monitoring works as expected, or whether the UI keeps working despite a backend failure. Although we can break things while doing this, the main goal is to build a more resilient system.
Before implementing chaos engineering, it is important to collect metrics on the normal behavior of your system over time. With those metrics, you can measure the effect of the experiment. It can be a business metric, like how many purchases are made, or a technical metric, like how many clicks or service invocations happen per hour, what the CPU load is, and so on. It's also important that your system has monitoring and observability capabilities. That is how you know how the system behaves when the failure is injected.
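As a minimal sketch of what such a baseline could look like, assume the metric is purchases per hour. The function names, the sample numbers, and the tolerance of three standard deviations are all illustrative assumptions, not part of any specific tool.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize normal behavior as a mean and a standard deviation."""
    return {"mean": mean(samples), "stdev": stdev(samples)}


def within_steady_state(baseline, observed, tolerance=3.0):
    """Return True if `observed` is within `tolerance` standard
    deviations of the baseline mean (an assumed, simple rule)."""
    return abs(observed - baseline["mean"]) <= tolerance * baseline["stdev"]


# Hypothetical purchases per hour, collected before any experiment.
normal_purchases_per_hour = [118, 124, 121, 130, 119, 126, 122]
baseline = build_baseline(normal_purchases_per_hour)
```

During an experiment, a call like `within_steady_state(baseline, observed_value)` gives a quick signal of whether the system left its normal range. A real setup would likely read these numbers from the monitoring system instead of a hard-coded list.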
After that, you create the hypothesis you are going to prove or disprove. How will the system handle a specific event? We state it in the form “If we try this, then that happens”. For example: if the database is down, then our front-end will still behave correctly.
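The database hypothesis above can be sketched as a small, self-contained check. Everything here is hypothetical: `FakeDatabase` stands in for a real database client, `render_product_page` for a front-end handler, and "behaves correctly" is assumed to mean "still answers with status 200".

```python
class DatabaseDown(Exception):
    """Raised by the fake database while the failure is injected."""


class FakeDatabase:
    """Hypothetical stand-in for a real database client."""

    def __init__(self):
        self.down = False

    def fetch_products(self):
        if self.down:
            raise DatabaseDown("injected failure")
        return ["book", "lamp"]


def render_product_page(db):
    """Front-end handler that falls back to a degraded page."""
    try:
        products = db.fetch_products()
        return {"status": 200, "products": products, "degraded": False}
    except DatabaseDown:
        # Fallback: return a usable, empty page instead of an error.
        return {"status": 200, "products": [], "degraded": True}


def hypothesis_holds(db):
    """If the database is down, the page still responds with 200."""
    db.down = True  # inject the failure
    try:
        return render_product_page(db)["status"] == 200
    finally:
        db.down = False  # always stop injecting the failure
```

The `finally` block is a deliberate choice: the experiment should clean up after itself even when the check raises.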
It’s recommended to inject the failures into a small test group and not for all users. Once you have the metrics from the experiment, you use them to prove or disprove the hypothesis. Did the front-end break when the database went down? Missing error handling, missing or badly chosen timeouts, and missing fallbacks are common problems found this way.
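One possible way to limit an experiment to a small test group is to hash the user ID into a bucket. This is only a sketch under assumptions: the 5% default, the function names, and the simulated timeout are made up for illustration, and `backend` and `fallback` are whatever callables your application provides.

```python
import hashlib


def in_experiment_group(user_id, percent=5):
    """Deterministically place roughly `percent`% of users in the group."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < percent


def call_backend(user_id, backend, fallback, percent=5):
    """Inject a timeout only for users in the experiment group."""
    try:
        if in_experiment_group(user_id, percent):
            raise TimeoutError("injected backend timeout")
        return backend(user_id)
    except TimeoutError:
        # Injected and real timeouts are both handled by the fallback.
        return fallback(user_id)
```

Because the bucket depends only on the user ID, the same user always lands in the same group, which makes the results easier to compare with the baseline metrics.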
Chaos engineering is a powerful tool for building resilient applications. It exposes the weaknesses of your system and allows you to act on these issues in advance.
Check the Principles of Chaos Engineering and this nice podcast on the subject. If you use cloud-native technologies, projects like Netflix's Chaos Monkey help you with the task of injecting failure into your application.
If you know a tool for applying chaos engineering, or have a story about how this technique was applied, drop a comment below.