#2 We Are Not Alone: DORA Metrics

Software delivery is increasingly a critical competitive advantage for companies. However, decisions about it are often based on gut feeling rather than concrete numbers. It's time to change that.

Mar 30, 2024

In the past, I tried to introduce a culture of continuous deployment in different companies. Once it succeeded, once it failed. Both times I relied on gut feeling. The most difficult part? Convincing the business.

Often, as developers, we are at a disadvantage when talking to business people. We focus on the technicalities instead of the business benefits our proposals can bring.

What can help us? Numbers.

It is hard to argue with numbers. They are facts that have been recorded in some way, and we can draw conclusions from them. That support is priceless.

Meet DORA metrics.

They are divided into 4 key areas:

Deployment Frequency - how often you deploy code changes to your environment (with continuous deployment, that means production)
Lead Time For Changes - the average time between a developer committing a code change and that change being released to the “final” environment
Time To Restore Service - the average time required to restore service after an incident has occurred
Change Failure Rate - the percentage of failed deployments out of the total number of deployments during a specific period
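
To make the definitions concrete, here is a minimal Python sketch that computes all four numbers from a list of deployments and incidents. The `Deployment` and `Incident` records and the `dora_metrics` function are hypothetical names made up for this illustration. In practice, the data would come from your CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Deployment:
    committed_at: datetime  # when the change was committed
    deployed_at: datetime   # when it reached the "final" environment
    failed: bool = False    # whether this deployment counts as failed


@dataclass
class Incident:
    started_at: datetime
    restored_at: datetime


def dora_metrics(deployments, incidents, period_days):
    """Return the four DORA metrics for one observation period."""
    lead = [(d.deployed_at - d.committed_at).total_seconds() for d in deployments]
    restore = [(i.restored_at - i.started_at).total_seconds() for i in incidents]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "avg_lead_time": timedelta(seconds=mean(lead)) if lead else None,
        "avg_time_to_restore": timedelta(seconds=mean(restore)) if restore else None,
        "change_failure_rate": (
            sum(d.failed for d in deployments) / len(deployments)
            if deployments else None
        ),
    }


# Made-up sample data: two deployments and one incident within a single day.
t0 = datetime(2024, 3, 1, 9, 0)
sample_deployments = [
    Deployment(t0, t0 + timedelta(minutes=10)),
    Deployment(t0, t0 + timedelta(minutes=20), failed=True),
]
sample_incidents = [Incident(t0, t0 + timedelta(minutes=3))]
metrics = dora_metrics(sample_deployments, sample_incidents, period_days=1)
```

What counts as a "failed" deployment or as the start of an incident is a decision each team has to make for itself. The sketch simply assumes those flags and timestamps are already recorded.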

That’s it. 4 metrics that let you back your arguments with concrete numbers.

Focusing on just one metric can be misleading. Let’s consider 2 teams:

Team A releases their application to production 4 times per year
Team B releases their application to production 1000 times per year

We decide to look at the Change Failure Rate only. All 4 of Team A's releases succeeded, while 40 of Team B's deployments failed:

Team A = 0% of failed deployments
Team B = 4% of failed deployments
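
The arithmetic behind these two percentages can be written down as a short Python sketch. The function name `change_failure_rate` is made up for this example. The inputs are the numbers from the two teams above.

```python
def change_failure_rate(failed, total):
    """Percentage of deployments that failed in the period."""
    return 100 * failed / total


team_a = change_failure_rate(failed=0, total=4)
team_b = change_failure_rate(failed=40, total=1000)

# The same inputs also give the number of successful releases per team.
team_a_successful = 4 - 0
team_b_successful = 1000 - 40
```

Seen this way, Team B's 4% sits next to 960 successful releases, compared to 4 for Team A, which already hints that the percentage alone does not tell the whole story.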

“Look, we told you that it makes no sense to deploy more often!” - members of team B might have heard.

Luckily, there was a smart guy in the room. “Let’s check other metrics” - he said.

So, now we are talking. First in line - Deployment Frequency:

Team A = 4 times to production
Team B = 1000 times to production

Team B released their code 250 times more often than Team A. But for their business people, it did not make any difference - both teams released some features, and both were successful. So why release more often? Here you will need another metric (not a DORA one) that shows, based on customer feedback and observations, that you did not spend money on features nobody needed.

We shift our focus to Lead Time For Changes. There you go, this metric is starting to work in favor of Team B and, above all, brings tangible benefits:

Team A's average lead time - 1 week. It also ties up testers, system engineers, and many more people
Team B's average lead time - 15 minutes. It ties up only the infrastructure that runs automated checks and validations. That’s it.

One week of work from several people vs 15 minutes of infrastructure time. The difference is huge. Let's remember that we are all intelligent people and such things will not escape our attention. In the end, it is a business - higher costs mean lower profits.

It’s time for the last metric - Time To Restore Service. How long does it take teams to restore it?

Team A's average time to restore the service - 3 hours
Team B's average time to restore the service - 3 minutes

Thanks to automated validation, checks, and tests, Team B can restore the service almost immediately. In comparison, Team A has to investigate the problem, check logs manually, and perform manual operations to restore it, which in the end takes them 3 hours. An outage like that significantly reduces customer satisfaction, especially when it repeats, and can seriously affect finances - imagine the payment system being down for 3 hours during Cyber Monday.

How do you implement it? Many solutions exist on the market, but you can also build a custom implementation. Based on my experience, you can use Octopus Deploy, a combination of Argo, Prometheus, and Grafana, or any other solution.
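
If you go the custom route, the starting point can be very small. The Python sketch below is not tied to any of the tools mentioned above. It assumes you already log a timestamp for every production deployment, and the `deployments_per_week` helper is a hypothetical name. It groups those timestamps by ISO week so that Deployment Frequency can be tracked over time.

```python
from collections import Counter
from datetime import datetime


def deployments_per_week(deployed_at):
    """Count deployments per ISO week, keyed as (year, week)."""
    counts = Counter()
    for ts in deployed_at:
        iso = ts.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return dict(sorted(counts.items()))


# Made-up deployment log: two deployments in one week, one in the next.
log = [
    datetime(2024, 3, 25, 10, 0),
    datetime(2024, 3, 26, 15, 30),
    datetime(2024, 4, 2, 9, 15),
]
weekly = deployments_per_week(log)
```

A weekly series like this is easy to chart in whatever dashboard you already use, and the other three metrics can be grouped the same way.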

One last important thing - such metrics do not only help us in discussions with others. Above all, they help us improve the way we work.

Do you use DORA metrics daily or not? I am curious as it looks like this concept is not well-known in the software architecture world.

Have a great weekend and talk to you next week! :)

Fractional Architect
