#1 Why Good Software Architecture Matters: The Impact on Performance and Agility
Often, software architecture is inadequate for the problem it solves. More often than not, this leads to serious problems during the life of the application.
In recent years, I have encountered many problems in IT companies caused by unsuitable software architecture. What do I mean by unsuitable? In most cases, it errs in one of two directions – it is either too trivial or far too complicated for the problem it is supposed to solve. Both cases lead to performance problems and stop the organisation from being agile.
What do I understand by software architecture?
Before I start talking about the problems, I would like to look at the definition of software architecture. For me, it is everything that goes into producing software – from the strategic analysis of a domain and its requirements to translating them into a technical solution. In other words, a holistic approach:
Your domain: understanding the business, its capabilities and processes
Its division: splitting it into subdomains, defining their boundaries and the interactions between them
Limitations and usage: how many customers do we have, how many DAU (daily active users) do we predict, how many requests must our app handle, etc.
Infrastructure: how, where and what do I want to achieve here? Cloud, on-premise or hybrid? Do I need containers? K8s? An API Gateway? A Load Balancer? A cache? What type of database? How do I want to approach monitoring, logging, and telemetry? Do I need a staging environment or only production?
Deployment strategy: would it be a single (modular monolith) or multiple deployment units (microservices)? If we decide on microservices, what are the reasons (disintegrators) that drive us to this decision?
Architecture and logical patterns: do I need an active record? Transaction script? Domain model? Service layer?
Test strategy: which tests do I want to perform? When? How often?
Release strategy: do I want to continuously deploy my application?
If you go through this list point by point, you will notice that there are plenty of other topics in each of them, which I will return to in a paragraph describing potential solutions to help you build better applications.
Too trivial
This is the first of the two major problems. If caught early enough, the impact will not be as great as in the case of overengineered architecture. The trouble is that it usually becomes visible only when the application has extreme performance issues and adding new functionality requires very large financial outlays. Additionally, the cost of maintenance starts to climb as well. In such cases, the most common solutions companies opt for are a rewrite from scratch in another technology (a very bad idea) or bringing in a group of consultants, often experts in the field (a very expensive idea).
How can architecture be too trivial? Imagine that you have joined a clothing startup as CTO. It is a great challenge in your career. You do not want to let anyone down, so you give your best to deliver the MVP on time. There is one problem – you have to show the first results in 3 months. You skip business analysis (not a good decision), and you do not bother with any calculations – how many users there will be, how much data you will store, whether it should scale or not (at that point in time, an acceptable trade-off). You do not have time to learn new things, so you choose from what you already know (a good decision) and size the solution for the few customers that will come in the beginning (also a good decision). Tests? We will add them later (a very bad idea).
At that point in time, all your design decisions are acceptable – after all, you need to test your product on the market as soon as possible. On one condition: in case of success and growing interest, you start iteratively refactoring the application. And this condition is most often not fulfilled. Why?
Your first customers want more functionality. You have more and more work and fewer and fewer resources. It takes ages to hire and bring in more people – and you do not have time for it. You just add more features. And the hell begins.
In the MVP version, you had a table of Products. It had 5 properties (boundaries are already being mixed here, but at that point in time only in theory):
Id
Name
Description
Price (oops)
Amount (oops for the 2nd time)
Then new requirements arrive – we have to add type, material, and size. So our Products table grows. Let’s add them. Size can have different values – from S (small) to XXL (extra extra large). At some point, you will have a table with 80 properties – where some will be used and some will not, depending on the type.
After several months, your company decides to sell shoes as well. The catch is that shoes are sized completely differently from, say, T-shirts: in the EU it is 36, 42, or 46, and in the US 9, 10.5, or 12. So you add special rules that apply only to shoes, or only to coats, or to something else. As you have just one model (Product), it gets harder and harder to keep up. Your codebase grows as your team does – everything is out of control. Instead of focusing on how to divide your code into modules (to make the parts independent of each other), you grow your application into a big ball of mud.
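A minimal sketch of where such a model usually ends up – the class and its fields are illustrative, not taken from any real codebase:

```java
import java.math.BigDecimal;

// Illustrative "god model": one Product class serving every product type,
// mixing pricing, inventory, and type-specific sizing concerns.
public class Product {
    private long id;
    private String name;
    private String description;
    private BigDecimal price;     // pricing concern (oops)
    private int amount;           // inventory concern (oops for the 2nd time)
    private String type;          // "SHOE", "TSHIRT", "COAT", ...
    private String material;
    private Integer euShoeSize;   // only meaningful when type is "SHOE"
    private Double usShoeSize;    // only meaningful when type is "SHOE"
    private String clothingSize;  // S..XXL, unused for shoes
    // ...dozens more type-specific properties accumulate over time

    public boolean isSizeValid() {
        // Type checks like this spread through the whole codebase
        // instead of living in separate, well-bounded models.
        if ("SHOE".equals(type)) {
            return euShoeSize != null || usShoeSize != null;
        }
        return clothingSize != null && clothingSize.matches("S|M|L|XL|XXL");
    }
}
```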
At some point your customers start to see 2 main issues:
The application gets slower (up to the extreme)
Changes, fixes, and new features are released too rarely
You want to react but it is already too late.
It is not possible to make the application faster, because you have huge tables with a lot of complex data inside. Of course you fight back: you add read replicas to optimise reads, you try to shard your data to optimise writes, maybe you add a cache. All in all, you complicate the system more and more without thinking about proper modularisation. You have to scale the entire application as a single deployment unit even though the majority of it does not need to scale – and that takes time and resources.
Every change in one place affects another place, so whenever you add a new feature or try to fix existing functionality, a lot of bugs appear elsewhere. As you have no tests (or inadequate ones), you cannot release quickly – someone (or a group of people) has to perform a manual regression each time.
Your application ceases to be used, which leads to financial losses, which in turn leads to the collapse of the company.
Too complex
The second problem is far more difficult to reverse than architecture that is too simple. By the time it is noticed, it is usually too late to make changes, because replacing even a single component causes gigantic costs. On top of that, if we depend on other systems and our application is not deployed correctly, these problems only pile up. In such cases, the only way out is to replace elements of the system step by step – help from external consultants will probably be required here as well.
How can architecture be too complex? Imagine that you have joined a corporation as a software architect. You were hired because there were no people with experience in designing web systems. Over the last few years, you have heard about k8s, containers, microservices, Redis, NoSQL, Kafka, Domain-Driven Design, and much more. And now it is you and your team of fellow architects and developers who hold the sceptre of decision in your hands. What do you do?
If you focus on domain analysis first, then great – the first step towards a good system is taken. You met with different people – future customers, stakeholders, operations, and so on – and ran workshops where you collected and analysed business requirements. It has been a tough few weeks, but in the end you got there, and now you want to start building the system as soon as possible. So, together with your team, you decide to:
Go with microservices
Containerise the application
Run it in a k8s cluster
Add Redis
Add Kafka
Use NoSQL
Apply DDD in each microservice
Unfortunately, you skipped one of the most important steps – you did not calculate how much traffic you will actually generate. Let’s stop here and focus on some magic numbers. Imagine that your application has a forecast of 10 million users and 1 million DAU (daily active users), and each user must upload a profile picture (max. 1 MB). Additionally, we assume that every user will make 20 requests daily. We cover only the DACH region (Germany, Austria, and Switzerland). A lot, hmm? In fact, it is not a lot!
1 000 000 DAU * 20 requests/day = 20 000 000 requests per day. Assuming an app-day of 16 hours (57 600 seconds), that is 20 000 000 / 57 600 ≈ 348 requests per second. For peak load, multiply by e.g. 3, so around 1 000 requests per second. A fairly simple infrastructure is usually capable of handling such a load
10 000 000 users * 1 MB = 10 TB for profile pictures
And all of the above is just the optimistic forecast.
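The same back-of-envelope estimate, written out as a small program – a sketch only, using the assumptions stated above (16-hour app-day, 3x peak factor, decimal units for storage):

```java
// Back-of-envelope traffic and storage estimate, mirroring the numbers above.
public class TrafficEstimate {
    public static void main(String[] args) {
        long dau = 1_000_000;              // daily active users
        long requestsPerUserPerDay = 20;   // assumed average
        long appDaySeconds = 16 * 3_600;   // 16-hour "app day" = 57 600 s
        long peakFactor = 3;               // assumed peak multiplier

        long requestsPerDay = dau * requestsPerUserPerDay; // 20 000 000
        long avgRps = requestsPerDay / appDaySeconds;      // ~347
        long peakRps = avgRps * peakFactor;                // ~1 000

        long totalUsers = 10_000_000;
        long pictureMb = 1;                                  // max picture size
        long storageTb = totalUsers * pictureMb / 1_000_000; // ~10 TB

        System.out.printf("avg %d rps, peak ~%d rps, pictures ~%d TB%n",
                avgRps, peakRps, storageTb);
    }
}
```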
You spent a lot of time trying to prepare the greatest application possible: every single microservice can scale and be deployed to a k8s pod, and the data is optimised for reads (Redis) and writes (NoSQL). There are microservices with almost no business logic where a domain model is applied anyway. You use Kafka, but you do not need to stream anything. At the end of the day, instead of 1 000 000 DAU you have just 10 000.
There are now a couple of problems:
When something does not work as expected, you have a lot of places to check
Everything was added up front – not based on actual needs, but on the desire to use the technology, i.e. premature optimisation. In fact, you do not know whether you need it or not, so in many cases the system is too complex to maintain and the gaps in knowledge are exposed
Anyone who joins your team(s) faces a very high entry barrier
Infrastructure costs may be increased by e.g. 70% in comparison to an alternative, simpler solution (but now it is too late to change it)
What you have achieved is certainly a nice modularisation of the system – if the split into microservices was done correctly. Unfortunately, you introduced extreme complexity where it was not needed. Sooner or later, this again leads to the 2 problems from the title:
Performance – your application starts to fail randomly (due to errors in different components of which you do not have full knowledge), and your team works more slowly because of the complexity
Changes, fixes, and new features are released too rarely because parts of the deployments and tests have to be done manually
Again, the results are similar to those of too-trivial architecture – customers get angry and, in the end, stop using your product.
What can you do?
There is no single, perfect recipe for making everything work as it should. However, there are some steps you can take to minimize the risk of the problems described:
Perform strategic analysis of a business domain: organise workshops, collect and analyse business requirements. You can use one or a combination of the following methods – Event Storming, Domain Storytelling, Impact Mapping, Story Storming, Wardley Maps, and many more
Divide your business domain into multiple subdomains: define a type for each – core (complex business logic, something that makes your company unique in the market), supporting (less complex, in-house or outsourced, nothing really unique), or generic (you can buy it, as it exists on the market and fits your needs). Next, wrap them inside boundaries. Then decide where to apply the transaction script, active record, service layer or domain model
Calculate the traffic that your application will have to handle: the number of read and write operations, database transactions, the amount of data to be stored, etc. Include peak calculations – I prefer 3x the standard estimate; some people use 5x or more
Prepare the simplest infrastructure architecture that can handle it: no worries, if your application goes viral, then you will have a chance to use a lot of different frameworks, libraries, or components that are popular. Just start simple (KISS - Keep It Simple Stupid)
Choose a deployment strategy: you can start with a single deployment unit (e.g. modular monolith) or multiple deployment units (e.g. microservices). If you decide on the second option, make sure you have a clear reason (or reasons) – is there an area that is less fault tolerant than others? Is there an area that requires increased security? Do you think there is an area that will change more frequently? Is there an area that will be used heavily from day one? Do you have multiple teams in different time zones?
Document your architecture and decisions: you can do this through an architecture decision log – it should live as close to the code as possible, e.g. in the form of .adoc or .md files inside the repository, together with a C4 model
Write tests for your architecture: yes, it can be tested! In .NET you can use NetArchTest or ArchUnitNET; in Java, ArchUnit (see the sketch after this list)
Divide your work into vertical slices: try to deliver a feature or a usable part of it instead of first preparing the database, then the application layer, and then the API. Think of a cake – you usually do not eat the cream, biscuit, and fruit separately
No or very short-lived branches: if you follow trunk-based development with pair & mob programming, you will eliminate the need for PRs, as the review is handled by your partner while coding. If you do not think you are ready for this, use short-lived branches – e.g. each branch has to be closed within 4 hours of its creation. This way you get small PRs, frequent changes, and easy merges/rebases
Automate as much as you can: in the best-case scenario, no manual step is required. This means you will need a good test strategy and to run the tests automatically every time. Thanks to this, you will be able to release multiple times per day. If that is not possible due to your circumstances, try to release every week or every sprint
Release a new version only to part of your users: then incrementally raise the number. You can use canary releases or blue-green deployments with feature toggles
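For the architecture tests mentioned above, here is a minimal ArchUnit sketch in Java – the package names (com.example.shop, ..domain.., ..infrastructure..) are placeholders for whatever modules you define:

```java
import com.tngtech.archunit.core.domain.JavaClasses;
import com.tngtech.archunit.core.importer.ClassFileImporter;
import org.junit.jupiter.api.Test;

import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noClasses;

// A sketch of an architecture test: it fails the build whenever domain code
// starts depending on infrastructure code, keeping module boundaries honest.
class ArchitectureTest {

    @Test
    void domainMustNotDependOnInfrastructure() {
        JavaClasses classes =
                new ClassFileImporter().importPackages("com.example.shop");

        noClasses().that().resideInAPackage("..domain..")
                .should().dependOnClassesThat().resideInAPackage("..infrastructure..")
                .check(classes);
    }
}
```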
An extensible and maintainable software architecture is extremely important because it has a significant impact on the performance and agility of a software system. A properly designed architecture makes a system more scalable, more maintainable, and easier to modify, which saves time and resources in the long run.
And what about you? How do you approach software architecture?