A Few Basic Principles For System Design (Part 3)
Building complex systems is never simple, but there are a few things you can do to make your life a bit easier. One of the hardest parts in designing these things is to make them testable, so that work tomorrow doesn’t break work today. I’ll go over a few concepts to make life easier when doing this.
Immutability
One easily applied hard-and-fast rule you can make for yourself and any code you write is to take advantage of immutability.
If we look up immutability, we see that it means something cannot be changed. Applied here, we mean to never mutate input. There are other ways to apply this, but the simplest rule is no mutation.
If we take the code:
int main() {
string to_print = "hello, world!";
modify_string(to_print);
cout << to_print << endl;
return 0;
}
we can see one glaring area of needed maintenance. If modify_string
’s operation should ever change, we need to be absolutely sure that our modifications don’t influence the program’s correctness outside of modify_string
.
Here’s a potential implementation of modify_string
:
void modify_string(string& input) {
for(int i=0; i<string.size(); i++)
if(string[i] == ',')
string[i] = '.';
}
This implementation overwrites the calling function’s understanding of the data, making this function and the caller very tightly coupled. If I am to call modify_string
from anywhere else in the program, I must be absolutely aware of the preconditions and postconditions of it’s execution, and replicate any setup procedures from the canonical use.
In short:
- Code duplication
- Tight coupling
- Difficulty optimizing
- High knowledge overhead
Let’s tweak this function to make it re-useable
int main() {
string to_print = "hello, world!";
cout << modify_string(to_print) << endl;
return 0;
}
string modify_string(const string& input) {
string copy = input;
for(int i=0; i<copy.size(); i++)
if(copy[i] == ',')
copy[i] = '.'; // input is a copy, so safe
return input;
}
Now, to_print never changes. If we need to mutate it, we just assign it back the result of modify_string
. No callers need to know the post-conditions of their inputs because the inputs don’t change. If the function has some precondition incantation that must be run to prepare it’s inputs, we can make an immutable function to do just that, and pass it in directly. Should modify_string
ever change, we are fully safe from the result in all callers and thus our maintenance burden has decreased significantly.
There are practically no situations where parameter mutation is beneficial for the maintenance overhead it creates. The one situation is memory pressure. Cloning an object over and over can drastically reduce performance, and many immutable-first languages optimize these clone operations out where possible. If you absolutely MUST mutate inside a function, make sure to be aware of the following conditions:
- Thread safety - the mutated parameter must not be shared, and if some property of it is mutated directly, re-factor your code to mutate that portion alone in a thread-safe manner.
- Conciseness - your mutation should be very small and explicit. Dozens of small mutations is far easier to reason about than one large multi-branch multi-loop recursive operation
- Unique - your mutation should be an operation that only happens in this specific call-chain. If it happens in many places, consider a class member that does the mutation for you and hiding the mutated data behind an explicit interface. This reduces the chance of inflicting duplicate code on other callers.
- Final - your mutation should be the last change that happens to this input, and potentially the last action that happens to this input. Callers of your mutating function shouldn’t rely on the function having side-effects or post-conditions on inputs for them to continue operating.
- Obviousness - many languages support the concept of out-bound parameters, C and C++ with
&
, C# withref
andout
. Consider annotating the value as explicitly being an outbound mutation so callers can know to expect it.
A strong point to remember is that every function you write is an interface you or someone else will have to deal with in the future, without a cup of coffee, with haranguing managers and impending deadlines, with a hangover, etc. Try to make your functions as trivial to understand and reason about as possible, and try to never make future you spend 3 hours stepping through code with the debugger to figure out what just happend.