It has been well known since the seventies that premature optimization is the root of all evil. The saying is so old that it was part of my CS education in the early 2000s. (The best Hungarian tech universities are 10-20 years behind the state of the art.) I believe Martin Fowler also wrote a few paragraphs about performance in his Refactoring book.
I’m bringing this up because over the last five years I have seen two projects suffer from the very same thing. They’re successful projects, but their maintenance effort is higher than it needs to be. Before I draw the usual conclusion, let’s see the stories:
Story 1: Very efficient system integration
We were doing a porting project for a Big Globalized Company. They already had a SOA solution for this particular problem, and they used it at a subset of their locations. However, they wanted an alternative solution with the same functionality on another SOA platform, one with cheaper license fees and better performance. They intended to use this other solution at another subset of their locations.
The original solution transformed and relayed messages from one system to another. After each step of this transform-and-relay process it stored the messages on the hard drive. It wasn’t going to be that hard to write something with shorter response times and better scalability, we concluded. However, we still wanted superb performance.
We were told to optimize the XSLs by using full paths from the root instead of searching with //. (We, the team, didn’t say no; we didn’t even question the idea.) So we wrote XSLs that were difficult to read and difficult to modify, with the same path prefix repeated all over the templates. We also had to write super-fast BPELs, no matter how similar the branches were. This optimization resulted in so much duplication that we had to write scripts to modify the BPEL.
Finally we started to measure performance. The bottleneck was in the SOA platform: it handled XSLTs and BPELs gloriously, but it forced us to write our custom POJO logic as synchronous services. Unfortunately, our little custom logic sat at the entry point of our message-handling pipeline. Thus, before it could start processing the next message, a thread had to wait until it could send the response for the current one. There it was: a bottleneck. This real bottleneck was about two orders of magnitude larger than the small wins we had gained with all that hacking. A textbook example of premature optimization. Still, the performance was good enough because we didn’t store messages on the hard drive.
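To make the two path styles concrete, here is a minimal Java sketch using the JDK's built-in XPath support. The message and its element names (order, customer, address, city, zip) are made up for illustration and are not from the project. It only shows how the full-path form repeats the same prefix in every expression; it does not measure anything.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathStyles {
    // Hypothetical message; the element names are invented for this sketch.
    static final String MESSAGE =
        "<order><customer><address>"
        + "<city>Budapest</city><zip>1111</zip>"
        + "</address></customer></order>";

    // Evaluates an XPath expression against MESSAGE and returns the text of the first match.
    static String select(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(MESSAGE)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Descendant search: short and readable, but it may visit the whole tree.
        System.out.println(select("//city"));
        // Full paths from the root: the same prefix is repeated in every expression.
        System.out.println(select("/order/customer/address/city"));
        System.out.println(select("/order/customer/address/zip"));
    }
}
```

Whether the full-path form is measurably faster depends on the XSLT or XPath engine and on the size of the documents, which is exactly why it is worth measuring before rewriting every template.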
Story 2: Very scalable scrambling server
Two years later I joined another company. They were in the scrambling business. (I’m not telling you which business it really was; I’m using the word “scrambling” instead.) They needed a very scalable, no, a wickedly scalable scrambling server that could handle 100,000 scrambling transactions per second. At the job interview they asked me if I had experience with software optimization projects. It should have rung a bell, but I eagerly said yes. Of course I knew which pitfalls to avoid so I wouldn’t shoot myself in the foot when optimizing.
They didn’t. I mean, they did manage to optimize the software by choosing the right libraries and by making a few trade-offs with functionality: they used the reactor pattern to deal with networking, and they decided that it was enough to handle the scrambling metadata with such scalability, while the “real” scrambling content was scaled with other tools. Oh, and they shot themselves in the foot: they wrote very long methods (~500 lines was average); they used public fields so that no accessor methods were necessary; all the constants were inlined into the method bodies, etc. They used no logger because, they said, it wasted 5 nanoseconds on each log message just to decide whether it was worth logging. I could go on, but I think you can already tell that it was a nightmare.
I shared my worries about unnecessary optimization versus code quality with my boss. I know that my communication skills could use a little polishing, but I remember being very conscious about being polite. Nevertheless, he said that every optimization was necessary to manage 100,000 scrambling transactions per second. After this conversation I found myself working in a hostile environment. I witnessed how they doubled their performance when they fixed something around public key authentication. I felt some sort of satisfaction that, due to the obscure code, this otherwise subtle bug had been very difficult to locate.
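For context on the logger argument, here is a minimal sketch with java.util.logging. The logger name and the describe helper are hypothetical, and this is not the company's code. It shows that a disabled log statement only costs a level check when the message is passed as a supplier: the message string is never built.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

class GuardedLogging {
    // Hypothetical logger name, chosen for this sketch only.
    static final Logger LOG = Logger.getLogger("scrambler.sketch");

    // Counts how often the (potentially expensive) message is actually built.
    static final AtomicInteger messagesBuilt = new AtomicInteger();

    static String describe(int transactionId) {
        messagesBuilt.incrementAndGet();
        return "handled transaction " + transactionId;
    }

    static void handle(int transactionId) {
        // Supplier overload: describe() runs only if FINE is enabled for LOG.
        LOG.log(Level.FINE, () -> describe(transactionId));
    }

    // Runs n transactions at the given logger level and reports how many messages were built.
    static int messagesBuiltAt(Level level, int n) {
        LOG.setLevel(level);
        messagesBuilt.set(0);
        for (int i = 0; i < n; i++) {
            handle(i);
        }
        return messagesBuilt.get();
    }

    public static void main(String[] args) {
        System.out.println("built at INFO: " + messagesBuiltAt(Level.INFO, 1_000));
        System.out.println("built at FINE: " + messagesBuiltAt(Level.FINE, 1_000));
    }
}
```

The sketch says nothing about how many nanoseconds the level check takes on a given machine. That number has to be measured against the real cost of a transaction before it can justify dropping logging altogether.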
Story 3: A thick client with complex calculations
This is about a thick client that has to make very complex calculations for marauders – since the company is in the marauding business (you already know the deal). Once, when I implemented another new feature, my boss told me that it used too much memory and I had to make the memory footprint smaller.
After spending a day or two on measurements I concluded that the memory footprint was mostly OK. We had problems with the CPU instead: in a few cases the CPU was so busy that the garbage collector had no chance to clean up the mess, so the thick client ran out of memory. I also found out that a third-party library didn’t cache some calculations we thought it would. Once we understood that, we decided to optimize that part of the solution. It made that part dirty. Very dirty. But the system didn’t run out of memory again.
Most of the system was readable. Not Like-Uncle-Bob-Clean, but readable, manageable and maintainable. There were a few modules, though, where I could not tell whether the code was dirty because it had been written in a rush or because it had to be fast. It’d be great if these distinctions were documented in the code.
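The kind of fix described above, caching a calculation that a library recomputes on every call, could look roughly like this in Java. This is a sketch, not the code we wrote: expensiveLibraryCall is a stand-in for the third-party calculation, and the call counter exists only to make the effect visible.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class CachedCalculation {
    // Counts how often the underlying calculation really runs.
    static final AtomicInteger libraryCalls = new AtomicInteger();

    // Stand-in for the third-party calculation (hypothetical).
    static double expensiveLibraryCall(int input) {
        libraryCalls.incrementAndGet();
        return Math.sqrt(input) * 42.0;
    }

    // Results already computed, keyed by input.
    private static final Map<Integer, Double> CACHE = new ConcurrentHashMap<>();

    // Returns the cached result, computing it only on the first request for this input.
    static double cached(int input) {
        return CACHE.computeIfAbsent(input, CachedCalculation::expensiveLibraryCall);
    }

    public static void main(String[] args) {
        cached(9);
        cached(9);
        cached(16);
        System.out.println("library calls: " + libraryCalls.get());
    }
}
```

Note the trade-off: an unbounded map like this one spends memory to save CPU, so in a client that has already run out of memory once, the cache size deserves its own measurement and probably a limit.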
Why is this third story different? How could we avoid the disaster? My take: we just wanted to add a new feature with a reasonable performance hit.
The usual conclusion
When you have to write very fast software, that’s business as usual: you have poorly defined requirements. Actually, you have not-so-poorly defined requirements, because your customer has already pointed out a crucial non-functional requirement. You know, NFRs are the ones you need to pay special attention to. Most likely you’ll have a somewhat easier job explaining how much time you’re going to spend on design and analysis.
Usually, the biggest performance gains come from design: if you can relax the requirements, you have a better chance of creating better-performing software. Say, for instance, you don’t need to persist the messages. Or it’s enough to handle metadata only. You can also think of the many NoSQL solutions: many of them scale wickedly well because they relax guarantees, for example their writes are only eventually durable. Making these sorts of decisions needs some analysis: thinking, building and measuring PoCs, renegotiating requirements.
The second biggest performance gain comes from finding the right tools and architecture. This requires the same thing: thinking, building PoCs and measuring a lot. You might need to talk to various field specialists at this point.
Then you’ll need to write flexible code. Right, you need to write a very fast system. Still, the rest of the requirements are subject to change. To keep the project going you’ll need flexible, good-quality code. OK, if you work for a startup then you’ll need some code anyway, otherwise the competitors will eat your cake. There are many good posts about the code-quality-in-startups thing.
From time to time during development and maintenance (nowadays these two are less and less different) you’ll need to measure performance and tune it if necessary. If you find a bottleneck in a certain module or feature, then it is time to go gaga: you can build or dismiss caches, inline methods, use bitwise operators or anything else that makes it work more efficiently. You had better document why those modules are the way they are. I don’t know yet how to make this documentation part of the code; nor do I know how to make these things reversible, like the inline keyword in C++. Please let me know if you have an answer.
Finally, it’s a good question when performance is good enough. Maybe it’s OK to reach a magic limit like 100,000 transactions per second. Maybe it’s enough to be 10 times faster than the legacy system to be replaced. I know it’s very difficult to set a hard limit before development starts. It’s just important to keep the system maintainable, since the requirements will change.
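One possible partial answer to the documentation half of that question, at least in Java, is a custom annotation. The sketch below is only an idea: the annotation name OptimizedForSpeed, its fields, and the annotated checksum method are all hypothetical. It records why a piece of code is deliberately "dirty" right next to the code, in a form that tools can also read. It does nothing for the reversibility half.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical marker for code that was made less readable on purpose, after measuring.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface OptimizedForSpeed {
    String reason();                 // what the measurement showed
    String measuredBy() default "";  // where the benchmark or profile lives
}

class ChecksumModule {
    // The reason and benchmark name below are invented examples.
    @OptimizedForSpeed(
        reason = "example: profiling showed this loop dominating message handling",
        measuredBy = "example: ChecksumBenchmark")
    static int checksum(byte[] data) {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            // Hand-written form of sum * 31 + unsigned byte.
            sum = (sum << 5) - sum + (data[i] & 0xFF);
        }
        return sum;
    }

    // Looks up the documented reason for a method taking byte[], or null if it is not annotated.
    static String whyOptimized(String methodName) {
        try {
            Method method = ChecksumModule.class.getDeclaredMethod(methodName, byte[].class);
            OptimizedForSpeed note = method.getAnnotation(OptimizedForSpeed.class);
            return note == null ? null : note.reason();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such method: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(checksum(new byte[] {1, 2}));
        System.out.println(whyOptimized("checksum"));
    }
}
```

Because the annotation is retained at runtime and marked as documented, it shows up in generated Javadoc, and a build step could list every optimized spot so that each one can be re-measured later.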