May 20, 2021

Application Rewrites, Part I

Twenty years ago, Joel Spolsky claimed that the single worst strategic mistake any software company can make is deciding to rewrite its code from scratch. Two decades later, it's still a terrible idea.

Congratulations! Against all odds, you have a product. You have customers who pay you real money to use your application. But, for some reason, your ARR has stalled and new logos just aren't coming in the door.

You talk with the team and everyone is conflicted about how to proceed.

Engineering says the product is fine, you just need to better educate users.

Marketing says that it's too hard to articulate the value proposition of the product.

Sales thinks that there is a silver bullet feature that will fix all the problems.

You consult board members for advice. They look at Geoffrey Moore's technology adoption life cycle and confidently proclaim that you have saturated the Innovators and Early Adopters, but to reach the Early Majority you need to make the product easier to use.

You don't really know how to do this, so you start talking to folks outside the organization to get a fresh perspective. You come across a compelling candidate who points to other products in the market and concludes that you need to rewrite the entire application from the ground up.

The new version will be more usable! It will be more flexible! It will more thoroughly expose the capabilities of the backend! It will drive operational improvements! Sounds compelling, right?!

Wrong.

Rewrites almost never go well. Let's understand why.

On Rewrites

Back in 2000, Joel Spolsky wrote a classic blog post, "Things You Should Never Do, Part I," in which he argued that the single worst strategic mistake any software company can make is deciding to rewrite the code from scratch. When you decide to rewrite, you are effectively throwing out years of bug fixes, ensuring that no new features get delivered for at least a year, and at least doubling the complexity of supporting your product.

Testing

Despite old code being ugly and messy, it has the advantage of having been used and tested:

The idea that new code is better than old is patently absurd. Old code has been used. It has been tested. Lots of bugs have been found, and they’ve been fixed. There’s nothing wrong with it. It doesn’t acquire bugs just by sitting around on your hard drive...Each of these bugs took weeks of real-world usage before they were found...When you throw away code and start from scratch, you are throwing away all that knowledge. All those collected bug fixes. Years of programming work. – Joel Spolsky

By rewriting the application, you are throwing out all of that accumulated knowledge.

If there were barnacles on the bottom of your boat, you wouldn't get rid of them by blowing up the boat and building a new one, would you? Of course not! Your code is the boat. Don't blow up the boat to get rid of the barnacles.

New Feature Development

When you decide to rewrite an application, the pace of new feature releases slows dramatically. A rewrite guarantees that there will be two versions of the product, V1 and V2. Your entire customer base is on V1, but all of your development effort is now going into V2.

As Spolsky notes:

You are putting yourself in an extremely dangerous position where you will be shipping an old version of the code for several years, completely unable to make any strategic changes or react to new features that the market demands, because you don’t have shippable code. You might as well just close for business for the duration. – Joel Spolsky

Consider the case of Asana and its move off the in-house Luna framework. The Asana engineering team started to run into fundamental limitations in the way they had architected their application and realized that they needed to make dramatic changes in order to achieve performance gains:

We—and more importantly, our customers—had noticed performance degrading over time and Asana no longer felt like the desktop-quality application that our founders originally imagined. Several performance teams spent months improving different parts of the app. While each team achieved incremental wins, the fundamental drawbacks of the framework remained. We needed to attempt drastic measures to achieve our desired performance. – "Treating performance as a product: The technical story of Asana's arduous rewrite"

They heard anecdotes of start-ups both "failing to achieve their performance goals from rewriting" and "imploding from the [rewrite] effort" and ultimately decided to rewrite the application incrementally rather than all at once:

We noticed the rewrite-induced nightmares had a common theme. Each time, the company placed a big bet without unanimous support. They also tried to avoid running two frameworks at the same time. The natural consequence would be a “stop-the-world” re-write and no new product features. Our strategy allowed us to validate the new framework early and often. Each feature would enable performance improvements sooner and inform the project timeline. As opposed to creating a new “v2” application, we maintained the entire feature set. – "Treating performance as a product: The technical story of Asana's arduous rewrite"
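The incremental approach described above, running old and new code side by side and moving one feature at a time, can be sketched roughly as follows. This is a minimal illustration and not Asana's actual implementation: the `FeatureRouter` class, its method names, and the feature names are all hypothetical.

```python
from typing import Callable, Dict, Set

# A handler takes a request string and returns a response string.
# (Hypothetical shape, chosen only to keep the sketch small.)
Handler = Callable[[str], str]


class FeatureRouter:
    """Sends each feature to its legacy or rewritten implementation.

    Every feature starts on the legacy code path. A rewritten handler
    only receives traffic after it has been registered and enabled,
    and it can be switched off again without touching the legacy code.
    """

    def __init__(self, legacy: Dict[str, Handler]) -> None:
        self._legacy = dict(legacy)
        self._rewritten: Dict[str, Handler] = {}
        self._enabled: Set[str] = set()

    def register_rewrite(self, feature: str, handler: Handler) -> None:
        # Only existing features can be rewritten, so the feature set
        # stays the same while the implementation changes underneath.
        if feature not in self._legacy:
            raise KeyError(f"unknown feature: {feature}")
        self._rewritten[feature] = handler

    def enable(self, feature: str) -> None:
        if feature not in self._rewritten:
            raise KeyError(f"no rewrite registered for: {feature}")
        self._enabled.add(feature)

    def disable(self, feature: str) -> None:
        # Rolling back simply routes traffic to the legacy handler again.
        self._enabled.discard(feature)

    def handle(self, feature: str, request: str) -> str:
        if feature in self._enabled:
            return self._rewritten[feature](request)
        return self._legacy[feature](request)

    def progress(self) -> float:
        """Fraction of features currently served by rewritten code."""
        if not self._legacy:
            return 0.0
        return len(self._enabled) / len(self._legacy)


# Usage: two legacy features, one of which has been migrated.
router = FeatureRouter({
    "task_list": lambda req: f"legacy task_list: {req}",
    "inbox": lambda req: f"legacy inbox: {req}",
})
router.register_rewrite("task_list", lambda req: f"new task_list: {req}")
router.enable("task_list")
```

The point of the design is that there is never a separate "V2" product: customers keep the full feature set, each migrated feature can be validated on its own, and a bad migration can be rolled back by flipping one switch.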

If you decide to fully rewrite your application, you are going to have two versions of the product, which limits your ability to deliver new features to customers and greatly increases your risk of blowing up your company.

Support

Relatedly, a rewrite means you will have multiple versions in the wild, and every one of them needs support. In the simplest case there are two: V1 and V2. While you are building the new application, you will have customers on both the old and the new version, which at least doubles the complexity of supporting them. In short, a rewrite increases the operational complexity of your business by at least a factor of 2.

Security

When you have two versions, the likelihood of a catastrophic security incident increases significantly. While you're busy trying to get V2 to parity with V1, you're not going to be paying much attention to the older version. But technology never sits still, and as time goes on, V1 of your product becomes a growing attack surface. Maybe the product is built on an end-of-life operating system. Perhaps all your alerting and monitoring tooling is built for the new product and you aren't even monitoring the old one. Either way, running two versions is a security risk that should make you think twice about a stop-the-world rewrite.

Software and Complexity

It's cliché to say, but software is the definition of a complex system. In his essay "How Complex Systems Fail", Dr. Richard Cook of the University of Chicago runs through the way in which complex systems in the medical field can fail. However, this framework also has purchase when thinking about areas outside medicine.

Cook notes that "The complexity of these systems makes it impossible for them to run without multiple flaws being present." In other words, there are going to be bugs. "A corollary to the preceding point," he writes, "is that complex systems run as broken systems. The system continues to function because it contains so many redundancies and because people can make it function, despite the presence of many flaws."

The takeaway is clear: your application is going to have flaws and, as a general matter, it's always better to deal with the devil you know than the devil you don't. When you rewrite, you throw out years of accumulated knowledge about bugs and testing, you pause new feature development for at least a year, and you increase the operational complexity of your business by at least 2x.

Despite the seductive siren song of tossing your current codebase and rewriting your application, this remains the worst strategic mistake you can make. More prudent is to tie yourself to your messy, flawed codebase. Refactor and incrementally improve? Absolutely. But rewriting should be the last thing you consider.