Duplicate code

Duplicate code is one form of technical debt that adds problems where you might not expect them.

As a simple example, let’s take email address validation. Email addresses are the kind of thing that seems simple but is actually devilishly complex. Case in point: here is a validator that conforms very strictly to the actual RFC, and it runs to almost 7K of code, including explanations so you can figure out what it is doing.

I happen to know about this regex because it was written by a friend and coworker, Sandeep Tamhankar. He wrote it years ago just to prove a point. In the real world, almost no one ever uses Sandeep’s regex. They’ll invent their own version to validate email addresses, or they’ll avoid regular expressions entirely. The point is that there are a huge number of ways to do something very basic, like deciding if a piece of text is an email address or not.
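To make that concrete, here is a hedged sketch in Python. Neither of these is Sandeep’s regex or anyone’s production code; they are the kind of home-grown checks teams actually write, and they already disagree with each other:

```python
import re

# Approach 1: a quick regex -- "something@something.something".
def looks_like_email(text: str) -> bool:
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", text) is not None

# Approach 2: no regex at all, just structural checks on the pieces.
def is_email(text: str) -> bool:
    local, sep, domain = text.partition("@")
    return bool(sep) and bool(local) and "." in domain and " " not in text

print(looks_like_email("pat@example.com"), is_email("pat@example.com"))  # True True
# Two unquoted "@" signs make an invalid address, but the checks disagree:
print(looks_like_email("a@b@c.com"))  # False -- the regex refuses a second "@"
print(is_email("a@b@c.com"))          # True  -- the split-based check accepts it
```

Both look reasonable in a code review, and both accept the common case. It’s only at the edges that they diverge, which is exactly where this kind of debt hides.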

This brings me to something that happened the other day. I was doing a push of some new code, and it required a migration of some data into a new model. Everything was going well, until the script stopped. The script was idempotent, so I knew it was safe to run again. As a first shot at solving the problem, I just re-ran it. No luck.
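A quick aside on “idempotent,” since that is what made re-running safe: an idempotent script leaves the system in the same state no matter how many times it runs. A minimal sketch of the pattern in Python, with plain dicts standing in for tables (our real script’s storage and bookkeeping were different):

```python
# Stand-in tables as plain dicts (hypothetical data, hypothetical schema).
old_table = {1: {"email": "pat@example.com"}, 2: {"email": "sam@example.net"}}
new_table = {}

def migrate():
    for key, record in old_table.items():
        if key in new_table:   # already migrated: skip, so re-runs are safe
            continue
        new_table[key] = {"contact_email": record["email"]}  # the new model

migrate()
migrate()  # running it a second time changes nothing -- that is idempotence
print(new_table)
```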

Now, these things happen. No problem. I dug into the migration, looked around at the code, and after a while I found that the new code storing the data was rejecting a record: it claimed that a field that should have contained an email address didn’t.

This was particularly strange, because the data was coming from our own system, not an import from a third party. We already knew these were email addresses. There should have been no problem. But sure enough, there was an error staring me in the face saying “nope, it’s not an email.”

I took a look at the record, and there definitely was something unusual. Over the past few years, there has been a proliferation of TLDs (top-level domains) in use. We used to have good ol’ .com, .net, .org, .gov, .edu, and a bunch of 2-letter country codes. Now we have a lot more.

It turned out that in our system, there was more than one way to validate an email address. Some email addresses had made it into our system past one type of check, but were causing trouble as we tried to move them to another part of the system, because a different type of check was being done there.
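The exact patterns don’t matter, but the failure mode is easy to reconstruct. A hedged sketch (both regexes here are hypothetical): suppose the intake check was permissive about the domain, while the check on the migration path baked in the old assumption that a TLD is two to four letters:

```python
import re

# Hypothetical intake check: permissive about the domain.
intake_ok = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# Hypothetical migration-path check: written back when TLDs meant
# .com/.net/.org/.gov/.edu plus 2-letter country codes, so it caps
# the TLD at four letters.
migration_ok = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}")

addr = "pat@example.photography"  # a real, modern TLD
print(bool(intake_ok.fullmatch(addr)))     # True  -- got into the system
print(bool(migration_ok.fullmatch(addr)))  # False -- "nope, it's not an email"
```

An address like this sails past the first check at signup, sits quietly in the database, then blows up the day a second, stricter check sees it.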

This highlights a risk from technical debt caused by code duplication. Nearly every software engineer has heard the expression “don’t repeat yourself” or DRY. Obviously our code failed that test (although this instance was quickly fixed so that we could finish the migration).

This is the impact of non-DRY code, and it captures the link to technical debt. Something as simple as validating an email address stopped us from deploying new features and bugfixes, because we had to investigate and resolve a failure that existed only because a second engineer wasn’t familiar with enough of the code to know that someone else had already solved the problem, and that the first solution could be reused.

What would have happened if this weren’t something as simple as an email address? What if it had been a complex system process?

The lesson? Software engineers should always be mindful of duplicated functionality, and when it is found, refactor it out into common modules whenever possible. The code ends up cleaner and more maintainable, and the system spends less time down.
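In code, the fix is boring in a good way: one module owns the question, and everyone else asks it. A hypothetical sketch:

```python
# validators.py -- the one place in the codebase that decides what an email is.
# (Hypothetical module; the pattern is deliberately simple.)
import re

_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def is_valid_email(text: str) -> bool:
    """Single source of truth: signup, imports, and migrations all call this."""
    return _EMAIL.fullmatch(text) is not None
```

When the next new TLD arrives, the change happens once, and intake and migration can never drift apart again.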

Unmanageability

On July 8, 2015, on an otherwise ordinary day, all United Airlines flights in the US were grounded, the NYSE computers crashed, and WSJ.com was down.

The thing is: it was an ordinary day. Some computers crashed. Except they happened to be major computer systems, and it turns out they crash fairly often relative to their importance in our lives. One secret of the modern world is that technology lasts a lot longer than we give it credit for. Large chunks of the modern world still run on code that is older than Mark Zuckerberg.

Zeynep Tufekci, who has hands-on experience with these pieces of software that have been running for decades, wrote very eloquently about July 8, 2015:

The big problem we face isn’t coordinated cyber-terrorism, it’s that software sucks. Software sucks for many reasons, all of which go deep, are entangled, and expensive to fix.

It’s that simple — the modern world is tangled. Have you ever untangled a string of holiday lights? Imagine that, except there are about 100 strands all connected, tangled, and you have to untangle them without any lights going off.

This is why code stays online: a system gets written, reaches a reasonable level of stability, then gets layers built on top of it. Eventually, so many layers are running on top of it that no one is willing to touch the underlying system, because a change could break everything that depends on it. After a long enough time, many of the original authors may not just have left the company, but actually passed away.

This is the complex modern world we live in. Don’t unplug those lights. (For the full story, read “Why the Great Glitch of July 8th Should Scare You” by Zeynep Tufekci.)

Choose Boring Technology

I lifted the name of this post from another blog post that I think did an excellent job of addressing the subject of tool choice. Think of this as a cover song.

New needs constantly emerge during software development. Taking an app from being installed on a single server, to handling large amounts of traffic or data, to adding data mining functionality — every new business challenge brings technical changes.

Historically, different companies have had similar technology needs at the same time, and tools show up in order to meet these needs. Typically, one team encounters a problem and builds a tool, then releases it to the public where other teams begin using it.

Actually, this process tends to happen multiple times in parallel. Today, jQuery is the de facto web framework, but Prototype, MooTools, Dojo, YUI, and many others were created around the same time to address the increasing use of JavaScript in web pages. Any new technical challenge is likely to have many tools that address it.

Choosing new tools is an inflection point when it comes to technical debt. When JavaScript libraries first showed up, it would have been remarkably hard to know that jQuery would be the winner. Reliably picking the longest-lasting tool from a new crop of options is so hard that we should assume it is impractical.

Picking the wrong tool has serious implications for the pace of work an engineering team can sustain. The more people using a piece of technology, the easier it is to learn about, to work with, and to hire for. The more obscure it is, the slower you will work, the more failures you will have, and the more compromises you will make when hiring new engineers.

These are all serious challenges that need to be dealt with, yet I’ve just told you that it is effectively impossible to pick winners out of tools that exist to solve leading-edge problems. Perhaps a different mindset is needed.

Instead of attempting to pick winners, another approach is to assume that some new tools will need to be brought into the technology stack over time, and that some of those choices will not be optimal. The goal, then, is to ensure that this happens in a manageable way.

An engineer by the name of Dan McKinley spent over 6 years at Etsy. Etsy is a company known for having a high-performing engineering department, and this type of performance comes not only from the code, but from the philosophies of the team.

Dan has talked about one philosophy that comes from this culture — the concept of “innovation tokens”. The idea is that as you choose new pieces of technology to incorporate into a project, some pieces of technology cost you an innovation token.

Every team has a natural number of innovation tokens. Dan estimates that you start with about three, and that you only earn extra tokens once the tech stack is sufficiently stable.

A token gets spent whenever you decide to use a new and interesting piece of technology instead of something well-known.

An example of this is database choice. At the moment, there are many databases on the market: PostgreSQL, MongoDB, CouchDB, Redis, Riak, and many others. Many of them are interesting, but if you want a truly reliable piece of technology, you choose MySQL. MySQL was released in 1995 and has been battle-tested. Everyone knows how to use it. It rarely fails, and when it does, there are standard procedures for recovering from those failures. Configurations and performance are widely known and predictable.

MySQL is not perfect, however, and the other databases exist to solve certain problems better than MySQL can. But they lack the battle testing and the broad knowledge base that MySQL has. They carry risk, and that risk eventually shows up as poor performance and unexpected changes to the code, slowing down feature development.

To choose a non-MySQL database is to spend an innovation token. By spending that token, you acknowledge that the risk exists, that the engineering plan has a natural budget for handling unexpected situations, and that you are putting a limit on that budget.

One interesting social aspect of this is that the more rapidly technology changes, the more people will want to use new tools, and that is exactly when it takes discipline to hold strong and not chase a trend for technology’s sake. Making a budget is easy; sticking to it is hard. But that’s how you stay out of debt.