Holistic Software Engineering

All organizational problems in software development can be traced back to the erroneous assumption that developing software systems is like developing other large, complex things, like buildings or cars.

In a company producing cars, for example, the process is divided into two roles, broadly speaking. There are people who design the cars, creating blueprints and specifications as to how the engine components and various features will fit together functionally — let’s call them designers — and then there are the people who actually assemble the cars — let’s call them assemblers.

The assemblers need specialized skills to operate the various tools used to put together the components at the assembly plant. But they don’t have to know how cars work. They certainly don’t need a mental model of how a modern vehicle is designed. They are mostly concerned with the actual assembly process, how to make it more efficient, how to avoid costly accidents and delays in the assembly process, and so on.

The designers, on the other hand, have a completely different skill set. They have a deep knowledge of what kind of engine layouts work, what effect changing certain components has on the overall performance of the car, and so on. They talk to executive leadership about what kind of designs are selling, learn about performance improvements that can be made, and respond to problems in the design by producing new designs.

We also have a middle tier of managers that serve as connective tissue between these groups. They help to keep communication working smoothly, resolve issues with employees, and identify needs for new hires as the production line grows.

(OK, this is probably not exactly how automotive design works, but bear with me!)

Divergent Levels of Understanding

It’s tempting to think that building something like a mobile app works the same way. Perhaps you’re shaking your head now as you read this article, thinking to yourself, “Of course that’s how it works!”

In software, it’s quite common to have the assembler and designer roles separate, especially in corporate environments. From a hiring standpoint, this makes sense.

Let’s say we are developing a new mobile app front end to our back end system. We can hire an off-shore team of contractors to put together an iOS app, and throw in a senior back-end engineer from the main office to help direct them on Zoom calls once or twice a week. This might help us keep our personnel costs down, right?

Already we’ve split the task of building software into two roles. Now we have a senior engineer who’s really doing the design phase of the project and a group of assemblers who are actually doing the programming.

Let’s say we also have a product manager to oversee this project. The PM knows how the mobile app needs to work.

Notice how divergent the different levels of understanding of the project have become. We have different humans handling three distinct levels of understanding:

  1. What is the project supposed to do?
  2. What is the best approach to meet these requirements?
  3. How are we going to implement this best approach?

Roles Within Software

The people who understand the product and what it needs to be able to do in the long term have titles like product manager or software engineering manager. They usually don’t write code or implement the design. Sometimes they have development experience, but not always. These roles have a level-one understanding of the project.

Even software architects frequently spend little time implementing designs. Usually, they’re concerned with making sure the team has the right approach, and they have to justify that approach. Sometimes a similar role is taken by lead engineers who are responsible for guiding the team toward an implementation. These are your level twos.

And, of course, there are often several engineers working on implementing the design. Usually when we say engineer we mean someone writing application code to be deployed either to a server somewhere or delivered to a browser in a bundle of compressed JavaScript. These folks are deep in level three.

But there are also those tasked with ensuring the environment where the application actually runs is healthy. For maddening reasons, we call them devops engineers. In a past life, these brave souls might have been called sys admins. If they’re working in a cloud provider, like Amazon Web Services, we’ve sometimes taken to calling them cloud engineers because their job is to provision cloud resources for the team. These ops roles are also level three.

Split Brain

In our zeal to make software development more efficient, I think we’ve erroneously taken this metaphor of the assembly line and applied it to software.

Instead of making things more efficient, we split the complete understanding of the project. Now, when we make changes in the system, we have to begin at level one and translate the requirements all the way down to level three.

Dare I ask, is there a better way? Let’s let our imaginations go wild for a moment. I give you the holistic software engineer.

A Holistic Software Engineer

A holistic software engineer is capable of:

  • Designing a system from beginning to end, taking into account various tradeoffs.
  • Administering a working system, be it in the cloud or on-premises.
  • Communicating these concepts to non-technical staff outside the team, e.g. executives or sales.
  • Understanding the need that the project fills for the user, whether it be an internal user like an executive or an external customer.

In other words, the holistic software engineer has a complete understanding of the project — from its purpose right down to its implementation details.

They can speak to the various decisions that went into designing the project. They can quickly iterate on new designs when the implementation in the “assembly” of the software runs into performance problems. They can also understand how decisions impact users of the system, and what aspects of the system’s performance would be most important to its users.

The most useful contributors on software projects are the ones that have this holistic understanding. They can bridge the gap between the different levels that the organizational strategy has created.

They could come from either direction. They might be a technical person who has gone out of their way to understand the product side of things.

Or they could be a product person who has done a lot of face-to-face work with engineers to understand how the system was designed and what its limitations are.

Whichever direction the holistic software engineer comes from, they’re useful because they’ve started to re-compose the complete understanding that was lost when we split roles along levels of understanding.

A Challenge

I challenge you to think about your own role in these terms and try to take on a more holistic role.

Are you working at level one, with a solid understanding of the product but no idea how to design or implement it? Try learning some technical skills so you can understand what the engineers mean when they say that service X needs to call service Y.

Or are you working at level two, where you sketch out designs of systems but cannot be bothered to iterate on those designs when your offshore team runs into problems? You’d be a better architect if you dove into the code and had a solid grasp of where the product is heading.

Perhaps you’re deep in level three, heads down in code, but not really aware of how the project might change and grow in the future. I would challenge you to think of yourself not as a “coder” but as someone with a role in structuring the implementation to allow the design to evolve to meet future product needs. Talk to product and ask questions to fill in the gaps in your understanding.


We can all improve in our craft if we have a little more understanding of the concerns of the people we interface with at work. I present this concept of holistic engineering as a way of getting at the kind of empathetic approach that I find works best in software teams.

I think that the Agile movement was an attempt to re-constitute this holistic engineer. Processes like point estimation, storyboards, and so on are just formalized ways of communicating technical challenges to management and communicating product requirements to engineers.

Whatever system we use, the end objective is the same: engineers and product experts working together toward a complete understanding of the project, so that progress can be made. Try to understand the whole picture and you will be more useful as a contributor, no matter what your role is.

The Surprisingly Simple Solution for Streaming Data

Previously, I’ve covered why functional programming provides the conceptual foundation for the big-data problems at the heart of software engineering today.

Now, let’s take a look at how these functional concepts have been applied to building a type of big-data data structure called a stream.

What Do We Mean by Stream?

What is a stream, exactly? It’s an ordered sequence of structured events in your data.

These could be actual events, like mouse clicks or page views, or they could be something more abstract, like customer orders, bank transactions, or sensor readings. Typically, though, an event is not a fully rendered view of a large data model but rather something small and measurable that changes over time.

Each event is a point of data, and we expect to get hundreds — or even millions — of these events per second. All of these events taken together in sequence form our stream.

How can we store this kind of data? We could write it to a database table, but if we’re doing millions of row insertions every second, our database will quickly fall over.

So traditional relational databases are out.

Enter the Message Broker

To handle streaming data, we use a special piece of data infrastructure called a message broker.

Message brokers are uniquely adapted to the challenges of event streams. They provide no indexing on data and are designed for quick insertions. On the other end, we can quickly pick up the latest event, look at it, and move on to the next one.

The two sides of this system — inserts on the one end and reads on the other — are referred to as the producer and the consumer, respectively.

We’re going to produce data into our stream — and then consume data out of it. You might also recognize this design from its elemental data structure, the queue.

An In-Memory Buffer

So now we know how we’re going to insert and read data. But how is it going to be stored in-between?

One option is to keep it all in memory. An insert would add a new event to an internal queue in memory. A consumer reading data would remove the event from memory. We could then keep a set of pointers to the front and back of the queue.

But memory is expensive, and we don’t always have a lot of it. What happens when we run out of memory? Our message broker will have to go offline, flush its memory to disk, or otherwise interrupt its operation.
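The in-memory approach can be sketched with a bounded queue. This is a toy illustration of the producer/consumer idea, not how any particular broker is implemented:

```java
import java.util.concurrent.ArrayBlockingQueue;

// Toy in-memory buffer: producers offer events, consumers poll them off.
// The capacity models our limited memory; produce() fails when we run out.
public class InMemoryBuffer {
    private final ArrayBlockingQueue<String> queue;

    public InMemoryBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Producer side: returns false when memory (capacity) is exhausted.
    public boolean produce(String event) {
        return queue.offer(event);
    }

    // Consumer side: removes and returns the oldest event, or null if empty.
    public String consume() {
        return queue.poll();
    }

    public static void main(String[] args) {
        InMemoryBuffer buffer = new InMemoryBuffer(2);
        System.out.println(buffer.produce("click-1")); // true
        System.out.println(buffer.produce("click-2")); // true
        System.out.println(buffer.produce("click-3")); // false: buffer is full
        System.out.println(buffer.consume());          // click-1 (FIFO order)
    }
}
```

The failing third `produce` is exactly the "what happens when we run out of memory?" problem: a real broker has to block, drop data, or spill to disk.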

An On-Disk Buffer

Another option is to write data to the local disk. You might be accustomed to thinking of the disk as being slow. It certainly can be. But disk access today with modern SSDs or a virtualized disk — like Amazon’s EBS (Elastic Block Store) — is fast enough for our purposes.

Now our data can scale with the size of the SSD we can slap on our server. Or even better, if we’re in a cloud provider, we can add a virtualized disk that scales as much as we need.

Aging Out of Data

But wait a minute. We’re going to be shoveling millions of events into our message broker. Aren’t we going to run out of disk space rather quickly?

That’s why we have a time to live (TTL) for the data. Our data will age out of storage. This setting is usually configurable. Let’s say we set it to one hour. Events in our stream will then only be stored for one hour, and after that, they’re gone forever.

Another way of looking at it is to think of the stream on disk as a circular buffer. The message broker only buffers the last hour of data, which means that the consumer of this data has to be at most one hour behind.
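Message brokers expose this window as configuration. In Apache Kafka (introduced below), for example, a one-hour retention can be set in the broker’s server.properties. The setting names here are Kafka’s standard retention options, though defaults and granularity vary by version:

```properties
# Keep log segments for one hour, then delete them.
log.retention.ms=3600000
# How often the broker checks for segments eligible for deletion.
log.retention.check.interval.ms=300000
```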

Introducing Your New Friend, Apache Kafka

In fact, the system I’ve just described is exactly how Apache Kafka works. Kafka is one of the more popular big-data solutions and the best open-source system for streaming available today.

Here’s how we create a producer and write data to it in Kafka using Java.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties properties = new Properties();
properties.put("bootstrap.servers", "localhost:9092");
properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> kafkaProducer = new KafkaProducer<>(properties);
kafkaProducer.send(new ProducerRecord<>("mytopic", "0", "test message")); // Push a message with key "0" to topic "mytopic"

Now on the other side we have our consumer, which is going to read that message.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties properties = new Properties();
properties.put("bootstrap.servers", "localhost:9092");
properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.put("group.id", "mygroup");
KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(properties);
kafkaConsumer.subscribe(Collections.singletonList("mytopic")); // Subscribe to "mytopic"
ConsumerRecords<String, String> records = kafkaConsumer.poll(100); // Poll for records, waiting up to 100 ms for new ones

There are some details here that are specific to Kafka’s jargon. First of all, we have to point our consumer/producer code to the Kafka broker and configure how data is serialized into and out of the topic. Then, we have to tell it what data to fetch by specifying a topic.

A topic is essentially the name of the stream we’re reading from and writing to. When we produce data to Kafka, we have to specify which topic we’re writing to. Likewise, when we create a consumer, we must subscribe to at least one topic.

Notice we don’t have any commands in this API to modify data. All we can do is push a ProducerRecord and get the data back as ConsumerRecords.

That’s all well and good, but what happens in-between?

It’s All About the Log

Kafka’s basic data structure is the log.

You’re familiar with logs, right? If you want to know what’s happening on a server, you look at the system log. You don’t query a log — you just read it from beginning to end.

And on servers with a lot of log data, the data is often rotated so older logs are discarded, leaving only the recent events you’re most likely interested in.

It’s the same thing with Kafka data. Our stream in Kafka is stored in rotating log files. (Actually, a single topic will be split among a bunch of log files, depending on how we’ve partitioned the topic.)

So how does our consumer of the data know where it left off?

It simply saves an offset value that represents its place in the stream. Think of this as a bookmark. The offset lets the consumer recover if it shuts down and has to resume reading where it left off.
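The bookmark idea can be illustrated with a toy consumer over an in-memory log. This is a sketch of the concept only, not Kafka’s actual client API (where committed offsets are stored broker-side):

```java
import java.util.List;

// Toy illustration of a consumer offset: a bookmark into an append-only log.
public class OffsetDemo {
    private final List<String> log;  // the stream, oldest event first
    private int offset = 0;          // the bookmark: next event to read

    public OffsetDemo(List<String> log) {
        this.log = log;
    }

    // Read the next event and advance the bookmark.
    public String poll() {
        return offset < log.size() ? log.get(offset++) : null;
    }

    // The value we would persist as our committed offset.
    public int committedOffset() {
        return offset;
    }

    // After a restart, resume from a previously committed offset.
    public void seek(int committed) {
        this.offset = committed;
    }

    public static void main(String[] args) {
        OffsetDemo consumer = new OffsetDemo(List.of("e1", "e2", "e3"));
        consumer.poll();                        // reads e1
        int saved = consumer.committedOffset(); // bookmark = 1

        // Simulate a crash and restart: a fresh consumer resumes at the bookmark.
        OffsetDemo restarted = new OffsetDemo(List.of("e1", "e2", "e3"));
        restarted.seek(saved);
        System.out.println(restarted.poll());   // prints e2, not e1
    }
}
```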

Now We Have Live Data

Now that we have our data in a live stream, we can perform analysis on it in real time. The details of what we do next will have to be left for another article.

Suffice to say, once we have the data in a stream, we can now start using a stream processor to transform the data, aggregate it, and even query it.

It’s Functional

As we’ll see is the case in many big-data systems, Kafka uses the functional model of computation in its design.

Note that the data in Kafka is immutable. We never go into Kafka data and modify it. All we can do is insert and read the data. And even then, our reading is limited to sequential access.

Sequential access is cheap because Kafka stores the data contiguously on disk. So it’s able to provide efficient access to blocks of data, even with millions of events being inserted every second.

But wait a minute. If the data is immutable, then how do we update a value in Kafka?

Quite simply, we make another insertion. Kafka has the concept of a key for a message. If we push the same key twice and if we enable a setting called log compaction, then Kafka will ensure older values for the same key are deleted. This is all done automagically by Kafka — we never manually set a value, just push the updated value. We can even push a null value to delete a record.
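Compaction can be pictured as replaying the log and keeping only the latest value per key. Here’s a toy simulation of what a compaction pass produces (not Kafka’s actual implementation, which compacts log segments in the background):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of log compaction: replay the log, keeping only the latest
// value for each key; a null value (a "tombstone") deletes the key.
public class CompactionDemo {
    public static Map<String, String> compact(String[][] log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] record : log) {
            String key = record[0], value = record[1];
            if (value == null) {
                latest.remove(key);      // tombstone: delete the key
            } else {
                latest.put(key, value);  // newer value wins
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        String[][] log = {
            {"user-1", "alice@old.example"},
            {"user-2", "bob@example.com"},
            {"user-1", "alice@new.example"},  // an "update" is just a new insertion
            {"user-2", null},                 // a null value deletes user-2
        };
        System.out.println(compact(log));  // {user-1=alice@new.example}
    }
}
```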

By avoiding mutable data structures, Kafka allows our streaming system to scale to ridiculous heights.

Why Not Data Warehousing?

At first glance, a stream might seem like a clumsy solution. Couldn’t we just design a database that can handle our write volume — something like a data warehouse — and then query it later when processing?

In some cases, yes, data warehousing is good enough. And for certain types of uses, it might even be preferable. If we know we don’t need our data to be live, then it might be more expensive to maintain the infrastructure for streaming.

The turnaround time for processing our data out of a data warehouse will be slow, delayed by hours or perhaps days, but maybe we’re happy with a daily report. Let’s call these solutions batch processing systems.

Batch processing is more commonly found in extract-transform-load (ETL) systems and in business-intelligence departments.

The Limitations of Batch Processing

There are lots of cases where batch processing isn’t good enough.

Consider the case of a bank processing transactions. We have a stream of transactions coming in, showing us how our customers are using their credit cards in real time.

Now let’s say we want to detect fraudulent transactions in the data. Could we do this with a data-warehousing system?

Probably not. If our query takes hours to run, we won’t know if a transaction was fraudulent until it’s too late.

Where Can I Use This?

Streaming isn’t the solution for every problem. These concepts are critical for a class of problems that’s becoming increasingly relevant, but not everyone is building a realtime system.

But Kafka is useful beyond these niche cases.

Message brokers are crucially important in scaling any system. A common pattern with a service-oriented architecture is to allow services to talk to one another via Kafka, as opposed to HTTP calls. Reading from Kafka is inherently asynchronous, which makes it perfect for a generic messaging tier.

Look into using Kafka or a similar message broker (such as Amazon’s Kinesis) for streaming your data any time you have a large write volume that can be processed asynchronously.

A messaging tier might seem like overkill if you’re a small company, but if you have any intention of growing, it’ll pay dividends to get this solution in place before the growing pains start to hurt.


As we’ve seen, functional-programming concepts have made their way into infrastructure components in the world of big data. Kafka is a prime example of a project that uses a very basic immutable data structure — the log — to great effect.

But these components aren’t just niche systems used in cutting edge machine-learning companies. They’re basic tools that can provide scaling advantages for everyone, from large enterprises to quickly growing startups.

The Limitations Of Automated Testing And What To Do About It

When you’re developing a software system, you know you should write automated tests. That’s a given.

Why should we write tests? Some common answers given:

  • Tests will catch bugs.
  • Tests improve the quality of the code.
  • Our senior architect said we should.

In this article, I’d like to shift the perspective a bit. I’ll look at automated testing from a pragmatic angle, analyzing the different types of tests we might write, and also highlight some of the limitations inherent in automated testing.

Why Test At All?

Why do we bother to test our system at all? The motivation is quite simple.

We want to identify and locate defects in our system.

This statement sounds glib, but it’s actually crucial. We assume there are defects in our system. We do not write tests to reassure ourselves that we are great programmers and to pat ourselves on the back for being so wonderful. We write tests to find the bugs we know are hiding in our system, despite our best efforts.

Defects can be syntax errors, incorrect uses of a vendored service or a database client, or brain-twisting mistakes in multi-threaded programs. The list goes on and on.

In any case, a defect is a case where the program doesn’t do what the programmer intended.

Naturally, we want to identify these cases in our system. How can we best go about that?

Code Quality As A Metric

Organizations often lean toward metrics like code coverage as a way of proactively encouraging testing and maintaining a certain standard across teams.

Code coverage tools work by parsing code into decision trees, tracking which branches are visited while the tests run, and generating a numerical rating based on how much of the decision tree was covered.

We end up with a number, like 78.98%. That looks nice to put in reports and on slides.

But what does it mean? How do we know if we have “enough” code coverage? Is 80% too low? What about 90%? Does it depend on the project?

And what about the quality of the tests? Is it possible that we’re visiting every logical branch in the code but checking something trivial or unimportant at each step of the way?

It’s a well-meaning effort, but I don’t like code coverage metrics. I think they prevent us from seeing the forest for the trees.

Tests Are Just More Code

Automated tests are just code. And like all code, they can contain bugs and mistakes. Buggy tests may conceal real bugs in the code we are testing.

For example, say we test a simple algorithm. When we call it with a certain value x=10, it should return 100. But it actually returns 99.

But when we wrote the test, we were confused. So we tested that the algorithm should return 99 when we call it with x=10.

Congratulations! We now have 100% test coverage of this module.

But our test is wrong. Not only is the test wrong, it is hiding the bug, which is the opposite of what we want it to do.
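To make this scenario concrete, here’s a contrived sketch — the function, the values, and the bug are all invented for illustration:

```java
// Contrived example: the algorithm has an off-by-one bug,
// and the test enshrines the buggy behavior instead of catching it.
public class BuggyTestDemo {
    // Intended to return x squared; actually returns x squared minus one.
    public static int square(int x) {
        return x * x - 1;  // the defect
    }

    public static void main(String[] args) {
        // The "test": written by the same confused author as the bug.
        // It passes, coverage is 100%, and the defect is now hidden.
        if (square(10) == 99) {
            System.out.println("All tests passed!");
        } else {
            System.out.println("Test failed");
        }
    }
}
```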

Writing Tests Requires Context

I do not recommend having QA or test engineers write tests instead of developers. I have seen this practiced in some organizations and it never works out.

Only the developer who wrote the feature knows the nuances and potential snares in the system. They alone are equipped to test it properly.

Furthermore, separating the task of writing feature code from writing test code results in test engineers having to rewrite code produced by developers, or else accept that certain modules are untestable. This is because test code often needs to stub a function return value, inject a class, or do other runtime substitutions which require the feature code to be properly structured.

We wouldn’t want the person building the space shuttle to have no hand in testing its operation, would we? And we wouldn’t want the testers to go into the engine room and rewire the oxygen to test it. So why do this with our software systems?

Time Investment

Writing tests takes time.

In some projects I’ve worked on, I spent up to 50% of my time writing tests. This ratio is not evenly distributed. Sometimes a single line of code resulted in the addition of 20–30 lines of test code. On the other hand, sometimes a block of code resulted in only one line of test code.

Time is a zero-sum game. Every hour we spend writing test code is an hour we could have spent writing product feature code.

And vice versa. If we slow down our progress on churning out new features, we have more time for writing tests.

Whether to write tests or not is always a trade-off. Tests are competing for a limited resource — our time.

So we have to make sure we’re testing the right thing.

A Thought Experiment

If you’ll humor me, I want you to try the following thought experiment.

Imagine, for a minute, that there is no such thing as an automated test. Somehow, in the history of computing, no one has ever thought to write programs to check other programs. Your only choice when testing your system is to do it by hand.

On the plus side, you have six weeks to test every aspect of the project. In this fictional world, all programmers are also manual testers. It’s part of your job description. It’s not optional. You must do it before the project launches, fixing bugs along the way.

My question to you now is simple.

What do you test?

Do you try to test every line of code to ensure that it operates exactly the way you think it does? Do you suspiciously check that database client to make sure it really returns what it says it returns? Do you test every method? Every class? What about the ones that don’t really do anything except call something else?

Do you check functional features of the system? For example, if you’re building a system with an API, you might want to load a bunch of test data into the database and then make a series of API requests and make sure their responses match up with what you expect. Is that good enough? Will you know where the bug is if your test fails, or will it take you hours of debugging?

Are there any areas of your system that you suspect are hiding bugs? Very often these are the most complex parts of your system because defects hide in complexity. Intuitively, we all know where our bugs are probably going to come from.

Most products have some algorithmic component somewhere. Do you spend a lot of your time testing that this algorithm does what you think it does? Do you try giving it unexpected input just to see what happens?

Whatever you choose to invest most of your time in testing in this scenario, that is where you should invest your time when writing automated tests.

Automated tests are just programs that test what you would otherwise have to test by hand.

Not All Tests Are Created Equal

The fact that we write tests doesn’t necessarily mean we are writing the best tests.

Consider the following:

  1. A unit test that checks that an input to complex algorithmic code returns the correct value.
  2. A unit test that checks that a method in a class calls a method in another class.

Assuming both unit tests take the same time to write, and similar effort to maintain, which one is more valuable?

Hopefully you agree that the first is more valuable. We get more “bang for our buck” with tests that target complex code, where we are more likely to have bugs. The second test is fine, and it may be valuable, but it’s providing relatively little value compared to the first one.

We must be careful not to invest too much time in writing tests that provide little value.

The Fabled End-to-End Test

If our system is simple, we can probably get by entirely on simple tests. But when systems grow, they sprout extra limbs. Suddenly our simple web application with an RDBMS attached actually needs ten different components to operate. That adds complexity.

And remember, defects hide in complexity.

In every company I’ve worked at, there came a time when there was a complicated bug across multiple systems that was so terrible, and so devastating to our confidence in our system, that it convinced us to drop everything and look for a way to plug this hole in our testing strategy.

We had unit tests. We had functional service-level tests. But we hadn’t tested the interactions between all these systems together.

So we began to quest for the fabled “end-to-end” test.

And much like the lost city of El Dorado, we never quite found it.

End-to-end tests look something like this:

  • Push a message into a queue.
  • Wait for Service A to pick up the message and insert it into the database.
  • Check that the message is in the database.
  • Wait for Service B to notice the row in the database and push a message to another queue.
  • Check that the other queue contains a message.

And so on.

On the surface this seems like a perfectly valid way to test our complex system. Indeed, it comes closer to describing how the system actually works in production.

But I have never seen this style of test last longer than a few months without succumbing to the same fate.

The Fate Of Too-Big Tests

Inevitably, end-to-end tests fall into disrepair and neglect. This is primarily because they yield false positives — i.e., failing when there is not really a bug — so often that their results are frequently ignored. This problem is even worse if they block the operation of a deploy pipeline.

False-positive tests become “the boy who cried wolf.” When we see them failing, we don’t take them seriously.

Thus the test code rots. Failing tests will be marked as ignored or commented out. Someone will eventually propose we take them out of the deploy pipeline and run the end-to-end tests only once in a while, basically admitting that they aren’t important.

What Went Wrong?

The problem with end-to-end tests is that they’re huge.

They cannot be easily booted up and run without some significant infrastructure. That might mean a special QA environment just for the tests. It might mean provisioning some cloud resource (like a SQS queue). It might mean reworking some part of the system to be more amenable to testing.

All of this requires more code, more work, and someone to maintain it going forward. Developers are typically more focused on completing features and not on maintaining test infrastructure, so tests with complex resource requirements are rarely a priority.

And why are these tests so unreliable anyway?

End-to-end tests often contain a lot of waiting for asynchronous tasks to complete — for example, checking whether a row has been updated in a database. The only solution for this in most cases is polling with a timeout. Given that we are going to tear down and boot up the whole test environment on every run, this is basically begging for tests to fail for operational reasons, not because of bugs. That’s how we get our false positives.
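The poll-with-timeout pattern usually looks something like this helper (a generic sketch; the names and the example condition are made up):

```java
import java.util.function.BooleanSupplier;

// Generic poll-until-true helper, the workhorse of end-to-end tests.
// Returns true if the condition became true before the timeout; the
// timeout branch is exactly where operational flakiness turns into
// spurious test failures.
public class PollUtil {
    public static boolean pollUntil(BooleanSupplier condition,
                                    long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(intervalMs);
        }
        return condition.getAsBoolean();  // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // In a real test this might be: pollUntil(() -> rowExists(id), 30_000, 500)
        boolean ok = pollUntil(() -> System.currentTimeMillis() - start > 50,
                               1_000, 10);
        System.out.println(ok);
    }
}
```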

Too-Big Tests Aren’t Useful Even When They Work

But it gets worse.

Even if the end-to-end test worked perfectly, it’s still not that valuable. What does it mean if a test that does everything fails?

If the “do everything” test fails, developers have to spend significant time debugging to identify where in the test the defect is hiding. Remember, the whole point of testing is to identify and locate defects. If our tests don’t do that, they’re not good tests.

Whether you call your test a unit test, a functional test, or a service test, tests are always more valuable when they are smaller. It is better to have several small tests than to have one large test, assuming that tradeoff is possible.

Smaller tests are better because when they fail they tell the developer where the bug is. They say, “hey, there’s a bug in the Foo service’s run endpoint!”

A big, unwieldy test mumbles under its breath, “hey, uh, I think there’s a bug somewhere. But I don’t know where.”

A Better Large Test

Tests that span service boundaries can be great. But we have to narrow in on what they are testing and what they are not testing.

For example, service level tests are great for checking a schema. If we have a contract in Service A that an endpoint should return schema X, then we can have another test for Service B that assumes this schema (for example, using a stubbed function call). Do we really need to test that the two services can talk to each other over HTTP? Probably not. HTTP is reliable enough.
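A sketch of that idea, with the services, interface, and schema all invented for illustration: Service B’s logic is exercised against a stub that returns the schema Service A has contracted to provide, with no network involved.

```java
import java.util.Map;

// Toy contract test: Service B's logic runs against a stub of Service A
// that returns the agreed-upon schema; no HTTP is involved.
public class ContractTestDemo {
    // The contract: Service A's user endpoint returns {id, name}.
    public interface ServiceAClient {
        Map<String, String> fetchUser(String id);
    }

    // The Service B logic under test: formats a greeting from the schema.
    public static String greet(ServiceAClient client, String id) {
        Map<String, String> user = client.fetchUser(id);
        return "Hello, " + user.get("name") + "!";
    }

    public static void main(String[] args) {
        // Stub standing in for Service A, returning the contracted schema.
        ServiceAClient stub = id -> Map.of("id", id, "name", "Ada");
        System.out.println(greet(stub, "42"));  // Hello, Ada!
    }
}
```

If Service A ever changes its schema, its own service-level test fails; if Service B mishandles the schema, this test fails. Each failure points at one service.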

By taking the operational components out of the test, we can get the value of a cross-service-boundary test without all that complexity.

Now when the tests fail, we know what they’re trying to communicate to us!

The Ideal Test

Taking all of this together, we can say a few things generally about “good” tests.

  1. They don’t report false negatives. When they fail, they identify the presence of a defect.
  2. They are small and bounded. When they fail, they tell the developers where the defect is hiding.
  3. They are testing some code that we wrote, not third-party code.
  4. They are things we would prioritize testing by hand if we had the time.
  5. They are targeting complex parts of our system where bugs are likely to hide.

Note that all of these points help us to target our goal of identifying and locating defects in the system.

Of course, terms like “small” are entirely subjective, and there are cases where bigger or smaller tests are appropriate. You may adjust the definition of “small” according to circumstances, and the principles still hold.

A Common Theme

A common theme tying together all these observations from my experiences is that testing is often approached with an overzealous mindset.

We have grand visions of a testing approach that tests everything that could go wrong in the entire system. Every line of code is “covered” by our testing framework. Our end-to-end test spans the entire system because defects might occur anywhere.

This shows that we are focusing on the idea of testing rather than the practical outcome that we want. Remember, we want to identify and locate defects.

The Limitations Of Testing

I’m not saying we shouldn’t write tests. We should. We must. We shall.

But tests are just one tool in the arsenal of software engineers to manage the quality of our systems. And they are a fallible and limited tool.

Automated tests carry a significant maintenance cost. Someone has to keep our testing infrastructure up and running. Tests are just code, and we will have to change them as our system evolves. They cost real hours of effort that could be devoted to project work. And test code itself will inevitably contain bugs that may conceal defects in our working system.

Tests also must be limited in their scope or they lose their usefulness in locating defects. So we can’t simply write giant system-spanning tests in incredible detail and expect that our developers will find this useful.

So what can we do? Is it possible that there is something better than testing that we should be doing?

As it so happens, yes. Yes, there is.

Code Review: Better Than Testing

Nothing is better than code review for finding issues in software systems. A code reviewer is not a program. They are a real human being, possessing intelligence and context for the whole project, who will look for problems we never even considered.

A good code reviewer will find complex multi-threading issues. They will find slow database queries. They will find design mistakes. They will even identify bad automated tests that are disguising defects.

Even better, they will suggest improvements in the code that go beyond finding defects!

So why is there such emphasis on testing, with innumerable frameworks and tools released, and so little on code review?

Perhaps it is because code review is a nebulous process that depends in large part on the efforts of individual contributors. And we in the software industry have a nagging tendency to downplay the importance of human intelligence and ingenuity in the development of our systems.


As I will explore further in future articles, I believe that the software industry often suffers from overemphasizing technological solutions over solutions that rely on human intelligence.

Technological solutions are great for amplifying human intelligence and ingenuity. Automated tests, for example, save us the trouble of manually testing our entire product. Thank goodness we don’t live in that hypothetical world without them!

But we need those intelligent, thoughtful, and conscientious engineers at our side to look for problems in our work and to gently challenge us to perfect our designs. Indeed, this is often the easiest way to catch serious defects in our systems.

Automated testing should always be a complement to our collaboration with colleagues, not a replacement for it.