Software Testing for Reluctant Product Managers
Software testing is not just a developer’s responsibility, but something that you as a product manager need to be on top of. Here’s what you need to know.
Why is testing important?
When you’re supervising a software development project, you need to know that the end product:
- yields a net benefit,
- works as intended,
- is serviceable for maintenance and updates.
To achieve this, you need to have testing protocols in place throughout the development process.
Testing is like a hidden force in your development process. It will work either for or against you, depending on whether you get it right. When it is done right, it gives your team confidence and makes future changes easier. When it is not, you may be faced with quality issues, rework, missed deadlines and low morale.
Testing as risk management
Testing is a form of risk management. It’s rather like insurance, where you weigh up the likelihood of a problem occurring, and decide what you want to do, and how much you want to spend, in order to address that problem.
In other words, testing effort should be in proportion to the risks involved.
For example, if you’re developing a banking or healthcare application, you need to have extensive and rigorous testing procedures in place. But when the stakes aren’t that high, you’ll want to do a more modest amount of testing.
Sometimes you’ll want to skip testing altogether until you have an end product. For instance, if you want to quickly try a new idea with a trusted audience, you might just do some “cowboy coding” in collaboration with a designer and a developer, and get something out there as soon as possible.
Common tests to have in place
Let’s step through the phases of a project and see what kind of testing is typically used at each stage.
| Phase | Test | Role | Key question |
|---|---|---|---|
| Before development | Prototype testing | Business analyst + designer | Does our solution satisfy the users’ needs? |
| Before development | Define and implement automated user-acceptance tests | Business analyst + developer | What exactly should users be able to do with the software? |
| During development | Unit + integration tests | Developer | Do small-scale functions (units, integrations) work as expected? |
| During development | Vulnerability testing | Developer | Does the code contain security issues, or depend on vulnerable libraries or components? |
| During development | Automated user-acceptance test | Developer | Is work on the new feature done? |
| New feature ready | QA review | QA tester or developer | Does the source code of the new feature conform to coding standards and security requirements? |
| New feature ready | Manual user-acceptance test | Product manager (+ trusted end user) | Does the new feature conform to the specification? |
| New feature ready | Exploratory testing | Software tester | Does the new feature have hidden issues? |
| Before launching a major release | Beta testing | End users | What obstacles are real-world users encountering? |
| Before launching a major release | Load testing | Operations | Will the system perform well in real-world use? |
| After launching a major release | ROI testing | Business analyst | Is the release achieving our business goals? |
| Deploying an update | Smoke testing | Operations | Did the deployment process complete without breaking anything? |
| Ongoing | System monitoring | Operations | Is the system available and healthy? Are users running into (fatal) errors? |
| Ongoing | Vulnerability testing | Operations | Is the software resilient against the latest known threats? |
Roles are different from people. When I say “role”, I mean an area of responsibility that could be covered by someone who also has other roles. An engineer could have both a developer and operations role, and a product manager could also be a testing engineer.
The next few sections look at some of these types of testing in more detail.
Prototype testing
A development process builds on a growing pile of assumptions, and the earlier we discover the bad ones, the less rework is needed later.
During the prototyping phase, the designer and business analyst might be looking for some quick, inexpensive ways to validate ideas. This is known as “guerrilla testing” or “hallway usability testing”.
They might start out by sharing some rough, hand-drawn sketches with the team and some end users.
As a next step, the designer could create an interactive (“clickable”) model, basically a web site without a database behind it. This is still quick to make, and allows users to interact with the interface.
Next, a developer could turn the web site into a simple, functional web application by setting up a database with some sample data.
As the prototypes become more like the real thing, the quality of the feedback keeps improving. Having this inexpensive feedback during early development can prevent a lot of wrong turns, and save a lot of money.
User acceptance testing (UAT)
UAT is the test that every product manager is familiar with.
The development team makes a planned release of the software available on a testing server, where the product manager checks whether the software works according to the specification, and a testing engineer looks for hidden issues by means of exploratory testing.
If end users of the new functionality are working within the company, ask them to test as well.
Even though manual testing is essential, the key workflows should be covered by automated tests, using Specification by Example.
Specification by Example (SBE)
Traditionally, the development team works from a functional specification, a written document prepared by a business analyst describing all workflows that users can follow. For example, if you were building an ecommerce site, it would describe how a user would add a product to a basket, pay for it, etcetera.
While programming, developers perform automated and manual tests to confirm that their code works, before handing their code to the product manager for user-acceptance testing.
But with SBE, the business analyst and a developer collaborate beforehand on a human-readable specification that is also executable as an automated user-acceptance test. This is called an executable specification. (I wrote an article specifically about SBE.)
The automated test is available to the developers as a useful first check while they do their programming. When the test fails, they know they still have work to do before they can request a manual test.
In this way, automated tests work like guardrails, keeping developers on track and preventing them from going into a “tunnel” where they keep moving forward, but in the wrong direction. Nobody wants to work on a function for a week, only to find out they built the wrong thing and their code has to be thrown away.
When done right, SBE decreases misunderstandings and helps the development team to work fully focused and make fewer mistakes.
Also, the development team doesn’t waste time waiting for the product manager to test the software, and the product manager doesn’t waste time performing the same tasks time and again, with every change in the software.
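As a sketch of what an executable specification can look like, here is a tiny example in plain Python. The Basket class and the scenario are invented for illustration; real SBE setups typically use a dedicated tool (such as Cucumber or Behave) where the scenario is written in business language.

```python
# A toy executable specification: the test reads like a business
# scenario (given / when / then) and runs as an automated check.
# The Basket class is hypothetical, standing in for real application code.

class Basket:
    def __init__(self):
        self.lines = []

    def add(self, product, price_cents, quantity=1):
        self.lines.append((product, price_cents, quantity))

    def total_cents(self):
        return sum(price * qty for _, price, qty in self.lines)


def test_customer_buys_two_of_the_same_product():
    # Given an empty basket
    basket = Basket()
    # When the customer adds two copies of a 9.99 product
    basket.add("USB cable", price_cents=999, quantity=2)
    # Then the total reflects both copies
    assert basket.total_cents() == 1998


test_customer_buys_two_of_the_same_product()
```

When the business analyst and developer agree on scenarios like this up front, a failing test tells the developer immediately that the work isn’t done yet.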
Exploratory testing
This is where a software tester actively looks for problems by straying off the “happy flow” of the user acceptance test, drawing on their knowledge about how software, tools or workflows typically break.
Where the user acceptance test in an ecommerce application might take a user nicely from product catalog to shopping cart to order confirmation, a tester might go back and forth between those stages, ignoring the site navigation and instead using the browser’s “back” and “forward” buttons - something that applications often won’t handle well.
Or, they will double-click on the “Pay now” button, and see if the application is smart enough to not double-charge the customer.
Also, they might try to enter numbers in different decimal notations. American users will use a period (full stop) as a decimal separator, while European ones might use a comma.
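The decimal-separator case can be made concrete with a few lines of Python. Both functions here are invented: `naive_parse` mimics code that assumes US notation, and `tolerant_parse` is one simplistic fix (it ignores thousands separators, so treat it as an illustration only).

```python
# The decimal-separator pitfall a tester might probe.

def naive_parse(text):
    return float(text)                    # fails on European "19,98"

def tolerant_parse(text):
    return float(text.replace(",", "."))  # accepts "19,98" and "19.98"
```

A tester typing “19,98” into a price field would crash the naive version instantly, which is exactly the kind of find exploratory testing exists for.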
Exploratory testing is all about the unusual and the unexpected. That’s why it’s best done manually. Whereas machines are good at repetitive tasks, humans are good at asking “what if?” and using their imaginations to go “off limits” and see where things break.
Exploratory testing does not require a computer science degree, but it needs to be done right. A tech-savvy, logical thinker with a slightly devious mindset (that’s a joke) and a few hours of training will get you a long way. They should work hot on the developers’ heels, so that bugs and other issues are caught before they cause problems for other developers, or worse, end users.
Exploratory testing tends to have excellent payoff. Whenever we had good testers on our projects, they always found issues that could have caused us major embarrassment had they materialised. Our development team always just loved our testers.
Unit and integration testing
Developers write unit and integration tests while programming, in order to gain a certain amount of confidence that their code works.
These tests can also be helpful when the development team modifies or expands the software. Sometimes a change in the source code can cause something to break elsewhere in the application. If there aren’t enough tests in place, a developer will just see an unhelpful error message indicating that something, somewhere is broken, and may spend hours on troubleshooting. With tests in place, a developer can localise and diagnose the problem faster.
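A small sketch shows how a focused unit test with a descriptive name points straight at the broken rule when it fails. The `shipping_cost` function and its pricing rules are hypothetical.

```python
# A focused unit test: if a change breaks the express rule, the test
# name alone tells the developer where to look.

def shipping_cost(weight_kg, express=False):
    """Flat rate below 2 kg, per-kg pricing above; express doubles it."""
    base = 5.00 if weight_kg < 2 else 2.50 * weight_kg
    return base * 2 if express else base


def test_light_parcel_gets_flat_rate():
    assert shipping_cost(1.0) == 5.00


def test_express_doubles_the_price():
    assert shipping_cost(3.0, express=True) == 15.00


test_light_parcel_gets_flat_rate()
test_express_doubles_the_price()
```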
Even though unit and integration tests are usually hidden from the product manager’s view, they can be a major drag on productivity.
As a general rule, you don’t want too many low-level tests. I’ve seen the most value from higher-level (system or end-to-end) testing, less from integration testing, and even less from unit testing. The reason is that lower-level tests almost never fail, and therefore provide little useful information about how and where the system breaks, which makes it hard to justify how closely they are tied to the source code.
The next few sections look at some other common mistakes.
Fetishising “100% testing coverage”
There is this popular refrain among developers that every line of code needs to be “touched” by a test, so you can trumpet the fact you’ve achieved “100% testing coverage”.
It sounds impressive and comprehensive. After all, you can’t get any better than 100%, right? But what really matters is the depth and focus of testing, not shallow universal coverage for its own sake.
Let’s say you go to a vegetarian restaurant and they proudly tell you that the food is “100% meat-free”. OK, that’s nice to know. But in another sense it’s merely the least you would expect.
Yes, your dish is 100% meat-free – but it could still taste like crap, or be stone cold by the time it arrives, or be completely different from what you ordered. If you could somehow write an automated test for your meal, you’d want it to consider all these vital factors – not just return a “100%” rating for something fairly fundamental or easy to achieve.
Merely “touching” every line of code with a test may give you a sense of reassurance, but it won’t necessarily improve the quality of your application. In fact, the reverse is true – because a dollar you waste on a pointless test is a dollar you can’t spend on a valuable one.
Writing sloppy tests, or no tests at all
Some developers do not like writing tests, and write them as an afterthought, or try to skip writing tests altogether. This has major implications for the serviceability of the software.
While the effects of sloppy or missing tests might not be immediately visible, they will be as soon as you try to make changes to the software. Bugs will start to appear even in places that are seemingly unrelated to the functionality that you are changing. They can be hiding within thousands of lines of code, and can take hours to spot and troubleshoot.
Having too few or sloppy tests also increases the burden on the product manager and tester to test manually, which soon gets extremely cumbersome and time-consuming.
If the codebase is hard to service, developers will become hesitant to make changes to any part of the source code that they haven’t written themselves. This is a bad place to be in, so if you notice that this is happening, you need to step on the brakes. Make sure that proper tests are put in place, and that code smells get refactored (improved without changing functionality) with top priority.
Just throwing a test at something
Especially after drinking the “100% testing coverage” Kool-Aid, it’s easy for a developer to fall into the trap of adding tests that are easy to write, but not necessarily important. It’s often justified with a “belt and braces” philosophy – “Let’s throw a test at this, just in case. You never know, it might go wrong…”
No test should be written without an understanding of why it needs to be there. And test routines need to be written in code as clear and concise as the source code that they test.
Vulnerability testing
There are a lot of ways in which code can be unsafe.
A classic example is a developer storing confidential data like passwords and access keys in the source code.
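The usual remedy is to inject secrets at runtime instead of baking them into the repository. A minimal sketch, where the variable name `PAYMENT_API_KEY` is made up:

```python
# Keep credentials out of the source code by reading them from the
# environment, which the deployment process populates.

import os

def load_api_key():
    # Bad alternative (never do this): api_key = "sk_live_abc123"
    key = os.environ.get("PAYMENT_API_KEY")
    if key is None:
        raise RuntimeError("PAYMENT_API_KEY is not set")
    return key
```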
Another common issue is being too trusting of user input. Hackers try to exploit this kind of vulnerability by tinkering with a URL or a web form to get access to the database or even the entire server.
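Here is what “too trusting of user input” looks like in miniature, shown with Python’s built-in sqlite3 module. The table and the input string are invented; the point is the contrast between string-built SQL and a parameterised query.

```python
# SQL injection in miniature: pasting user input into a query string
# versus passing it as a parameter.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

user_input = "nobody' OR '1'='1"   # a classic injection attempt

# Vulnerable: the input becomes part of the SQL, so the OR clause
# matches every row in the table.
unsafe_count = conn.execute(
    f"SELECT count(*) FROM users WHERE name = '{user_input}'"
).fetchone()[0]

# Safe: the driver treats the whole input as data, never as SQL.
safe_count = conn.execute(
    "SELECT count(*) FROM users WHERE name = ?", (user_input,)
).fetchone()[0]

print(unsafe_count, safe_count)  # → 1 0
```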
In development based on an open source technology stack, third-party libraries can be an issue. Developers often use tens or even hundreds of these libraries, each focused on one particular task, like parsing data from user input, PDF generation, etcetera. These libraries are usually developed by volunteers, who maintain them for as long as they feel like it. At some point they abandon their work and stop releasing security patches.
The best countermeasure to these issues is code review, both automated and manual. Automated code review tools are pretty good at detecting outdated libraries and unsafe coding practices. Peer review, a check on the source code by a co-developer, will mostly cover what the automated tools have missed.
Make sure that the development team carefully reviews any third party libraries before adding them to the source code. You want to know if these are under active maintenance by a reputable developer or organisation. If not, consider licensing a commercial library or service, or developing your own library.
Ongoing, after launch
After launch, your operations team will keep an eye on the system, and regularly test it on vulnerabilities.
There are so-called penetration testing (“pen testing”) tools that take a hacker’s perspective, scanning for ways to gain access and take control of the system.
If you need a more comprehensive test, you’ll want to work with a security testing company, who know more creative ways to hack an application.
Quality Assurance (QA) review
QA review is all about checking the technical quality of the source code.
A developer starts work on new functionality by making a copy (“branch”) of the current version of the source code (the “main line”). Once the functionality is ready, they ask a colleague to check whether the new code passes scrutiny. There is usually some going back and forth and making corrections, after which the reviewer merges the new code into the main line.
Alternatively, developers can work in pairs, so that code review happens right during programming instead of asynchronously.
Either way, code review is an effective but time-consuming process. Asynchronous code review includes a lot of handovers which routinely introduce days of delay. Pair programming doesn’t have this problem, but keeps two developers tied to the same task, in a process that is not necessarily good for deep thinking. (I, for one, do a lot of problem-solving while biking, walking, or taking a shower.)
As a base practice, I’d recommend that a developer:
- works solo, and only involves a colleague when stuck,
- when the new functionality is ready, immediately calls a colleague for a screen share,
- makes corrections based on the colleague’s feedback, then merges the code into the main line.
I would reserve pair programming for highly critical functions, where one developer:
- prepares the new feature, and does the deep thinking,
- works with a second developer to do the actual programming,
- merges the code into the main line.
Beta testing
Especially in case of a major new release of the software, you may want to reduce risk with a phased roll-out, where the new functionality is made available for testing to a smaller group of users before general release. This is called beta testing.
An elegant way to go about this is to offer users a “feature switch” that allows them to toggle between the current and the upcoming versions of the software. This way, they can try the new functionality, and go back to the current version if needed.
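A feature switch can be as simple as a lookup of who has opted in. In this sketch the flag store is a plain dict; a real system would back it with a database or a feature-flag service, and the names are invented.

```python
# A minimal per-user feature switch.

feature_optins = {"new_checkout": {"alice"}}   # users who switched it on

def is_enabled(feature, user):
    return user in feature_optins.get(feature, set())

def checkout_page(user):
    if is_enabled("new_checkout", user):
        return "new checkout flow"
    return "current checkout flow"

print(checkout_page("alice"))  # → new checkout flow
print(checkout_page("bob"))    # → current checkout flow
```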
Load testing
Especially if your software has a large audience, you’ll need to know how the software performs under real-world circumstances, and what capacity is needed to give the users a smooth user experience.
Application performance is a complex function of many moving parts, such as:
- amount of user activity
- complexity of the functions being used
- quality of the source code
- system architecture
- software configuration
- available server processing power
- available system memory
- network limitations
Application performance doesn’t scale linearly with any of those factors. For example, if you want to halve the system’s response time, you won’t get there by doubling system memory, or throwing in an extra server.
Conversely, a system won’t get twice as slow with double the number of users. Performance per added user usually isn’t affected much until a critical threshold, after which it degrades sharply.
An application can be made to perform faster, or use fewer resources, by carefully removing bottlenecks.
The first place to look is the application’s source code.
A classical bottleneck is the number of “calls” that an application needs to make to fetch data from the database.
Let’s take the user interface of a shopping basket with 10 items as an example. If the application isn’t optimised, it will send 10 queries to the database, one for each individual line item. This is inefficient, because the same data can be fetched with a single query.
A developer won’t notice the difference in performance during programming, because they are the only user of the instance of the application that they’re running on their computer. But once the application gets used in the real world, those database calls add up, increasing response times from the database management system, and possibly overwhelming it.
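This pattern is often called the “N+1 query problem”, and it fits in a few lines of Python using the built-in sqlite3 module. The table and data are invented for illustration.

```python
# The N+1 query problem in miniature: one query per basket line
# versus a single query for the whole basket.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE basket_items (id INTEGER, product TEXT)")
conn.executemany("INSERT INTO basket_items VALUES (?, ?)",
                 [(i, f"product {i}") for i in range(10)])

# Unoptimised: 10 separate round trips to the database
items_slow = [conn.execute(
    "SELECT product FROM basket_items WHERE id = ?", (i,)
).fetchone()[0] for i in range(10)]

# Optimised: one query fetches the whole basket
items_fast = [row[0] for row in
              conn.execute("SELECT product FROM basket_items ORDER BY id")]

assert items_slow == items_fast   # same data, far fewer round trips
```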
Once the application is optimised to reduce those database calls, it is time for the operations team to tune the various server and database components.
For instance, they need to carefully calibrate the number of connections that each component can handle, and measure the amount of memory involved in dealing with a single server connection.
The next stage is to calculate how many connections a server can handle given a certain amount of processing power, system memory and other factors. With that number, and the expected use of the software, operations can do their initial capacity planning.
However, it can be very hard to accurately predict how a system behaves under load. This is why operations will do a simulation of real-world use with a procedure called “load testing”. During a load test, a script will make thousands of simultaneous calls to the application, revealing the number of connections at which performance starts to slow down.
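The core loop of such a script can be sketched with Python’s standard library. This is a toy: real load tests use dedicated tools (such as JMeter, Locust or k6) against a deployed system, and `handler` here is a stand-in for an actual HTTP request.

```python
# A toy load-test loop: fire concurrent requests at a handler and
# measure how long the batch takes.

import time
from concurrent.futures import ThreadPoolExecutor

def handler(_):
    time.sleep(0.01)          # stand-in for real request processing
    return 200                # HTTP-style status code

def run_load(concurrency, requests):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        statuses = list(pool.map(handler, range(requests)))
    return statuses, time.perf_counter() - start
```

Running `run_load` with increasing concurrency, and plotting the elapsed time, is how a load test reveals the threshold where performance starts to slip.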
ROI testing
With every major release, the business analyst needs to know whether or not it’s achieving its business goals.
Let’s say we are working for a news publication. We prepared this new feature in the software allowing customers with a cancelled subscription to renew at a discount. The business goal is to convince 10% of people who cancelled. The business analyst would then measure the renewal rate before and after the release, and see if that goal was achieved.
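The check itself is simple arithmetic. The numbers below are invented for illustration:

```python
# The renewal-rate goal from the example above, as a tiny calculation.

cancelled_customers = 2000     # people who had cancelled before the release
renewals = 230                 # of those, renewed at the discount

renewal_rate = renewals / cancelled_customers
goal_met = renewal_rate >= 0.10

print(f"renewal rate: {renewal_rate:.1%}, goal met: {goal_met}")
# → renewal rate: 11.5%, goal met: True
```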
As software development takes serious investment, it is essential to know its returns. In practice however, not all businesses measure those, and the ones that don’t inevitably start to view their software as an expense, rather than an asset. They will often try to reduce costs, which isn’t necessarily the right answer to the problem.
ROI testing is essential for any software-centered business, because one can’t afford to invest without learning from mistakes and knowing the returns.
Smoke testing
Deploying an update to a server is a process all by itself, and it comes with its own risks.
For instance, if the update contains a change in the database structure (a “migration”), that change can go wrong.
Or, a connection to a third-party service that the process depends on might time out.
Or there is some other process erroring out for whatever reason.
In any case, the development team needs to know whether or not a new update made it to the server successfully. This is where a smoke test comes in.
A smoke test is a lightweight test that runs automatically after the deployment procedure has finished. It checks just one or a handful of key functions that are unlikely to work unless the deployment process was completed successfully. It does not need to test all functionality, because a deployment is usually an all or nothing process, that either succeeds or fails in its entirety.
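In its simplest form, a smoke test is a script that requests a couple of key pages right after deployment. This sketch uses Python’s standard library; the paths are placeholders, and the pluggable `fetch` parameter exists only so the check can be exercised without a live server.

```python
# A minimal post-deployment smoke test: hit a few key URLs and
# fail loudly if any of them is broken.

from urllib.request import urlopen

KEY_PATHS = ("/", "/login")

def smoke_test(base_url, fetch=urlopen):
    """Return True if every key page responds with HTTP 200."""
    for path in KEY_PATHS:
        try:
            with fetch(base_url + path, timeout=10) as response:
                if response.status != 200:
                    return False
        except OSError:
            return False
    return True
```

The deployment pipeline would run `smoke_test("https://app.example.com")` as its final step and alert the team on failure.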
Smoke testing is an inexpensive way to address a potentially expensive problem. If an update breaks the system, you want to be the first to know. You don’t want to hear it from your users, and you don’t want them to wonder why you didn’t even bother to check if the application was running after the latest update.
System monitoring
No matter how much testing you do, stuff happens after launch.
People do things with the application that you didn’t anticipate. They encounter errors and hidden bugs. Some process may run slowly. The server runs out of disk space. The operating system needs to be upgraded. There’s automated and manual hacking attempts. Sometimes the whole server becomes unavailable.
This is why the operations team needs to have a tool in place that keeps track of downtime, errors, changes in performance and other interesting events. The tool will have something like a dashboard, giving them a bird’s eye view of overall system health, and an option to drill down to the details of a particular event. They will also set up notifications for events, so that they are informed immediately in case users get lots of errors, or the system is down or under attack.
If you recognize some of the examples above, you may want to have a conversation with the development team.
If you find yourself manually testing the main workflows time and again, it’s time to automate those tests.
If the development team comes back with an implementation of the software that is a lot different from what you (or the client) had in mind, you should start prototyping more, and implement Specification by Example.
If your users run into bugs regularly, it’s time to do more exploratory testing.
If you hear developers complain that the testing suite takes too long to run, they may be testing through the user interface instead of underneath it.
As these problems get fixed, the testing procedures in your project will start to work for you instead of against you, and make the development process faster, more predictable, and a lot more pleasurable.