Why You Need a Staging System for Acceptance Testing, and How to Set It Up

Some software teams do their testing on a live system, which can be quite risky. A properly configured staging system allows for testing in a controlled environment without disrupting the live system or its users.

Interior of a nuclear submarine with a large, scary-looking launch button. (AI-generated image by DALL-E)

Why “live”-testing is a bad idea

Recently, a fellow product manager told me that the majority of his clients perform their acceptance testing on a live (“production”) system. That surprised me, because this is something that a development team either knows not to do, or very quickly learns the hard way not to do.

We learned it the hard way, back in 2007.

Our team was rolling out a new feature to an email platform with tens of thousands of users. At the end of the procedure, we had the system send out an email with the subject “test 1 2 3” to verify that the mail server was still working. That email was dispatched only to ourselves. Or so we thought.

Less than a minute later, the confused and angry responses from our customers started pouring in. We quickly figured out that we had sent our test email to every single user.

After pulling an all-nighter answering hundreds of emails, we decided that this was one of those “never again” moments.

We decided to set up a separate staging system so that going forward, we could do our testing in isolation, without troubling our customers.

What is a staging system?

A staging system is an environment, separate from the production system, specifically set up for user-acceptance testing (“UAT”).

It allows the product manager, testers, and trusted end users to safely test the next upcoming release of the software before it goes to the production system for real-world use.

A staging system is a compromise

A staging system needs to reconcile conflicting requirements.

On one hand, you want the staging environment to be the same as production, so that you can be confident that whatever works on staging will work on production.

On the other hand, you don’t want real-world users to be affected by your testing. If the testing environment doesn’t keep you safe from accidents, you may be tempted to avoid testing certain risky workflows, or to not test at all, which inevitably leads to problems later on. (I wrote an article about the importance of testing.)

So a good staging system should be designed to avoid:

people’s personal data getting leaked,
getting the production database “polluted” with testing data,
real-world users receiving test messages.

Taking the mailing system that I mentioned above as an example, you’ll want to know if it is still working once a new release goes live, but without actually sending any emails.

So, how can we set up a staging system in which we feel confident to perform comprehensive user-acceptance tests?

Let me share some do’s and don’ts.

Do’s and Don’ts of staging environments

Don’t: test with production data

It can be very tempting to test with a copy of a production database, so that the staging system looks and works exactly the same as the production system. After all, the closer to reality your staging system is, the more realistic your testing gets, and the better your chances of catching bugs, right?

Wrong.

Testing with a copy of the production database is a crapshoot: you’re testing workflows with random database records, hoping that bugs come to the surface that way. This is wasteful, because we can actually develop good hypotheses about how the system could fail and use those to come up with good sample data, which helps us discover issues much faster.

In addition, user-acceptance tests should be based on a shared understanding between the product manager and the development team. In other words, the development team needs to know how you are going to test before development even begins. A random test is fine in exploratory testing, but not in a formal acceptance test.

Don’t: test with pseudonymised production data

In one of the systems that our team built, there was this feature to assist callcenter agents in matching a customer with the closest available employee.

We felt that the only way to test realistically was with real addresses from real customers and real employees. After all, testing with random geolocation data would have yielded some very strange results during testing — like the closest employee being 70km away from the customer.

At the same time, however, we were concerned about having personally identifiable data in the staging database. So as a precaution, we had a script in place that automatically replaced people’s personal data (names, addresses, email addresses, phone numbers, etcetera) with fake data.

It sounded good in theory, but didn’t always work so well in practice. On one occasion, the pseudonymisation script failed without warning, and people’s private data showed up on the staging system.

In hindsight, we should not have set up our sample data this way. We should simply have created fake records for customers and employees, with carefully chosen geolocations, and tested with those instead.
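To make the idea of fake records with carefully chosen geolocations concrete, here is a minimal sketch in Python. It is not the matching feature our team built: the `Person` type, the `closest_employee` function and the three sample records are all made up for illustration, and the distance is a plain great-circle (haversine) calculation rather than a travel distance.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Person:
    """A fictional customer or employee with a hand-picked location."""
    name: str
    lat: float
    lon: float


def distance_km(a: Person, b: Person) -> float:
    """Great-circle distance between two people (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))  # 6371 km: mean Earth radius


def closest_employee(customer: Person, employees: list[Person]) -> Person:
    """Hypothetical stand-in for the matching feature under test."""
    return min(employees, key=lambda e: distance_km(customer, e))


# Entirely fictional people; only the coordinates are chosen on purpose,
# so that the expected match is obvious to everyone involved in testing.
CUSTOMER = Person("Test Customer A", 52.3702, 4.8952)  # central Amsterdam
EMPLOYEES = [
    Person("Test Employee Near", 52.3874, 4.6462),  # Haarlem
    Person("Test Employee Far", 51.9244, 4.4777),   # Rotterdam
]
```

Because the locations are chosen deliberately, the expected outcome ("Test Employee Near" is matched) can be agreed upon before development starts, and no real person’s address is involved.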

So why should personal data not be used for testing?

To begin with, there is the legal argument. There is an increasing number of jurisdictions where testing with personal data is prohibited.

But what I came to realise is that the primary argument is a moral one. Would I like my bookstore, or my doctor, or the government, to use my personal data to test new features in their software, even if it’s just my address?

Exactly.

Do: use carefully curated sample data

Sample data for user-acceptance testing needs to be carefully curated: it should be representative of the workflows that you want to support. Let me illustrate that with an example.

Recently, I tried to make a payment in a US-based ecommerce store with my Netherlands-issued credit card. The transaction failed because of a bug: the interface required me to select a US state as part of my Netherlands billing address, which made it invalid.

How did this problem happen?

It was clear that the store owner intended to take orders from foreign customers: after all, the interface showed a long list of countries. But it was also clear that the delivery team had not bothered to test the payment workflow with a billing address outside the United States.

Therefore, given the expectation of having customers from outside the United States, there should have been at least one test routine in place for that scenario.

And the sample data set for testing should contain at least one record of a customer based outside the United States.
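As a rough illustration, a curated data set for this scenario could look like the Python sketch below. I don’t know how the store in question validates addresses, so `BillingAddress`, `validate_billing_address` and the two sample records are assumptions, and the validation rules are deliberately simplistic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BillingAddress:
    country: str                 # ISO 3166-1 alpha-2 code, e.g. "US" or "NL"
    postal_code: str
    state: Optional[str] = None  # only meaningful for US addresses here


def validate_billing_address(address: BillingAddress) -> list[str]:
    """Return a list of problems; an empty list means the address is accepted."""
    errors = []
    if address.country == "US" and not address.state:
        errors.append("US addresses need a state")
    if address.country != "US" and address.state:
        errors.append("non-US addresses must not carry a US state")
    return errors


# Curated sample data: one record per scenario that was agreed upon up front.
SAMPLE_ADDRESSES = {
    "us_customer": BillingAddress("US", "94105", "CA"),
    "nl_customer": BillingAddress("NL", "1012 AB"),  # customer outside the US
}
```

With the `nl_customer` record in the data set, a form that forces a US state onto a Dutch address fails the test before a real customer ever runs into it.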

If you’re using Specification by Example (I wrote a detailed article about it), the development team already has these data available, for use in their own automated tests.

The benefit is that everyone is testing with the same data, based on scenarios that were agreed upon up front.

Do: automate system configuration

If the procedure of launching or resetting a staging system involves any manual work, mistakes are too easily made.

For instance, operations might overlook an item on their checklist and forget to set up a firewall, to block outgoing email, or to secure services like Redis (a popular data store used, among other things, for authentication services) that are not hardened by default.

In addition, a staging system that is cumbersome to maintain might end up not being used at all, or encourage testers to do their user-acceptance testing on production.

Fortunately, there is a solution called configuration management: the automated setting up of servers, networks, software and databases.

Getting the automation in place is an operations task. Using tools like Terraform, Ansible and GoCD, the operations team prepares configuration files that completely automate the process of setting up a server and getting it into a specific state, in a reliable, repeatable way.

As a result, you can have a staging server set up from scratch and ready for acceptance testing in minutes, with the click of a button.

Do: disable all messaging on the staging server

Server operating systems often come with an SMTP server (software for sending email) pre-installed.

I recommend completely removing any SMTP capability from every staging environment, so that the server simply can’t send any emails.
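One way to guard against a mail server quietly coming back is a small check that runs on the staging server after every setup. The Python sketch below only tests whether anything on the machine accepts TCP connections on the usual SMTP ports (25, 465 and 587); the function names are made up for this article, and the check says nothing about mail that the application sends through an external service.

```python
import socket


def accepts_connections(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


def smtp_ports_still_open(host: str = "127.0.0.1",
                          ports: tuple[int, ...] = (25, 465, 587)) -> list[int]:
    """List the common SMTP ports on which a local mail server still answers."""
    return [port for port in ports if accepts_connections(host, port)]


if __name__ == "__main__":
    open_ports = smtp_ports_still_open()
    if open_ports:
        raise SystemExit(f"SMTP still reachable on ports {open_ports}")
    print("No local SMTP server found")
```

Run as part of the automated setup, a non-empty result would stop the staging system from being handed over to testers.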

Don’t: have testing accounts on production

Some end users feel tempted to do a manual “smoke test” after a new release on production, to see if the key workflows are still intact.

For instance, they’ll create a fake account under username [email protected] and password 123456, and create fake records under that account.

Using fake accounts and records on production is not a good idea. It can distort statistics and contaminate reports, as well as create backdoors into the system.

The better approach is an automated smoke test that checks the system right after the deployment of a new release, but without changing anything in the database.
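A read-only smoke test can be as simple as requesting a handful of key pages and checking that each one answers. The following Python sketch uses only GET requests, which by convention should not change anything on the server; the paths in the usage example and the injectable `fetch` parameter are assumptions for illustration.

```python
from typing import Callable
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def http_status(url: str, timeout: float = 5.0) -> int:
    """GET a URL and return its HTTP status code (0 if unreachable)."""
    request = Request(url, method="GET")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:  # 4xx and 5xx responses end up here
        return error.code
    except (URLError, OSError):  # DNS failure, refused connection, timeout
        return 0


def smoke_test(base_url: str, paths: list[str],
               fetch: Callable[[str], int] = http_status) -> dict[str, bool]:
    """Check every path with a read-only request; True means HTTP 200."""
    return {path: fetch(base_url.rstrip("/") + path) == 200 for path in paths}
```

A deployment script could call `smoke_test("https://shop.example.com", ["/", "/login", "/health"])` (hypothetical URL and paths) and roll back if any value is `False`. Passing a fake `fetch` function makes the check itself testable without a network.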

Do: use third-party services in testing mode

If your application uses third-party services for email, PDF generation, payments etcetera, the odds are that they offer a testing mode.

If so, the staging system needs to be configured to use those testing modes.

In testing mode, an email delivery service will capture and show an outgoing email in their user interface, but not send it.

And similarly, a payment service provider will capture and show a test transaction, but not carry it out.

Meanwhile, they will report the same errors to you as in production mode, so this is perfect for testing. You can verify that your application is handing over everything correctly, and be confident that once things work on staging, they will work fine on production as well.
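How the testing mode is switched on differs per provider, so the Python sketch below stays generic. It assumes that each service is selected through a configuration record, and that the staging setup refuses to start if any service is still in live mode. The service names, the environment variable naming scheme and both functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    name: str
    test_mode: bool
    api_key_env_var: str  # name of the environment variable holding the key


def service_configs(environment: str) -> list[ServiceConfig]:
    """Build the third-party service settings for one environment."""
    if environment not in ("staging", "production"):
        raise ValueError(f"unknown environment: {environment!r}")
    test_mode = environment != "production"
    suffix = "TEST" if test_mode else "LIVE"
    return [
        ServiceConfig(name, test_mode, f"{name.upper()}_{suffix}_API_KEY")
        for name in ("email", "payments", "pdf")
    ]


def assert_safe_for_staging(configs: list[ServiceConfig]) -> None:
    """Refuse to continue if any service would run in live mode."""
    live = [config.name for config in configs if not config.test_mode]
    if live:
        raise RuntimeError(f"services in live mode on staging: {live}")
```

Calling `assert_safe_for_staging(service_configs("staging"))` during startup turns a misconfiguration into an immediate, visible failure instead of a real payment or a real email.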

If a third-party service doesn’t have a testing mode, don’t include it in your testing, and consider switching to another service as soon as possible. There is no excuse for lacking a testing mode, and you need those services covered by your testing procedures.

Do: make the staging environment look different from production

One of the issues that our “power” users (trusted end users involved in testing) used to run into was that the staging and production environments looked the same.

For instance, they would take a phone call from a new customer and accidentally enter their data into the staging system instead of production.

A simple measure that you can take to prevent this is to use a different color scheme on the staging system.

This can also serve as a nice “smoke test” after the staging system is updated: as proof that the configuration script correctly set up a staging environment, not a production one.

Do: have a user switching feature in place

Some of the systems that we built have large numbers of user archetypes (roles).

In one case, a system for a services company, we had the roles of callcenter agent, callcenter supervisor, support agent, support manager, complaints manager, store manager, data analyst, accountant, and business owner.

Having so many roles is not great for testing. If a workflow requires handovers from one role to the other, you constantly find yourself logging in and logging out, or navigating from one window to the other, which quickly becomes tedious.

That’s why I always require a user switching feature on a staging system. To switch roles, you simply open a drop-down menu, select the new role, and continue testing, without logging in and out or switching windows.
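Behind such a drop-down, the logic can be quite small. The Python sketch below is an assumption about how it might look, not a description of the systems we built: a session keeps track of the current role, and switching is refused anywhere but on staging, so the shortcut can’t turn into a backdoor on production.

```python
from dataclasses import dataclass

# The roles of the services company mentioned above.
ROLES = (
    "callcenter agent", "callcenter supervisor", "support agent",
    "support manager", "complaints manager", "store manager",
    "data analyst", "accountant", "business owner",
)


@dataclass
class Session:
    environment: str  # "staging" or "production"
    username: str
    role: str


def switch_role(session: Session, new_role: str) -> Session:
    """Change the tester's role in place, without logging out and in again."""
    if session.environment != "staging":
        raise PermissionError("role switching is only available on staging")
    if new_role not in ROLES:
        raise ValueError(f"unknown role: {new_role!r}")
    session.role = new_role
    return session
```

A tester who starts as a callcenter agent can then hand a case over to a support manager with a single call such as `switch_role(session, "support manager")`.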

Do: have a “reset” button in place

Especially if you work with a limited data set, you will run out of usable sample data while testing.

Let’s take an ecommerce store as an example, and let’s say that the sample data has 10 different orders for testing, in various states. Perhaps one order is a shopping basket with a few items, another is placed but hasn’t shipped yet, another is on its way to the customer, yet another has two items that shipped and one on backorder, etcetera. At some point during testing, all those orders will be completed.

When that happens, a tester should be able to reset the system directly from their user interface so that they can continue testing right away. (Of course this would reset the system for all testers, but testers who work simultaneously need to coordinate their work anyway.)
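What the button does behind the scenes can be sketched as follows, using Python’s built-in `sqlite3` module as a stand-in for whatever database the application really uses. The `orders` table, the seed records and the `reset_sample_data` function are invented for this example; a real data set would cover all the order states that were agreed upon.

```python
import sqlite3

# Curated seed data: one order per state that testers need.
SEED_ORDERS = [
    (1, "basket"),
    (2, "placed"),
    (3, "shipped"),
    (4, "partially_shipped"),
]


def reset_sample_data(conn: sqlite3.Connection, environment: str) -> None:
    """Throw away all test orders and restore the curated sample data."""
    if environment != "staging":
        raise RuntimeError("refusing to reset a non-staging database")
    with conn:  # commits on success, rolls back if a statement fails
        conn.execute("DROP TABLE IF EXISTS orders")
        conn.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
        )
        conn.executemany(
            "INSERT INTO orders (id, status) VALUES (?, ?)", SEED_ORDERS
        )


def order_statuses(conn: sqlite3.Connection) -> list[str]:
    """Return the status of every order, ordered by id."""
    return [row[0] for row in conn.execute("SELECT status FROM orders ORDER BY id")]
```

The environment check is the important design choice here: a reset feature that could ever run against production would be worse than having no reset feature at all.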

Recommendations

Unless your application raises no privacy concerns and there is no risk of bothering users, testing on a production system carries a lot of risk and should be avoided.

If you don’t have a dedicated staging system, have a conversation with the development team and operations. They might actually have been uneasy about the situation themselves, and be grateful that you bring it up.

A proper, easy-to-use staging system is a great asset to have. It encourages testing, and gives everyone peace of mind.