Basic Approaches to Performance Testing

In which I shall attempt to state that there are 3 and only 3 approaches to performance testing…

1. The comparative method.

We’ll assume you have an application called AUT v1.0. We’ll further assume you have a scenario built to test AUT v1.0 and to hit it sufficiently hard that its response times are less than perfect. Ideally it should be walking: not limping, and certainly not sprinting along.

We’ll then suggest that v1.1 is coming out soon and that much of the functionality is unchanged. There is always new functionality, that’s the whole point, but it is the level of new-ness that dictates whether this approach can work.

Run your scenario against v1.0 as often as is necessary for consistent timings to be established. I maintain that 3 is the absolute minimum, and that more (often much more) is better. Gather your results so that direct comparison between transaction times and runs is possible.

Run            1    2    3    4    5
Transaction A  1    1    1.1  2    1
Transaction B  2    2    2.2  4    2


We can see that run 3 was a little slow and run 4 was 100% slower. Discard run 4’s results from the analysis at this point. You have now established a baseline of performance for v1.0.
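
The discard-and-baseline step can be sketched in a few lines of Python. This is only an illustration, not a prescribed method: the function names and the 50% tolerance are assumptions of mine, and the figures are the ones from the table above in whatever units they were recorded.

```python
from statistics import median

# Timings from the v1.0 table above, one value per run.
v1_0_runs = {
    "Transaction A": [1, 1, 1.1, 2, 1],
    "Transaction B": [2, 2, 2.2, 4, 2],
}

def outlier_runs(timings, tolerance=0.5):
    """Return the 1-based run numbers in which any transaction was more than
    `tolerance` (as a fraction) slower than its own median.
    The 0.5 default is an assumed cut-off, not a rule."""
    flagged = set()
    for values in timings.values():
        mid = median(values)
        for run_number, value in enumerate(values, start=1):
            if value > mid * (1 + tolerance):
                flagged.add(run_number)
    return sorted(flagged)

def baseline(timings, discard):
    """Mean time per transaction over the runs that were not discarded."""
    result = {}
    for name, values in timings.items():
        kept = [v for i, v in enumerate(values, start=1) if i not in discard]
        result[name] = sum(kept) / len(kept)
    return result

discard = outlier_runs(v1_0_runs)
v1_0_baseline = baseline(v1_0_runs, discard)
print("Discarded runs:", discard)
print("Baseline:", v1_0_baseline)
```

With a looser or tighter tolerance the set of discarded runs changes, so whatever cut-off you pick should stay fixed across versions.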

Install v1.1.

Run your scenario. (There are going to be issues where new functionality has caused scripts to break. Ideally you wouldn’t run those scripts in the scenario, but you need to keep the scenario the same where possible.)

Get your results:

Run            1    2    3    4    5
Transaction A  0.9  1    1.1  0.8  1.1
Transaction B  2    2    2.2  2.4  2

Plot a simple graph with 2 lines, one per version, so the transaction times for the 2 versions can be compared directly.
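
Alongside the graph, the same comparison can be boiled down to one number per transaction. The sketch below is a rough illustration rather than a required step: it uses the figures from the two tables (with run 4 of v1.0 already discarded), and the `percent_change` helper is a name I have made up for the example.

```python
from statistics import mean

# v1.0 timings with run 4 discarded, and all five v1.1 runs,
# taken from the tables above.
v1_0 = {
    "Transaction A": [1, 1, 1.1, 1],
    "Transaction B": [2, 2, 2.2, 2],
}
v1_1 = {
    "Transaction A": [0.9, 1, 1.1, 0.8, 1.1],
    "Transaction B": [2, 2, 2.2, 2.4, 2],
}

def percent_change(old, new):
    """Percentage change in mean time per transaction from old to new.
    Negative means the new version is faster. Transactions missing
    from the new results are skipped."""
    changes = {}
    for name, old_values in old.items():
        if name not in new:
            continue
        old_mean = mean(old_values)
        new_mean = mean(new[name])
        changes[name] = (new_mean - old_mean) / old_mean * 100
    return changes

changes = percent_change(v1_0, v1_1)
for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

On these figures Transaction A comes out slightly faster in v1.1 and Transaction B slightly slower, which is the sort of thing the two-line graph should also show at a glance.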

And that is the comparative method… It’s not rocket science and it can be quite limited, but it’s fast, and it produces results that anyone can understand.

The advantage of the comparative method is its inherent simplicity. To a large degree it doesn’t matter what the scripts and scenario are doing, as long as the system is being exercised. The key is to compare the next version with the current one. Build a long enough timeline and you can see the system getting better or worse over the life of the project.

2. The Reality Model method.

The reality model method is generally accepted as the right way to perform automated tests, but it requires a better understanding of what the usage of the application is like in the real world (which is a pain for new apps, as this becomes somewhat educated guesswork). To begin, we require information in the form of raw data about which business processes are the most common in the application.

From there, we can derive a model showing what an average day, week, month or year looks like, with a percentage breakdown of the business processes. Prepare scripts for the most frequent (and also the highest priority) cases, map them into a scenario with the load apportioned according to usage and priority, and you can generate a realistic load on the system. Run the scenario, run the analysis, and look for excessive transaction timings.
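
Turning a percentage breakdown into a scenario mostly means deciding how many virtual users each script gets. Here is a minimal sketch of that arithmetic. The business process names, the percentages and the total of 50 users are all invented for illustration, and largest-remainder rounding is simply one reasonable way to make the counts add up.

```python
def allocate_users(mix, total_users):
    """Split total_users across scripts in proportion to their share of
    usage, using largest-remainder rounding so the counts sum exactly
    to total_users."""
    total_share = sum(mix.values())
    exact = {name: total_users * share / total_share
             for name, share in mix.items()}
    # Start from the whole-number part of each exact allocation.
    allocation = {name: int(value) for name, value in exact.items()}
    leftover = total_users - sum(allocation.values())
    # Hand the remaining users to the scripts with the largest fractions.
    by_remainder = sorted(mix, key=lambda name: exact[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation

# Hypothetical usage breakdown (percent of business processes per day).
usage_mix = {"search": 55, "view_item": 30, "checkout": 10, "admin_report": 5}

users = allocate_users(usage_mix, 50)
for script, count in users.items():
    print(f"{script}: {count} virtual users")
```

Priority could be folded in by weighting the shares before allocating, but how much weight a high-priority process deserves is a judgment call for the project.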

3. The background load.

Essentially, this is performance regression testing plus new functionality testing.
Build a realistic model of usage for the application with existing functionality. Benchmark the hell out of it.
Add in new functionality (preferably one item at a time) and re-script for that functionality.
Run the original model with the new scripts for new functionality in place of the old scripts for old functionality.
Benchmark this and compare the results.
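
The final comparison might look something like the sketch below. It is an assumption-laden illustration: the transaction names and timings are made up, the 10% threshold is arbitrary, and `compare_benchmarks` is a hypothetical helper. The idea is to compare only the transactions that exist in both benchmarks and to list the new ones separately, since they have nothing to be compared against yet.

```python
def compare_benchmarks(baseline, candidate, threshold_pct=10.0):
    """Compare two benchmark result sets (transaction name -> mean time).
    Returns the shared transactions that slowed down by more than
    threshold_pct, plus the transactions that only exist in the candidate."""
    shared = sorted(set(baseline) & set(candidate))
    regressions = {}
    for name in shared:
        change = (candidate[name] - baseline[name]) / baseline[name] * 100
        if change > threshold_pct:
            regressions[name] = round(change, 1)
    new_only = sorted(set(candidate) - set(baseline))
    return regressions, new_only

# Hypothetical mean timings: the old checkout script has been swapped
# for a new one, everything else is the unchanged background load.
baseline_results = {"login": 1.0, "search": 2.0, "old_checkout": 3.0}
candidate_results = {"login": 1.05, "search": 2.5, "new_checkout": 3.4}

regressions, new_only = compare_benchmarks(baseline_results, candidate_results)
print("Regressions in unchanged functionality:", regressions)
print("New transactions with no baseline yet:", new_only)
```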

The issue here is that with too much new functionality there will be a greater amount of variation. What if the new functionality is massively different to the old, as is often the case on web-based applications when additional pages are added?

I’ve had projects where the increase in pages alone doubled the number of transactions for a single script. Direct comparison for that script is tricky. I’ve been known to assign 10 seconds of think-time to an old script and only 5 to the new one in an effort to bring them into some sort of order. Personally I think that’s cheating, but managers will make peculiar requests.

The wonderful thing about this is that all 3 can be used together, and that over time as benchmark sets are captured it becomes easier to get a feel for (un)acceptable performance.
