In industrial practice, test suites are created manually by handcrafting them using specific test design techniques and domain-specific experience. Although the automatic or semi-automatic creation of test suites has been the focus of a great deal of research, manual testing is still widely used in software development. To support developers in testing, different approaches for producing good tests have been proposed.

Combinatorial testing has been explored with the goal of automatically combining the input values of the software based on a certain strategy. Pairwise testing is a combinatorial technique that generates test suites by varying the values of each pair of input parameters to a system until all possible value combinations for every pair have been created. There is some evidence suggesting that these kinds of techniques are efficient and relatively good at detecting software faults. Unfortunately, there is little experimental evidence comparing these combinatorial testing techniques with what is perceived as rigorous, manually handcrafted testing.

Combinatorial testing is used to design test cases by combining different input parameters based on an interaction strategy (e.g., each-used, pairwise, t-wise, base choice). One of the most commonly used strategies is pairwise (also known as two-way) testing, in which every combination of values for each possible pair of input parameters is covered by at least one test case.
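To make the pairwise criterion concrete, here is a minimal Python sketch. It is not the algorithm used in the study: it is a simple greedy selection over the exhaustive set of combinations, which is only practical for small input spaces. The parameter names and values (`door_closed`, `speed`, `brake`) are hypothetical and chosen purely for illustration.

```python
from itertools import combinations, product


def required_pairs(parameters):
    """Every value pair, for every pair of parameters, that must be covered."""
    names = sorted(parameters)
    return {
        ((a, va), (b, vb))
        for a, b in combinations(names, 2)
        for va in parameters[a]
        for vb in parameters[b]
    }


def pairs_of(test, names):
    """The value pairs that a single test case covers."""
    return {((a, test[a]), (b, test[b])) for a, b in combinations(names, 2)}


def greedy_pairwise(parameters):
    """Repeatedly pick the candidate test that covers the most uncovered pairs.

    Enumerates all combinations up front, so this is a sketch for small
    input spaces, not a scalable generator.
    """
    names = sorted(parameters)
    remaining = required_pairs(parameters)
    candidates = [
        dict(zip(names, values))
        for values in product(*(parameters[n] for n in names))
    ]
    suite = []
    while remaining:
        best = max(candidates, key=lambda t: len(pairs_of(t, names) & remaining))
        suite.append(best)
        remaining -= pairs_of(best, names)
    return suite


# Hypothetical input parameters (assumed for illustration only).
parameters = {
    "door_closed": [False, True],
    "speed": [0, 50, 200],
    "brake": [False, True],
}

suite = greedy_pairwise(parameters)
for test in suite:
    print(test)
print(f"{len(suite)} pairwise tests versus 12 exhaustive combinations")
```

Every pair of values appears in at least one generated test, while the suite stays no larger than the full set of 12 combinations; with more parameters the saving over exhaustive testing grows quickly.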

Researchers at Mälardalen University in Sweden compared pairwise testing with manual testing in terms of achieved branch coverage, fault detection (measured by a proxy: the mutation score) and the number of test cases created. The case study uses an industrial safety-critical system developed by Bombardier Transportation Sweden AB, a large-scale company developing and manufacturing railway equipment. The system is a train control management system (TCMS) that has been in development for several years and is tested according to safety standards and regulations. TCMS is a control system containing both software and hardware components; it is in charge of the safety-related functionality of the train and is used by Bombardier Transportation Sweden AB for the control and communication functions in high-speed trains.

Figure: Overview of the experimental method used to perform the case study. For each program in the Train Control Management System (TCMS), test suites are created manually by industrial engineers and generated by SEAFOX for pairwise testing. Both kinds of suite are executed on the original and the mutated programs in order to collect scores for code coverage, mutation and number of tests.
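The mutation score used as a proxy for fault detection can be sketched as follows. This is an illustrative Python example, not the tooling used in the study: the `original` function is a hypothetical stand-in for a control program, and the three mutants are hand-written examples of typical small code changes. A mutant counts as killed when at least one test produces a different output on the mutant than on the original.

```python
def original(speed, door_closed):
    # Hypothetical stand-in for a control program (assumed for illustration):
    # allow traction only when the doors are closed and speed is below a limit.
    return door_closed and speed < 200


# Hand-written mutants, each with one small change to the original logic.
mutants = [
    lambda speed, door_closed: door_closed or speed < 200,    # 'and' replaced by 'or'
    lambda speed, door_closed: door_closed and speed <= 200,  # '<' replaced by '<='
    lambda speed, door_closed: door_closed and speed < 201,   # constant changed
]


def mutation_score(program, mutants, suite):
    """Percentage of mutants for which some test output differs from the original."""
    killed = sum(
        any(mutant(*test) != program(*test) for test in suite)
        for mutant in mutants
    )
    return 100.0 * killed / len(mutants)


# Each test is a (speed, door_closed) input tuple.
suite = [(0, True), (0, False), (250, True)]
print(f"{mutation_score(original, mutants, suite):.1f}%")  # 1 of 3 mutants killed

# Adding a boundary-value test kills the two remaining mutants.
stronger_suite = suite + [(200, True)]
print(f"{mutation_score(original, mutants, stronger_suite):.1f}%")  # 3 of 3 killed
```

The example also shows why the score depends on test design and not only on suite size: a single well-chosen boundary test raises the score from one third to all mutants killed.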

The study used manual test suites created by industrial engineers working at Bombardier Transportation Sweden AB. These manual test suites were obtained through a post-mortem analysis of the data provided. In addition, the researchers generated pairwise test suites using SEAFOX (2). SEAFOX is the only available combinatorial test suite generation tool for IEC 61131-3 control software and is open source. It supports the generation of test suites using pairwise, base choice and random strategies. For pairwise generation, SEAFOX uses the IPOG algorithm together with a first-pick tie breaker. A developer using SEAFOX can automatically generate the test suites needed for a given IEC 61131-3 program after manually providing the input parameter range information, based on the defined behaviour written in the specification.
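For contrast with pairwise generation, the base choice strategy mentioned above can be sketched in a few lines of Python. This is the textbook form of the strategy, not SEAFOX's implementation, and the parameter names, value ranges and base values are hypothetical: the tester chooses one base test, and every further test changes exactly one parameter to one of its non-base values.

```python
def base_choice(parameters, base):
    """Start from a base test, then vary one parameter at a time."""
    suite = [dict(base)]
    for name, values in parameters.items():
        for value in values:
            if value != base[name]:
                test = dict(base)   # copy the base test
                test[name] = value  # change a single parameter
                suite.append(test)
    return suite


# Hypothetical input parameter ranges, as a tester might derive them
# from a specification (assumed for illustration only).
parameters = {
    "door_closed": [False, True],
    "speed": [0, 50, 200],
    "brake": [False, True],
}
base = {"door_closed": True, "speed": 0, "brake": False}

suite = base_choice(parameters, base)
for test in suite:
    print(test)
print(f"{len(suite)} base choice tests")
```

Base choice covers every individual value at least once but gives no guarantee about value pairs, which is the gap that pairwise generation is meant to close.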

From the 45 industrial programs studied, they drew the following conclusions (1):

  • The use of pairwise testing results in high branch coverage and mutation scores for the majority of the programs considered. Across all programs the results were surprising: test suites created using pairwise testing achieved relatively high code coverage (94% on average). Pairwise test suites achieve code coverage scores as good as or better than those of manual testing for 71% of the programs considered. This can be explained by the fact that pairwise testing, if properly used, is quite good at covering the logical behaviour of the code.
  • The use of pairwise testing results in similar or better fault detection scores than the use of manual testing for 64% of the programs. For 42% of the programs, pairwise testing performs as well as manual testing, while for 36% of the programs manual testing performs better in terms of fault detection. The difference in effectiveness between manual and pairwise testing could be due to other factors, such as the number of test cases and the test design techniques used to manually create test suites.

Pairwise testing achieves marginally lower levels of code coverage while at the same time using more test cases than manual testing performed by industrial engineers. It seems that manual testing is not significantly better at finding faults than pairwise testing.

The results imply that pairwise testing can achieve high branch coverage, although with slightly lower scores than manual testing. They suggest that pairwise testing can, in some cases, perform comparably with manual testing carried out by industrial engineers. This is significant experimental evidence on the progress of pairwise testing, and it needs further study; in particular, the cost of using pairwise testing in practice has yet to be considered. In addition, pairwise testing is only one type of combination strategy, and stronger criteria would also need to be evaluated.

References:

(1) Charbachi, Peter, Linus Eklund, and Eduard Enoiu. “Can Pairwise Testing Perform Comparably to Manually Handcrafted Testing Carried Out by Industrial Engineers?” International Conference on Software Quality, Reliability and Security Companion (QRS-C). IEEE, 2017.

(2) SEAFOX tool. https://github.com/CharByte/SEAFOX 
