The Unified Modeling Language (UML) has become the de facto standard for software modeling. UML models are used to visualize, understand, and communicate the structure and behavior of a system, and UML activity diagrams (ADs) are often used to elaborate and visualize individual use cases. Because of their higher level of abstraction and process-oriented perspective, UML ADs are also well suited to model-based test generation, and over the last two decades researchers have used them for that purpose. Despite this growing use, there is currently no comprehensive and unbiased study of the topic. The objective of this study is to present a comprehensive and unbiased overview of the state of the art in model-based testing using UML ADs. The systematic mapping study followed the principles presented in "Guidelines for conducting systematic mapping studies in software engineering".

Study design

We review and structure the current body of knowledge on model-based testing using UML ADs by performing a systematic mapping study using well-known guidelines. We pose nine research questions:

  • RQ 1. Where and when were primary studies published? The aim is to answer the following sub-questions:
    • What is the annual number of publications in this field?
    • Which publication venues (i.e., conferences, journals) are the main targets of studies in this field?
  • RQ 2. Which other modeling notations have been used in combination with ADs? The aim is to identify if and how other modeling notations have been used to complement the test generation process.
  • RQ 3. What methods are used for model validation and verification? Although UML 2.x ADs adopt semantics similar to Petri nets, the specification of the model is still semi-formal. Therefore, unlike Petri nets, UML ADs cannot be formally verified. Answering this RQ will allow us to identify the possible techniques that can be used to validate and verify ADs.
  • RQ 4. What coverage criteria are used for test generation? The objective is to discover the coverage criteria that research has focused upon, to guide the test generation process.
  • RQ 5. What methods are used for test case generation?
  • RQ 6. What methods are used for test data generation? Answering this and the previous RQ will show which methods for test data generation and test case generation, respectively, have been most prominent.
  • RQ 7. How are the tests executed against the system under test (SUT)? The aim is to determine whether online or offline test execution has been more popular.
  • RQ 8. How are the test requirements traced against the model? The aim is to determine how the research community assesses the importance of traceability between the test requirements and the models.
  • RQ 9. What tools are used for model editing, test generation, and test execution? Answering this RQ will enable us to determine which tools in each of the aforementioned categories have been widely adopted by the research community.

Inclusion criteria:

  • Papers whose abstracts, titles, or keywords discussed test case generation using activity diagrams or any of the alternative terms that we specified in Section 4.3;
  • Papers written in English;
  • If an extended version (e.g., book chapter or journal paper) of a conference paper was found in the search results with more technical details, only the extended version was included.

Exclusion criteria:

  • The publication is a secondary study (e.g., a literature review);
  • Papers not subject to peer review;
  • Duplicated papers (e.g., returned by different search engines).

After applying the inclusion and exclusion criteria, 119 studies were excluded. During full-text analysis, another five studies were excluded because they fell outside the scope defined by the selection criteria. The remaining papers (i.e., 39) were used to conduct backward snowball sampling (as shown in Fig. 1), which led to 12 primary studies being added.

Figure 1: The number of primary studies at each stage during the selection process.


Source selection

In order to get a broader perspective, we searched systematically in electronic databases instead of targeting a constricted set of journals and conference proceedings. Five electronic databases were considered for conducting the searches: IEEE Xplore, ACM Digital Library, Science Direct, Springer, and Web of Science. These databases contain almost all the important conferences, workshops, and journal publications relevant to the software engineering field. Table 1 lists the number of primary studies found from each database.

Table 1. The number of primary studies per database.

Database          Search results
ACM               17
Web of Science    39
Springer          62
Science Direct    12
Total             183


Study quality assessment

We designed a questionnaire for quality assessment (as suggested in [1]) in order to evaluate the quality of the selected primary studies. The quality criteria include the following questions:

  1. Are the goals clearly described?
  2. Is the method/algorithm clearly described?
  3. Are assumptions/restrictions clearly described?
  4. Is the method validated via a case study?
  5. Is tool support discussed?
  6. Is the case study realistic?
  7. Are there multiple case studies used for validation?
  8. Is there a qualitative comparison with other approaches?
  9. Is there a quantitative comparison with other approaches?

Each primary study was evaluated based on the questionnaire mentioned above, in which each question was scored as follows [2], [3]:

  • Fully answers the question: 2 points;
  • Partially answers the question: 1 point;
  • Does not answer the question: 0 points.
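
The scoring scheme above can be illustrated with a minimal Python sketch. It assumes that a study's overall quality score is the plain sum of its nine per-question points, which the text does not state explicitly. The answer labels (`full`, `partial`, `none`) and the example study are hypothetical and only serve the illustration.

```python
# Point scale taken from the list above: 2 = fully, 1 = partially, 0 = not answered.
POINTS = {"full": 2, "partial": 1, "none": 0}
NUM_QUESTIONS = 9  # the questionnaire has nine questions


def quality_score(answers):
    """Sum the points for one study's answers (summing is an assumption)."""
    if len(answers) != NUM_QUESTIONS:
        raise ValueError(f"expected {NUM_QUESTIONS} answers, got {len(answers)}")
    return sum(POINTS[answer] for answer in answers)


# Hypothetical study, invented for illustration only.
example = ["full", "full", "partial", "full", "partial",
           "none", "none", "partial", "none"]
score = quality_score(example)
print(score, "out of", 2 * NUM_QUESTIONS)
```

Under this assumption, the maximum possible score for a study is 18.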

To validate our results, we created a word cloud of the most frequent words occurring in the titles and abstracts of the selected primary studies (see Fig. 2). We eliminated common English words such as this, first, etc. Further, we grouped different variations of the same word, for instance, generating, generation, and generated as generate. As seen in Fig. 2, the top 5 most frequent words were test, generate, case, activity, and diagram, which correspond to our selection criteria.
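
The word-frequency step behind such a word cloud could look roughly like the following Python sketch. The stop-word list, the variant map, and the two sample titles are hypothetical stand-ins; the study's actual word lists and tooling are not given here.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the study only names "this" and "first" as examples.
STOP_WORDS = {"this", "first", "the", "a", "of", "for", "from", "and", "using"}

# Hypothetical variant map that groups word forms under one base form.
VARIANTS = {
    "generating": "generate", "generation": "generate", "generated": "generate",
    "tests": "test", "cases": "case", "diagrams": "diagram",
}


def word_frequencies(texts):
    """Count words across texts, skipping stop words and merging variants."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word in STOP_WORDS:
                continue
            counts[VARIANTS.get(word, word)] += 1
    return counts


# Made-up titles, for illustration only.
titles = [
    "Generating test cases from activity diagrams",
    "Test case generation using UML activity diagrams",
]
freq = word_frequencies(titles)
print(freq.most_common(5))
```

A hand-written variant map keeps the grouping transparent; a stemmer or lemmatizer would be the usual choice for a larger corpus.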

Figure 2: Word cloud based on the titles and abstracts.



The results comprise 41 primary studies analyzed against nine research questions. We also highlight the current trends and research gaps in model-based testing using UML ADs and discuss some shortcomings for researchers and practitioners working in this area. The results show that the existing approaches on model-based testing using UML ADs tend to rely on intermediate formats and formalisms for model verification and test generation, employ a multitude of graph-based coverage criteria, and use graph search algorithms.
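
To give a feel for what graph search over an activity diagram means in practice, here is a minimal Python sketch. It is not taken from any of the primary studies: it assumes the diagram has already been converted into a directed graph (an adjacency dictionary), and it uses depth-first search to enumerate simple paths from the initial node to the final node. The node names and the function `enumerate_paths` are hypothetical.

```python
def enumerate_paths(graph, start, end):
    """Return all simple paths from start to end using depth-first search."""
    paths = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            paths.append(path)
            continue
        for successor in graph.get(node, []):
            # Skip nodes already on this path so that loops terminate.
            if successor not in path:
                stack.append((successor, path + [successor]))
    return paths


# Hypothetical activity diagram with one decision node ("pin_valid?").
activity_graph = {
    "initial": ["enter_pin"],
    "enter_pin": ["pin_valid?"],
    "pin_valid?": ["show_menu", "reject_card"],
    "show_menu": ["final"],
    "reject_card": ["final"],
}

test_paths = enumerate_paths(activity_graph, "initial", "final")
for path in test_paths:
    print(" -> ".join(path))
```

Each enumerated path can be read as one abstract test case. Covering every such path corresponds to a path-based coverage criterion, whereas node or edge coverage would typically require fewer paths.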

For detailed results and analysis, read the full article: "Model-based testing using UML activity diagrams: A systematic mapping study".


[1] Applying systematic reviews to diverse study types: An experience report

[2] Distributed virtual machine consolidation: A systematic mapping study

[3] Effort estimation in agile software development: A systematic literature review
