Survey of UML Quality Modeling Papers: Less Than One Third Contain Empirical Work

May 28, 2013, Jürgen Wüst. Category: Measurement

I found an interesting review paper titled “A Systematic Literature Review on the Quality of UML Models” by Marcela Genero, Ana Fernández-Saez, H. James Nelson, Geert Poels, and Mario Piattini. It was published in the Journal of Database Management 22 (3), 46-70, 2011. You can also find it here.

The paper aims to determine the coverage of UML quality modeling research since 1997: Which quality aspects and UML diagram types are investigated? What are the research goals and methods? What types of results are obtained?

To select the papers for their literature review, the authors ran keyword-based searches in six digital libraries. These searches identified 266 papers from peer-reviewed journals, conferences, and workshops between 1997 and 2009. The list of reviewed papers is here. The authors then classified the papers along five categories, which I summarize below.

1. What types of quality are investigated?

51% of the papers investigate semantic quality of UML models, in particular inter-diagram consistency and correctness. 39% deal with what the authors call pragmatic quality, that is, the ISO/IEC 25010 quality attributes. Of these, understandability (29%) and maintainability (9%) are the most frequently studied attributes.

2. Which research methods are used?

60% of the papers provide examples to illustrate proposed methods, metrics, etc. 10% are merely speculative. On the empirical side, 24% describe experiments, 6% employ case studies. That’s a 70/30 split between non-empirical and empirical work.
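As a quick sanity check on that split, here is a minimal Python sketch that simply adds up the percentages quoted above. The dictionary keys, the grouping into "empirical" and "non-empirical", and the function name are labels chosen for illustration; they are not terms taken from the paper.

```python
# Research-method shares as quoted in this post (percent of the reviewed papers).
method_share = {
    "examples": 60,
    "speculative": 10,
    "experiments": 24,
    "case studies": 6,
}

# Assumption of this sketch: experiments and case studies count as empirical,
# everything else as non-empirical.
EMPIRICAL = {"experiments", "case studies"}


def empirical_split(shares):
    """Return (empirical, non_empirical) percentage totals."""
    empirical = sum(v for k, v in shares.items() if k in EMPIRICAL)
    non_empirical = sum(v for k, v in shares.items() if k not in EMPIRICAL)
    return empirical, non_empirical


empirical, non_empirical = empirical_split(method_share)
print(f"empirical: {empirical}%, non-empirical: {non_empirical}%")
```

With the figures above, the empirical share comes to 30% and the non-empirical share to 70%.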

3. What types of results are achieved?

40% of the papers propose new methods for model verification, validation, and so on. 19% produce new knowledge, mostly as the result of empirical work. 16% present tools. The remainder of the papers concern themselves with checklists, metrics, design rules, guidelines, and so forth.

4. What are the research goals?

The papers mostly aim at assuring (45%), evaluating (32%), or merely measuring (14%) UML model quality.

5. Which UML diagrams are investigated?

31% of the papers are not specific to any diagram type. 25% focus on class diagrams, 17% on state charts, 11% on sequence diagrams.

My take on the results

For me, the single most striking result is that only 30% of the papers contain empirical work. My first thought reading this number was “What a disgrace!” The number does not cast a very good light on the research community.

Performing empirical studies is an essential part of applying scientific principles to software engineering. It is the best method we know to objectively and reliably demonstrate or refute the effectiveness and efficiency of software engineering practices, techniques, methods, and tools. That’s how we gain knowledge.

Theoretical papers do have their place, and expecting all papers to contain some empirical work is clearly unrealistic. But the ratio of 30% seems low for a research area that is largely quantitative in nature. The necessity to carry out empirical work is usually acknowledged in the “Future Work” section of papers, but apparently, this is rarely followed through. Otherwise, we should find the ratio of empirical papers well above 50%.

The authors provide a possible explanation for the low number: the survey included conference and workshop papers, which tend to be limited in scope and have lower demands on research completeness. That may be true. I think another factor is simply that performing a sound empirical validation is far more difficult than just showing examples. You may not even have to leave the comfort of your desk for the latter. The most difficult part of empirical validation is to obtain external (pragmatic) quality data on reliability, maintainability, etc. of the system, and to map that data to UML model artifacts. This is especially true for case studies taking place “in the field”, which often require some compromise on data collection. That too could explain why there is so little empirical work overall, and hardly any case studies (6% of all papers against 24% for experiments, that is, only one in five of the empirical papers).

Much – if not most – public research takes place in the course of PhD theses and government-funded projects. One should expect that scientific evaluation criteria are eventually applied to such research. If the evaluation does not always happen, that’s a poor state of affairs indeed. Publication bias may also play a role here. Studies that do not find statistically significant trends tend not to get submitted for publication at all, but silently disappear into the proverbial drawer.

Anyway, the review paper is highly insightful and certainly met its goal to determine the scope of research on UML quality modeling. Hats off to the authors for undertaking this massive – and surely at times tedious – task.