Quality Assessment and Quality Improvement for UML Models

April 12, 2012, Jürgen Wüst. Category: Measurement

I came across a PhD thesis titled “Quality Assessment and Quality Improvement for UML Models” by Akhtar Ali Jalbani. It was defended in February 2011 at the University of Göttingen, Germany. The title seemed promising, so I checked it out. Here’s my review.

Thesis Summary

Jalbani defines a quality model for UML that distinguishes three life cycle phases of UML models: (1) incomplete UML models at the end of the analysis phase, (2) complete UML models at the end of the design phase, and (3) executable UML models when there is no manual elaboration. For each life cycle phase, Jalbani’s quality model identifies a relevant subset of the ISO 9126 quality attributes. For incomplete models, they are analyzability, changeability, learnability, and understandability. For complete models, add accuracy, suitability, and testability. For executable models, take all of the above plus fault tolerance, maturity, and recoverability.
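To make the cumulative structure of the attribute sets easier to see, here is a minimal sketch of the mapping as a plain lookup table. The attribute names come from the summary above; the identifiers (`QUALITY_MODEL`, `attributes_for`) and the phase keys are my own illustration, not notation from the thesis.

```python
# Sketch only: phase keys and identifiers are illustrative, not from the thesis.
INCOMPLETE = ("analyzability", "changeability", "learnability", "understandability")
COMPLETE = INCOMPLETE + ("accuracy", "suitability", "testability")
EXECUTABLE = COMPLETE + ("fault tolerance", "maturity", "recoverability")

QUALITY_MODEL = {
    "incomplete": INCOMPLETE,   # end of the analysis phase
    "complete": COMPLETE,       # end of the design phase
    "executable": EXECUTABLE,   # no manual elaboration
}


def attributes_for(phase):
    """Return the quality attributes considered relevant for a life cycle phase."""
    return QUALITY_MODEL[phase]
```

Each later phase inherits the attributes of the earlier ones, so `attributes_for("executable")` returns all ten attributes.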

Jalbani then “instantiates” the quality model for incomplete and complete UML models. Following a GQM approach, he identifies questions about the analyzability, changeability, and understandability of UML models, and derives over 30 critical and non-critical UML model rules from those questions. For example, a question for analyzability would be “Is the incomplete model traceable?”, from which he derives the rule that “each activity in an activity diagram should refer to a use case”. Jalbani also defines metrics for each rule, counting either how often the rule is violated in the model or the percentage of elements that violate it.
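A rough sketch of what such a rule metric could look like for the traceability rule quoted above. This is my own illustration under simplifying assumptions: the `Activity` class, the helper names, and the sample activities are all hypothetical, and a real implementation would work on an actual UML model rather than a list of objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Activity:
    """Hypothetical stand-in for an activity in an activity diagram."""
    name: str
    use_case: Optional[str] = None  # referenced use case, None if there is none


def violates_traceability(activity):
    """Rule: each activity in an activity diagram should refer to a use case."""
    return activity.use_case is None


def rule_metrics(elements, violates):
    """Return (number of violations, percentage of violating elements)."""
    violations = sum(1 for element in elements if violates(element))
    percentage = 100.0 * violations / len(elements) if elements else 0.0
    return violations, percentage


# Made-up sample data, not taken from the thesis.
activities = [
    Activity("Take order", use_case="Order bread"),
    Activity("Bake bread"),
    Activity("Deliver order", use_case="Order bread"),
    Activity("Clean oven"),
]

count, percent = rule_metrics(activities, violates_traceability)
print(f"{count} violations ({percent:.0f}% of activities)")
```

Two of the four sample activities lack a use case reference, so the sketch reports 2 violations, or 50%.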

In the empirical part, Jalbani has two groups of seven students each develop a UML model of a bakery system. Each group works in two phases, creating first an analysis model and then a design model of the system. At the end of each phase, each group obtains a model assessment report listing all rule violations for their model and is asked to fix them. The corrected model is then subjected to a second assessment. The revised models tend to have fewer rule violations, but the students also introduce new issues when correcting their models. At the end, the students were asked for feedback on how helpful they considered the assessment reports, and whether the assessment criteria and reports were understandable. The students mostly deemed the reports helpful and understandable. Jalbani concludes that the approach “is practically feasible to assess and improve the quality of models in a continuous way”.

The Good

Jalbani’s quality model accounts for life cycle phases, which makes sense. An early analysis model clearly needs to be assessed differently than an executable model. The selection of applicable quality attributes is plausible, and many of the rules he derived are potentially useful.

The Bad

The quality assessment is based on rules only. In Section 3.6, Jalbani argues that “metrics are sometimes difficult to interpret and do not give an immediate hint of how to improve quality”. This is certainly true for some metrics, but it is not a good reason to dismiss those metrics up front, let alone all metrics. In Section 4.5.6, Jalbani goes on to derive understandability rules such as “maximum of 10 operations per class”, “maximum of 20 classes per package”, or “maximum of 4 parameters per operation”. From a methodological standpoint, this has various problems.

  • The quality model throws useful information away. Instead of saying “in terms of understandability, having only 9 operations in a class on average tends to be a little bit better than having 10 operations, and much better than 100 operations”, the quality model says “in terms of understandability, 9 operations in a class is good, 10 is bad, and 100 is just as bad as 10”.
  • The selection of the thresholds is somewhat arbitrary.
  • For the example rules mentioned above, the underlying metrics (number of operations, classes, parameters) are not difficult to interpret at all.
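The information loss described in the first point can be shown in a few lines. This is a sketch under my own assumptions: the class names and operation counts are made up, and the rule is coded so that a class with `limit` or more operations is flagged, matching the reading above in which 10 operations already counts as bad.

```python
def rule_verdict(num_operations, limit=10):
    """Binary rule: True means the class is flagged.

    Assumption: a class with `limit` or more operations is flagged.
    """
    return num_operations >= limit


# Hypothetical classes with their number of operations.
operations_per_class = {"Small": 9, "Borderline": 10, "Huge": 100}

# Rule-based view: each class is either fine or flagged, nothing in between.
verdicts = {name: rule_verdict(n) for name, n in operations_per_class.items()}

# Metric-based view: the raw values still allow ranking and comparison.
ranking = sorted(operations_per_class, key=operations_per_class.get)

for name in ranking:
    status = "flagged" if verdicts[name] else "ok"
    print(f"{name}: {operations_per_class[name]} operations, rule says {status}")
```

In the rule-based view, `Borderline` and `Huge` receive the same verdict, although one is a single operation past `Small` and the other has ten times as many operations. The metric values keep that difference visible.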

The Ugly

The empirical part left me underwhelmed. Jalbani makes no attempt to measure analyzability, changeability, or understandability directly, which would have allowed him to test whether, and how strongly, the external quality actually observed corresponds to the assessment by the quality model. In other words, there is no empirical evidence that adhering to the design rules, however carefully derived, actually improves the external system quality.

The students developed models of non-trivial size (over 40 diagrams and 50 classes each) over the course of three weeks. So it is unfortunate that Jalbani missed the opportunity for a proper, controlled experiment. At the very least, the case study could have been spiced up a bit, for example by giving the assessment reports to only one group and then throwing an “unexpected” requirement change at both groups in the middle of a design phase, to see which group fares better, if any.

Wearing my “tool vendor hat”, I think that quality models such as the one presented here are useful, but in a PhD thesis I would have expected a much more profound empirical investigation.