The Impact of the Level of Detail in UML Models on System Quality

May 23, 2012, Jürgen Wüst. Category: Measurement

Software architects create UML models with varying degrees of rigor and detail. Some use UML informally, as a means to communicate architectural or design decisions. Such use often relies on only a small subset of the graphical notation the UML defines, and pays little regard to its semantics. At the other end of the formality spectrum, UML models are used in the context of the MDA, model transformations, and executable models. In between, there is a continuum of “semi-formality”.

When UML models guide a manual elaboration or implementation phase, an important question is how detailed these models should be. In his PhD thesis “The Effects of UML Modeling on the Quality of Software”, Ariadi Nugroho investigates, amongst other things, the impact of the level of detail of UML models on system quality. The thesis was defended in October 2010 at Leiden University, the Netherlands; most parts of it are available online.

Controlled Experiment: Impact of Level of Detail on Understandability

Chapter 5 of the thesis describes a controlled experiment in which two groups of students were presented with a UML model of a library system. One group of 26 students received diagrams with a high level of detail (LoD); the other group of 27 students received low LoD diagrams. To assess their model comprehension, the students had to answer multiple-choice questions about the library system.

The UML model of the library system consisted of three class diagrams and 17 sequence diagrams. The high LoD class diagrams included all class attributes, operations, and association names. The low LoD class diagrams did not show any attributes or operations, and omitted most association names. The high LoD sequence diagrams featured full message names corresponding to class operations, with complete parameter lists and return types. The low LoD sequence diagrams only showed informal “dummy message names”, without parameters or return types.

Nugroho found that comprehension correctness was higher for the high LoD group (75% correct answers) than for the low LoD group (67% correct answers). The high LoD group especially did better on questions concerning class implementation details related to sequence diagrams. The high LoD group also had higher comprehension efficiency (the number of correctly answered questions over time to answer all questions): 0.18 for the high LoD group, 0.15 for the low LoD group. Both differences were statistically significant at the 0.05 level.
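To make the efficiency measure concrete, here is a minimal Python sketch. The function name, the choice of minutes as the time unit, and the participant figures are assumptions made for illustration; the summary above does not state the unit used in the thesis.

```python
def comprehension_efficiency(correct_answers: int, minutes_taken: float) -> float:
    """Correctly answered questions divided by the time spent on all questions."""
    if minutes_taken <= 0:
        raise ValueError("minutes_taken must be positive")
    return correct_answers / minutes_taken


# Hypothetical participant: 9 correct answers in 50 minutes (invented numbers).
efficiency = comprehension_efficiency(9, 50.0)
print(round(efficiency, 2))
```

A participant who answers the same number of questions correctly but needs more time gets a lower score, so the measure rewards both accuracy and speed.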

Case Study: Impact of Level of Detail on Defect Density

The controlled experiment from Chapter 5 only makes a binary distinction between “high” and “low” LoD. In Chapter 6, Nugroho measures LoD at a finer grain, and investigates its relationship to the defect density of the implementation classes. He defines the following measures of LoD based on class and sequence diagrams.

Class LoD (from class diagrams):

  • Ratio of attributes of the class for which the type is indicated
  • Ratio of operations of the class with input/return parameters

Association LoD (from class diagrams):

  • Ratio of named associations to total number of associations of a class
  • Ratio of association ends on the class with role names

Object LoD (from sequence diagrams):

  • Ratio of objects with names to total number of objects
  • Ratio of objects with type indicated to total number of objects

Message LoD (from sequence diagrams):

  • Ratio of messages that correspond to operations in a class diagram to total messages
  • Ratio of return messages with label to total return messages
  • Ratio of messages with parameters to total messages
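All of these measures are simple ratios over model elements. The following Python sketch shows how the two class LoD ratios and one message LoD ratio might be computed. The data classes, field names, and the example `Loan` class are hypothetical, and returning 0.0 for an empty denominator is an assumption, since the summary does not say how such cases are handled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Attribute:
    name: str
    type_name: Optional[str] = None  # None: no type shown in the diagram


@dataclass
class Operation:
    name: str
    has_parameters: bool = False  # True if input/return parameters are modeled


@dataclass
class DesignClass:
    name: str
    attributes: List[Attribute] = field(default_factory=list)
    operations: List[Operation] = field(default_factory=list)


@dataclass
class Message:
    label: str
    has_parameters: bool = False


def ratio(part: int, total: int) -> float:
    """part / total; an empty denominator yields 0.0 (an assumption)."""
    return part / total if total else 0.0


def class_lod(cls: DesignClass) -> Dict[str, float]:
    """The two class LoD ratios for a single design class."""
    typed = sum(1 for a in cls.attributes if a.type_name)
    detailed = sum(1 for o in cls.operations if o.has_parameters)
    return {
        "attribute_type_ratio": ratio(typed, len(cls.attributes)),
        "operation_parameter_ratio": ratio(detailed, len(cls.operations)),
    }


def message_parameter_ratio(messages: List[Message]) -> float:
    """Ratio of messages with parameters to total messages."""
    return ratio(sum(1 for m in messages if m.has_parameters), len(messages))


# Hypothetical class from a library system model.
loan = DesignClass(
    name="Loan",
    attributes=[Attribute("dueDate", "Date"), Attribute("status")],
    operations=[
        Operation("renew", has_parameters=True),
        Operation("isOverdue", has_parameters=True),
        Operation("close"),
    ],
)
print(class_lod(loan))
print(message_parameter_ratio([Message("renew(days)", True), Message("close")]))
```

Each ratio lies between 0 and 1, with 1 meaning that every element of that kind is fully specified in the model.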

In addition to these LoD measures, Nugroho also measured class coupling (CBO) and McCabe cyclomatic complexity to assess what impact the LoD measures have above and beyond coupling and complexity.

The software system for this case study is an integrated health care system for psychiatrists, developed according to the Rational Unified Process. A UML model of the system was created as an implementation guide before the coding of the system. To obtain the defect data, Nugroho analyzed the source code repository logs and counted, for each Java class, how often it was modified to correct defects. The Java implementation classes were then mapped to UML design classes. In this way, 122 faulty classes were identified, 37 of which could be mapped to UML design classes on class diagrams (23), sequence diagrams (30), or both (21).

Because most classes had only a single defect, Nugroho used defect density (defects per 1000 SLOC) as the dependent variable, and performed log-linear regression to analyze the relationship between LoD and defect density.
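The two ingredients of this analysis can be sketched in Python. The sketch assumes that “log-linear” means regressing the natural logarithm of defect density on a single predictor by ordinary least squares; the exact model specification of the thesis is not given here, and the function names and data points are invented for illustration.

```python
import math
from typing import List, Tuple


def defect_density(defects: int, sloc: int) -> float:
    """Defects per 1000 source lines of code."""
    if sloc <= 0:
        raise ValueError("sloc must be positive")
    return defects / sloc * 1000.0


def fit_log_linear(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of ln(y) = a + b * x; returns (a, b).

    Every y must be positive because ln(0) is undefined, so this
    sketch cannot handle classes with a defect density of zero.
    """
    log_ys = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log_y = sum(log_ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (ly - mean_log_y) for x, ly in zip(xs, log_ys))
    slope = sxy / sxx
    intercept = mean_log_y - slope * mean_x
    return intercept, slope


# Invented data: (message LoD, number of defects, SLOC) per faulty class.
classes = [(0.2, 3, 400), (0.5, 1, 250), (0.8, 1, 600), (1.0, 1, 900)]
lod_values = [lod for lod, _, _ in classes]
densities = [defect_density(defects, sloc) for _, defects, sloc in classes]

intercept, slope = fit_log_linear(lod_values, densities)
print(f"ln(density) = {intercept:.2f} + {slope:.2f} * LoD")
```

With these invented numbers the slope is negative, meaning a higher LoD goes with a lower predicted defect density. Note that class size sits in the denominator of the dependent variable, so any predictor that grows with class size will also tend to show a negative slope.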

Univariate regression analysis found a significant negative correlation with defect density for coupling, complexity, and the message-related LoD measures only. Class, association, and object LoD showed no significant relationship. For message LoD, the negative correlation is intuitive: the higher the LoD, the lower the defect density. For coupling and complexity, the negative correlation is attributed to the choice of the dependent variable, which has class size in the denominator.

For the multivariate analysis, Nugroho built two models using a backward elimination process. The first model, based on the class diagrams, includes coupling, complexity and class LoD as covariates; its goodness of fit (R-squared) is 63%. The coefficient of class LoD is positive: the higher the LoD, the higher the predicted defect-density, which runs against the hypothesis associated with LoD. A possible explanation is that classes with a large number of precisely specified attributes and operations may be more complex in terms of data computation and data manipulation (which is not necessarily captured by McCabe cyclomatic complexity).

The second multivariate model, based on sequence diagram data, includes message LoD, coupling, and complexity, consistent with the findings from the univariate analysis. Sequence diagrams specify control flow and logic between components, which are error-prone to implement; the model suggests that specifying the technical details of these interactions reduces this error-proneness somewhat. The goodness of fit (R-squared) is 70%. Two thirds of the predictive power can be attributed to coupling and complexity; the remaining third is explained by message LoD.

My take on the studies

The controlled experiment and case study summarized above are thorough, sound pieces of empirical research, and among the very few studies that attempt to measure external system quality attributes for UML models. In the controlled experiment, high LoD improved comprehension correctness by about 8 percentage points (75% versus 67% correct answers). From the case study in Chapter 6 we learn that message LoD consistently impacts defect density; the result for class LoD in the multivariate analysis may be spurious. This suggests that LoD has a moderate bearing on system quality, mostly due to the LoD of messages on sequence diagrams. Because the impact is moderate, it is not obvious whether the additional effort to create high LoD sequence diagrams is repaid by the quality improvement. My hunch is that it is. Ultimately, however, this question can only be answered by hard data.

If I had one quibble with the data analysis in Chapter 6, it would be that the analysis only included classes with defects, discarding classes without defects. Nugroho briefly discusses this, stating that including classes without defects violates the linearity assumption of the regression model. To which I would say that the model should follow the data, not vice versa. The practical implication is that one would have to know upfront whether a class contains defects before relying on the model's defect-density estimates. The other question is whether the statistical relationship between message LoD and defect density still holds when defect-free classes are included. Chapter 7 of the thesis sheds some light on this. Nugroho revisits the data sets and performs logistic regression using, among others, the LoD measures above to predict defect proneness. I’ll discuss the results from that chapter in a subsequent post.