The Availability of UML Models for Empirical Research

August 23, 2012, Jürgen Wüst. Category: Measurement

There are quite a few empirical studies that investigate how the structural properties of a system impact external system quality attributes. For example, what is the relationship between a class's coupling or complexity and its fault-proneness or maintainability? These studies typically measure structural system properties from source code, which is readily available once the systems under study have been implemented.

Many measures of structural properties like coupling or cohesion are proposed as design measures. That is, you can apply them to design artifacts such as UML models. In theory, these measures are therefore available earlier in the development process than measures based on source code. Applying them to UML models thus has the potential to provide early quality feedback. The empirical studies mentioned above often include remarks to the effect that “the research should be repeated using design documents to determine whether the observed results still hold”. The studies that I was involved in way back when are no exception.
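To make the idea of a design-level measure a bit more concrete, here is a minimal Python sketch. It assumes a toy design model given as a list of (client, supplier) class dependencies, as one might read them off a class diagram, and counts for each class the number of distinct other classes it is connected to. The class names, the `DEPENDENCIES` list, and the function `coupling_per_class` are hypothetical illustrations; this is one simple way to count coupling, not the definition used by any particular study or tool.

```python
from collections import defaultdict

# Hypothetical toy design model: each pair is a dependency
# (client class, supplier class) taken from a class diagram.
DEPENDENCIES = [
    ("Order", "Customer"),
    ("Order", "Product"),
    ("Invoice", "Order"),
    ("Invoice", "Customer"),
    ("Order", "Customer"),  # a repeated dependency counts only once
]


def coupling_per_class(dependencies):
    """Count, for each class, the distinct other classes it is connected to."""
    neighbours = defaultdict(set)
    for client, supplier in dependencies:
        if client == supplier:
            continue  # self-references do not couple a class to another class
        # Count the connection in both directions (import and export coupling).
        neighbours[client].add(supplier)
        neighbours[supplier].add(client)
    return {name: len(others) for name, others in neighbours.items()}


coupling = coupling_per_class(DEPENDENCIES)
for name, value in sorted(coupling.items()):
    print(name, value)
```

Nothing in this computation needs source code; the dependency list is all it takes, which is why such a measure could in principle be collected as soon as a design model exists.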

However, hardly any studies actually measure structural properties on UML models, and even fewer do so on UML models developed in industrial settings. Researchers have a hard time getting their hands on UML models at all, let alone UML models of systems for which external quality data is available. Apparently, formal design does not happen a lot in practice. In my industrial development experience, formal design is often eschewed for one or more of the following alternatives:

No design documents: Developers form a rough idea of the system design in their mind. The details emerge during coding, with quite a bit of refactoring in the early coding stages. This is not unusual for smaller systems built by one or two developers over the course of a few months. To be honest, that’s also how SDMetrics was created …
Informal design: System design happens informally during discussions, e.g., in front of a whiteboard. Occasionally, snapshots of these whiteboard sketches end up in the Intranet/Enterprise Wiki pages for project documentation. It would be interesting to see some statistics that compare how much software companies spend on UML tools vs. whiteboards, bearing in mind that whiteboards don’t require maintenance updates every six to twelve months.
Reference architectures: The overall system design is already predetermined by a reference architecture or framework. In classical information systems, for instance, the remaining design work mostly concerns the domain model / database structure, from which everything else follows in a more or less straightforward manner.

If you look at repositories for open source systems, you will find hundreds of millions of lines of source code, but no UML models. I know it’s called open source, but it is still software development, sometimes even on a large scale. The point is, I’m not aware of any F/OSS projects that systematically use UML in their development. If there is any up-to-date, high-quality documentation at all, it is mostly user documentation, or the occasional whitepaper with a “30,000-foot view” of the system architecture.

Oh, and don’t try to google “modelforge” with software engineering on your mind.

So, as far as the availability of UML models goes, things can only get better. Originally I intended for this post to discuss Chapter 7 of Ariadi Nugroho’s PhD thesis, which describes one of the few studies that investigate the relationship between the structural properties of UML designs and fault-proneness. But after this lengthy introduction on why that study is so special, I’ll leave the discussion for the next post.