2025-06-20 21:35

UML: Class Diagram

Business analysis

After reviewing the use case diagram, let’s move on — next up is the class diagram. It’s a pretty handy thing (or rather, the models you can whip up with it).

As before, here’s the notation system:

Blue text — tips or rules not dictated by UML, but borrowed from other useful theories or practical experience.
Italics — examples.

The UML class diagram is a structural UML diagram that shows the structure of something in terms of classes and the relationships between them. Its original purpose in UML is to represent the structure of classes in a software system built using an object-oriented approach (which makes sense, since UML is built around that paradigm). In other words, it literally shows the classes in the program code, which are the standard way to structure software in OOP, and you’ll soon see that the diagram’s features are tailored for exactly that use.

However, a) business analysts don’t really need that, and b) it’s way too narrow of a use for such a great diagram. So analysts decided to twist the original meaning a bit and repurpose the diagram for tasks more useful to them.

When to use it:

There are two typical business analyst tasks (though your imagination doesn’t have to stop there):

1) Business Domain Model (BDM):

A model that shows the client’s business domain — its structure: subjects, objects, terms, and concepts of the domain, and how they are all connected. This is usually built during the AS IS analysis phase (to understand and systematize the domain to be automated). But it can also be used to represent the TO BE state — to show what the target domain will look like after your system is implemented.

2) Data Model:

Most commonly it’s the Logical Data Model (LDM). Some analysts go deep and distinguish three or more levels of data modeling, but I don’t see much point — I recommend focusing on just the logical and physical levels.

The Logical Data Model (LDM) shows what information the system will operate with and how it is logically connected. It’s implementation-agnostic (i.e., it doesn’t care how developers will build it later), which makes it a perfect tool for analysts who work with requirements, not their implementation. Put simply, with this kind of model, the analyst defines that the system will include Products, Catalogs, Users, and shows how they are logically related (e.g., products belong to catalogs, and a catalog may contain no more than 10 products). Whether that information ends up in a database, a text file, a cache, or even hardcoded — that’s not the analyst’s concern. That decision lies with the architect or system analyst (but, as I mentioned earlier, we’re not covering that variation — we’re talking about IT business analysts who work with requirements).

The Business Domain Model is a more niche tool — not everyone uses or loves it — and I already covered it in the first part of this article (yes, the one with the Witcher example 😉). Here, we’ll use the class diagram to build a Logical Data Model, which I recommend having any time your solution involves data or information. And if you supplement it with a data dictionary — you’re basically a level 100 archmage of requirements management. Let me emphasize: among the things analysts tend to overlook (due to lack of knowledge, skill, or desire), working with data requirements — and this model in particular — is one of the most useful in terms of improving requirement quality.

Elements:

There’s just one really useful element for a business analyst: the class.

A class is the building block of the diagram. Since the diagram shows the structure of something, the class is one of those structural elements. It’s represented as a rectangle. In the context of the Logical Data Model, classes represent data entities — blocks of data composed of smaller pieces.

If you don’t want to go into detail about what the block contains, just draw a rectangle with the name:

Now, a quick note: in object-oriented modeling, there’s a distinction between a class and an object (that’s why UML has both class and object diagrams). This will help us understand what we’re modeling and how to read class diagrams.

A class is a template or abstraction. An object is a concrete instance of that class. So in the diagram above — and in any class-diagram-based model — we’re saying that our system will deal with Products and Catalogs as general concepts — these are classes. When users actually start using the system, the database will fill up with objects of those classes — dozens of specific products and catalogs, each corresponding to the class description defined in the model. Keep that distinction in mind as we encounter those terms later in the article.

In a more advanced version of the diagram, a class can include attributes — atomic pieces of data that characterize the class/entity. For each object of that class, attributes will have specific values (e.g., for a product, the Name might be “Apple iPhone X 64GB”).
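For readers who find it easier to think in code, here is a minimal Python sketch of the class/object distinction. It is only an illustration, not something an analyst has to write; the second product ("Electric kettle") is a made-up example.

```python
from dataclasses import dataclass


@dataclass
class Product:
    """The class: a template saying that every product has a name."""
    name: str  # attribute


# Objects: concrete instances created from that template.
iphone = Product(name="Apple iPhone X 64GB")
kettle = Product(name="Electric kettle")  # hypothetical second product

# Both objects follow the same class description...
same_class = type(iphone) is type(kettle)
# ...but each one holds its own attribute values.
different_objects = iphone != kettle
```

The model describes Product once; the running system then holds as many Product objects as users create.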

Recommendations:

LDM: When dividing your data into classes and attributes, treat non-atomic pieces as data classes, and atomic (indivisible) ones as attributes.
LDM: Always show class attributes if you don’t have a data dictionary; otherwise, put them in the dictionary to avoid duplication and reduce maintenance effort.
BDM: Structure entities and show attributes as you see fit — the model reflects your understanding of the domain, primarily for your own analysis.

You can also specify data types for attributes:

Recommendations:

For LDM, keep data types at the logical (human-readable) level — e.g., “Text” instead of Char(100), or “Integer” instead of Int(32). These will be converted into technical types in the physical model later by someone else, once the implementation approach is defined.
Common data types (but you're free to name/define your own): Text, Number (integer, decimal), Boolean (yes/no), Date (and/or time), Image, Video.
A useful type: Enumeration — a value from a predefined list. Unlike Text, where values can be anything, an enumeration limits values to a known set. Example: Status could be “Approved,” “Rejected,” or “Pending”. You can list the exact values in the data dictionary or accompanying documentation.
Another useful type: reference to another class by name. This one requires some focus. It’s helpful when classes are related. Let’s say your system includes Comments and Users. Are they related? Of course — users leave comments. So, one attribute of Comment will be Author. What’s its type? You might first think: text. But a better option is to show it as a reference to a User — i.e., to an object of the User class. So instead of: Author: Text you write: Author: User — meaning it's a link to an object of the User class.
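The same ideas can be sketched in Python, purely as an illustration. The attribute names (name, text, created_on) and the sample values are assumptions made for this example; only Status, Author, Comment and User come from the text above.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Status(Enum):
    """Enumeration: the value must come from this predefined list."""
    APPROVED = "Approved"
    REJECTED = "Rejected"
    PENDING = "Pending"


@dataclass
class User:
    name: str  # Text


@dataclass
class Comment:
    text: str          # Text
    created_on: date   # Date
    status: Status     # Enumeration
    author: User       # reference to an object of the User class, not Text


alice = User(name="Alice")
comment = Comment(
    text="Great article!",
    created_on=date(2025, 6, 20),
    status=Status.PENDING,
    author=alice,
)

# Following the reference gives the whole User object, not just a string.
author_name = comment.author.name

# A value outside the predefined list is rejected by the enumeration.
try:
    Status("Maybe")
    accepts_unknown_status = True
except ValueError:
    accepts_unknown_status = False
```

Notice what Author: User buys you compared to Author: Text: from a comment you can reach everything the model says about its author.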

Relationships:

Generalization (also called Inheritance):

If you’ve read the earlier article on use case diagrams, you’ve seen this before — but let’s recap. This is a relationship between elements (in our case, classes) showing what OOP calls inheritance. It’s drawn as a straight line with a hollow triangle pointing from the child to the parent class.

Generalization shows that class B is a more specific version of class A — the child clarifies, adds detail, or narrows the parent. In other words, the child is the parent, just with extras.

The child inherits all the parent’s properties (i.e., attributes) but can also have its own.

Usage: just as described — to show that some classes have subtypes.
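In code, generalization corresponds to ordinary inheritance. A minimal Python sketch, assuming a User parent and a Moderator child; the moderated_section attribute is hypothetical and exists only to show "the child can have its own":

```python
from dataclasses import dataclass


@dataclass
class User:
    """Parent class."""
    name: str


@dataclass
class Moderator(User):
    """Child class: a User with extras."""
    moderated_section: str  # hypothetical attribute that only the child has


mod = Moderator(name="Bob", moderated_section="Comments")

# The child is the parent: every Moderator is also a User.
is_user = isinstance(mod, User)
# The child inherited the parent's attribute and added its own.
inherited_and_own = (mod.name, mod.moderated_section)
```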

The next two relationships are very similar in nature: Aggregation and Composition. Both represent “whole-part” relationships. In simpler terms, one element is included in another. If the line runs from A to B and ends with a diamond at B, it means class A is a part of class B: the diamond always sits on the side of the whole. You can read the direction as “is part of.”

What’s the difference between these two relationships? Visually, aggregation is shown as a solid line ending with an unfilled (white) diamond, while composition is shown with a filled (black) diamond. So be careful where you paint your diamonds 🙂 Aggregation is considered a weak relationship; composition is a strong one. In conceptual terms, if destroying the parent object (in your system that might mean anything you define as destruction — permanent deletion, archiving, deactivation, etc.) also implies that the child object must be destroyed as well, then the relationship is strong — composition. If the child can exist independently even after the parent is gone, then it's a weak relationship — aggregation.

Example: Student – Group. Let’s imagine that a student can be simultaneously enrolled in both a Business Analysis course and a UX course, or any other training offered by some educational center. Now ask yourself: if the Business Analysis group is disbanded, do its Students cease to exist? Clearly not. The student still exists in the system, can belong to the UX group, or maybe no group at all — just be stored in the system as a past or future student. So this is a clear case for aggregation.

Example: Course – Educational Center. If the educational center is shut down, the very concept of a "Course" belonging to it vanishes. All courses tied to that center are also conceptually destroyed. This is a strong relationship — composition.

Usage: Just as described above — to show that one class is part of another.
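The difference between the weak and the strong relationship can be sketched in Python. This is a rough illustration under simplifying assumptions: the all_students list and the groups/centers dictionaries are hypothetical stand-ins for "everything the system knows about", and "destruction" is modeled as plain removal.

```python
class Student:
    def __init__(self, name):
        self.name = name


class Group:
    """Aggregation: the group only references students that exist on their own."""
    def __init__(self, title):
        self.title = title
        self.students = []


class Course:
    def __init__(self, title):
        self.title = title


class EducationalCenter:
    """Composition: courses are created inside the center and live only there."""
    def __init__(self, name):
        self.name = name
        self.courses = []

    def add_course(self, title):
        course = Course(title)
        self.courses.append(course)
        return course


# Students are registered independently of any group.
all_students = [Student("Ann")]
groups = {"BA": Group("Business Analysis"), "UX": Group("UX")}
for group in groups.values():
    group.students.append(all_students[0])

centers = {"main": EducationalCenter("Main center")}
centers["main"].add_course("Business Analysis")

# Disbanding the BA group: the student survives (weak relationship).
del groups["BA"]
student_survives = all_students[0] in groups["UX"].students

# Shutting down the center: its courses go with it (strong relationship).
del centers["main"]
courses_left = [c for center in centers.values() for c in center.courses]
```

The design hint is in who owns the part: students are stored outside the group, while courses are stored only inside their center.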

Association. Association looks the same as it does in use case diagrams: a solid line, optionally with an open arrowhead indicating direction. An association is any meaningful or structural relationship between classes. We cover it last, because it’s a fallback: if you can’t describe the connection using inheritance, aggregation, or composition — use association.

Recommendation: In models used by business analysts, always make associations directed and always label them. Without direction and names, associations — being highly abstract — are often impossible to interpret correctly.

Multiplicity. Relationships can (and should!) also include multiplicity. Multiplicity defines how many instances of one class can be linked to a single instance of the class at the other end of the relationship.

Let’s break down an example:

1) 1..* Products belong to 1 Catalog.

1.1) To define multiplicity for the Product end: How many Products can "belong to" one Catalog? Think in ranges. From 1 (let’s say we don’t allow empty catalogs on the website) to infinity (no upper limit — shown with an asterisk).

1.2) For the Catalog end: How many Catalogs can contain a given Product? Let’s say each Product must belong to exactly one Catalog — not more. That gives us 1..1 or just 1.

2) 1..4 Products can be part of 0..* Orders.

2.1) How many Products can be in a single Order? Say the system limits it to a maximum of 4 products per order. So the range is 1 to 4 (since an order must have at least one product).

2.2) How many Orders can a single Product be part of? Maybe none (if it’s never ordered) or many — so: 0..*.
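Each multiplicity is effectively a business rule that the system will have to check. A small Python sketch of that check for the ranges above; the helper within_multiplicity is a hypothetical name, and upper=None stands for the asterisk:

```python
from typing import Optional


def within_multiplicity(count: int, lower: int, upper: Optional[int]) -> bool:
    """Check a count against a lower..upper range; upper=None stands for '*'."""
    if count < lower:
        return False
    return upper is None or count <= upper


# 1..* Products per Catalog: empty catalogs are not allowed.
empty_catalog_ok = within_multiplicity(0, lower=1, upper=None)
big_catalog_ok = within_multiplicity(5000, lower=1, upper=None)

# 1..4 Products per Order.
order_with_four_ok = within_multiplicity(4, lower=1, upper=4)
order_with_five_ok = within_multiplicity(5, lower=1, upper=4)

# 0..* Orders per Product: a product that was never ordered is fine.
never_ordered_ok = within_multiplicity(0, lower=0, upper=None)
```

Here the empty catalog and the five-product order fail the check, while the other three cases pass.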

Recommendations:

LDM: Always include multiplicity for all relationships except generalization. Multiplicities are a goldmine of insight — they lead to valuable business questions and rules that should be reflected in the system.
BDM (Business Domain Model): Use multiplicity when helpful; it's up to you.

Now let’s return to the example from our earlier article on use case diagrams. Our scenario:

We have a video hosting platform — a kind of rough, stripped-down version of YouTube. It has the following features:

Registration, login/logout
Viewing videos (via a homepage feed) and uploading new ones
Commenting and moderating comments
Adding videos to a "Watch Later" playlist

Let’s build a logical data model for this system.

Notable elements:

The system will store data (classes) such as Users, Videos, and Comments.
Users will have attributes including a Role, since we have different user categories (e.g., moderators). This will be an enumeration — a value from a predefined list.
Each Video will have an Author, which is a reference to a User — same for Comments.
A User can author 0 to many Videos and 0 to many Comments.
Moderators are a subclass of User and have a separate association with Comments — they moderate them.
Multiplicity here shows that each Comment is moderated by no more than 1 Moderator: a moderated comment has exactly one, while comments that are still pending (i.e., not yet moderated) have none.
Each Comment is part of a Video — and this is a composition: deleting the video will delete its comments too.
There's an interesting aggregation from Comment to Comment — a comment can belong to another comment. This models comment threads (nested replies). That’s also why the Comment class has a Parent comment attribute. So, why aggregation? Because in our setup, deleting a parent comment doesn’t delete its replies — instead, we just show “Comment deleted,” and keep the replies visible. Multiplicity for this relationship: Each Comment can have 0 to many replies (child Comments). Each Comment can be a reply to 0 or 1 parent Comments.
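For comparison, here is one possible reading of this model as Python dataclasses. It is a sketch, not the model itself: the role names, the title and text attributes, and the sample data are assumptions, and the two Optional fields are my interpretation of "0 or 1 parent" and "may still be pending".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    VIEWER = "Viewer"        # hypothetical role names
    MODERATOR = "Moderator"


@dataclass
class User:
    name: str
    role: Role               # enumeration


@dataclass
class Moderator(User):       # generalization: a Moderator is a type of User
    pass


@dataclass
class Video:
    title: str
    author: User             # reference to a User


@dataclass
class Comment:
    text: str
    author: User                                 # reference to a User
    video: Video                                 # each Comment is part of a Video
    parent_comment: Optional["Comment"] = None   # 0..1 parent (nested replies)
    moderated_by: Optional[Moderator] = None     # None while still pending


ann = User("Ann", Role.VIEWER)
bob = Moderator("Bob", Role.MODERATOR)
video = Video("UML in 10 minutes", author=ann)

first = Comment("Nice!", author=ann, video=video, moderated_by=bob)
reply = Comment("Agreed", author=ann, video=video, parent_comment=first)

# Replies of a comment are found by looking at who points to it as a parent.
replies_to_first = [c for c in (first, reply) if c.parent_comment is first]
```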

Some advanced aspects of UML Class Diagrams (not strictly necessary for analysts, but worth knowing).

In a more complete version of a class, you might also see operations (methods), along with a number of additional markers for the elements we’ve already discussed.

These aren’t likely to be essential for a business analyst, but it’s good to have a basic understanding — so you can read such diagrams confidently and know where to look deeper if needed:

Access modifiers can appear before an attribute’s name: a minus, a plus, or some other symbols. In development (for code-level classes), this defines whether the attribute is visible to external objects outside the class. To simplify, let’s take Student as a class: an attribute like Grade is visible externally (e.g., to the trainer) — in fact, it’s the trainer who sets it. Its access modifier will be public (“plus” symbol). An attribute like Confidence, however, is likely an internal attribute visible only to the object itself — its access modifier is private (“minus”).

In addition to attributes, a separate section may include Operations. Again, this applies to code-level classes: these are either operations that can be performed on the class, or operations that instances of the class can perform themselves. Operations also have access modifiers, return data types, and (in parentheses after the name) input parameters they can accept.

Classes can also be abstract. We already touched on this in a previous article on use cases, but to repeat: the names of abstract elements (in this case, classes) are written in italics. What does this mean? An abstract element is not “real”: no objects are ever created directly from it, and it exists only to support other elements. A typical use of abstract classes is in combination with generalization, where the abstract class is the parent and only its children have actual objects.
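To see how these markers map to code, here is a hedged Python sketch. Python does not enforce access modifiers, so the leading underscore is only a convention for "private". The Person parent, the receive_grade and describe operations, and the pass mark of 50 are all hypothetical; the comments show how each member might be written in UML notation.

```python
from abc import ABC, abstractmethod


class Person(ABC):
    """Abstract class: exists only to be generalized from, never instantiated."""

    def __init__(self, name: str):
        self.name = name              # + name: Text

    @abstractmethod
    def describe(self) -> str:        # + describe(): Text
        ...


class Student(Person):
    def __init__(self, name: str):
        super().__init__(name)
        self.grade = None             # + grade: Integer (public, set by the trainer)
        self._confidence = 0          # - confidence: Integer (private by convention)

    # + receive_grade(value: Integer): Boolean
    def receive_grade(self, value: int) -> bool:
        """Public operation with an input parameter and a return type."""
        self.grade = value
        self._confidence += value // 10
        return value >= 50            # hypothetical pass mark

    def describe(self) -> str:
        return f"Student {self.name}"


student = Student("Ann")
passed = student.receive_grade(80)

# Creating an object of the abstract class directly is not allowed.
try:
    Person("Nobody")
    abstract_can_be_created = True
except TypeError:
    abstract_can_be_created = False
```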

Common Mistakes:

Let’s make them all in one diagram (you can practice by trying to think through each one before the spoilers hit):

Incorrect directions of relationships.

With associations, it’s fairly straightforward — just follow the advice to always name them in the direction they should be read. But with other relationships, it’s easy to get confused.

Generalization is drawn from the child to the parent and reads as “is generalized into” or “is abstracted into.”
Composition and aggregation are also drawn from the part (child) to the whole (parent) and read as “is part of.”

Confusing generalization with aggregation/composition.

Generalization/inheritance = A is a more specific version of B. Aggregation/composition = A is a part of B as a component. There’s a big difference. In the earlier example, a Moderator cannot be part of a User. A Moderator is a type of User.

LDM: confusion between "system stores" and "system operates on".

When I say “confusion,” I mean the following. If you don’t know what the physical implementation of the data will be (whether a particular piece of information will be stored in a database or somewhere else), use the term “The system operates on information X” when building your LDM (Logical Data Model). If you’re a bit more technically inclined and know for sure (or have found out) what will be stored and what won’t, then use the term “The system stores X.”

Example from the diagram: should Email (emails sent by the system to users) be considered part of the LDM?

Simplest approach: if you’re unsure, just include them in the LDM as information the system operates on. Developers will decide what to do with these emails, likely together with you.

Advanced approach: if you’ve leveled up in your understanding of the system, you might say: “I know these emails won’t be part of our database, so I won’t include them in the model,” or: “I know we need to save all emails for history/logging purposes, so I’ll include them.” Only take this approach if you’re confident in your conclusions about how it’ll be implemented, or if you’ve already consulted with the devs.

LDM: slipping into a physical data model.

We’ve already covered how a Logical Data Model differs from a Physical Data Model. Do not include in your logical model:

IDs of records (ID)
primary and foreign keys
join tables for many-to-many relationships
technical data types

The same goes for other similar things that your technical background might tempt you to include. All of this is only relevant if the physical implementation will, for example, be a relational DB. I myself have a developer background, but even if I know our DB will be MySQL, I deliberately avoid making decisions like “there should be an ID in each class,” or “here’s how the tables will look,” etc. I must recognize that the team includes specialists who are better equipped professionally to make such decisions. And if designing the DB is not my area of responsibility (a systems analyst with a clear mandate might be in a different situation), then I shouldn’t be doing it, and I certainly shouldn’t be restricting the creativity of the person who will do it with unnecessary and clumsy decisions on my part.