Modeling Supertypes and Subtypes: Part 2

Scott Burkow

5.00/5 (3 votes)

May 21, 2013

CPOL

13 min read

21406

Conceptual and logical designs.

Introduction
The Rules Redux
The Conceptual Model
The Logical Model
- The IE Diagram
Finally

Introduction

In this second of three articles, we'll take the business rules developed in the first article and transform them into the conceptual and logical designs.

In the first article [ ^ ] of this series we presented a background of the supertype-subtype paradigm: its implied inheritance, and at times multiple inheritance, and two essential properties, completeness and exclusivity. Then we established the business rules from which we'll design solutions. As you recall, the rules leading us to think of a hierarchical solution were expressed in process model diagrams done in the Business Process Model and Notation (BPMN) graphical language; I'll repeat a key diagram and other business rules as a refresher.

While the sample problem strongly suggests designing in a generalization hierarchy, we as modelers are free not to. I'll begin, therefore, with explaining the several strengths of the construct. Then its on to the conceptual model itself; I'll show you the designs using the same compact box-in-box style for supertypes and subtypes we saw in article one.

Although multiple inheritance is not yet a problem for us in either conceptual or logical modeling, we must make sure to express it properly in nested-box style. After doing so, we'll examine how we can fine-tune the design to make it extensible. Then we translate the conceptual into the logical. After all that's gone before, this will be anticlimactic.

I think you'll find this middle article a break from the complexity of the first--the sample problem is not complex. In the third [ ^ ] , though, we take our final logical design through alternate physical models and analysis again deepens. We'll see examples of implementing roll-your-own referential integrity, cascade delete, and safe read-only column enforcement.

The Rules Redux

The full set of BPMN process diagrams are available in Part 1; see the section on business rules. Here we review the excerpt that is relevant to modeling with a supertype and subtypes for our sample problem.

Before reaching the outer exclusive gateway, the app user sets the information fields common to all employees--our supertype. Because the gateway has a default path allowing her to circumvent setting any of the other fields specific to specialized employee types--the subtypes--we know the hierarchy will be incomplete. Or the user can continue to fill in manager- or engineer-specific fields.

If the employee is both a manager and an engineer then he is dubbed a mangineer, and additional fields must be set. (I know what you're thinking. A mangineer is half-man half-engineer. Sure.) This means that the entity class we'll model must multiply derive both subtypes; this multiple inheritance implies overlapping subtypes, and will complicate our physical design efforts.

Other Business Rules

An employee is mostly associated with a department.
An employee can make or receive any number of referrals. (A referral is an introduction of a company client from one employee to another to generate more business.)
A manager mostly has a budget.
An engineer has one or more skills.
An engineer must be associated with exactly one engineering discipline.
A mangineer gets bonus pay and a perk plan.

The Conceptual Model

We're at the blank page stage, and must translate this information into the entities with their attributes and relationships that will support the system, now and as it evolves.

At the highest level, the process model is telling us that we are to gather and store data about employees. If we stop our analysis here, then we create one employee entity and done. But if we are to consider designing the employee with one supertype and several subtypes, we mean to split each employee member's data over several entities in one-one relationships. Can we justify this?

The Case for Decomposition

In further analysis, the process model strongly suggests the generalization hierarchy: the root or supertype employee is to hold the attributes and relationships common to all specialized entities as per the Fill in Employee Info task and other business rules; the manager and engineer entity subtypes, to directly derive the employee and extend it with their specialized tasks and own business rules; and the mangineer, to extend the manager and engineer by the same means (in the dreaded diamond pattern inheritance). So the question really is this: because we're splitting entity members over several entities, how do we justify the construct itself?

Without elaboration, entity decomposition is mostly associated with the process of normalization, in which attribute dependencies are analyzed, and, to a lesser extent, the practice of designing without nulls. Hierarchy generalization is the third kind of attribute projection; here are the major advantages:

Affinity groups of attributes can be shown together in their own entities, and can now naturally become non-null as warranted (whereas they would all have to be nullable in one root entity; this is similar to designing without nulls).
Relationships can now be placed at suitable levels with their appropriate entities.
Hierarchies--whether using the isa relationship or not--are easily grasped by stakeholders and other team members.
The pattern doesn't lock us into a physical design.

As to the last point: by showing the full hierarchy in this high-level design, we open up the possibility of exploring different entity merge patterns in the physical design. In contrast, by presenting one entity with column after nullable column and highly generalized relationships, the options are made less clear. We'll work through several employee structures and their implications when we do the physical model in Part III.

The First Cut Diagram

We defer for now the mangineer entity class and discuss it in its own subsection. The diagram above captures the rest of the business rules.

The figure above was generated using an ER/Studio CASE tool for making conceptual diagrams. The tool allows for adding attributes, which we have, and which facilitates generation of the logical model, but we've left them off the diagram to focus more on the supertype-subtypes and their relationships. In this graphic notation, the relationship line itself is used to convey optionality (minimum participation): dotted for 0 and solid for 1.

In the first article, we discussed supertype-subtype properties completeness and exclusivity using box-in-box diagramming and labeled each entity set with a representative membership. Here you'll recognize the same nested rectangles with rounded edges--generated from the same tool--but this time we're showing the Employee entity class as the supertype and Manager and Engineer as its subtypes. These diagrams cannot directly show, aside from annotations, [non-]adherence to the properties.

Supertype employee participates in a relationship with Department and two more with Referral. Because of inheritance, these apply to the manager and engineer entities as well. The engineer adds it own relationship, as does the mangineer, which inherits all relationships from the other three (not shown). Recall from the first article that a subtype vertex inherits all attributes and relationships from all vertices in each path from the subtype to the root supertype.

Diagrams in this phase collectively must have enough detail to convey the entire high-level data design, but can leave off some particulars. Here we've deferred specifying the intersection entity between the engineer and Skill.

Another detail that is often left off is the classification entity. Such an entity represents a set of legal values for a category attribute in another entity. These are generally noted but left off of diagrams because they tend to proliferate, and because they are mostly enforced by check constraints--e.g., category attribute DayOfWeek. As my rule of thumb, if the classification aids the discussion, or may be realized in the physical design because it helps document the database or can be referenced by more than one category attribute, show it. Engineering Discipline is a classification entity.

Return of the Mangineer

In the previous article, we warned against refactoring an entity class that multiply derives the supertype in a diamond pattern into one that singly derives the supertype by removing its multiple relationships and making it a direct subtype. This is shown in the figure above lhs. The Mangineer retains all its attributes and relationships, but what should we do with those that were directly inherited from the manager and engineer?

Of course this is a silly question: the manager's and engineer's attributes and relationships must be duplicated into the mangineer. The essential problem here is we are making a design decision that should be deferred for the physical modeling stage. The conceptual model must be a true blueprint of the meaning of the data--a guide for decisions to come throughout the lifecycle of the system--and it is no longer a viable reference when we throw out associations and duplicate data. (As a lesser point, and without elaboration, it is also a poor physical design to begin with.)

That leaves as viable options the middle and rhs panes. The rhs pane in this graphical notation is the equivalent of making the Storefront Appartment from the first article into an association class. As with that example, although we can infer that the mangineer is also a subtype of employee by the relationships' cardinality, the direct association is lost. The subtlety is also lost on the CASE tool functionality that turns the conceptual diagram into the logical:

The EmployeeID should of course be the primary key for the mangineer, as it is with all entities in the hierarchy. The tool does this correctly for the middle pane design, as we'll see in the logical model.

Stability in the Seventh House

Data structuring that supports the business rules and functionality is necessary to good design, but not sufficient. It must also be extensible--i.e., can anticipate changes to the system and thereby require minimal re-structuring over time. It is better to redo some isolated code rather than alter table or indexed view schemas. So let's ask the business side this question: Will you record skills for managers in the future? If yes, our best choice may be relationship generalization:

Here we've moved the many-many relationship with Skill from the Engineer to the Employee. In the generalization hierarchy, and assuming the subtype is singly derived, any subtype relationship can be moved up the ancestor path to another subtype or the root supertype itself. When doing so, minimum participation from the hierarchy to another entity cannot remain mandatory, as the relationship with skill shows.

The decision to generalize the relationship means that when the time comes in which we record skills for managers, we're still right--we have our extensibility, or equivalently, stability of design. But as you may have realized, there is a cost here: we've traded in rules enforcement for this stability. The foreign key in the composite primary key for the intersection entity between skill and employee won't be enough to enforce referential integrity. Until skills are recorded for managers, we must also ensure that skills are recorded only for engineers. We make a note and defer to the physical design as we often do.

But there is another problem, a subtler one. If the hierarchy were complete, then when we also record skills for managers, the code we are to write for referential integrity enforcement can be retired. Because it is not, at that time we must modify the code instead to ensure that plain vanilla employees are not associated with skills. Subtle yes; so we add a note to the note.

As one more note, entities themselves are often generalized in conceptual designs to achieve stability--e.g. entities man and woman are merged as person. Their relationships are generalized as well, and the same issues as discussed above apply.

The Final Cut Diagram

This is the model from which we'll be moving forward. We see that the many-many relationship with Skill has been generalized to the Employee level to support extensibility. We also see that the Mangineer has a new relationship; remember our rule: every subtype must extend the classes (with attributes or relationships) in its ancestor path(s)--this is true for singly derived subtypes and just as true for multiply derived subtypes.

No diagramming notation can show all business rules. As a corollary, not all business rules can be enforced by structuring alone. We have two examples of this: 1) skills can only be associated with engineers, and perhaps also with managers at a later date; and 2) a mangineer exists if a person is both manager and engineer. (As you recall, this second point was made clear in the BPMN model.)

The Logical Model

Your data modeling CASE tool should have a function for automatically translating the conceptual diagram into a logical one. What we've done for this section is convert the simple box-in-box style into Information Engineering Crows Feet (IE).

In this intermediate stage we are still database agnostic, and have a chance to do further analysis and make structural decisions without business stakeholder involvement, and before committing to a database implementation. The attributes become more prominent, and we analyze them in part as to their dependencies--functional, multivalued and join--to decompose any relations as needed to achieve a desired level of normalization. We can denormalize later in the physical design and do other things to achieve performance goals, but our aim here is to make a normalized model that will be the basis for any implementation.

For our small model we make a couple decisions, but defer others. We notice that for our problem, at least, the heavy lifting was done in the conceptual stage.

The IE Diagram

In IE the semi-ellipse is the subtyping symbol. If ours had an "X" in it, it would mean that membership in the three subtypes is mutually exclusive; we leave it off because ours of course is overlapping. Our hierarchy is also incomplete, but we can't show that graphically in IE. We can suggest it, however, in the discriminator name, IsMgr0_Engr1_Both1_Nil3: I've placed the _Nil3 suffix to mean of no subtype. (And also as a replacement for null. I've made the attribute not null: always design out nulls when possible for fewer querying errors.)

At this stage, relationships are still not foreign keys, so we don't yet worry that the key for the mangineer is in three relationships. These relationships together, in fact, strongly suggest one-half of a complex business rule: a person is a mangineer only if she is also a manager and engineer. It doesn't, however, imply the logical converse--the flip side of the rule we must enforce. And this will be a challenge for the database implementation.

The case can be made that we should transform subtypes and supertypes into relations at this point. The argument here is that the logical model should be ready to be transformed into the physical, but it can't be because no relational database directly supports the generalization hierarchy. (We don't herein consider SQL99 extensions that may support it in the object-relational model.) The counter-argument is that the conversion involves possible merging of tables, and these are performance and querying issues best deferred to the physical design. I agree and apparently so does ER/Studio: the rollup and rolldown capabilities are available for physical modeling only.

There are two more decisions made here to complete the design. First, the non-specific, generalized relationship between skill and employee has not been resolved. It could easily have been, but the CASE tool that transforms the logical to the physical can write in an associative entity for us. Second, we're letting the classification entity Engineering Discipline remain to become a table.

Finally

I no longer remember the exact wording or the author if the quote was ever attributed; here is what I do remember:

If houses were built the same way software is developed, the first woodpecker that came along would destroy civilization.

All too often data modeling is done informally without artifacts--on a white board, and then wiped off as a courtesy to the next group using the conference room. I hope these first two articles and the one to come inspire you to adopt a formal design methodology if you haven't already done so.