Click here to Skip to main content
Click here to Skip to main content

Tagged as

Go to top

Modeling Supertypes and Subtypes: Part 3

, 24 Jun 2013
Rate this:
Please Sign up or sign in to vote.
We build the physical design from the logical design.

Contents

Introduction

In this last article of three, we build the physical design from the logical design. We'll first do a straight port from the logical, and then merge tables created from the supertype-subtype hierarchy in different patterns. Regardless of the design, we'll need to add in check constraints and at times more complex triggering subsystems to ensure the data remains consistent.

To start the series [ ^ ] , we presented background theory of the supertype-subtype paradigm, including multiple inheritance and essential properties completeness and exclusivity. In the second half of the article we introduced the business rules for our sample problem, partly in Business Process Model and Notation version 2.0 (BPMN).

In the second article [ ^ ] , we translated the business rules into the conceptual model, and then from the conceptual into the logical. There, we explained why generalization hierarchies are central to data modeling itself, and how for example multiple inheritance is correctly expressed in box-in-box notation.

While I hope you do read these articles, I've designed this one to be mostly self-contained. Our starting point is review of the logical diagram itself--and then it's all about the database.

The Logical Model

Above is the logical model in Information Engineering Crows Feet (IE) that we created in Part II. The generalization hierarchy is formed with the Employee supertype, directly descendant subtypes Manager and Engineer, and the multiply derived Mangineer. The subtyping symbol shows the employee's discriminator, and by lacking an "X," that the hierarchy is overlapping--an employee can be both manager and engineer. The hierarchy is also incomplete--an employee isn't necessarily described by any subtype--although we can't show this in IE.

Another of our sample problem's business rules that we cannot represent graphically is that a person who is both manager and engineer must also be a mangineer--the diagram can only imply its logical converse. This rule was specified in the first article by the BPMN variant model that introduced the mangineer.

The Physical Model

This is the physical model with minor alterations as generated from the logical model. We have resolved the many-many relationship between hr.T_Employee and skl.T_Skill with a junction table. Because the model is now connected to SQL Server, we design in schemas for grouping and security. Finally we've made a table from classification entity hr.T_EngineeringDiscipline.

The relationships in red converge on multiply derived subtype hr.T_Mangineer. Because the mangineer's existence depends on the existence of both the manager and engineer, we want to translate two relationships into foreign keys having action CASCADE DELETE. We can, can't we?

Finally you may have noticed that the discriminator column is missing from the employee table, as discriminators often are. But we can bring it back.

Enforcing Complex Business Rules

Our analysis begins with the split foreign key problem, which encompasses our question above on the CASCADE DELETE action. Next, we'll introduce the Mangineer Rule, which extends our thoughts on multiple inheritance, and how we'll support it. Finally we'll examine how to support the supertype's discriminator, which is missing--which it often is with good reason.

Taken together, the split foreign key problem, Mangineer Rule, and support for the discriminator all lead us to the design of a custom triggering subsystem to unify their solutions, and that's how we'll finish.

The Split Foreign Key Problem

In data design, subtypes are a type of weak entity. A weak entity relies on the existence of another entity, called the owner, for its own existence, as is evident by the fact that the weak entity's key contains the key of its referenced entity. Subtypes are a special kind of weak entity by the fact that we're distributing the attributes and relationships of one logical entity (supertype) among them, making all keys equal and the cardinality of their relationships one-one not one-many.

Because of this existence dependency, we want to delete weak entity members whenever referenced entity members are deleted. We can do this declaratively by adding the ON DELETE CASCADE clause to the referencing table's foreign key definition. In the diagram above, here is the error you get when you try to place the second foreign key on the EmployeeID key in the hr.T_Mangineer table to reference either of its parent tables:

Of course we could get our cascading delete foreign keys by replacing the mangineer's two foreign keys with one that directly references hr.T_Employee, but then we're no longer enforcing the rule that a mangineer's existence relies on that of both the manager and engineer. Nor can we split the mangineer's key column EmployeeID into two, each having cascade delete defined, and introduce a check constraint that ensures the column values are always equal. We get the same error, and thankfully so--duplicating a column whose sole purpose is to hang a foreign key is not an acceptable solution.

In our third attempt we create the mangineer's foreign keys without the cascading delete and define instead of triggers on the manager and engineer tables to pre-delete mangineer rows when manager or engineer rows are deleted.

The message is telling us that instead of triggers and cascading delete foreign keys in the same table are incompatible. (In fact if you create the trigger first and then the foreign key, you get similar error message 1787.)  What we'll have to do to support our split foreign key is define all foreign keys with ON DELETE NO ACTION and code the cascading delete in three instead of triggers, including one for the employee table itself.

The Mangineer Rule

For this analysis, we restate our extended business rule:

An employee has rows in the manager and engineer tables ⇔ she has a row in the mangineer table. 

This bi-conditional expands into these two implied conditional statements plus their contrapositives, denoted with the prime symbol:

  • A: an employee having manager and engineer rows must have a mangineer row
  • A': an employee without a mangineer row cannot have both manager and engineer rows
  • B: an employee with a mangineer row must have manager and engineer rows
  • B': an employee without both manager and engineer rows cannot have a mangineer row

We start our analysis with conditions B and B'. Referring back to both the physical model diagram and the prior section on split keys, we do know that we'll be implementing foreign keys from the mangineer to the manager and engineer--just without the cascade delete action. The keys are enough to enforce condition B, and also ensure that in insert operations the mangineer is placed last. But the missing action means the keys themselves cannot enforce B', which comes up during deletions in manager or engineer. This reinforces our conclusion in the split key section: we'll have to simulate cascade delete in code.

Our actions in support of A, however, are not so straightforward, and in practice may often be overlooked.   As we already noted, there is nothing in the physical design diagram--or logical or conceptual for that matter--that implies this rule.

 In our data modeling, relationship participation from sources manager and engineer to the target mangineer is optional, as it must be in this all or nothing scenario. The Data Architect software in the ER/Studio suite will allow us to show in the logical model the relationships as identifying (solid line)--which they are: mangineer is their weak entity--but then it warns us before making the change that it will remove the mangineer from the subtype cluster (and as you recall, back to the association class design from the first article).

We document and convey the A requirement to the developers, but rather than hope for the best, we should be proactive. We can't enforce A in the triggering subsystem we're designing because of timing, but we can detect and log transactional violations on a background thread. We can implement with a timing mechanism built into a SQL Agent job or Service Broker messaging.

As the contrapositive of A, condition A' essentially says the same thing, but the wording more strongly suggests the rule on the deletion side. For safety, we can enforce in our triggering mechanism that the mangineer can't be directly deleted--that it in fact must be deleted through deletion of its corresponding engineer or manager row only. We'll use the CONTEXT_INFO() resource in the strategy.

Supporting the Discriminator

If we are to add back the discriminator column to the employee table, then it must be dynamically updated as its child tables are modified. As with the split foreign key and Mangineer rule, for safety we don't leave the implementation up to the growing body of stored procedure code but rather want it in the emerging triggering subsystem. Because there is overhead involved and room for error, we redundantly store this system-implicit data only if query patterns warrant it. Let's assume they do.

We'll call the column IsMgr0_Engr1_Both2_Nil3; the name conveys the values it can take. (Or break it into two bit columns--we won't consider this.)  The _Nil3 part means that the construct can be incomplete, as it must be when an employee row is first inserted, and by our business rules it can stay that way. We'll add these constraints on the employee table in part to ensure new rows are in the nil state:

  • A column default having value 3
  • A check to keep the value in the 0-3 range
  • An insert trigger to ensure the initial value is 3

You can find these in the sample SupertypeSubtype database.

An implicit fact is one that can be gotten by a query on a table(s), and as such, we never allow its value to be set by user code; to do so would risk inconsistency. Because the discriminator value depends on insert and delete operations done on child tables, only triggers on those tables should set it. To ensure this is so, we also need an AFTER update trigger on the employee table; here is the relevant snippet:

if not exists
(
    select    0
    where   -- P --> Q is logically equivalent to not(P) or Q    
        not( UPDATE( [IsMgr0_Engr1_Both2_Nil3] ) )
        OR
        CONTEXT_INFO( ) = cast( @@spid as varbinary( 128 ) )
)
begin 
    if( 0 <> XACT_STATE( ) ) begin ROLLBACK TRAN end;

    RAISERROR( 'The discriminator column [IsMgr0_Engr1_Both2_Nil3]' + 
      ' cannot be directly updated by user code', 16, 0 );
end;

The context value on the connection is to be set by child table triggers only; other procedural code should not know the value, making the discriminator read-only. We'll see how the context is set in the triggering mechanism following, but we already have the first trigger and other protective measures in place. And yes--setting the context to the current SPID is not the safest strategy--demonstration only.

Implementation: The Triggering Subsystem

We're concerned with enforcing complex business rules when insert and delete actions occur on the tables we generated from the supertype and subtype entities. To restate: because the actions are constant over these table modifications, we make them part of the table definition and never leave them to outside procedural code that act on the tables. Let's start with insertion.

All triggers for the manager and engineer tables are essentially the same, so I'll just show code for the engineer.

DECLARE @inserted hr.tblEmployee
INSERT INTO @inserted( EmployeeID ) select EmployeeID from inserted;

exec hr.usp_UpdateEmployeeDiscriminator

    @p_tblEmployeeDelta = @inserted,
    @p_Mgr0_Engr1 = 1;

This snippet from the AFTER insert trigger above shows that the action to update the discriminator is done in shared code next:

CREATE PROCEDURE [hr].[usp_UpdateEmployeeDiscriminator]
( 
    @p_tblEmployeeDelta    hr.tblEmployee READONLY,
    @p_Mgr0_Engr1        bit
)
as
BEGIN
    SET NOCOUNT ON;

    DECLARE @vbPreExistingContextInfo varbinary( 128 );
    SELECT  @vbPreExistingContextInfo = IIF( CONTEXT_INFO( ) is null, 0x00, CONTEXT_INFO( ) );

    DECLARE @vbSPID varbinary( 128 ) = cast( @@spid as varbinary( 128 ) ); 
    SET CONTEXT_INFO @vbSPID;

    UPDATE
        emp
    SET
        IsMgr0_Engr1_Both2_Nil3 = 
        case
            when ( 0 = @p_Mgr0_Engr1 and IsMgr0_Engr1_Both2_Nil3 = 3 )
                  OR
                 ( 1 = @p_Mgr0_Engr1 and IsMgr0_Engr1_Both2_Nil3 = 2 )
            then 0
            when ( 1 = @p_Mgr0_Engr1 and IsMgr0_Engr1_Both2_Nil3 = 3 )
                 OR
                 ( 0 = @p_Mgr0_Engr1 and IsMgr0_Engr1_Both2_Nil3 = 2 )
            then 1
            when ( 0 = @p_Mgr0_Engr1 and IsMgr0_Engr1_Both2_Nil3 = 1 )
                 OR
                 ( 1 = @p_Mgr0_Engr1 and IsMgr0_Engr1_Both2_Nil3 = 0 )
            then 2
            else 3
        end
    FROM
        hr.T_Employee emp
        inner join
        @p_tblEmployeeDelta delta
        on
        emp.EmployeeID = delta.EmployeeID

    SET CONTEXT_INFO @vbPreExistingContextInfo;
END

Merging the nearly identical code that will appear in a total of four triggers into one module makes the logic more complex, but the added safety should be worth the effort throughout the maintenance cycle. After the triggers package the EmployeeID primary keys from the inserted or deleted logical table into a user-defined table type and invoke the procedure, the procedure code first sets the connection context value so that the update trigger on the employee table will allow the discriminator modification. As before, using the current session ID is not safe--for demonstration only.

From there, all the procedure needs to update the discriminator is its current value and the table source, given by parameter @p_IsMgr0_Engr1. If the discriminator is in a correct state, it will go to the next correct state--which is why we took pains to ensure the correct insertion start state value 3 in the section on supporting the discriminator above.

Assuming the implementation is correct, convince yourself that for (manual!) transactions that modify rows in both the manager and engineer tables, order doesn't matter: any intermediate result, and so the final result, must also be correct.

Unlike the insert triggers, the delete triggers for the employee, manager, and engineer must be instead of versions because along with setting the discriminator, they must also implement cascade delete, and do so in child to parent order to prevent foreign key violations. As you recall, we must do our own cascade delete because of multiple inheritance, as was described in the section on the split foreign key problem above.

The employee table trigger is simplest, implementing only cascade delete:

CREATE TRIGGER [hr].[trIOFdele_Employee]

    ON       [hr].[T_Employee]
    INSTEAD OF delete
as
BEGIN
    SET NOCOUNT ON;

    DELETE    mgr
    FROM    hr.T_Manager mgr
        INNER JOIN
        deleted dele
        on
        mgr.EmployeeID = dele.ID;

    /* repeat deletion for the engineer table--omitted here */

    DELETE    emp  -- highest parent is last to go in reverse child to parent order
    FROM    hr.T_Employee emp
        INNER JOIN
        deleted dele
        on
        emp.ID = dele.ID;
END

The delete triggers on the manager and engineer will handle the deletion of the mangineer rows, if any:

CREATE TRIGGER [hr].[trIOFdele_Engineer]

    ON        [hr].[T_Engineer]
    INSTEAD OF    delete
as
BEGIN
    SET NOCOUNT ON;

    DECLARE        @deleted hr.tblEmployee; 
    INSERT INTO @deleted( EmployeeID )  select EmployeeID from deleted;

    exec hr.usp_DeleteMangineer @p_deletedAncestor = @deleted;

    DELETE    engr
    FROM    hr.T_Engineer engr
        INNER JOIN
        deleted dele
        on
        engr.EmployeeID = dele.EmployeeID;

    exec hr.usp_UpdateEmployeeDiscriminator

        @p_tblEmployeeDelta    = @deleted,
        @p_Mgr0_Engr1        = 1;
END

The order of operations is critical. We first delete the child mangineers, which can no longer exist if either owner is missing--condition B'--followed by the manager or engineer rows themselves, and after all deletions we can update the discriminator if deletions did not start at the root employee. As we did with the code for updating the discriminator, we put the code for deleting the mangineers into a shared module:

CREATE PROCEDURE [hr].[usp_DeleteMangineer]
(
    @p_deletedAncestor hr.tblEmployee READONLY
)
as
BEGIN
    SET NOCOUNT ON;

    DECLARE @vbPreExistingContextInfo varbinary( 128 );
    SELECT  @vbPreExistingContextInfo = IIF( CONTEXT_INFO( ) is null, 0x00, CONTEXT_INFO( ) );

    DECLARE @vbSPID varbinary( 128 ) = cast( @@spid as varbinary( 128 ) ); 
    SET CONTEXT_INFO @vbSPID;

    DELETE    mngnr
    FROM    hr.T_Mangineer mngnr
        INNER JOIN
        @p_deletedAncestor deleAnc
        on
        mngnr.EmployeeID = deleAnc.EmployeeID;
    
    SET CONTEXT_INFO @vbPreExistingContextInfo;
END

Setting the context on the connection is also part of the strategy to prevent direct deletion of the mangineer. Recall that as per business rule A' in the section describing the Mangineer Rule above, without the mangineer row we can't have both manager and engineer rows, so we allow deletion through manager and engineer delete triggers only as above. In the sample code, you see I've again set the context. In any case, this snippet from the final trigger--an after delete trigger--on the mangineer table uses the context to prevent direct deletion:

if ( isnull( CONTEXT_INFO( ), -99999999 ) <> cast( @@spid as varbinary( 128 ) ) )
    begin 
        if( 0 <> XACT_STATE( ) ) begin ROLLBACK TRAN end;

        RAISERROR( 'Rows in table hr.T_Mangineer cannot be directly deleted.
          See the instead of delete triggers on hr.T_Manager and T_Engineer 
          for the correct CONTEXT_INFO( ) setting', 16, 0 );
    end;

Rolling Up and Down

In the section preceding, we considered cases in which the supertype and all subtypes from the logical design were implemented as tables. But in fact we have choices as to which entities become tables. We can merge tables in different ways: roll up all subtype tables into the supertype table; roll down the supertype among the subtypes; keep the supertype table while merging some of the subtypes in different ways...

We justify merging as a way to ease querying or simplify the design or lend better performance, but in practice we don't necessarily get any of these benefits; as always, it takes careful analysis of current and predicted usage patterns. And as always, we must still enforce all the business rules, and as without merging, triggers are often involved.

A corollary of the business rule enforcement statement above is that we must ensure that the data is the same as it would have been without merging.

For our running example, the viable options are these: roll up all three subtype tables into supertype table employee; and rollup manager and engineer but not mangineer. We'll also examine rolling down the employee into the manager and engineer tables, but we'll need to change the rules.

The Rollup

In this rollup design, we have merged all three subtype-based tables into the supertype--hr.T_Employee has all of their columns and relationships. Where the columns were mostly non-null in the subtype tables, they must all be null in the rollup table. The yellow tables were in relationships with the subtypes before rolling up, and their relationships remain at the supertype level.

For our sample problem, the rollup on balance is a big gain in reducing the plumbing that supports the multiply derived mangineer. As for the discriminator: it is looking more mandatory--but can we support it in the rollup?

Although the pre-existing relationships of the employee table are unchanged, its new relationships gotten from the subtype tables must be examined to see whether foreign keys alone still support the rules. So issues remain. There always are.

The Discriminator Revisited

The discriminator remains an implicit fact, so as we did before we want to block user code from modifying it. When we discussed supporting the discriminator for the pre-rollup design, we put in place miscellaneous constraints plus part of the triggering subsystem, which involved placing triggers on several tables, writing a shared code module, and setting and testing the connection context. But because all the information we need is now contained in each employee row, we just make the discriminator a computed, persisted column and done.

[IsMgr0_Engr1_Both2_Nil3] as
(
case
    when EngrDisciplineID is not null
         and
         Budget is not null
    then 2
    when EngrDisciplineID is not null
    then 1
    when Budget is not null
    then 0
    else 3
end
)PERSISTED NOT NULL

In most or perhaps the vast majority of situations, the computed column should work and be efficient, but in our special case it doesn't. Because rolled up table hr.T_Manager had only one column, Budget, and it was nullable, we can't always establish with certainty the discriminator value. If Budget is null, is the employee a manager or not?  Admittedly this is a rare case because most subtypes have (mostly) not null columns.

For remaining discussions, we'll assume that manager has at least one not null column, making the computed discriminator valid.

The Nullability Rules for Rollup

As was stated, all rolled-up columns become nullable. For subtype table columns that were not nullable before rollup, SQL Server--or any [O]RDBMS--naturally ensures that all rows have values in them as part of data integrity. When rolling up, we must ensure the same.

The First Nullability Rule for Rollup. For any row in the rollup table, over all columns from a single rolled-up subtype, all values must be null or if any value is set, then all values for columns that were not nullable in the subtype must be set.

This rule can be enforced in a check constraint; here is the sample check constraint for the rolled-up mangineer columns:

( [BonusPay] is null and [PerkPlanID] is null )
OR 
( [BonusPay] is not null and [PerkPlanID] is not null )

If the mangineer had a third column, and it was nullable, its test for null would be in the first proposition only.

As the TV ads say, wait there's more!  Pre-rollup, a row in a subtype must have a row in every ancestor in the path leading to the rollup table. After all, relationships do become foreign keys.

The Second Nullability Rule for Rollup. For each row that would have been in a rolled-up subtype table, in the rollup table each of its ancestor subtypes must have values in non-null columns in the rollup table.

We of course enforce the rule in a separate check constraint:

( BonusPay is null OR EngrDisciplineID is not null )

If the engineer table had more than one non-null column, we still would only need to test one of its columns: the test for all ancestor non-null columns is covered in checks that enforce the first rule.

The Mangineer Rule in Rollup

Going with our assumption that the manager table has at least one not null column, we have a valid, computed discriminator, and can use it to enforce the mangineer rule in its biconditional form:

-- P --> Q
2 <> [IsMgr0_Engr1_Both2_Nil3] OR ( BonusPay is not null )
and
-- Q --> P
( BonusPay is null OR 2 = [IsMgr0_Engr1_Both2_Nil3] )

As before, we're able to get by with checking the nullability of only one column, BonusPay, because of the previous check constraint supporting the first nullability rule.

Clearly the rollup is a winner here. For the original non-rollup model, we stated that we couldn't enforce condition A--that we could only log violations on a background thread. Condition A requires a mangineer row to exist whenever matching manager and engineer rows do. Enforcement of the other conditions required extensive triggering. Now we just enforced the whole rule with simple Boolean logic.

The series of checks we've build so far can replace the entire triggering subsystem because we are now doing table modifications in one table, not several. But we will need triggers.

Relationship Generalization

In the rollup diagram, the white tables are unaffected by the rollup. In particular, hr.T_Department and rfrl.T_Referral were referenced and referencing tables involving the employee table before the rollup, and the foreign keys are unchanged afterward. This is as expected: a subtype's inherited attributes and relationships are constant regardless of how the subtype is expressed.

The referential integrity for the yellow tables, though, which originally connected the subtype tables, has changed: the relationships have been generalized up to the employee level.

Where participation by hr.T_Engineer in its relationship with hr.T_EngineeringDiscipline was mandatory, the same relationship substituting the employee table for the engineer table is now optional, as shown by the circle on the one end. The same applies to the mangineer's relationship with the perk plan table. Again this is what we expect, because only some employees are engineers and mangineers, and we achieved this by making all the rollup columns nullable. And our check constraints ensure that the referencing columns have values only at appropriate times. We implement the foreign keys as before and done.

By contrast, the relationship line formerly connecting skl.T_EngineerSkill to the engineer table in the original logical design is unchanged. The difference is owing to the fact that nullability hasn't changed for the engineer skill referencing table. But a big change does exist, and it is subtle: only rows in the engineer skill table that reference employees who are engineers are valid. The foreign key alone can't enforce this; I sense a trigger coming on.

CREATE TRIGGER skl.trInsUpd_EmployeeSkill_ensureEngineer

    ON    [skl].[T_EmployeeSkill]
    FOR    insert, update
as
BEGIN
    SET NOCOUNT ON;

    if exists
    (
        select    *
        from    inserted ins
            inner join
            hr.T_Employee emp
            on
            ins.EngineerID = emp.EmployeeID
        where    emp.IsMgr0_Engr1_Both2_Nil3 not in( 1, 2 )
    )
    begin
        if( 0 <> XACT_STATE( ) ) begin ROLLBACK TRAN; end;
        RAISERROR( 'Skills can be associated with engineers only', 16, 1 );
    end;
END

Recall our discussion in the second article on stability and extensibility, in which we asked the question of whether managers may later be associated with skills. If so, we make a note to one day modify or deprecate the trigger.

What if an employee is no longer an engineer--e.g. becomes a non-engineer manager?  The trigger above won't help us--we complement it with an AFTER update trigger on the employee table:

if exists
(
    select    *
    from    inserted ins
        inner join
        deleted dele
        on
        ins.EmployeeID = dele.EmployeeID
        inner join
        skl.T_EmployeeSkill empSkl
        on
        ins.EmployeeID = empSkl.EmployeeID
    where    ins.EngrDisciplineID is null and dele.EngrDisciplineID is not null
)
begin
    if( 0 <> XACT_STATE( ) ) begin ROLLBACK TRAN end;
    RAISERROR( 'To remove engineer status, first delete rows in skl.T_EmployeeSkill', 16, 1 );
end

I hope you're getting the impression that data inconsistency occurs when things are not fully thought out by the architect.

The Partial Rollup

Rather than rolling up both levels of the inheritance hierarchy, we roll up one level only. One big minus of re-introducing the mangineer is that we're back to needing a background thread to check for and log missing mangineer rows--the violation of mangineer condition A.

Another minus is that we must reinstate two triggering mechanisms. The first is exactly as we saw in the previous section for maintaining data consistency between the employee and employee skill tables--we won't repeat it here.

The second is a simplified version of allowing mangineer row deletions only through triggers on the employee. As before, the triggers will use the same CONTEXT_INFO() strategy we used in the triggering subsystem section.

Because we no longer have the split foreign key problem, the mangineer's single foreign key can now specify ON DELETE CASCADE. But we'll need an INSTEAD OF delete trigger on the employee to set the context:

/* set connection context */

DELETE    emp
FROM    hr.T_Employee emp
    inner join
    deleted dele
    on
    emp.EmployeeID = dele.EmployeeID;

/* reset the context */

The trigger on the mangineer can now allow the cascade delete action to go through.

The second employee trigger is an AFTER update trigger that tests the discriminator and also sets the context when deleting the mangineer. Here is a snippet showing the deletion only:

DELETE    mngr
FROM    hr.T_Mangineer mngr
    inner join
    inserted ins
    on
    mngr.EmployeeID = ins.EmployeeID
WHERE    2 <> ins.IsMgr0_Engr1_Both2_Nil3

It is important to note at this point that neither this structuring nor the full rollup means that we designed out multiple inheritance from the architecture. We are just handling it differently.

Finally, this design is a viable compromise between creating one table per generalization hierarchy entity and the full rollup. In rolling up the first-level subtypes only and leaving the multiply derived one as standalone, the model is not inelegant.

The Rolldown

In this design we remove the employee supertype, rolling down all its columns, save for the discriminator, and relationships. Here I've chosen to roll down only one level, to the manager and engineer tables, but not the mangineer's. But neither design is acceptable: because the manager and engineer tables are not disjoint, data and relationships for employees who are both are duplicated, and can get out of sync. To make the point perfectly clear: rolldown is mostly a poor option for overlapping subtypes.

We are of course designing a transactional system--not considering cases where the data is replicated into a read-only environment.

If it's already in place or you can't veto it, you must develop a triggering mechanism to keep duplicated data in agreement. We won't consider it further.

If rolldown is a poor option for inclusive hierarchies, it isn't even possible for certain tables when the hierarchy is incomplete. Any table--whether from the supertype or a subtype--whose membership cannot be retrieved from the union of members over all its immediate descendants, cannot be rolled down. In our case, if employee Glenda is neither manager nor engineer, we could no longer track her. The rolldown diagram assumes the business rule allowing incompleteness no longer applies.

Analysis: Assuming Disjoint and Complete

Therefore to better explore rolldown, let's assume that manager and engineer memberships are disjoint and complete partitions of employee. Because overlap is a requirement for multiply derived entities--see the section on exclusivity in the first article--the first thing that happens is the mangineer can't exist--so strike it from the diagram.

Exclusivity is enforced with identical AFTER insert/update triggers on all rolldown subtype tables, and relies on there being a natural key (Name) for employees. You can see the triggers in the rolldown database in the download (e.g. hr.trInsUpd_Engineer_exclusivity).

While inter-table employee partitioning cannot be tested with surrogate keys, I have however made the IDENTITY column (EmployeeID) ranges non-overlapping between the subtypes for added safety and flexibility. For example, it allows referral referrers and recipients to uniquely identify employees, whether manager or employee, using employee identities.

With exclusivity, data in the rolldown tables over shared columns is no longer duplicated, nor are incoming relationships. In the diagram, both manager and engineer reference department without issue. The problem occurs with outgoing relationships; I've highlighted in red the relationships emanating from the manager and engineer into the referral table. Each referrer and recipient must be found in either table, but we can't place two foreign keys on each column because, unlike with the mangineer (without exclusivity), the same value can't be in both the manager and engineer. We'll have to forgo the foreign keys and once again write our own referential integrity. The first trigger ensures that both referenced values exist in one of the tables.

CREATE TRIGGER [rfrl].[trInsUpd_Referral_referentialIntegrity]

    ON    [rfrl].[T_Referral]
    FOR    insert, update
as
BEGIN
    SET NOCOUNT ON;

    if exists
    (
        select    *
        from    inserted ins
            LEFT OUTER JOIN
            hr.T_Manager mgrReferrer
            on
            ins.ReferrerID = mgrReferrer.EmployeeID
            LEFT OUTER JOIN
            hr.T_Engineer engrReferrer
            on
            ins.ReferrerID = engrReferrer.EmployeeID
            LEFT OUTER JOIN
            hr.T_Manager mgrRecipient
            on
            ins.RecipientID = mgrRecipient.EmployeeID
            LEFT OUTER JOIN
            hr.T_Engineer engrRecipient
            on
            ins.RecipientID = engrRecipient.EmployeeID
        where    ( UPDATE( ReferrerID ) OR UPDATE( RecipientID ) )
            and
            (
                -- referrer cannot be found in either table
                ( mgrReferrer.EmployeeID  is null and engrReferrer.EmployeeID  is null )
                OR
                -- recipient cannot be found in either table
                ( mgrRecipient.EmployeeID is null and engrRecipient.EmployeeID is null )
            )
    )
    begin
        if( 0 <> XACT_STATE( ) ) begin ROLLBACK TRAN; end;
        RAISERROR( 'Referential Integrity error: cannot find employee for a referrer or recipient', 16, 1 );
    end;
END

Delete triggers on the referenced (rolldown) tables simulate the CASCADE DELETE constraint. This time, because no foreign keys are involved, the triggers can fire before or after deletion. Or we can write the triggers to simulate the NO ACTION functionality. In this case, we throw out the transaction and inform the user to correct the transaction by first deleting referencing rows.

/* choice CASCADE DELETE */

    DELETE    ref
    FROM    rfrl.T_Referral ref
        INNER JOIN
        deleted dele
        on
        dele.EmployeeID = ref.ReferrerID
        OR
        dele.EmployeeID = ref.RecipientID;

/* choice NO ACTION */

    if exists
    (
        select    *
        from    deleted dele
            INNER JOIN
            rfrl.T_Referral ref
            on
            dele.EmployeeID = ref.ReferrerID
            OR
            dele.EmployeeID = ref.RecipientID
    )
    begin
        if( 0 <> XACT_STATE( ) ) begin ROLLBACK TRAN; end;
        RAISERROR( 'Referential Integrity error: Cannot ' + 
          'delete because a referral''s referrer or ' + 
          'recipient is referencing the employee', 16, 1 )
    end;

Which is better?  It all depends.

The Reconstruction

Whether we roll up or roll down, no business rules can be lost--as we've taken pains to show. An implication of this is that we must be able to recover exactly those rows from each table that existed in the non-merged design.

If we do no merging, we should be able to show each member of the generalization hierarchy in one row of a logical table. This follows from the fact that each subtype is a partition of attributes and relationships of the supertype.

I think you can see where we're going: the physical model chosen must include (thoroughly tested) views as part of the deliverable.

Here is the employee table from the full rollup:

From this table we should be able to reconstruct the original employee table, plus all three subtype tables. But for the subtypes, the question arises: should we recreate them exactly as they were in the pre-rolled state, or can we add columns from tables in the inheritance chain?  Why not do both?

CREATE VIEW [hr].[T_Manager_full]( EmployeeID, Name, DepartmentID, Budget )
as
    select    ID, Name, DepartmentID, Budget
    from    hr.T_Employee
    where    [IsMgr0_Engr1_Both2_Nil3] in( 0, 2 );
    
    ------------------------------------------------------------
    
CREATE VIEW hr.T_Manager( EmployeeID, Budget )
as
    select    EmployeeID, Budget
    from    hr.T_Manager_full;

Sure we're doing the easy stuff now, but just keep in mind that for completeness, reconstructive views should be part of the full physical design deliverable.

Finally

Finally.

Those of us who design data or implement databases know it takes a lot of thought, knowledge, tradeoffs, decisions. In this series I meant to share with you the process from its inception to conclusion through designing a model in supertypes and subtypes--a seemingly simple exercise that, when we looked more closely, wasn't.

The paradigm has its own theory, logic, and rules; it is a study within a study.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Scott Burkow

United States United States
Scott is a data architect/database developer specializing in SQL Server. He holds an Ms. C.S. from UCSB engineering, and is MCITP Database Developer in 2005 and 2008.
 
Based in San Francisco under the incorporated name Ziron Systems, he works with clients to analyze, design, and develop database systems as well as mentor team members and troubleshoot issues.
 
His passion is working on complex problems, and at the moment, writing articles for advanced practitioners that pose problems and explore solutions.
 
Reach him at scott.burkow@zironsystems.com or (310)403-1137.

Comments and Discussions

 
QuestionAntipattern PinmemberRoger Alsing26-May-13 20:27 
AnswerRe: Antipattern PinmemberScott Burkow27-May-13 2:20 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web01 | 2.8.140916.1 | Last Updated 24 Jun 2013
Article Copyright 2013 by Scott Burkow
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid