Discrimination

Posted by    |      

Type - not sex, or race - discrimination is what we do when we read a row from a SQL query result set, and determine what Java class we should instantiate to hold the data from that row. Type discrimination is needed by any ORM solution or handwritten persistence layer that supports polymorphic queries or associations.

This simplest way to visualize type discrimination is to consider the following result set:

( 123, '1234-4567-8910-1234', 'Foo Bar', 'VISA' )
( 111, '4321-7654-0198-0987', 'Baz Qux', 'MCD'  )

This result set contains details of two credit cards. Suppose our application has a different Java class to represent each type of credit card. We have two subclasses, VisaCard and MasterCard of the CreditCard class. Then we can check the last column of the result set to decide which class to instantiate for each row. This column is the discriminator column.

You're probably wondering why I'm talking about about result sets instead of tables. Well, there are various ways to map a Java class hierarchy to a relational database schema: table-per-hierarchy, table-per-concrete-class, table-per-class. So the actual table structure may be quite complex. But the only way to actually get data efficiently out of the database is to denormalize it into one big square result set. In fact, I usually think of the job of the SQL query as transforming table-per-concrete-class or table-per-class mapped data into an intermediate table-per-hierarchy denormalized form. (This is, incidentally, why table-per-hierarchy offers the best performance of the three mapping strategies - it is already in a convenient form and does not require unions or joins on the database.) Whatever strategy we choose, we need to perform type discrimination upon this flattened result set.

Now, what I'm most interested in for the moment is the result set discriminator column. Most writings on the topic of inheritance mappings - and there have not been very many - and even most ORM solutions (actually every one that I know of), have assumed that the discriminator column is an actual physical column of the root table of the supertype. Indeed, it has usually been further assumed that the mapping from discriminator values to classes is one-to-one. But this need not be the case. In my credit card example, it certainly made sense. But now let's consider a different case. We're storing data relating to specific individuals in the PERSON table; our SQL query might look like this:

SELECT ID, NAME, SPECIES FROM PERSON

and the result set might be:

( 12345, 'Zxychg Ycjzy', 'Martian' )
( 52778, 'Glooble Queghm', 'Venusian' )
( 98876, 'Gavin King', 'Human' )

Now, here on earth, we consider Humans to be quite special and worthy of extra attention compared to other species of intelligent alien. So we might have a specific Human class, and a generic Alien class. Then the mapping from discriminator column values to classes is certainly not one-to-one. Indeed, there is a specific value for Human, and a catchall for Alien.

Actually, it's quite reasonable that we might even have some extra, Human-specific data, in the HUMAN table. To get all the data we need, let's use the following join:

SELECT ID, NAME, SPECIES, COUNTRY 
FROM PERSON 
    NATURAL JOIN HUMAN

( 12345, 'Zxychg Ycjzy', 'Martian', null )
( 52778, 'Glooble Queghm', 'Venusian', null )
( 98876, 'Arnold Schwarzenegger', 'Human', 'US' )

In this result set, we have two potential discriminator columns. Either the COUNTRY or SPECIES column could be used to determine if the individual is human. And the COUNTRY column isn't a column of the root PERSON table. Now imagine we introduce a further level of specialization, and include data specific to employees of our organization:

SELECT ID, NAME, SPECIES, COUNTRY, JOB
FROM PERSON 
    NATURAL JOIN HUMAN 
    NATURAL JOIN EMPLOYEE

( 12345, 'Zxychg Ycjzy', 'Martian', null, null )
( 52778, 'Glooble Queghm', 'Venusian', null, null )
( 98876, 'Arnold Schwarzenegger', 'Human', 'US', null )
( 34556, 'Gavin King', 'Human', 'AU', 'Java Developer' )

Now we can no longer perform type discrimination using just a single column. Ouch! That's messy. Let's change the query slightly:

SELECT ID, NAME, SPECIES, COUNTRY, JOB, 
    CASE 
        WHEN LOC IS NULL THEN 'ALIEN' 
        WHEN JOB IS NULL THEN 'HUMAN' 
        ELSE 'EMPLOYEE' 
    END 
FROM PERSON 
    NATURAL JOIN HUMAN 
    NATURAL JOIN EMPLOYEE

( 12345, 'Zxychg Ycjzy', 'Martian', null, null, 'ALIEN' )
( 52778, 'Glooble Queghm', 'Venusian', null, null, 'ALIEN' )
( 98876, 'Arnold Schwarzenegger', 'Human', 'US', null, 'HUMAN' )
( 34556, 'Gavin King', 'Human', 'AU', 'Java Developer', 'EMPLOYEE' )

Yay, we've got our nice clean discriminator column back again! But this column most certainly does not correspond to any physical column of a table. It holds a pure derived value.

This is content-based discrimination. Our example uses a table-per-class mapping strategy, but the result sets above could all just as easily have come from from some other mapping strategy.

Here's a second example of content-based discrimination:

SELECT TX_ID, ACCOUNT_ID, AMOUNT, 
    CASE 
        WHEN AMOUNT>0 THEN 'CREDIT' 
        ELSE 'DEBIT' 
    END 
FROM TRANSACTIONS

( 12875467987, 98798723, 56.99, 'CREDIT' )
( 09808343123, 87558345, 123.25, 'DEBIT' )

Here, we use a column that is based upon a mathematical expression, AMOUNT>0, to discriminate between DebitTransaction and CreditTransaction. In principle, much more complex expressions are possible. (In practice, they are likely to remain quite simple.)

In Hibernate 2.x the table-per-class mapping strategy always used content-based discrimination, and the table-per-hierarchy strategy always used column-based discrimination. For some reason - that is now kind of obscure to me - that felt quite natural. In Hibernate3, you can use content based discrimination for the table-per-hierarchy mapping strategy:

<class name="Person" table="PERSON" 
    discriminator-value="ALIEN">
    ...
    <discriminator type="string"> 
        <formula>
            CASE 
                WHEN LOC IS NULL THEN 'ALIEN' 
                WHEN JOB IS NULL THEN 'HUMAN' 
                ELSE 'EMPLOYEE' 
            END
        </formula>
    </discriminator>
    ...
    <subclass name="Human" 
        discriminator-value="HUMAN">
        ...
        <subclass name="Employee" 
            discriminator-value="EMPLOYEE">
            ...
        </subclass>
     </subclass>
 </class>

And you can use column-based discrimination for table-per-class mappings:

<class name="Person" table="PERSON" 
    discriminator-value="ALIEN">
    ...
    <discriminator type="string" column="TYPE"/>
    ...
    <subclass name="Human" 
        discriminator-value="HUMAN">
        <join table="HUMAN">
            ...
            <subclass name="Employee" 
                discriminator-value="EMPLOYEE">
                <join table="EMPLOYEE">
                    ...
                </join>
            </subclass>
        </join>
     </subclass>
 </class>

For the table-per-concrete-class strategy (<union-subclass> mapping), only content-based discrimination makes sense.


Back to top