New (3.2.3) Hibernate identifier generators

Introducing...

Starting in 3.2.3, I have included two new identifier generators targeted at portability. They take a different approach to portability than the older native generator does. Typically, using a synthetic identifier generation strategy while eyeing portability comes down to wanting the capabilities that a sequence provides even though the database may not support sequences.

Note that I explicitly leave out IDENTITY-style generators. Generally speaking, an Object/Relational Mapping technology will prefer identifier generation strategies where the identifier value can be retrieved before performing (and without having to actually perform) the insert statement. This is certainly true of Hibernate and other transactional write-behind technologies: with IDENTITY columns the insert must be performed immediately, which circumvents the transactional write-behind behavior. Furthermore, because JDBC does not define a mechanism to retrieve batches of IDENTITY-generated values, batching must be implicitly disabled for entities using an IDENTITY generator.

The two generators are:

  • org.hibernate.id.enhanced.SequenceStyleGenerator - its approach to portability is that you really don't care whether you are physically using a SEQUENCE in the database; you just want sequence-like generation of values. On databases which support SEQUENCES, SequenceStyleGenerator will in fact use a SEQUENCE as the value generator; for those databases which do not support SEQUENCES, it will instead use a single-row table as the value generator, but with exactly the same characteristics as a SEQUENCE value generator (namely, it deals with the sequence table in a separate transaction at all times).
  • org.hibernate.id.enhanced.TableGenerator - while not specifically targeting portability, TableGenerator can certainly be used across all databases. It uses a multi-row table where the rows are keyed by a (configurable) sequence_name column; one approach would be to have each entity define a unique sequence_name value in the table to segment its identifier values. It grew out of the older org.hibernate.id.MultipleHiLoPerTableGenerator and uses basically the same table structure. However, while MultipleHiLoPerTableGenerator inherently applies a hi-lo algorithm to the value generation, this new TableGenerator was added to be able to take advantage of the pluggable optimizers.
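
To make the "sequence-like values regardless of the backing structure" idea concrete, here is a minimal sketch in plain Java. It does not use Hibernate at all: ValueSource, SequenceSource, SingleRowTableSource and the supportsSequences flag are hypothetical names invented for illustration, and the "database" is just an in-memory field. It only shows that a caller sees the same values whichever structure is chosen.

```java
class SequenceStyleSketch {

    /** Hypothetical abstraction: anything that hands out sequence-like values. */
    interface ValueSource {
        long nextValue();
    }

    /** Stands in for a real database SEQUENCE. */
    static final class SequenceSource implements ValueSource {
        private long next;

        SequenceSource(long initialValue) {
            this.next = initialValue;
        }

        @Override
        public synchronized long nextValue() {
            return next++;
        }
    }

    /**
     * Stands in for a single-row table. A real implementation would read and
     * update the row in its own transaction; here the "row" is just a field.
     */
    static final class SingleRowTableSource implements ValueSource {
        private long rowValue;

        SingleRowTableSource(long initialValue) {
            this.rowValue = initialValue;
        }

        @Override
        public synchronized long nextValue() {
            long current = rowValue;   // "select" the stored value
            rowValue = current + 1;    // "update" the row for the next caller
            return current;
        }
    }

    /** Picks the backing structure from an assumed dialect capability flag. */
    static ValueSource choose(boolean supportsSequences, long initialValue) {
        return supportsSequences
                ? new SequenceSource(initialValue)
                : new SingleRowTableSource(initialValue);
    }

    public static void main(String[] args) {
        ValueSource onSequenceDb = choose(true, 1);
        ValueSource onTableDb = choose(false, 1);
        // Both sources hand out the same series of values.
        for (int i = 0; i < 3; i++) {
            System.out.println(onSequenceDb.nextValue() + " " + onTableDb.nextValue());
        }
    }
}
```

The point of the design is that the code asking for the next value never needs to know which of the two structures sits behind it.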

Both generators, in addition to other specific parameters, share 3 useful configuration parameters:

  • optimizer
  • initial_value
  • increment_size

The role of the optimizer is to limit the number of times we actually need to hit the database in order to determine the next identifier value. The exact effect of initial_value and increment_size depends somewhat on the optimizer chosen. The optimizer parameter offers three choices:

  • none - says to hit the database on each and every request
  • hilo - says to use an in-memory pooling technique which is the same basic logic as the older Hibernate hilo or seqhilo generators. In terms of the database values, they are incremented one at a time; in other words, increment_size applies only to the in-memory algorithm.
  • pooled - says to use a stored pooling technique. Unlike hilo, where one-at-a-time incremental values are stored in and retrieved from the database sequence/table, pooled stores the actual current hi-value in the database. The increment_size=10 walkthrough in the next section illustrates the difference.
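
As a rough illustration of the three shared parameters, the sketch below collects them as plain string key/value pairs in a java.util.Properties object. The helper sharedParams and the class name are hypothetical, and the validation is deliberately limited to the three optimizer names listed above; how the values actually reach a generator depends on your mapping style and is not modelled here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

class GeneratorParamsSketch {

    // Only the three optimizer names listed in the text; nothing else is modelled.
    static final List<String> OPTIMIZERS = Arrays.asList("none", "hilo", "pooled");

    /** Hypothetical helper: builds the three shared parameters as string properties. */
    static Properties sharedParams(String optimizer, int initialValue, int incrementSize) {
        if (!OPTIMIZERS.contains(optimizer)) {
            throw new IllegalArgumentException("unknown optimizer: " + optimizer);
        }
        if (incrementSize < 1) {
            throw new IllegalArgumentException("increment_size must be at least 1");
        }
        Properties params = new Properties();
        params.setProperty("optimizer", optimizer);
        params.setProperty("initial_value", Integer.toString(initialValue));
        params.setProperty("increment_size", Integer.toString(incrementSize));
        return params;
    }

    public static void main(String[] args) {
        Properties params = sharedParams("pooled", 1, 10);
        System.out.println("optimizer      = " + params.getProperty("optimizer"));
        System.out.println("initial_value  = " + params.getProperty("initial_value"));
        System.out.println("increment_size = " + params.getProperty("increment_size"));
    }
}
```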

Under the covers

So generally speaking, both the hilo and pooled optimizers seek to optimize performance by minimizing the number of times we need to hit the database. Great! So then exactly how are they different? Well, let's take a look at the values stored in the database as a means to illustrate the distinction.

optimizer=hilo (increment_size=10)

After the initial request, we will have:

|  value (db)  |  value (in-memory)  |  hi (in-memory)  |
| 1            | 1                   | 11               |

The db-value and hi will remain the same until the 12th request, at which point we would clock over:

|  value (db)  |  value (in-memory)  |  hi (in-memory)  |
| 2            | 12                  | 21               |

Essentially, hi defines the clock-over value; once the in-memory value reaches the hi value, we need to hit the database and define a new bucket of values. The major drawback of this approach shows up when legacy applications also need to insert values: those other applications must also understand and use this hilo algorithm.
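
The bookkeeping can be sketched in a few lines of plain Java. This is a simplified model under stated assumptions, not Hibernate's actual optimizer code: SimulatedSource stands in for the database sequence/table and hands out 1, 2, 3, ..., HiLoClient and take are hypothetical names, and a new bucket is fetched whenever the in-memory value reaches hi. The exact request on which you observe the clock-over depends on how the bookkeeping is counted.

```java
import java.util.ArrayList;
import java.util.List;

class HiLoSketch {

    /** Stand-in for the database sequence/table: hands out 1, 2, 3, ... */
    static final class SimulatedSource {
        private long next = 1;
        private int hits = 0;

        synchronized long nextValue() {
            hits++;
            return next++;
        }

        int hits() {
            return hits;
        }
    }

    /** Simplified hilo bookkeeping; names are illustrative, not Hibernate's. */
    static final class HiLoClient {
        private final SimulatedSource source;
        private final long incrementSize;
        private long value;      // next identifier to hand out
        private long hi = -1;    // clock-over value; -1 means "no bucket yet"

        HiLoClient(SimulatedSource source, long incrementSize) {
            this.source = source;
            this.incrementSize = incrementSize;
        }

        long generate() {
            if (hi < 0 || value >= hi) {
                long dbValue = source.nextValue();   // the only database hit
                hi = dbValue * incrementSize + 1;    // db=1 -> hi=11, db=2 -> hi=21
                value = hi - incrementSize;          // db=1 -> 1,     db=2 -> 11
            }
            return value++;
        }
    }

    /** Collects the next 'count' identifiers from a client. */
    static List<Long> take(HiLoClient client, int count) {
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add(client.generate());
        }
        return ids;
    }

    public static void main(String[] args) {
        SimulatedSource source = new SimulatedSource();
        HiLoClient client = new HiLoClient(source, 10);
        System.out.println(take(client, 25));
        System.out.println("database hits: " + source.hits());
    }
}
```

Note that the value held by the simulated source (1, 2, 3, ...) says nothing about the identifiers in use unless you also know increment_size and the formula, which is exactly why other applications writing to the same tables would have to implement the same algorithm.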

optimizer=pooled (increment_size=10)

After the initial request, we will have:

|  value (db)  |  value (in-memory)  |  hi (in-memory)  |
| 11           | 1                   | 11               |

The db-value and hi will remain the same until the 12th request, at which point we would clock over:

|  value (db)  |  value (in-memory)  |  hi (in-memory)  |
| 21           | 12                  | 21               |

As you can see, with this optimizer the increment_size is actually encoded into the database values. This is perfect for databases which support sequences, because typically they also define an INCREMENT BY option when creating the sequence, such that calls to get the next sequence value automatically apply the proper increment_size. Even if other applications are also inserting values, we'll be perfectly safe because the SEQUENCE itself will handle applying this increment_size. And in practice, it turns out, you will also be safe if SequenceStyleGenerator reverts to using a table in the same situation, because of how the clock-over happens.
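
Here is a comparable sketch for the pooled case, again as a simplified model rather than Hibernate's actual code. SimulatedSequence stands in for a database sequence created with an increment of 10 and an initial value of 1; PooledClient and take are hypothetical names. Two independent clients draw from the same simulated sequence, and because the sequence itself applies the increment, the blocks they hand out never overlap.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PooledSketch {

    /** Stand-in for a database SEQUENCE that increments by incrementSize. */
    static final class SimulatedSequence {
        private final long incrementSize;
        private long next;
        private int calls = 0;

        SimulatedSequence(long initialValue, long incrementSize) {
            this.next = initialValue;
            this.incrementSize = incrementSize;
        }

        synchronized long nextValue() {
            long current = next;
            next += incrementSize;   // the "database" applies the increment
            calls++;
            return current;
        }

        int calls() {
            return calls;
        }
    }

    /** Simplified pooled bookkeeping; names are illustrative, not Hibernate's. */
    static final class PooledClient {
        private final SimulatedSequence sequence;
        private final long incrementSize;
        private long value;      // next identifier to hand out
        private long hi = -1;    // clock-over value; -1 means "not initialized"

        PooledClient(SimulatedSequence sequence, long incrementSize) {
            this.sequence = sequence;
            this.incrementSize = incrementSize;
        }

        long generate() {
            if (hi < 0) {
                value = sequence.nextValue();   // first call: start of the block
                hi = sequence.nextValue();      // second call: clock-over value
            } else if (value >= hi) {
                hi = sequence.nextValue();      // claim the next block
                value = hi - incrementSize;
            }
            return value++;
        }
    }

    /** Collects the next 'count' identifiers from a client. */
    static List<Long> take(PooledClient client, int count) {
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add(client.generate());
        }
        return ids;
    }

    public static void main(String[] args) {
        SimulatedSequence sequence = new SimulatedSequence(1, 10);
        PooledClient a = new PooledClient(sequence, 10);
        PooledClient b = new PooledClient(sequence, 10);

        List<Long> all = new ArrayList<>();
        all.addAll(take(a, 10));
        all.addAll(take(b, 10));
        all.addAll(take(a, 10));
        all.addAll(take(b, 10));

        Set<Long> distinct = new HashSet<>(all);
        System.out.println(all);
        System.out.println("all distinct: " + (distinct.size() == all.size()));
    }
}
```

In this model the interleaved clients can leave gaps between blocks, but they never hand out the same identifier twice, which is the property that matters for primary keys.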

Conclusion

I expect these two new generators to eventually replace the existing ones behind their short-hand names. Specifically, I would expect:

  • the implementation behind sequence to change from org.hibernate.id.SequenceGenerator to the new org.hibernate.id.enhanced.SequenceStyleGenerator
  • the implementation behind table to change from org.hibernate.id.TableGenerator to the new org.hibernate.id.enhanced.TableGenerator

The second is the riskier replacement because of the big difference between the two. But we have all along discouraged direct use of the current table generator, so I think we should be safe there. I am still uncertain when that replacement will happen (probably 4.0?), but in the meantime the new generators are available and highly recommended.

