14. Sep 2004, 09:15 CET, by Gavin King

September 20-22 in Melbourne will be the first time we deliver our new three-day Hibernate course. The course has been heavily revised and expanded to include previews of the cool new stuff coming in Hibernate3 and an overview of Hibernate internals (/very/ useful if you ever need to debug a Hibernate application). There are still seats available, if you're quick! This will be the last training we run in Australia for a while, since I won't be in the country much, if at all, over the next six months or so. Email training@jboss.com for more information. (We also have an upcoming course in Paris, November 3-5.)

27. Aug 2004, 09:46 CET, by Gavin King

I gotta preface this post by saying that we are very skeptical of the idea that Java is the right place to do processing that works with data in bulk. By extension, ORM is probably not an especially appropriate way to do batch processing. We think that most databases offer excellent solutions in this area: stored procedure support, and various tools for import and export. Because of this, we've neglected to properly explain to people how to use Hibernate for batch processing if they really feel they /have/ to do it in Java. At some point, we have to swallow our pride, and accept that lots of people are actually doing this, and make sure they are doing it the Right Way.

A naive approach to inserting 100 000 rows in the database using Hibernate might look like this:

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for ( int i=0; i<100000; i++ ) {
   Customer customer = new Customer(.....);
   session.save(customer);
}
tx.commit();
session.close();

This would fall over with an OutOfMemoryError somewhere after the 50 000th row. That's because Hibernate caches all the newly inserted Customers in the session-level cache. Certain people have expressed the view that Hibernate should manage memory better, and not simply fill up all available memory with the cache. One very noisy guy who used Hibernate for a day and noticed this is even going around posting on all kinds of forums and blog comments, shouting about how this demonstrates what shitty code Hibernate is. For his benefit, let's remember why the first-level cache is not bounded in size:

  • persistent instances are /managed/ - at the end of the transaction, Hibernate synchronizes any change to the managed objects to the database (this is sometimes called /automatic dirty checking/)
  • in the scope of a single persistence context, persistent identity is equivalent to Java identity (this helps eliminate data /aliasing/ effects)
  • the session implements /asynchronous write-behind/, which allows Hibernate to transparently batch together write operations

For typical OLTP work, these are all very, very useful features. Since ORM is really intended as a solution for OLTP problems, I usually ignore criticisms of ORM which focus upon OLAP or batch stuff as simply missing the point.
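
To make the identity guarantee concrete, here is a toy model of a persistence context in plain Java. It is only a sketch and not Hibernate code: ToyIdentityMap, Customer and load() are hypothetical names invented for the illustration. It shows why loading the same id twice gives the same Java instance, and why the map keeps growing until somebody clears it.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical names, not part of Hibernate.
class ToyIdentityMap {

    // Hypothetical entity used for the illustration.
    static class Customer {
        final long id;
        String name;
        Customer(long id, String name) { this.id = id; this.name = name; }
    }

    // One entry per managed instance; nothing is evicted until clear() is called.
    private final Map<Long, Customer> managed = new HashMap<>();

    // Returns the already-managed instance for this id, creating it on first access.
    Customer load(long id) {
        return managed.computeIfAbsent(id, key -> new Customer(key, "customer-" + key));
    }

    int size() { return managed.size(); }

    void clear() { managed.clear(); }

    public static void main(String[] args) {
        ToyIdentityMap session = new ToyIdentityMap();
        Customer a = session.load(42L);
        Customer b = session.load(42L);
        System.out.println("same instance: " + (a == b));
        for (long id = 0; id < 1000; id++) {
            session.load(id);
        }
        System.out.println("managed instances: " + session.size());
    }
}
```

The unbounded map is exactly what makes the identity guarantee cheap to provide, and it is also what eats the heap in the naive loop above.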

However, it turns out that this problem is incredibly easy to work around. For the record, here is how you do batch inserts in Hibernate.

First, set the JDBC batch size to a reasonable number (say, 10-20):

hibernate.jdbc.batch_size 20

Then, flush() and clear() the session every so often:

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

for ( int i=0; i<100000; i++ ) {
   Customer customer = new Customer(.....);
   session.save(customer);
   if ( (i + 1) % 20 == 0 ) {
      //flush a full batch of inserts (20, same as the JDBC batch size) and release memory:
      session.flush();
      session.clear();
   }
}

tx.commit();
session.close();
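
If you want to convince yourself that this keeps memory bounded without setting up a database, the following toy model might help. It is a sketch under stated assumptions, not Hibernate's API: ToyBatchSession and its methods are hypothetical stand-ins, flush() merely counts rows instead of executing a JDBC batch, and clear() empties an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: a hypothetical stand-in for a session, not Hibernate's API.
class ToyBatchSession {
    private final List<String> pending = new ArrayList<>(); // saved but not yet written
    private final List<String> cached = new ArrayList<>();  // held in the session cache
    private int written = 0;
    private int peakCached = 0;

    void save(String row) {
        pending.add(row);
        cached.add(row);
        peakCached = Math.max(peakCached, cached.size());
    }

    // Writes pending rows out (stands in for executing a JDBC batch).
    void flush() {
        written += pending.size();
        pending.clear();
    }

    // Drops every cached instance, and any unflushed work, so it can be garbage collected.
    void clear() {
        cached.clear();
        pending.clear();
    }

    int written() { return written; }

    int peakCached() { return peakCached; }

    // Inserts `rows` rows, flushing and clearing after every `batchSize` saves.
    static ToyBatchSession insert(int rows, int batchSize) {
        ToyBatchSession session = new ToyBatchSession();
        for (int i = 0; i < rows; i++) {
            session.save("customer-" + i);
            if ((i + 1) % batchSize == 0) {
                session.flush();
                session.clear();
            }
        }
        session.flush(); // stands in for the flush that a commit would trigger
        return session;
    }

    public static void main(String[] args) {
        ToyBatchSession session = insert(100000, 20);
        System.out.println("written=" + session.written()
                + " peakCached=" + session.peakCached());
    }
}
```

With 100 000 rows and a batch size of 20, the toy session writes every row while never holding more than 20 instances at once.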

What about retrieving and updating data? Well, in Hibernate 2.1.6 or later, the scroll() method is the best approach:

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

ScrollableResults customers = session.getNamedQuery("GetCustomers")
   .scroll(ScrollMode.FORWARD_ONLY);
int count=0;
while ( customers.next() ) {
   Customer customer = (Customer) customers.get(0);
   customer.updateStuff(...);
   if ( ++count % 20 == 0 ) {
      //flush a batch of updates and release memory:
      session.flush();
      session.clear();
   }
}

tx.commit();
session.close();

Not so difficult, or even shitty, I guess. Actually, I think you'll agree that this was much easier to write than the equivalent JDBC code messing with scrollable result sets and the JDBC batch API.

One caveat: if Customer has second-level caching enabled, you can still get some memory management problems. The reason for this is that Hibernate has to notify the second-level cache /after the end of the transaction/, about each inserted or updated customer. So you should disable caching of customers for the batch process.

25. Aug 2004, 19:20 CET, by Gavin King

We were doing some work with a customer with a very large project recently, and they were concerned about traceability of the SQL issued by Hibernate. Their problem is one that I guess is common: suppose I see something wrong in the Hibernate log (say, some N+1 selects problem), how do I know which of my business classes is producing this? All I've got in the Hibernate log is org.hibernate.SQL, line 224 as the source of the log message!

I started to explain how Hibernate3 can embed comments into the generated SQL, so you could at least track the problem back to a particular HQL query. But then Steve remembered that log4j provides the /nested diagnostic context/. Now, I've seen a lot of projects using log4j, but I've never actually seen this used anywhere. I think it might be a better alternative to adding entry and exit logging everywhere, since we can see this context even if the entry/exit log categories are disabled. It's a good way to track the source of SQL in the Hibernate log. All you need to do is add calls to push() and pop() in your DAO:

public List getCustomersByName(String pattern) {
    NDC.push("CustomerDAO.getCustomersByName()");
    try {
        return getSession()
            .createQuery("from Customer c where c.name like :pattern")
            .setString("pattern", pattern)
            .list();
    }
    finally {
        NDC.pop();
    }
}

Then, if I set my pattern right:

log4j.appender.stdout.layout.ConversionPattern=%d{ABSOLUTE} %5p %c{1}:%L - %m (%x)%n

I'll get a log message like this:

20:59:38,249 DEBUG [=>SQL:244] - select .... like ? (CustomerDAO.getCustomersByName())

Just thought I'd mention it, in case it helps someone.
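
If you have never looked at what a nested diagnostic context actually is, here is a toy version in plain Java. It is a sketch of the idea only: ToyNdc is a hypothetical class, log4j's NDC is the real implementation, and "OrderService.placeOrder()" is an invented caller. The context is just a per-thread stack of labels that gets appended to each message, much as %x does in the pattern above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of a nested diagnostic context; log4j's NDC class is the real thing.
class ToyNdc {
    // Each thread gets its own stack of context labels.
    private static final ThreadLocal<Deque<String>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    static void push(String label) {
        STACK.get().addLast(label);
    }

    // Removes and returns the innermost label, or "" if the stack is empty.
    static String pop() {
        Deque<String> stack = STACK.get();
        return stack.isEmpty() ? "" : stack.removeLast();
    }

    // Renders the stack outermost-first, separated by spaces.
    static String get() {
        return String.join(" ", STACK.get());
    }

    // Formats a message with the current context appended in parentheses.
    static String log(String message) {
        return message + " (" + get() + ")";
    }

    public static void main(String[] args) {
        push("OrderService.placeOrder()"); // hypothetical outer caller
        try {
            push("CustomerDAO.getCustomersByName()");
            try {
                System.out.println(log("select .... like ?"));
            } finally {
                pop();
            }
        } finally {
            pop();
        }
    }
}
```

The try/finally pairing matters here for the same reason as in the DAO above: a label that is pushed but never popped would leak into every later message logged on that thread.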

25. Aug 2004, 13:12 CET, by Gavin King

One of the joys of working on an open source project with commercial competitors is having to implement features that our users simply don't ask for, and probably won't use in practice, just because those competitors try to spin their useless features as a competitive advantage. We realized ages ago that it's really hard to tell people that they don't need and shouldn't use a feature if you don't have it.

Multi-table mappings started out as a good example of that kind of feature. We have been repeating the "your object model should be at /least/ as fine-grained as your relational schema" mantra for years now. Unfortunately, we keep hearing this echo back as "Hibernate can't do multitable mappings". Nobody has ever once shown me a truly compelling usecase for multitable mappings in a real application, but apparently, if our competitors are to be believed, it is common to find schemas with attributes of the same entity scattered randomly across several different physical tables. I'll have to take their word on that one. I'm not saying you will /never/ run into this kind of thing and, indeed, I've seen a few borderline cases, though nothing that wasn't at least arguably better represented as an association. But certainly, to my mind, valid usecases for multitable mappings are not something you run into commonly enough for this to be an important feature. Perhaps the difference in perception is due to the fact that only /sane/ organizations use Hibernate.

Anyway, we introduced the <join/> mapping, just so we could tell people not to use it. Actually, it was fun to implement, and helped me make some really nice refactorings to the EntityPersister hierarchy.

Then a funny thing happened. I started to think of all kinds of useful things to do with <join/>, none of which had anything much to do with multitable mappings, as usually understood. And I'm pretty certain that these things were not what the other guys were talking about!

The first application I came up with is a mixed inheritance mapping strategy. Before, you had a choice between <subclass/> and <joined-subclass/> (now also <union-subclass/>), and you had to stick with that one strategy for the whole hierarchy.

It's now possible to write a mapping like this:

<class name="Superclass" 
        table="parent"
        discriminator-value="0">
    <id name="id">.....</id>
    <discriminator column="type" type="int"/>
    <property ...../>
    ...
    
    <subclass name="Subclass" 
            discriminator-value="1">
        <property ..../>
        ...
    </subclass>
    
    <subclass name="JoinedSubclass" 
            discriminator-value="-1">
        <join table="child">
            <property ...../>
            ....
        </join>
    </subclass>
    
</class>

That's /really/ useful.

The next thing that <join/> can be used for required a little tweak. I added an inverse attribute to the join element, to declare that the joined table should not be updated by the owning entity. Now, it's possible to map an association (link) table - which usually represents a many-to-many association - with one-to-many multiplicity in the domain model. First, we have a basic many-to-many mapping, on the Parent side:

<class name="Parent">
    ...
    <set name="children" table="ParentChild" lazy="true">
        <key column="parentId"/>
        <many-to-many column="childId" class="Child"/>
    </set>
</class>

Now, we use a <join/> mapping, to hide the association table from the Child end:

<class name="Child">
    ...
    <join table="ParentChild" inverse="true">
        <key column="childId"/>
        <many-to-one name="parent" column="parentId"/>
    </join>
</class>

Well, I'm not sure really how useful this is, but I was always jealous of the TopLink guys when they bragged how they could do this, and we got it /almost/ for free!

A third trick was also inspired by TopLink. A number of former TopLink users porting code to Hibernate found that Hibernate's table-per-class mapping strategy has significantly different performance characteristics to TopLink's. Hibernate has what seems to be a unique implementation of the table-per-class mapping strategy, in that no discriminator column is required to achieve polymorphism. Instead, Hibernate performs an outer join across all subclass tables, and checks which primary key values are null in each returned row of results in order to determine the subclass that the row represents. In most circumstances, this offers an excellent performance balance, since it is not vulnerable to the dreaded N+1 selects problem. Furthermore, it does not require the addition of a type discriminator column to the table of the root class, which really feels extremely unnatural and redundant for this relational model.
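
The null check is easier to picture with a small sketch. This is a toy model in plain Java, not Hibernate's internal code: ToySubclassResolver, resolve() and the Payment class names are all hypothetical, and it assumes the subclass tables are listed most-derived first.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: hypothetical names, not Hibernate's internal code.
class ToySubclassResolver {

    // `joinedKeys` maps each subclass name to the primary key value that the outer
    // join returned for that subclass table (null when that table has no matching row).
    // Assumption: entries are ordered most-derived first.
    static String resolve(String rootClass, Map<String, Long> joinedKeys) {
        for (Map.Entry<String, Long> entry : joinedKeys.entrySet()) {
            if (entry.getValue() != null) {
                return entry.getKey(); // this subclass table contributed a row
            }
        }
        return rootClass; // every subclass key was null: a plain root instance
    }

    public static void main(String[] args) {
        Map<String, Long> row = new LinkedHashMap<>();
        row.put("CreditCardPayment", null);
        row.put("CashPayment", 17L);
        System.out.println(resolve("Payment", row));
    }
}
```

Everything needed to pick the class arrives in the one joined row, which is why this strategy needs neither a discriminator column nor a second query.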

An alternative approach, which TopLink uses, is to perform an initial query, check the value of a discriminator column, and then issue an extra query if the row represents a subclass instance. This isn't usually very efficient for shallow inheritance trees, but what we've seen is that some ex-TopLink users have created very deep or wide inheritance trees, in which case Hibernate's strategy can result in a single query with simply too many joins.

So, I added the outer-join attribute to <join/>. Its effect is slightly subtle. Consider the following mapping:

<class name="Foo" table="foos" discriminator-value="0">
    <id name="id">...</id>
    <discriminator column="type" type="int"/>
    <property name="name"/>
    <subclass name="Bar" discriminator-value="1">
        <join table="bars">
            <key column="fooId"/>
            <property name="amount"/>
        </join>
    </subclass>
</class>

When we execute an HQL query against the subclass Bar, Hibernate will generate SQL with an inner join between foos and bars. If we query against the superclass Foo, Hibernate will use an outer join.

(Note that you would not write the above mapping in practice; instead you would use <joined-subclass/> and eliminate the need for the discriminator.)

Suppose we set outer-join="false":

<class name="Foo" table="foos" discriminator-value="0">
    <id name="id">...</id>
    <discriminator column="type" type="int"/>
    <property name="name"/>
    <subclass name="Bar" discriminator-value="1">
        <join table="bars" outer-join="false">
            <key column="fooId"/>
            <property name="amount"/>
        </join>
    </subclass>
</class>

Now, when we query the subclass, the same SQL inner join will be used. But when we query the superclass, Hibernate won't use an outer join. Instead, it will issue an initial query against the foos table, and a sequential select against the bars table, whenever it finds a row with a discriminator value of 1.

Well, that's not such a great idea in this case. But imagine if Foo had a very large number of immediate subclasses. Then we might be avoiding a query with very many outer joins, in favor of several queries with no joins. Well, perhaps some people will find this useful....
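
To put rough numbers on that tradeoff, here is a toy count of the SQL statements each strategy issues when querying the superclass. It is only a sketch following the description above, with hypothetical names (ToySelectCounter and its methods); it counts statements and says nothing about how expensive each one is.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: counts statements as described in the text, nothing more.
class ToySelectCounter {

    // Default behaviour: one query that outer-joins the subclass table.
    static int withOuterJoin(List<Integer> discriminators) {
        return 1;
    }

    // outer-join="false": one initial query against the root table, plus one
    // sequential select for each row whose discriminator marks the joined subclass.
    static int withSequentialSelect(List<Integer> discriminators, int joinedDiscriminator) {
        int selects = 1;
        for (int value : discriminators) {
            if (value == joinedDiscriminator) {
                selects++;
            }
        }
        return selects;
    }

    public static void main(String[] args) {
        // Discriminator values of the rows returned from the foos table (made-up data).
        List<Integer> rows = Arrays.asList(0, 1, 0, 1, 1);
        System.out.println("outer join: " + withOuterJoin(rows));
        System.out.println("sequential select: " + withSequentialSelect(rows, 1));
    }
}
```

The statement count grows with the number of subclass rows, so this only pays off when the alternative is a single query with a great many outer joins.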

Hibernate3 is now ready for a public test, so go get it! It has all (well, almost all) the features we'll ever need for object/relational mapping, and if it doesn't have something, it's easy to subclass, extend, and implement.

We still have some things left on our TODO for the beta (no release date yet on the final), but it's getting better every day and we might have a very stable first beta. If you want to help, we are still looking for documentation translators.

Incidentally, the Hibernate project is now 1000 days old, if you believe the SourceForge stats. We actually had the Hibernate3 alpha finished for the anniversary, but then Gavin's laptop didn't agree with its owner anymore. At least it was an excuse to finish some website redesign.

P.S. The first copies of Hibernate in Action arrived! Mine was sent to an old address (that's the problem if you need years to finish something) and I'm going to hunt it down now. I already received a Thank You! email from the finder...
