I'm the creator of Hibernate, a popular object/relational persistence solution for Java, and Seam, an application framework for enterprise Java. I'm also contributing to the Java Community Process standards as Red Hat representative for the EJB, JPA, JSF specifications and spec lead of the Web Beans specification. At Red Hat, I'm leading the effort to build a Unified development platform of programming model, frameworks and tooling.
Developerworks is featuring the best article I have ever read on the subject of Java performance. The authors dispose of the canard that temporary object creation is expensive in Java, by explaining how generational garbage collection works in the Sun JVM (this is a bit more detailed explanation than the typical one, by the way). Well, I already knew this; Hibernate rejected the notion of object pooling right from the start (unfortunately, the EJB spec has not yet caught up).
What I did /not/ know was that objects which implement finalize() require two full garbage collection cycles to be released. Now, everyone knows that finalize() cannot be relied upon and we should not write important code in a finalizer. But /this/ finalize() method, taken from Hibernate's SessionImpl class, seemed like a really good idea:
    /**
     * Just in case user forgot to call close()
     */
    protected void finalize() throws Throwable {
        log.debug("running Session.finalize()");
        if (isCurrentTransaction) log.warn("afterTransactionCompletion() was never called");
        if (connection!=null) { //ie it was never disconnected
            if ( connection.isClosed() ) {
                log.warn("finalizing unclosed session with closed connection");
            }
            else {
                log.warn("unclosed connection");
                if (autoClose) connection.close();
            }
        }
    }
The main thing that this method does is check whether the naughty application forgot to close the session and, if so, log a WARN. This is a really good idea! It is otherwise quite hard to notice unclosed sessions, and the JDBC connections they own. Unfortunately it has the terrible side-effect of preventing the session from being garbage collected immediately. Now, even after reading the article, I didn't think that this would be such a big deal, since I dereference almost all of the session's state from close(). However, my performance tests are showing a really /big/ difference in performance, just from removing the finalizer. For one problematic test, I actually /halved/ the overhead of Hibernate!
I can barely believe this result, but I've been successfully reproducing it for the last two hours.
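One possible way to keep the warning without paying for a finalizer is sketched below. This is only an illustration and not what Hibernate does: SessionLeakTracker and TrackedSession are hypothetical names. The tracker remembers where each session was opened and forgets it on close(), so unclosed sessions can be reported on demand (for example at shutdown or at the end of a test run) while the session object itself stays free of finalize().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of Hibernate: records where each session
// was opened so that unclosed ones can be reported without a finalizer.
final class SessionLeakTracker {
    private final AtomicLong nextId = new AtomicLong();

    // Maps a session id to the stack trace captured when it was opened.
    // Only the id and the trace are retained, never the session itself,
    // so the tracker does not keep session state reachable.
    private final Map<Long, Throwable> open = new ConcurrentHashMap<>();

    long opened() {
        long id = nextId.incrementAndGet();
        open.put(id, new Throwable("session " + id + " opened here"));
        return id;
    }

    void closed(long id) {
        open.remove(id);
    }

    // Snapshot of the allocation sites of all sessions not yet closed.
    List<Throwable> unclosed() {
        return new ArrayList<>(open.values());
    }
}

// Hypothetical stand-in for a session. Note that it has no finalize().
final class TrackedSession {
    private final SessionLeakTracker tracker;
    private final long id;
    private boolean closed;

    TrackedSession(SessionLeakTracker tracker) {
        this.tracker = tracker;
        this.id = tracker.opened();
    }

    void close() {
        if (!closed) {          // closing twice is harmless
            closed = true;
            tracker.closed(id);
        }
    }
}
```

The trade-off is that nothing is logged automatically when a leaked session becomes garbage; something has to ask the tracker for unclosed() and log the result.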
I have to repeat this cliche to myself at least once a week:
/Never wrestle with a pig; you both get dirty and the pig loves it./
One of the problems with online forums is that, naturally, they are dominated by the people with the most time on their hands - and by the people with the most dogmatic views. As in any community, the loudest views are often the least-informed. When criticized in a forum like TSS, it's usually better to just stay out of the mud. As difficult as it is to let uninformed statements go unchallenged, it is almost always the best decision. Let the pig be. Disputing a post brings attention to it. If the poster is of a particular personality type, the disputation will very quickly turn personal. Maintaining your dignity once that happens is virtually impossible.
In fact, what most amazes me about IT communities is the sheer ubiquity of /argumentum ad hominem/. I've always associated computing with the pursuit of understanding via scientifically inclined methodology. Yet most of the debate that occurs in the Java community consists of name-calling. I got so mad about this today that I broke all my own rules and launched some /ad hominem/ of my own, which is really quite self-defeating, I suppose.
The big problem from my point of view is that I can't simply ignore the online forums; for an open source project like ours, they are an absolutely indispensable way to get our ideas heard.
Clay Shirky has written insightfully about how online communities can be designed, so it is interesting to speculate about what kind of adjustments could be made to a community like TSS if we wanted to bring out our good sides, and encourage technical arguments rather than personal ones. But perhaps the very strength of TSS is the freewheeling nature of the debate there. Flame wars get attention; they generate the most traffic.
Well, I'm a big boy. Hibernate has been subject to all kinds of outlandish criticisms right from the start. But we are growing every month. We often joke that criticisms of Hibernate invariably begin with "I've never used Hibernate but..." and indeed that is still true. If our actual /users/ start bitching, /then/ we will need to start listening harder!
Apologies for the nontechnical post ;)
I just released 2.1.2. This is a maintenance release, meaning no especially exciting new features (the interesting work is all going on in the 2.2 branch). However, there are some small changes that might make a big performance difference in certain specific cases, especially if you are using a second-level cache. I'm hoping that this release brings the 2.1 branch to the same level of maturity that we were able to achieve with 2.0.3.
I just finished a consulting job at a large retailer where we managed to increase the performance of a Hibernate application by perhaps two orders of magnitude with just some fairly simple changes. It really drove home to me how almost all performance problems I've ever seen can be solved by either or both of:
- appropriate session handling
- appropriate association fetching strategies
(Note that I have not yet met a serious performance problem in Hibernate 2.x that I could not solve quite quickly.)
Hibernate's Session object is a powerful abstraction that allows some extremely flexible architectural choices. Unfortunately, this flexibility comes at a cost: many people seem to stuff it up! There are three well-understood patterns for managing Hibernate sessions correctly (actually, three-and-a-half, as I've recently discovered) and three common antipatterns. The fact that the antipatterns are more common than they should be suggests a real problem with our existing documentation, and highlights the fact that /we need to get this book out!/
The main reason we've previously been unable to explain the correct ways to handle Hibernate sessions is that we simply haven't had a decent language for describing our ideas. Since we've developed this language in the process of writing the book, explanations are much easier. The key concept is the notion of an /application transaction/. An application transaction is a unit of work from the point of view of the user; it spans multiple requests and multiple database transactions - it does, however, have a well-defined beginning and end. Even if you don't currently use this notion explicitly, you probably /do/ use it implicitly in your application.
Briefly, the three acceptable approaches are: session-per-request, session-per-request-with-detached-objects and session-per-application-transaction. A variation of the third approach is the newly-discovered session-per-application-transaction-with-flush-delayed-to-the-last-request (phew!). The three broken approaches are: session-per-operation (ie. many-sessions-per-request), session-per-user-session and session-per-application. If you are using any of these approaches, please stop.
The three acceptable approaches each have different performance and architectural implications and there is no "best" solution. It is incredibly important to choose the approach that is most suitable to your particular application (this was the key to the two-orders-of-magnitude improvement described above).
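To make the simplest of these patterns concrete, here is a minimal sketch of session-per-request. It assumes hypothetical stand-in interfaces (DemoSession, DemoSessionFactory) rather than the real Hibernate API, and a hypothetical RequestScope helper: one session is opened when the request starts, every operation in the request shares it, and it is flushed and closed exactly once at the end.

```java
import java.util.function.Function;

// Hypothetical stand-ins for a persistence session and its factory. These
// are NOT the Hibernate interfaces, only enough surface for the sketch.
interface DemoSession {
    void flush();
    void close();
}

interface DemoSessionFactory {
    DemoSession openSession();
}

// Session-per-request: exactly one session per request, shared by all of
// the work done while handling that request.
final class RequestScope {
    private final DemoSessionFactory factory;

    RequestScope(DemoSessionFactory factory) {
        this.factory = factory;
    }

    <T> T handle(Function<DemoSession, T> request) {
        DemoSession session = factory.openSession();
        try {
            T result = request.apply(session);
            session.flush();    // write pending changes once, at the end
            return result;
        } finally {
            session.close();    // always release the session, even on failure
        }
    }
}
```

The session-per-operation antipattern would instead call openSession() inside each data access method, so a single request ends up opening many sessions. In a web application a helper like this would typically be invoked from a servlet filter or similar interceptor.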
Association fetching is, I think, covered quite well in our documentation, but we still sometimes see people struggling with the dreaded n+1 SELECTs problem. So let me be very clear: Hibernate /completely/ solves the n+1 SELECTs problem! However, it takes some thought and a (very) little work on the part of the user to take full advantage of this fact. We recommend that all associations be configured for lazy fetching by default. Then, for particular use cases, eager outer join fetching may be chosen by specifying a LEFT JOIN FETCH clause in a HQL query, or by calling setFetchMode() for a Criteria query. (If you are too lazy to do this work, you could even try the new batch fetching features of Hibernate 2.1. We don't recommend this less elegant approach, however.)
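The difference between the two access patterns can be illustrated with a toy simulation. Nothing below is Hibernate code: CountingDb and FetchDemo are hypothetical, the "database" is an in-memory map that merely counts the SELECTs it would have issued, and the orders/items schema is invented for the example. In HQL the second pattern would correspond to a query along the lines of "from Order o left join fetch o.items", again with hypothetical entity names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// A toy in-memory "database" that counts the SELECTs issued against it.
// Everything here is hypothetical and only simulates the query pattern.
final class CountingDb {
    private final Map<Integer, List<String>> itemsByOrder = new LinkedHashMap<>();
    int selects = 0;

    CountingDb(int orders, int itemsPerOrder) {
        for (int o = 1; o <= orders; o++) {
            List<String> items = new ArrayList<>();
            for (int i = 1; i <= itemsPerOrder; i++) {
                items.add("item-" + o + "-" + i);
            }
            itemsByOrder.put(o, items);
        }
    }

    // Simulates: SELECT id FROM orders
    List<Integer> selectOrderIds() {
        selects++;
        return new ArrayList<>(itemsByOrder.keySet());
    }

    // Simulates: SELECT ... FROM items WHERE order_id = ?
    List<String> selectItems(int orderId) {
        selects++;
        return itemsByOrder.get(orderId);
    }

    // Simulates: SELECT ... FROM orders o LEFT OUTER JOIN items i ON i.order_id = o.id
    Map<Integer, List<String>> selectOrdersJoinItems() {
        selects++;
        return new LinkedHashMap<>(itemsByOrder);
    }
}

final class FetchDemo {
    // Lazy association touched in a loop: 1 SELECT for the orders plus
    // one more SELECT per order for its items.
    static int lazyLoop(CountingDb db) {
        int total = 0;
        for (int id : db.selectOrderIds()) {
            total += db.selectItems(id).size();
        }
        return total;
    }

    // Outer join fetch: orders and items arrive together in a single SELECT.
    static int joinFetch(CountingDb db) {
        int total = 0;
        for (List<String> items : db.selectOrdersJoinItems().values()) {
            total += items.size();
        }
        return total;
    }
}
```

With 10 orders, lazyLoop leaves the counter at 11 SELECTs while joinFetch leaves it at 1, and both see the same items.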
Less common performance problems may be fixed by using a second-level cache, or occasionally by managing flushing manually (set the session flush mode to FlushMode.NEVER and flush manually when required). However, in almost all cases, acceptable performance can be achieved by concentrating upon the two items listed above.
One of the reasons we use relational database technology is that existing RDBMS implementations provide extremely mature, scalable and robust concurrency control. This means much more than simple read/write locks. For example, databases that use locking are built to scale efficiently when a particular transaction obtains /many/ locks - this is called /lock escalation/. On the other hand, some databases (for example, Oracle and PostgreSQL) don't use locks at all - instead, they use the multiversion concurrency model. This sophisticated approach to concurrency is designed to achieve higher scalability than is possible using traditional locking models. Databases even let you specify the required level of transaction isolation, allowing you to trade isolation for scalability.
Unfortunately, some Java persistence frameworks (especially CMP engines) assume that they can improve upon the many years of research and development that has gone into these relational systems by implementing their own concurrency control in the Java application. Usually, this takes the form of a comparatively crude locking model, with the locks held in the Java middle tier. There are three main problems with this approach. First, it subverts the concurrency model of the underlying database. If you have spent a lot of money on your Oracle installation, it seems insane to throw away Oracle's sophisticated multiversion concurrency model and replace it with a (less-scalable) locking model. Second, other (non-Java?) applications that share the same database are not aware of the locks. Finally, locks held in the middle tier do not naturally scale to a clustered environment. Some kind of distributed lock will be needed. At best, distributed locking will be implemented using some efficient group communication library like JGroups. At worst (for example, in OJB), the persistence framework will persist the locks to a special database table. Clearly, both of these solutions carry a heavy performance cost. Accordingly, Hibernate was designed to /not require/ any middle-tier locks - even thread synchronization is avoided. This is perhaps the best and least-understood feature of Hibernate and is the key to why Hibernate scales well. So why do other frameworks not just let the database handle concurrency?
Well, the only good justification for holding locks in the middle tier is that we might be using a middle-tier cache. It turns out that the problem of ensuring consistency between the database and the cache is an extremely difficult one and solutions usually do involve some use of middle-tier locking. (Incidentally, most applications which use a cache do not solve this problem correctly, even in a non-clustered environment.)
So, for example, when Hibernate integrates with JBoss Cache, the cache implementation must obtain clustered locks internally (again, using JGroups). In Hibernate, we consider it a quality-of-service concern of the cache implementation to provide this kind of functionality. We can do this because Hibernate, unlike many other persistence layers, features a two-level cache architecture. This design separates the transaction-scoped /session cache/ (which does /not/ require middle-tier locking and delegates concurrency concerns to the database) from the process or cluster scoped /second-level cache/ (which /may/ require middle-tier locks). So when the second-level cache is disabled for a particular class, no middle-tier lock is required. Hence, in this case, the scalability of Hibernate is limited only by the scalability of the underlying database. Our design also allows us to consider other, more sophisticated approaches to ensuring consistency between the second-level cache and database - approaches that do not require the use of middle-tier locking. I'll keep this stuff secret for now; it is an active area of investigation!
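The separation of the two cache levels can be sketched roughly as follows. This is a deliberately simplified illustration with hypothetical classes (SharedCache, CachingSession), not Hibernate's implementation: it shows only the lookup order and which layer is shared, and it ignores the hard part, which is keeping the shared cache consistent with the database when data is updated.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical process-scoped cache shared by all sessions. Because it is
// shared, it is the only layer that has to deal with concurrent access.
final class SharedCache {
    private final Map<String, String> entries = new ConcurrentHashMap<>();

    String get(String key) {
        return entries.get(key);
    }

    void put(String key, String value) {
        entries.put(key, value);
    }
}

// Hypothetical session with its own first-level cache. A session is used by
// one unit of work at a time, so a plain HashMap with no locking is enough.
final class CachingSession {
    private final Map<String, String> sessionCache = new HashMap<>();
    private final SharedCache secondLevel;            // null when disabled
    private final Function<String, String> database;  // stand-in for a SELECT

    CachingSession(SharedCache secondLevel, Function<String, String> database) {
        this.secondLevel = secondLevel;
        this.database = database;
    }

    String load(String key) {
        // 1. Session cache: private to this session, no locks involved.
        String value = sessionCache.get(key);
        // 2. Second-level cache, if enabled: shared across sessions.
        if (value == null && secondLevel != null) {
            value = secondLevel.get(key);
        }
        // 3. Fall back to the database and populate the shared cache.
        if (value == null) {
            value = database.apply(key);
            if (value != null && secondLevel != null) {
                secondLevel.put(key, value);
            }
        }
        if (value != null) {
            sessionCache.put(key, value);
        }
        return value;
    }
}
```

When a session is created with no second-level cache, every lookup either hits the private session cache or goes straight to the database, so no shared middle-tier state is touched at all.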