I'm working in the Hibernate and Infinispan teams at JBoss, caring about Lucene integration in products we support, striving to make it easier to use and to integrate in well known APIs and patterns, and finally to make it scale better; I love clean and well performing code.

I was an early adopter of cloud deployments, scaling Lucene to a huge number of requests on EC2 using Hibernate Search, and after that I worked with Sourcesense to make JIRA clusterable via Infinispan. I have also been a trainer for Seam and Hibernate courses.

Location: Newcastle, UK
Occupation: Doing stuff at JBoss, a Division of Red Hat Inc

While the team is busy on significant internal refactoring, we also accumulated 30 minor fixes and improvements which have been merged in the master branch for Hibernate Search 5.

As it has been a while since the last published tag, today we're publishing this 5.0.0.Alpha5 release to make all these minor improvements available right away, while the cooler features will need some more work.

New attribute: includeEmbeddedObjectId

This new attribute of the IndexedEmbedded annotation lets you control whether the ids of related objects embedded in the parent's Lucene Document are stored. When you don't need these identifiers, leaving them out saves space in your index and so improves performance.

Other changes

  • As documented in the Migration guide, the API to implement a JMS master node was simplified (but changed!).
  • A possible loss of index update events was fixed; this was not a critical issue, as it could happen only under significant load and when not using transactions.
  • Better interaction with second level caching.
  • Performance fixes: when the DocumentId doesn't match the JPA id we now still execute an optimal database query.

Full details available in the Release notes.

Today we released yet another milestone for the Hibernate Search 5 train. We worked in parallel on multiple fronts; the most notable changes are:

OSGi support, API changes

All our modules now provide OSGi metadata to make life easier for users running in OSGi containers. We also included an example features file and integration tests using Karaf.

Keep in mind that Apache Lucene is not providing this same metadata, so it might be worth looking into the features file we use to learn how the Lucene modules need to be wrapped.

Class Relocations

A consequence of being OSGi compliant is that we had to move some packages of well-known APIs; please see the migration guide for all details.

Apache Lucene 4.8.1, Java7 now required

Apache Lucene requires Java7 since version 4.8. We don't want you to miss out on the great improvements it provides, or on potential bugfixes in the near future, so we now require Java7 too.

Apache Lucene 4.8.1 was released today, so we could include it in this release too.

Bridge Providers loaded by auto-discovery

We always had a strong differentiation between the FieldBridge(s) included in Hibernate Search and custom (application provided) FieldBridges. From this release the discovery of built-in bridges uses the Service Loader pattern, so that we can move some bridge implementations to optional modules. Eventually this will also let us support the new date/time types defined in Java8 and those of Joda Time, and potentially your own custom types, but this will need some further refinement work.
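To illustrate the Service Loader pattern mentioned above, here is a minimal sketch using the JDK's java.util.ServiceLoader. The BridgeProvider interface, the BuiltInStringBridgeProvider class and the bridge names are hypothetical stand-ins invented for this example; they are not the actual Hibernate Search SPI.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class BridgeDiscoverySketch {

    /** Hypothetical service contract; NOT the actual Hibernate Search SPI. */
    public interface BridgeProvider {
        /** Returns a bridge name for the given type, or null if the type is not supported. */
        String bridgeFor(Class<?> type);
    }

    /** Hypothetical stand-in for a bridge that ships with the core module. */
    static final class BuiltInStringBridgeProvider implements BridgeProvider {
        @Override
        public String bridgeFor(Class<?> type) {
            return type == String.class ? "StringBridge" : null;
        }
    }

    /** Collects the built-in provider plus any provider discovered on the classpath. */
    public static List<BridgeProvider> loadProviders() {
        List<BridgeProvider> providers = new ArrayList<>();
        providers.add(new BuiltInStringBridgeProvider());
        // ServiceLoader instantiates every implementation listed in a
        // META-INF/services/<binary name of BridgeProvider> file on the classpath.
        for (BridgeProvider discovered : ServiceLoader.load(BridgeProvider.class)) {
            providers.add(discovered);
        }
        return providers;
    }

    /** Asks each provider in turn; the first non-null answer wins. */
    public static String resolve(Class<?> type) {
        for (BridgeProvider provider : loadProviders()) {
            String bridge = provider.bridgeFor(type);
            if (bridge != null) {
                return bridge;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(resolve(String.class));
        System.out.println(resolve(Integer.class));
    }
}
```

With this approach an optional module only needs to ship its own implementation and the matching META-INF/services entry; the code doing the lookup does not change.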

Several other improvements

These won't make the headlines the way a Java requirements change does, but we still have some more relevant news:

  • Infinispan upgraded to 7.0.0.Alpha4: now also requires Java7 and supports the distributed Lucene Directory for Apache Lucene 4.8
  • the required Infinispan update implies using the latest JGroups 3.5.0.Beta5
  • all our documentation was migrated to AsciiDoc; it's now much easier to contribute to documentation!
<dependency>
 <groupId>org.hibernate</groupId>
 <artifactId>hibernate-search-orm</artifactId>
 <version>5.0.0.Alpha4</version>
</dependency>

Last night I uploaded two bugfix releases of Hibernate Search stable branches:

  • 4.4.3.Final (Hibernate ORM 4.2 and JPA 2.0 users, JBoss 7.2 and EAP6)
  • 4.5.1.Final (Hibernate ORM 4.3 and JPA 2.1 users, WildFly 8)

They both contain several backported fixes, thanks to the excellent testing efforts of Guillaume Smet and Yoann Rodiere, who found very sophisticated issues and also helped with patches. I have now added Yoann as a committer too, congratulations!

Details of fixes can be found in the 4.4.3.Final changelog.txt and 4.5.1.Final changelog.txt.

Happy searching!

Version 5.0.0.Alpha3 is now available: now integrating with Apache Lucene 4.7.1, which was released just 24 hours before.

<dependency>
 <groupId>org.hibernate</groupId>
 <artifactId>hibernate-search-orm</artifactId>
 <version>5.0.0.Alpha3</version>
</dependency>

More Like This

Introduced and described in more detail in our previous post and in the Query DSL chapter, the new feature now also works with compressed fields and @IndexedEmbedded fields.

OSGi and ClassLoaders

On our path to 5.0 we're aiming for a full internal refactoring of ClassLoader handling, Service loading strategies, etc., with the goal of being reliable in complex modular deployments, including OSGi. To reach full OSGi compatibility, some public API packages will need to change in the next version too!

Many smaller details

There is also a list of smaller polishing tasks, like more reliable JGroups and Infinispan tests, a diet program for dependencies, and updates to the latest Hibernate ORM, JGroups and Infinispan versions.

Performance tuning

The primary performance bottleneck I've observed in the new Lucene 4 backend is the need to tune the max_thread_states option on Lucene's IndexWriter. This option controls the level of parallelism you want to allow to the IndexWriter. The default is a very reasonable 8, but this is now configurable using the typical format as expressed in the Lucene Tuning chapter:

hibernate.search.[default|<indexname>].indexwriter.max_thread_states
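As a minimal sketch of that format, assuming a properties-style configuration: the index name coffee_index and the value 16 are illustrative assumptions, not recommendations.

```properties
# Applies to every index unless overridden (8 is the default mentioned above)
hibernate.search.default.indexwriter.max_thread_states = 8

# Override for one index only; "coffee_index" is a hypothetical index name
hibernate.search.coffee_index.indexwriter.max_thread_states = 16
```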

What's next?

We're currently busy with OSGi tests, an easy way to extend the set of FieldBridges supported by the engine, improved handling of dynamic types and the overall structure of how you define your indexed model. It's also worth noting that all of this will be integrated in the Infinispan Query engine soon. You can find a high-level overview on our Roadmap page.

The release 5.0.0.Alpha2 is now available on our shiny new website: as the alpha1 release also did, it integrates with Apache Lucene 4.6.1, but now we do it better ;-)

<dependency>
 <groupId>org.hibernate</groupId>
 <artifactId>hibernate-search-orm</artifactId>
 <version>5.0.0.Alpha2</version>
</dependency>

More Like This queries

New features! A More Like This query is a special kind of query which takes a model document as an input, rather than a traditional string. It has been available for a long time via Lucene's MoreLikeThis Query implementation, but that implementation was rather tricky to use on our richer entity-based model. Hibernate Search now provides direct support for this functionality via our Query builder DSL, and in its simplest form looks like this:

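// The entity instance to use as the model for the comparison
Coffee exampleCoffee = ...

// Obtain a query builder bound to the Coffee entity
QueryBuilder qb = fullTextSession.getSearchFactory()
        .buildQueryBuilder()
        .forEntity( Coffee.class )
        .get();

// Build a Lucene query matching documents similar to the example
Query mltQuery = qb
        .moreLikeThis()
            .comparingAllFields()
            .toEntity( exampleCoffee )
            .createQuery();

// Run it as a full-text query returning Coffee entities
List results = fullTextSession
        .createFullTextQuery( mltQuery, Coffee.class )
        .list();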

What does it do? It returns a list of Coffee instances which are similar to the exampleCoffee instance. The definition of similar is as usual controlled by the analyzers and indexing options you choose. By default the list is of course ordered according to the scoring model, so the top match would be the example entity itself (this might be surprising but is often useful in practice).

A more extensive blogpost about this will follow, but if you can't wait to learn more see all details in the Building queries chapter.

Faceting improvements

Faceting on embedded collections, one of the highest-voted improvement requests on JIRA, is now finally possible. Hardy also started exploring possible performance improvements, and how to use the new Lucene 4 features: feedback, use cases or patches would be very welcome as we're eager to improve faceting more.

Watch the migration guide

If you're updating an application from previous versions of Hibernate Search, we highly recommend keeping an eye on the Migration Guide, as the changes in the Lucene API are significant and not always self-documenting. Suggestions for the migration guide are also very welcome.

The Apache Lucene Migration Guide might also be useful, but we applied most of it already to the internal engine for you to use transparently.

The hibernate-search-analyzers module is removed

This module was created years ago when we had to fork some Lucene code to allow an easy migration path, but it has long since been an empty module that just depends on various commonly used analyzers. It's time for a spring cleaning of dependencies, so the no longer needed module is removed: if you were using it, just remove it from your project and include a direct dependency on the analyzers you need from the Apache Lucene ecosystem.

What's next?

You can find a high-level overview on our Roadmap page, or check the fine-grained breakdown on this JIRA filter. Essentially we're now aiming at OSGi compatibility and at usability improvements which had to be postponed to a major release.
