Hibernate Search is a library that integrates Hibernate ORM with Apache Lucene or Elasticsearch by automatically indexing entities, enabling advanced search functionality: full-text, geospatial, aggregations and more. For more information, see Hibernate Search on hibernate.org.

Past night I've uploaded the Hibernate Search release 4.4.0.CR1 (Candidate Release for 4.4) to Sourceforce and the Maven repositories.

For Maven users, a dependency reminder:

<dependency>
 <groupId>org.hibernate</groupId>
 <artifactId>hibernate-search</artifactId>
 <version>4.4.0.CR1</version>
</dependency>

This release mostly addressed very minor issues, and 4.4.0.Final is expected next week with probably no more code changes, just some improvements in documentation. For a detailed list of all changes in this release, see the release notes.

Some additions to the new Dynamic Sharding support

The new Dynamic Sharding feature announced in the previous post got some more improvements, in particular we now provide a base class to be extended to take care of the most common house keeping. You can see the effect in the example implementation included in the documentation: less boilerplate code for you. You're of course free to ignore this base class and craft your own implementation, as long as it implements org.hibernate.search.store.ShardIdentifierProvider.

We need your Sharding story!

We had lots of brainstorming on both IRC and mailing lists about adding an additional method to the interface. This method would allow Strategy implementors to narrow down to which shard a delete operation needs to be sent to: currently delete operations are sent to all shards, which is functionally correct but sub-optimal. The culprit of the discussion is if you're actually able to make such a decision in a real world scenario. I believe it's possible, so this method will certainly be added in the near future; sadly it wasn't included yet because the exact shape of it was getting rushed. If you have plans on how you're going to use Dynamic Sharding, it would be great to tell us about it so that we improve on this feature discussing on a more concrete use case.

Time-wise, the result is the org.hibernate.search.store.ShardIdentifierProvider was marked as experimental for this release cycle, but fear not the feature is great and is meant to stay.

Example: Time-split sharding using the new API, combined with advanced filtering

In the following test you'll see something very unusual for this blog: I'm not using the Hibernate API, nor the Hibernate Search public API you might be used to. This is how we run unit tests in the hibernate-search-engine module: in isolation from these other services.

The example scenario: a system which indexes log messages of some periodic event which is recorded every second. By design only one message is expected to be stored for each second. The idea is to shard on a hourly base, and the logs are set to rotate so that the ones older than 24 hours are deleted.

This use case benefits from an advanced org.hibernate.search.store.ShardIdentifierProvider so that, during a given hour, all writes happen on a specific shard latest. If we would add the additional method in Search to be able to control deletes too, we'd also constrain the deletes to happen on a specific shard oldest. This approach provides several benefits:

  • FulltextFilter instances on each of the 22 immutable indexes are fully cacheable
  • IndexReader instances on these same 22 indexes are never requiring a refresh
  • Time-Range queries can easily target the subset of indexes they need
  • Add operations happen on separate backends, which provides several other performance benefits, for example an NRT backend can keep writing without needing to flush. (A need for flush is normally triggered by a delete operation).

This test is intentionally not using the Hibernate ORM API as this use case of storing log messages is probably more suited for users of Infinispan (reminder: the same indexing technology is included in Infinispan). For this reason some operations like store/delete/query use the internal API: when using Hibernate Search with Hibernate ORM these method implementations would be much simpler, but conceptually you'd have to activate the same filtering options.

public class LogRotationExampleTest {

        //Make a SearchFactory using the test entity "LogMessage" and enabling the custom sharding strategy.
        @Rule
        public SearchFactoryHolder sfHolder = new SearchFactoryHolder( LogMessage.class )
                .withProperty( "hibernate.search.logs.sharding_strategy", LogMessageShardingStrategy.class.getName() );

        @Test
        public void filtersTest() {
                SearchFactory searchFactory = sfHolder.getSearchFactory();
                storeLog( makeTimestamp( 2013, 10, 7, 21, 33 ), "implementing method makeTimestamp" );
                storeLog( makeTimestamp( 2013, 10, 7, 21, 35 ), "implementing method storeLog" );
                storeLog( makeTimestamp( 2013, 10, 7, 15, 15 ), "Infinispan team meeting" );
                storeLog( makeTimestamp( 2013, 10, 7, 7, 30 ), "reading another bit from Mordechai Ben-Ari" );
                storeLog( makeTimestamp( 2013, 10, 7, 9, 00 ), "email nightmare begins" );
                storeLog( makeTimestamp( 2013, 10, 7, 9, 50 ), "sync-up with Davide" );
                storeLog( makeTimestamp( 2013, 10, 7, 10, 0 ), "first cofee. At Costa!" );
                storeLog( makeTimestamp( 2013, 10, 7, 10, 10 ), "sync-up with Gunnar and Hardy" );
                storeLog( makeTimestamp( 2013, 10, 7, 10, 20 ), "Checking JIRA state for Hibernate Search release plans" );
                storeLog( makeTimestamp( 2013, 10, 7, 10, 30 ), "Check my Infinispan pull requests from the weekend, cleanup git branches" );
                storeLog( makeTimestamp( 2013, 10, 7, 22, 00 ), "Implementing LogMessageShardingStrategy" );

                QueryBuilder logsQueryBuilder = searchFactory.buildQueryBuilder()
                                .forEntity( LogMessage.class )
                                .get();

                Query allLogs = logsQueryBuilder.all().createQuery();

                Assert.assertEquals( 11, queryAndFilter( allLogs, 0, 24 ) );
                Assert.assertEquals( 0, queryAndFilter( allLogs, 2, 5 ) );
                Assert.assertEquals( 1, queryAndFilter( allLogs, 2, 8 ) );
                Assert.assertEquals( 3, queryAndFilter( allLogs, 0, 10 ) );

                deleteLog( makeTimestamp( 2013, 10, 7, 9, 00 ) );
                Assert.assertEquals( 10, queryAndFilter( allLogs, 0, 24 ) );
        }

        private int queryAndFilter(Query luceneQuery, int fromHour, int toHour) {
                SearchFactoryImplementor searchFactory = sfHolder.getSearchFactory();

                //In this specific test se use the internal HSQuery API, while
                //you would normally use a FullTextSession:
                HSQuery hsQuery = searchFactory.createHSQuery()
                        .luceneQuery( luceneQuery )
                        .targetedEntities( Arrays.asList( new Class<?>[]{ LogMessage.class } ) );
                hsQuery
                        .enableFullTextFilter( "timeRange" )
                                .setParameter( "from", Integer.valueOf( fromHour ) )
                                .setParameter( "to", Integer.valueOf( toHour ) )
                        ;
                return hsQuery.queryResultSize();
        }

        private void storeLog(long timestamp, String message) {
                LogMessage log = new LogMessage();
                log.timestamp = timestamp;
                log.message = message;

                //You would normally just save the LogMessage through a Session / EntityManager
                //this emulates the same using the internal API:
                SearchFactoryImplementor searchFactory = sfHolder.getSearchFactory();
                Work work = new Work( log, log.timestamp, WorkType.ADD, false );
                ManualTransactionContext tc = new ManualTransactionContext();
                searchFactory.getWorker().performWork( work, tc );
                tc.end();
        }

        private void deleteLog(long timestamp) {
                //Again just emulating a delete operation with the internal API,
                //don't worry too much about these details:
                SearchFactoryImplementor searchFactory = sfHolder.getSearchFactory();
                Work work = new Work( LogMessage.class, log.timestamp, WorkType.DELETE );
                ManualTransactionContext tc = new ManualTransactionContext();
                searchFactory.getWorker().performWork( work, tc );
                tc.end();
        }

        /**
         * A ShardIdentifierProvider suitable for the rotating - logs design
         * as described in this test.
         * Sharding isn't actually dynamic as we know all hours in advance, but
         * both addition operations target a specific index, and a range
         * filter can make queries need to search only a subset of all indexes.
         */
        public static final class LogMessageShardingStrategy implements ShardIdentifierProvider {

                private Set<String> hoursOfDay;

                @Override
                public void initialize(Properties properties, BuildContext buildContext) {
                        Set<String> hours = new HashSet<String>( 24 );
                        for ( int hour = 0; hour < 24; hour++ ) {
                                hours.add( String.valueOf( hour ) );
                        }
                        hoursOfDay = Collections.unmodifiableSet( hours );
                }

                @Override
                public String getShardIdentifier(Class<?> entityType, Serializable id, String idAsString, Document document) {
                        return fromIdToHour( (Long) id );
                }

                @Override
                public Set<String> getShardIdentifiersForQuery(FullTextFilterImplementor[] fullTextFilters) {
                        for ( FullTextFilterImplementor ftf : fullTextFilters ) {
                                if ( "timeRange".equals( ftf.getName() ) ) {
                                        Integer from = (Integer) ftf.getParameter( "from" );
                                        Integer to = (Integer) ftf.getParameter( "to" );
                                        Set<String> hours = new HashSet<String>();
                                        for ( int hour = from; hour < to; hour++ ) {
                                                hours.add( String.valueOf( hour ) );
                                        }
                                        return Collections.unmodifiableSet( hours );
                                }
                        }
                        return hoursOfDay;
                }

                @Override
                public Set<String> getAllShardIdentifiers() {
                        return hoursOfDay;
                }

        }

        @Indexed( index = "logs" )
        //ShardSensitiveOnlyFilter is a special marker filter which just serves
        //as a transport to provide the filter parameters to ShardIdentifierProvider
        //See details on http://docs.jboss.org/hibernate/search/4.4/reference/en-US/html_single/#query-filter-shard
        @FullTextFilterDef( name = "timeRange", impl = ShardSensitiveOnlyFilter.class )
        public static final class LogMessage {

                private long timestamp;
                private String message;

                @DocumentId
                public long getId() { return timestamp; }

                public void setId(long id) { this.timestamp = id; }

                @Field
                public String getMessage() { return message; }

                public void setMessage(String message) { this.message = message; }
        }

        /**
         * @return a timestamp from the calendar-style encoding using GMT as timezone (precision to the minute)
         */
        private static long makeTimestamp(int year, int month, int date, int hourOfDay, int minute) {
                Calendar gmtCalendar = createGMTCalendar();
                gmtCalendar.set( year, month, date, hourOfDay, minute );
                gmtCalendar.set( Calendar.SECOND, 0 );
                gmtCalendar.set( Calendar.MILLISECOND, 0 );
                return gmtCalendar.getTimeInMillis();
        }

        /**
         * @return the hour of the day from a timetamp, in string format matching the index shard identifiers format
         */
        private static String fromIdToHour(long millis) {
                Calendar gmtCalendar = createGMTCalendar();
                gmtCalendar.setTimeInMillis( millis );
                return String.valueOf( gmtCalendar.get( Calendar.HOUR_OF_DAY ) );
        }

        /**
         * @return a new GMT Calendar
         */
        private static Calendar createGMTCalendar() {
                return Calendar.getInstance( TimeZone.getTimeZone( "GMT" ) );
        }

}

The full test source can be found in our testsuite. I've developed this example to make the point of how beneficial it would be to control the shard for delete operations as well; I think it's a good example but I'd love to receive your story on how you would use this.

Where to go next

For development suggestions and brainstorming on the latest features, join the developer's mailing list or write us on the forums.

The issue tracker is JIRA and all code is on GitHub: pull requests and feedback welcome.


Back to top