I joined JBoss in February '12. I'm currently working on two projects: CapeDwarf (JBoss' open-source implementation of the Google AppEngine API) and Weld (the reference implementation of CDI); occasionally I also contribute to Infinispan.

I actually took my first programming steps when I was 6 years old and have been a nerd ever since :). For the last 12+ years, I've mostly been programming in Java.

Location: Ljubljana, Slovenia
JBoss CapeDwarf is an implementation of the Google App Engine API, which allows applications written for the Google AppEngine to be deployed on JBoss Application Servers without modification. Behind the scenes, CapeDwarf builds on existing JBoss projects such as Infinispan, JGroups, PicketLink, HornetQ and others.

The Users API

In terms of the number of methods, the Users API is one of the smallest APIs in Google AppEngine. It essentially consists of three methods: getLoginUrl, getCurrentUser and getLogoutUrl (in the Java SDK's UserService, the two URL methods are named createLoginURL and createLogoutURL). With the Users API, the application developer does not need to implement a login system, because authentication is handled by AppEngine itself. Instead of directing the user to a custom login form, the application simply points the user to the URL returned by getLoginUrl. There the user can log in with their Google account, their Google Apps domain account or an external OpenID provider. After logging in, the user is redirected back to the application, which can then obtain the logged-in user’s email address by calling getCurrentUser.

In order to allow migration of existing GAE applications to and from CapeDwarf, CapeDwarf must also support using Google accounts and not introduce any custom method(s) of user authentication. Thanks to OpenID support in PicketLink Social this was pretty straightforward to achieve.

OpenID is an open standard that allows users to be authenticated by a trusted third party service. In CapeDwarf’s case, the third party service is the Google Accounts OpenID provider.

Using PicketLink Social to authenticate the user through Google Accounts is a very simple process.

Directing the user to the OpenID provider's authentication page

When the user requests the login URL returned by getLoginUrl, CapeDwarf’s AuthServlet creates an instance of PicketLink Social’s OpenIDManager, associates it with CapeDwarf’s own implementation of OpenIDProtocolAdapter and then instructs the OpenIDManager to authenticate the user:

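// Start from Google's OpenID endpoint
OpenIDManager manager = new OpenIDManager(new OpenIDRequest("https://www.google.com/accounts/o8/id"));
// CapeDwarf's adapter wraps the current request/response and the URL the provider should return to
CapedwarfOpenIDProtocolAdaptor adapter = new CapedwarfOpenIDProtocolAdaptor(request, response, getReturnUrl(request));
// Discover the provider and associate with it
OpenIDManager.OpenIDProviderList providers = manager.discoverProviders();
OpenIDManager.OpenIDProviderInformation providerInfo = manager.associate(adapter, providers);
// Sends the browser to the provider's authentication page (through the adapter)
manager.authenticate(adapter, providerInfo);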

Behind the scenes, the manager calls the OpenIDProtocolAdapter and instructs it to redirect the user to the OpenID provider’s URL. The redirect is achieved either through a standard HTTP redirect or by sending a self-submitting HTML form to the browser (this is necessary when the OpenID payload is larger than 2048 bytes).
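As a rough illustration of that choice (this is not PicketLink Social's actual code), the sketch below decides between a plain redirect URL and a self-submitting form based on the size of the encoded parameters. The class OpenIdRedirectSketch and its method names are hypothetical, and measuring the payload as the length of the URL-encoded query string is an assumption; only the 2048-byte limit comes from the text above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration; not part of PicketLink Social or CapeDwarf. */
final class OpenIdRedirectSketch {

    /** Payload size above which a form POST is used instead of a GET redirect. */
    static final int MAX_GET_PAYLOAD_BYTES = 2048;

    /** Encodes the parameters as a URL query string. */
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    /** True when the encoded payload is too large for a plain HTTP redirect. */
    static boolean needsFormPost(Map<String, String> params) {
        return toQueryString(params).getBytes(StandardCharsets.UTF_8).length > MAX_GET_PAYLOAD_BYTES;
    }

    /** Target URL for a standard HTTP redirect. */
    static String redirectUrl(String providerUrl, Map<String, String> params) {
        String separator = providerUrl.contains("?") ? "&" : "?";
        return providerUrl + separator + toQueryString(params);
    }

    /** HTML page whose only form posts itself to the provider as soon as it loads. */
    static String selfSubmittingForm(String providerUrl, Map<String, String> params) {
        StringBuilder html = new StringBuilder();
        html.append("<html><body onload=\"document.forms[0].submit()\">");
        html.append("<form method=\"post\" action=\"").append(escapeHtml(providerUrl)).append("\">");
        for (Map.Entry<String, String> e : params.entrySet()) {
            html.append("<input type=\"hidden\" name=\"").append(escapeHtml(e.getKey()))
                .append("\" value=\"").append(escapeHtml(e.getValue())).append("\"/>");
        }
        html.append("</form></body></html>");
        return html.toString();
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    private static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("\"", "&quot;")
                .replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

A servlet would call needsFormPost first and then either send a redirect to redirectUrl(...) or write the output of selfSubmittingForm(...) to the response body.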

The OpenID provider then displays the login form to the user (or authenticates the user in some other way).

Handling the user's return from the OpenID provider's authentication page

After the user is authenticated successfully, the provider redirects the browser back to the consumer, which in this case is CapeDwarf. This returning request is also handled by AuthServlet. The servlet checks whether the user is authenticated by calling verify() on the OpenIDManager. If the user has been authenticated, CapeDwarf can access the email address of the authenticated user, store it in the session and redirect the browser to the destination URL that the application supplied when it requested the login URL in the first step.

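// Check the provider's response using the parameters and full URL of the returning request
boolean authenticated = manager.verify(adapter, getStringToStringParameterMap(request), getFullRequestURL(request));
if (authenticated) {
    // Send the browser on to the destination the application originally asked for
    response.sendRedirect(request.getParameter(AuthServlet.DESTINATION_URL_PARAM));
}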

During the invocation of the verify() method, the manager calls back our OpenIDProtocolAdapter with two types of OpenIDLifecycleEvents: SESSION and SUCCESS. The SESSION events instruct the adapter to store certain data in the session, while the SUCCESS event signals that the authentication was successful.

The authentication process is now complete. From this point on, whenever the application calls the Users API’s getCurrentUser method, CapeDwarf returns the authenticated user’s email address. The email address simply identifies the user; it is up to the application to decide how to use this information.

How we handle application admins

On a final note, there is another API method that I haven’t mentioned yet: isUserAdmin. As is obvious from the method’s name, this method returns true if the logged-in user is an admin of the application. On Google's side, that is either the Google user that uploaded the application to Google’s AppSpot or another user that the uploader manually added as an admin through the Google Cloud Console.

Since there is no user account/email associated with deploying an AppEngine application to CapeDwarf, there is no notion of an admin. To specify who the app’s admins are, you need to list their email addresses in capedwarf-web.xml like this:

<capedwarf-web-app>
    <admin>my.email@gmail.com</admin>
    <admin>another.admin.email@gmail.com</admin>
</capedwarf-web-app>
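To make the idea concrete, here is a minimal sketch of how an isUserAdmin check could be backed by such a file. It is not CapeDwarf's actual implementation: the class AdminListSketch and its methods are hypothetical, and comparing email addresses case-insensitively after trimming whitespace is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical helper for illustration; not CapeDwarf's real admin handling. */
final class AdminListSketch {

    private final Set<String> admins = new HashSet<>();

    private AdminListSketch() {
    }

    /** Collects the text of every <admin> element in the given XML. */
    static AdminListSketch parse(String capedwarfWebXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(capedwarfWebXml.getBytes(StandardCharsets.UTF_8)));
            AdminListSketch result = new AdminListSketch();
            NodeList nodes = doc.getElementsByTagName("admin");
            for (int i = 0; i < nodes.getLength(); i++) {
                result.admins.add(normalize(nodes.item(i).getTextContent()));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse admin list", e);
        }
    }

    /** True if the email address of the logged-in user is listed as an admin. */
    boolean isUserAdmin(String email) {
        return email != null && admins.contains(normalize(email));
    }

    // Assumption: addresses are compared case-insensitively, ignoring surrounding whitespace.
    private static String normalize(String email) {
        return email.trim().toLowerCase(Locale.ROOT);
    }
}
```

With the configuration shown above, AdminListSketch.parse(xml).isUserAdmin("my.email@gmail.com") would return true, while any address not listed would return false.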

Further info

For the complete source code of how CapeDwarf uses PicketLink Social, please turn to the Users module in CapeDwarf’s GitHub repo.

Inside CapeDwarf is a series of blog posts about the internals of the CapeDwarf project. CapeDwarf is an open-source implementation of Google AppEngine APIs on top of various JBoss technologies. You'll find more info on CapeDwarf on the project's page at http://www.jboss.org/capedwarf

AppEngine Datastore API

The single most important AppEngine API is probably the Datastore API, which (as evident from the name itself) provides an API for storing, retrieving and querying data. This was the first API we set out to implement in CapeDwarf, and it served as a proof of concept for the whole project.

Those familiar with JBoss technologies will know that JBoss already has a project that offers most of the things needed to implement the Datastore API: Infinispan. Infinispan is an extremely scalable, highly available key/value NoSQL datastore and distributed data grid platform. Since Infinispan already provided nearly everything we needed, all we had to do was implement an adapter between Infinispan and the Datastore API.

Hacking into Google's factories

No, of course I'm not talking about breaking into Google's real-world facilities. What I am talking about are all the XYServiceFactory classes in the AppEngine API. They represent the entry point into the API and have methods like getXYService(), which you use to obtain references to the various services the API has to offer. One of those factories is the DatastoreServiceFactory, which we needed to force into returning our own custom implementation of DatastoreService, so that whenever anyone calls DatastoreServiceFactory.getDatastoreService(), they get a reference to CapeDwarf's implementation of the service.

Since the factory itself is not configurable and always returns Google's own implementation of DatastoreService, we needed to resort to bytecode manipulation of the factory. Javassist made this pretty simple: we replaced the whole body of the getDatastoreService method so that it creates a new CapedwarfDatastoreService instance and returns it.

We used the same technique with all the other XYServiceFactory.getXYService() methods.

Making sure we're actually implementing the API correctly

Much of the coding was done TDD-style. We used JUnit and Arquillian, which lets you programmatically create micro-deployments, automatically deploy them to a running application server and run tests inside the deployment. Initially, we only ran the tests against CapeDwarf and JBoss AS 7.1, but later also added the option of running the same tests against Google's own development app-server and even the production system (appspot). This ability to run our tests against the real Google AppEngine proved to be extremely valuable, as it allowed us to validate our tests and to see whether our implementation of the GAE API was aligned with that of GAE itself.

Of course, it's hard to write perfect tests based only on API documentation, which usually doesn't go into every implementation detail of the API. This meant that when we initially ran the tests against the real GAE, quite a few of them failed, even though they had been passing against CapeDwarf. There were minor differences between Google's and CapeDwarf's implementation of the API, and through the tests we were able to pinpoint the differences and adjust CapeDwarf so that it behaves exactly like GAE. This would have been a lot harder if we had not had Arquillian at our disposal.

Storing and retrieving data

OK, let's finally move on to the actual implementation of the datastore. The most basic operations of the Datastore are storing and retrieving Entity objects by key. Implementing this with Infinispan was very straightforward, since Infinispan exposes a Cache interface, which extends java.util.Map. So, basically, implementing the datastore's get and put methods was as simple as invoking get and put on a Map. It really doesn't get any easier than this.

One caveat that revealed itself later is that, by default, Infinispan does not make a defensive copy when you store an object in the cache. Any modification made to the object after storing it is therefore also seen by clients retrieving the object from the cache (this only applies if the object hasn't been passivated and is being accessed on the same node in the cluster). We could have used Infinispan's storeAsBinary option, but we figured it was faster to simply clone the Entity before storing it and before returning it, since the clone() method was already implemented on Entity.
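The following sketch shows the clone-on-store, clone-on-read idea in isolation. A ConcurrentHashMap stands in for the Infinispan cache, and the nested Entity class is a simplified, hypothetical stand-in for the datastore Entity; none of this is CapeDwarf's actual code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustration only: a Map-backed store that never shares its stored instances. */
final class CopyOnStoreSketch {

    /** Simplified stand-in for the datastore Entity: a key plus dynamic properties. */
    static final class Entity implements Cloneable {
        final String key;
        private Map<String, Object> properties = new HashMap<>();

        Entity(String key) {
            this.key = key;
        }

        void setProperty(String name, Object value) {
            properties.put(name, value);
        }

        Object getProperty(String name) {
            return properties.get(name);
        }

        @Override
        public Entity clone() {
            try {
                Entity copy = (Entity) super.clone();
                // Copy the property map so the clone does not share it with the original.
                // Property values themselves are not copied (fine for immutable values).
                copy.properties = new HashMap<>(properties);
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    /** Stand-in for the Infinispan cache, which also behaves like a java.util.Map. */
    private final Map<String, Entity> cache = new ConcurrentHashMap<>();

    /** Stores a private copy, so later changes to the caller's object are not visible. */
    void put(Entity entity) {
        cache.put(entity.key, entity.clone());
    }

    /** Returns a copy, so changes made by the caller do not leak back into the cache. */
    Entity get(String key) {
        Entity stored = cache.get(key);
        return stored == null ? null : stored.clone();
    }
}
```

Without the two clone() calls, modifying an entity after put (or after get) would silently change what other callers read from the same map.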

Querying

With the basic operations implemented (storing, retrieving and deleting entities by their keys), we moved on to the hard(er) part - querying. Infinispan-Query and Hibernate-Search already offered the ability to index the properties of entities into a Lucene index and perform queries against this index and then retrieve the results from the Infinispan cache.

In order to make Infinispan index the entities, we needed to add a few annotations to the Entity class. We accomplished this through bytecode manipulation with Javassist as well. Since every datastore Entity can have a completely dynamic set of properties (the properties don't have to be specified up-front in any kind of schema), we needed to implement a Hibernate-Search Field Bridge that would map all the properties of the Entity to a Lucene document.
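To illustrate the general shape of this (one indexed field per dynamic property, then a lookup that goes from the index back to the cache), here is a deliberately simplified, in-memory sketch. Plain maps stand in for both the Infinispan cache and the Lucene index, only exact-match lookups are supported, and the class PropertyIndexSketch and its methods are hypothetical; the real implementation relies on Hibernate-Search and Lucene instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustration only: maps stand in for the Infinispan cache and the Lucene index. */
final class PropertyIndexSketch {

    /** Stand-in for the cache: entity key -> dynamic properties. */
    private final Map<String, Map<String, Object>> cache = new HashMap<>();

    /** Stand-in for the index: "property=value" term -> keys of matching entities. */
    private final Map<String, Set<String>> index = new HashMap<>();

    void put(String key, Map<String, Object> properties) {
        Map<String, Object> previous = cache.put(key, new HashMap<>(properties));
        if (previous != null) {
            // Drop index entries that belonged to the overwritten version of the entity.
            previous.forEach((name, value) -> {
                Set<String> keys = index.get(term(name, value));
                if (keys != null) {
                    keys.remove(key);
                }
            });
        }
        // No schema is needed: every property present becomes its own indexed field.
        properties.forEach((name, value) ->
                index.computeIfAbsent(term(name, value), t -> new TreeSet<>()).add(key));
    }

    /** Looks up matching keys in the index, then fetches the entities from the cache. */
    List<Map<String, Object>> findByProperty(String name, Object value) {
        List<Map<String, Object>> results = new ArrayList<>();
        Set<String> keys = index.get(term(name, value));
        if (keys != null) {
            for (String key : keys) {
                results.add(cache.get(key));
            }
        }
        return results;
    }

    private static String term(String name, Object value) {
        return name + "=" + value;
    }
}
```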

With the entities now stored in the cache as well as in the Lucene index, all that was left to do was implement a query converter that would convert Google AppEngine queries into Infinispan's cache-queries. Since Infinispan's CacheQuery is not much more than a wrapper around a Lucene query, the converter in effect converts GAE queries into Lucene queries. It does this through a DSL provided by Hibernate-Search.

AppEngine splits certain types of queries; CapeDwarf doesn't need to

One interesting thing about Google's implementation of Datastore queries is the fact that certain types of queries (IN, OR, NOT_EQUAL) are split up into multiple queries, with the results then merged into a single result set. We opted not to do this, since Infinispan-Query/Hibernate-Search/Lucene are quite capable of handling these in a single query.

However, the single-query approach has a problem with queries containing the IN operator. Since GAE performs multiple queries in this case, the order of the results depends on the order of the items in the IN list. Since this is clearly documented in the GAE documentation and certain applications may depend on it, we were forced to implement sorting of the results according to the same rules. While this is not really a problem when queries return whole entities, it is quite a pain when performing projection queries (these queries return only a subset of the entity's properties). When a client performs a projection query returning property foo and filtering (with an IN operator) on property bar, we have to add bar to the list of requested projection properties, just so we can sort the results according to the order of items in the IN clause. In hindsight, maybe we should have simply gone the same route as GAE and split these kinds of queries into multiple queries as well. It is possible we will do this in the future.
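The sorting step can be sketched as follows, using the foo/bar example from above. This is an assumption-laden illustration rather than CapeDwarf's code: results are modelled as plain property maps, and the class InOrderSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Illustration only: orders single-query results the way an IN list would. */
final class InOrderSketch {

    /**
     * Sorts results by the position of their filtered property's value in the IN list.
     * List.sort is stable, so results sharing a value keep their original relative order.
     */
    static List<Map<String, Object>> sortByInList(List<Map<String, Object>> results,
                                                  String property,
                                                  List<?> inValues) {
        // Copy into an ArrayList so that indexOf also tolerates null values.
        List<Object> order = new ArrayList<>(inValues);
        List<Map<String, Object>> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingInt(
                (Map<String, Object> row) -> order.indexOf(row.get(property))));
        return sorted;
    }

    /**
     * For projection queries: makes sure the IN-filtered property is fetched too,
     * because it is needed as the sort key even if the client did not request it.
     */
    static List<String> projectionsWithSortKey(List<String> requested, String inProperty) {
        List<String> projections = new ArrayList<>(requested);
        if (!projections.contains(inProperty)) {
            projections.add(inProperty);
        }
        return projections;
    }
}
```

For a projection on foo with a filter bar IN ("a", "b", "c"), the query would be run with the projections foo and bar, and the returned rows would then be passed through sortByInList with "bar" as the property.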

Wrap-up

This was a quick look at how we implemented the Datastore API in CapeDwarf. I haven't covered Datastore statistics, Callbacks and Metadata yet. These are fairly recent additions to the Datastore API and I'll go into how we implemented them in one of my future Inside CapeDwarf blog posts.

If any of you have an application running on Google AppEngine, we'd really appreciate it if you could give CapeDwarf a try and report to us any problems you encounter. For detailed instructions on how to run your app on CapeDwarf, see Aleš Justin's recent blog post about the first CapeDwarf release.