Using Hibernate Search with Spring Boot

Spring Boot is a framework that makes it much easier to develop Spring-based applications by following a convention-over-configuration principle (Spring critics, in contrast, claim that the framework’s principle is rather configuration over everything). In this article, I provide an example of how to achieve the following:

  • Create a simple Web application based on Spring Boot
  • Persist and access data with Hibernate
  • Make it searchable with Hibernate Search (Lucene)

I use Eclipse with a Gradle plugin for convenience. MySQL will be our back-end for storing the data. The full example can be obtained from my Github Repository.

Bootstrapping: Create a Simple Spring Boot Webapp

The easiest way to start with Spring Boot is to head over to start.spring.io and create a new project. In this example, I use Gradle for building the application and handling the dependencies, and I add the Web and JPA starters.

 

 

Download the archive to your local drive and extract it to a folder. I called the project SearchaRoo.

Import the Project with Eclipse

Import it as an existing Gradle Project in Eclipse by using the default settings. You will end up with a nice little project structure as shown below:

We have a central application starter class called SearchaRooApplication.java, package definitions, application properties and even test classes. The great thing about Spring Boot is that it is very simple to start and that you can debug it like any other local Java application. There is no need for remote debugging or complex application server setups.

Prepare the Database

We need a few permissions on our MySQL instance before we can start.

We can then add the connection details into the application.properties file. We will edit this file several times when the complexity of this project increases.

Now the basic database setup is done. We can then start adding model classes.

Getting some Employees on Board

MySQL offers a rather small but well documented sample database called employees, which is hosted on Github.  Obtain and import the data as follows:

The script creates a new schema called employees and you will end up with a schema like this:

In the course of this article, we are going to model this schema with Java POJOs by annotating the entities and the appropriate fields with JPA.

Dependencies

Before we can start modelling the entities in Java, have a look at the Gradle build file. We include additional dependencies for the MySQL connector and Apache commons.

Modelling Reality

The next step covers modelling the imported data with Java POJOs. Obviously this is not the most natural way, because in general you would create the model first and then add data to it, but as we already had the data, we decided to go in this direction. In the application.properties file, set the database to the imported employees database and set the Hibernate schema generation property (spring.jpa.hibernate.ddl-auto in Spring Boot) to validate. With this setting, we can confirm that we modelled the Java classes in accordance with the database model defined by the MySQL employees database.

An example of such a class is shown below; the other classes can be found in the Github repository.

Now that we have prepared the data model, our schema is fixed and does not change any more. We can deactivate the Hibernate-based dynamic generation of the database tables and use the Spring database initialization instead. To see if we modelled the data correctly, we take the MySQL employee data dump we obtained before and import it into our newly created schema, which maps the Java POJOs.

Importing the Initial Data

In the next step, we import the data from the MySQL employee database into our schema spring_hibernate. This schema contains the tables that Hibernate created for us. The following script copies the data between the two schemata. If you see an error, then there is an issue with your model.

We have now imported the data into the database schema that we defined for our project. Spring can load the schema and initial data during start-up, so we provide two files: one containing the schema and one containing the data. To do that, we create two dumps of the database, one with the schema only and one with the data only.

By deactivating the Hibernate data generation and activating the Spring way, the database gets initialized every time the application starts. Edit the following lines in the application.properties file:

Before we can import the data with the scripts, make sure to drop the schema and to disable foreign key checks at the beginning of the schema file and enable them again at the end. Spring ignores the actionable MySQL comments, so your schema file should contain this:

Also insert the two foreign key statements into the data file. Note that the import can take a while. If you are happy with the initialized data, you can deactivate the initialization by setting the variable to false: spring.datasource.initialize=false

The application.properties file meanwhile looks like this:

Adding Hibernate Search

Hibernate Search offers full-text search capabilities by using a dedicated index. We need to add the dependencies to the build file.

Refresh the Gradle project after including the search dependencies.

Adding Hibernate Search Dependencies

In this step, we annotate the model POJO classes and introduce the full-text search index. Hibernate Search needs just a few basic settings to get started. Add the following variables to the application properties file.

Please note that storing the Lucene index in the tmp directory is not the best idea, but for testing we can use this rather volatile location. We also use the filesystem to store the index, as this is the simplest approach.
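As a rough illustration of the settings meant here, the following sketch parses two properties and prints what they configure. It assumes the Hibernate Search 5.x property names directory_provider and indexBase, passed through Spring Boot's spring.jpa.properties prefix. The index path /tmp/searcharoo-index and the class name SearchIndexSettings are hypothetical and not taken from the original project.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class SearchIndexSettings {

    // Spring Boot forwards everything below spring.jpa.properties.* to Hibernate.
    static final String PREFIX = "spring.jpa.properties.hibernate.search.default.";

    /** Parses text in application.properties syntax. */
    static Properties parse(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    /** Summarises where the index is kept, or "(unset)" for missing keys. */
    static String describe(Properties properties) {
        String provider = properties.getProperty(PREFIX + "directory_provider", "(unset)");
        String indexBase = properties.getProperty(PREFIX + "indexBase", "(unset)");
        return provider + " index stored in " + indexBase;
    }

    public static void main(String[] args) {
        String applicationProperties = String.join("\n",
                "# Assumed Hibernate Search 5.x settings; the index path is hypothetical",
                PREFIX + "directory_provider = filesystem",
                PREFIX + "indexBase = /tmp/searcharoo-index");
        System.out.println(describe(parse(applicationProperties)));
    }
}
```

In a real project, the two key/value lines would simply go into application.properties; the Java wrapper is only there to make the example self-contained.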

Create a Service

In order to facilitate Hibernate Search on our data, we add a service class, which offers methods for searching. The service uses a configuration, which is injected by Spring during run time. The configuration is very simple.

The @Configuration is loaded when Spring builds the application context. It provides a bean of our service, which can then be injected into the application. The service itself provides methods for creating and searching the index. In this example, the search method is very simple: it only searches on the first and the last name of an employee and it allows users to make one mistake (distance 1).

The service implementation currently only contains an initialization method, which is used for creating the Lucene index on the filesystem. Before we can test the index, we need to have at least one indexed entity. This can be achieved by simply adding the annotation @Indexed to the POJO.

When we start the application now, we can see that Hibernate creates the index and a short check on disk shows that it worked:

So far, we have not told Hibernate Search which fields we want to add to the index and thus make full-text searchable. The following listing shows the fields annotated with @Field.

Starting the application again re-creates the index. Time for some basic searching.

Searching Full Text

Hibernate Search offers many features that native databases do not offer in a similar quality. One interesting feature is fuzzy search, which allows finding terms within an edit distance of up to two letters. The method for searching on two fields was already shown above. We can use this method in a small JUnit test:

The user made a small typo by entering Chrisu instead of Chris. As Chrisu is only one edit away from Chris, it lies within the allowed edit distance, so we receive a list of similar names and the test passes. Some possible results are shown below.
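To make the notion of edit distance concrete, here is a minimal plain-Java sketch of the Levenshtein distance and a naive fuzzy filter built on it. It is not how Hibernate Search or Lucene implement fuzzy queries (Lucene uses automata over the index and can also count a transposition as a single edit); the class name FuzzyNameMatcher and the sample names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class FuzzyNameMatcher {

    /** Classic Levenshtein distance, computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j; // building the first j characters of b from "" takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i; // reducing the first i characters of a to "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int substitution = previous[j - 1] + cost;
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    /** Returns all names whose case-insensitive distance to the query is at most maxDistance. */
    static List<String> fuzzyMatches(String query, List<String> names, int maxDistance) {
        List<String> hits = new ArrayList<>();
        for (String name : names) {
            if (editDistance(query.toLowerCase(), name.toLowerCase()) <= maxDistance) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> names = List.of("Chris", "Christ", "Georgi");
        // Chrisu -> Chris needs one deletion, Chrisu -> Christ one substitution.
        System.out.println(fuzzyMatches("Chrisu", names, 1));
    }
}
```

Scanning every name like this is only workable for tiny data sets, which is exactly why a dedicated index is used for the real search.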

Conclusions

Hibernate Search is a great tool and can be easily integrated into Spring Boot applications. In this post, I gave a minimalistic example of how full-text fuzzy search can be added to existing databases to allow a flexible and powerful search. A few more advanced thoughts on Hibernate Search are given in this blog post here. The Hibernate Search documentation contains a lot of useful and more elaborate examples. The full example can be obtained on Github.



Hibernate Search and Spring Boot: Building Bridges

Hibernate Search is a very convenient way of storing database content in a Lucene index and adding full-text search capabilities to data-driven projects simply by annotating classes. It can be easily integrated into Spring Boot applications, and as long as only the basic features are used, it works out of the box. The fun starts when the auto-configuration cannot find out how to configure things properly; then it gets tricky quite quickly. Of course this is natural behaviour, but one gets spoiled easily.

Using the latest Features: Hibernate ORM, Hibernate Search and Spring Boot

The current version of Spring Boot is 1.5.2. This version uses Hibernate ORM 5.0. The latest stable Hibernate Search versions are 5.6.1.Final and 5.7.0.Final, which in contrast require Hibernate ORM 5.1 and 5.2 respectively. You also need Java 8 now. For this reason, if you need the latest Hibernate Search features in combination with Spring Boot, you need to adapt the dependencies as follows:

Note that the Hibernate Entity Manager needs to be excluded, because it has been integrated into the core in the new Hibernate version. Details are given in the Spring Boot documentation.

Enforcing the Dependencies to be Loaded in the Correct Sequence 

As written earlier, Spring Boot takes care of a lot of configuration for us. Most of the time, this works perfectly and reduces the pain of configuring a new application manually. In some particular cases, Spring cannot figure out that there is a dependency between different services, which needs to be resolved in a specific order. A typical use case is the implementation of FieldBridges for Hibernate Search. FieldBridges translate between the actual object from the Java world and the representation of such an object in the Lucene index. Typically an EnumBridge is used for indexing Enums, which are often used for realizing internationalization (I18n).

When the Lucene index is created, Hibernate checks if Enum fields need to be indexed and if there is a bridge that converts between the object and the actual record in the index. The problem here is that Hibernate JPA is loaded at a very early stage in the Spring Boot startup process. The problem only arises if the bridge class uses @Autowired fields, which get injected. Typically, these fields would get injected when the AnnotationBeanConfigurerAspect bean is loaded. However, Hibernate creates the session with the session factory auto-configuration before the Spring configurer aspect bean has been loaded. So the FieldBridge used by Hibernate during the initialization of the index does not have the service injected yet, causing a nasty NullPointerException.

Example EnumBridge

The following EnumBridge example uses an injected service, which needs to be available before Hibernate starts. If this is not taken care of, it causes a NullPointerException.

Enforce Loading the Aspect Configurer Before the Session Factory

In order to enforce that the AnnotationBeanConfigurerAspect is created before the Hibernate Session Factory is created, we simply implement our own HibernateJpaAutoConfiguration by extension and add the AnnotationBeanConfigurerAspect to the constructor. Spring Boot now knows that it needs to instantiate the AnnotationBeanConfigurerAspect before it can instantiate the HibernateJpaAutoConfiguration and we then have wired Beans ready for the consumption of the bridge. I found the correct hint here and here.

As it turned out, using @DependsOn annotations did not work, and setting the precedence of the beans with @Order was not sufficient either. With this little hack, we can ensure the correct sequence of initialization.
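The ordering problem and the constructor trick can be illustrated without Spring or Hibernate. The following plain-Java analogy is only a sketch: all class names are hypothetical stand-ins (Configurer for the AnnotationBeanConfigurerAspect, SessionFactorySetup for the extended auto-configuration, Bridge for the FieldBridge), and passing null simulates the case where the configurer has not been created yet.

```java
final class InitOrderSketch {

    /** Stand-in for the service that the bridge needs. */
    static final class TranslationService {
        String translate(String key) {
            return "translated:" + key;
        }
    }

    /** Stand-in for the configurer aspect: wires objects the container did not create. */
    static final class Configurer {
        private final TranslationService service;

        Configurer(TranslationService service) {
            this.service = service;
        }

        void configure(Bridge bridge) {
            bridge.service = service;
        }
    }

    /** Stand-in for a bridge that is instantiated by the persistence layer. */
    static final class Bridge {
        TranslationService service; // stays null until a Configurer has wired it

        String objectToString(String value) {
            return service.translate(value); // NullPointerException if not wired
        }
    }

    /** Stand-in for the session factory set-up, which uses the bridge while indexing. */
    static final class SessionFactorySetup {
        final String indexed;

        // Requiring the configurer as a constructor argument forces it to exist first.
        SessionFactorySetup(Configurer configurer) {
            Bridge bridge = new Bridge();
            if (configurer != null) {
                configurer.configure(bridge);
            }
            indexed = bridge.objectToString("GENDER_F");
        }
    }

    public static void main(String[] args) {
        try {
            new SessionFactorySetup(null); // simulates the wrong start-up order
        } catch (NullPointerException e) {
            System.out.println("Bridge used before wiring: NullPointerException");
        }
        Configurer configurer = new Configurer(new TranslationService());
        System.out.println(new SessionFactorySetup(configurer).indexed);
    }
}
```

In the real application the container resolves constructor arguments before calling the constructor, so declaring the dependency there is enough to fix the order.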



Deploying MySQL in a Local Development Environment

Installing MySQL via apt-get is a simple task, but the migration between different MySQL versions requires planning and testing. Thus installing one central instance of the database system might not be suitable, when the version of MySQL or project specific settings should be switched quickly without interfering with other applications. Using one central instance can quickly become cumbersome. In this article, I will describe how any number of MySQL instances can be stored and executed from within a user’s home directory.

Adapting MySQL Data and Log File Locations

Some scenarios require running several MySQL instances at once; other scenarios cover sensitive data, where we do not want MySQL to write any data to non-encrypted partitions. This is especially true for devices that can easily get stolen, for instance laptops. If you use a laptop for developing your applications from time to time, chances are good that you need to store sensitive data in a database and need to make sure that the data is encrypted when at rest.

This can be solved with full disk encryption, but this technique has several disadvantages. First of all, full disk encryption only uses one password. This entails that several users who share a device also need to share one password, which reduces the reliability of this approach. Also, when the system needs to be rebooted, full disk encryption can become an obstacle, which increases the complexity further.

Much easier to use is the transparent home directory encryption, which can be selected out of the box during many modern Linux setup procedures. We will use this encryption type for this article, as it is reasonably secure and easy to set up. Our goal is to store all MySQL-related data in the home directory and run MySQL with normal user privileges.

Creating the Directory Structure

The first step is creating a directory structure for storing the data. In this example, the user name is stefan; please adapt it to your needs.

Create a Configuration File

Make sure to use absolute paths and the directories we created before. Store this file in MySQL-5.6-Local/MySQL-5.6-Conf/my-5.6.cnf. The configuration is pretty self-explanatory.
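As a rough idea of what such a configuration contains, the following sketch renders a [mysqld] section from a base directory and refuses relative paths. The option names (port, datadir, socket, pid-file, log-error) are standard MySQL server options, but the sub-directory and file names below, the port 3307 and the class name LocalMySqlConfig are assumptions and not the values from the original file.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class LocalMySqlConfig {

    /** Renders a minimal [mysqld] section with absolute paths below the given base directory. */
    static String render(Path base, int port) {
        if (!base.isAbsolute()) {
            throw new IllegalArgumentException("MySQL needs absolute paths: " + base);
        }
        return String.join("\n",
                "[mysqld]",
                "port=" + port,
                // Hypothetical directory and file names; adapt them to your own layout.
                "datadir=" + base.resolve("MySQL-5.6-Data"),
                "socket=" + base.resolve("mysql-5.6-local.sock"),
                "pid-file=" + base.resolve("mysql-5.6-local.pid"),
                "log-error=" + base.resolve("MySQL-5.6-Log").resolve("error.log"),
                "");
    }

    public static void main(String[] args) {
        // A non-default port avoids clashes with a system-wide instance on 3306.
        System.out.print(render(Paths.get("/home/stefan/MySQL-5.6-Local"), 3307));
    }
}
```

Generating the file per instance keeps the paths consistent when several local instances with different versions live side by side.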

Stop the Running MySQL Instance

If you already have a running MySQL instance, make sure to shut it down. You can also disable MySQL from starting automatically.

Setting up Apparmor

Apparmor protects sensitive applications by defining which directories they may write to. We need to update this configuration to suit our needs. First, we need to make sure that the global configuration file for the central MySQL instance also includes the additional local settings. Edit this file first: /etc/apparmor.d/usr.sbin.mysqld and make sure that the reference to the local file is not commented out.

Now we need to add the directories in stefan’s home directory to the local file by editing /etc/apparmor.d/local/usr.sbin.mysqld .

An incorrect Apparmor configuration is often the cause of permission errors, which can be a pain. Make sure to reload the Apparmor service, and if you struggle with it, consider disabling it temporarily to check if the rest works. Do not forget to turn it on again.

Initialize the Local MySQL Instance

Now it is time to initialize the MySQL instance. In this step, MySQL creates all the files it needs in the data directory. It is important that the data directory is empty when you run the following commands.

Note that this command is marked as deprecated. It works with MySQL 5.6 and MySQL 5.7, but may be removed in a future release.

Start and Stop the Instance

You can now start the MySQL instance with the following command:

For your convenience, add a custom client configuration in your $HOME/.my.cnf and point it to the user defined socket.

In addition, startup and shutdown scripts are useful as well. Place both scripts in the directory we created before and add execution permissions with chmod +x .

The stop script is similar.

Conclusion

The technique described above allows installing and running multiple MySQL instances from within the user’s home directory. The MySQL instances run with user privileges and can use dedicated data and log file directories. As all the data is stored within the $HOME directory, we can easily apply transparent encryption to protect data at rest.



Switch the Git Clone Protocol from HTTPS to SSH

Gitlab offers several options for interacting with remote repositories: git, http, https and ssh. The first option – git – is the native transport protocol and does not encrypt the traffic. The same applies for http, rendering https and ssh the only feasible protocols if you commit and retrieve data via insecure networks. Ssh and https are also both available via the web interfaces of Github and Gitlab. In both systems you can simply copy and paste the clone URLs including the protocol. The following screenshot shows the Github version.

HTTPS

The simplest way to fetch the repository is to just copy the default HTTPS URL and clone it to the local drive. Git will ask you for the Github credentials.

Unless a credential helper is configured, you will be asked for the credentials every time you interact with the Github remote repository. If the credential cache helper is active, git keeps the credentials in memory for a limited time (15 minutes by default). Instead of waiting that long, we can just drop the credentials and proceed with an empty cache again.

To make our lives a little easier, we can store the username. In this example, we store this information only locally, valid for this cloned repository only. The same settings can also be applied globally.

Git will store that information locally (i.e. inside the repository) in the config file:

For storing the password temporarily, you can activate the cache again and set a timeout.

Git will now store the password for your Github account for one hour. Although this is convenient, this is not an optimal solution. SSH keys are more secure and more convenient, as they do not expose your personal password and can be set individually for your repositories. In addition you can protect your keys with a password and add a second factor.

SSH

The Github documentation is great; you can find details on how to create SSH keys here. All you need to do is to associate your public key with your remote repository on Github or Gitlab, as explained for instance here. Some general tips for working with keys in a secure way can be found here. As git stores information about how you access your repositories in the local repository config file, you can easily modify this information to fit your needs. For automating SSH access to specific repositories, you can also modify the SSH configuration of your local user account in ~/.ssh/config .

For instance if we cloned the repository using the HTTPS method and would rather switch to SSH for the reasons mentioned above, there are two steps necessary:

  1. Add an SSH configuration for the host
  2. Adapt the git config

So first, we add a new entry for the SSH authentication with Github in the file ~/.ssh/config.

We define a hostname  github-test-project for the individual repository, define which SSH key to use and specify that we only want to authenticate with the key. Now that this is settled, we need to tell git to use this connection information. This is done in the local git repository configuration ~/Projects/test-project/.git/config . The file initially looks like this:

All we need to do is change the remote repository data to incorporate the SSH connection we defined in the SSH config. Just replace the url target to the SSH connection definition:

Note the colon and that we omitted the username before the SSH host. This information will be read from the SSH config. Also note that the repository needs to be initialized so that we have a master branch.
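The rewrite of the remote URL follows a simple pattern, which the following sketch spells out: the https://host/ part is replaced by the SSH host alias followed by a colon, and the repository path stays the same. The class name GitRemoteUrl and the owner example-user are hypothetical; github-test-project is the host alias defined in the SSH config above.

```java
import java.net.URI;

final class GitRemoteUrl {

    /**
     * Converts an HTTPS clone URL such as https://github.com/owner/repo.git
     * into the scp-like form alias:owner/repo.git, where alias is a Host
     * entry from ~/.ssh/config that supplies the user name and the key.
     */
    static String toSshAlias(String httpsUrl, String sshHostAlias) {
        URI uri = URI.create(httpsUrl);
        if (!"https".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("Not an HTTPS URL: " + httpsUrl);
        }
        String path = uri.getPath(); // for example "/owner/repo.git"
        if (path == null || path.length() < 2) {
            throw new IllegalArgumentException("No repository path in: " + httpsUrl);
        }
        return sshHostAlias + ":" + path.substring(1); // drop the leading slash
    }

    public static void main(String[] args) {
        // example-user is a placeholder for the real account name.
        String https = "https://github.com/example-user/test-project.git";
        System.out.println(toSshAlias(https, "github-test-project"));
    }
}
```

The resulting string is what goes into the url entry of the remote section in the repository's .git/config.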



Getting Familiar with Eclipse Again: Git Integration in Comparison with IntelliJ IDEA

Eclipse and IntelliJ are both great Java IDEs, which have their own communities, advantages and disadvantages. After having spent a few years with JetBrains IntelliJ Community Edition, I got accustomed to the tight and clean Git integration in the user interface. Now that I am considering switching back to Eclipse, I stumbled over a few things that I try to describe in this post.

IntelliJ and Eclipse Handle Project Structures Differently

Eclipse uses a workspace concept, which allows working on several projects at the same time. IntelliJ in contrast allows only one open project and organizes substructures in modules. A comparison of these concepts can be found here. These two different viewpoints affect the way Git is integrated into the workflow.

Sharing Projects

The different views on project structures of both IDEs imply that Git repositories are also treated differently. While IntelliJ uses the root of a repository directly, Eclipse introduces a subfolder for the project. This leads to the interesting observation that importing a project from Git into an Eclipse workspace again requires a small adaptation in order to let Eclipse recognize the structure.

A Small Workflow

In order to get familiar again with Eclipse, I created a small test project, which I then shared by pushing it to a Git repository. I then deleted the project and tried to re-import it again.

Step 1

Create a new test project, in this case a Spring Boot project that works with Maven. Note that the new project is stored in the Eclipse workspace.

 

Step 2

As the second step, we create a new repository. Log in to Github or your Gitlab instance and create a new project. Initialize it so that you have a master branch ready and copy the URL of the repository. We then add this repository by opening the Git Repository perspective in Eclipse. You can provide a default location for your local repositories in the Eclipse -> Team -> Git properties. In the Git Repository perspective, you can then see the path of the local storage location and some information about the repository, for instance that the local and the master branch are identical (they have the same commit hash). Note that the Git path is different from your workspace project path.

Step 3

We now have a fresh Maven-based Java project in our Eclipse workspace and an empty Git repository in the default location for Git repositories, which is a different location. What we need to do next is to share the source code by moving it into the Git storage location and adding it to the Git index. Eclipse can help us with that via the Team->Share menu, which is available in the Project Explorer view when right-clicking on the project.

Step 4

In the next step, we need to specify the Git repository we want to push our code to. This is an easy step: simply select the repository we just created from the drop-down menu. In the menu you can see that the content of the current project location on the left side will be moved to the right side. Eclipse created a new subfolder within the repository for our project. This is something that IntelliJ would not do.

In this step, Eclipse separates the local and custom project metadata from the actual source code, which is a good thing.

Step 5

In the fifth step, we simply apply some changes and commit and push them to the remote repository using the git staging window.

After this step, the changes are visible in the remote repository as well and available to collaborators. In order to simulate someone who is going to check out our little project from Gitlab, we delete the project locally and also remove the git repository.

Step 6

Now we start off with a clean workspace and clone the project from Gitlab. In the git repositories window, we can select clone project and provide the same URL again.

Step 7

In the next screen, we select the local destination for the cloned project. This could be for instance the default directory for Git projects or any other location on your disk. Confirm this dialogue.

Step 8

Now comes the tricky part, which did not work as expected in Eclipse Neon 4.6.1. Usually, one would tell Eclipse that the cloned project is a Maven project, and it should detect the pom.xml file and download the dependencies. To do so, we would select Import-> Git -> Projects from Git and clone the repository from this dialogue. Then, as a next step, we would select the Configure -> Convert to Maven Project option, but Eclipse does not seem to recognize the Maven structure. It only shows the files and directories, but does not consider the Maven dependencies specified in the pom.xml file.

What happens is that Eclipse tries to add a new pom.xml file and ignores the actual one.

Of course this is a problem and does not work.

Step 9 – Solution

Instead of using the method above, just clone the repository from the Git Repository perspective and then go back to the Project Explorer. Now instead of importing the project via the Git menu, choose the existing Maven project option and select the path of the Git repository we cloned before.

And in the next dialogue, specify the path:

As you can see, now Eclipse found the correct pom.xml file and provides the correct dependencies and structure!

Conclusion

Which IDE you prefer is a matter of taste and habit. Both environments provide a lot of features for developers and differ in the implementation of these features. With this short article, we tried to understand some basic implications of the two philosophies of how Eclipse and IntelliJ handle project structures. Once we anticipate these differences, it becomes easy to work with both of them.

 

 

 
