Using Hibernate Search with Spring Boot

Spring Boot is a framework that makes it much easier to develop Spring-based applications by following a convention-over-configuration principle (while Spring critics claim that the framework’s principle is rather configuration over everything). In this article, I am going to provide an example of how to achieve the following:

  • Create a simple Web application based on Spring Boot
  • Persist and access data with Hibernate
  • Make it searchable with Hibernate Search (Lucene)

I use Eclipse with a Gradle plugin for convenience. MySQL will be our back-end for storing the data. The full example can be obtained from my Github repository.

Bootstrapping: Create a Simple Spring Boot Webapp

The easiest way to start with Spring Boot is heading over to start.spring.io and creating a new project. In this example, I will use Gradle for building the application and handling the dependencies, and I add the Web and JPA starters.

Download the archive to your local drive and extract it to a folder. I called the project SearchaRoo.

Import the Project with Eclipse

Import it as an existing Gradle Project in Eclipse by using the default settings. You will end up with a nice little project structure as shown below:

We have a central application starter class denoted SearchaRooApplication.java, package definitions, application properties and even test classes. The great thing about Spring Boot is that it is very simple to start and that you can debug it like any other local Java application. There is no need for remote debugging or complex application server setups.

Prepare the Database

We need a few permissions on our MySQL instance before we can start.
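
The required statements could look like the following sketch; the user name spring and the password are placeholders, so adapt them to your setup:

    CREATE USER 'spring'@'localhost' IDENTIFIED BY 'secret';
    GRANT ALL PRIVILEGES ON employees.* TO 'spring'@'localhost';
    GRANT ALL PRIVILEGES ON spring_hibernate.* TO 'spring'@'localhost';
    FLUSH PRIVILEGES;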

We can then add the connection details into the application.properties file. We will edit this file several times when the complexity of this project increases.
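
The connection details could look like this; the schema name and the credentials are the placeholders assumed above:

    spring.datasource.url=jdbc:mysql://localhost:3306/spring_hibernate
    spring.datasource.username=spring
    spring.datasource.password=secret
    # let Hibernate create the tables for now
    spring.jpa.hibernate.ddl-auto=create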

Now the basic database setup is done. We can then start adding model classes.

Getting some Employees on Board

MySQL offers a rather small but well documented sample database called employees, which is hosted on Github. Obtain and import the data as follows:
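
Assuming the official datacharmer/test_db repository, the download and import boil down to:

    git clone https://github.com/datacharmer/test_db.git
    cd test_db
    mysql -u root -p < employees.sql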

The script creates a new schema called employees and you will end up with a schema like this:

In the course of this article, we are going to model this schema with Java POJOs by annotating the entities and the appropriate fields with JPA annotations.

Dependencies

Before we can start modelling the entities in Java, have a look at the Gradle build file. We include additional dependencies for the MySQL connector and Apache commons.
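
The dependency section could look like the following sketch; commons-lang3 is an assumption, as Apache Commons consists of many individual modules:

    dependencies {
        compile('org.springframework.boot:spring-boot-starter-web')
        compile('org.springframework.boot:spring-boot-starter-data-jpa')
        runtime('mysql:mysql-connector-java')
        compile('org.apache.commons:commons-lang3:3.5')
        testCompile('org.springframework.boot:spring-boot-starter-test')
    }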

Modelling Reality

The next step covers modelling the data which we imported with Java POJOs. Obviously this is not the most natural way, because in general you would create the model first and then add data to it, but as we already had the data, we go in this direction. In the application.properties file, set the database to the imported employees database and set the Hibernate ddl-auto property to validate. With this setting, we can confirm that we modelled the Java classes in accordance with the database model defined by the MySQL employees database.

An example of such a class is shown below; the other classes can be found in the Github repository.
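
A sketch of the Employees entity, following the column definitions of the employees table (getters and setters omitted; Gender is a small enum with the constants M and F):

    import java.util.Date;
    import javax.persistence.*;

    @Entity
    @Table(name = "employees")
    public class Employees {

        @Id
        @Column(name = "emp_no")
        private Integer empNo;

        @Column(name = "birth_date")
        @Temporal(TemporalType.DATE)
        private Date birthDate;

        @Column(name = "first_name", length = 14)
        private String firstName;

        @Column(name = "last_name", length = 16)
        private String lastName;

        @Enumerated(EnumType.STRING)
        @Column(name = "gender")
        private Gender gender;

        @Column(name = "hire_date")
        @Temporal(TemporalType.DATE)
        private Date hireDate;

        // getters and setters ...
    }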

Now that we have prepared the data model, our schema is fixed and does not change any more. We can deactivate the Hibernate-based dynamic generation of the database tables and use the Spring database initialization instead. To see if we modelled the data correctly, we take the MySQL employee data dump we obtained before and import it into our newly created schema, which maps the Java POJOs.

Importing the Initial Data

In the next step, we import the data from the MySQL employee database into our schema spring_hibernate. This schema contains the tables that Hibernate created for us. The following script copies the data between the two schemata. If you see an error, then there is an issue with your model.
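
The script is a list of INSERT ... SELECT statements, one per table; this only works because the Hibernate-generated tables use the same column order as the original dump:

    INSERT INTO spring_hibernate.employees SELECT * FROM employees.employees;
    INSERT INTO spring_hibernate.departments SELECT * FROM employees.departments;
    INSERT INTO spring_hibernate.dept_emp SELECT * FROM employees.dept_emp;
    INSERT INTO spring_hibernate.dept_manager SELECT * FROM employees.dept_manager;
    INSERT INTO spring_hibernate.titles SELECT * FROM employees.titles;
    INSERT INTO spring_hibernate.salaries SELECT * FROM employees.salaries;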

We have now imported the data into the database schema that we defined for our project. Spring can load schema and initial data during start-up, so we provide two files: one containing the schema and the other one containing the data. To do that, we create two dumps of the database, one containing the schema only, the other one containing the data only.
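
The dumps can be created with mysqldump; storing them under src/main/resources makes them available on the classpath:

    mysqldump -u root -p --no-data spring_hibernate > src/main/resources/schema.sql
    mysqldump -u root -p --no-create-info spring_hibernate > src/main/resources/data.sql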

By deactivating the Hibernate data generation and activating the Spring way, the database gets initialized every time the application starts. Change and edit the following lines in the application.properties file:
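
The relevant lines are these; the file names match the dumps created above:

    # Hibernate must not touch the tables any more
    spring.jpa.hibernate.ddl-auto=none
    # Spring initializes the database from the two dump files
    spring.datasource.initialize=true
    spring.datasource.schema=classpath:schema.sql
    spring.datasource.data=classpath:data.sql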

Before we can import the data with the scripts, make sure to drop the schema and disable foreign key checks at the beginning of the schema file and enable them again at its end. Spring ignores the executable MySQL comments (/*! ... */) that mysqldump generates, so your schema file should contain plain statements like this:
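
In outline, with the CREATE TABLE statements from the dump in between:

    DROP SCHEMA IF EXISTS spring_hibernate;
    CREATE SCHEMA spring_hibernate;
    USE spring_hibernate;
    SET FOREIGN_KEY_CHECKS = 0;

    -- CREATE TABLE statements from the dump ...

    SET FOREIGN_KEY_CHECKS = 1;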

Also insert the two foreign key statements into the data file: disable the checks at the top and enable them again at the bottom. Note that the import can take a while. If you are happy with the initialized data, you can deactivate the initialization by setting the variable to false: spring.datasource.initialize=false

The application.properties file meanwhile looks like this:
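
Putting the pieces together, the file contains roughly these entries (credentials are the placeholders from above):

    spring.datasource.url=jdbc:mysql://localhost:3306/spring_hibernate
    spring.datasource.username=spring
    spring.datasource.password=secret
    spring.jpa.hibernate.ddl-auto=none
    spring.datasource.initialize=false
    spring.datasource.schema=classpath:schema.sql
    spring.datasource.data=classpath:data.sql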

Adding Hibernate Search

Hibernate Search offers full-text search capabilities by using a dedicated index. We need to add the dependencies to the build file.
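
One additional line in the dependency section is sufficient; the version below is an example and has to match the Hibernate ORM version that Spring Boot ships:

    compile('org.hibernate:hibernate-search-orm:5.5.4.Final')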

Refresh the Gradle project after including the search dependencies.

Adding Hibernate Search Dependencies

In this step, we annotate the model POJO classes and introduce the full-text search index. Hibernate Search utilises just a few basic settings to get started. Add the following variables to the application.properties file.
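
Hibernate Search settings can be passed through as JPA properties; a minimal filesystem-based configuration looks like this:

    spring.jpa.properties.hibernate.search.default.directory_provider=filesystem
    spring.jpa.properties.hibernate.search.default.indexBase=/tmp/lucene/index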

Please note that storing the Lucene index in the tmp directory is not the best idea, but for testing we can use this temporary location. We also use the filesystem to store the index, as this is the simplest approach.

Create a Service

In order to use Hibernate Search on our data, we add a service class which offers methods for searching. The service uses a configuration which is injected by Spring at runtime. The configuration is very simple.
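
A sketch of the configuration; HibernateSearchService is our own service class, shown below:

    import javax.persistence.EntityManager;
    import javax.persistence.PersistenceContext;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class HibernateSearchConfiguration {

        @PersistenceContext
        private EntityManager entityManager;

        @Bean
        public HibernateSearchService hibernateSearchService() {
            HibernateSearchService searchService = new HibernateSearchService(entityManager);
            searchService.initializeHibernateSearch();
            return searchService;
        }
    }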

The @Configuration is loaded when Spring builds the application context. It provides a bean of our service, which can then be injected into the application. The service itself provides methods for creating and searching the index. In this example, the search method is very simple: it only searches on the first and the last name of an employee and it allows users to make one mistake (distance 1).

The service implementation currently only contains an initialization method, which is used for creating the Lucene index on the filesystem. Before we can test the index, we need to have at least one indexed entity. This can be achieved by simply adding the annotation @Indexed to the POJO.
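
A sketch of the service, assuming the Employees entity from above; the fuzzy query tolerates one edit on the two name fields:

    import java.util.List;
    import javax.persistence.EntityManager;
    import org.apache.lucene.search.Query;
    import org.hibernate.search.jpa.FullTextEntityManager;
    import org.hibernate.search.jpa.Search;
    import org.hibernate.search.query.dsl.QueryBuilder;
    import org.springframework.transaction.annotation.Transactional;

    public class HibernateSearchService {

        private final EntityManager entityManager;

        public HibernateSearchService(EntityManager entityManager) {
            this.entityManager = entityManager;
        }

        public void initializeHibernateSearch() {
            try {
                // build the Lucene index for all entities annotated with @Indexed
                FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager);
                fullTextEntityManager.createIndexer().startAndWait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        @Transactional
        @SuppressWarnings("unchecked")
        public List<Employees> fuzzySearch(String searchTerm) {
            FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager);
            QueryBuilder queryBuilder = fullTextEntityManager.getSearchFactory()
                    .buildQueryBuilder().forEntity(Employees.class).get();
            // keyword query on both name fields, allowing an edit distance of 1
            Query luceneQuery = queryBuilder.keyword().fuzzy().withEditDistanceUpTo(1)
                    .onFields("firstName", "lastName").matching(searchTerm).createQuery();
            return fullTextEntityManager.createFullTextQuery(luceneQuery, Employees.class).getResultList();
        }
    }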

When we start the application now, we can see that Hibernate creates the index and a short check on disk shows that it worked:

So far, we did not tell Hibernate Search which fields we want to add to the index and thus make full-text searchable. The following listing shows the annotated fields.
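
In the Employees entity, the class gets the @Indexed annotation and the two name fields are marked with @Field:

    import javax.persistence.*;
    import org.hibernate.search.annotations.*;

    @Entity
    @Indexed
    @Table(name = "employees")
    public class Employees {

        // ... other fields as before ...

        @Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)
        @Column(name = "first_name", length = 14)
        private String firstName;

        @Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)
        @Column(name = "last_name", length = 16)
        private String lastName;
    }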

Starting the application again re-creates the index. Time for some basic searching.

Searching Fulltext

Hibernate Search offers many features which are not offered in similar quality by native databases. One interesting feature is for instance fuzzy search, which allows finding terms within an edit distance of up to two letters. The method for searching on two fields was already shown above. We can use this method in a small JUnit test:
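
A sketch of such a test, using the service from above:

    import java.util.List;
    import org.junit.Assert;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.boot.test.context.SpringBootTest;
    import org.springframework.test.context.junit4.SpringRunner;

    @RunWith(SpringRunner.class)
    @SpringBootTest
    public class HibernateSearchServiceTest {

        @Autowired
        private HibernateSearchService hibernateSearchService;

        @Test
        public void testFuzzySearch() {
            // Chrisu instead of Chris: one edit away, so the fuzzy query still matches
            List<Employees> result = hibernateSearchService.fuzzySearch("Chrisu");
            Assert.assertFalse(result.isEmpty());
        }
    }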

The user made a small typo by entering Chrisu instead of Chris. As we allow one mistake, we receive a list of similar names and the test passes. Some possible results are shown below.

Conclusions

Hibernate Search is a great tool and can be easily integrated into Spring Boot applications. In this post, I gave a minimalistic example of how fulltext fuzzy search can be added to existing databases and allows a flexible and powerful search. A few more advanced thoughts on Hibernate Search are given in this blog post here. The Hibernate Search documentation contains a lot of useful and more elaborate examples. The full example can be obtained on Github.

Hibernate Search and Spring Boot: Building Bridges

Hibernate Search is a very convenient way of storing database content in a Lucene index and adding fulltext search capabilities to data driven projects simply by annotating classes. It can be easily integrated into Spring Boot applications and as long as only the basic features are used, it works out of the box. The fun starts when the autoconfiguration cannot find out how to configure things properly; then it gets tricky quite quickly. Of course this is natural behaviour, but one gets spoiled quite quickly.

Using the latest Features: Hibernate ORM, Hibernate Search and Spring Boot

The current version of Spring Boot is 1.5.2. This version uses Hibernate ORM 5.0. The latest stable Hibernate Search versions are 5.6.1.Final and 5.7.0.Final, which in contrast require Hibernate ORM 5.1 and 5.2 respectively. Also, you need Java 8 now. For this reason, if you need the latest Hibernate Search features in combination with Spring Boot, you need to adapt the dependencies as follows:
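
The adapted Gradle dependencies could look like this; the version numbers are examples from the time of writing:

    dependencies {
        compile('org.springframework.boot:spring-boot-starter-data-jpa') {
            exclude group: 'org.hibernate', module: 'hibernate-entitymanager'
        }
        compile('org.hibernate:hibernate-core:5.2.8.Final')
        compile('org.hibernate:hibernate-search-orm:5.7.0.Final')
    }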

Note that the Hibernate Entity Manager needs to be excluded, because it has been integrated into the core in the new Hibernate version. Details are given in the Spring Boot documentation.

Enforcing the Dependencies to be Loaded in the Correct Sequence 

As written earlier, Spring Boot takes care of a lot of configuration for us. Most of the time, this works perfectly and reduces the pain of configuring a new application manually. In some particular cases, however, Spring cannot figure out that there exists a dependency between different services which needs to be resolved in a specific order. A typical use case is the implementation of FieldBridges for Hibernate Search. FieldBridges translate between the actual object from the Java world and the representation of such an object in the Lucene index. Typically an EnumBridge is used for indexing Enums, which are often used for realizing internationalization (I18n).

When the Lucene index is created, Hibernate checks if Enum fields need to be indexed and if there exists a Bridge that converts between the object and the actual record in the index. The problem here is that Hibernate JPA is loaded at a very early stage of the Spring Boot startup process. The problem only arises if the bridge class utilises @Autowired fields which get injected. Typically, these fields would get injected when the AnnotationBeanConfigurerAspect bean is loaded. Hibernate, however, creates the session with the session factory auto configuration before the Spring configurer aspect bean is loaded. So the FieldBridge used by Hibernate during the initialization of the index does not have the service injected yet, causing a nasty NullPointerException.

Example EnumBridge

The following EnumBridge example utilises an injected service, which needs to be available before Hibernate starts. If not taken care of, this causes a NullPointerException.
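
A sketch of such a bridge; the TranslationService is a hypothetical @Autowired dependency that resolves the localized label of an enum constant:

    import org.hibernate.search.bridge.builtin.EnumBridge;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.beans.factory.annotation.Configurable;

    @Configurable
    public class LocalizedEnumBridge extends EnumBridge {

        @Autowired
        private TranslationService translationService; // hypothetical service

        @Override
        public String objectToString(Object object) {
            // fails with a NullPointerException if the aspect has not injected the service yet
            return translationService.getLocalizedName((Enum<?>) object);
        }
    }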

Enforce Loading the Aspect Configurer Before the Session Factory

In order to enforce that the AnnotationBeanConfigurerAspect is created before the Hibernate Session Factory is created, we simply implement our own HibernateJpaAutoConfiguration by extension and add the AnnotationBeanConfigurerAspect to the constructor. Spring Boot now knows that it needs to instantiate the AnnotationBeanConfigurerAspect before it can instantiate the HibernateJpaAutoConfiguration and we then have wired Beans ready for the consumption of the bridge. I found the correct hint here and here.
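
A sketch of the extended auto configuration; the super constructor follows the signature of Spring Boot 1.5 and may differ in other versions:

    import javax.sql.DataSource;
    import org.springframework.beans.factory.ObjectProvider;
    import org.springframework.beans.factory.aspectj.AnnotationBeanConfigurerAspect;
    import org.springframework.boot.autoconfigure.orm.jpa.HibernateJpaAutoConfiguration;
    import org.springframework.boot.autoconfigure.orm.jpa.JpaProperties;
    import org.springframework.boot.autoconfigure.transaction.TransactionManagerCustomizers;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.transaction.jta.JtaTransactionManager;

    @Configuration
    public class HibernateSearchJpaAutoConfiguration extends HibernateJpaAutoConfiguration {

        // declaring the aspect as a constructor parameter forces Spring
        // to create it before the session factory
        public HibernateSearchJpaAutoConfiguration(DataSource dataSource, JpaProperties jpaProperties,
                AnnotationBeanConfigurerAspect beanConfigurerAspect,
                ObjectProvider<JtaTransactionManager> jtaTransactionManager,
                ObjectProvider<TransactionManagerCustomizers> transactionManagerCustomizers) {
            super(dataSource, jpaProperties, jtaTransactionManager, transactionManagerCustomizers);
        }
    }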

As it turned out, using @DependsOn annotations did not work, and ordering the precedence of the beans with @Order was not sufficient either. With this little hack, we can ensure the correct sequence of initialization.

Deploying MySQL in a Local Development Environment

Installing MySQL via apt-get is a simple task, but migrating between different MySQL versions requires planning and testing. Installing one central instance of the database system might therefore not be suitable when the MySQL version or project-specific settings need to be switched quickly without interfering with other applications; one central instance can quickly become cumbersome. In this article, I will describe how any number of MySQL instances can be stored and executed from within a user’s home directory.

Adapting MySQL Data and Log File Locations

Some scenarios might require running several MySQL instances at once; other scenarios cover sensitive data, where we do not want MySQL to write any data to non-encrypted partitions. This is especially true for devices which can easily get stolen, for instance laptops. If you use a laptop for developing your applications from time to time, chances are good that you need to store sensitive data in a database and have to make sure that the data is encrypted at rest.

This can be solved with full disk encryption, but this technique has several disadvantages. First of all, full disk encryption only utilises one password. This entails that several users who utilise a device need to share one password, which weakens this approach. Also, when the system needs to be rebooted, full disk encryption can become an obstacle, which increases the complexity further.

Much easier to use is transparent home directory encryption, which can be selected out of the box during many modern Linux setup procedures. We will use this encryption type for this article, as it is reasonably secure and easy to set up. Our goal is to store all MySQL-related data in the home directory and run MySQL with normal user privileges.

Creating the Directory Structure

The first step is creating a directory structure for storing the data. In this example, the user name is stefan, please adapt to your needs.
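
The names of the data and log directories are suggestions; only the Conf directory is referenced again below:

    mkdir -p /home/stefan/MySQL-5.6-Local/MySQL-5.6-Conf
    mkdir -p /home/stefan/MySQL-5.6-Local/MySQL-5.6-Data
    mkdir -p /home/stefan/MySQL-5.6-Local/MySQL-5.6-Logs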

Create a Configuration File

Make sure to use absolute paths and utilise the directories we created before. Store this file in MySQL-5.6-Local/MySQL-5.6-Conf/my-5.6.cnf. The configuration is pretty self-explanatory.
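
A minimal configuration along these lines should do; the port is a suggestion, chosen so that it does not clash with a default instance on 3306:

    [mysqld]
    user      = stefan
    port      = 3356
    datadir   = /home/stefan/MySQL-5.6-Local/MySQL-5.6-Data
    socket    = /home/stefan/MySQL-5.6-Local/mysql-5.6.sock
    pid-file  = /home/stefan/MySQL-5.6-Local/mysql-5.6.pid
    log_error = /home/stefan/MySQL-5.6-Local/MySQL-5.6-Logs/mysql-5.6-error.log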

Stop the Running MySQL Instance

If you already have a running MySQL instance, make sure to shut it down. You can also disable MySQL from starting automatically.
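
On Ubuntu, for instance:

    sudo service mysql stop
    # prevent the central instance from starting automatically at boot
    sudo update-rc.d mysql disable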

Setting up Apparmor

Apparmor protects sensitive applications by defining in which directories they may write. We need to update this configuration to suit our needs. We need to make sure that the global configuration file for the central MySQL instance also includes the additional local information. Edit the file /etc/apparmor.d/usr.sbin.mysqld first and make sure that the reference to the local file is not commented out.

Now we need to add the directories in stefan’s home directory to the local file by editing /etc/apparmor.d/local/usr.sbin.mysqld.
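
The rules grant read, write and lock access to everything below the new MySQL directory (adapt the user name):

    /home/stefan/MySQL-5.6-Local/ r,
    /home/stefan/MySQL-5.6-Local/** rwk,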

An incorrect Apparmor configuration is often the cause of permission errors, which can be a pain. Make sure to reload the Apparmor service and, if you struggle with it, consider disabling it temporarily and check if the rest works. Do not forget to turn it on again.

Initialize the Local MySQL Instance

Now it is time to initialize the MySQL instance. In this step, MySQL creates all the files it needs in the data directory. It is important that the data directory is empty, when you initiate the following commands.
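
Using the configuration file from above, the initialization could look like this:

    mysql_install_db --defaults-file=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Conf/my-5.6.cnf \
        --datadir=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Data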

Note that this command is marked as deprecated. It works with MySQL 5.6 and MySQL 5.7, but may be removed in future versions.

Start and Stop the Instance

You can now start the MySQL instance with the following command:
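
Note that --defaults-file has to be the first option on the command line:

    mysqld --defaults-file=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Conf/my-5.6.cnf &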

For your convenience, add a custom client configuration in your $HOME/.my.cnf and point it to the user defined socket.
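
For example:

    [client]
    socket = /home/stefan/MySQL-5.6-Local/mysql-5.6.sock
    user   = root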

In addition, startup and shutdown scripts are useful as well. Place both scripts in the directory we created before and add execution permissions with chmod +x.
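
A start script could look like this:

    #!/bin/bash
    # start-mysql-5.6.sh: start the local MySQL 5.6 instance
    mysqld --defaults-file=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Conf/my-5.6.cnf &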

The stop script is similar.
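
It shuts the instance down via the user-defined socket:

    #!/bin/bash
    # stop-mysql-5.6.sh: stop the local MySQL 5.6 instance
    mysqladmin --socket=/home/stefan/MySQL-5.6-Local/mysql-5.6.sock -u root shutdown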

Conclusion

The technique described above allows installing and running multiple MySQL instances from within the user’s home directory. The MySQL instances run with user privileges and can utilise dedicated data and log file directories. As all data is stored within the $HOME directory, we can easily apply transparent encryption to protect data at rest.

Flashing a NanoPC T3 with DietPi

The NanoPC T3 is a 64-bit octa-core single board computer, quite similar to the famous Raspberry Pi boards. It is often referred to as NanoPi T3 as well.

Hardware Specification

The single board computer has eight cores with up to 1.4 GHz and 1 GB of DDR3 RAM. It has a lot of nice interfaces; the specification below is taken from here.

Overview

The device offers quite a lot considering its small measurements. The picture below gives an overview and is taken from here.

The device with the heat sink and attached cables is shown below.

Comparison with the Raspberry Pi Model 3B

It costs about twice as much as the Raspberry Pi 3, but comes with eight cores at 1.4 GHz instead of four cores at 1.2 GHz, GBit Ethernet instead of just 100 MBit, and several additional interfaces. It has a dedicated power switch, supports soft poweroff and provides reset and boot buttons. It comes with an SD card slot instead of micro SD and has only two standard USB ports, but also one micro USB port. This port, however, is not for powering the device, but only for data.

Some remarks at First

The board can get quite warm, so I would recommend buying the heat sinks that fit directly on the board as well. The wifi signal is also rather weak; I would recommend investing in the external antenna if the device is located in an area with low signal reception. Also, it requires an external 5V power source and does not provide a micro USB port for power like similar boards use.

Buying and Additional Information

The board can be obtained for $60 from here, and there also exists a wiki page dedicated to the T3. The images are stored at a one-click share hoster and the download is very slow. Also, the files are not that well organized and can easily be confused with other platforms offered by the same company.

  • Nano PC T3 ($60)
  • Heat sink ($1.99)
  • Power supply ($20)
  • SD card (~ $10)

Additionally there is shipping ($20 to Europe) and very likely also some customs duties to pay.

Initial Setup

The NanoPi T3 has an internal eMMC storage with 8GB capacity. It comes pre-installed with Android, which is not really useful for my applications. Instead, there exist different ISO images which can be obtained here. The wiki page documents how to create bootable SD cards with Windows and Linux, and there are also scripts offered which automate the process. Unfortunately, the scripts are not documented well and some of the links are already broken, which reduces the usability of the provided information. Also, as the images have to be downloaded from some share hoster, there is no way of verifying what kind of image you actually obtained. This is a security risk and not acceptable in many scenarios. Fortunately, there also exist alternative images which are more transparent to use.

By default, the device boots from the eMMC flash storage. By pressing the boot button in the lower right corner, we can also boot from the SD card. This is a nice feature, but if we want to reboot the system unattended, we need to replace the default operating system. In the course of this article, we are going to write an alternative Debian image to the flash memory and boot this OS automatically.

DietPi

DietPi is a Debian based distribution which claims to be an optimized and lightweight alternative for single board PCs. The number of supported devices is impressive and luckily, the NanoPC T3 is also in the list. It also comes with a list of nice features for the configuration and the backup of the system. DietPi can be downloaded here and the documentation is available here.

The following steps are required:

  1. Download the DietPi Image
  2. Write the image to the SD card
  3. Mount the SD card on your desktop and copy the DietPi image to the card
  4. Boot the NanoPC T3 from the card
  5. Flash the DietPi image to the eMMC
  6. Reboot
  7. Configure

Creating a Bootable SD Card

The first step involves creating a bootable SD card by writing the DietPi image to the card with dd. To do so, download the DietPi image to your local desktop and then write the file with dd. The process does not differ from other single board machines and is described here. The next step might seem a bit odd: after you have finished writing the SD card, mount it on your local desktop and copy the DietPi image to the tmp directory of the SD card. The reason we do this is that we need a running Linux system so that we can flash the integrated eMMC of the T3. We then use the DietPi Linux to actually flash the eMMC of the T3, also with the DietPi image. By copying the image, we save some downloading time and have the image right available in the next step.
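
The commands could look like this; the image file name is a placeholder and /dev/sdX (with its first partition /dev/sdX1) has to be replaced with the actual SD card device:

    # write the DietPi image to the SD card
    sudo dd if=DietPi_NanoPCT3.img of=/dev/sdX bs=4M
    sync
    # mount the card again and copy the image onto it for the flashing step
    sudo mount /dev/sdX1 /mnt
    sudo cp DietPi_NanoPCT3.img /mnt/tmp/
    sudo umount /mnt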

Boot the SD Card

Make sure the T3 is powered off and insert the SD card into the board. Press and hold the boot button and flip the power switch. The T3 should then boot into the DietPi system. It is easier if you attach a monitor and a keyboard to the system for the further configuration. Alternatively, you can configure the network settings in advance by mounting the SD card on the desktop and editing the configuration files there, but as we simply use this system for installing the actual operating system, this might be a bit too much effort. Press CTRL+ALT+F2 to switch to a new TTY and log in. The standard login for the DietPi system is the user root with the password dietpi.

First, create a backup of the original eMMC content, just in case anything goes wrong. Use fdisk to see the available drives.
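
As we are logged in as root, no sudo is required:

    fdisk -l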

In the output you can see my 16GB SD card at /dev/mmcblk0 and the internal eMMC at /dev/mmcblk1. Use dd to create a backup of the whole eMMC like this:
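
The backup file ends up on the root filesystem, which lives on the SD card:

    dd if=/dev/mmcblk1 of=/root/emmc-backup.img bs=4M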

We now have a backup of the original content on the SD card and can proceed with the actual flashing.

Flashing the DietPi Image to the eMMC

In this step, we flash the DietPi image we copied to the SD card before to the eMMC, overwriting its default Android system. To do so, we use dd again:
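
Assuming the image was copied to the tmp directory as described above:

    dd if=/tmp/DietPi_NanoPCT3.img of=/dev/mmcblk1 bs=4M
    sync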

This may take a while, so be patient. After the image has been written, power off the T3 and remove the SD card. Now hold the boot button again and flip the power switch. This causes the T3 to initialize the new DietPi installation, this time from the internal flash memory. After this step, the system automatically boots from the eMMC flash, without having to press the button.

As a result, we now only utilise the internal flash memory for running the OS, which is not only faster than the SD card, but also allows using the SD card as additional storage. The tool fdisk now shows the eMMC with its two partitions created by DietPi.

Configuration

DietPi comes with a few nice setup tools which make the installation process rather easy. After logging in, DietPi will guide you through the installation, but it expects a working Internet connection. You can add the SSID, the pre-shared key and additional information in the file /DietPi/dietpi.txt. The following steps are basic, and the setup needs to be completed; otherwise you will be bugged with the same menu after every reboot:

  1. The first step is to configure wireless networking. Add your network information to the file mentioned above and reboot. To do so, you need to abort the setup process, because following the menu did not work for me, as the menu only allows setting up a hotspot instead of connecting to an existing network.
  2. When you log in the next time, you will be greeted by the dietpi-software dialogue. You can install basic software components such as editors, build essentials etc.
  3. The system will now reboot and you are ready to go.
  4. Log in again, change the root password with passwd, add a new user with adduser $USER and add this user to the sudo group with adduser $USER sudo.

You can re-open the configuration menus later by using dietpi-config for the basic setup and dietpi-software for adding new software. Of course for the latter you can always use apt.

Conclusions

The NanoPC T3 is an affordable single board computer with eight cores and 1 GB of DDR3 RAM, integrated wireless and Bluetooth interfaces, camera interfaces and many more features. Its small size renders it an ideal candidate for hardware and software projects based on Linux. The documentation could be improved, especially the transparency of the images and the details of the installation procedure. Also, more RAM (as always) would be nice.

Switch the Git Clone Protocol from HTTPS to SSH

Gitlab offers several options for interacting with remote repositories: git, http, https and ssh. The first option – git – is the native transport protocol and does not encrypt the traffic. The same applies to http, rendering https and ssh the only feasible protocols if you commit and retrieve data via insecure networks. SSH and HTTPS are both available via the web interfaces of Github and Gitlab. In both systems you can simply copy and paste the clone URLs including the protocol. The following screenshot shows the Github version.

HTTPS

The simplest way to fetch the repository is to just copy the default HTTPS URL and clone it to the local drive. Git will ask you for the Github credentials.
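
User name and repository name below are placeholders:

    git clone https://github.com/stefan/test-project.git ~/Projects/test-project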

You will be asked for the credentials every time you interact with the Github remote repository. With the credential cache helper enabled, git stores credentials for 15 minutes per default. Instead of waiting for the cache to expire, we can just drop the credentials and proceed with an empty cache again.
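
With the cache helper active, the cached credentials can be dropped like this:

    git credential-cache exit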

To make our lives a little easier, we can store the username. In this example, we store this information only locally, valid for this cloned repository only. The same setting can also be applied globally.
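
Inside the cloned repository (the user name is a placeholder):

    git config --local credential.username stefan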

Git will store that information locally (i.e. inside the repository) in the config file:
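
The relevant section then looks like this:

    [credential]
        username = stefan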

For storing the password temporarily, you can re-activate the cache and set a timeout.
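
The following sets the cache helper with a timeout of one hour (3600 seconds):

    git config --local credential.helper 'cache --timeout=3600'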

Git will now store the password for your Github account for one hour. Although this is convenient, this is not an optimal solution. SSH keys are more secure and more convenient, as they do not expose your personal password and can be set individually for your repositories. In addition you can protect your keys with a password and add a second factor.

SSH

The Github documentation is great; you can find details on how to create SSH keys here. All you need to do is associate your public key with your remote repository on Github or Gitlab, as explained for instance here. Some general tips for working with keys in a secure way can be found here. As git stores information about how you access your repositories in the local repository config file, you can easily modify this information to fit your needs. For automating SSH access to specific repositories, you can also modify the SSH configuration of your local user account in ~/.ssh/config.

For instance, if we cloned the repository using the HTTPS method and would rather switch to SSH for the reasons mentioned above, two steps are necessary:

  1. Add an SSH configuration for the host
  2. Adapt the git config

So first, we add a new entry for the SSH authentication with Github in the file ~/.ssh/config.
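
The entry could look like this; the key file name is a placeholder:

    Host github-test-project
        HostName github.com
        User git
        IdentityFile ~/.ssh/id_rsa_github_test_project
        IdentitiesOnly yes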

We define a host alias github-test-project for the individual repository, define which SSH key to use and specify that we only want to authenticate with the key. Now that this is settled, we need to tell git to use this connection information. This is done in the local git repository configuration ~/Projects/test-project/.git/config. The file initially looks like this:
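
With the placeholder names from above:

    [core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true
    [remote "origin"]
        url = https://github.com/stefan/test-project.git
        fetch = +refs/heads/*:refs/remotes/origin/*
    [branch "master"]
        remote = origin
        merge = refs/heads/master
    [credential]
        username = stefan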

All we need to do is change the remote repository data to incorporate the SSH connection we defined in the SSH config. Just replace the url target with the SSH connection definition:
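
The remote section then reads:

    [remote "origin"]
        url = github-test-project:stefan/test-project.git
        fetch = +refs/heads/*:refs/remotes/origin/*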

Note the colon and that we omitted the username before the SSH host. This information will be read from the SSH config. Also note that the repository needs to be initialized so that we have a master branch.
