Lightning fast integration tests with Docker, MySQL and tmpfs

Integration tests that involve database operations requite to tear down and re-initialize a database multiple times. Although most developer machines offer SSD and enough RAM, database initialization can consume a considerable amount of time nevertheless.

Docker allows defining volumes that are mounted directly into the memory by using tmpfs. We can utilize this feature to utilize the speed up disk operations as during database imports, by moving the data from the disk into the memory.

The following example measures the time for writing 1GB worth of data to an SSD:

dd if=/dev/zero of=/tmp/output bs=1024k count=1024; 
1024+0 Datensätze ein
1024+0 Datensätze aus
1073741824 bytes (1,1 GB, 1,0 GiB) copied, 2,22033 s, 484 MB/s

For comparison, the following steps create a RAM disk and write the data to memory.

$ sudo mount -o size=1G -t tmpfs none /tmp/tmpfs

$ dd if=/dev/zero of=/tmp/tmpfs/output bs=1024k count=1024; 
1024+0 Datensätze ein
1024+0 Datensätze aus
1073741824 bytes (1,1 GB, 1,0 GiB) copied, 0,309017 s, 3,5 GB/s

As you can see writing 1GB to memory is 7x faster. With the following Docker run command, you can spin-up a default MySQL container, where the data directory resides in a tmpfs.

docker run -d \
    --rm \
    --name mysql-56-in-memory \
    -p 3307:3306 \
    --tmpfs /var/lib/mysql:rw,noexec,nosuid,size=1024m \
    -e MYSQL_DATABASE=dev \

The arguments of Docker run mean the following

  • –rm: delete the container once it was stopped
  • –name: a name for the container
  • -p: map the host’s port 3307 to the port 3306 inside the container. This allows to run multiple MySQL containers in parallel and connect to them from the host via the port specified
  • –tmpfs: This line mounts the internal directory of the container to a RAM disk. It should be writeable (rm). Noexec prevents the execution of binaries, nosuid prevents changing the permission flags and the size specifies the size occupied by the tmpfs partition in memory. Adapt this to your usecase. The minimum for MySQL is around 200MB. Add the space needed for your data, indices etc.
  • MYSQL_ALLOW_EMPTY_PASSWORD does what it implies
  • MYSQL_DATABASE defines the name of a database to be created

If you run this command you can connect to the container like this: mysql -u root -h -P 3307

The container behaves like a normal MySQL database, unless the data is not persisted on a hard disk, but only stored in the ephemeral memory. If you stop the container, it will be removed by docker and if you reboot the machine the data will be gone. for obvious reasons this is only a good idea for test data that can be re-created at any time.

You can achieve the same also with Docker Compose if you would like to orchestrate multiple containers.

version: '3'
    container_name: mysql-56-in-memory
    restart: unless-stopped
    image: mysql:5.6
      - MYSQL_HOST=''
      - data:/var/cache
      - "3307:3306"
      type: tmpfs
      device: tmpfs

Continue reading

Setup AWS MySQL 5.6 Aurora as a Slave for an external Master with SSL

Setting up Aurora as a slave for an external MySQL server that acts as the master is a bit tricky. Of course we want a secured connection. For this reason we need to create client certificates to be used by AWS RDS. The steps below should work for RDS as well. 

Generate and Sign  the Certificates

The process is actually simple, but AWS is picky how you generate the certificates. I was using a SHA flag that was accepted by a regular MySQL 5.6 instance, but caused a cryptic (pun intended) MySQL 2026 Generic SSL error and it was quite hard to find the source.  Also note that you need to have different common names (CN) for all three certificate pairs. They do not necessarily need to fit the actual domain name, but they need to be different. 

First we need to create the certificate authority that can sign the keys

# Generate a certificate authority key pair
openssl genrsa 2048 > ca-key.pem
# Notice the CN name. It needs to be different for all of the three key pairs that we create!
openssl req -new -x509 -nodes -days 3600 -key ca-key.pem -out ca.pem -subj "/C=AT/ST=Tirol/L=Innsbruck/O=The Good Corp/OU=IT Department/CN=ca.mysql"

Then create the server key pair

#Generate a server key. Note again the different CN
openssl req -newkey rsa:2048 -days 3600 -nodes -keyout server-key.pem -out server-req.pem -subj "/C=AT/ST=Tirol/L=Innsbruck/O=The Good Corp/OU=IT Department/CN=server.mysql"
# Convert the format
openssl rsa -in server-key.pem -out server-key.pem
# Sign it
openssl x509 -req -in server-req.pem -days 3600 -CA ca.pem -CAkey ca-key.pem -set_serial 01 -out server-cert.pem

Finally we generate a client certificate and its key. You can repeat these steps to generate multiple certificates for clients

# Again, note the CN
openssl req -newkey rsa:2048 -days 3600 -nodes -keyout client-key.pem -out client-req.pem -subj "/C=AT/ST=Tirol/L=Innsbruck/O=The Good Corp/OU=IT Department/CN=client.mysql"
# Convert
openssl rsa -in client-key.pem -out client-key.pem
# Sign
openssl x509 -req -in client-req.pem -days 3600 -CA ca.pem -CAkey ca-key.pem -set_serial 01 -out client-cert.pem
# Verify
openssl verify -CAfile ca.pem server-cert.pem client-cert.pem

Now we have all the certs we need.

Master Setup

The setup is pretty standard. Add the server certificates to the MySQL configuration of your master and restart.

# SSL Server Certificate

Then create a user for the slave


Slave Setup

On AWS you do not have SUPER() privileges, but can use stored procedures provided by Amazon to setup the slave.

Start fresh by removing old records. If there was no previous setup, there might be an error.

CALL mysql.rds_remove_binlog_ssl_material;
CALL mysql.rds_reset_external_master;

Now you need to pass the client certificate data as a JSON to AWS Aurora.

CALL mysql.rds_import_binlog_ssl_material('{"ssl_ca":"-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----\n","ssl_cert":"-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----\n","ssl_key":"-----BEGIN RSA PRIVATE KEY-----
-----END RSA PRIVATE KEY-----"}');

A message that the SSL data was accepted will appear if you pasted the certificate, the key and the CA certificate correctly.

Finally, start the replication and check the status

CALL mysql.rds_start_replication;

Tests and Troubleshooting

On the master, you can check if the slave even tries to connect for instance with tcpdump. In the example below the IP would be the AWS gateway address as seen by your firewall.

sudo tcpdump src host -vv


Continue reading

Illegal mix of collations: IntelliJ and UTF8mb4

When using variables inside SQL scripts within IntelliJ products (e.g. DataGrip), certain queries will not work because the encodings of the IntelliJ client and the server missmatch. This occurs for instance when you compare variables. A typical error message looks like this:

[HY000][1267] Illegal mix of collations (utf8mb4_unicode_520_ci,IMPLICIT) \
   and (utf8mb4_general_ci,IMPLICIT) for operation 'like'

IntelliJ products do not yet support MySQL’s utf8mb4 character set encodings. The problem occurs when using variables in queries. Per default. IntelliJ uses a UTF-8 encoding for the connection. When you use utf8mb4 as the database default character set, then variables will be encoded in UTF-8 while the database content remailns in utf8mb4. It is not possible to provide the character set encodings to the IntelliJ settings, as it will refuse to connect.

Check your server settings using the MySQL client:

MySQL [cropster_research]> show variables like '%char%';
| Variable_name            | Value                      |
| character_set_client     | utf8mb4                    |
| character_set_connection | utf8mb4                    |
| character_set_database   | utf8mb4                    |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |

This seems correct, but when you connect with the IntelliJ client, you will get wrong results when you use variables. Until the products support utf8mb4, you would need to add the following settings to the script in order to force the right settings.

SET character_set_connection=utf8mb4;
SET collation_connection=utf8mb4_unicode_520_ci;

Continue reading

Verifying Replication Consistency with Percona’s pt-table-checksum

Replication is an important concept for improving database performance and security. In this blog post, I would like to demonstrate how the consistency between a MySQL master and a slave can be verified. We will create two Docker containers, one for the master one for the slave.

Installing the Percona Toolkit

The Percona Toolkit is a collection of useful utilities, which can be obained for free from the company’s portal. The following commands install the prerequisits, download the package and eventually the package.

sudo apt-get install -y wget libdbi-perl libdbd-mysql-perl libterm-readkey-perl libio-socket-ssl-perl
sudo dpkg -i percona-toolkit_3.0.4-1.xenial_amd64.deb 

Setting up a Test Environment with Docker

The following command creates and starts a docker container. Note that these are minimal examples and are not suitable for a serious environment.

docker run --name mysql_master -e MYSQL_ALLOW_EMPTY_PASSWORD=true -d mysql:5.6 --log-bin \
   --binlog-format=ROW --server-id=1

Get the IP address from the master container:

# Get the IP of the master 
docker inspect mysql_master | grep IPAddress 

"SecondaryIPAddresses": null, 
    "IPAddress": ""

You can connect to this container like this and verify the server id:

stefan@Lenovo ~/Docker-Projects $ mysql -u root -h
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.35-log MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show variables like 'server_id';
| Variable_name | Value |
| server_id     | 1     |
1 row in set (0,00 sec)

We repeat the command for the slave, but use a different id. port and name:

docker run --name mysql_slave -e MYSQL_ALLOW_EMPTY_PASSWORD=true -d mysql:5.6 --server-id=2

For simplicity, we did not use Docker links, but will rather use IP addresses assigned by Docker directly.

Replication Setup

First, we need to setup a user with replication privileges. This user will connect from the slave to the master.

# On the host, interact with the master container
## Get the IP address of the slave container
$ docker inspect mysql_slave | grep IPAddress
            "SecondaryIPAddresses": null,
            "IPAddress": "",
                    "IPAddress": "",

## Login to the MySQL console of the master
### Grant permissions
### Get the current binlog position
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
| mysqld-bin.000002 | 346 | | | |
1 row in set (0,00 sec)

Now log into the slave container and add the connection details for the master:

## Connect to the MySQL Slave instance
$ mysql -u root -h

### Setup the slave

Query OK, 0 rows affected, 2 warnings (0,05 sec)

### Start and check
mysql>   start slave;
Query OK, 0 rows affected (0,01 sec)

mysql> show slave status \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_User: percona
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysqld-bin.000002
          Read_Master_Log_Pos: 346
               Relay_Log_File: mysqld-relay-bin.000002
                Relay_Log_Pos: 284
        Relay_Master_Log_File: mysqld-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

Now our simple slave setup is running.

Get some test data

Lets download the Sakila test database and import it into the master. It will be replicated immediately.

~/Docker-Projects $ tar xvfz sakila-db.tar.gz 

mysql -u root -h < sakila-db/sakila-schema.sql 
mysql -u root -h < sakila-db/sakila-data.sql

Verify that the data is on the slave as well:

mysql -u root -h -e "USE sakila;SHOW TABLES;"
| Tables_in_sakila           |
| actor                      |
| actor_info                 |
| address                    |
| category                   |
| city                       |
| country                    |
| customer                   |
| store                      |

After our setup is completed, we can proceed with Percona pt-table checksum.

Percona pt-table-checksum

The Percona pt-table-checksum tool requires the connection information of the master and the slave in a specific format. This is called the DSN (data source name), which is a coma separated string. We can store this information in a dedicated database called percona in a table called dsns. We create this table on the master. Note that the data gets replicated to the slave within the blink of an eye.

USE percona;
 `dsn` varchar(255) NOT NULL,

The next step involves creating permissions on the slave and the master!

GRANT ALL PRIVILEGES ON percona.* TO 'percona'@'';

The percona user is needed to run the script. Note that the IP address is this time from the (Docker) host, having the IP by default. In real world scenarios, this script would either be run on the master or on the slave directly.

Now we need to add the information about the slave to the table we created. The Percona tool could also read this from the process list, but it is more reliable if we add the information ourselves. To do so, we add a record to the table we just created, which describes the slave DSN:

INSERT INTO percona.DSN-Table VALUES (1,'h=,u=percona,p=SECRET,P=3306');

The pt-table-checksum tool the connects to the master instance and the the slave. It computes checksums of all databases and tables and compares results. You can use the tool like this:

pt-table-checksum --replicate=percona.checksums --create-replicate-table --empty-replicate-table \
  --recursion-method=dsn=t=percona.DSN_Table -h -P 3306 -u percona -pSECRET
09-10T10:13:11      0      0        0       1       0   0.020 mysql.columns_priv
09-10T10:13:11      0      0        3       1       0   0.016 mysql.db
09-10T10:13:11      0      0        0       1       0   0.024 mysql.event
09-10T10:13:11      0      0        0       1       0   0.014 mysql.func
09-10T10:13:11      0      0       40       1       0   0.026 mysql.help_category
09-10T10:13:11      0      0      614       1       0   0.023 mysql.help_keyword
09-10T10:13:11      0      0     1224       1       0   0.022 mysql.help_relation
09-10T10:13:12      0      0      585       1       0   0.266 mysql.help_topic
09-10T10:13:12      0      0        0       1       0   0.031 mysql.ndb_binlog_index
09-10T10:13:12      0      0        0       1       0   0.024 mysql.plugin
09-10T10:13:12      0      0        6       1       0   0.287 mysql.proc
09-10T10:13:12      0      0        0       1       0   0.031 mysql.procs_priv
09-10T10:13:12      0      1        2       1       0   0.020 mysql.proxies_priv
09-10T10:13:12      0      0        0       1       0   0.024 mysql.servers
09-10T10:13:12      0      0        0       1       0   0.017 mysql.tables_priv
09-10T10:13:12      0      0     1820       1       0   0.019 mysql.time_zone
09-10T10:13:12      0      0        0       1       0   0.015 mysql.time_zone_leap_second
09-10T10:13:12      0      0     1820       1       0   0.267 mysql.time_zone_name
09-10T10:13:13      0      0   122530       1       0   0.326 mysql.time_zone_transition
09-10T10:13:13      0      0     8843       1       0   0.289 mysql.time_zone_transition_type
09-10T10:13:13      0      1        4       1       0   0.031 mysql.user
09-10T10:13:13      0      0        1       1       0   0.018 percona.DSN_Table
09-10T10:13:13      0      0      200       1       0   0.028
09-10T10:13:13      0      0      603       1       0   0.023 sakila.address
09-10T10:13:13      0      0       16       1       0   0.033 sakila.category
09-10T10:13:13      0      0      600       1       0   0.023
09-10T10:13:13      0      0      109       1       0   0.029
09-10T10:13:14      0      0      599       1       0   0.279 sakila.customer
09-10T10:13:14      0      0     1000       1       0   0.287
09-10T10:13:14      0      0     5462       1       0   0.299 sakila.film_actor
09-10T10:13:14      0      0     1000       1       0   0.027 sakila.film_category
09-10T10:13:14      0      0     1000       1       0   0.032 sakila.film_text
09-10T10:13:14      0      0     4581       1       0   0.276 sakila.inventory
09-10T10:13:15      0      0        6       1       0   0.030 sakila.language
09-10T10:13:15      0      0    16049       1       0   0.303 sakila.payment
09-10T10:13:15      0      0    16044       1       0   0.310 sakila.rental
09-10T10:13:15      0      0        2       1       0   0.029 sakila.staff
09-10T10:13:15      0      0        2       1       0   0.020

The result shows a difference in the MySQL internal table for permissions. This is obviously not what we are interested in, as permissions are individual to a host. So we rather exclude the MySQL internal database and also the percona database, because it is not what we are interested in. Also in order to test it the tool works, we delete the last five category assignments from the table  with mysql -u root -h -e “DELETE FROM sakila.film_category WHERE film_id > 995;” and update a row in the city table with 

mysql -u root -h -e "update SET city='Innsbruck' WHERE city_id=590;"

Now execute the command again:

pt-table-checksum --replicate=percona.checksums --create-replicate-table --empty-replicate-table \
   --recursion-method=dsn=t=percona.DSN_Table --ignore-databases mysql,percona -h -P 3306 -u percona -pSECRET
09-10T10:46:33      0      0      200       1       0   0.017
09-10T10:46:34      0      0      603       1       0   0.282 sakila.address
09-10T10:46:34      0      0       16       1       0   0.034 sakila.category
09-10T10:46:34      0      1      600       1       0   0.269
09-10T10:46:34      0      0      109       1       0   0.028
09-10T10:46:34      0      0      599       1       0   0.285 sakila.customer
09-10T10:46:35      0      0     1000       1       0   0.297
09-10T10:46:35      0      0     5462       1       0   0.294 sakila.film_actor
09-10T10:46:35      0      1     1000       1       0   0.025 sakila.film_category
09-10T10:46:35      0      0     1000       1       0   0.031 sakila.film_text
09-10T10:46:35      0      0     4581       1       0   0.287 sakila.inventory
09-10T10:46:35      0      0        6       1       0   0.035 sakila.language
09-10T10:46:36      0      0    16049       1       0   0.312 sakila.payment
09-10T10:46:36      0      0    16044       1       0   0.320 sakila.rental
09-10T10:46:36      0      0        2       1       0   0.030 sakila.staff
09-10T10:46:36      0      0        2       1       0   0.027

You see that there is a difference in the tables and in the table sakila.film_category. The tool does not report the actual number of differences, but rather the number of different chunks. To get the actual differences, we need to use a different tool, which utilises the checksum table that the previous step created.

Show the differences with pt-tabel-sync

The pt-table-sync tool is the counter part for the pt-table-checksum util. It can print or even replay the SQL statements that would render the slave the same state again to be in sync with the master. We can run a dry-run first, as the tool is potentially dangerous.

pt-table-sync --dry-run  --replicate=percona.checksums --sync-to-master h= -P 3306 \
   -u percona -pSECRET --ignore-databases mysql,percona
# NOTE: --dry-run does not show if data needs to be synced because it
#       does not access, compare or sync data.  --dry-run only shows
#       the work that would be done.
# Syncing via replication P=3306,h=,p=...,u=percona in dry-run mode, without accessing or comparing data
#      0       0      0      0 Chunk     08:57:51 08:57:51 0
#      0       0      0      0 Nibble    08:57:51 08:57:51 0    sakila.film_category

With –dry-run, you only see affected tables, but not the actual data because it does not really access the databases tables in question. Use –print additionally or instead of dry-run to get a list:

pt-table-sync --print --replicate=percona.checksums --sync-to-master h= -P 3306 \
  -u percona -pSECRET --ignore-databases mysql,percona
REPLACE INTO `sakila`.`city`(`city_id`, `city`, `country_id`, `last_update`) VALUES \
   ('590', 'Yuncheng', '23', '2006-02-15 04:45:25') 
  \ /*percona-toolkit src_db:sakila src_tbl:city  ...
REPLACE INTO `sakila`.`film_category`(`film_id`, `category_id`, `last_update`) VALUES ... 
REPLACE INTO `sakila`.`film_category`(`film_id`, `category_id`, `last_update`) VALUES ('997',... 
REPLACE INTO `sakila`.`film_category`(`film_id`, `category_id`, `last_update`) VALUES ('998', '11 ...
REPLACE INTO `sakila`.`film_category`(`film_id`, `category_id`, `last_update`) VALUES ('999', '3', ...
REPLACE INTO `sakila`.`film_category`(`film_id`, `category_id`, `last_update`) VALUES ('1000', '5', ... 

The command shows how we can rename back from Innsbruck to Yuncheng again and also provides the INSERT statements to get the deleted records back.When we replace –print with –execute, the data gets written to the master and replicated to the slave. To allow this, we need to set the permissions on the master

GRANT INSERT, UPDATE, DELETE ON sakila.* TO 'percona'@'';
pt-table-sync --execute  --replicate=percona.checksums --check-child-tables \ 
  --sync-to-master h= -P 3306 -u percona -pSECRET --ignore-databases mysql,percona
REPLACE statements on can adversely affect child table `sakila`.`address` 
   because it has an ON UPDATE CASCADE foreign key constraint. 
   See --[no]check-child-tables in the documentation for more information. 
   --check-child-tables error  while doing on

This error indicates that updating the city table has consequences, because it is a FK to child tables. In this example, we are bold and ignore this warning. This is absolutely not recommended for real world scenarios.

pt-table-sync --execute  --replicate=percona.checksums --no-check-child-tables \
   --no-foreign-key-checks --sync-to-master h= -P 3306 -u percona -pSECRET \ 
   --ignore-databases mysql,percona

The command –no-check-child-tables ignores child tables and the command –no-foreign-key-checks ignores foreign keys.

Run the checksum command again to verify that the data has been restored:

pt-table-checksum --replicate=percona.checksums --create-replicate-table --empty-replicate-table \ 
   --recursion-method=dsn=t=percona.DSN_Table --ignore-databases mysql,percona 
   -h -P 3306 -u percona -pSECRET

09-10T11:24:42      0      0      200       1       0   0.268
09-10T11:24:42      0      0      603       1       0   0.033 sakila.address
09-10T11:24:42      0      0       16       1       0   0.029 sakila.category
09-10T11:24:42      0      0      600       1       0   0.275
09-10T11:24:42      0      0      109       1       0   0.023
09-10T11:24:43      0      0      599       1       0   0.282 sakila.customer
09-10T11:24:43      0      0     1000       1       0   0.046
09-10T11:24:43      0      0     5462       1       0   0.284 sakila.film_actor
09-10T11:24:43      0      0     1000       1       0   0.036 sakila.film_category
09-10T11:24:43      0      0     1000       1       0   0.279 sakila.film_text
09-10T11:24:44      0      0     4581       1       0   0.292 sakila.inventory
09-10T11:24:44      0      0        6       1       0   0.031 sakila.language
09-10T11:24:44      0      0    16049       1       0   0.309 sakila.payment
09-10T11:24:44      0      0    16044       1       0   0.325 sakila.rental
09-10T11:24:44      0      0        2       1       0   0.029 sakila.staff
09-10T11:24:44      0      0        2       1       0   0.028

0 DIFFS, we are done!














Continue reading

Using Hibernate Search with Spring Boot

Spring Boot is a framework, that makes it much easier to develop Spring based applications, by following a convention over configuration principle (while in contrast Spring critics claim that the framework’s principle is rather configuration over everything). In this article, I am going to provide an example how to achieve the following:

  • Create a simple Web application based on Spring Boot
  • Persist and access data with Hibernate
  • Make it searchable with Hibernate Search (Lucine)

I use Eclipse with a Gradle plugin for convenience. MySQL will be our back-end for storing the data. The full example can be obtained from my Github Repository.

Bootstrapping: Create a Simple Spring Boot Webapp

The easiest way to start with Spring Boot is heading over to and create a new project. In this example, I will use Gradle for building the application and handling the dependencies and I add Web and JPA starters.



Download the archive to your local drive and extract it to a folder. I called the project SearchaRoo.

Import the Project with Eclipse

Import it as an existing Gradle Project in Eclipse by using the default settings. You will end up with a nice little project structure as shown below:

We have a central application starter class denoted, package definitions, application properties and even test classes. The great thing with Spring Boot is that it is very simple to start and that you can debug it as every other local Java application. There is no need for remote debugging or complex application server setups.

Prepare the Database

We need a few permissions on our MySQL instance before we can start.

CREATE DATABASE spring_employees;
CREATE USER 'dev'@'localhost' IDENTIFIED BY 'sEcReT';
    on spring_employees.* TO 'dev'@'localhost';
GRANT RELOAD on *.* TO 'dev'@'localhost';

We can then add the connection details into the file. We will edit this file several times when the complexity of this project increases.

# ===============================
# ===============================

# Specify the DBMS
spring.jpa.database = MYSQL

# Show or not log for each sql query = true

Now the basic database setup is done. We can then start adding model classes.

Getting some Employees on Board

MySQL offers a rather small but well documented sample database called employees, which is hosted on Github.  Obtain and import the data as follows:

git clone
cd test_db
mysql -u dev -p sEcReT < employees.sql

The script creates a new schema called employees and you will end up with a schema like this:

In the course of this article, we are going to model this schema with Java POJOs by annotating the entities and the a appropriate fields with JPA.


Before we can start modelling the entities in Java, have a look at the Gradle build file. We include additional dependencies for the MySQL connector and Apache commons.

buildscript {
	ext {
		springBootVersion = '1.5.1.RELEASE'
	repositories {
	dependencies {

apply plugin: 'java'
apply plugin: 'eclipse'
apply plugin: 'org.springframework.boot'

jar {
	baseName = 'SearchaRoo'
	version = '0.0.1-SNAPSHOT'

sourceCompatibility = 1.8

repositories {

dependencies {

Modelling Reality

The next step covers modelling the data which we imported with Java POJOs. Obviously this is not the most natural way, because in general you would create the model first and then add data to it, but as we already had the data we decided to go in this direction. In the file, set the database to the imported employees database and set the Hibernate create property to validate. With this setting, we can confirm that we modelled the Java classed in accordance with the database model defined by the MySQL employees database. 

An example of such a class is shown below, the other classes can be found in the Github repository.

package at.stefanproell.model;

import java.util.Date;
import java.util.List;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.EnumType;
import javax.persistence.Enumerated;
import javax.persistence.Id;
import javax.persistence.OneToMany;

public class Employee {
    private int employee_id;
    private Date birthdate;
    private String firstname;
    private String lastname;
    @Column(name="gender",columnDefinition = "ENUM('M', 'F', 'UNKNOWN') DEFAULT 'UNKNOWN'")
    private Gender gender;
    private Date hireDate;

    List<Title> titles; 
    List<Department_Employee> employee_department;

    // Setters and getters

Now that we have prepared the data model, our schema is now fixed and does not change any more. We can deactivate the Hibernate based dynamic generation of the database tables and use the Spring database initialization instead.To see if we modelled the data correctly, we import MySQL employee data dump we obtained before and import it into our newly created schema, which maps the Java POJOs.

Importing the Initial Data

In the next step, we import the data from the MySQL employee database into our schema spring_hibernate. This schema contains the tables that Hibernate created for us. The following script copies the data between the two schemata. If you see an error, then there is an issue with your model.

-- The original data is stored in the database called employees
-- Spring created the new schema called spring_employees
USE `spring_employees`;

-- Departments

INSERT INTO `spring_employees`.`departments`
SELECT `departments`.`dept_no`,
FROM `employees`.`departments`;

-- Employees

INSERT INTO `spring_employees`.`employees`
SELECT `employees`.`emp_no`,
FROM `employees`.`employees`;

-- Join table 
INSERT INTO `spring_employees`.`dept_emp`
FROM `employees`.`dept_emp`;

-- Join table 

INSERT INTO `spring_employees`.`dept_manager`
SELECT `dept_manager`.`emp_no`,
FROM `employees`.`dept_manager`;

-- Titles

INSERT INTO `spring_employees`.`titles`
SELECT `titles`.`emp_no`,
FROM `employees`.`titles`;

-- Salaries

INSERT INTO `spring_employees`.`salaries`
SELECT `salaries`.`emp_no`,
FROM `employees`.`salaries`;

We now imported the data in the database schema that we defined for our project. Spring can load schema and initial data during start-up. So we provide two files, one containing the schema and the other one containing the data. To do that, we create two dumps of the database. One containing the schema only, the other one containing the data only.

mysqldump -u dev -psEcReT --no-data --databases spring_employees > src/main/resources/schema.sql
mysqldump -u dev -psEcReT --no-create-info --databases employees > src/main/resources/data.sql

By deactivating the Hibernate data generation and activating the Spring way, the database gets initialized every time the application starts. Change and edit the following lines in the


Before we can import the data with the scripts, make sure to drop the schema and disable foreign key checks in the schema file and enable them again at the end. Spring ignores the actionable MySQL comments. So your schema file should contain this

DROP DATABASE IF EXISTS `spring_employees`;
SET foreign_key_checks = 0;

// rest of the code

SET foreign_key_checks = 1;

And also insert the two foreign key statements to the data file. Note that the import can take a while. If you are happy with the initialized data, you can deactivate the initialization by setting the variable to false: spring.datasource.initialize=false

The file meanwhile looks like this:

# ===============================
# ===============================

# Specify the DBMS
spring.jpa.database = MYSQL

# Show or not log for each sql query = true
spring.datasource.connectionProperties=useUnicode=true;characterEncoding=utf-8; = org.hibernate.dialect.MySQL5Dialect
# Do not initialize anything

Adding Hibernate Search

Hibernate search offers full-text search capabilities by using a dedicated index. We need to add the dependencies to the build file.

dependencies {

Refresh the gradle file after including the search dependencies.

Adding Hibernate Search Dependencies

In this step, we annotate the model POJO classes and introduce the full-text search index. Hibernate search utilises just a few basic settings to get started. Add the following variables to tne application properties file.

# ===============================
# ===============================

# Spring Data JPA will take any properties under* and 
# pass them along (with the prefix stripped) once the EntityManagerFactory is 
# created.

# Specify the DirectoryProvider to use (the Lucene Directory) = filesystem

# Using the filesystem DirectoryProvider you also have to specify the default
# base directory for all indexes (make sure that the application have write
# permissions on such directory) = /tmp/SearchRroo/

Please not that storing the Lucene index in the tmp directory is not the best idea, but for testing we can use this rather futile location. We also use the filesystem to store the index, as this is the simplest approach.

Create a Service

In order to facilitate Hibernate Search on our data, we add a service class, which offers methods for searching. The service uses a configuration, which is injected by Spring during run time. The configuration is very simple.

package at.stefanproell.service;

import javax.persistence.EntityManager;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

public class HibernateSearchConfiguration {
	private final Logger logger = LoggerFactory.getLogger(HibernateSearchConfiguration.class);

	private EntityManager entityManager;

	HibernateSearchService hibernateSearchService() {
		HibernateSearchService hibernateSearchService = new HibernateSearchService(entityManager);
		return hibernateSearchService;

The @Configuration is loaded when Spring builds the application context. It provides a bean of our service, which can then be injected into the application. The service itself provides methods for creating and searching the index. In this example, the search method is very simple: it only searches on the first and the last name of an employee and it allows users to make one mistake (distance 1).

package at.stefanproell.service;

import java.util.List;

import javax.persistence.EntityManager;
import javax.persistence.NoResultException;
import javax.persistence.PersistenceContext;

import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

import at.stefanproell.model.Employee;
import ch.qos.logback.classic.Logger;

public class HibernateSearchService  {

    private final Logger logger = (Logger) LoggerFactory.getLogger(HibernateSearchService.class);

    private final EntityManager entityManager;

    public HibernateSearchService(EntityManager entityManager) {
        this.entityManager = entityManager;

    public void initializeHibernateSearch() {

        try {
            FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager);
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
    public List<Employee> fuzzySearch(String searchTerm){

        FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager);
        QueryBuilder qb = fullTextEntityManager.getSearchFactory().buildQueryBuilder().forEntity(Employee.class).get();
        Query luceneQuery = qb.keyword().fuzzy().withEditDistanceUpTo(1).withPrefixLength(1).onFields("firstname", "lastname")

        javax.persistence.Query jpaQuery = fullTextEntityManager.createFullTextQuery(luceneQuery, Employee.class);

        // execute search

        List<Employee> employeeList = null;
        try {
            employeeList  = jpaQuery.getResultList();
        } catch (NoResultException nre) {
            logger.warn("No result found");


        return employeeList;


The service implementation currently only contains an initialization method, which used for creating the Lucene index on the filesystem. Before we can test the index, we need to have at least one indexed entity. This can be achieved by simply adding the annotation @Indexed to the POJO.

package at.stefanproell.model;

import java.util.Date;
import java.util.List;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.EnumType;
import javax.persistence.Enumerated;
import javax.persistence.Id;
import javax.persistence.OneToMany;


public class Employee {

When we start the application now, we can see that Hibernate creates the index and a short check on disk shows that it worked:

$ tree /tmp/SerchaRoo/
└── at.stefanproell.model.Employee
    ├── _6.fdt
    ├── _6.fdx
    ├── _6.fnm
    ├── _6_Lucene50_0.doc
    ├── _6_Lucene50_0.pos
    ├── _6_Lucene50_0.tim
    ├── _6_Lucene50_0.tip
    ├── segments_2
    └── write.lock

1 directory, 10 files

So far, we did not tell Hibernate search which fields we want add to the index and thus make them full-text searchable. The following listing shows the annotated @Fields.

package at.stefanproell.model;

import java.util.Date;
import java.util.List;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.EnumType;
import javax.persistence.Enumerated;
import javax.persistence.Id;
import javax.persistence.OneToMany;


public class Employee {
	private int employee_id;
	@Field(index = Index.YES, analyze=Analyze.NO, store = Store.YES)
	@DateBridge(resolution = Resolution.DAY)
	private Date birthdate;
	@Field(index=Index.YES, analyze=Analyze.YES, store=Store.NO)
	private String firstname;
	@Field(index=Index.YES, analyze=Analyze.YES, store=Store.NO)
	private String lastname;
	@Column(name="gender",columnDefinition = "ENUM('M', 'F', 'UNKNOWN') DEFAULT 'UNKNOWN'")
	private Gender gender;
	@Field(index = Index.YES, analyze=Analyze.NO, store = Store.YES)
	@DateBridge(resolution = Resolution.DAY)
	private Date hireDate;

	List<Title> titles; 
	List<Department_Employee> employee_department;


Starting the application again re-creates the index. Time for some basic searching.

Seaching Fulltext

Hibernate Search offers many features, which are not offered in a similar quality by native databases. One interesting feature is for instance fuzzy search, which allows finding terms within an edit distance of up to two letters. The method for searchin on two fields was already shown above. We can use this method in a small JUnit test:

package at.stefanproell;

import java.util.List;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

import at.stefanproell.model.Employee;
import at.stefanproell.service.HibernateSearchService;
import ch.qos.logback.classic.Logger;

public class SearchaRooApplicationTests {
	private final Logger logger = (Logger) LoggerFactory.getLogger(SearchaRooApplicationTests.class);

	HibernateSearchService service;
	public void searchSingleTerm() {
		List<Employee> result =service.fuzzySearch("Muntz");"Found " + result.size() + " employees");
		for(Employee emp : result){"Id: "+ emp.getEmployee_id() + " " + emp.getFirstname() + " " + emp.getLastname() );

The user made a small typo by entering Chrisu instead of Chris. As we allowed two mistakes, we receive a list of similar names and the test evaluates to passed. Sone possible results are shown below.

Id: 456935 Hiroyasu Christ
Id: 467982 Alselm Christ
Id: 482230 Dinah Christ
Id: 484411 Akeel Christ
Id: 498582 Mihalis Christ
Id: 14298 Chriss Granlund
Id: 14418 Chriss Kadhim
Id: 18643 Christ Bernini
Id: 19919 Chrisa Gornas
Id: 20537 Chrisa Rando


Hibernate Search is a great tool and can be easily integrsted into Spring Boot Applicstions. In this post, I gave a minimalistic example how fulltext fuzzy search can be added to existing databases and allows a flexible and powerful search. A few more advanced thoughts on Hibernate Search are given in this blog post here. The Hibernate Search documentation contains a lot of useful and more elaborate examples. The full example can be obtained on Github.

Continue reading