Hibernate Search and Spring Boot: Building Bridges

Hibernate Search is a very convenient way of storing database content in a Lucene index and adding full-text search capabilities to data-driven projects, simply by annotating classes. It can be integrated easily into Spring Boot applications and works out of the box as long as only the basic features are used. The fun starts when the auto-configuration cannot figure out how to configure things properly; then it gets tricky quite quickly. Of course this is natural behaviour, but one gets spoiled rather fast.

Using the latest Features: Hibernate ORM, Hibernate Search and Spring Boot

The current version of Spring Boot is 1.5.2. This version uses Hibernate ORM 5.0. The latest stable Hibernate Search versions are 5.6.1.Final and 5.7.0.Final, which in contrast require Hibernate ORM 5.1 and 5.2 respectively. Also, Java 8 is required now. So if you need the latest Hibernate Search features in combination with Spring Boot, you need to adapt the dependencies as follows:

buildscript {
	ext {
		springBootVersion = '1.5.2.RELEASE'
	}
	repositories {
		mavenCentral()
	}
	dependencies {
		classpath("org.springframework.boot:spring-boot-gradle-plugin:${springBootVersion}")
	}
}

apply plugin: 'java'
apply plugin: 'eclipse'
apply plugin: 'org.springframework.boot'

jar {
	baseName = 'SearchaRoo'
	version = '0.0.1-SNAPSHOT'
}

sourceCompatibility = 1.8
targetCompatibility = 1.8

repositories {
	mavenCentral()
}


dependencies {
	compile('org.springframework.boot:spring-boot-starter-data-jpa')
	compile('org.springframework.boot:spring-boot-starter-web')
	testCompile('org.springframework.boot:spring-boot-starter-test')
	compile("mysql:mysql-connector-java")
	compile('org.apache.commons:commons-lang3:3.5')
	compile('org.springframework.boot:spring-boot-starter-logging')
	compile('org.springframework.boot:spring-boot-starter-freemarker')

	// Hibernate Search
	compile("org.hibernate:hibernate-core:5.2.8.Final")
	compile("org.hibernate:hibernate-search-orm:5.7.0.Final")
}

configurations.all {
	exclude group: "org.hibernate", module: "hibernate-entitymanager"
}

Note that the Hibernate Entity Manager needs to be excluded, because it has been integrated into the core in the new Hibernate version. Details are given in the Spring Boot documentation.

Enforcing the Dependencies to be Loaded in the Correct Sequence 

As written earlier, Spring Boot takes care of a lot of configuration for us. Most of the time, this works perfectly and reduces the pain of configuring a new application manually. In some particular cases, however, Spring cannot figure out that there exists a dependency between different services which needs to be resolved in a specific order. A typical use case is the implementation of FieldBridges for Hibernate Search. FieldBridges translate between the actual object from the Java world and the representation of such an object in the Lucene index. Typically an EnumBridge is used for indexing enums, which are often used for realizing internationalization (I18n).

When the Lucene index is created, Hibernate checks whether enum fields need to be indexed and whether a bridge exists that converts between the object and the actual record in the index. The problem here is that Hibernate JPA is loaded at a very early stage of the Spring Boot startup process. The problem only arises if the bridge class uses @Autowired fields. Typically, these fields would get injected when the AnnotationBeanConfigurerAspect bean is loaded. Hibernate, however, creates the session with the session factory auto-configuration before the Spring configurer aspect bean is loaded. So the FieldBridge used by Hibernate during the initialization of the index does not have the service injected yet, causing a nasty NullPointerException.

Example EnumBridge

The following EnumBridge example utilises an injected service, which needs to be available before Hibernate starts. If not taken care of, this causes a NullPointerException.

import org.hibernate.search.bridge.builtin.EnumBridge;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Required;
import org.springframework.context.annotation.Configurable;

@Configurable
public class HibernateSearchEnumBridgeExample extends EnumBridge
{
    private I18nMessageService i18nMessageService;

    @Autowired
    @Required
    public void setI18nMessageService(I18nMessageService service) {
        this.i18nMessageService = service;
    }

    @Override
    public String objectToString(Object object)
    {
        return i18nMessageService.getMessageForEnum(object);
    }

    @Override
    public Enum<? extends Enum> stringToObject(String name)
    {
        // Delegate to the default EnumBridge implementation, which resolves
        // the enum constant from the class set via setAppliedOnType()
        return super.stringToObject(name);
    }

    @Override
    public void setAppliedOnType(Class<?> returnType)
    {
        super.setAppliedOnType(returnType);
    }
}
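
For completeness, the following is a minimal sketch of how such a bridge can be attached to an indexed entity field. The entity and its enum are invented for illustration; only the annotations matter:

import javax.persistence.Entity;
import javax.persistence.EnumType;
import javax.persistence.Enumerated;
import javax.persistence.Id;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.FieldBridge;
import org.hibernate.search.annotations.Indexed;

@Entity
@Indexed
public class Product {

    @Id
    private Long id;

    // The custom bridge translates the enum constant into its localized message
    @Field
    @FieldBridge(impl = HibernateSearchEnumBridgeExample.class)
    @Enumerated(EnumType.STRING)
    private ProductCategory category;
}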

Enforce Loading the Aspect Configurer Before the Session Factory

In order to enforce that the AnnotationBeanConfigurerAspect is created before the Hibernate session factory, we simply implement our own HibernateJpaAutoConfiguration by extending it and adding the AnnotationBeanConfigurerAspect to the constructor. Spring Boot now knows that it needs to instantiate the AnnotationBeanConfigurerAspect before it can instantiate the HibernateJpaAutoConfiguration, and we then have wired beans ready for consumption by the bridge. I found the correct hint here and here.

@Configuration
public class HibernateSearchConfig extends HibernateJpaAutoConfiguration {

	public HibernateSearchConfig(DataSource dataSource, JpaProperties jpaProperties,
				AnnotationBeanConfigurerAspect beanConfigurerAspect,
				ObjectProvider<JtaTransactionManager> jtaTransactionManager,
				ObjectProvider<TransactionManagerCustomizers> transactionManagerCustomizers) {

		// The aspect is never used directly; declaring it as a constructor
		// parameter forces Spring to instantiate it before this configuration.
		super(dataSource, jpaProperties, jtaTransactionManager, transactionManagerCustomizers);
	}
}

As it turned out, using @DependsOn annotations did not work, and ordering the precedence of the beans with @Order was not sufficient either. With this little hack, we can ensure the correct sequence of initialization.


Deploying MySQL in a Local Development Environment

Installing MySQL via apt-get is a simple task, but the migration between different MySQL versions requires planning and testing. Thus, installing one central instance of the database system might not be suitable when the version of MySQL or project-specific settings need to be switched quickly without interfering with other applications; using one central instance can quickly become cumbersome. In this article, I will describe how any number of MySQL instances can be stored and executed from within a user’s home directory.

Adapting MySQL Data and Log File Locations

Some scenarios might require running several MySQL instances at once; other scenarios involve sensitive data, where we do not want MySQL to write any data on non-encrypted partitions. This is especially true for devices which can easily get stolen, for instance laptops. If you use a laptop for developing your applications from time to time, chances are good that you need to store sensitive data in a database, and that data needs to be protected when at rest.

This can be solved with full disk encryption, but this technique has several disadvantages. First of all, full disk encryption only utilises one password. This entails that several users who share a device also need to share one password, which weakens this approach. Also, when the system needs to be rebooted, full disk encryption can become an obstacle, which increases the complexity further.

Much easier to use is transparent home directory encryption, which can be selected out of the box during many modern Linux setup procedures. We will use this encryption type for this article, as it is reasonably secure and easy to set up. Our goal is to store all MySQL related data in the home directory and run MySQL with normal user privileges.

Creating the Directory Structure

The first step is creating a directory structure for storing the data. In this example, the user name is stefan; please adapt it to your needs.

$ cd ~
$ mkdir -p MySQL-5.6-Local/MySQL-5.6-Data # Data storage
$ mkdir -p MySQL-5.6-Local/MySQL-5.6-Conf # Config file
$ mkdir -p MySQL-5.6-Local/MySQL-5.6-Log # Log file
$ mkdir -p MySQL-5.6-Local/MySQL-5.6-Tmp # Tmp
$ mkdir -p MySQL-5.6-Local/MySQL-5.6-Scripts # Start and stop scripts

Create a Configuration File

Make sure to use absolute paths and to utilise the directories we created before. Store this file as MySQL-5.6-Local/MySQL-5.6-Conf/my-5.6.cnf. The configuration is pretty self-explanatory.

[client]
port        = 3306
socket      = /home/stefan/MySQL-5.6-Local/mysqld-5.6.sock

[mysqld_safe]
pid-file    = /home/stefan/MySQL-5.6-Local/mysqld-5.6.pid
nice        = 0
log_error=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Log/mysql_error.log

[mysqld]
user        = stefan
pid-file    = /home/stefan/MySQL-5.6-Local/mysqld-5.6.pid
socket      = /home/stefan/MySQL-5.6-Local/mysqld-5.6.sock
port        = 3306
basedir     = /usr
datadir     = /home/stefan/MySQL-5.6-Local/MySQL-5.6-Data
tmpdir      = /home/stefan/MySQL-5.6-Local/MySQL-5.6-Tmp
log_error=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Log/mysql_error.log
bind-address = 127.0.0.1

Stop the Running MySQL Instance

If you already have a running MySQL instance, make sure to shut it down. You can also disable MySQL from starting automatically.

sudo service mysql stop
echo "manual" | sudo tee /etc/init/mysql.override

Setting up Apparmor

Apparmor protects sensitive applications by defining the directories they may write to. We need to update this configuration to suit our needs. The global configuration file for the central MySQL instance needs to include the additional local configuration. Edit the file /etc/apparmor.d/usr.sbin.mysqld first and make sure that the reference to the local file is not commented out.

# vim:syntax=apparmor
# Last Modified: Fri Feb 28 18:06:30 2014
#include <tunables/global>
 
 
/usr/sbin/mysqld {
  #include <abstractions/base>
  #include <abstractions/nameservice>
  #include <abstractions/user-tmp>
 
# .... shortened
 
  # Include local configuration!!
  include <local/usr.sbin.mysqld>
 
}

Now we need to add the directories in stefan’s home directory to the local file by editing /etc/apparmor.d/local/usr.sbin.mysqld .

# Site-specific additions and overrides for usr.sbin.mysqld. 
# For more details, please see /etc/apparmor.d/local/README. 

/home/stefan/MySQL-5.6-Local/ r, 
/home/stefan/MySQL-5.6-Local/** rwk,

An incorrect Apparmor configuration is often the cause of permission errors, which can be a pain. Make sure to reload the Apparmor service; if you struggle with it, consider disabling it temporarily and check whether the rest works. Do not forget to turn it on again.

sudo service apparmor reload # reload
sudo service apparmor teardown # disable for testing

Initialize the Local MySQL Instance

Now it is time to initialize the MySQL instance. In this step, MySQL creates all the files it needs in the data directory. It is important that the data directory is empty when you run the following command.

sudo mysql_install_db --defaults-file=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Conf/my-5.6.cnf --datadir=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Data --basedir=/usr

Note that this command is marked as deprecated. It still works with MySQL 5.6 and MySQL 5.7, but may be removed in future versions.

Start and Stop the Instance

You can now start the MySQL instance with the following command:

mysqld_safe --defaults-file=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Conf/my-5.6.cnf

For your convenience, add a custom client configuration in your $HOME/.my.cnf and point it to the user-defined socket.

[client]
port        = 3306
socket      = /home/stefan/MySQL-5.6-Local/mysqld-5.6.sock

Startup and shutdown scripts are useful as well. Place both scripts in the scripts directory we created before and make them executable with chmod +x .

# Startup script
mysqld_safe --defaults-file=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Conf/my-5.6.cnf --timezone=utc

The stop script is similar.

# Shutdown script
mysqladmin -u root -S /home/stefan/MySQL-5.6-Local/mysqld-5.6.sock shutdown

Conclusion

The technique described above allows installing and running multiple MySQL instances from within the user’s home directory. The MySQL instances run with user privileges and can utilise dedicated data and log file directories. As all data is stored within the $HOME directory, we can easily apply transparent encryption to protect data at rest.


A MySQL 5.7 Cluster Based on Ubuntu 16.04 LTS – Part 2

In a recent article, I described how to set up a basic MySQL cluster with two data nodes and a combined SQL and management node. In this article, I am going to highlight a few more things and we are going to adapt the cluster a little bit.

Using Hostnames

To make our lives easier, we can use hostnames, which are easier to remember than IP addresses. Hostnames can be specified for each VM in the file /etc/hosts . For each request to a hostname, the operating system will look up the corresponding IP address. We need to change this file on all three nodes as in the following example:

root@node1:~# cat /etc/hosts
127.0.0.1	localhost
192.168.0.81	node1
192.168.0.82	node2
192.168.0.83	node3


# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Now we can lookup IP addresses by the associated hostname.

Adapting Node1

The management and the SQL node are both running in the same VM. Now that we introduced the hostnames for all machines, we need to reconfigure the cluster manager and the MySQL Server. The cluster configuration is stored in the file /opt/mysql/server-5.7/mysql-cluster/config.ini . Replace all the IP addresses with the corresponding host name.

root@node1:~# cat /opt/mysql/server-5.7/mysql-cluster/config.ini
[ndb_mgmd]
NodeId=1
# Management process options:
hostname= node1  # Hostname of the manager
datadir=/opt/mysql/server-5.7/mysql-cluster/cluster-data  # Directory for the log files

[ndbd]
NodeId=2
hostname= node2    # Hostname of the first data node
datadir=/usr/local/mysql/data   # Remote directory for the data files

[ndbd]
NodeId=3
hostname= node3
datadir=/usr/local/mysql/data   # Remote directory for the data files

[mysqld]
# SQL node options:
NodeId=4
hostname=node1

For reconfiguring the SQL node, we need to adapt the file /etc/my.cnf . We need to replace all IP addresses in this file too.

root@node1:~# cat /etc/my.cnf 
[mysqld_safe]
socket          = /tmp/mysql.sock
nice            = 0

[mysqld]
#
# * Basic Settings
#

user            = mysql
pid-file        = /tmp/mysql.pid
socket          = /tmp/mysql.sock
port            = 3306
basedir         = /opt/mysql/server-5.7
datadir         = /opt/mysql/server-5.7/data
tmpdir          = /tmp
skip-external-locking
bind-address=node1

key_buffer_size         = 16M
max_allowed_packet      = 16M
thread_stack            = 192K
thread_cache_size       = 8
query_cache_limit       = 1M
query_cache_size        = 16M

log_error = /var/log/mysql/error.log
expire_logs_days        = 10
max_binlog_size   = 100M

#
# ssl-ca=/etc/mysql/cacert.pem
# ssl-cert=/etc/mysql/server-cert.pem
# ssl-key=/etc/mysql/server-key.pem


sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES 

ndbcluster # run NDB storage engine
ndb-connectstring=node1

[mysql_cluster]
# Options for MySQL Cluster processes:
ndb-connectstring=node1 

If done, stop the cluster and the SQL node like this:

root@node1:~# /opt/mysql/server-5.7/bin/ndb_mgm -e shutdown
Connected to Management Server at: node1:1186
3 NDB Cluster node(s) have shutdown.
Disconnecting to allow management server to shutdown.

root@node1:~# service mysql stop

Adapt the Data Nodes

Replacing the IP addresses with hostnames is pretty straightforward. Change the IP address on each node to the host name in the file /etc/my.cnf :

[mysqld]
# Options for mysqld process:
ndbcluster

[mysql_cluster]
ndb-connectstring=node1

Start the Cluster

Now that we have exchanged all IP addresses for hostnames, we can restart the cluster as follows:

# Login into node1
root@node1:~# /opt/mysql/server-5.7/bin/ndb_mgmd --reload -f /opt/mysql/server-5.7/mysql-cluster/config.ini
MySQL Cluster Management Server mysql-5.7.16 ndb-7.5.4

# Login into each data node and restart the ndbd process
stefan@node2:~$ sudo /opt/mysql/server-5.7/bin/ndbd
stefan@node3:~$ sudo /opt/mysql/server-5.7/bin/ndbd

# Start the MySQL instance
root@node1:~# service mysql start

Now the cluster should be up again. Verify it like this:

root@node1:~# /opt/mysql/server-5.7/bin/ndb_mgm
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: node1:1186
Cluster Configuration
---------------------
[ndbd(NDB)]	2 node(s)
id=2	@192.168.0.82  (mysql-5.7.16 ndb-7.5.4, Nodegroup: 0, *)
id=3	@192.168.0.83  (mysql-5.7.16 ndb-7.5.4, Nodegroup: 0)

[ndb_mgmd(MGM)]	1 node(s)
id=1	@192.168.0.81  (mysql-5.7.16 ndb-7.5.4)

[mysqld(API)]	1 node(s)
id=4	@192.168.0.81  (mysql-5.7.16 ndb-7.5.4)

ndb_mgm> 

Encrypt Data in Transit

For improving the security of the cluster, we can encrypt the traffic which is exchanged between the MySQL server and the MySQL client. Please note that this encryption method is not specific to the cluster, but rather encrypts the traffic between the MySQL client and the MySQL server; you can use this method for standard MySQL server setups as well. It does not cover encryption between the data nodes. By design, data nodes are intended to be run in a private network, not via the open Interwebs.

Create a CA Authority Key and Certificate

To do so, we work on node 1. In the first step, we create a CA key and a CA certificate.

# Change into the installation directory
root@node1: cd /opt/mysql/server-5.7
# Create a folder called certs
root@node1:/opt/mysql/server-5.7# mkdir certs
# Enter the folder
root@node1:/opt/mysql/server-5.7# cd certs/
# Create the CA key
root@node1:/opt/mysql/server-5.7/certs# openssl genrsa 2048 > ca-key.pem
# Using that key, create a new CA certificate valid for 365 days. Leave the password empty 
# Example data to provide in the next step:
## Country Name (2 letter code) [AU]:AT
## State or Province Name (full name) [Some-State]:Tirol
## Locality Name (eg, city) []:Innsbruck
## Organization Name (eg, company) [Internet Widgits Pty Ltd]:Acme
## Organizational Unit Name (eg, section) []:Infra
## Common Name (e.g. server FQDN or YOUR name) []:ca-node
## Email Address []:test@example.org 

openssl req -new -x509 -nodes -days 365 -key ca-key.pem -out ca-cert.pem



Create a Server Certificate

In the second step, we create a server certificate and sign it with the CA from the previous step.

# Create a new key
root@node1:/opt/mysql/server-5.7/certs#  openssl req -newkey rsa:2048 -days 365 -nodes -keyout server-key.pem -out server-req.pem
# Convert the key 
root@node1:/opt/mysql/server-5.7/certs# openssl rsa -in server-key.pem -out server-key.pem
# Sign it with the CA
openssl x509 -req -in server-req.pem -days 365 -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 -out server-cert.pem

The server key is ready. We now need to create a client certificate.

Create a Client Certificate

In a rather similar fashion, we create a client certificate.

# Create the client key and a certificate signing request, again passwordless
openssl req -newkey rsa:2048 -days 365 -nodes -keyout client-key.pem -out client-req.pem
# Sign it with the CA
openssl x509 -req -in client-req.pem -days 365 -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 -out client-cert.pem
# Convert the key
openssl rsa -in client-key.pem -out client-key.pem

Now we have created the keys and can configure MySQL for using SSL encryption.

Configure MySQL to use Encryption

Add the following configuration parameters to the /etc/my.cnf file in order to define server and client certificates and keys.

# MySQL Server
[mysqld]
ssl-ca = /opt/mysql/server-5.7/certs/ca-cert.pem
ssl-key = /opt/mysql/server-5.7/certs/server-key.pem
ssl-cert = /opt/mysql/server-5.7/certs/server-cert.pem

# MySQL Client Configuration
[mysql]
ssl-ca = /opt/mysql/server-5.7/certs/ca-cert.pem
ssl-key = /opt/mysql/server-5.7/certs/client-key.pem
ssl-cert = /opt/mysql/server-5.7/certs/client-cert.pem

Restart the MySQL server and test the SSL encryption. You can immediately see that the client uses SSL, although enforcing SSL does not really make sense for the local user connecting via the Unix socket, as the server warning below points out.

root@node1:/opt/mysql/server-5.7/certs# mysql -u root -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.7.16-ndb-7.5.4-cluster-gpl MySQL Cluster Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

You are enforcing ssl conection via unix socket. Please consider
switching ssl off as it does not make connection via unix socket
any more secure.
mysql> 

Let’s create a new test user, allow remote access from 192.168.0.12 and enforce SSL connections:

mysql> CREATE USER 'ssl-test'@'192.168.0.12' IDENTIFIED BY 'obscure' REQUIRE SSL;
Query OK, 0 rows affected (0.01 sec)

Switch to the console of the machine 192.168.0.12 and try to connect. You can then display some variables about the SSL configuration.

stefan@192.168.0.12:~$ mysql -h 192.168.0.81 -u ssl-test -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 11
Server version: 5.7.16-ndb-7.5.4-cluster-gpl MySQL Cluster Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> 

mysql> show variables like '%ssl%';
+---------------+---------------------------------------------+
| Variable_name | Value                                       |
+---------------+---------------------------------------------+
| have_openssl  | YES                                         |
| have_ssl      | YES                                         |
| ssl_ca        | /opt/mysql/server-5.7/certs/ca-cert.pem     |
| ssl_capath    |                                             |
| ssl_cert      | /opt/mysql/server-5.7/certs/server-cert.pem |
| ssl_cipher    |                                             |
| ssl_crl       |                                             |
| ssl_crlpath   |                                             |
| ssl_key       | /opt/mysql/server-5.7/certs/server-key.pem  |
+---------------+---------------------------------------------+
9 rows in set (0,01 sec)

To verify that you are actually using SSL encryption, the following status command helps; it reports SSL: Cipher in use is DHE-RSA-AES256-SHA :

mysql> status
--------------
mysql  Ver 14.14 Distrib 5.7.16, for Linux (x86_64) using  EditLine wrapper

Connection id:		11
Current database:	
Current user:		ssl-test@192.168.0.12
SSL:			Cipher in use is DHE-RSA-AES256-SHA
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		5.7.16-ndb-7.5.4-cluster-gpl MySQL Cluster Community Server (GPL)
Protocol version:	10
Connection:		192.168.0.81 via TCP/IP
Server characterset:	latin1
Db     characterset:	latin1
Client characterset:	utf8
Conn.  characterset:	utf8
TCP port:		3306
Uptime:			32 min 36 sec

Threads: 4  Questions: 14  Slow queries: 0  Opens: 114  Flush tables: 2  Open tables: 9  Queries per second avg: 0.007
--------------
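
The same requirement can be enforced from application code. The following JDBC sketch uses Connector/J SSL properties; the keystores are assumed to have been created from the PEM files with keytool, and all paths and passwords are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class SslConnectionTest {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "ssl-test");
        props.setProperty("password", "obscure");
        // Ask Connector/J to use and require SSL
        props.setProperty("useSSL", "true");
        props.setProperty("requireSSL", "true");
        // Keystores converted from the PEM files with keytool (placeholder paths)
        props.setProperty("trustCertificateKeyStoreUrl", "file:/home/stefan/certs/truststore.jks");
        props.setProperty("trustCertificateKeyStorePassword", "changeit");
        props.setProperty("clientCertificateKeyStoreUrl", "file:/home/stefan/certs/keystore.jks");
        props.setProperty("clientCertificateKeyStorePassword", "changeit");

        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://192.168.0.81:3306/test", props)) {
            System.out.println("Connected via SSL: " + !con.isClosed());
        }
    }
}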

What about the Data Nodes?

So far we have encrypted only the traffic between the MySQL client and the MySQL server. This configuration does not differ from a single server or replication setup and does not include the traffic between the data nodes, which is not encrypted at all. MySQL Cluster has been designed to run in a controlled environment within a local high-speed network. The MySQL Cluster FAQ states:

It is very unlikely that a cluster would perform reliably under such conditions, as NDB Cluster was designed and implemented with the assumption that it would be run under conditions guaranteeing dedicated high-speed connectivity such as that found in a LAN setting using 100 Mbps or gigabit Ethernet—preferably the latter. We neither test nor warrant its performance using anything slower than this.

Also, it is extremely important to keep in mind that communications between the nodes in an NDB Cluster are not secure; they are neither encrypted nor safeguarded by any other protective mechanism. The most secure configuration for a cluster is in a private network behind a firewall, with no direct access to any Cluster data or management nodes from outside.


A MySQL 5.7 Cluster Based on Ubuntu 16.04 LTS – Part 1

A Cluster Scenario

In this example we create the smallest possible MySQL cluster based on four nodes running on three machines. Node 1 will run the cluster management software, Node 2 and Node 3 will serve as data nodes, and Node 4 is the MySQL API node, which runs on the same VM as Node 1.

  +------------------+ +------------------+ +-------------------+ 
  |    Node 1 & 4    | |      Node 2      | |       Node 3      | 
  | IP: 192.168.0.81 | | IP: 192.168.0.82 | | IP: 192.168.0.83  | 
  |  ndb_mgm; MySQL  | |       ndbd       | |       ndbd        |
  +------------------+ +------------------+ +-------------------+

+------------------------------------------------------------------+
|                         VirtualBox Host                          |
+------------------------------------------------------------------+

What is a Cluster and When to Use it?

When applications grow, there are several possibilities to improve the performance of the database layer. The two technologies used in most scenarios are replication and clustering.

Replication is used for moving the data from a master database to one or more slave databases. By default, this data exchange is asynchronous. Semi-synchronous data replication can be implemented, but if you need synchronous data replication on several machines, clusters are the way to go.

A MySQL cluster implements a shared-nothing architecture and supports synchronous replication. You can very easily add and remove nodes to and from your cluster and scale out horizontally. This is especially true if you need to scale database writes. Clusters use a special storage engine called NDB. This technology offers high availability and high redundancy. The documentation for a MySQL 5.7 cluster can be found here.

Preconditions

In this example, we will use VirtualBox as the virtualization platform and run three nodes as the smallest possible cluster. In the first step, we create a template virtual machine, which comes with the basic packages, passwordless key-based authentication and a manually configured IP address. We will clone this machine and adapt it to our needs. Parts of this tutorial are based on the Digital Ocean howto.

The Template

We create the first VM based on Ubuntu 16.04 LTS and install the basics. MySQL requires the libaio1 package, which we install in the template like this:

sudo apt-get install libaio1

Then it is time to download the MySQL Cluster software, which can be obtained for free from the MySQL developer page. These are the two necessary steps:

# Download
wget https://dev.mysql.com/get/Downloads/MySQL-Cluster-7.5/mysql-cluster-gpl-7.5.4-debian7-x86_64.deb
# Install
sudo dpkg -i mysql-cluster-gpl-7.5.4-debian7-x86_64.deb

Assign a static IP address in the file /etc/network/interfaces .

# The loopback network interface
auto lo
iface lo inet loopback

## The primary network interface
auto enp0s3
iface enp0s3 inet static
address 192.168.0.81
netmask 255.255.255.0
gateway 192.168.0.2
dns-nameservers 192.168.0.2,8.8.8.8

Make sure to install the guest additions, add a MySQL user, copy the MySQL startup script, etc. Now the template machine is ready. Shut it down and clone it twice. Make sure to assign a new MAC address to each clone and then a different, static IP address.

Setting Up the Machines

In the following steps, we are going to set up the virtual machines. We run three machines which provide four services.

Install the Manager on Node 1

The cluster will be controlled via Node 1. First, create a data directory, where we will store the configuration files.

mkdir /opt/mysql/server-5.7/mysql-cluster/cluster-data
chown -R mysql:mysql /opt/mysql/server-5.7/mysql-cluster/cluster-data

Then create the configuration file. This file defines the cluster and will be read during the startup of the cluster. Each node needs to have a unique ID. Usually, the IDs start with 0, but in this example we defined the node IDs starting from 1 in order to align them with the IP addresses.

root@node1:~# cat /opt/mysql/server-5.7/mysql-cluster/config.ini
[ndb_mgmd]
NodeId=1
# Management process options:
hostname=192.168.0.81 #node1  # Hostname of the manager
datadir=/opt/mysql/server-5.7/mysql-cluster/cluster-data  # Directory for the log files

[ndbd]
NodeId=2
hostname=192.168.0.82 #node2    # Hostname of the first data node
datadir=/usr/local/mysql/data   # Remote directory for the data files

[ndbd]
NodeId=3
hostname=192.168.0.83
datadir=/usr/local/mysql/data   # Remote directory for the data files

[mysqld]
# SQL node options:
NodeId=4
hostname=192.168.0.81

Note that if you make changes to the config.ini file, they might not be loaded directly, but read from a cache file. So if you change, for instance, the data directory, make sure to delete all cached files, e.g. ndb_1_config.bin.1*.

Install the MySQL Server on Node1

The installation of the MySQL service is pretty straightforward and follows a similar pattern as a regular MySQL setup. You just need to ensure that you use the cluster version, which we already installed in the template machine, and create the symlinks accordingly. The configuration of the MySQL server is done in /etc/my.cnf , as usual. The following shows an example configuration.

root@node1:~# cat /etc/my.cnf 
[mysqld_safe]
socket          = /tmp/mysql.sock
nice            = 0

[mysqld]
#
# * Basic Settings
#

user            = mysql
pid-file        = /tmp/mysql.pid
socket          = /tmp/mysql.sock
port            = 3306
basedir         = /opt/mysql/server-5.7
datadir         = /opt/mysql/server-5.7/data
tmpdir          = /tmp

skip-external-locking

key_buffer_size         = 16M
max_allowed_packet      = 16M


query_cache_limit       = 1M
query_cache_size        = 16M

log_error = /var/log/mysql/error.log

server-id              = 1
expire_logs_days       = 10
max_binlog_size   = 100M


sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES 

ndbcluster # run NDB storage engine
ndb-connectstring=192.168.0.81

[mysql_cluster]
# Options for MySQL Cluster processes:
ndb-connectstring=192.168.0.81  # location of management server

The most important bits are at the bottom of the file. The ndbcluster directive enables the NDB storage engine and the section [mysql_cluster] points to the cluster management node. In our case, the MySQL server node and the cluster management node are on the same machine. In larger settings it can be beneficial to have a dedicated VM for each task.

Make sure to set the permissions for the data directory correctly and to place the startup script in /etc/init.d/mysql . Then start the MySQL server on Node 1 and check the error.log file for any strange messages.

Setting up the Data Nodes

In the next step, we need to configure the data nodes. Launch Node 2 and login.

# Create the data directory
sudo mkdir -p /var/mysql/data
# permissions
sudo chown -R mysql:mysql /var/mysql

The basic configuration is very simple. Just edit the /etc/my.cnf file and provide the address of the management node.

stefan@node2:~$ cat /etc/my.cnf 
[mysqld]
# Options for mysqld process:
ndbcluster  

[mysql_cluster]
ndb-connectstring=192.168.0.81

Then we can start the data node with the following command:

sudo /opt/mysql/server-5.7/bin/ndbd  --initial

You should see something like this:

stefan@node3:~$ sudo /opt/mysql/server-5.7/bin/ndbd 
2016-11-26 16:39:29 [ndbd] INFO     -- Angel connected to '192.168.0.81:1186'
2016-11-26 16:39:29 [ndbd] INFO     -- Angel allocated nodeid: 3

If something goes wrong, check the error log file called ndb_X_out.log , where X is the node ID. In this example the actual file name is ndb_2_out.log . Proceed the same way with the second data node, Node 3.

Starting the Cluster

Now comes the great moment: we start the cluster. Please note that you cannot start the cluster before you have configured at least two data nodes and a MySQL API node. All components need to be running, otherwise there will be error messages.

/opt/mysql/server-5.7/bin/ndb_mgmd --reload -f /opt/mysql/server-5.7/mysql-cluster/config.ini

If there are no errors, we can login into the management console for the cluster with the following command.

/opt/mysql/server-5.7/bin/ndb_mgm

You can then get an overview of your cluster with the show  command. In the ideal case, the output reads as follows:

root@node1:~# /opt/mysql/server-5.7/bin/ndb_mgm
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: 192.168.0.81:1186
Cluster Configuration
---------------------
[ndbd(NDB)]	2 node(s)
id=2	@192.168.0.82  (mysql-5.7.16 ndb-7.5.4, Nodegroup: 0, *)
id=3	@192.168.0.83  (mysql-5.7.16 ndb-7.5.4, Nodegroup: 0)

[ndb_mgmd(MGM)]	1 node(s)
id=1	@192.168.0.81  (mysql-5.7.16 ndb-7.5.4)

[mysqld(API)]	1 node(s)
id=4	@192.168.0.81  (mysql-5.7.16 ndb-7.5.4)

Connecting to MySQL

Now that our cluster is up and running, we can interact with MySQL by connecting to the MySQL server running on Node 1. In order to distribute the data automatically across the nodes, we need to utilise the NDB engine when we create our tables. NDB is a distributed storage engine that replaces InnoDB for clustered tables and comes with a different feature set; a comparison is given here. Connect to the MySQL instance on Node 1 as you always would.

root@node1:~# mysql -u root -p
Enter password:

Then we can create a test table to see if the system works properly, as shown in the sketch below. In contrast to replication, the nodes do not each store the whole data set; rather, the data is distributed over at least two nodes. This is also the reason why we need to have at least two data nodes.
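
As a quick smoke test, the following minimal JDBC sketch creates an NDB-backed table and reads it back. Host, credentials and the test database are assumptions; the plain mysql client works equally well:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ClusterSmokeTest {
    public static void main(String[] args) throws Exception {
        // Connect to the SQL node (node1); host, user and password are placeholders
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://192.168.0.81:3306/test", "root", "secret");
             Statement stmt = con.createStatement()) {
            // ENGINE=NDBCLUSTER distributes the table over the data nodes
            stmt.execute("CREATE TABLE IF NOT EXISTS cluster_test "
                    + "(id INT PRIMARY KEY, note VARCHAR(64)) ENGINE=NDBCLUSTER");
            // REPLACE keeps the test repeatable
            stmt.execute("REPLACE INTO cluster_test VALUES (1, 'hello cluster')");
            try (ResultSet rs = stmt.executeQuery("SELECT note FROM cluster_test")) {
                while (rs.next()) {
                    System.out.println(rs.getString("note"));
                }
            }
        }
    }
}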

Sakila in the Cluster

We can use the famous Sakila database, available from the MySQL web site, to briefly test our cluster. The following series of commands retrieves the database and imports it into our cluster.

# Fetch the data
wget http://downloads.mysql.com/docs/sakila-db.tar.gz
# Extract
tar xvfz sakila-db.tar.gz 
cd sakila-db/
# Replace the default engine with the cluster engine
sed -i 's/ENGINE=InnoDB/ENGINE=NDBCLUSTER/g' sakila-schema.sql 
# Import schema and data
mysql -u root -p  < sakila-schema.sql 
mysql -u root -p sakila < sakila-data.sql 

Now the data resides in our cluster and is distributed over the two data nodes. We can run the following sample query and the system transparently fetches the data from the two nodes.

USE sakila;
SELECT CONCAT(customer.last_name, ', ', customer.first_name) AS customer,
    address.phone, film.title
    FROM rental INNER JOIN customer ON rental.customer_id = customer.customer_id
    INNER JOIN address ON customer.address_id = address.address_id
    INNER JOIN inventory ON rental.inventory_id = inventory.inventory_id
    INNER JOIN film ON inventory.film_id = film.film_id
    WHERE rental.return_date IS NULL
    AND rental_date + INTERVAL film.rental_duration DAY < CURRENT_DATE();
...
183 rows in set (0.00 sec)

Caveats

If you try to start the cluster before all components are up and ready, finding the error can be a bit tricky. In the following example, we did not start the data nodes. The MySQL API node is running, but it does not show up. 

root@node1:~# /opt/mysql/server-5.7/bin/ndb_mgm
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: 192.168.0.81:1186
Cluster Configuration
---------------------
[ndbd(NDB)]	2 node(s)
id=2 (not connected, accepting connect from 192.168.0.82)
id=3 (not connected, accepting connect from 192.168.0.83)

[ndb_mgmd(MGM)]	1 node(s)
id=1	@192.168.0.81  (mysql-5.7.16 ndb-7.5.4)

[mysqld(API)]	1 node(s)
id=4 (not connected, accepting connect from 192.168.0.81)

Check the error log of the management node in that case. As long as the cluster does not consist of at least two data nodes and a MySQL API node, it will not start.


Parsing SQL Statements

JDBC and the Limits of ResultSet Metadata

For my work in the area of data citation, I need to analyse queries which are used for creating subsets. I am particularly interested in query parameters, sortings and filters. One of the most commonly used query languages is SQL, which is used by many relational database management systems such as MySQL. In some cases, the interaction with databases is abstracted, meaning that hardly any SQL statements are executed directly; the SQL statements are rather built on the fly by object-relational mappers such as Hibernate. Other scenarios use SQL statements as strings or as prepared statements, which are executed via JDBC. However, analysing SQL statements is tricky, as the language is very flexible.

In order to understand which columns have been selected, it is sufficient to utilise the ResultSet metadata and retrieve the column names from there. In my case, however, I need to extract this information from the query in advance and potentially enforce a specific sorting by adding columns to the ORDER BY clause. In this scenario, I need to parse the SQL statement and retrieve this information from the statement itself. Probably the best way to do this would be to implement a parser for the SQL dialect with ANTLR (ANother Tool for Language Recognition). But this is quite a challenge, so I decided to take a shortcut: FoundationDB.
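
For the simple case, a plain JDBC sketch like the following would suffice; the connection details and the query are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class SelectedColumns {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/sakila", "user", "password");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT first_name, last_name FROM customer")) {
            ResultSetMetaData meta = rs.getMetaData();
            // Result set columns are indexed starting at 1
            for (int i = 1; i <= meta.getColumnCount(); i++) {
                System.out.println("Column " + i + ": " + meta.getColumnName(i));
            }
        }
    }
}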

The FoundationDB Parser

FoundationDB was a NoSQL database which provided several layers for supporting different paradigms at once. I am using the past tense here because the project was acquired by Apple in 2015, and the open source project has not been pursued since. However, the Maven libraries for the software are still available at Maven Central. FoundationDB used its own SQL parser, which understands standard SQL queries. These queries can be interpreted as a tree, and the parser library allows traversing SQL statements and analysing the nodes. We can use this tree to parse and interpret SQL statements and extract additional information.

The Foundations of FoundationDB

The FoundationDB parser can be included into your own project with the following Maven dependency:

<!-- https://mvnrepository.com/artifact/com.foundationdb/fdb-sql-parser -->
<dependency>
    <groupId>com.foundationdb</groupId>
    <artifactId>fdb-sql-parser</artifactId>
    <version>1.6.1</version>
</dependency>

The usage of the parser is straightforward. We use the following example SQL statement as input:

SELECT a.firstColumn,b.secondColumn,b.thirdColumn
	FROM tableA AS a, tableB AS b 
	WHERE a.firstColumn = b.secondColumn AND 
	b.thirdColumn < 5 
	ORDER BY a.thirdColumn,a.secondColumn DESC

The following function calls the parser and prints the tree of the statement.

    /**
     * Parse a SQL statement and print its tree
     * @param sqlString the statement to parse
     */
    public void parseSQLString(String sqlString) {
        SQLParser parser = new SQLParser();
        try {
            StatementNode stmt = parser.parseStatement(sqlString);
            stmt.treePrint();
        } catch (StandardException e) {
            e.printStackTrace();
        }
    }

The resulting tree is listed below. The statement has also been normalized, which ensures a stable sequence of the parameters.

com.foundationdb.sql.parser.CursorNode@728938a9
name: null
updateMode: UNSPECIFIED
statementType: SELECT
resultSet: 	
	com.foundationdb.sql.parser.SelectNode@21b8d17c
	isDistinct: false
	resultColumns: 		
		com.foundationdb.sql.parser.ResultColumnList@6433a2

		[0]:		
		com.foundationdb.sql.parser.ResultColumn@5910e440
		exposedName: firstcolumn
		name: firstcolumn
		tableName: null
		isDefaultColumn: false
		type: null
		expression: 			
			com.foundationdb.sql.parser.ColumnReference@6267c3bb
			columnName: firstcolumn
			tableName: a
			type: null
		[1]:		
		com.foundationdb.sql.parser.ResultColumn@533ddba
		exposedName: secondcolumn
		name: secondcolumn
		tableName: null
		isDefaultColumn: false
		type: null
		expression: 			
			com.foundationdb.sql.parser.ColumnReference@246b179d
			columnName: secondcolumn
			tableName: b
			type: null
		[2]:		
		com.foundationdb.sql.parser.ResultColumn@7a07c5b4
		exposedName: thirdcolumn
		name: thirdcolumn
		tableName: null
		isDefaultColumn: false
		type: null
		expression: 			
			com.foundationdb.sql.parser.ColumnReference@26a1ab54
			columnName: thirdcolumn
			tableName: b
			type: null
	fromList: 		
		com.foundationdb.sql.parser.FromList@3d646c37

		[0]:		
		com.foundationdb.sql.parser.FromBaseTable@41cf53f9
		tableName: tablea
		updateOrDelete: null
		null
		correlation Name: a
		a
		[1]:		
		com.foundationdb.sql.parser.FromBaseTable@5a10411
		tableName: tableb
		updateOrDelete: null
		null
		correlation Name: b
		b
	whereClause: 		
		com.foundationdb.sql.parser.AndNode@2ef1e4fa
		operator: and
		methodName: and
		type: null
		leftOperand: 			
			com.foundationdb.sql.parser.BinaryRelationalOperatorNode@306a30c7
			operator: =
			methodName: equals
			type: null
			leftOperand: 				
				com.foundationdb.sql.parser.ColumnReference@b81eda8
				columnName: firstcolumn
				tableName: a
				type: null
			rightOperand: 				
				com.foundationdb.sql.parser.ColumnReference@68de145
				columnName: secondcolumn
				tableName: b
				type: null
		rightOperand: 			
			com.foundationdb.sql.parser.BinaryRelationalOperatorNode@27fa135a
			operator: <
			methodName: lessThan
			type: null
			leftOperand: 				
				com.foundationdb.sql.parser.ColumnReference@46f7f36a
				columnName: thirdcolumn
				tableName: b
				type: null
			rightOperand: 				
				com.foundationdb.sql.parser.NumericConstantNode@421faab1
				value: 5
				type: INTEGER NOT NULL
orderByList: 	
	com.foundationdb.sql.parser.OrderByList@2b71fc7e
	allAscending: false
	[0]:	
	com.foundationdb.sql.parser.OrderByColumn@5ce65a89
	ascending: true
	nullsOrderedLow: false
	columnPosition: -1
	expression: 		
		com.foundationdb.sql.parser.ColumnReference@25f38edc
		columnName: thirdcolumn
		tableName: a
		type: null
	[1]:	
	com.foundationdb.sql.parser.OrderByColumn@1a86f2f1
	ascending: false
	nullsOrderedLow: false
	columnPosition: -1
	expression: 		
		com.foundationdb.sql.parser.ColumnReference@3eb07fd3
		columnName: secondcolumn
		tableName: a
		type: null

This tree offers a lot of information, which can also be used programmatically. At the top of the output, we can see that the statement was a SELECT statement and that it was not DISTINCT. Then follows the result set, which contains the list of the three ResultColumns that have been specified in the SELECT clause. We can see the column names and the table aliases from which they are drawn. The next block provides the referenced tables (the FROM list) and their alias names. The WHERE block contains the operands which have been used for filtering, and last but not least there is the list of ORDER BY clauses and their sorting directions.

The Visitor

In order to access the information shown above programmatically, we need to visit the nodes of the tree one by one. This can be achieved with the visitor pattern, which traverses all the nodes of the tree. The following listing shows how the visitor pattern can be used for accessing the list of columns from the SELECT clause.

    /**
     * Return the list of columns of a SELECT clause
     * @param sql the statement to analyse
     * @return the result columns
     */
    public ArrayList<ResultColumn> selectColumnsList(String sql) {
        SQLParser parser = new SQLParser();
        BooleanNormalizer normalizer = new BooleanNormalizer(parser);
        StatementNode stmt = null;

        try {
            stmt = parser.parseStatement(sql);
            stmt = normalizer.normalize(stmt);
        } catch (StandardException e) {
            e.printStackTrace();
        }


        final ArrayList<ResultColumn> columns = new ArrayList<ResultColumn>();
        Visitor v = new Visitor() {

            @Override
            public boolean visitChildrenFirst(Visitable node) {
                if (node instanceof SelectNode)
                    return true;
                return false;
            }

            @Override
            public Visitable visit(Visitable node) throws StandardException {
                if (node instanceof ResultColumn) {
                    ResultColumn resultColumn = (ResultColumn) node;
                    columns.add(resultColumn);
                    System.out.println("Column " + columns.size()+ ": " + resultColumn.getName());
                }
                return null;
            }

            @Override
            public boolean stopTraversal() {
                // TODO Auto-generated method stub
                return false;
            }

            @Override
            public boolean skipChildren(Visitable node) throws StandardException {
                if (node instanceof FromList) {
                    return true;
                }
                return false;
            }
        };

        try {
            stmt.accept(v);
        } catch (StandardException e) {
            e.printStackTrace();
        }

        return columns;

    }

In this code example, we define a visitor which traverses all the ResultColumn nodes. Every time the current node is an instance of ResultColumn, we add it to our list of columns. The nodes are only visited if they are children of a SELECT statement; this is our entry point into the tree. We leave the tree when we reach the FROM list. We then apply the visitor to the statement, which initiates the traversal. As a result, we receive the list of columns which make up the result set.

In order to get the list of ORDER BY columns, we can utilise a similar approach. The following function gives an example:

    /**
     * Return the list of ORDER BY clauses
     * @param sqlText the statement to analyse
     * @return the ORDER BY columns
     */
    public OrderByList orderByColumns(String sqlText) {

		SQLParser parser = new SQLParser();
		BooleanNormalizer normalizer = new BooleanNormalizer(parser);
		StatementNode stmt;
        OrderByList orderByList = null;
        try {
			stmt = parser.parseStatement(sqlText);
			stmt = normalizer.normalize(stmt);
            CursorNode node = (CursorNode) stmt;
            orderByList = node.getOrderByList();
            int i=0;
            for(OrderByColumn orderByColumn : orderByList){
                i++;
                String direction;
                if(orderByColumn.isAscending()){
                    direction="ASC";
                }else{
                    direction="DESC";
                }
                System.out.println("ORDER BY Column " +i+ ": " +orderByColumn.getExpression().getColumnName()+ " Direction: " + direction );

            }
        } catch (StandardException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
		return orderByList;


	}

This time, we retrieve the list of ORDER BY columns directly from the CursorNode. Similar principles can be used for manipulating SQL statements, for instance to apply a different sorting.
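
As a sketch of the same principle applied to filters, the following visitor collects the columns referenced by comparison predicates in the WHERE clause. The accessors on BinaryRelationalOperatorNode and ColumnReference follow the Derby-derived API visible in the tree output above; verify the exact method names against the parser version in use:

    /**
     * Print the columns used in WHERE clause comparison predicates
     * @param sqlText the statement to analyse
     */
    public void printFilterColumns(String sqlText) {
        SQLParser parser = new SQLParser();
        try {
            StatementNode stmt = parser.parseStatement(sqlText);
            stmt.accept(new Visitor() {

                @Override
                public Visitable visit(Visitable node) throws StandardException {
                    if (node instanceof BinaryRelationalOperatorNode) {
                        BinaryRelationalOperatorNode predicate = (BinaryRelationalOperatorNode) node;
                        // Only report predicates whose left operand is a column reference
                        if (predicate.getLeftOperand() instanceof ColumnReference) {
                            ColumnReference column = (ColumnReference) predicate.getLeftOperand();
                            System.out.println("Filter: " + column.getTableName() + "."
                                    + column.getColumnName() + " " + predicate.getOperator());
                        }
                    }
                    return node;
                }

                @Override
                public boolean visitChildrenFirst(Visitable node) {
                    return false;
                }

                @Override
                public boolean stopTraversal() {
                    return false;
                }

                @Override
                public boolean skipChildren(Visitable node) throws StandardException {
                    return false;
                }
            });
        } catch (StandardException e) {
            e.printStackTrace();
        }
    }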