Hibernate Search and Spring Boot: Building Bridges

Hibernate Search is a very convenient way to store database content in a Lucene index and to add full-text search capabilities to data-driven projects simply by annotating classes. It integrates easily into Spring Boot applications and, as long as only the basic features are used, it works out of the box. The fun starts when the autoconfiguration cannot figure out how to configure things on its own; then it gets tricky quite quickly. Of course this is natural behaviour, but one gets spoiled quite fast.
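To illustrate what annotating classes means in practice, here is a minimal sketch of an indexed entity (the entity and its fields are made up for this example; @Indexed and @Field come from the org.hibernate.search.annotations package):

@Entity
@Indexed // instructs Hibernate Search to maintain a Lucene index for this entity
public class Book {

    @Id
    @GeneratedValue
    private Long id;

    @Field // indexed and tokenized by default, hence available for full-text search
    private String title;

    @Field
    private String author;

    // getters and setters omitted
}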

Using the latest Features: Hibernate ORM, Hibernate Search and Spring Boot

The current version of Spring Boot is 1.5.2. This version uses Hibernate ORM 5.0. The latest stable Hibernate Search versions are 5.6.1.Final and 5.7.0.Final, which in contrast require Hibernate ORM 5.1 and 5.2 respectively. You also need Java 8 now. For this reason, if you need the latest Hibernate Search features in combination with Spring Boot, you need to adapt the dependencies as follows:

buildscript {
	ext {
		springBootVersion = '1.5.1.RELEASE'
	}
	repositories {
		mavenCentral()
	}
	dependencies {
		classpath("org.springframework.boot:spring-boot-gradle-plugin:${springBootVersion}")
	}
}

apply plugin: 'java'
apply plugin: 'eclipse'
apply plugin: 'org.springframework.boot'

jar {
	baseName = 'SearchaRoo'
	version = '0.0.1-SNAPSHOT'
}

sourceCompatibility = 1.8
targetCompatibility = 1.8

repositories {
	mavenCentral()
}


dependencies {
	compile('org.springframework.boot:spring-boot-starter-data-jpa')
	compile('org.springframework.boot:spring-boot-starter-web')
	testCompile('org.springframework.boot:spring-boot-starter-test')
	compile("mysql:mysql-connector-java")
	compile('org.apache.commons:commons-lang3:3.5')
	compile('org.springframework.boot:spring-boot-starter-logging')
	compile('org.springframework.boot:spring-boot-starter-freemarker')

	// Hibernate Search
	compile("org.hibernate:hibernate-core:5.2.8.Final")
	compile("org.hibernate:hibernate-search-orm:5.7.0.Final")
}

configurations.all {
	exclude group: "org.hibernate", module: "hibernate-entitymanager"
}

Note that the Hibernate Entity Manager needs to be excluded, because it has been integrated into the core in the new Hibernate version. Details are given in the Spring Boot documentation.

Enforcing the Dependencies to be Loaded in the Correct Sequence 

As written earlier, Spring Boot takes care of a lot of configuration for us. Most of the time this works perfectly and reduces the pain of configuring a new application manually. In some particular cases, however, Spring cannot figure out that there is a dependency between different services which needs to be resolved in a specific order. A typical use case is the implementation of FieldBridges for Hibernate Search. FieldBridges translate between the actual object from the Java world and the representation of that object in the Lucene index. Typically an EnumBridge is used for indexing enums, which are often used for realizing internationalization (I18n).

When the Lucene index is created, Hibernate checks whether enum fields need to be indexed and whether there exists a bridge that converts between the object and the actual record in the index. The problem here is that Hibernate JPA is loaded at a very early stage in the Spring Boot startup process. The problem only arises if the bridge class uses @Autowired fields which get injected. Typically, these fields would get injected when the AnnotationBeanConfigurerAspect bean is loaded. Hibernate, however, creates the session with the session factory auto-configuration before the Spring configurer aspect bean is loaded. So the FieldBridge used by Hibernate during the initialization of the index does not have the service injected yet, causing a nasty NullPointerException.

Example EnumBridge

The following EnumBridge example utilises an injected service, which needs to be available before Hibernate starts. If not taken care of, this causes a NullPointerException.

@Configurable
public class HibernateSearchEnumBridgeExample extends EnumBridge
{
    private I18nMessageService i18nMessageService;

    @Autowired
    @Required
    public void setI18nMessageService(I18nMessageService service) {
	this.i18nMessageService = service;
    }

    @Override
    public String objectToString(Object object)
    {
         return  i18nMessageService.getMessageForEnum(object);
    }


    @Override
    public Enum<? extends Enum> stringToObject(String name)
    {
        // Delegate to EnumBridge, which knows the concrete enum class;
        // Enum.valueOf() cannot resolve the value without that class argument.
        return super.stringToObject(name);
    }

    @Override
    public void setAppliedOnType(Class<?> returnType)
    {
        super.setAppliedOnType(returnType);
    }
}
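For completeness, this is roughly how such a bridge is attached to an indexed field. The entity field shown here is hypothetical, but referencing a custom bridge via @FieldBridge(impl = ...) is the standard Hibernate Search mechanism:

// Hypothetical enum field of an indexed entity
@Field
@FieldBridge(impl = HibernateSearchEnumBridgeExample.class)
private Status status;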

Enforce Loading the Aspect Configurer Before the Session Factory

In order to enforce that the AnnotationBeanConfigurerAspect is created before the Hibernate session factory, we simply implement our own HibernateJpaAutoConfiguration by extension and add the AnnotationBeanConfigurerAspect to the constructor. Spring Boot now knows that it needs to instantiate the AnnotationBeanConfigurerAspect before it can instantiate the HibernateJpaAutoConfiguration, and we then have wired beans ready for consumption by the bridge. I found the correct hint here and here.

@Configuration
public class HibernateSearchConfig extends HibernateJpaAutoConfiguration {

	public HibernateSearchConfig(DataSource dataSource, JpaProperties jpaProperties,
			AnnotationBeanConfigurerAspect beanConfigurerAspect,
			ObjectProvider<JtaTransactionManager> jtaTransactionManager,
			ObjectProvider<TransactionManagerCustomizers> transactionManagerCustomizers) {

		// beanConfigurerAspect is never used directly; declaring it as a
		// constructor parameter forces Spring to initialize it first.
		super(dataSource, jpaProperties, jtaTransactionManager, transactionManagerCustomizers);
	}
}

As it turned out, using @DependsOn annotations did not work, and ordering the precedence of the beans with @Order was not sufficient either. With this little hack, we can ensure the correct sequence of initialization.



Deploying MySQL in a Local Development Environment

Installing MySQL via apt-get is a simple task, but the migration between different MySQL versions requires planning and testing. Thus, installing one central instance of the database system might not be suitable when the MySQL version or project-specific settings need to be switched quickly without interfering with other applications. Using one central instance can quickly become cumbersome. In this article, I will describe how any number of MySQL instances can be stored and executed from within a user's home directory.

Adapting MySQL Data and Log File Locations

Some scenarios might require running several MySQL instances at once; other scenarios cover sensitive data, where we do not want MySQL to write any data to non-encrypted partitions. This is especially true for devices which can easily get stolen, for instance laptops. If you use a laptop for developing your applications from time to time, chances are good that you need to store sensitive data in a database and have to make sure that this data is encrypted at rest.

This can be solved with full disk encryption, but this technique has several disadvantages. First of all, full disk encryption only utilises one password. This entails that several users who share a device also need to share one password, which weakens this approach. Also, when the system needs to be rebooted, full disk encryption can become an obstacle, which increases the complexity further.

Much easier to use is transparent home directory encryption, which many modern Linux setup procedures offer out of the box. We will use this encryption type for this article, as it is reasonably secure and easy to set up. Our goal is to store all MySQL related data in the home directory and run MySQL with normal user privileges.

Creating the Directory Structure

The first step is creating a directory structure for storing the data. In this example, the user name is stefan; please adapt the paths to your needs.

$ cd ~
$ mkdir -p MySQL-5.6-Local/MySQL-5.6-Data # Data storage
$ mkdir -p MySQL-5.6-Local/MySQL-5.6-Conf # Config file
$ mkdir -p MySQL-5.6-Local/MySQL-5.6-Log # Log file
$ mkdir -p MySQL-5.6-Local/MySQL-5.6-Tmp # Tmp
$ mkdir -p MySQL-5.6-Local/MySQL-5.6-Scripts # Start and stop scripts

Create a Configuration File

Make sure to use absolute paths and to reference the directories we created before. Store this file as MySQL-5.6-Local/MySQL-5.6-Conf/my-5.6.cnf. The configuration is pretty self-explanatory.

[client]
port        = 3306
socket      = /home/stefan/MySQL-5.6-Local/mysqld-5.6.sock

[mysqld_safe]
pid-file    = /home/stefan/MySQL-5.6-Local/mysqld-5.6.pid
nice        = 0
log_error=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Log/mysql_error.log

[mysqld]
user        = stefan
pid-file    = /home/stefan/MySQL-5.6-Local/mysqld-5.6.pid
socket      = /home/stefan/MySQL-5.6-Local/mysqld-5.6.sock
port        = 3306
basedir     = /usr
datadir     = /home/stefan/MySQL-5.6-Local/MySQL-5.6-Data
tmpdir      = /home/stefan/MySQL-5.6-Local/MySQL-5.6-Tmp
log_error=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Log/mysql_error.log
bind-address = 127.0.0.1

Stop the Running MySQL Instance

If you already have a running MySQL instance, make sure to shut it down. You can also disable MySQL from starting automatically.

sudo service mysql stop
echo "manual" | sudo tee /etc/init/mysql.override

Setting up Apparmor

Apparmor protects sensitive applications by defining the directories they may write to. We need to update this configuration to suit our needs, and make sure that the global configuration file for the central MySQL instance also includes the additional local configuration. Edit the file /etc/apparmor.d/usr.sbin.mysqld first and make sure that the reference to the local file is not commented out.

# vim:syntax=apparmor
# Last Modified: Fri Feb 28 18:06:30 2014
#include <tunables/global>
 
 
/usr/sbin/mysqld {
  #include <abstractions/base>
  #include <abstractions/nameservice>
  #include <abstractions/user-tmp>
 
# .... shortened
 
  # Include local configuration!!
  include <local/usr.sbin.mysqld>
 
}

Now we need to add the directories in stefan's home directory to the local file by editing /etc/apparmor.d/local/usr.sbin.mysqld .

# Site-specific additions and overrides for usr.sbin.mysqld.
# For more details, please see /etc/apparmor.d/local/README.

/home/stefan/MySQL-5.6-Local/ r, 
/home/stefan/MySQL-5.6-Local/** rwk,

An incorrect Apparmor configuration is often the cause of permission errors, which can be a pain. Make sure to reload the Apparmor service after every change. If you struggle with it, consider disabling it temporarily and check whether the rest works. Do not forget to turn it on again.

sudo service apparmor reload # reload
sudo service apparmor teardown # disable for testing

Initialize the Local MySQL Instance

Now it is time to initialize the MySQL instance. In this step, MySQL creates all the files it needs in the data directory. It is important that the data directory is empty when you issue the following command.

sudo mysql_install_db --defaults-file=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Conf/my-5.6.cnf --datadir=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Data --basedir=/usr

Note that this command is marked as deprecated. It works with MySQL 5.6 and MySQL 5.7, but may be removed in a future release.
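If you are on MySQL 5.7.6 or later, the server binary itself can initialize the data directory instead. A sketch with the same paths as above; --initialize-insecure creates the root account without a password, while --initialize generates a random one and writes it to the error log:

mysqld --initialize-insecure --defaults-file=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Conf/my-5.6.cnf --datadir=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Data --basedir=/usr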

Start and Stop the Instance

You can now start the MySQL instance with the following command:

mysqld_safe --defaults-file=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Conf/my-5.6.cnf

For your convenience, add a custom client configuration in your $HOME/.my.cnf and point it to the user defined socket.

[client]
port        = 3306
socket      = /home/stefan/MySQL-5.6-Local/mysqld-5.6.sock

In addition, startup and shutdown scripts are useful as well. Place both scripts in the scripts directory we created before and add execution permissions with chmod +x .

# Startup script
mysqld_safe --defaults-file=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Conf/my-5.6.cnf --timezone=utc

The stop script is similar.

# Shutdown script
mysqladmin -u root -S /home/stefan/MySQL-5.6-Local/mysqld-5.6.sock shutdown
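To verify that the local instance is up, you can query it through its socket; this should print the server version:

mysql -u root -S /home/stefan/MySQL-5.6-Local/mysqld-5.6.sock -e "SELECT VERSION();"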

Conclusion

The technique described above allows installing and running multiple MySQL instances from within the user's home directory. The MySQL instances run with user privileges and can utilise dedicated data and log file directories. As all the data is stored within the $HOME directory, we can easily apply transparent encryption to protect the data at rest.



Switch the Git Clone Protocol from HTTPS to SSH

Gitlab offers several options for interacting with remote repositories: git, http, https and ssh. The first option – git – is the native transport protocol and does not encrypt the traffic. The same applies to http, rendering https and ssh the only feasible protocols if you commit and retrieve data via insecure networks. Ssh and https are both available via the web interfaces of Github and Gitlab; in both systems you can simply copy and paste the clone URLs including the protocol.

HTTPS

The simplest way to fetch the repository is to just copy the default HTTPS URL and clone it to the local drive. Git will ask you for the Github credentials.

:~/Projects$ git clone https://github.com/username/test-project.git
Cloning into 'test-project'...
Username for 'https://github.com': 

You will be asked for the credentials every time you interact with the Github remote repository. Per default, git caches credentials for 15 minutes. Instead of waiting for the cache to expire, we can just drop the credentials and proceed with an empty cache again.

git credential-cache exit

To make our lives a little easier, we can store the username. In this example, we store this information only locally, valid for this cloned repository only. The same settings can also be applied globally.

git config user.email "user@example.org"
git config user.name "username"

Git will store that information locally (i.e. inside the repository) in the config file:

:~/Projects/test-project$ cat .git/config 
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
[remote "origin"]
	url = https://github.com/username/test-project.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
	remote = origin
	merge = refs/heads/master
[user]
	name = username
	email = user@example.org

For storing the password temporarily, you can re-activate the cache and set a timeout.

git config credential.helper 'cache --timeout=3600'

Git will now store the password for your Github account for one hour. Although this is convenient, it is not an optimal solution. SSH keys are more secure and more convenient, as they do not expose your personal password and can be set individually per repository. In addition, you can protect your keys with a passphrase and add a second factor.

SSH

The Github documentation is great; you can find details on how to create SSH keys here. All you need to do is associate your public key with your remote repository on Github or Gitlab, as explained for instance here. Some general tips for working with keys in a secure way can be found here. As git stores the information about how you access your repositories in the local repository config file, you can easily modify this information to fit your needs. For automating SSH access to specific repositories, you can also modify the SSH configuration of your local user account in ~/.ssh/config .
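If you do not have a dedicated key pair yet, creating one is a single command. The key file name below is only a suggestion; it matches the IdentityFile entry used in the SSH configuration later in this post:

ssh-keygen -t rsa -b 4096 -C "user@example.org" -f ~/.ssh/github-private-key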

For instance, if we cloned the repository using the HTTPS method and would rather switch to SSH for the reasons mentioned above, two steps are necessary:

  1. Add an SSH configuration for the host
  2. Adapt the git config

So first, we add a new entry for the SSH authentication with Github in the file ~/.ssh/config .

Host github-test-project
    HostName github.com
    User git
    IdentityFile ~/.ssh/github-private-key
    IdentitiesOnly yes

We define the host alias github-test-project for this individual repository, specify which SSH key to use, and state that we only want to authenticate with that key.
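Before rewiring git, it is worth testing the new host entry. GitHub does not provide shell access, but a successful authentication prints a short greeting:

ssh -T github-test-project

Now that this is settled, we need to tell git to use this connection information. This is done in the local git repository configuration ~/Projects/test-project/.git/config . The file initially looks like this: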

[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
[remote "origin"]
	url = https://github.com/username/test-project.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
	remote = origin
	merge = refs/heads/master

All we need to do is change the remote repository data to incorporate the SSH connection we defined in the SSH config. Just replace the url target with the SSH connection definition:

:~/Projects/test-project$ cat .git/config 
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
[remote "origin"]
	url = github-test-project:username/test-project.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
	remote = origin
	merge = refs/heads/master
[user]
	name = username
	email = user@example.org

Note the colon and that we omitted the username before the SSH host; this information will be read from the SSH config. Also note that the repository needs to be initialized so that we have a master branch.
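Instead of editing the config file by hand, the same change can be made with a single git command:

git remote set-url origin github-test-project:username/test-project.git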



Getting Familiar with Eclipse Again: Git Integration in Comparison with IntelliJ IDEA

Eclipse and IntelliJ are both great Java IDEs, each with its own community, advantages and disadvantages. After having spent a few years with JetBrains IntelliJ Community Edition, I got accustomed to its tight and clean Git integration in the user interface. Now that I am considering switching back to Eclipse, I stumbled over a few things that I try to describe in this post.

IntelliJ and Eclipse Handle Project Structures Differently

Eclipse utilises a workspace concept, which allows working on several projects at the same time. IntelliJ, in contrast, allows only one open project and organizes substructures in modules. A comparison of these concepts can be found here. These two different viewpoints affect the way Git is integrated into the workflow.

Sharing Projects

The different views on project structure imply that Git repositories are also treated differently. While IntelliJ uses the root of a repository directly, Eclipse introduces a subfolder for the project. This leads to the interesting observation that importing a project from Git back into an Eclipse workspace requires a small adaptation in order to let Eclipse recognize the structure again.

A Small Workflow

In order to get familiar with Eclipse again, I created a small test project, which I then shared by pushing it to a Git repository. I then deleted the project and tried to re-import it.

Step 1

Create a new test project, in this case a Spring Boot project which works with Maven. Note that the new project is stored in the Eclipse workspace.

Step 2

As the second step, we create a new repository. Log into Github or your Gitlab instance, create a new project, and initialize it so that you have a master branch ready; then copy the URL of the repository. We add this repository by opening the Git Repository perspective in Eclipse. You can provide a default location for your local repositories in the Eclipse -> Team -> Git preferences. In the Git Repository perspective, you can then see the path of the local storage location and some information about the repository, for instance that the local and the remote master branch are identical (they have the same commit hash). Note that the Git path is different from your workspace project path.

Step 3

We now have a fresh, Maven based Java project in our Eclipse workspace and an empty Git repository in a different location, the default location for Git repositories. What we need to do next is to share the source code by moving it into the Git storage location and adding it to the Git index. Eclipse can help us with that: right-click the project in the Project Explorer view and use the Team -> Share menu.

Step 4

In the next step, we need to specify the Git repository we want to push our code to. This is an easy step: simply select the repository we just created from the drop-down menu. In the dialogue you can see that the content of the current project location on the left side will be moved to the right side. Eclipse creates a new subfolder within the repository for our project. This is something that IntelliJ would not do.

In this step, Eclipse separates the local and custom project metadata from the actual source code, which is a good thing.

Step 5

In the fifth step, we simply apply some changes and commit and push them to the remote repository using the Git Staging window.

After this step, the changes are visible in the remote repository as well and available to collaborators. In order to simulate someone who is going to check out our little project from Gitlab, we delete the project locally and also remove the Git repository.

Step 6

Now we start off with a clean workspace and clone the project from Gitlab. In the Git Repositories window, we can select clone project and provide the same URL again.

Step 7

In the next screen, we select the local destination for the cloned project. This could be for instance the default directory for Git projects or any other location on your disk. Confirm this dialogue.

Step 8

Now comes the tricky part, which did not work as expected in Eclipse Neon 4.6.1. Usually, one would tell Eclipse that the cloned project is a Maven project, and it should detect the pom.xml file and download the dependencies. To do so, we would select Import -> Git -> Projects from Git and clone the repository from this dialogue. Then, as a next step, we would select the Configure -> Convert to Maven Project option, but Eclipse does not seem to recognize the Maven structure. It only shows the files and directories, but does not consider the Maven dependencies specified in the pom.xml file.

What happens is that Eclipse tries to add a new pom.xml file and ignores the actual one, so this approach obviously does not work.

Step 9 – Solution

Instead of using the method above, just clone the repository from the Git Repository perspective and then go back to the Project Explorer. Now, instead of importing the project via the Git menu, choose to import an existing Maven project and select the path of the Git repository we cloned before.

And in the next dialogue, specify the path:

As you can see, Eclipse now finds the correct pom.xml file and provides the correct dependencies and structure!

Conclusion

Which IDE you prefer is a matter of taste and habit. Both environments provide a lot of features for developers and differ in how these features are implemented. With this short article, we tried to understand some basic implications of the two philosophies of how Eclipse and IntelliJ handle project structures. Once we are aware of these differences, it becomes easy to work with both of them.


Parsing SQL Statements

JDBC and the Limits of ResultSet Metadata

For my work in the area of data citation, I need to analyse queries which are used for creating subsets. I am particularly interested in query parameters, sortings and filters. One of the most commonly used query languages is SQL, which is supported by many relational database management systems such as MySQL. In some cases, the interaction with databases is abstracted away, meaning that hardly any SQL statements are executed directly; the SQL statements are rather built on the fly by object relational mappers such as Hibernate. Other scenarios use SQL statements as Strings or prepared statements, which are executed via JDBC. However, analysing SQL statements is tricky, as the language is very flexible.

In order to understand which columns have been selected, it is sufficient to utilise the ResultSet metadata and retrieve the column names from there. In my case, however, I need to extract this information from the query in advance and potentially enforce a specific sorting by adding columns to the ORDER BY clause. In this scenario, I need to parse the SQL statement and retrieve this information from the statement itself. Probably the best way to do this would be to implement a parser for the SQL dialect with ANTLR (ANother Tool for Language Recognition). But this is quite a challenge, so I decided to take a shortcut: FoundationDB.

The FoundationDB Parser

FoundationDB was a NoSQL database which provided several layers for supporting different paradigms at once. I am using the past tense here because the project was acquired by Apple in 2015 and the open source project has not been pursued since. However, the Maven libraries for the software are still available at Maven Central. FoundationDB used its own SQL parser, which understands standard SQL queries. These queries are represented as a tree, and the parser library allows traversing SQL statements and analysing the nodes. We can use this tree to parse and interpret SQL statements and extract additional information.

The Foundations of FoundationDB

The FoundationDB parser can be included into your own project with the following Maven dependency:

<!-- https://mvnrepository.com/artifact/com.foundationdb/fdb-sql-parser -->
<dependency>
    <groupId>com.foundationdb</groupId>
    <artifactId>fdb-sql-parser</artifactId>
    <version>1.6.1</version>
</dependency>

The usage of the parser is straightforward. We use the following example SQL statement as input:

SELECT a.firstColumn,b.secondColumn,b.thirdColumn
	FROM tableA AS a, tableB AS b 
	WHERE a.firstColumn = b.secondColumn AND 
	b.thirdColumn < 5 
	ORDER BY a.thirdColumn,a.secondColumn DESC

The following function calls the parser and prints the tree of the statement.

	/**
	 * Parse a SQL statement and print its tree
	 * @param sqlString
	 */
	public void parseSQLString(String sqlString) {
		SQLParser parser = new SQLParser();
		StatementNode stmt;
		try {
			stmt = parser.parseStatement(sqlString);
			stmt.treePrint();
		} catch (StandardException e) {
			e.printStackTrace();
		}
	}

The resulting tree is listed below. The statement has also been normalized, which ensures a stable sequence of the parameters.

com.foundationdb.sql.parser.CursorNode@728938a9
name: null
updateMode: UNSPECIFIED
statementType: SELECT
resultSet: 	
	com.foundationdb.sql.parser.SelectNode@21b8d17c
	isDistinct: false
	resultColumns: 		
		com.foundationdb.sql.parser.ResultColumnList@6433a2

		[0]:		
		com.foundationdb.sql.parser.ResultColumn@5910e440
		exposedName: firstcolumn
		name: firstcolumn
		tableName: null
		isDefaultColumn: false
		type: null
		expression: 			
			com.foundationdb.sql.parser.ColumnReference@6267c3bb
			columnName: firstcolumn
			tableName: a
			type: null
		[1]:		
		com.foundationdb.sql.parser.ResultColumn@533ddba
		exposedName: secondcolumn
		name: secondcolumn
		tableName: null
		isDefaultColumn: false
		type: null
		expression: 			
			com.foundationdb.sql.parser.ColumnReference@246b179d
			columnName: secondcolumn
			tableName: b
			type: null
		[2]:		
		com.foundationdb.sql.parser.ResultColumn@7a07c5b4
		exposedName: thirdcolumn
		name: thirdcolumn
		tableName: null
		isDefaultColumn: false
		type: null
		expression: 			
			com.foundationdb.sql.parser.ColumnReference@26a1ab54
			columnName: thirdcolumn
			tableName: b
			type: null
	fromList: 		
		com.foundationdb.sql.parser.FromList@3d646c37

		[0]:		
		com.foundationdb.sql.parser.FromBaseTable@41cf53f9
		tableName: tablea
		updateOrDelete: null
		null
		correlation Name: a
		a
		[1]:		
		com.foundationdb.sql.parser.FromBaseTable@5a10411
		tableName: tableb
		updateOrDelete: null
		null
		correlation Name: b
		b
	whereClause: 		
		com.foundationdb.sql.parser.AndNode@2ef1e4fa
		operator: and
		methodName: and
		type: null
		leftOperand: 			
			com.foundationdb.sql.parser.BinaryRelationalOperatorNode@306a30c7
			operator: =
			methodName: equals
			type: null
			leftOperand: 				
				com.foundationdb.sql.parser.ColumnReference@b81eda8
				columnName: firstcolumn
				tableName: a
				type: null
			rightOperand: 				
				com.foundationdb.sql.parser.ColumnReference@68de145
				columnName: secondcolumn
				tableName: b
				type: null
		rightOperand: 			
			com.foundationdb.sql.parser.BinaryRelationalOperatorNode@27fa135a
			operator: <
			methodName: lessThan
			type: null
			leftOperand: 				
				com.foundationdb.sql.parser.ColumnReference@46f7f36a
				columnName: thirdcolumn
				tableName: b
				type: null
			rightOperand: 				
				com.foundationdb.sql.parser.NumericConstantNode@421faab1
				value: 5
				type: INTEGER NOT NULL
orderByList: 	
	com.foundationdb.sql.parser.OrderByList@2b71fc7e
	allAscending: false
	[0]:	
	com.foundationdb.sql.parser.OrderByColumn@5ce65a89
	ascending: true
	nullsOrderedLow: false
	columnPosition: -1
	expression: 		
		com.foundationdb.sql.parser.ColumnReference@25f38edc
		columnName: thirdcolumn
		tableName: a
		type: null
	[1]:	
	com.foundationdb.sql.parser.OrderByColumn@1a86f2f1
	ascending: false
	nullsOrderedLow: false
	columnPosition: -1
	expression: 		
		com.foundationdb.sql.parser.ColumnReference@3eb07fd3
		columnName: secondcolumn
		tableName: a
		type: null

This tree offers a lot of information, which can also be used programmatically. At the top of the output, we can see that the statement is a SELECT statement and that it is not DISTINCT. Then follows the ResultSet, which contains a list of the three ResultColumns that have been specified in the SELECT clause. We can see the column names and the table names from which they are drawn. The next block provides the referenced tables (the FROM list) and their alias names. The WHERE block contains the operands which have been used for filtering, and last but not least, there is the list of ORDER BY clauses and their sorting directions.

The Visitor

In order to access the information shown above programmatically, we need to access the content of the nodes one by one. This can be achieved with the visitor pattern, which traverses all the nodes of the tree. The following listing shows how the visitor pattern can be used for accessing the list of columns from the SELECT clause.

    /**
     * Return a list of columns of a SELECT clause
     * @param sql
     * @return
     */
    public ArrayList<ResultColumn> selectColumnsList(String sql) {
        SQLParser parser = new SQLParser();
        BooleanNormalizer normalizer = new BooleanNormalizer(parser);
        StatementNode stmt = null;

        try {
            stmt = parser.parseStatement(sql);
            stmt = normalizer.normalize(stmt);
        } catch (StandardException e) {
            e.printStackTrace();
        }


        final ArrayList<ResultColumn> columns = new ArrayList<ResultColumn>();
        Visitor v = new Visitor() {

            @Override
            public boolean visitChildrenFirst(Visitable node) {
                if (node instanceof SelectNode)
                    return true;
                return false;
            }

            @Override
            public Visitable visit(Visitable node) throws StandardException {
                if (node instanceof ResultColumn) {
                    ResultColumn resultColumn = (ResultColumn) node;
                    columns.add(resultColumn);
                    System.out.println("Column " + columns.size()+ ": " + resultColumn.getName());
                }
                return null;
            }

            @Override
            public boolean stopTraversal() {
                // TODO Auto-generated method stub
                return false;
            }

            @Override
            public boolean skipChildren(Visitable node) throws StandardException {
                if (node instanceof FromList) {
                    return true;
                }
                return false;
            }
        };

        try {
            stmt.accept(v);
        } catch (StandardException e) {
            e.printStackTrace();
        }

        return columns;

    }

In this code example, we define a visitor which traverses the tree and collects the ResultColumn instances. Every time the current node is an instance of ResultColumn, we add this node to our list of columns. The nodes are only visited if they are children of a SELECT statement; this is our entry point into the tree. We skip the subtree when we reach the FROM list. We then apply the visitor to the statement, which initiates the traversal. As a result, we receive the list of columns which make up the result set.
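Calling this method with the example statement from above should print the three selected columns. The wrapper class name SqlQueryAnalyzer is made up for this sketch:

SqlQueryAnalyzer analyzer = new SqlQueryAnalyzer();
analyzer.selectColumnsList("SELECT a.firstColumn, b.secondColumn, b.thirdColumn "
        + "FROM tableA AS a, tableB AS b "
        + "WHERE a.firstColumn = b.secondColumn AND b.thirdColumn < 5 "
        + "ORDER BY a.thirdColumn, a.secondColumn DESC");
// Expected output (identifiers are normalized to lower case):
// Column 1: firstcolumn
// Column 2: secondcolumn
// Column 3: thirdcolumn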

In order to get the list of ORDER BY columns, we can utilise a similar approach. The following function gives an example:

    /**
     * Return the list of ORDER BY clauses
     * @param sqlText
     * @return
     */
    public OrderByList orderByColumns(String sqlText) {
        SQLParser parser = new SQLParser();
        BooleanNormalizer normalizer = new BooleanNormalizer(parser);
        StatementNode stmt;
        OrderByList orderByList = null;
        try {
            stmt = parser.parseStatement(sqlText);
            stmt = normalizer.normalize(stmt);
            // The ORDER BY list hangs off the CursorNode at the root of the tree
            CursorNode node = (CursorNode) stmt;
            orderByList = node.getOrderByList();
            int i = 0;
            for (OrderByColumn orderByColumn : orderByList) {
                i++;
                String direction;
                if (orderByColumn.isAscending()) {
                    direction = "ASC";
                } else {
                    direction = "DESC";
                }
                System.out.println("ORDER BY Column " + i + ": "
                        + orderByColumn.getExpression().getColumnName()
                        + " Direction: " + direction);
            }
        } catch (StandardException e) {
            e.printStackTrace();
        }
        return orderByList;
    }

This time, we retrieve the list of ORDER BY columns directly from the CursorNode. Similar principles can be used for manipulating SQL statements, for instance in order to apply a different sorting.