Validate Hibernate Search Input with an Analyzer

Stop Words

Hibernate Search lets you easily assign an @Analyzer on Fields, which are used to process terms before they are written to the index. An anlyzer can be used for instance for stemming and removing of words which are so frequent that they are insignificant for the results. These are examples for stop words:

It is a common technique, to split input search terms into single keywords and use these keywords for combining a complex queries over several fields. The problem with this approach is that if a user provides such stop words as input and you manually split the input string into a list of keywords, for instance with a split method, it can occur that a stop word becomes a single keyword for Hibernate Search to process.

Hibernate does not know what to search for an replies with this error message and also suggests a solution.

In orderb to validate the string with the same analyzer as hibernate does when building the index, we can retrieve the analyzer from the class we want to search on and parse the string by providing it as a stream to the analyzer. First we get the analyser like this:

And then we can write a simple method, which takes the whole string as input and chops the whole string into a list of keywords, which is sent through the analyzer. Thus if your input string contains a stop word and it would not be added to the index, it will also not be included in the list. Thus Hibernate won’t even try to search for it, as it never sees the stemmed word in the first place.

 

Leave a Reply

Your email address will not be published. Required fields are marked *