www.destructor.de

About | Contact | Impressum


Home |  Code |  Articles |  Misc |  x
Firebird |  Talks |  Character Sets |  x
General |  1.0 |  1.5 |  2.0 |  2.1 |  2.5 |  3.0 |  x

Performing Case Insensitive Searches

This article explains how to perform case insensitive searches using Firebird.

Imagine you have a table of persons like this one:

CREATE TABLE PERSONS (
  PERS_ID INTEGER NOT NULL PRIMARY KEY,
  LAST_NAME VARCHAR(50),
  FIRST_NAME VARCHAR(50)
);

Now there is an application, which allows the user to search for persons by last name and/or first name. So the user inputs the last name of the person he is searching for. 

We want this search to be case insensitive, i.e. no matter if the user enters "Presley", "presley",  "PRESLEY", or even "PrESley", we always want to find the King.

Ah yes, and we want that search to be fast, please. So there must be an index speeding it up.

A simple way to do case insensitive comparisons is to uppercase both strings and then compare the uppercased versions of both strings. 

Uppercasing has limitations, because some letters cannot be uppercased. Note also that there are languages/scripts where there is no such thing as case. So the technique described in this article will work best for European languages.

In order to get really perfect results one would need a case insensitive (CI) and/or accent insensitive (AI) collation. However, at the time of this writing (July 2006) there are only two Czech AI/CI collations for Firebird 2.0. The situation will hopefully improve over time.

(You should know the concepts of Character Sets and Collations in order to understand what comes next. I use the DE_DE collation in my examples, this is the collation for German/Germany in the ISO8859_1 character set.)

In order to get correct results from the UPPER() function that is built into Firebird, you must specify a collation. This can be in the DDL definition of the table:

CREATE TABLE PERSONS (
  PERS_ID INTEGER NOT NULL PRIMARY KEY,
  LAST_NAME VARCHAR(50) COLLATE DE_DE,
  FIRST_NAME VARCHAR(50) COLLATE DE_DE
);

or it can be done when calling the UPPER() function:

SELECT UPPER (LAST_NAME COLLATE DE_DE) FROM PERSONS;

Firebird 1.5

As there is no way to build an index on a function call, it is a common technique to create a "shadow column" that will take the UPPERed text. This column will be filled by a trigger and search is done on the uppered column.

So our table definition will look like this:

CREATE TABLE PERSONS (
  PERS_ID INTEGER NOT NULL PRIMARY KEY,
  LAST_NAME VARCHAR(50) COLLATE DE_DE,
  LAST_NAME_UPPER VARCHAR(50) COLLATE DE_DE,
  FIRST_NAME VARCHAR(50) COLLATE DE_DE,
  FIRST_NAME_UPPER VARCHAR(50) COLLATE DE_DE,
  PHONE VARCHAR(50),
  EMAIL VARCHAR(100)
);

The trigger to automatically fill in the UPPER fields looks like this:

CREATE TRIGGER BIU_PERSONS FOR PERSONS
ACTIVE
BEFORE INSERT OR UPDATE
AS
BEGIN
  NEW.LAST_NAME_UPPER = UPPER (NEW.LAST_NAME);
  NEW.FIRST_NAME_UPPER = UPPER (NEW.FIRST_NAME);
END

From now on, everytime the LAST_NAME or FIRST_NAME column changes during an UPDTAE, or a new persons gets inserted, the trigger will automagically fill up the LAST_NAME_UPPER and FIRST_NAME_UPPER columns. There's nothing that the application code must do here.

Creating an index makes searching faster:

CREATE INDEX IDX_LASTNAME ON PERSONS (LAST_NAME_UPPER);
CREATE INDEX IDX_FIRSTNAME ON PERSONS (FIRST_NAME_UPPER);

In order to search a person, all the application must do now is compare the search string against the UPPER columns:

SELECT * FROM PERSONS WHERE UPPER(:SEARCH_STRING COLLATE DE_DE) = LAST_NAME_UPPER
  or
SELECT * FROM PERSONS WHERE UPPER(:SEARCH_STRING COLLATE DE_DE) = FIRST_NAME_UPPER

Firebird 2.0

In Firebird 2.0 it is possible to create a so-called "Expression Index" on an UPPER() function call:

CREATE INDEX IDX_PERSON_LASTNAME ON PERSONS COMPUTED BY (UPPER (LAST_NAME COLLATE DE_DE))

From now on, it is possible to search using exactly the same expression:

SELECT * FROM PERSONS WHERE UPPER (LAST_NAME COLLATE DE_DE) = UPPER(:SEARCH_STRING COLLATE DE_DE)

That's it.

Firebird 2.1.2 and UNICODE_CI

Starting with Firebird 2.1.2 (or earlier, at least 2.1.2 is what the Release Notes seem to tell), you can use a new Case Insensitive Collation named UNICODE_CI for the UTF8 character set (Unicode).

So when your default character set is UTF8, your table declaration might look like this:

CREATE TABLE PERSONS (
  PERS_ID INTEGER NOT NULL PRIMARY KEY,
  LAST_NAME VARCHAR(50) COLLATE UNICODE_CI,
  FIRST_NAME VARCHAR(50) COLLATE UNICODE_CI
);

Your Index is a regular index declaration without any specials:

CREATE INDEX IDX_PERSON_LASTNAME ON PERSONS (LAST_NAME);

And your search expressions need no UPPER or whatsoever:

SELECT * FROM PERSONS WHERE LAST_NAME = :SEARCH_STRING

That's it.


Stefan Heymann, Last Change 2009-07-05

This documentation is licensed under (choose your favorite): GPL, LGPL, CC, IDPL, GFDL, BSD, (did I forget one?)