Case-Insensitive Sort of UTF8 Data Using System.Data.SQLite






4.92/5 (19 votes)
SQLite lacks case-insensitive sort of UTF8 data. In this article, you will see how to get rid of this limitation in .NET.
Introduction
If you are using SQLite in .NET, perhaps you could have the problem of case-insensitive sorting of UTF8 strings.
Background
This code is useful if you use System.Data.SQLite
, which can be downloaded here: Download System.Data.SQLite. I have been using SQLite for a long time and stored Russian strings in it. As it's stated in SQLite FAQ, case-insensitive matching of Unicode characters does not work. This problem has become very significant when the amount of stored data has grown. System.Data.SQLite
provides a cool solution for that but it's unclear how to implement it.
Using the Code
Using the code is pretty simple: just add this line in the beginning of your application:
SQLiteFunction.RegisterFunction(typeof(SQLiteCaseInsensitiveCollation));
The Database
Okay, let us start. First of all, SQLite has three built-in collations (if you need more information, you can read it on SQLite's site):
BINARY
– All the characters are treated as bytesNOCASE
– Compares ASCII characters in case-insensitive manner; others are treated as bytesRTRIM
– Just skips trailing spaces
We want to create the following table:
ID | Name | Val (some value stored in the table) | Comment |
1 | Foo | 123 | |
2 | foo | 234 | |
3 | Bar | 345 | |
4 | bar | 456 | |
5 | Буква | 567 | A sample Russian word starting from uppercase letter |
6 | буква | 678 | The same Russian word starting from lowercase letter |
7 | Тест | 789 | Another Russian word starting from uppercase letter |
8 | тест | 890 | The same Russian word starting from lowercase letter |
Here's SQL to create it:
CREATE TABLE `testtbl` (
`ID` INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
`Name` TEXT NOT NULL COLLATE NOCASE,
`Val` REAL)
No we'll run a simple SELECT
:
SELECT * FROM `testtbl` ORDER BY `Name`
We'll get the result:
ID | Name | Val (some value stored in the table) | Comment |
2 | Bar | 345 | |
3 | bar | 456 | |
1 | Foo | 123 | |
2 | foo | 234 | |
5 | Буква | 567 | Incorrect sort! |
7 | Тест | 789 | |
6 | буква | 678 | |
8 | тест | 890 |
As you see, data in Russian is sorted incorrectly. Correct would be the following order: Буква, буква, Тест, тест.
Now we'll fix this. It would be perfect if SQLite could sort by this expression (in C#):
string.Compare(x, y, CultureInfo.CreateSpecificCulture("ru-RU"),
CompareOptions.IgnoreCase);
Defining Case-Insensitive Collation in C# Code
I have created a simple class SQLiteCaseInsensitiveCollation
that adds collation to SQLite engine:
using System.Data.SQLite;
using System.Globalization;
namespace SQLiteUTF8CIComparison {
///
/// This function adds case-insensitive sort feature to SQLite engine
/// To initialize, use SQLiteFunction.RegisterFunction()
/// before all connections are open
///
[SQLiteFunction(FuncType = FunctionType.Collation, Name = "UTF8CI")]
public class SQLiteCaseInsensitiveCollation : SQLiteFunction {
///
/// CultureInfo for comparing strings in case insensitive manner
///
private static readonly CultureInfo _cultureInfo =
CultureInfo.CreateSpecificCulture("ru-RU");
///
/// Does case-insensitive comparison using _cultureInfo
///
/// Left string
/// Right string
/// The result of a comparison
public override int Compare(string x, string y) {
return string.Compare(x, y, _cultureInfo, CompareOptions.IgnoreCase);
}
}
}
Please pay your attention to the attribute:
[SQLiteFunction(FuncType = FunctionType.Collation, Name = "UTF8CI")]
Here we define that:
- This is a function that must be added to SQLite engine
- It's a collation function
- Its name is
UTF8CI
which means UTF8 Case Insensitive
To activate the collation, you need register it in SQLite engine – put this line of code before you create all SQLite connections in your application:
SQLiteFunction.RegisterFunction(typeof(SQLiteCaseInsensitiveCollation));
Please note that the function must be called only once, e.g. when your application starts.
Using UTF8CI Collation in SQLite
How to use this collation in SQLite?
CREATE TABLE `testtbl` (
`ID` INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
`Name` TEXT NOT NULL COLLATE UTF8CI,
`Val` REAL)
As you see, everything we need is just replacing the collation. Now we also need create an index with this collation:
CREATE INDEX `IDX_testtbl_Name` ON `testtbl` (`Name` COLLATE UTF8CI)
Performing a Test
Now the SELECT
will give us the result:
ID | Name | Val (some value stored in the table) | Comment |
2 | Bar | 345 | |
3 | bar | 456 | |
1 | Foo | 123 | |
2 | foo | 234 | |
5 | Буква | 567 | That's it! |
6 | буква | 678 | |
7 | Тест | 789 | |
8 | тест | 890 |
Yep! The Russian data is sorted in case-insensitive way, as we wanted. Our index is used because it's case-insensitive too.
Compatibility
What happens if you open SQLite file with UTF8CI encoding without adding our custom encoding? Let's see the result of SELECT
s:
SELECT * FROM `testtbl`
It executes normally – neither sort collation, nor index is used.
SELECT * FROM `testtbl` ORDER BY `Name`
Oops. We get an error: No such collation: UTF8CI
. To avoid this, you can use the following SQL:
SELECT * FROM `testtbl` ORDER BY `Name` COLLATE BINARY
This executes normally and gives the result as usual case-sensitive (binary) sort. Our index with UTF8CI encoding cannot be used: SQLite has no information about the collation of the index.
Points of Interest
There are several things we have learned:
- It is possible to use case-insensitive sort for UTF8 data in SQLite.
- You can also create case-insensitive UTF8 index in SQLite.
- You can use my function (with your locale) for creating UTF8CI collation in SQLite.
- Please note: if you will forget to register this function, you will get an exception when reading data.
- The database will be usable without the code of the function; but sort by field will be unavailable only with built-in encodings.
- Any index created with UTF8CI encoding will be disabled.
Revision History
- Nov. 2009 - Initial revision