![]() |
Database »
Database »
Other databases
Intermediate
License: The Code Project Open License (CPOL)
Case-Insensitive Sort of UTF8 Data Using System.Data.SQLiteBy Dmitry VitkovskySQLite lacks case-insensitive sort of UTF8 data. In this article, you will see how to get rid of this limitation in .NET. |
C#, SQL, Windows, WinMobile, .NETCF, .NET, Dev
|
||||||||||||
|
Advanced Search Add to IE Search |
|
|
|
||||||||||||||||
If you are using SQLite in .NET perhaps you could have the problem of case-insensitive sorting of UTF8 strings.
This code is useful if you use System.Data.SQLite, which can be downloaded here: Download System.Data.SQLite. I have been using SQLite for a long time and stored Russian strings in it. As it's stated in SQLite FAQ, case-insensitive matching of Unicode characters does not work. This problem has become very significant when the amount of stored data has grown. System.Data.SQLite provides a cool solution for that but it's unclear how to implement it.
Using the code is pretty simple: just add this line in the beginning of your application:
SQLiteFunction.RegisterFunction(typeof(SQLiteCaseInsensitiveCollation));
Okay, let us start. First of all, SQLite has three built-in collations (if you need more info, you can read it on SQLite's site):
BINARY – all the characters are treated as bytes; NOCASE – compares ASCII characters in case-insensitive manner; others are treated as bytes; RTRIM – just skips trailing spaces. We want to create the following table:
| ID | Name | Val (some value stored in the table) | Comment |
|---|---|---|---|
| 1 | Foo | 123 | |
| 2 | foo | 234 | |
| 3 | Bar | 345 | |
| 4 | bar | 456 | |
| 5 | Буква | 567 | A sample Russian word starting from uppercase letter |
| 6 | буква | 678 | The same Russian word starting from lowercase letter |
| 7 | Тест | 789 | Another Russian word starting from uppercase letter |
| 8 | тест | 890 | The same Russian word starting from lowercase letter |
Here's SQL to create it:
CREATE TABLE `testtbl` (
`ID` INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
`Name` TEXT NOT NULL COLLATE NOCASE,
`Val` REAL)
No we'll run a simple SELECT:
SELECT * FROM `testtbl` ORDER BY `Name`
We'll get the result:
| ID | Name | Val (some value stored in the table) | Comment |
|---|---|---|---|
| 2 | Bar | 345 | |
| 3 | bar | 456 | |
| 1 | Foo | 123 | |
| 2 | foo | 234 | |
| 5 | Буква | 567 | Incorrect sort! |
| 7 | Тест | 789 | |
| 6 | буква | 678 | |
| 8 | тест | 890 |
As you see, data in Russian is sorted incorrectly. Correct would be the following order: Буква, буква, Тест, тест.
Now we'll fix this. It would be perfect if SQLite could sort by this expression (in C#):
string.Compare(x, y, CultureInfo.CreateSpecificCulture("ru-RU"), CompareOptions.IgnoreCase);
I have created a simple class SQLiteCaseInsensitiveCollation that adds collation to SQLite engine:
using System.Data.SQLite;
using System.Globalization;
namespace SQLiteUTF8CIComparison {
///
/// This function adds case-insensitive sort feature to SQLite engine
/// To initialize, use SQLiteFunction.RegisterFunction() before all connections are open
///
[SQLiteFunction(FuncType = FunctionType.Collation, Name = "UTF8CI")]
public class SQLiteCaseInsensitiveCollation : SQLiteFunction {
///
/// CultureInfo for comparing strings in case insensitive manner
///
private static readonly CultureInfo _cultureInfo = CultureInfo.CreateSpecificCulture("ru-RU");
///
/// Does case-insensitive comparison using _cultureInfo
///
/// Left string
/// Right string
/// The result of a comparison
public override int Compare(string x, string y) {
return string.Compare(x, y, _cultureInfo, CompareOptions.IgnoreCase);
}
}
}
Please pay your attention to the attribute:
[SQLiteFunction(FuncType = FunctionType.Collation, Name = "UTF8CI")]
Here we define that:
UTF8CI which means UTF8 Case Insensitive To activate the collation, you need register it in SQLite engine – put this line of code before you create all SQLite connections in your application:
SQLiteFunction.RegisterFunction(typeof(SQLiteCaseInsensitiveCollation));
Please note that the function must be called only once, e.g. when your application starts.
How to use this collation in SQLite?
CREATE TABLE `testtbl` (
`ID` INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
`Name` TEXT NOT NULL COLLATE UTF8CI,
`Val` REAL)
As you see everything we need is just replace the collation. Now we also need create an index with this collation:
CREATE INDEX `IDX_testtbl_Name` ON `testtbl` (`Name` COLLATE UTF8CI)
Now the SELECT will give us the result:
| ID | Name | Val (some value stored in the table) | Comment |
|---|---|---|---|
| 2 | Bar | 345 | |
| 3 | bar | 456 | |
| 1 | Foo | 123 | |
| 2 | foo | 234 | |
| 5 | Буква | 567 | That's it! |
| 6 | буква | 678 | |
| 7 | Тест | 789 | |
| 8 | тест | 890 |
Yep! The Russian data is sorted in case-insensitive way, as we wanted. Our index is used because it's case-insensitive too.
What happens if you open SQLite file with UTF8CI encoding without adding our custom encoding? Let's see the result of SELECTs:
SELECT * FROM `testtbl`
It executes normally – neither sort collation, nor index is used.
SELECT * FROM `testtbl` ORDER BY `Name`
Oops. We get an error: No such collation: UTF8CI. To avoid this you can use the following SQL:
SELECT * FROM `testtbl` ORDER BY `Name` COLLATE BINARY
This executes normally and gives the result as usual case-sensitive (binary) sort. Our index with UTF8CI encoding cannot be used: SQLite has no information about the collation of the index.
There are several things we have learned:
Nov. 2009 - Initial revision
General
News
Question
Answer
Joke
Rant
Admin
Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads.
|
PermaLink |
Privacy |
Terms of Use
Last Updated: 22 Nov 2009 Editor: Smitha Vijayan |
Copyright 2009 by Dmitry Vitkovsky Everything else Copyright © CodeProject, 1999-2010 Web19 | Advertise on the Code Project |