Click here to Skip to main content
6,822,123 members and growing! (17,208 online)
Email Password   helpLost your password?
Database » Database » Other databases     Intermediate License: The Code Project Open License (CPOL)

Case-Insensitive Sort of UTF8 Data Using System.Data.SQLite

By Dmitry Vitkovsky

SQLite lacks case-insensitive sort of UTF8 data. In this article, you will see how to get rid of this limitation in .NET.
C#, SQL, Windows, WinMobile, .NETCF, .NET, Dev
Revision:3 (See All)
Posted:21 Nov 2009
Updated:22 Nov 2009
Views:2,267
Bookmarked:12 times
Unedited contribution
printPrint   add Share
      Discuss Discuss   Broken Article?Report  
7 votes for this article.
Popularity: 3.80 Rating: 4.50 out of 5

1

2
1 vote, 14.3%
3
1 vote, 14.3%
4
5 votes, 71.4%
5

Introduction

If you are using SQLite in .NET perhaps you could have the problem of case-insensitive sorting of UTF8 strings.

Background

This code is useful if you use System.Data.SQLite, which can be downloaded here: Download System.Data.SQLite. I have been using SQLite for a long time and stored Russian strings in it. As it's stated in SQLite FAQ, case-insensitive matching of Unicode characters does not work. This problem has become very significant when the amount of stored data has grown. System.Data.SQLite provides a cool solution for that but it's unclear how to implement it.

Using the Code

Using the code is pretty simple: just add this line in the beginning of your application:

SQLiteFunction.RegisterFunction(typeof(SQLiteCaseInsensitiveCollation));

The Database

Okay, let us start. First of all, SQLite has three built-in collations (if you need more info, you can read it on SQLite's site):

  • BINARY – all the characters are treated as bytes;
  • NOCASE – compares ASCII characters in case-insensitive manner; others are treated as bytes;
  • RTRIM – just skips trailing spaces.

We want to create the following table:

ID Name Val (some value stored in the table) Comment
1 Foo 123  
2 foo 234  
3 Bar 345  
4 bar 456  
5 Буква 567 A sample Russian word starting from uppercase letter
6 буква 678 The same Russian word starting from lowercase letter
7 Тест 789 Another Russian word starting from uppercase letter
8 тест 890 The same Russian word starting from lowercase letter

Here's SQL to create it:

CREATE TABLE `testtbl` (
    `ID` INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    `Name` TEXT NOT NULL COLLATE NOCASE, 
    `Val` REAL)
    

No we'll run a simple SELECT:

SELECT * FROM `testtbl` ORDER BY `Name`

We'll get the result:

ID Name Val (some value stored in the table) Comment
2 Bar 345  
3 bar 456  
1 Foo 123  
2 foo 234  
5 Буква 567 Incorrect sort!
7 Тест 789
6 буква 678
8 тест 890

As you see, data in Russian is sorted incorrectly. Correct would be the following order: Буква, буква, Тест, тест.

Now we'll fix this. It would be perfect if SQLite could sort by this expression (in C#):

string.Compare(x, y, CultureInfo.CreateSpecificCulture("ru-RU"), CompareOptions.IgnoreCase);

Defining Case-Insensitive Collation in C# Code

I have created a simple class SQLiteCaseInsensitiveCollation that adds collation to SQLite engine:

using System.Data.SQLite;
using System.Globalization;

namespace SQLiteUTF8CIComparison {
    /// 
    /// This function adds case-insensitive sort feature to SQLite engine 
    /// To initialize, use SQLiteFunction.RegisterFunction() before all connections are open 
    /// 
    [SQLiteFunction(FuncType = FunctionType.Collation, Name = "UTF8CI")]
    public class SQLiteCaseInsensitiveCollation : SQLiteFunction {
        /// 
        /// CultureInfo for comparing strings in case insensitive manner 
        /// 
        private static readonly CultureInfo _cultureInfo = CultureInfo.CreateSpecificCulture("ru-RU");

        /// 
        /// Does case-insensitive comparison using _cultureInfo 
        /// 
        /// Left string
        /// Right string
        /// The result of a comparison
        public override int Compare(string x, string y) {
            return string.Compare(x, y, _cultureInfo, CompareOptions.IgnoreCase);
        }
    }
}
    

Please pay your attention to the attribute:

[SQLiteFunction(FuncType = FunctionType.Collation, Name = "UTF8CI")]

Here we define that:

  • This is a function that must be added to SQLite engine
  • It's a collation function
  • Its name is UTF8CI which means UTF8 Case Insensitive

To activate the collation, you need register it in SQLite engine – put this line of code before you create all SQLite connections in your application:

SQLiteFunction.RegisterFunction(typeof(SQLiteCaseInsensitiveCollation));

Please note that the function must be called only once, e.g. when your application starts.

Using UTF8CI Collation in SQLite

How to use this collation in SQLite?

CREATE TABLE `testtbl` (
    `ID` INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    `Name` TEXT NOT NULL COLLATE UTF8CI, 
    `Val` REAL)

As you see everything we need is just replace the collation. Now we also need create an index with this collation:

CREATE INDEX `IDX_testtbl_Name` ON `testtbl` (`Name` COLLATE UTF8CI)

Performing a Test

Now the SELECT will give us the result:

ID Name Val (some value stored in the table) Comment
2 Bar 345  
3 bar 456  
1 Foo 123  
2 foo 234  
5 Буква 567 That's it!
6 буква 678
7 Тест 789
8 тест 890

Yep! The Russian data is sorted in case-insensitive way, as we wanted. Our index is used because it's case-insensitive too.

Compatibility 

What happens if you open SQLite file with UTF8CI encoding without adding our custom encoding? Let's see the result of SELECTs:

SELECT * FROM `testtbl`

It executes normally – neither sort collation, nor index is used.

SELECT * FROM `testtbl` ORDER BY `Name`

Oops. We get an error: No such collation: UTF8CI. To avoid this you can use the following SQL:

SELECT * FROM `testtbl` ORDER BY `Name` COLLATE BINARY

This executes normally and gives the result as usual case-sensitive (binary) sort. Our index with UTF8CI encoding cannot be used: SQLite has no information about the collation of the index.

Points of Interest

There are several things we have learned:

  • It is possible to use case-insensitive sort for UTF8 data in SQLite
  • You can also create case-insensitive UTF8 index in SQLite
  • You can use my function (with your locale) for creating UTF8CI collation in SQLite
  • Please note: of you will forget to register this function, you will get an exception when reading data
  • The database will be usable without the code of the function; but sort by field will be unavailable only with built-in encodings
  • Any index created with UTF8CI encoding will be disabled

Revision History

Nov. 2009 - Initial revision 

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Dmitry Vitkovsky


Member

Occupation: Team Leader
Location: Russian Federation Russian Federation

Other popular Database articles:

Article Top
You must Sign In to use this message board.
FAQ FAQ 
 
Noise Tolerance  Layout  Per page   
  (Refresh) 
-- There are no messages in this forum --

General General    News News    Question Question    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads.

PermaLink | Privacy | Terms of Use
Last Updated: 22 Nov 2009
Editor: Smitha Vijayan
Copyright 2009 by Dmitry Vitkovsky
Everything else Copyright © CodeProject, 1999-2010
Web19 | Advertise on the Code Project