
The Pauper Man Dictionary

By ignotus confutatis, 14 Aug 2009
 

Introduction

The Pauper Man Dictionary is an English-English dictionary application for Windows Mobile 2003 phones. The idea came from combining two articles, "Google Suggest like Dictionary" and "Dictionary for Google Suggest like Dictionary", with my need for an English dictionary on my old Windows Mobile cell phone.

Background

The Visual Studio 2008 solution includes two projects.

The first one is a WinForms application that downloads the data from The Online Plain Text English Dictionary and uses it to create the SQLite.Net database file. That dictionary is based on "The Project Gutenberg Etext of Webster's Unabridged Dictionary", which in turn is based on the 1913 US Webster's Unabridged Dictionary.

The second project is the Pocket PC (PPC) implementation for Windows Mobile 2003 cell phones, which uses the same SQLite database file.

Using the Code

The first problem to solve is reading the HTML page and splitting it into individual word entries that fit into the database file. To keep the application responsive, I use a BackgroundWorker control so that the download and word processing run on another thread. It is also necessary to remove all HTML tags from the page; I found a good example of how to do this, shown below.

    // Requires: using System.Text.RegularExpressions;
    class HTMLremover
    {
        /// <summary>
        /// Remove HTML from string with Regex.
        /// </summary>
        public static string StripTagsRegex(string source)
        {
            return Regex.Replace(source, "<.*?>", string.Empty);
        }

        /// <summary>
        /// Compiled regular expression for performance.
        /// </summary>
        static Regex _htmlRegex = new Regex("<.*?>", RegexOptions.Compiled);

        /// <summary>
        /// Remove HTML from string with compiled Regex.
        /// </summary>
        public static string StripTagsRegexCompiled(string source)
        {
            return _htmlRegex.Replace(source, string.Empty);
        }

        /// <summary>
        /// Remove HTML tags from string using char array.
        /// </summary>
        public static string StripTagsCharArray(string source)
        {
            char[] array = new char[source.Length];
            int arrayIndex = 0;
            bool inside = false;

            for (int i = 0; i < source.Length; i++)
            {
                char let = source[i];
                if (let == '<')
                {
                    inside = true;
                    continue;
                }
                if (let == '>')
                {
                    inside = false;
                    continue;
                }
                if (!inside)
                {
                    array[arrayIndex] = let;
                    arrayIndex++;
                }
            }
            return new string(array, 0, arrayIndex);
        }
    }

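After analyzing the text of the pages, I found that the characters '(' and ')' are the key to the word processing problem: everything before the '(' is the word, the text in parentheses is its type, and the rest of the line is the meaning.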
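The observation above can be sketched as a small helper. This is only an illustration, not the article's code: `EntryParser` and `SplitEntry` are hypothetical names, and the sample line in the usage note is made up to match the "word (type) meaning" shape. Unlike the character loop in the article's listing, this version trims the parts, drops the parentheses from the type, and returns null when a line has no complete "(...)" group.

```csharp
using System;

// Hypothetical helper, not part of the article's source.
static class EntryParser
{
    // Splits "word (type) meaning" into { word, type, meaning }.
    // Returns null when the line has no complete "(...)" group.
    public static string[] SplitEntry(string line)
    {
        if (string.IsNullOrEmpty(line)) return null;

        int open = line.IndexOf('(');
        if (open < 0) return null;

        int close = line.IndexOf(')', open + 1);
        if (close < 0) return null;

        string word = line.Substring(0, open).Trim();
        string type = line.Substring(open + 1, close - open - 1).Trim();
        string meaning = line.Substring(close + 1).Trim();

        return new[] { word, type, meaning };
    }
}
```

For example, calling `EntryParser.SplitEntry("Abacus (n.) A calculating frame.")` yields the three parts `Abacus`, `n.` and `A calculating frame.`, which map onto the `word`, `type` and `mean` columns used by the INSERT statement.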

This is the portion of the code where the main work takes place:

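/// <summary>
/// Downloads the dictionary pages 'a' to 'z' and inserts each entry into the DB.
/// </summary>
/// <param name="worker">Background worker used to report progress.</param>
/// <param name="e">Arguments of the DoWork event.</param>
/// <returns>Always true once all pages have been processed.</returns>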
private bool DowloadData(BackgroundWorker worker, DoWorkEventArgs e)
{
    string[] dataReturn = new string[2];

    int wordCount = 0;

    for (int asciiCode = 97; asciiCode <= 122; asciiCode++) //Processing from 'a' to 'z'
    {
        char page = (char)asciiCode;
        string connString = "Data Source = dict.db";
        SQLiteConnection sqConnection = new SQLiteConnection(connString);
        sqConnection.Open();

        dataReturn[0] = wordCount.ToString();
        dataReturn[1] = page.ToString();

        worker.ReportProgress(0, dataReturn);

        SQLiteTransaction sqTrans =
            sqConnection.BeginTransaction(System.Data.IsolationLevel.ReadCommitted);

        SQLiteCommand sqCommand = new SQLiteCommand();

        sqCommand.Transaction = sqTrans;
        sqCommand.Connection = sqConnection;

        sqCommand.Parameters.Add(new SQLiteParameter());
        sqCommand.Parameters.Add(new SQLiteParameter());
        sqCommand.Parameters.Add(new SQLiteParameter());

        WebRequest request = WebRequest.Create(
            "http://www.mso.anu.edu.au/~ralph/OPTED/v003/wb1913_" +
            page.ToString() + ".html");
        request.Credentials = CredentialCache.DefaultCredentials;
        if (request.Proxy != null) // Proxy can be null when no proxy is configured
        {
            request.Proxy.Credentials = CredentialCache.DefaultNetworkCredentials;
        }

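        // Dispose the response and its reader as soon as the page has been read
        string responseData;
        using (WebResponse response = request.GetResponse())
        using (StreamReader responseReader =
            new StreamReader(response.GetResponseStream()))
        {
            responseData = responseReader.ReadToEnd();
        }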

        //Remove tags
        string textInPage = HTMLremover.StripTagsRegex(responseData);
        StreamWriter tempOutput = new StreamWriter("temp.txt");
        tempOutput.Write(textInPage);
        tempOutput.Close();
        int letterSize = textInPage.Length;  //Used to calculate % of the letter
        StreamReader text = new StreamReader("temp.txt");

        //Add data to DB
        string line;
        int textProcessed = 0;

        try
        {
            while ((line = text.ReadLine()) != null)
            {
                textProcessed += line.Length;
                int percentage = (int)(textProcessed * 100 / letterSize);

                if (line != string.Empty && line.Contains('('))
                {
                    string[] field = new string[3];
                    field[0] = string.Empty;
                    field[1] = string.Empty;
                    field[2] = string.Empty;

                    char[] letters = line.ToCharArray();

                    int fieldNumber = 0;

                    foreach (char character in letters)
                    {
                        if (fieldNumber == 0 && character == '(')
                        {
                            fieldNumber++;
                        }

                        field[fieldNumber] += character.ToString();

                        if (fieldNumber == 1 && character == ')')
                        {
                            fieldNumber++;
                        }
                    }

                    if (field[0].Length < 30)
                    {
                        dataReturn[0] = wordCount.ToString();
                        dataReturn[1] = page.ToString();
                        worker.ReportProgress(percentage, dataReturn);

                        wordCount++;

                        sqCommand.Parameters[0].Value = field[0];
                        sqCommand.Parameters[1].Value = field[1];
                        sqCommand.Parameters[2].Value = field[2];

                        sqCommand.CommandText =
                                @"INSERT INTO [dict] ([word], [type], [mean]) " +
                                "VALUES (?, ?, ?)";
                        sqCommand.ExecuteNonQuery();
                    }
                }
            }

            dataReturn[0] = wordCount.ToString();
            dataReturn[1] = page.ToString();
            worker.ReportProgress(100, dataReturn);
            sqTrans.Commit();
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.ToString());
        }
        finally
        {
            // Release the temp file and the connection even if the catch block fails
            text.Close();
            sqConnection.Close();
        }
    }
    File.Delete("temp.txt");
    return true;
}	

Points of Interest

This code is useful if you want to see how to open and read a web page from your own code, or how to use the BackgroundWorker control to report additional information besides the percentage of progress. It also shows how to remove the HTML tags from downloaded pages. It uses SQLite.Net for the database work on both platforms, Windows 7 (which is what I'm using) and Windows Mobile. It is worth mentioning that there is a speed problem with SELECT queries in SQLite.Net.
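As a rough sketch of the BackgroundWorker point, the second argument of `ReportProgress` can carry any object, which arrives in the handler as `e.UserState`. The listing above passes a `string[2]`; the variant below uses a small class instead. `ProgressInfo`, `ProgressReporting` and their members are hypothetical names introduced for illustration and are not part of the article's source.

```csharp
using System;
using System.ComponentModel;

// Hypothetical payload type; the article's code passes a string[2] instead.
sealed class ProgressInfo
{
    public readonly int WordCount;
    public readonly char Page;

    public ProgressInfo(int wordCount, char page)
    {
        WordCount = wordCount;
        Page = page;
    }
}

// Hypothetical helper class, for illustration only.
static class ProgressReporting
{
    // Worker side: call this from inside the DoWork handler.
    public static void Report(BackgroundWorker worker, int percentage,
                              int wordCount, char page)
    {
        worker.ReportProgress(percentage, new ProgressInfo(wordCount, page));
    }

    // UI side: build a status string from the percentage and the user state.
    public static string Describe(int percentage, object userState)
    {
        ProgressInfo info = userState as ProgressInfo;
        if (info == null) return percentage + "%";
        return "Page '" + info.Page + "': " + percentage + "%, " +
               info.WordCount + " words";
    }

    // Wiring: the handler receives both values on every ReportProgress call.
    public static void Attach(BackgroundWorker worker, Action<string> showStatus)
    {
        worker.WorkerReportsProgress = true; // otherwise ReportProgress throws
        worker.ProgressChanged += (sender, e) =>
            showStatus(Describe(e.ProgressPercentage, e.UserState));
    }
}
```

In a WinForms form, `showStatus` could simply set a label's text, since `ProgressChanged` is raised on the thread that created the worker.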

Finally, this code shows a simple way to build your own dictionary for an old cell phone. If you want a better free one (without the source code), you can visit MDict.

History

  • 14th August, 2009: Initial post

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)

About the Author

ignotus confutatis
Software Developer
Mexico
Civil Engineer and C# Developer


Article Copyright 2009 by ignotus confutatis