Converting Text Numbers to Numeric Values

John Simmons / outlaw programmer, 23 Feb 2015 CPOL
Spell it out! No! Wait!

Introduction

Recently, someone in the Lounge complained that sorting the following text was difficult:

  • First
  • Second
  • Third
  • ...
  • eleventh

The idea was that he wanted to sort them by their numeric equivalent. Well, it's actually NOT all that difficult once you convert the text to their actual numeric values, and even that conversion isn't really difficult. Here is my solution, which is one of many ways to approach the problem.

Assumptions

This code utilizes recursion and extension methods. It is assumed that you are already familiar with the concepts behind these language components.

Background

As with any other programming problem, you usually start with a vague list of requirements, a hazy user story, and (usually) very specific results requirements. This problem follows that paradigm. No programmer that is worth a damn (and left to his own devices) is going to provide a solution that merely meets the stated requirements. In this case, the example data stopped at "eleventh", but that doesn't mean squat. You simply MUST assume that the incoming data is not going to be limited to the provided samples, and besides, you may as well code for as many contingencies as are possible (and reasonable).

Code

To implement this solution, I created three classes. The first one - NumberDictionary - defines a dictionary of terms and their associated decimal values.

using System.Collections.Generic;

public static class NumberDictionary
{
    public static Dictionary<string, decimal> Numbers = new Dictionary<string, decimal>()
    {
        {"ZERO",   0}, {"TEN",       10}, {"HUNDRED",     100},
        {"FIRST",  1}, {"ELEVEN",    11}, {"THOUSAND",    1000},
        {"ONE",    1}, {"TWELF",     12}, {"MILLION",     1000000},
        {"SECOND", 2}, {"TWELVE",    12}, {"BILLION",     1000000000}, //*
        {"TWO",    2}, {"THIRTEEN",  13}, {"MILLIARD",    1000000000},
        {"THIRD",  3}, {"FOURTEEN",  14}, {"TRILLION",    1000000000000}, //*
        {"THREE",  3}, {"FIFTEEN",   15}, {"QUADRILLION", 1000000000000000},
        {"FOUR",   4}, {"SIXTEEN",   16}, {"BILLIARD",    1000000000000000},
        {"FIF",    5}, {"SEVENTEEN", 17}, {"QUINTILLION", 1000000000000000000},
        {"FIVE",   5}, {"EIGHTEEN",  18}, // I stopped here. A sextillion or septillion
        {"SIX",    6}, {"NINETEEN",  19}, // still fits in a decimal, but the literal is too
        {"SEVEN",  7}, {"TWENTY",    20}, // big for an integer and needs the "m" suffix
        {"EIGH",   8}, {"THIRTY",    30},
        {"EIGHT",  8}, {"FORTY",     40}, // "EIGHT" was missing, so plain "eight" failed
        {"NIN",    9}, {"FIFTY",     50},
        {"NINE",   9}, {"SIXTY",     60},
                       {"SEVENTY",   70},
                       {"EIGHTY",    80},
                       {"NINETY",    90},
    };
    // * These values are adjusted if the region is a long-scale region
}

People in the US might have noticed the weird ones - "BILLIARD" and "MILLIARD". These terms are used in long-scale countries, where billion and trillion are interpreted quite differently. If you want a lot of info about these and other differences, see the Wikipedia article on long and short scales.
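
To make the difference concrete, here is a small sketch that computes the n-th "-illion" under each scale. The ScaleSketch class and its ValueOf method are illustrative names of my own, not part of the article's code, and the sketch assumes the usual definitions: 10^(3n+3) for the short scale and 10^(6n) for the long scale.

```csharp
using System;

public static class ScaleSketch
{
    // Hypothetical helper: value of the n-th "-illion"
    // (1 = million, 2 = billion, 3 = trillion).
    // Short scale: 10^(3n+3). Long scale: 10^(6n).
    public static decimal ValueOf(int n, bool longScale)
    {
        int exponent = longScale ? 6 * n : 3 * n + 3;
        decimal value = 1m;
        for (int i = 0; i < exponent; i++)
        {
            value *= 10m; // overflows once the exponent passes 28
        }
        return value;
    }

    public static void Main()
    {
        Console.WriteLine(ValueOf(2, false)); // short-scale billion: 1000000000
        Console.WriteLine(ValueOf(2, true));  // long-scale billion:  1000000000000
        Console.WriteLine(ValueOf(3, true));  // long-scale trillion: 1000000000000000000
    }
}
```

Note that a long-scale billion equals a short-scale trillion, which is why the same word can map to two different dictionary values depending on the region.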

Text Handling

This functionality is encompassed in a static string extension class.

It made sense to me to make the outward-facing interface a string extension method. This is a simple method which normalizes the text and converts each text component into a numeric value. You may have already noticed that I'm using a decimal type. I'm doing this because of its ungodly max value which FAR exceeds the currently coded maximum possible value of 999 trillion.
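
The headroom is easy to check against the framework's own constants. This is only a quick sketch. RangeSketch and LargestSupported are names I made up for the illustration, and the "largest supported" figure assumes the short-scale reading of 999 trillion plus everything below it.

```csharp
using System;

public static class RangeSketch
{
    // Assumed upper bound for the parser: 999,999,999,999,999 (short scale).
    public const decimal LargestSupported = 999999999999999m;

    public static void Main()
    {
        Console.WriteLine(long.MaxValue);    // 9223372036854775807
        Console.WriteLine(decimal.MaxValue); // 79228162514264337593543950335
        Console.WriteLine(LargestSupported < decimal.MaxValue); // True
    }
}
```

A decimal tops out at roughly 7.9 x 10^28, so there is plenty of room above a quintillion as long as the literals carry the m suffix.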

There's really nothing special about this code. I merely replace some stuff, remove some stuff, and add some stuff so that the text is as normalized as I can reasonably make it. Once that's done, I convert each component "word" into its numeric equivalent by trying to find it in the numbers Dictionary. If I can't find even one of the component words in the dictionary, an exception is thrown. Once we're past the normalization/parsing part, we do the math to arrive at the numeric value.

UPDATE, 13 Feb 2015 - Apparently, everybody except the US interprets the value of billion and trillion incorrectly (grin). The difference is referred to as short-scale (US) or long-scale (GBR, et al). To appease the guys that mentioned it, I came up with a solution. I moved the numbers dictionary to its own class, and added a method here to adjust for long scale regions. This new method is called before we do anything else. I had neither the time nor desire to include all long-scale regions, so I left it as a very minor exercise to the programmer to add his/her 3-letter ISO region code to the longScaleRegions string. I also added some code to replace "AND" with a space, in case someone tries to do something like "one hundred and five".

using System;
using System.Globalization;

public static class ExtendString
{
    private static IntList values;

    private static void AdjustForLongScale()
    {
        string longScaleRegions = "GBR,";
        if (longScaleRegions.Contains(RegionInfo.CurrentRegion.ThreeLetterISORegionName))
        {
            var numbers = NumberDictionary.Numbers;
            numbers["BILLION"]  = numbers["MILLION"] * numbers["MILLION"];
            numbers["TRILLION"] = numbers["BILLION"] * numbers["MILLION"];
        }
    }

    public static decimal Translate(this string text)
    {
        ExtendString.AdjustForLongScale();
        var numbers = NumberDictionary.Numbers;
        text = text.ToUpper().Trim();
        string trimChars = "TH";
        text = text.Replace("TY", "TY ");
        text = text.Replace("-"," ").Replace("_"," ").Replace("."," ").Replace(",", " ");
        text = text.Replace(" AND", " ");
        text = text.Trim();
        if (text.EndsWith(trimChars))
        {
            // remove only the ordinal suffix; TrimEnd would strip every trailing
            // T and H, turning "EIGHTH" into "EIG"
            text = text.Substring(0, text.Length - trimChars.Length);
        }

        values = new IntList();
        // RemoveEmptyEntries copes with the doubled or trailing spaces that the
        // replacements above can leave behind (for example "TWENTY ")
        string[] parts = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string numberText in parts)
        {
            if (numbers.ContainsKey(numberText))
            {
                values.Add(numbers[numberText]);
            }
            else
            {
                throw new Exception("Not a number (might be spelled wrong)");
            }
        }
        return values.NumericValue;
    }
}
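
If you want to see what the normalization produces before any dictionary lookup happens, the following stand-alone sketch may help. NormalizeSketch and Tokenize are hypothetical names, and the steps are my reading of the clean-up described above (upper-case, split fused tens, drop punctuation and "AND", strip a trailing ordinal "TH"), not a copy of the extension method.

```csharp
using System;

public static class NormalizeSketch
{
    // Hypothetical helper: returns the upper-cased number words found in the text.
    public static string[] Tokenize(string text)
    {
        text = text.ToUpperInvariant().Trim();
        text = text.Replace("TY", "TY ");            // "THIRTYSECOND" -> "THIRTY SECOND"
        text = text.Replace("-", " ").Replace("_", " ").Replace(".", " ").Replace(",", " ");
        text = text.Replace(" AND ", " ");           // "HUNDRED AND FIVE" -> "HUNDRED FIVE"
        text = text.TrimEnd();
        if (text.EndsWith("TH", StringComparison.Ordinal))
        {
            text = text.Substring(0, text.Length - 2); // "HUNDREDTH" -> "HUNDRED"
        }
        return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        Console.WriteLine(string.Join("|", Tokenize("five thousand seven Hundred thirtysecond")));
        // FIVE|THOUSAND|SEVEN|HUNDRED|THIRTY|SECOND
        Console.WriteLine(string.Join("|", Tokenize("One Hundredth")));
        // ONE|HUNDRED
    }
}
```

Each token is then expected to be a key in the numbers dictionary, which is why stems like "TWELF" and "NIN" appear there.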

The Math

I created a class derived from List<decimal> that contains the parsed numeric values, and performs operations on those values. This effectively separates and contains what I consider to be discrete functionality. It also goes a long way toward keeping the extension method free of clutter. Given the text "four hundred twenty nine thousand six hundred six", the list starts out like this after parsing:

  • [0] 4
  • [1] 100
  • [2] 20
  • [3] 9
  • [4] 1000
  • [5] 6
  • [6] 100
  • [7] 6

Given the domain of the problem, we should never really have many more than 15 or so elements in the list (more if you want to support more than 999 trillion), so we don't have to concern ourselves with the size of this list.

The first thing we do is break the list up into smaller chunks. This just makes it easier to manage. There's a "mini list" for each major value break, and each one is named to indicate its numeric purpose.

private List<decimal> trillions = new List<decimal>();
private List<decimal> billions = new List<decimal>();
private List<decimal> millions = new List<decimal>();
private List<decimal> thousands = new List<decimal>();
private List<decimal> hundreds = new List<decimal>();

NOTE: Notice that the mini-lists don't include anything over a trillion, and don't include the long-scale-specific terms. Feel free to add them if you want to.

When the text has been parsed, the extension method retrieves the IntList.NumericValue property. Several actions are taken in order to retrieve the expected numeric value.

public decimal NumericValue
{
    get
    {
        this.BuildMiniLists();
        this.DoMiniMaths();
        this.AddMiniValues();
        decimal value = this.Sum(x=>x);
        return value;
    }
}

Each of the mini lists is populated (if necessary). We do this by looking for the big value and moving it - and all of the preceding values - to the appropriate mini-list.

UPDATE - 13 Feb 2015 - I changed the call to BuildMiniList to refer to the NumberDictionary values. This only made sense since you could be in a long-scale region where the values can change for some of the dictionary numbers.

private void BuildMiniLists() 
{ 
    this.BuildMiniList(this.trillions, NumberDictionary.Numbers["TRILLION"]);
    this.BuildMiniList(this.billions, NumberDictionary.Numbers["BILLION"]);
    this.BuildMiniList(this.millions, NumberDictionary.Numbers["MILLION"]);
    this.BuildMiniList(this.thousands, NumberDictionary.Numbers["THOUSAND"]);
    this.BuildMiniList(this.hundreds, NumberDictionary.Numbers["HUNDRED"]);
}

private void BuildMiniList(List<decimal> list, decimal amount)
{
    // find the index of the specified amount
    int index = this.IndexOf(amount);
    // if we have an index
    if (index >= 0)
    {
        // copy the values from 0 to the found index into the 
        // specified mini list 
        decimal[] values;
        values = new decimal[index+1];
        this.CopyTo(0, values, 0, index+1);
        list.AddRange(values);
        // and remove those items from the parent list 
        for (int i = index; i >= 0; i--)
        {
            this.RemoveAt(i);
        }
    }
}
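
To watch the split happen outside of the IntList class, here is a stand-alone sketch using the example from earlier. SplitSketch and TakeThrough are names I invented for the illustration. The sketch uses GetRange and RemoveRange instead of the copy-and-remove loop, which is a substitution on my part.

```csharp
using System;
using System.Collections.Generic;

public static class SplitSketch
{
    // Hypothetical helper: moves everything up to and including the first
    // occurrence of 'amount' out of 'parent' and returns it as a new list.
    public static List<decimal> TakeThrough(List<decimal> parent, decimal amount)
    {
        var taken = new List<decimal>();
        int index = parent.IndexOf(amount);
        if (index >= 0)
        {
            taken.AddRange(parent.GetRange(0, index + 1));
            parent.RemoveRange(0, index + 1);
        }
        return taken;
    }

    public static void Main()
    {
        var parent = new List<decimal> { 4, 100, 20, 9, 1000, 6, 100, 6 };
        List<decimal> thousands = TakeThrough(parent, 1000);
        List<decimal> hundreds = TakeThrough(parent, 100);
        Console.WriteLine(string.Join(",", thousands)); // 4,100,20,9,1000
        Console.WriteLine(string.Join(",", hundreds));  // 6,100
        Console.WriteLine(string.Join(",", parent));    // 6
    }
}
```

The order matters: the largest break has to be taken first. If the hundreds were split off before the thousands, the first 100 found would be the one belonging to "four hundred ... thousand".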

Next, math is performed on each mini list. The DoMath method is recursive, and iterates through the specified list looking for values to multiply or add. Using our example above, the thousands mini list will look like this:

  • [0] 4
  • [1] 100
  • [2] 20
  • [3] 9
  • [4] 1000

The DoMath method below will build the appropriate value by reading each value and either adding or multiplying it into the resulting value. The processing will do this:

    (((4 * 100) + 20 + 9) * 1000) = 429000

After processing, the mini list will look like this:

  • [0] 0
  • [1] 0
  • [2] 0
  • [3] 0
  • [4] 429000

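private void DoMiniMaths()
{
    // -1 means "no multiplier has been processed yet"
    this.DoMath(this.trillions, -1);
    this.DoMath(this.billions,  -1);
    this.DoMath(this.millions,  -1);
    this.DoMath(this.thousands, -1);
    this.DoMath(this.hundreds,  -1);
}

private void DoMath(List<decimal> list, int lastBigIndex)
{
    decimal rollingValue = 0;
    // take the multipliers from the dictionary so that long-scale adjustments apply here too
    decimal[] big = new decimal[]
    {
        NumberDictionary.Numbers["HUNDRED"],
        NumberDictionary.Numbers["THOUSAND"],
        NumberDictionary.Numbers["MILLION"],
        NumberDictionary.Numbers["BILLION"],
        NumberDictionary.Numbers["TRILLION"]
    };
    for (int j = 0; j < list.Count; j++)
    {
        // A multiplier at or before lastBigIndex was already folded in by an earlier
        // pass, so it is treated as a plain value. Without this check, a product that
        // is itself a multiplier ("one hundred thousand") recurses forever.
        if (j > lastBigIndex && big.Contains(list[j]))
        {
            rollingValue = Math.Max(1, rollingValue) * list[j];
            list[j] = rollingValue;
            for (int i = 0; i < j; i++)
            {
                list[i] = 0;
            }
            if (j < list.Count - 1)
            {
                this.DoMath(list, j);
            }
            // the recursive call (if any) has handled the rest of the list
            return;
        }
        rollingValue += list[j];
    }
}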
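
For comparison, the same arithmetic can be expressed as a single pass with no recursion and no zeroing of list entries. This is an alternative sketch, not the method used in the article. FoldSketch and Fold are hypothetical names, the multipliers are hard-coded to the short-scale values, and the input is assumed to be one mini list (a run of values ending at its largest multiplier).

```csharp
using System;
using System.Collections.Generic;

public static class FoldSketch
{
    private static readonly decimal[] Multipliers =
        { 100m, 1000m, 1000000m, 1000000000m, 1000000000000m };

    // Hypothetical helper: multiplies the running total by each multiplier it
    // meets and adds every other value to it.
    public static decimal Fold(IEnumerable<decimal> miniList)
    {
        decimal rolling = 0m;
        foreach (decimal value in miniList)
        {
            if (Array.IndexOf(Multipliers, value) >= 0)
            {
                // a bare "thousand" counts as "one thousand"
                rolling = Math.Max(1m, rolling) * value;
            }
            else
            {
                rolling += value;
            }
        }
        return rolling;
    }

    public static void Main()
    {
        Console.WriteLine(Fold(new decimal[] { 4, 100, 20, 9, 1000 })); // 429000
        Console.WriteLine(Fold(new decimal[] { 6, 100 }));              // 600
    }
}
```

Feeding it the whole parsed list at once would give the wrong answer, because the later "hundred" would multiply the thousands as well. That is the reason the list is broken into mini lists first.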

After the math has been performed on each mini-list, we sum up the values inside each list, and add each sum back to the parent list.

private void AddMiniValues()
{
    this.Add(this.trillions.Sum(x=>x));
    this.Add(this.billions.Sum(x=>x));
    this.Add(this.millions.Sum(x=>x));
    this.Add(this.thousands.Sum(x=>x));
    this.Add(this.hundreds.Sum(x=>x));
}

After summing the mini-lists, the parent list should have the following contents (the empty trillions, billions, and millions mini-lists each contribute a 0):

  • [0] 6
  • [1] 0
  • [2] 0
  • [3] 0
  • [4] 429000
  • [5] 600

When the values in the parent list are summed, we arrive at the result of "429606". Now that we've written all of that code, I leave it as an exercise for the programmer to associate the resulting numeric value with the text for sorting.
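
For anyone who wants a head start on that exercise, one possible shape is to sort with the numeric value as the key. In this sketch, SortSketch, SortByValue, and the tiny Lookup table are all hypothetical. The table only stands in for the extension method so that the sample runs on its own. In real use, you would pass s => s.Translate() instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortSketch
{
    // Stand-in for the article's Translate() so the sketch is self-contained.
    private static readonly Dictionary<string, decimal> Lookup =
        new Dictionary<string, decimal>(StringComparer.OrdinalIgnoreCase)
        {
            { "first", 1 }, { "second", 2 }, { "third", 3 }, { "eleventh", 11 }
        };

    // Hypothetical helper: orders the text items by the number each one represents.
    public static List<string> SortByValue(IEnumerable<string> items, Func<string, decimal> toNumber)
    {
        return items.OrderBy(toNumber).ToList();
    }

    public static void Main()
    {
        var items = new[] { "eleventh", "Third", "First", "Second" };
        foreach (string item in SortByValue(items, s => Lookup[s]))
        {
            Console.WriteLine(item); // First, Second, Third, eleventh (one per line)
        }
    }
}
```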

Using the Code

Usage goes something like this (from a console app):

class Program
{
    static void Main(string[] args)
    {
        Translate("twelfth");
        Translate("First");
        Translate("One Hundredth");
        Translate("five thousand seven Hundred thirtysecond");
        Translate("four hundred twenty nine thousand");
        Translate("four hundred twenty nine thousand six hundred six");
        Translate("fortyfive");
        Translate("twenty seven million two hundred thirtyfour thousand one");
        Console.ReadKey();
    }

    static void Translate(string text)
    {
        decimal value = text.Translate();
        Console.WriteLine(string.Format("{0} - {1:#,##0}", text, value));
    }
}

Points of Interest

Nothing particularly wacky or unexpected occurred while writing this code, but it was interesting and allowed me to do some mental exercise outside my normal work-a-day life. I am programmer.

History

  • 13 Feb 2015 (B)  Added support for long-scale regions
  • 13 Feb 2015 (A)  Formatted a code block and fixed some misspellings.
  • 12 Feb 2015  Initial release.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

John Simmons / outlaw programmer
Software Developer (Senior) Paddedwall Software
United States
I've been paid as a programmer since 1982 with experience in Pascal, and C++ (both self-taught), and began writing Windows programs in 1991 using Visual C++ and MFC. In the 2nd half of 2007, I started writing C# Windows Forms and ASP.Net applications, and have since done WPF, Silverlight, WCF, web services, and Windows services.

My weakest point is that my moments of clarity are too brief to hold a meaningful conversation that requires more than 30 seconds to complete. Thankfully, grunts of agreement are all that is required to conduct most discussions without committing to any particular belief system.


Article Copyright 2015 by John Simmons / outlaw programmer