Converting Text Numbers to Numeric Values

27 Feb 2018 | CPOL | 6 min read
Spell it out! No! Wait!

Introduction

Recently, someone in the Lounge complained that sorting the following text was difficult:

  • First
  • Second
  • Third
  • ...
  • Eleventh

The idea was that he wanted to sort the list above by the numeric equivalent of each entry. Well, it's actually NOT all that difficult once you convert the text to its numeric values, and even that conversion isn't really difficult. Here is my solution, which is probably one of many ways to approach the problem. If you have another way to do it, please feel free to publish an article.

Assumptions

This code utilizes recursion and extension methods. It is assumed that you are already familiar with the concepts behind these language features.

Background

As with any other programming problem, you typically start with a vague list of requirements, a hazy user story, and (usually) very specific results requirements - traits often associated with homework assignments, but that also, more often than not, reflect real-world conditions during your employment. This problem follows that paradigm. No programmer that is worth a damn (and left to his own devices) is going to provide a solution that merely meets the stated requirements. In this case, the example data stopped at "eleventh", but that doesn't mean squat. You simply MUST assume that the incoming data is not going to be limited to the provided samples, and besides, you may as well code for as many contingencies as is possible (and reasonable).

Code

To implement this solution, I created three classes. The first one - NumberDictionary - defines a dictionary of terms and their associated decimal values.

C#
public static class NumberDictionary
{
    public static Dictionary<string, decimal> Numbers = new Dictionary<string, decimal>()
    {
        {"ZERO",   0}, {"TEN",       10}, {"HUNDRED",     100},
        {"FIRST",  1}, {"ELEVEN",    11}, {"THOUSAND",    1000},
        {"ONE",    1}, {"TWELF",     12}, {"MILLION",     1000000},
        {"SECOND", 2}, {"TWELVE",    12}, {"BILLION",     1000000000}, //*
        {"TWO",    2}, {"THIRTEEN",  13}, {"MILLIARD",    1000000000},
        {"THIRD",  3}, {"FOURTEEN",  14}, {"TRILLION",    1000000000000}, //*
        {"THREE",  3}, {"FIFTEEN",   15}, {"QUADRILLION", 1000000000000000},
        {"FOUR",   4}, {"SIXTEEN",   16}, {"BILLIARD",    1000000000000000},
        {"FIF",    5}, {"SEVENTEEN", 17}, {"QUINTILLION", 1000000000000000000},
        {"FIVE",   5}, {"EIGHTEEN",  18}, // I stopped here. A decimal can hold a
        {"SIX",    6}, {"NINETEEN",  19}, // sextillion or septillion, but the literal
        {"SEVEN",  7}, {"TWENTY",    20}, // then needs the m suffix to compile
        {"EIGH",   8}, {"THIRTY",    30},
        {"NIN",    9}, {"FORTY",     40},
        {"NINE",   9}, {"FIFTY",     50},
                       {"SIXTY",     60},
                       {"SEVENTY",   70},
                       {"EIGHTY",    80},
                       {"NINETY",    90},
    };
    // * These values are adjusted if the region is a long-scale region
}

People in the US might have noticed the weird ones - "BILLIARD" and "MILLIARD". These terms are used in long-scale countries, where billion and trillion are interpreted quite differently. If you want a lot of info about these and other differences, go to this Wiki page.
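To make the difference concrete, here is a minimal sketch of how the two scales assign values to the same words. The ScaleDemo class and its method names are made up for this illustration and are not part of the article's code. In the short scale each new "-illion" is 1,000 times the previous one, while in the long scale it is 1,000,000 times the previous one, which is why the long scale needs "milliard" and "billiard" for the values in between.

```csharp
using System;

public static class ScaleDemo
{
    private const decimal Million = 1000000m;

    // Short scale: n = 1 is a million, n = 2 a billion, n = 3 a trillion.
    // Each step multiplies by 1,000.
    public static decimal ShortScale(int n)
    {
        decimal value = 1000m;
        for (int i = 0; i < n; i++)
        {
            value *= 1000m;
        }
        return value;
    }

    // Long scale: the n-th "-illion" is a million raised to the power n.
    public static decimal LongScale(int n)
    {
        decimal value = 1m;
        for (int i = 0; i < n; i++)
        {
            value *= Million;
        }
        return value;
    }

    public static void Main()
    {
        Console.WriteLine("billion   short: {0:#,##0}   long: {1:#,##0}", ShortScale(2), LongScale(2));
        Console.WriteLine("trillion  short: {0:#,##0}   long: {1:#,##0}", ShortScale(3), LongScale(3));
    }
}
```

A long-scale billion is therefore the same value as a short-scale trillion, which is the kind of overlap the dictionary has to account for.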

Text Handling

This functionality is encompassed in a static string extension class.

It made sense to me to make the outward-facing interface a string extension method. This is a simple method which normalizes the text and converts each text component into a numeric value. You may have already noticed that I'm using a decimal type. I'm doing this because of its ungodly max value which FAR exceeds the currently coded maximum possible value of 999 trillion.
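For a sense of how much headroom decimal leaves, here is a small sketch. DecimalRangeDemo and its constant names are hypothetical, chosen only for this example. It relies on two things I'm sure of: decimal.MaxValue is roughly 7.9 * 10^28, and a decimal literal takes the m suffix.

```csharp
using System;

public static class DecimalRangeDemo
{
    // The m suffix marks a decimal literal; exponent notation avoids
    // counting zeros by hand.
    public const decimal Trillion   = 1e12m;
    public const decimal Sextillion = 1e21m;
    public const decimal Septillion = 1e24m;

    public static void Main()
    {
        // Largest value a decimal can hold (roughly 7.9 * 10^28).
        Console.WriteLine(decimal.MaxValue);

        // Both comparisons are true: the values fit with room to spare.
        Console.WriteLine(999m * Trillion < decimal.MaxValue);
        Console.WriteLine(Septillion < decimal.MaxValue);
    }
}
```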

There's really nothing special about this code. I merely replace some stuff, remove some stuff, and add some stuff (don't you love to hear programmers trivialize code?) so that the text is as normalized as I can reasonably make it. Once that's done, I convert each component "word" into its numeric equivalent by trying to find it in the numbers Dictionary. If I can't find even one of the component words in the dictionary, an exception is thrown. Once we're past the normalization/parsing part, we do the math to arrive at the numeric value.

UPDATE, 13 Feb 2015 - Apparently, everybody except the US interprets the value of billion and trillion incorrectly (grin). The difference is referred to as short-scale (US) or long-scale (GBR, et al). To appease the guys that mentioned it, I came up with a solution. I moved the numbers dictionary to its own class, and added a method here to adjust for long scale regions. This new method is called before we do anything else. I had neither the time nor desire to include all long-scale regions, so I left it as a very minor exercise to the programmer to add his/her 3-letter ISO region code to the longScaleRegions string. I also added some code to replace "AND" with a space, in case someone tries to do something like "one hundred and five".

C#
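using System;
using System.Globalization;

public static class ExtendString
{
    private static IntList values;

    // Switches BILLION and TRILLION to their long-scale values when the
    // current region is listed in longScaleRegions.
    private static void AdjustForLongScale()
    {
        string longScaleRegions = "GBR,";
        if (longScaleRegions.Contains(RegionInfo.CurrentRegion.ThreeLetterISORegionName))
        {
            var numbers = NumberDictionary.Numbers;
            numbers["BILLION"]  = numbers["MILLION"] * numbers["MILLION"];
            numbers["TRILLION"] = numbers["BILLION"] * numbers["MILLION"];
        }
    }

    public static decimal Translate(this string text)
    {
        ExtendString.AdjustForLongScale();
        text = text.ToUpper().Trim();
        string ordinalSuffix = "TH";
        // split run-together words such as "thirtysecond"
        text = text.Replace("TY", "TY ");
        // treat punctuation as word separators
        text = text.Replace("-"," ").Replace("_"," ").Replace("."," ").Replace(",", " ");
        // ignore the "and" in "one hundred and five"
        text = text.Replace(" AND", " ");
        text = text.Trim();
        // strip one trailing "TH" ("TWELFTH" becomes "TWELF"); TrimEnd(char[])
        // would remove every trailing T or H and turn "EIGHTH" into "EIG"
        if (text.EndsWith(ordinalSuffix))
        {
            text = text.Substring(0, text.Length - ordinalSuffix.Length);
        }

        values = new IntList();
        // RemoveEmptyEntries copes with the extra spaces left by the replacements
        string[] parts = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string numberText in parts)
        {
            decimal number;
            if (!NumberDictionary.Numbers.TryGetValue(numberText, out number))
            {
                throw new Exception("Not a number (might be spelled wrong)");
            }
            values.Add(number);
        }
        return values.NumericValue;
    }
}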
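To see what the normalization produces before any lookup happens, the steps can be isolated into a helper that returns the tokens. This is only a sketch: NormalizeDemo.Tokenize is a hypothetical name, not something in the download, and it assumes that removing exactly one trailing "TH" and discarding empty tokens is the intended behavior.

```csharp
using System;

public static class NormalizeDemo
{
    public static string[] Tokenize(string text)
    {
        text = text.ToUpperInvariant().Trim();
        // Split run-together words such as "thirtysecond" after the "TY".
        text = text.Replace("TY", "TY ");
        // Treat common separators as spaces.
        text = text.Replace("-", " ").Replace("_", " ").Replace(".", " ").Replace(",", " ");
        // Drop the connective in phrases like "one hundred and five".
        text = text.Replace(" AND ", " ");
        text = text.TrimEnd();
        // Remove a single ordinal suffix: "TWELFTH" becomes "TWELF".
        if (text.EndsWith("TH", StringComparison.Ordinal))
        {
            text = text.Substring(0, text.Length - 2);
        }
        // Skip the empty entries that doubled spaces would otherwise create.
        return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // prints FIVE|THOUSAND|SEVEN|HUNDRED|THIRTY|SECOND
        Console.WriteLine(string.Join("|", Tokenize("five thousand seven Hundred thirtysecond")));
        // prints ONE|HUNDRED|FIVE
        Console.WriteLine(string.Join("|", Tokenize("one hundred and five")));
        // prints EIGH
        Console.WriteLine(string.Join("|", Tokenize("eighth")));
    }
}
```

Each token that comes out is expected to be a key in the numbers dictionary, which is why the dictionary carries stems such as "TWELF", "FIF", "EIGH" and "NIN".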

The Math

I created a class derived from List<decimal> that contains the parsed numeric values, and performs operations on those values. This effectively separates and contains what I consider to be discrete functionality. It also goes a long way toward keeping the extension method free of clutter. Given the text "four hundred twenty nine thousand six hundred six", the contents of the list start out like this after parsing:

  • [0] 4
  • [1] 100
  • [2] 20
  • [3] 9
  • [4] 1000
  • [5] 6
  • [6] 100
  • [7] 6

Given the domain of the problem, we should never really have many more than 15 or so elements in the list (more if you want to support more than 999 trillion), so we don't have to concern ourselves with the size of this list.

The first thing we do is break the list up into smaller chunks. This just makes it easier to manage. There's a "mini list" for each major value break, and each one is named to indicate its numeric purpose.

C#
private List<decimal> trillions = new List<decimal>();
private List<decimal> billions = new List<decimal>();
private List<decimal> millions = new List<decimal>();
private List<decimal> thousands = new List<decimal>();
private List<decimal> hundreds = new List<decimal>();

NOTE: Notice that the mini-lists don't include anything over a trillion, and don't include the long-scale-specific terms. Feel free to add them if you want to.

When the text has been parsed, the extension method retrieves the IntList.NumericValue property. Several actions are taken in order to retrieve the expected numeric value.

C#
public decimal NumericValue
{
    get
    {
        this.BuildMiniLists();
        this.DoMiniMaths();
        this.AddMiniValues();
        decimal value = this.Sum(x=>x);
        return value;
    }
}

Each of the mini lists is populated (if necessary). We do this by looking for the big value and copying it - and all of the preceding values - to the appropriate mini-list.

UPDATE - 13 Feb 2015 - I changed the call to BuildMiniList to refer to the NumberDictionary values. This only made sense since you could be in a long-scale region where the values can change for some of the dictionary numbers.

C#
private void BuildMiniLists() 
{ 
    this.BuildMiniList(this.trillions, NumberDictionary.Numbers["TRILLION"]);
    this.BuildMiniList(this.billions, NumberDictionary.Numbers["BILLION"]);
    this.BuildMiniList(this.millions, NumberDictionary.Numbers["MILLION"]);
    this.BuildMiniList(this.thousands, NumberDictionary.Numbers["THOUSAND"]);
    this.BuildMiniList(this.hundreds, NumberDictionary.Numbers["HUNDRED"]);
}

private void BuildMiniList(List<decimal> list, decimal amount)
{
    // find the index of the specified amount
    int index = this.IndexOf(amount);
    // if we have an index
    if (index >= 0)
    {
        // copy the values from 0 to the found index into the
        // specified mini list
        list.AddRange(this.GetRange(0, index + 1));
        // and remove those items from the parent list
        this.RemoveRange(0, index + 1);
    }
}
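The splitting step can be tried in isolation with a plain List<decimal>. The sketch below uses a hypothetical helper, MiniListDemo.TakeThrough, that returns the moved values instead of filling a field; it is meant to show the effect on the example text, not to replace the class above.

```csharp
using System;
using System.Collections.Generic;

public static class MiniListDemo
{
    // Moves everything up to and including the first occurrence of marker
    // out of source and returns it. Returns an empty list if marker is absent.
    public static List<decimal> TakeThrough(List<decimal> source, decimal marker)
    {
        int index = source.IndexOf(marker);
        if (index < 0)
        {
            return new List<decimal>();
        }
        List<decimal> taken = source.GetRange(0, index + 1);
        source.RemoveRange(0, index + 1);
        return taken;
    }

    public static void Main()
    {
        // "four hundred twenty nine thousand six hundred six"
        var parsed = new List<decimal> { 4, 100, 20, 9, 1000, 6, 100, 6 };

        List<decimal> thousands = TakeThrough(parsed, 1000);
        List<decimal> hundreds = TakeThrough(parsed, 100);

        Console.WriteLine(string.Join(", ", thousands)); // 4, 100, 20, 9, 1000
        Console.WriteLine(string.Join(", ", hundreds));  // 6, 100
        Console.WriteLine(string.Join(", ", parsed));    // 6
    }
}
```

The order of the calls matters: the largest unit has to be taken first. Asking for 100 before 1000 would pull out only "4, 100" and leave the rest of the thousands group behind.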

Next, math is performed on each mini list. The DoMath method is recursive, and iterates through the specified list looking for things to multiply or add. Using our example above, the thousands mini list will look like this:

  • [0] 4
  • [1] 100
  • [2] 20
  • [3] 9
  • [4] 1000

The DoMath method below will build the appropriate value by reading each value and either adding or multiplying it into the resulting value. The processing will do this:

    (((4 * 100) + 20 + 9) * 1000) = 429000

After processing, the mini list will look like this:

  • [0] 0
  • [1] 0
  • [2] 0
  • [3] 0
  • [4] 429000

C#
private void DoMiniMaths()
{
    this.DoMath(this.trillions, 0);
    this.DoMath(this.billions,  0);
    this.DoMath(this.millions,  0);
    this.DoMath(this.thousands, 0);
    this.DoMath(this.hundreds,  0);
}

private void DoMath(List<decimal> list, int lastBigIndex)
{
    if (list.Count > 0)
    {
        decimal rollingValue = 0;
        decimal[] big = new decimal[]{100,1000,1000000,1000000000,1000000000000};
        for (int j = 0; j < list.Count; j++)
        {
            if (big.Contains(list[j]))
            {
                rollingValue = Math.Max(1, rollingValue) * list[j];
                list[j] = rollingValue;
                for (int i = lastBigIndex; i < j; i++)
                {
                    list[i] = 0;
                }
                lastBigIndex = j;
                if (j < list.Count - 1)
                {
                    this.DoMath(list, lastBigIndex);
                }
            }
            else
            {
                rollingValue += list[j];
            }
        }
    }
}
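For comparison, the same arithmetic can be done in a single pass with no recursion and no writing back into the list. The sketch below is an alternative, not the method used in this article: the GroupMath class and its Evaluate method are hypothetical names, and it assumes short-scale values arriving in the usual spoken order. One input worth trying against the recursive version is "one hundred thousand": there 1 * 100 writes 100 back into the list, and since 100 is itself one of the big values, the next pass appears to treat it as a multiplier again.

```csharp
using System;
using System.Collections.Generic;

public static class GroupMath
{
    public static decimal Evaluate(IEnumerable<decimal> numbers)
    {
        decimal total = 0m;   // groups already closed off by 1000 or larger
        decimal current = 0m; // the group still being built (0 to 999)

        foreach (decimal n in numbers)
        {
            if (n == 100m)
            {
                // "four hundred": multiply what we have so far (at least 1)
                current = Math.Max(1m, current) * n;
            }
            else if (n >= 1000m)
            {
                // "... thousand": close the group and start a new one
                total += Math.Max(1m, current) * n;
                current = 0m;
            }
            else
            {
                // units, teens and tens are simply added
                current += n;
            }
        }
        return total + current;
    }

    public static void Main()
    {
        // "four hundred twenty nine thousand six hundred six"
        Console.WriteLine(Evaluate(new decimal[] { 4, 100, 20, 9, 1000, 6, 100, 6 })); // 429606
        // "one hundred thousand"
        Console.WriteLine(Evaluate(new decimal[] { 1, 100, 1000 }));                   // 100000
    }
}
```

Because it works on the whole parsed list, this variant does not need the mini lists at all, at the cost of losing the per-group intermediate values that the mini lists make easy to inspect.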

After the math has been performed on each mini-list, we sum up the values inside each list, and add each sum back to the parent list.

C#
private void AddMiniValues()
{
    this.Add(this.trillions.Sum(x=>x));
    this.Add(this.billions.Sum(x=>x));
    this.Add(this.millions.Sum(x=>x));
    this.Add(this.thousands.Sum(x=>x));
    this.Add(this.hundreds.Sum(x=>x));
}

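After summing the mini-lists, the parent list should have the following contents (the leftover 6, followed by one sum per mini-list in the order trillions, billions, millions, thousands, hundreds):

  • [0] 6
  • [1] 0
  • [2] 0
  • [3] 0
  • [4] 429000
  • [5] 600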

When the values in the parent list are summed, we arrive at the result of "429606". Now that we've written all of that code, I leave it as an exercise for the programmer to associate the resulting numeric value with the text for sorting.
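As a starting point for that exercise, here is a minimal sketch of sorting text by its numeric value with LINQ. To keep it self-contained, SortDemo.ToNumber is a hypothetical stand-in that only knows four words; in a real project you would call the Translate extension method in its place.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortDemo
{
    // Stand-in for Translate(); it only knows a handful of words.
    private static readonly Dictionary<string, decimal> Known =
        new Dictionary<string, decimal>(StringComparer.OrdinalIgnoreCase)
        {
            { "First", 1 }, { "Second", 2 }, { "Third", 3 }, { "Eleventh", 11 }
        };

    public static decimal ToNumber(string text)
    {
        return Known[text];
    }

    // Orders the text by numeric value while keeping the original strings.
    public static List<string> SortByValue(IEnumerable<string> items)
    {
        return items.OrderBy(s => ToNumber(s)).ToList();
    }

    public static void Main()
    {
        var items = new[] { "Third", "Eleventh", "First", "Second" };
        // prints First, Second, Third, Eleventh
        Console.WriteLine(string.Join(", ", SortByValue(items)));
    }
}
```

An alphabetical sort of the same list would put "Eleventh" first, which is the problem that started this article.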

Using the Code

Usage goes something like this (from a console app):

C#
class Program
{
    static void Main(string[] args)
    {
        Translate("twelfth");
        Translate("First");
        Translate("One Hundredth");
        Translate("five thousand seven Hundred thirtysecond");
        Translate("four hundred twenty nine thousand");
        Translate("four hundred twenty nine thousand six hundred six");
        Translate("fortyfive");
        Translate("twenty seven million two hundred thirtyfour thousand one");
        Console.ReadKey();
    }

    static void Translate(string text)
    {
        decimal value = text.Translate();
        Console.WriteLine(string.Format("{0} - {1:#,##0}", text, value));
    }
}

Points of Interest

Nothing particularly wacky or unexpected occurred while writing this code, but it was interesting and allowed me to do some mental exercise outside my normal work-a-day life. I am programmer.

History

  • 27 Feb 2018  Fixed some spelling mistakes and partial HTML tags.
  • 13 Feb 2015 (B)  Added support for long-scale regions
  • 13 Feb 2015 (A)  Formatted a code block and fixed some misspellings.
  • 12 Feb 2015  Initial release.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior) Paddedwall Software
United States
I've been paid as a programmer since 1982 with experience in Pascal, and C++ (both self-taught), and began writing Windows programs in 1991 using Visual C++ and MFC. In the 2nd half of 2007, I started writing C# Windows Forms and ASP.Net applications, and have since done WPF, Silverlight, WCF, web services, and Windows services.

My weakest point is that my moments of clarity are too brief to hold a meaningful conversation that requires more than 30 seconds to complete. Thankfully, grunts of agreement are all that is required to conduct most discussions without committing to any particular belief system.
