Click here to Skip to main content
13,352,157 members (78,335 online)
Click here to Skip to main content
Add your own
alternative version


10 bookmarked
Posted 31 Dec 2013

How to add spaces between words in a spaceless strings

, 2 Jan 2014
Rate this:
Please Sign up or sign in to vote.
A simple solution for processing spaceless strings


In this article I would like to provide an easy to use, simple solution to convert spaceless strings into meaningful sentences. My solution could be further optimized, nevertheless it works perfectly fine for relatively short strings.


To illustrate the problem, let's look at the following spaceless string:


If we iterate through a dictionary, the first word to be found would probably be "ate" or "at" so if we simply process our sample this way, the result would look something like this:

'I h ate attention" 

To exclude this obviously flawed solution we can prioritize results that use up all (or most of) the characters, like:

'I hate attention"

This is the 1. criteria.

Unfortunately there could be many results that meet this criteria:

"I hat eat tent ion"
"I hate at tent ion" 

It's easy to see that short (2-3 character long) words can often come up by chance. The most simple approach is to prioritize solutions with the longest words. This is the 2. criteria I used. As it turned out, these conditions provide the correct solution most of the times.

The next issue is to compute all the possible permutations.  For this I used recursion in my code.

Following are the steps in the recursive method:

  • Make a sorted dictionary of the possible substrings
  • Sort the dictionary descending by length (Implementing the 2. criteria)
  • If the sorted dictionary is empty then return the input string and mark it as nonsense
  • Iterate through all the entries in the sorted dictionary
    • For each entry divide the string into 3 parts:
      • left (before the entry),
      • entry,
      • right (after the entry)
    • Mark the entry as meaningful and add it to the possible resultlist
    • Call our recursive method on the left string
    • Call our recursive method on the right string
    • Combine all the results and check if it's better than the  combined results from the previous entries  (by comparing the numbers of the meaningful characters: implementing the 1. criteria). If it is, then we keep it as the best solution (so far).
  • Return the best solution.

This method works as intended, but it could be very slow for long strings, because the algorithm computes the same substrings many times.  To avert this problem I made an array to store all the processed substrings. The recursive method then starts by checking the corresponding entry in the generated substring list and if it isn't empty, returns its value. This optimization boosts the performance by magnitudes.

Using the code 

The implementation consists of a main class and a nested one. I needed a class that makes strings trackable by storing their relative index to the initial input string, plus a variable to mark if a string has sense. The nested IndexedString class implements these features.

public class WordSplitter
    string[] Dictionary;

    //To boost performance
    List<IndexedString>[,] Substringsbuffer;

    public WordSplitter(string[] Dictionary)
        this.Dictionary = Dictionary.OrderByDescending(x=>x.Length).ToArray();   

    public string SplitToWords(string Input)

        Substringsbuffer = new List<IndexedString>[Input.Length+1,Input.Length+1];
        IndexedString stringtosplit = new IndexedString(0, Input, false);

        var sortedlist = recursiveWordSearch(stringtosplit, Dictionary).OrderBy(x => x.Index);
        string result = "";
        foreach (var word in sortedlist)
            result = result + word.Text + " ";

        return result;

    // Private Methods
    private List<IndexedString> recursiveWordSearch(
        IndexedString Stringtosplit, 
        string[] Dictionary)
        // Checking the buffer
        int length = Stringtosplit.Text.Length;
        if (Substringsbuffer[Stringtosplit.Index, length + Stringtosplit.Index] != null)
            return Substringsbuffer[Stringtosplit.Index, length + Stringtosplit.Index];

        // Initializing result list
        List<IndexedString> result = new List<IndexedString>();

        // Narrowing the dictionary
        string[] newdictionary = Dictionary.Where(x => Stringtosplit.Text.Contains(x)).ToArray();
        // Trivial case
        if (newdictionary.Length < 1)
            return result;
        // Non trivial case
        foreach (string entry in newdictionary)
            List<IndexedString> temporarylist = new List<IndexedString>();

            // Deviding the string to 3 parts: the entry, left and right 
            IndexedString[] devidedBytheEntry = splitByEntry(Stringtosplit, entry);

            IndexedString left = devidedBytheEntry[0];
            IndexedString middle = devidedBytheEntry[1];
            IndexedString right = devidedBytheEntry[2];

            // Calling the method on the other two parts recursively
            temporarylist.AddRange(recursiveWordSearch(left, newdictionary));
            temporarylist.AddRange(recursiveWordSearch(right, newdictionary));

            // Comparing current score and temporary score
            var temporaryScore = temporarylist.Where(x => x.Word == true);
            var currentScore = result.Where(x => x.Word == true);

            if (temporaryScore.Select(
                x => x.Text.Length).Sum() > currentScore.Select(
                x => x.Text.Length).Sum())
                result = temporarylist;
        Substringsbuffer[Stringtosplit.Index, length + Stringtosplit.Index] = result;
        return result;

    private IndexedString[] splitByEntry(IndexedString Source, string Entry)

        int indexofentry = Source.Text.IndexOf(Entry);
        IndexedString[] result = new IndexedString[3];

        // Compute realitve indexes
        int leftindex = Source.Index;
        int entryindex = Source.Index + indexofentry;
        int rightindex = Source.Index + indexofentry + Entry.Length;

        // Generate substrings
        string leftstring = Source.Text.Substring(0, indexofentry);
        string entrystring = Entry;
        string rightstring = Source.Text.Substring(indexofentry + Entry.Length);

        // Generate results
        result[0] = new IndexedString(leftindex, leftstring, false);
        result[1] = new IndexedString(entryindex, entrystring, true);
        result[2] = new IndexedString(rightindex, rightstring, false);
        return result;

    private class IndexedString
        public int Index { get; set; }
        public string Text { get; set; }
        public bool Word { get; set; }

        public IndexedString(int Index, string Text, bool Word)
            this.Index = Index;
            this.Text = Text;
            this.Word = Word;

To convert strings, you need to make an instance of the WordSplitter class by passing a "dictionary" string  array, then call the public SplitToWords() method.

dictionary = new string[5] { "hat", "bob", "a", "green","has" };
WordSplitter WS = new WordSplitter(dictionary);
string test = WS.SplitToWords("Bobhasagreenhat".ToLower()); 


  • Update (1 Jan 2014):  I excluded the order by descending query from the recursive method, it needs to be called only once, so it is executed in the constructor now. I added more comments to clarify some functions.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Viktor Kovács
Hungary Hungary
I'm a hungarian software developer, mostly interested in .net.

You may also be interested in...


Comments and Discussions

QuestionExclude non-dictionary words Pin
Member 123741936-Mar-16 21:53
memberMember 123741936-Mar-16 21:53 
GeneralMy vote of 5 Pin
Maciej Los30-Mar-15 9:12
protectorMaciej Los30-Mar-15 9:12 
QuestionRuby Version or Pseudo-Code ? Pin
Member 1121464810-Nov-14 1:26
memberMember 1121464810-Nov-14 1:26 
Questionbig dictionary Pin
fredatcodeproject31-Dec-13 6:52
memberfredatcodeproject31-Dec-13 6:52 
Good idea
but for a real life example, you would need to pass a 50,000 word dictionary which is could be too big.
another idea would be to create a dictionary according to the string passed based on a big on a file dictionary but that would take some time.
AnswerRe: big dictionary Pin
szesan931-Dec-13 17:01
memberszesan931-Dec-13 17:01 
GeneralRe: big dictionary Pin
fredatcodeproject1-Jan-14 6:12
memberfredatcodeproject1-Jan-14 6:12 
GeneralMy vote of 5 Pin
ano6931-Dec-13 6:49
memberano6931-Dec-13 6:49 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web02 | 2.8.180111.1 | Last Updated 2 Jan 2014
Article Copyright 2013 by Viktor Kovács
Everything else Copyright © CodeProject, 1999-2018
Layout: fixed | fluid