Click here to Skip to main content
Click here to Skip to main content

Jumble Solving Utility

By , 9 Nov 2005
 

Introduction

This program finds all the words in the dictionary that can be created from a jumbled word.

Background

I've never been excellent at solving Jumbles, mainly because my vocabulary doesn't include some of the odd-ball words they like to use. A Jumble is a word problem that can be found in newspapers and on some websites like jumble.com. A basic Jumble problem is a bunch of letters that form a word when re-arranged. The solution to the problem is an anagram of the original jumbled word, Ex. lbujme -> jumble. This program will take the bunch of letters and give a list of any words (found in the dictionary) that can be formed from those letters.

Using the code

To use the code in your own application just include trie.cpp and trie.h to your project source, and #include "trie.h" when you want to use the class.

A very simple use of the class CTrie:

//Create the List
CTrie* List = new CTrie();

//Add a word to the List
List->AddWord("jumble");

//Get a vector of words from the List
vector<CHAR*>* all = List->GiveWords("lbujme");

//Output the results
if (all && !all->empty())
{
    vector<CHAR*>::const_iterator vai;
    int cnt = 0;
    for (vai = all->begin(); vai != all->end(); vai++, cnt++)
        cout << cnt + 1 << " " << *vai << endl;
}
else
    cout << "none" << endl;

Algorithm Outline

My algorithm uses the basis that the same number of any letter must occur in a word as the given jumbled word for the word to be formed from the jumbled word. In other words lbujme has: 1 B, 1 E, 1 J, 1 M, 1 L, 1 U, and jumble has: 1 B, 1 E, 1 J, 1 M, 1 L, 1 U. These are the same therefore you can make jumble out of lbujme. The data structure of my algorithm uses a binary tree. The tree starts off with the root node which contains a list of all the words with zero occurrences of the letter 'a'. The root node then can have left and right pointers. The left pointer points to the node with zero occurrences of the letter 'a' and zero occurrences of the letter 'b'. The right pointer points to the node with one occurrence of the letter 'a'. This can be expanded generally to say that going left in the tree means you move to the next letter. To go right means you add an occurrence of the current letter.

History prior to CodeProject

The first time I wrote my Jumble solving program was about five years ago. It took about five minutes to find all the possible words for a bunch of six letters. Eight letters I estimated would take half a day. I never did find out how long it took. Also the original program was written in Turbo Pascal; I could say that was the reason for the slow algorithm but must blame the programmer.

The original algorithm was to arrange the bunch of letters in all there possible combinations (including repeats of letters) and check if it was a word. For a six letter word this gives us 6^6 = 46656 words to check. Not too bad but I was also scanning the dictionary file every single time to check if the word existed or not. This was the major bottleneck, so I read all the words into memory once, and scanned the list. This improved the search time considerable but was still incredibly slow for eight bunches of letters. Also the basis of my algorithm was a whole lot of if statements, making expanding to anything beyond eight letters tedious at best.

I eventually improved the algorithm to remove repeats of letters. This meant that instead of 6^6 = 46656 words to check I had to check 6! = 720 words. A vast improvement. Years passed...and I decided to code the Jumble program again, this time in C++ :). I did exactly the same algorithm and still came across a huge barrier when using words greater then size of eight. The problem sat in the back of my mind and a new algorithm popped into my head. I could count the number of times a letter occurred in the jumbled set of letters and count the number of times a letter occurred in a word and compare the two. This worked wonders and improved the algorithm drastically. But, I had no nice data structure for finding if a word belonged to a certain jumbled set.

University courses gave me trees, from which my jumble tree was born. Now the program takes a few seconds to load up and a few to shut down but searching is 26 pointers away (in the worst case). I've cleaned up the code, separated it into classes and posted it here. Obviously this program is pointless with this web site but has been an adventure thus far. I'm sure improvements can still be made, and look forward to feedback.

Possible Future Updates

Implement a Directed Acyclic Word Graph [DAWG] structure to save memory, and time stepping through structure.
Test other algorithms.

History

  • 8 November 2003 - v1.0
    First public release
  • 2 December 2003 - v1.1
    Changed class and file names
    Changed return type of GiveWords() to vector<CHAR*>*
    Added .NET Demo Project

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

greba
Web Developer
Canada Canada
Member
No Biography provided

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
Generala simpler algorithmmemberpeatrick29 Aug '07 - 11:13 
1
Set up a translation table. You only have to do this once.
For every word in the dictionary, list that word with all its letters in alphabetcal order:
For example, moat becomes amot, rain becomes ainr, needle becomes deeeln
Think of this as left column real word, right column alpha-ordered word
 
2
Then the utility just has to alpha-order the jumbled word, look it up in the right column of the translation table, and then display the word that is in the left column.
tmoa becomes amot, and when you look that up in the right column you see that moat is in the left column.
 
I use the word "column" because it's easy to visualize.
GeneralThanks greba :)memberrzs4 Mar '07 - 4:45 
Great work..hats off to u buddy Smile | :)
GeneralAnagrams generationmemberGirish Mahadevan2 Jan '07 - 19:12 
hello people ....
 
i'd want this jumble program..for my engineering mini project..it would be helpful if you can send it as soon as possible...Cry | :((
 
the program isnt running...i ment the jumble solver..its giving me as many as 25 errors..Sigh | :sigh:
 
if u have any code(which is running).Sniff | :^) ..a beginner level code..in c/c++..please send it to my mail....mgirish_87@yahoo.com
 

regards,
girish
 

 
seriously in need of it...plz help
GeneralRe: Anagrams generationmembergreba3 Jan '07 - 3:04 
I downloaded the demo project, and using the .net version of the project, I had to convert the project to the latest version of Visual Studio. I then had to remove the tree.h and tree.cpp files, and add-in the trie.h and trie.cpp files. It then built fine.
GeneralRe: Anagrams generation(greb)memberGirish Mahadevan3 Jan '07 - 18:16 
seriously thanks a lot for ur concern.......
 
but plz can u just give the changed program,if u can fuse everything in one module and send it to me..(hope that is possible)...mgirish_87@yahoo.com... so that i can take the print out of it as one source file and run the file and get the output......i m very new 2 programming...just that DA VINCI CODE..inspired me...i thought of learning this...
 
...so plz it would be nice if u do the needfull.......
 
thanks a lot again.....
 


QuestionPort to CSharp?memberDoncp22 Dec '06 - 8:08 
Very nice.
 
How about a port to C#?
 
Don
AnswerRe: Port to CSharp?memberDoncp24 Dec '06 - 12:32 
Never mind - it was easier than I thought.
 
Fun algorithm!
 
Don
GeneralSimple Addition for BIG ImprovementmemberHumble Programmer9 Nov '05 - 21:27 
I had code for something like this years ago, that has long since been lost to the "great bit bucket in the sky"--or more likely, the box of dead hard drives in my basement. But, I remember the basic concepts: improve your anagram-solver by using the frequency with which pairs of letters (digraphs) occur in a language. For example, in English the letter pair 'th' will occur way more often than 'kx'. So how do you take advantage of this?
 
Step #1: build a table of letter pair frequencies by reading in some text--I like to use ASCII text files of long-winded newspaper or magazine articles--and count how many times each possible letter pair occurs. Setting up a 256 x 256 matrix is easiest, but if you want to use UNICODE, be prepared for some headaches.
 
Step #2: for each permutation of the letters, i.e., each possible solution, total up the net frequency of each pair of letters. (Watch out for overflow if you read in LOTS of source text.) In theory (and pretty consistently in practice) the higher scoring permutations are more likely to be 'real' words.
 
One neat side effect of this technique is that if you feed your frequency table Italian, it will suggest 'Italian' solutions. You can also use 3, 4, or even 5 letter combinations, but this quickly increases the memory and processing requirements.

 
Cheers!
Humble Programmer
,,,^..^,,,
Generalfile versionmemberwrax29 Oct '05 - 1:49 
hey. i think you forgot to upload the new version of your program. the main files are still named "tree.*" and not "trie.*".
 
just an advice
GeneralRe: file versionmembergreba9 Nov '05 - 7:32 
This should be fixed, as I've sent off new zips to be put in the old zips place. Sorry about that, no idea how it happened.
 
Greba,
 
My lack of content on my home page should be entertaining.

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web02 | 2.6.130523.1 | Last Updated 9 Nov 2005
Article Copyright 2003 by greba
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid