Mr. Crossworder - Create Crosswords in Seconds!

Mehedi Shams

Dec 14, 2018

CPOL

Crossword creator - with a touch of Unicode Logic!

Introduction

This is a crossword puzzle creator coded in C# with .NET Framework 4.5.2. It is also extended to support crossword creation with Unicode letters. Different scripts occupy different ranges of Unicode and combine their characters differently, so the code differs from language to language. However, this project gives an idea of how to extend the logic to accommodate other languages.

Background

Necessity is the mother of invention. While I was trying to download crossword puzzles for my son, the idea struck me: why not code one myself? I already had a similar design in another project, which I could re-use for this slightly different requirement. That’s how it started.

How It Works

At the very beginning, it loads the regular (English) words and clues automatically.
If the user is not satisfied with the arrangement of the words, s/he can click the 'Reshuffle Board' menu item. This can be done as many times as needed. A better arrangement is one with more successfully placed words; the count is displayed in the bottom right status label (e.g., 6 failed case(s), 6 isolated case(s); remaining 38 words will be on the crossword).
The user can select a word in the listview. The corresponding word will be highlighted in the grid.
If the user is not happy with a word and wants to pick another random word from the dictionary, s/he selects the word in the listview and presses ENTER.
If the user wants to modify a word and its meaning (clue), s/he double-clicks on the word. A small dialog appears for changing the word.
After the user is satisfied, s/he clicks the menu item 'Create Crossword'. The actual crossword board will be displayed.
Click File->Save Crossword on this board. The board (bmp image), clues (text file) and answers (text file) will be saved in the 'Crosswords' folder under the path of the current executable. The file names are suffixed with the current date-time stamp.
If the user wants to create Bangla Unicode crosswords, s/he clicks the 'Load Bangla Unicode' menu item of the main board.
If the JSON dictionary has been tampered with and is not in the correct format, an error message is displayed.
The necessary configurations are in the app.config file.

Logic

The logic is to use a JSON dictionary whose key-value pairs are word-clue pairs. For example, given the following JSON entry, the idea is to use the key as the word on the crossword and the meaning as its clue.

{
    "BUS": "A public transportation used to carry people from place to place"
}

The word “BUS” will be placed on the grid either ACROSS or DOWN, and the meaning will be the clue for finding the word. After all the words are placed on the board and the user is satisfied with the arrangement, s/he proceeds with the crossword generation.

For word generation, an open-source JSON dictionary is obtained from here. To reduce bandwidth, only a small portion of the dictionary (about 600 words) is added to the project. It is advised to download the whole dictionary and use it; the program behaves the same, only with more words at hand.

High Level Logic

1. Randomly select (X, Y) axis and direction
2. Try to place the word on the board
  a. If there are not enough sparse words on the board, then find an isolated axis on the board and place it there.
  b. Or if there are enough sparse words on the board, then make sure the current word crosses with existing word(s) on the board. During this phase, if the attempts for placement reach a maximum count, then abort the word and proceed with the next word.

The explanation for (2a) is that the first few words are placed as disjoint words. This makes sure that the words are scattered all over the board.

The explanation for (2b) is that all the rest of the words should cross other existing word(s) on the board. There might be an unfortunate situation where a word doesn’t find a suitable place after a lot of attempts. In such cases, the word is marked as failed once the attempt threshold is reached.
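The retry-until-threshold idea can be sketched as follows. This is a minimal illustration, not the project's GameEngine code: the class name, the `tryPlace` callback and the parameter names are assumptions made for the example, and the callback stands in for whatever validity check (2a) or (2b) requires.

```csharp
using System;

public static class PlacementLoopSketch
{
    // Tries random (x, y, direction) candidates until tryPlace accepts one
    // or maxAttempts is exhausted. Returns the number of attempts used,
    // or -1 when the word has to be marked as failed.
    // tryPlace is a hypothetical callback: (x, y, across) -> placed?
    public static int PlaceWithRetries(
        Func<int, int, bool, bool> tryPlace,
        int gridSize, int maxAttempts, Random rng)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            int x = rng.Next(gridSize);        // random column
            int y = rng.Next(gridSize);        // random row
            bool across = rng.Next(2) == 0;    // random direction
            if (tryPlace(x, y, across)) return attempt;
        }
        return -1; // threshold reached: give up on this word
    }
}
```

Passing the `Random` instance in from outside keeps the sketch repeatable when a fixed seed is used.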

Improved High Level Logic

Rather than randomly selecting the starting (X, Y) of a word, a second logic is applied which is more efficient. The second logic checks for each letter of the word if there is another word on the board that contains the letter.

For example, if (CART) is to be placed, then it checks for any existing word on the board that contains ‘C’ or ‘A’ or ‘R’ or ‘T’. E.g., there might be words like CAR, ATTEST, ASTEROID on the board.

Pseudocode for this logic is the following:

For each letter in the current word (e.g., ‘C’ in CAT):

1. Take the letter and look for words on the board that contain that letter (e.g., COW, ARC, SCATTER).
2. Check if the current word can be placed across one of them:
  a. If a placement is possible, then place the word (CAT) there and proceed with the next word.
  b. If a placement is not possible (the word failed to cross any existing word on the board), then move on to the next letter (e.g., ‘A’ in CAT), look for words on the board that contain ‘A’ (e.g., CAR, ASTEROID, PASCAL, etc.), and repeat from step 1.

However, the second logic is applied to the Unicode section only. Applying it to the regular English alphabet is left as an exercise to the reader.
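As a rough sketch of the shared-letter search for the regular English case, the following lists the starting cells at which a new word would cross an existing word through a common letter. The types `PlacedWord` and `Candidate` and the method `FindCandidates` are hypothetical names for this illustration; the project's Unicode engine compares composite letters rather than single chars, and each candidate would still have to pass the placement rules before being accepted.

```csharp
using System;
using System.Collections.Generic;

public class PlacedWord
{
    public string Word; public int X; public int Y; public bool Across;
}

public class Candidate
{
    public int X; public int Y; public bool Across;
}

public static class SharedLetterSketch
{
    // For every letter of 'word', finds the same letter in a word already on
    // the board and computes where 'word' would have to start to cross it.
    public static List<Candidate> FindCandidates(string word, List<PlacedWord> placed)
    {
        var result = new List<Candidate>();
        for (int i = 0; i < word.Length; i++)
            foreach (PlacedWord p in placed)
                for (int j = 0; j < p.Word.Length; j++)
                {
                    if (p.Word[j] != word[i]) continue;
                    // Cell that holds the shared letter.
                    int cx = p.Across ? p.X + j : p.X;
                    int cy = p.Across ? p.Y : p.Y + j;
                    // The new word runs perpendicular to the existing one.
                    bool across = !p.Across;
                    int sx = across ? cx - i : cx;
                    int sy = across ? cy : cy - i;
                    if (sx < 0 || sy < 0) continue; // would start off the board
                    result.Add(new Candidate { X = sx, Y = sy, Across = across });
                }
        return result;
    }
}
```

With CAR placed ACROSS at (2, 3), the word CAT gets two DOWN candidates: one starting at (2, 3) through the shared ‘C’, and one starting at (3, 2) through the shared ‘A’.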

Legitimate Placement:

The logic for a valid placement is as follows:

First check whether the word (e.g., CAT) fits on the board: wherever it crosses another word (e.g., HAT), the letter at the crossing (e.g., ‘A’) must be the same as the one already on the board.
If a word is to be placed ACROSS, then:
1. Under no circumstances can the word have a letter immediately before or after it. E.g., if CART is to be placed ACROSS, then the cells just before and just after it must be blank; since TRAIN and STOP are already on the board, CART cannot be placed there.
2. If there is a letter in any cell of the row above the word, then the existing word it belongs to cannot stop at that row; it can only cross through the new word. For example, if CART is the current word, then it cannot be placed directly below HAT, but can be placed across MART, ACTOR, TRIM, ALONG.
3. Similarly, if there is a letter in any cell of the row below the word, then the existing word it belongs to cannot start at that row; it can only cross through the new word. For example, if CAT is the current word, then it cannot be placed directly above HAT, but can be placed across MART, ACTOR, TRIM, ALONG.
If a word is to be placed DOWN, then:
1. Under no circumstances can the word have a letter immediately above or below it. E.g., if CAT is to be placed DOWN, then the cells just above its first letter and just below its last letter must be blank.
2. If there is a letter in any cell of the column to the left of the word, then the existing word it belongs to cannot stop at that column; it can only cross through the new word. For example, if CAT is the current word, then it cannot be placed immediately to the right of HAT, but can be placed across MANGO, ARC, STAY, THREAD.
3. Similarly, if there is a letter in any cell of the column to the right of the word, then the existing word it belongs to cannot start at that column; it can only cross through the new word. For example, if CAT is the current word, then it cannot be placed immediately to the left of HAT, but can be placed across MANGO, TRAIN, SCOOP, STAY, THREAD, SCANT.

Project Structure

The project has two main forms, one auxiliary form, and six class files. The purpose of each element is:

Form – MainBoard: This is the main form. Its activities are:
1. Load JSON dictionary into a collection (e.g.: about 86,000 words).
2. Randomly load a certain amount of words and meanings (e.g.: 50).
3. Populate the listview so the user can see the words and meanings.
4. Call the GameEngine class to utilize the placement logic and populate the word matrix.
5. Draw grids (horizontal, vertical lines).
6. Map the matrix to individual cells.
7. Update legends (status labels).
8. Update the listview with different colours to represent failed words, isolated words, and words with lengthy clues.
9. Interact with different menu selections:
  1. Load English Words – load English dictionary of words
  2. Load Bangla Unicode – load Bangla Unicode dictionary of words
  3. Reshuffle board – try a different assembly of the words
  4. Create Crossword – display the ‘FinalCrosswordBoard’
  5. About – display the ‘About’ box
10. Highlight a word on the board when the user selects it in the listview.
11. Enable the user to change an individual word by selecting it on the listview and pressing ENTER.
12. Enable the user to tweak (change) an individual word by double-clicking on it. This displays the ‘EditWord’ form.
Form – EditWord: allows the user to change a word and meaning (clue).
Form – FinalCrosswordBoard: This is the crossword form. Its activities are:
1. Arrange the clues in the ACROSS and DOWN textboxes. Apply logic for proper numbering.
2. Draw grids (horizontal, vertical lines).
3. Fill-in blank cells (cells in matrix with NULLs) with grey colour.
4. Place indices accordingly in individual white boxes where the word would appear.
5. Interact with different menu selections: Save the crossword.
Interface – IDetails, ICompositeUnicode: The interfaces containing the basic signature of the word details info – word, meaning, axes, direction, failing flag, overlapping flag, isolation flag, output sequence. The 'ICompositeUnicode' has one extra list to hold the composite unicode characters.
Class – DetailsAndAxes: contains two classes (structural bodies) – one for regular words, the other for Unicode. The Unicode one has an extra element ‘CompositeUnicodeLetters’ for individual composite elements.
Class – Globals: for global and static variables.
Class – BanglaUnicodeParser: for parsing Bangla Unicode characters. Input: Whole word (e.g.: ভণ্ডুল), output list of strings (e.g.: individualLetters[0] = ভ, individualLetters[1] = ণ্ডু, individualLetters[2] = ল).
Class – GameEngine: The class with placement logic:
1. Method – PlaceWordsOnTheBoard(): loops through all the words in the list and tries to find a placement for them on the board.
  1. GetRandomAxis() – generate random axes for the word.
  2. PlaceTheWord() – try to place the word on the board. Follow the high-level logic specified in 'high level logic' section.
    1. If it is a right-directed (ACROSS) word:
      1. See if there is no mismatching overlap on the board.
      2. See if the left cell is free.
      3. See if the right cell is free.
      4. See if the top cells along all the letters of this word are free; if not, see if this is a legitimate crossing.
      5. See if the bottom cells along all the letters of this word are free; if not, see if this is a legitimate crossing.
      6. If all these are passed, then this is a valid axis for the word; place it there.
    2. If it is a down-directed (DOWN) word:
      1. See if there is no mismatching overlap on the board.
      2. See if the top cell is free.
      3. See if the bottom cell is free.
      4. See if the left cells along all the letters of this word are free; if not, see if this is a legitimate crossing.
      5. See if the right cells along all the letters of this word are free; if not, see if this is a legitimate crossing.
      6. If all these are passed, then this is a valid axis for the word; place it there.
Class – BanglaUnicodeGameEngine: Like the previous class. However, instead of generating random initial axes, it uses the better logic described in the 'Improved High Level Logic' section. The only addition: since each cell holds a compound Unicode letter, how do you fit a compound letter into a single cell? You guessed right! Add a third dimension to the 2D matrix, where the third dimension holds the individual codes of the compound Unicode letter.

After the words are placed, they would look something like the following:

Touch of Unicode

Each script has its own range of code points in Unicode. In this project, Bangla Unicode is applied. This section sheds some light on how to extend the logic to other Unicode languages.

Unicode represents languages beyond the regular English alphabet. However, coding for them is a little different, as a letter is often represented by a combination of several codes. For example, the word ‘ভণ্ডুল’ is represented as the code sequence 2477, 2467, 2509, 2465, 2497, 2482.

Each code stands for one character, and a letter as the reader sees it can be a single code (e.g.: 2477 for 'ভ'), or a combination of codes (e.g.: ণ্ডু = 2467 'ণ' + 2509 '্' + 2465 'ড' + 2497 'ু').

Following is a simple example that shows a message box displaying the word (ভণ্ডুল).

MessageBox.Show(((char)2477).ToString() +
                ((char)2467).ToString() +
                ((char)2509).ToString() +
                ((char)2465).ToString() +
                ((char)2497).ToString() +
                ((char)2482).ToString());
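Going the other way, the codes of any word can be listed by walking over its chars. The helper below is only an illustration (the class and method names are made up for this example); it returns the decimal value of every UTF-16 code unit in a string.

```csharp
using System;
using System.Collections.Generic;

public static class CodeUnitSketch
{
    // Returns the decimal value of every UTF-16 code unit in the text.
    public static List<int> CodeUnits(string text)
    {
        var codes = new List<int>();
        foreach (char c in text)
            codes.Add((int)c);
        return codes;
    }
}
```

For the word ভণ্ডুল, this gives the six values 2477, 2467, 2509, 2465, 2497 and 2482, the same numbers that are passed to MessageBox.Show above.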

For regular English words, each letter stands by itself, so wherever individual letters are needed, the chars of the word can be used directly. However, for Unicode letters, a list of strings is needed, where each string in the list represents one composite Unicode letter.

public List<string> CompositeUnicodeLetters { get; set; }

In other words, the word (ভণ্ডুল) needs to be segregated into three individual composite letters and put in the list. So, the list would look like:

CompositeUnicodeLetters[0] = "ভ"
CompositeUnicodeLetters[1] = "ণ্ডু"
CompositeUnicodeLetters[2] = "ল"

This is needed wherever the code has to walk along the length of the word. For comparison, the following snippet for regular words walks along the length of the word to find out whether it is isolated.

if (wrd.Y > 0)
    for (int x = wrd.X, y = wrd.Y - 1, i = 0; i < wrd.Word.Length; x++, i++)
        if (matrix[x, y] != '\0')
        {
            wrd.Isolated = false;
            return;
        }

Word.Length cannot be used as such for Unicode. For example, the length of the word (ভণ্ডুল) is reported as 6, because it consists of 6 Unicode codes, each stored as one char.

That is why the word has to be split into distinct composite letters, so that the loop walks along the correct length as follows:

if (wrd.Y > 0)
    for (int x = wrd.X, y = wrd.Y - 1, i = 0; i < wrd.CompositeUnicodeLetters.Count; x++, i++)
        if (matrix[x, y, 0] != '\0')
        {
            wrd.Isolated = false;
            return;
        }

Now the problem is that individual compound letters are needed for the crossword, so that each compound letter can be put in one cell. A word in a Unicode language can be read as it is, but separating it into individual compound letters is hard, as there is no delimiter between successive letters. As a comparison, in English each letter stands on its own and no delimiter is needed. E.g., each letter in CAT stands on its own and can be placed in an individual cell on the board.

To do the same for Bangla or other Unicode languages, a logic is needed to parse individual compound letters. The parsing logic is obviously different for different Unicode languages. Further, a compound letter has no fixed length. For example, the letter (ন্দ্রি) in the word (চন্দ্রিমা) alone requires six individual Unicode codes.
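To give a feel for what such a parser has to do, here is a deliberately simplified sketch. It is not the logic of the project's BanglaUnicodeParser; the class name, the character ranges chosen, and the joining rule are assumptions for illustration. The rule used here: a character stays in the current compound letter if it is a dependent sign (vowel sign, hasanta, nukta and similar marks) or if the previous character is a hasanta (U+09CD), which joins consonants into a conjunct.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BanglaClusterSketch
{
    const char Hasanta = '\u09CD'; // joins consonants into a conjunct

    // Characters that never start a compound letter of their own.
    static bool IsDependentSign(char c)
    {
        return (c >= '\u0981' && c <= '\u0983')   // candrabindu, anusvara, visarga
            || c == '\u09BC'                      // nukta
            || (c >= '\u09BE' && c <= '\u09CC')   // dependent vowel signs
            || c == Hasanta
            || c == '\u09D7'                      // au length mark
            || c == '\u200C' || c == '\u200D';    // ZWNJ, ZWJ
    }

    // Splits a word into compound letters under the simplified rule above.
    public static List<string> Split(string word)
    {
        var letters = new List<string>();
        var current = new StringBuilder();
        for (int i = 0; i < word.Length; i++)
        {
            char c = word[i];
            bool joins = i > 0 && (IsDependentSign(c) || word[i - 1] == Hasanta);
            if (!joins && current.Length > 0)
            {
                letters.Add(current.ToString()); // close the previous letter
                current.Clear();
            }
            current.Append(c);
        }
        if (current.Length > 0) letters.Add(current.ToString());
        return letters;
    }
}
```

Under this rule ভণ্ডুল splits into ভ, ণ্ডু, ল, and চন্দ্রিমা splits into চ, ন্দ্রি, মা. Real text has more cases than this sketch covers (for example, a visible hasanta that should not join the next consonant), which is where language-specific expertise comes in.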

So, there is no hard and fast rule for parsing individual compound Unicode letters. The logic developed for parsing individual Bangla Unicode letters is available in the file ‘BanglaUnicodeParser.cs’ of the project. As mentioned, the segregation logic is different for different Unicode languages, and it requires language-specific expertise as well. Hence, each Unicode language needs its own parser, as the semantics and structure of the languages differ completely from each other. The Bangla Unicode crossword would look something like the following:

Program Flow

Reading from File

Newtonsoft.Json is used to parse the JSON file and put the words in a collection:

using (StreamReader reader = new StreamReader(fileName))
    jsonWords = reader.ReadToEnd();
JObject obj = (JObject)JsonConvert.DeserializeObject(jsonWords);
wordsAndMeaning = obj.ToObject<Dictionary<string, string>>();

Take a Snapshot in the Collection

After that, a snapshot of some words is put in a list. This is the list of words that will be put in the crossword. Spaces and hyphens are stripped from the words, and no duplicates are allowed.
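A snapshot step along these lines could look like the sketch below. The names are hypothetical, and two details are assumptions rather than facts about the project: the words are upper-cased (the article's examples are upper-case), and a Fisher-Yates shuffle is used to pick random entries.

```csharp
using System;
using System.Collections.Generic;

public static class SnapshotSketch
{
    // Picks up to 'count' random entries, strips spaces and hyphens from the
    // words, and skips any word that was already taken.
    public static List<KeyValuePair<string, string>> TakeSnapshot(
        IDictionary<string, string> dictionary, int count, Random rng)
    {
        var keys = new List<string>(dictionary.Keys);
        // Fisher-Yates shuffle: visit every key once, in random order.
        for (int i = keys.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            string tmp = keys[i]; keys[i] = keys[j]; keys[j] = tmp;
        }

        var seen = new HashSet<string>();
        var snapshot = new List<KeyValuePair<string, string>>();
        foreach (string key in keys)
        {
            if (snapshot.Count == count) break;
            // Upper-casing is an assumption made for this sketch.
            string word = key.Replace(" ", "").Replace("-", "").ToUpperInvariant();
            if (word.Length == 0 || !seen.Add(word)) continue; // empty or duplicate
            snapshot.Add(new KeyValuePair<string, string>(word, dictionary[key]));
        }
        return snapshot;
    }
}
```

Entries such as "ice-cream" and "ICE CREAM" normalize to the same word, so only one of them ends up in the snapshot.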

Populate the Listview with the Words in the Snapshot

After obtaining a snapshot, the words are put in the listview for the user to have a look at them. Column widths are maintained dynamically by a scale factor and the maximum word length in the listview. The user can change a word and its meaning by double-clicking on the word. Also, if the user wants to replace a word on the list with a new one, all s/he needs to do is select it and press ENTER, and another word is randomly selected from the collection.

Start the Game Engine

Now it is time for the crucial logic to find proper placement of the words on the board. The logic is described in 'Logic' section of this article.

After the engine successfully runs, it exposes two public variables to be used by other forms:

wordDetails: The list of word details that contain the information of a word: the axes, direction, word, meaning, isolation flag, failure flag, and the sequence (which will be populated later in the crossword board).
matrix: The character matrix that represents the letters on the board. In programming terms, this is a 2D char array.

Isolation of words is checked at the end of the engine’s primary activity. The word CROSSWORD implies that every WORD CROSSes at least one other word. This project doesn’t conform to the orthodox view that all the words should be connected to each other; that is left as an exercise to the reader. This project can have separate groups of connected words. However, it doesn’t allow a word to be totally disjoint and standing on its own. Such words are flagged as isolated and will be removed from the final crossword board.

Place the Words on the Board

After returning from the game engine, the main board starts painting the characters from the matrix to the game board. Now the user can select a word on the list and the main board will indicate where the word is on the board.

At this point, the legends are updated with respective statuses. There are three status labels – one for failed words, one for isolated words, and one for long-meaning words. They are updated accordingly.

Generating the Crossword

After the user is satisfied with the assembly, s/he opts for creating the crossword. The current word list, the letter matrix, and the word details are sent to the constructor of the form.

Maintaining the correct sequence of words is a challenge here, as the main board has a single list of words, whereas now they have to be separated into two groups: ACROSS and DOWN.

First, a clone is taken of the original word details collection. Then the words that share the same starting axes are placed in both the ACROSS and DOWN strings. When these words are done, the rest of the words are placed in the ACROSS and DOWN strings according to their direction. After all the words are taken care of, the clone is copied back to the original collection. The textboxes are also populated with the respective clues.
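For comparison, the conventional crossword numbering scheme can be sketched in a few lines: scan the starting cells row by row, left to right, and give each distinct starting cell the next number, so an ACROSS and a DOWN word that start in the same cell share one number. This is an assumption about the intended result, not the project's actual sequencing code, and `ClueWord` and `AssignNumbers` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public class ClueWord
{
    public string Word; public int X; public int Y; public bool Across; public int Number;
}

public static class NumberingSketch
{
    // Assigns clue numbers in reading order (top to bottom, left to right).
    // Words that start in the same cell receive the same number.
    public static void AssignNumbers(List<ClueWord> words)
    {
        var ordered = new List<ClueWord>(words);
        ordered.Sort((a, b) => a.Y != b.Y ? a.Y.CompareTo(b.Y) : a.X.CompareTo(b.X));

        int next = 0, lastX = -1, lastY = -1;
        foreach (ClueWord w in ordered)
        {
            if (w.X != lastX || w.Y != lastY)
            {
                next++;                // a new starting cell gets a new number
                lastX = w.X;
                lastY = w.Y;
            }
            w.Number = next;
        }
    }
}
```

The ACROSS and DOWN clue lists can then be built by filtering on the direction and sorting by `Number`.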

After the clues are parsed successfully, it is time to place the numbers on the board. The same line drawing functionality is used, only this time numbers are placed in the cells instead of letters. After the numbers are placed, the only thing left is to fill in the other cells with a block colour so that the cells of the crossword are more visible.

Finally, when the user selects File->Save, the crossword is saved as an image in the 'Crosswords' folder under the executable's path. Along with the image, the answers and the clues are written to separate text files. For simplicity, the user is not asked for any filename; the application simply adds a date-time stamp to the names to keep them apart from crosswords saved later.
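A date-time stamped naming scheme could be sketched like this. The file name prefixes ("Board_", "Clues_", "Answers_") and the stamp format are assumptions for illustration; the project may well use different names.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class SaveNameSketch
{
    // Builds the three output paths inside a 'Crosswords' sub-folder.
    // Prefixes and the stamp format are hypothetical.
    public static string[] BuildFileNames(string baseFolder, DateTime stamp)
    {
        string folder = Path.Combine(baseFolder, "Crosswords");
        string suffix = stamp.ToString("yyyyMMdd_HHmmss", CultureInfo.InvariantCulture);
        return new[]
        {
            Path.Combine(folder, "Board_" + suffix + ".bmp"),
            Path.Combine(folder, "Clues_" + suffix + ".txt"),
            Path.Combine(folder, "Answers_" + suffix + ".txt")
        };
    }
}
```

A stamp with second precision sorts chronologically in a file listing; two saves within the same second would still collide, so a real implementation might add milliseconds or check for an existing file.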

A Glimpse of the Code

Interface: IDetails

This contains the basic signature of the details of the words – axes, direction, max attempts, fail flag and isolation flag.

The regular words class implements this interface. Basically, the regular words have exactly the same properties – no more or less.

Interface: ICompositeUnicode

This contains the basic signature for an extra field required for holding split composite Unicode characters. The Unicode words class implements this as well as the IDetails interface.

Reading from File

Words are read from file and parsed into a dictionary object as key-value pairs. This is done in the following code:

using (StreamReader reader = new StreamReader(fileName))
    jsonWords = reader.ReadToEnd();
JObject obj = (JObject)JsonConvert.DeserializeObject(jsonWords);
wordsAndMeaning = obj.ToObject<Dictionary<string, string>>();

Placement Logic

There can be two orientations for the words - ACROSS (Direction.Right) and DOWN (Direction.Down). First, it checks if the word can be placed on the board. For each letter of the word, it checks if the corresponding cell in the matrix (i.e., the corresponding cell on the board) is blank ('\0') or not. If it is not blank, then the letter already in that cell must be the same as the current letter of the word. This is done in the following code:

for (int i = 0, xx = x; i < word.Length; i++, xx++) // First we check if the word 
                                                    // can be placed in the array. 
                                                    // For this, it needs blanks there 
                                                    // or the same letter (of another word) 
                                                    // in the cell.
{
    if (xx >= Globals.gridCellCount) return false;  // Falling outside the grid. 
                                                    // Hence placement unavailable.
    if (matrix[xx, y] != '\0')
    {
        if (matrix[xx, y] != word[i])               // If there is an overlap, then we see if 
                                                    // the characters match. If matches, 
                                                    // then it can still go there.
        {
            placeAvailable = false;
            break;
        }
        else overlapped = true;
    }
}

A similar check is done for the DOWN words, except that for them we need to travel downwards (i.e., x remains constant, y changes).
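A self-contained sketch of the DOWN variant is given below. It mirrors the ACROSS loop above with y advancing instead of x; the class and method names are made up for this illustration, and the grid height is taken from the matrix itself instead of Globals.gridCellCount.

```csharp
using System;

public static class DownOverlapSketch
{
    // Returns false when the word falls off the grid or meets a different
    // letter; 'overlapped' reports whether it shares at least one cell with
    // a letter that is already on the board.
    public static bool CanOverlayDown(char[,] matrix, string word, int x, int y,
                                      out bool overlapped)
    {
        overlapped = false;
        int rows = matrix.GetLength(1);          // matrix is indexed [x, y]
        for (int i = 0, yy = y; i < word.Length; i++, yy++)
        {
            if (yy >= rows) return false;        // falls outside the grid
            if (matrix[x, yy] == '\0') continue; // blank cell, nothing to compare
            if (matrix[x, yy] != word[i]) return false; // mismatching overlap
            overlapped = true;                   // same letter: a valid crossing
        }
        return true;
    }
}
```

With HAT placed ACROSS in row 1 starting at column 0, CAT can go DOWN from (1, 0) because both words share the ‘A’ at (1, 1), but not from (0, 0), where its ‘A’ would land on the ‘H’.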

For Unicode, we need one additional line in this logic. This is because, for Unicode, a cell no longer holds a single letter, but several Unicode codes that combine into one composite letter. Also, for Unicode, we have a 3D matrix. Hence the line:

if (matrix[xx, y] != '\0')

changes to:

if (matrix[xx, y, 0] != '\0')

And the same letter check for a non-blank cell changes from:

if (matrix[xx, y] != word[i])
{
    placeAvailable = false;
    break;
}

to:

string compositeUnicodeLetter = Globals.GetCompositeLetterFromTheMatrix(xx, y, matrix);
if (compositeUnicodeLetter != unicodeLetters[i])
{
    placeAvailable = false;
    break;
}

After the initial blank cell check and same letter check are satisfied, the 'overlapped' flag is used along with the maximum non-overlapping word count threshold to determine whether the word should stand alone or should overlap. As a reminder, the first few words should not overlap, so that the words spread sparsely across the board, whereas the rest of the words must overlap with existing word(s) on the board. These are checked in the following part:

if (currentWordCount < Globals.MAX_NON_OVERLAPPING_WORDS_THRESHOLD && overlapped)
    return false;

else if (currentWordCount >= Globals.MAX_NON_OVERLAPPING_WORDS_THRESHOLD && !overlapped)
    return false;

After these conditions are satisfied, it is time to check if the word is really placeable on the current axes in the given direction.

This part discusses the logic for ACROSS words, which relies on four flags named leftFree, topFree, bottomFree and rightMostFree.

There are two types of checks. The first is that there cannot be any letter immediately before the beginning or after the end of an ACROSS word. The leftFree and rightMostFree flags confirm this through the methods they call. For example, the leftFree flag is determined by the method 'LeftCellFreeForRightDirectedWord', which has the following code:

if (x == 0) return true;
if (x - 1 >= 0)
    return matrix[x - 1, y] == '\0';
return false;

Here, (x, y) are the axes where the word is to be placed ACROSS. Now if it is the leftmost column (x = 0), then there is no need to check if the left cell is blank or not, as there is no left cell. Otherwise, it checks if the left cell of x is blank or not.

Similarly, the check for the freeness of the rightmost cell of this ACROSS word is determined by the following code in the method 'RightMostCellFreeForRightDirectedWord':

if (x + word.Length == Globals.gridCellCount) return true;
if (x + word.Length < Globals.gridCellCount)
    return matrix[x + word.Length, y] == '\0';
return false;

First, it checks if the last letter of the word falls in the rightmost column of the matrix. If it does, then there is nothing further to check, as there is no cell further right. Otherwise, it checks if the cell just after the last letter of the word is blank or not.

The second type of check, the freeness of the top and bottom cells of an ACROSS word, is much more complex. Let us see what is happening in the 'TopCellFreeForRightDirectedWord' method.

if (y == 0) return true;
bool isValid = true;
if (y - 1 >= 0)
{
    for (int i = 0; i < word.Length; x++, i++)
    {
        if (matrix[x, y - 1] != '\0')
            isValid = LegitimateOverlapOfAnExistingWord(x, y, word, Direction.Up);
        if (!isValid) break;
    }
}
return isValid;

First, it checks if the word is to be placed ACROSS on the topmost row of the matrix (y = 0). If that is the case, then there is no row above to check. Otherwise, for each letter of the word, it checks if the cell above is blank or not (matrix[x, y - 1] != '\0'). If it is not blank, then the letter above must satisfy three conditions:

The letter belongs to an existing word on the board.
That other word on the board is not also ACROSS.
That letter above is not the last letter of the existing word on the board.

Now let's examine the Up case of the 'LegitimateOverlapOfAnExistingWord' method:

while (--y >= 0)
    if (matrix[x, y] == '\0') break; // First walk upwards until you reach 
                                     //the beginning of the word that is already on the board.
++y;

for (int i = 0; y < Globals.gridCellCount && 
     i < Globals.MAX_WORD_LENGTH; y++, i++) // Now walk downwards until you reach the end 
                                            // of the word that is already on the board.
{
    if (matrix[x, y] == '\0') break;
    chars[i] = matrix[x, y];
}

str = new string(chars);
str = str.Trim('\0');
wordOnBoard = (RegularWordDetails)wordDetails.Find
              (a => a.Word == str);     // See if the characters form a valid word 
                                        //that is already on the board.
if (wordOnBoard == null) return false;  // If this is not a word on the board, 
                                        // then this must be some random characters, 
                                        // hence not a legitimate word, 
                                        // hence this is a wrong placement.
if (wordOnBoard.WordDirection == Direction.Right) return false;  // If the word on the board 
                                        // is in parallel to the word on to be placed, 
                                        // then also this is a wrong placement as 
                                        // two words cannot be placed side by side 
                                        // in the same direction.
if (wordOnBoard.Y + wordOnBoard.Word.Length == originalY) return false; // The word on the 
                                        // board ends right above the row of the 
                                        // current word to place. Hence illegitimate.
return true;                            // Else, passed all validation checks for a 
                                        // legitimate overlap, hence return true.

The first WHILE loop travels upwards to find the beginning of the existing word on the board.

The FOR loop then traverses downwards from that starting point and collects the letters of the word in chars.

Then a string str is built from the chars array, and the trailing blanks ('\0') are trimmed off.

Then it checks if the word is a legitimate existing word on the board (number 1 in the above-mentioned 3 conditions). If not, it returns false.

It checks if the word is also an ACROSS word or not. If it is ACROSS, then also the current word cannot be placed there (number 2 in the above-mentioned 3 conditions).

It checks if the existing word on the board ends just above the top cell of the current placement index y (number 3 in the above-mentioned 3 conditions).

If all the three conditions are satisfied, then this is a legitimate crossing overlap of the current word with an existing word.

A similar check makes sure that if there are letters in the bottom cells of the ACROSS word, they also belong to a valid crossing. This is accomplished in the 'BottomCellFreeForRightDirectedWord' method.

After the four flags are satisfied, this would mean the current word is good to be placed in the given axes (x, y) in the given direction. So it is placed in the word matrix, and also details are saved in the 'RegularWordDetails' object via the method 'SaveWordDetailsInCollection'. This is done in the following portion of the 'PlaceTheWord' method in the 'GameEngine' class.

for (int i = 0, j = x; i < word.Length; i++, j++)
    matrix[j, y] = word[i];
SaveWordDetailsInCollection(word, wordMeaning, x, y, direction, attempts, false);

Remember, for Unicode, the character matrix has one more dimension. For regular words, a single letter is placed in each cell, whereas for Unicode, the composite letter (which consists of several Unicode codes) has to be placed. This is done in the following portion of the 'PlaceTheWord' method in the 'BanglaUnicodeGameEngine' class.

SaveWordDetailsInCollection(word, wordMeaning, x, y, direction, attempts, false);
for (int i = 0; i < unicodeLetters.Count; i++, x++)
{
    char[] atomElements = unicodeLetters[i].ToArray();
    int z = 0;
    foreach (char c in atomElements)
        matrix[x, y, z++] = c;
}
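Reading a cell back is the inverse of this loop: collect the codes along the third dimension until a blank ('\0') is reached. The helper below is a sketch of that idea with made-up names; the project's own Globals.GetCompositeLetterFromTheMatrix may be implemented differently.

```csharp
using System;
using System.Text;

public static class CompositeCellSketch
{
    // Stores the codes of one composite letter along the z axis of a cell
    // and clears any leftover codes from a previous, longer letter.
    // A letter longer than the z depth would be truncated.
    public static void WriteCell(char[,,] matrix, int x, int y, string letter)
    {
        for (int z = 0; z < matrix.GetLength(2); z++)
            matrix[x, y, z] = z < letter.Length ? letter[z] : '\0';
    }

    // Rebuilds the composite letter of a cell; an empty string means blank.
    public static string ReadCell(char[,,] matrix, int x, int y)
    {
        var sb = new StringBuilder();
        for (int z = 0; z < matrix.GetLength(2) && matrix[x, y, z] != '\0'; z++)
            sb.Append(matrix[x, y, z]);
        return sb.ToString();
    }
}
```

The depth of the third dimension has to be at least the length of the longest composite letter; the six-code letter ন্দ্রি mentioned earlier gives a lower bound for Bangla.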

Similar logic follows for the DOWN words, so this is not discussed to reduce the length of the article.

Marking Isolated Words

As a minimal requirement, no word should be isolated in the matrix, as every word should CROSS at least one other WORD. So at the end of placement, another check is done to set the Isolated flag of the 'RegularWordDetails' object. This is done in the 'CheckIfTheWordIsIsolatedAndFlagAccordingly' method. For an ACROSS word, it simply walks along the top and bottom cells of the word; if there is a letter in any top/bottom cell along the word, then the flag is set to false (as it would mean the word is not isolated).

The blank check for TOP cells is done in the following portion. First it checks if the Y axis of the current word is not the first row (if it is the first row, then there is no point checking the row above as there is no row above). Then, it walks along the word from left to right (incrementing x), and checks for each top cell if it is blank or not. If at any point it finds a letter in the top cell, then it sets the flag to false and returns immediately.

if (wrd.Y > 0)                                    // If there is a row of cells 
                                                  // to the top of the right-directed word.
    for (int x = wrd.X, y = wrd.Y - 1, 
         i = 0; i < wrd.Word.Length; x++, i++)    // Walk rightwards along the top row 
                                                  // of the word.
        if (matrix[x, y] != '\0')                 // And see if there is any character 
                                                  // to any cell of that row.
        {                                         // Which would mean another word 
                                                  // passed through; 
                                                  // hence this is not isolated.
            wrd.Isolated = false;
            return;
        }

Similarly, the blank check for BOTTOM cells is done in the following portion. First, it checks that the Y coordinate of the current word is not the last row (if it is the last row, then there is no row below to check). Then it walks along the word from left to right (incrementing x), and checks whether each bottom cell is blank. If at any point it finds a letter in a bottom cell, then it sets the flag to false and returns immediately.

if (wrd.Y < Globals.gridCellCount - 1)            // If there is a row of cells to 
                                                  // the bottom of the right-directed word.
    for (int x = wrd.X, y = wrd.Y + 1, 
         i = 0; i < wrd.Word.Length; x++, i++)    // Walk rightwards along the bottom row 
                                                  // of the word.
        if (matrix[x, y] != '\0')                 // And see if there is any character 
                                                  // to any cell of that row.
        {                                         // Which would mean another word 
                                                  // passed through; 
                                                  // hence this is not isolated.
            wrd.Isolated = false;
            return;
        }

If both sweeps complete without returning, there was no letter in the cells above or below the word, so this is definitely an isolated word. It is flagged accordingly in the 'RegularWordDetails' object, and its letters are erased (set to '\0') in the word matrix so that they are not rendered. This is done in the following portion:

if (!wrd.FailedMaxAttempts)
    wrd.Isolated = true;

if (wrd.WordDirection == Direction.Right)
    for (int i = 0, x = wrd.X, y = wrd.Y; i < wrd.Word.Length && 
                                          i < Globals.gridCellCount; i++, x++)
        matrix[x, y] = '\0';
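
The two sweeps can also be folded into one helper. The following is a minimal sketch, not the project's code: the names 'IsolationSketch', 'HasNeighbourInRow' and 'IsAcrossWordIsolated' are hypothetical, '\0' is assumed to mark an empty cell, and the word is assumed to lie fully inside the grid.

```csharp
using System;

public static class IsolationSketch
{
    // True if any cell in row 'y', between columns x and x + length - 1,
    // holds a letter. Rows outside the grid are treated as empty.
    static bool HasNeighbourInRow(char[,] matrix, int x, int y, int length)
    {
        if (y < 0 || y >= matrix.GetLength(1)) return false;
        for (int i = 0; i < length; i++)
            if (matrix[x + i, y] != '\0') return true;
        return false;
    }

    // An ACROSS word is isolated when neither the row above nor the row
    // below has a letter alongside it.
    public static bool IsAcrossWordIsolated(char[,] matrix, int x, int y, int length)
    {
        return !HasNeighbourInRow(matrix, x, y - 1, length)
            && !HasNeighbourInRow(matrix, x, y + 1, length);
    }
}
```

Passing the row offset as a parameter removes the duplicated loop; the same idea would apply to the left and right columns of a DOWN word.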

For Unicode, the logic is the same, with one more thing to keep in mind: there is a third dimension to consider. This part is not discussed to keep the article short, and should be easy for the reader to work out.
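
A minimal sketch of how the third dimension could enter the check and the erase step. It rests on an assumption drawn from the placement loop shown earlier, where slot 0 is always filled first: a cell of the char[,,] matrix is taken to be empty when its slot [x, y, 0] is '\0'. The names 'UnicodeIsolationSketch', 'IsCellEmpty' and 'EraseAcross' are hypothetical.

```csharp
using System;

public static class UnicodeIsolationSketch
{
    // Assumption: a composite letter always fills slot 0 first, so an empty
    // slot 0 means the whole cell is empty.
    public static bool IsCellEmpty(char[,,] matrix, int x, int y)
    {
        return matrix[x, y, 0] == '\0';
    }

    // Clears every slot of each cell covered by an ACROSS word of
    // 'letterCount' composite letters starting at (x, y).
    public static void EraseAcross(char[,,] matrix, int x, int y, int letterCount)
    {
        int depth = matrix.GetLength(2);
        for (int i = 0; i < letterCount && x + i < matrix.GetLength(0); i++)
            for (int z = 0; z < depth; z++)
                matrix[x + i, y, z] = '\0';
    }
}
```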

Some LINQs

LINQ is used extensively in the project, for example to search for a key-value pair in a dictionary collection or to find an element in a list. Following is a LINQ query that obtains the lists of words that share the same starting coordinates:

var wordsStartingAtSameAxes = from j in detailsCopy
                              group j by new { j.X, j.Y } into d
                              where d.Count() > 1
                              select (d).ToList();
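
To see what the query yields, here is a small self-contained sketch. The 'WordAt' class is a hypothetical stand-in for the project's word-details type with only the fields the query needs, and 'GroupingSketch' and 'SharedStarts' are hypothetical names as well.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the project's word-details objects.
public class WordAt
{
    public string Word;
    public int X;
    public int Y;
}

public static class GroupingSketch
{
    // Returns one list per starting cell that is shared by two or more words.
    // Anonymous types compare by value, so { X, Y } works as a grouping key.
    public static List<List<WordAt>> SharedStarts(IEnumerable<WordAt> details)
    {
        var groups = from j in details
                     group j by new { j.X, j.Y } into d
                     where d.Count() > 1
                     select d.ToList();
        return groups.ToList();
    }
}
```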

LINQ is also used to clone an existing list:

detailsCopy = new List<IDetails>(wordDetails.Select(x => x).ToList());
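
Note that a copy made this way is shallow: the new list is independent, but its elements are the same objects as in the original. A minimal sketch of the difference, with a hypothetical 'Item' class standing in for the detail objects:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the project's detail objects.
public class Item
{
    public string Word;
}

public static class CopySketch
{
    // Produces a new list that references the same Item objects as 'source'.
    public static List<Item> ShallowCopy(List<Item> source)
    {
        return source.ToList();
    }
}
```

Removing an element from the copy leaves the original list untouched, whereas changing a field of an element is visible through both lists.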

Automatic Window Scaling and Resizing

Automatic window resizing can be accomplished either in the load event or in the resize event. Both events are used, in different forms, to demonstrate that either of them works.

Automatic window scaling is applied, which makes the application resolution-independent. The design-time resolution was 1680x1050; however, the higher the resolution, the better the print quality. The trick for automatic window scaling is beyond the scope of this article; please refer to here.

Checking Mix of Regular and Unicode

Version 2.0 lets the user enter and save their own words. However, it obviously doesn't make sense to mix regular and Unicode words. Normally, the user won't do that, but the application still verifies it. This is checked in the 'GetEncoding' method of the 'CreateAndSaveOwnWords' class.

First, it classifies each character code of the word as regular or Unicode. For regular letters, the code must be between 65 and 255 inclusive. Hence, if the first code is regular, then all the codes of the other letters (as well as of all other words) should be regular. Similarly, if the first code is Bangla Unicode (between 0x0980 and 0x09fe inclusive), then all the subsequent codes of the other letters (as well as of all other words) should lie in that range. Note that for other Unicode scripts the range will be different, and coders need to change it according to the respective Unicode code charts.

WordTypes type = WordTypes.Unknown;
WordTypes prevType = WordTypes.Unknown;
foreach (KeyValuePair<string, string> kvp in wordAndClue)
{
    char[] ch = kvp.Key.ToCharArray();
    if (ch[0] >= 65 && ch[0] <= 255)
        prevType = WordTypes.Regular;
    else if (ch[0] >= 0x0980 && ch[0] <= 0x09fe)  // Refer to Bangla Unicode chart: 
        // http://www.unicode.org/charts/PDF/U0980.pdf, modify the code range for 
        // other unicode letters.
        prevType = WordTypes.Unicode;

    for (int i = 1; i < ch.Length; i++)
    {
        if (ch[i] >= 65 && ch[i] <= 255)
            type = WordTypes.Regular;
        else if (ch[i] >= 0x0980 && ch[i] <= 0x09fe)    // Refer to Bangla Unicode chart: 
                                          // http://www.unicode.org/charts/PDF/U0980.pdf, 
                                          // modify the code range for other unicode letters.
            type = WordTypes.Unicode;     // Set 'type' (not 'prevType') so the comparison below works.

        if (type != prevType) return WordTypes.Mix;
        prevType = type;
    }
}
return type;
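
As a cross-check of the logic described above, the following minimal sketch classifies every character of every word and reports a mix as soon as two different kinds are seen, including across words. It is not the project's method: the names 'WordKind', 'EncodingSketch', 'Classify' and 'GetEncoding' are hypothetical (the enum mirrors 'WordTypes' so that the block is self-contained), and skipping characters that fall outside both ranges is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum mirroring the article's WordTypes.
public enum WordKind { Unknown, Regular, Unicode, Mix }

public static class EncodingSketch
{
    // Classifies one character using the ranges given in the article.
    public static WordKind Classify(char c)
    {
        if (c >= 65 && c <= 255) return WordKind.Regular;
        if (c >= 0x0980 && c <= 0x09FE) return WordKind.Unicode;   // Bangla block
        return WordKind.Unknown;
    }

    // Returns Regular or Unicode if all classified characters agree,
    // Mix if both kinds occur, Unknown if nothing could be classified.
    public static WordKind GetEncoding(IEnumerable<string> words)
    {
        WordKind seen = WordKind.Unknown;
        foreach (string word in words)
            foreach (char c in word)
            {
                WordKind kind = Classify(c);
                if (kind == WordKind.Unknown) continue;   // Assumption: skip unclassified characters.
                if (seen == WordKind.Unknown) seen = kind;
                else if (kind != seen) return WordKind.Mix;
            }
        return seen;
    }
}
```

Keeping a single 'seen' value for the whole collection is what makes a mix between two different words detectable, not only a mix inside one word.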

Points of Interest

If we look at the workflow, the sequence of steps is as follows:

The code loads a JSON word dictionary with around 86,000 words
Parses them in a collection
Picks random words from them
Places them in the matrix
Some of the words fail to find a place after 200,000 attempts; they are flagged as fails
Another sweep is performed to flag isolated words
Finally, the graphics renderer renders the matrix on the display

All these activities are accomplished in the twinkling of an eye, thanks to the processors, compilers and, after all, technology.

As expected, the Unicode logic takes a little more time than the regular logic, as it deals with one more dimension.

Glitches

Please report any glitches you find in the comments.

Limitations

There are some strict crossword rules like all the words on the board should be connected to each other; there should not be any group of words in isolation. Mr. Crossworder doesn’t conform to this rule, hence there might be isolated groups of words on the board.
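
Whether a finished board obeys this rule can be tested with a flood fill over the letter cells. The following is a minimal sketch and not part of the project: 'ConnectivitySketch' and 'CountLetterGroups' are hypothetical names, and the grid is assumed to be a char[,] with '\0' for empty cells. Cells are treated as connected when they touch horizontally or vertically.

```csharp
using System;
using System.Collections.Generic;

public static class ConnectivitySketch
{
    // Counts groups of letter cells that touch horizontally or vertically.
    public static int CountLetterGroups(char[,] matrix)
    {
        int width = matrix.GetLength(0), height = matrix.GetLength(1);
        bool[,] visited = new bool[width, height];
        int[] dx = { 1, -1, 0, 0 };
        int[] dy = { 0, 0, 1, -1 };
        int groups = 0;

        for (int x = 0; x < width; x++)
            for (int y = 0; y < height; y++)
            {
                if (matrix[x, y] == '\0' || visited[x, y]) continue;
                groups++;                                   // Found a group not seen before.
                var pending = new Queue<Tuple<int, int>>();
                pending.Enqueue(Tuple.Create(x, y));
                visited[x, y] = true;
                while (pending.Count > 0)                   // Breadth-first walk of the group.
                {
                    Tuple<int, int> cell = pending.Dequeue();
                    for (int d = 0; d < 4; d++)
                    {
                        int nx = cell.Item1 + dx[d], ny = cell.Item2 + dy[d];
                        if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                        if (matrix[nx, ny] == '\0' || visited[nx, ny]) continue;
                        visited[nx, ny] = true;
                        pending.Enqueue(Tuple.Create(nx, ny));
                    }
                }
            }
        return groups;
    }
}
```

A result of 1 means all the words on the board are connected; a larger result means there are isolated groups.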

Disclaimer

I am not a sexist; ladies should not loathe me for the title, LOL. It is just that I was listening to Steve Perry’s (Journey) ‘Trial by Fire’ and came across the line:

“Hello Mr. Moon,
Can I have some time with you?”

Just to mimic:

“Hello Mr. Crossworder,
Can I have some time with you?”

Future Works

Software is never at its peak; there is always room to improve. Further, this is just a prototype. A lot of things can be done.

The logic itself can be revised and optimized. In fact, university teachers could set it as an optimization problem for their students. There are scattered groups of words at the moment, and a better algorithm might bring them closer. Especially for Unicode languages, the words are observed to be a little sparser than expected.
The application can be extended as a web app to consume an online web dictionary. There are some online web dictionaries that expose the words and meanings through APIs.
There can be a separate GUI so that the user can create his/her own preset of words and save it on the disk. The GUI should also facilitate loading those presets. (This is accomplished in the second release.)
For Bangla Unicode, the indices of the clues and the numbers on the board are still in English; I would leave it to the user as an exercise to output them in Bangla.
This is not coded as per supreme design concepts. I focused more on the logic and on getting it going as an initial prototype. A lot of coding standards and best practices are out there which can be and should be implemented.
The project is coded in a denormalized form: there is code that can be compacted. The purpose of such denormalization is to make it easy to understand what is going on. After that purpose is served, the codebase can be compacted further. For example, the checks for the freeness of the left and right cells of a DOWN word are mostly similar and can be merged into one method with minor tweaks and parameters. But such compaction would deprive the reader of an understanding of the purpose. So, it is left as it is, and the compaction is left as an exercise for the reader.
It might sound too optimistic, but how about applying machine learning or AI algorithms to be more effective?
The project worked up to the 3rd dimension. How about adding a 4th dimension? (never mind, joking!)

Summary

This is a crossword creator based on a pre-defined set of dictionary words. It also experiments with a different human language (Bangla), which has its own Unicode range. Different languages have their own Unicode pages, and each language differs from the others with regard to semantics and structure. However, this project gives an idea of how to extend the segregation logic to different human languages.

History

14th December, 2018: First release
7th January, 2019: Second release
- Added menu for creating own word-clues, and loading previously saved word-clues JSON file.
- There was a bug when the final crossword board was being created, as it removed the isolated and failed words from the list. This was fixed by taking a clone of the list. The change is in the method 'createCrosswordToolStripMenuItem_Click()' of the MainBoard.cs file.
- Added 'How It Works' section in the article
- 'A Glimpse of the Code' section comes with more explanations of the code
- Added more references