Word Cloud (Tag Cloud) Generator Control for .NET Windows.Forms in C#

George Mamaladze, 30 Jul 2011, CPOL
Generate a word cloud from some input text. A word cloud is a randomly arranged set of the words used in your text. The size and the color of each word express how frequently it is used: rarely used words are small and pale. The control is clickable and lets you identify the word under the mouse pointer.

Background

This control is inspired by Wordle, a free web-based word cloud generator. In fact, the control is a spin-off of my project at http://sourcecodecloud.codeplex.com.

I really loved the visualizations produced by Wordle, but my goal was to write a local, non-web-based solution able to process large amounts of sensitive data. I found a number of components on the web, but most of them either performed very poorly when processing text, or their visualization or layout was not what I expected.

Architecture and usage

There are four phases when visualizing the word cloud:

Processing data such as text, HTML, or source code, and extracting the relevant words while ignoring others. I have implemented three extractors as examples. TextExtractor extracts all words from a text string, ignoring spaces and all non-letter characters. FileExtractor can process large text files line by line. The third one, UriExtractor, fetches the content of a URL and tries to strip away HTML tags and JavaScript (to be honest, I only implemented it as a showcase, and its filtering capabilities are very poor).

To tap your own data source, just implement the IEnumerable<string> interface or derive from BaseExtractor.
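
Because a data source only has to be an IEnumerable<string>, a custom extractor can be very small. The following is a minimal sketch under that assumption. The class name SimpleWordSource is hypothetical: it imitates the letters-only behaviour described for TextExtractor but is not its source code, it does not derive from BaseExtractor, and lowercasing every word is an added assumption.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;

// Hypothetical data source: yields the words of a string, treating every
// non-letter character as a separator. Lowercasing is an assumption of this sketch.
public sealed class SimpleWordSource : IEnumerable<string>
{
    private readonly string _text;

    public SimpleWordSource(string text)
    {
        _text = text ?? throw new ArgumentNullException(nameof(text));
    }

    public IEnumerator<string> GetEnumerator()
    {
        var current = new StringBuilder();
        foreach (char c in _text)
        {
            if (char.IsLetter(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                // A non-letter character ends the current word.
                yield return current.ToString();
                current.Clear();
            }
        }
        // Emit the last word if the text does not end with a separator.
        if (current.Length > 0)
        {
            yield return current.ToString();
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

An instance can be used wherever the snippets below expect terms, for example IEnumerable<string> terms = new SimpleWordSource(textBox.Text);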

Counting words while ignoring those on a blacklist.

The result is an enumeration of pairs of terms (words) and integers representing how many times each word occurs in the text. In the first implementation, I used KeyValuePair<string, int> to represent them. In this version, I switched to the IWord interface:

public interface IWord : IComparable<IWord>
{
    string Text { get; }
    int Occurrences { get; }
    string GetCaption();
}
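
To make the IWord contract concrete, here is a hedged sketch of one possible implementation. The interface is copied from above; the Word class is hypothetical, and both the caption format and the sort order (most frequent first, ties broken alphabetically) are assumptions, since the article does not specify them.

```csharp
using System;

// The interface exactly as shown in the article.
public interface IWord : IComparable<IWord>
{
    string Text { get; }
    int Occurrences { get; }
    string GetCaption();
}

// Hypothetical implementation; caption format and ordering are assumptions.
public sealed class Word : IWord
{
    public Word(string text, int occurrences)
    {
        Text = text;
        Occurrences = occurrences;
    }

    public string Text { get; }
    public int Occurrences { get; }

    // Assumed caption: the word followed by its count.
    public string GetCaption() => $"{Text} - {Occurrences}";

    public int CompareTo(IWord other)
    {
        if (other is null) return 1; // by convention, any instance sorts after null
        // Descending by occurrences, so the most frequent word comes first.
        int byCount = other.Occurrences.CompareTo(Occurrences);
        return byCount != 0 ? byCount : string.CompareOrdinal(Text, other.Text);
    }
}
```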

I have also switched to LINQ and given up my own classes for word counting, grouping, and sorting. I loved them very much, but LINQ increased readability, reduced complexity, and shortened the code. Getting all of this at the price of an insignificant performance drawback was a really good deal.

IBlacklist blacklist = new CommonWords();
IProgressIndicator progress = new ProgressBarWrapper(progressBar);
IEnumerable<string> terms = new StringExtractor(textBox.Text, progress);

cloudControl.WeightedWords =
    terms
        .Filter(blacklist)
        .CountOccurences()
        .SortByOccurences();
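
The filter, count, and sort steps map naturally onto LINQ operators. The sketch below shows one way such extension methods could be written; it is not the library's source. The method names ExceptBlacklisted, CountEach, and MostFrequentFirst are hypothetical (chosen so they do not clash with Filter, CountOccurences, and SortByOccurences), and a plain ISet<string> stands in for IBlacklist.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical LINQ-based pipeline illustrating filter -> count -> sort.
public static class WordPipelineSketch
{
    // Drops every term contained in the blacklist (a set stands in for IBlacklist).
    public static IEnumerable<string> ExceptBlacklisted(
        this IEnumerable<string> terms, ISet<string> blacklist) =>
        terms.Where(term => !blacklist.Contains(term));

    // Groups identical terms and counts how often each one occurs.
    public static IEnumerable<KeyValuePair<string, int>> CountEach(
        this IEnumerable<string> terms) =>
        terms.GroupBy(term => term)
             .Select(group => new KeyValuePair<string, int>(group.Key, group.Count()));

    // Most frequent terms first; ties are ordered alphabetically for stable output.
    public static IEnumerable<KeyValuePair<string, int>> MostFrequentFirst(
        this IEnumerable<KeyValuePair<string, int>> counts) =>
        counts.OrderByDescending(pair => pair.Value)
              .ThenBy(pair => pair.Key, StringComparer.Ordinal);
}
```

Like the original chain, every step is deferred, so nothing is counted until the result is enumerated (for example by the control when it lays out the words).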

Layout: I use a QuadTree data structure to create a non-overlapping map of words on the control's graphics surface. The same data structure is also used to query which words lie within a certain rectangular area or under a certain point. This query is used to redraw only a particular area when needed, or to perform some action when the control is clicked. In that case it is very useful to know which word was clicked, so that you can perform a word-related action, such as showing statistics or navigating to some URL.

private void cloudControl_Click(object sender, EventArgs e)
{
    // Control.MousePosition is in screen coordinates; convert it to
    // coordinates relative to the cloud control before hit testing.
    Point mousePositionRelativeToControl = cloudControl.PointToClient(MousePosition);

    LayoutItem itemUnderMouse;
    if (!cloudControl.TryGetItemAtLocation(mousePositionRelativeToControl, out itemUnderMouse))
    {
        return; // The click did not hit any word.
    }
    MessageBox.Show(itemUnderMouse.Word);
}
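
The QuadTree class itself is not shown in the article. As a rough illustration of the idea only, here is a minimal region quadtree sketch with an overlap test (useful when placing a word without collisions) and a point query (useful for hit testing a click). The Rect and QuadTree<T> types, the node capacity, and the depth limit are all hypothetical choices for this sketch, not the control's implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical axis-aligned rectangle type used only by this sketch.
public readonly struct Rect
{
    public Rect(double x, double y, double width, double height)
    {
        X = x; Y = y; Width = width; Height = height;
    }

    public double X { get; }
    public double Y { get; }
    public double Width { get; }
    public double Height { get; }
    public double Right => X + Width;
    public double Bottom => Y + Height;

    // Strict overlap: rectangles that merely touch do not intersect.
    public bool Intersects(Rect o) => X < o.Right && o.X < Right && Y < o.Bottom && o.Y < Bottom;
    public bool Contains(Rect o) => o.X >= X && o.Right <= Right && o.Y >= Y && o.Bottom <= Bottom;
    public bool Contains(double px, double py) => px >= X && px < Right && py >= Y && py < Bottom;
}

// Minimal region quadtree. Each item lives in the deepest node whose bounds
// fully contain it; items straddling a split line stay in the parent node.
// Assumes all inserted rectangles lie inside the root bounds.
public sealed class QuadTree<T>
{
    private const int Capacity = 4; // items per node before it splits
    private const int MaxDepth = 8; // stop splitting below this depth

    private readonly Rect _bounds;
    private readonly int _depth;
    private readonly List<(Rect Box, T Item)> _items = new List<(Rect Box, T Item)>();
    private QuadTree<T>[] _children;

    public QuadTree(Rect bounds, int depth = 0)
    {
        _bounds = bounds;
        _depth = depth;
    }

    public void Insert(Rect box, T item)
    {
        QuadTree<T> child = FindChildContaining(box);
        if (child != null) { child.Insert(box, item); return; }
        _items.Add((box, item));
        if (_children == null && _items.Count > Capacity && _depth < MaxDepth) Split();
    }

    // Layout: true if any stored word box overlaps the candidate area.
    public bool HasOverlap(Rect area)
    {
        if (!_bounds.Intersects(area)) return false;
        foreach (var entry in _items)
            if (entry.Box.Intersects(area)) return true;
        if (_children != null)
            foreach (var child in _children)
                if (child.HasOverlap(area)) return true;
        return false;
    }

    // Hit testing: collects every item whose box contains the point.
    public void QueryPoint(double px, double py, List<T> results)
    {
        if (!_bounds.Contains(px, py)) return;
        foreach (var entry in _items)
            if (entry.Box.Contains(px, py)) results.Add(entry.Item);
        if (_children != null)
            foreach (var child in _children) child.QueryPoint(px, py, results);
    }

    private QuadTree<T> FindChildContaining(Rect box) =>
        _children == null ? null : Array.Find(_children, c => c._bounds.Contains(box));

    private void Split()
    {
        double w = _bounds.Width / 2, h = _bounds.Height / 2;
        _children = new[]
        {
            new QuadTree<T>(new Rect(_bounds.X, _bounds.Y, w, h), _depth + 1),
            new QuadTree<T>(new Rect(_bounds.X + w, _bounds.Y, w, h), _depth + 1),
            new QuadTree<T>(new Rect(_bounds.X, _bounds.Y + h, w, h), _depth + 1),
            new QuadTree<T>(new Rect(_bounds.X + w, _bounds.Y + h, w, h), _depth + 1),
        };
        // Push existing items down into a child that fully contains them, if any.
        var stay = _items.FindAll(entry => FindChildContaining(entry.Box) == null);
        foreach (var entry in _items)
        {
            QuadTree<T> child = FindChildContaining(entry.Box);
            if (child != null) child.Insert(entry.Box, entry.Item);
        }
        _items.Clear();
        _items.AddRange(stay);
    }
}
```

Because only nodes whose bounds intersect the query are visited, both queries skip most of the stored words once the tree has split, which is what makes partial redraws and click lookups cheap.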

Configuring the Word Cloud Control

There are several aspects of this control you can configure:

You can change the font type and size.

cloudControl.MinFontSize = 6;
cloudControl.MaxFontSize = 60;
cloudControl.Font = new Font(new FontFamily("Verdana"), 8, FontStyle.Regular); 
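
How MinFontSize and MaxFontSize translate into the size of an individual word is not described in the article. A common approach is linear interpolation between the least and the most frequent word, sketched below. The FontSizing helper is hypothetical, and linear scaling is an assumption; the control may use a different curve.

```csharp
using System;

public static class FontSizing
{
    // Hypothetical helper: maps an occurrence count linearly onto
    // [minFontSize, maxFontSize]. Linear scaling is an assumption.
    public static int FontSizeFor(int occurrences, int minOccurrences, int maxOccurrences,
                                  int minFontSize, int maxFontSize)
    {
        // If every word is equally frequent, draw them all at the largest size.
        if (maxOccurrences <= minOccurrences) return maxFontSize;

        double t = (double)(occurrences - minOccurrences) / (maxOccurrences - minOccurrences);
        t = Math.Max(0.0, Math.Min(1.0, t)); // clamp out-of-range counts
        return minFontSize + (int)Math.Round(t * (maxFontSize - minFontSize));
    }
}
```

With the settings above (6 to 60) and counts between 1 and 10, a word seen 4 times would get size 24.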

Use different colours:

cloudControl.Palette = new Brush[] {Brushes.DarkRed, Brushes.Red, Brushes.LightPink};  
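
Since rarely used words are drawn pale, the palette is presumably ordered from the colour for frequent words to the colour for rare ones. The sketch below picks a palette entry under exactly that assumption. PaletteSelection is a hypothetical helper, and it is generic so it works with Brush[] as well as with plain strings in a test.

```csharp
using System;

public static class PaletteSelection
{
    // Hypothetical helper: assumes the palette is ordered from the colour for the
    // most frequent words (index 0) to the colour for the rarest ones (last index).
    public static T PickColor<T>(T[] palette, int occurrences, int minOccurrences, int maxOccurrences)
    {
        if (palette == null || palette.Length == 0)
            throw new ArgumentException("Palette must not be empty.", nameof(palette));
        if (maxOccurrences <= minOccurrences) return palette[0];

        double t = (double)(occurrences - minOccurrences) / (maxOccurrences - minOccurrences);
        t = Math.Max(0.0, Math.Min(1.0, t));
        // t == 1 (most frequent) -> index 0; t == 0 (rarest) -> last index.
        int index = (int)Math.Round((1.0 - t) * (palette.Length - 1));
        return palette[index];
    }
}
```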

Use a different layout. Currently, there are two layouts implemented. You can implement your own by deriving from BaseLayout or by implementing the ILayout interface directly.

cloudControl.LayoutType = LayoutType.Typewriter;
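
The article does not describe how LayoutType.Typewriter arranges words, so the following is only a guess based on the name: words are placed left to right and wrap to a new line when the next one would not fit. TypewriterLayoutSketch and its parameters are hypothetical, and the sketch only assumes that each word's measured width and height are known; it does not implement ILayout.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical typewriter-style layout: places word boxes left to right and
// starts a new line when the next box would exceed the available width.
public static class TypewriterLayoutSketch
{
    public static List<(double X, double Y)> Arrange(
        IReadOnlyList<(double Width, double Height)> sizes, double availableWidth, double gap = 4)
    {
        var positions = new List<(double X, double Y)>(sizes.Count);
        double x = 0, y = 0, lineHeight = 0;
        foreach (var size in sizes)
        {
            // Wrap unless we are at the start of a line, so an over-wide word still gets placed.
            if (x > 0 && x + size.Width > availableWidth)
            {
                x = 0;
                y += lineHeight + gap;
                lineHeight = 0;
            }
            positions.Add((x, y));
            x += size.Width + gap;
            lineHeight = Math.Max(lineHeight, size.Height); // tallest word defines the line height
        }
        return positions;
    }
}
```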

The layout logic and the drawing of graphics are strictly separated by the IGraphicEngine interface, so I think porting the control to WPF or Silverlight in the future should not be a big deal.
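
The members of IGraphicEngine are not shown in the article, so the sketch below only illustrates the separation principle. IRenderTarget, CloudRenderer, and RecordingTarget are hypothetical names. Code that measures and draws exclusively through an interface can run against any UI framework, or against a fake that records calls in a test.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical rendering abstraction in the spirit of IGraphicEngine;
// these members are assumptions, not the library's actual interface.
public interface IRenderTarget
{
    (double Width, double Height) Measure(string text, double fontSize);
    void DrawText(string text, double fontSize, double x, double y);
}

public static class CloudRenderer
{
    // Places words left to right on one line. All measuring and drawing goes
    // through IRenderTarget, so this code never touches a UI framework.
    public static void RenderRow(IRenderTarget target,
        IEnumerable<(string Text, double FontSize)> words, double gap)
    {
        double x = 0;
        foreach (var (text, fontSize) in words)
        {
            var (width, _) = target.Measure(text, fontSize);
            target.DrawText(text, fontSize, x, 0);
            x += width + gap;
        }
    }
}

// Test double: a fake monospace "font" where each character is fontSize wide.
public sealed class RecordingTarget : IRenderTarget
{
    public List<string> Calls { get; } = new List<string>();

    public (double Width, double Height) Measure(string text, double fontSize) =>
        (text.Length * fontSize, fontSize);

    public void DrawText(string text, double fontSize, double x, double y) =>
        Calls.Add($"{text}@{x}");
}
```

A WinForms, WPF, or Silverlight port would then only need its own IRenderTarget implementation.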

For experts

By digging into the code, you will discover the following extra features:

  • Creating your own blacklist: implement the IBlacklist interface or derive from the CommonBlacklist base class.
  • Loading a blacklist from a file: the CommonBlacklist.CreateFromFile(...) method.
  • Grouping words that share a common stem, such as departed, depart, and departing.
  • You can even view statistics on the grouped words.
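
The article does not explain how common stems are detected. The sketch below uses a deliberately naive suffix-stripping stemmer (not Porter's algorithm or any other published stemmer) just to show the grouping idea; StemGroupingSketch, its suffix list, and the rule that a group is represented by its most frequent form are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, deliberately naive stemming: strips a few common English suffixes.
public static class StemGroupingSketch
{
    private static readonly string[] Suffixes = { "ing", "ed", "es", "s" };

    public static string NaiveStem(string word)
    {
        foreach (string suffix in Suffixes)
        {
            // Keep at least three characters so short words are not mangled.
            if (word.Length - suffix.Length >= 3 && word.EndsWith(suffix, StringComparison.Ordinal))
                return word.Substring(0, word.Length - suffix.Length);
        }
        return word;
    }

    // Groups words by stem; each group is shown as its most frequent form.
    public static List<(string Representative, int Total)> GroupByStem(IEnumerable<string> words) =>
        words.GroupBy(NaiveStem)
             .Select(g => (
                 Representative: g.GroupBy(w => w)
                                  .OrderByDescending(forms => forms.Count())
                                  .ThenBy(forms => forms.Key, StringComparer.Ordinal)
                                  .First().Key,
                 Total: g.Count()))
             .OrderByDescending(group => group.Total)
             .ToList();
}
```

A naive rule like this will also merge words whose meanings differ, so a real implementation needs a proper stemmer or an exception list.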

Credits

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

George Mamaladze
Software Developer
Germany
Twitter: @gmamaladze
Google+: gmamaladze
Blog: gmamaladze.wordpress.com


Comments and Discussions

 
Re: by, and, plural/singular, .. (tassilo, 26 Jul 2011)
Hi George!

Sorry for deleting the initial post. I complained based on just using the binaries and looked at the code later on.

- At least the user should be able to specify the blacklisted words without coding & compiling ;-) Or did I overlook something?
- Depending on the nature of the text, my guess is that it is OK to leave out words with string.length < 4. This can be different if you generate the cloud for acronym-rich texts. Especially for love songs, you should not stretch the limit to 5 :-D
- Looking at possible result sets, of the variations of <word> such as <word>'s, <word>ing, <word>s, ... only the one with the max count should make it into the cloud. In other words, I wouldn't focus on only one of these while parsing the input text, but evaluate them afterwards.
- Regarding the collection of suffixes, I am not a native speaker, but the outer join between your examples and mine should be a good starter. Many other suffixes can alter the meaning of a word. At times, filtering for 'ing' is OK, at times not; or think about hard and hardly, or burn and burnout ;-)

Stefan
Last Updated 31 Jul 2011
Article Copyright 2011 by George Mamaladze
Everything else Copyright © CodeProject, 1999-2015