Word stemming for German on .NET Framework

By EasyWay, 22 Feb 2003
 

Introduction

I found many word stemming algorithms implemented for English, some very good and others less so. I once had a project that required a stemming algorithm for German on the .NET Framework, and I could not find any existing .NET implementation for German. So this is my implementation of word stemming for the German language, written in C# for the .NET Framework.

The download includes the source code for stemingLib.dll and example source code showing how to use it. The demo project comes with a sample German vocabulary in the file rjecnik.txt and the same vocabulary, correctly stemmed, in output.txt. The demo application writes a newly stemmed vocabulary to rezultat.txt and compares it with output.txt to check for errors. This is actually my test application for this implementation. Another example is in the next section of this article.
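
The check that the demo application performs can be sketched roughly as follows. This is not the demo's actual source: the StemCheck class and its FindMismatches method are hypothetical names, and the sketch assumes one word per line in both files, which the article does not state.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of stemingLib.dll.
public static class StemCheck
{
    // Returns the 1-based line numbers at which the two lists differ.
    // A line that exists in only one of the lists counts as a mismatch.
    public static List<int> FindMismatches(IList<string> actual, IList<string> expected)
    {
        var mismatches = new List<int>();
        int longest = Math.Max(actual.Count, expected.Count);
        for (int i = 0; i < longest; i++)
        {
            string a = i < actual.Count ? actual[i] : null;
            string e = i < expected.Count ? expected[i] : null;
            if (!string.Equals(a, e, StringComparison.Ordinal))
            {
                mismatches.Add(i + 1);
            }
        }
        return mismatches;
    }
}
```

With the demo's files, the two lists could be loaded with File.ReadAllLines("rezultat.txt") and File.ReadAllLines("output.txt") from System.IO. An empty result would mean the newly stemmed vocabulary matches the reference.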

Using the code

This is a simple library with only one class, destemmer. The class has one property, Word, and one method, Stem. You can use the class like this:

using System;

// Create the object that will perform the stemming
destemmer Stemmer = new destemmer();
Console.WriteLine("\n Input some German word: ");
string g_word = Console.ReadLine();
// You can call Stem(string word) to get the stemmed word directly.
string stemmed_word = Stemmer.Stem( g_word.ToLower() );
// Or you can set the 'Word' property first
Stemmer.Word = g_word.ToLower();
// and then call Stem() without arguments
Stemmer.Stem();
/* and retrieve the result this way
(after calling Stem(), the Word property
contains the stemmed word, not the original word) */ 
string stemmed_word2 = Stemmer.Word;
Console.WriteLine("\n Stemming result, first way: {0} ", 
                                              stemmed_word);
Console.WriteLine("\n Stemming result, second way: {0} ", 
                                             stemmed_word2);
/* And it's that simple */
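
A common use of a stemmer is to group the inflected forms of a word under one key. The sketch below assumes nothing about destemmer beyond what the example above shows: the stemming function is passed in as a delegate, and StemIndex and GroupByStem are hypothetical names that are not part of stemingLib.dll.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of stemingLib.dll.
public static class StemIndex
{
    // Groups words by the stem that the supplied function returns.
    // Each word is lowercased before stemming, because the library
    // expects lowercase input. The original spelling is kept in the group.
    public static Dictionary<string, List<string>> GroupByStem(
        IEnumerable<string> words, Func<string, string> stem)
    {
        var groups = new Dictionary<string, List<string>>();
        foreach (string word in words)
        {
            string key = stem(word.ToLower());
            List<string> bucket;
            if (!groups.TryGetValue(key, out bucket))
            {
                bucket = new List<string>();
                groups[key] = bucket;
            }
            bucket.Add(word);
        }
        return groups;
    }
}
```

With the library referenced, the call could look like StemIndex.GroupByStem(words, new destemmer().Stem), assuming the Stem(string) overload returns the stemmed string as it does in the example above.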

Points of interest

This library works only with Unicode strings, and every word passed to the Stem() function or assigned to the Word property must be lowercase.

Algorithm: I will not repeat the details of the algorithm I used for German word stemming here. Everything you might want to know can be found at http://snowball.tartarus.org/, where I found the original algorithm, which I then implemented for German on the .NET Framework.
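
Because lowercasing is left to the caller, one option is to lowercase only those words that actually contain an uppercase letter. The following is a minimal sketch of that idea. StemInput and EnsureLower are hypothetical names, not part of stemingLib.dll, and the use of the invariant culture is an assumption that seems reasonable for German text.

```csharp
using System;

// Hypothetical helper, not part of stemingLib.dll.
public static class StemInput
{
    // Returns the same string instance when the word contains no
    // uppercase letters, otherwise a lowercased copy.
    public static string EnsureLower(string word)
    {
        for (int i = 0; i < word.Length; i++)
        {
            if (char.IsUpper(word[i]))
            {
                // Assumption: invariant-culture lowercasing is acceptable here.
                return word.ToLowerInvariant();
            }
        }
        return word;
    }
}
```

The result can then be handed to Stem(string) or assigned to Word. For vocabulary that is already lowercase, this skips the lowercasing call entirely.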

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

About the Author

EasyWay
Web Developer
Bosnia and Herzegovina

Comments and Discussions

Suggestion: Update: words ending with 'nisse' (Stormrider42, 12 Dec '11 - 10:46)
General: Word Stemming with the open office spell checker (Thomas Maierhofer, 15 Nov '09 - 4:49)
General: License for the C# stemming (John Kuntoff, 7 Feb '05 - 0:34)
General: Re: License for the C# stemming (Anonymous, 28 Aug '05 - 10:45)
General: another german stemmer (bergla, 25 Feb '03 - 1:27)
General: Re: another german stemmer (EasyWay, 27 Feb '03 - 9:55)
General: lowercase (Andreas Saurwein, 24 Feb '03 - 5:23)
General: Re: lowercase (EasyWay, 27 Feb '03 - 10:00)
Yes, but for a large number of words, String.ToLower() always copies the string, which can decrease performance, so I leave lowercasing to the user.

Faruk Kasumovic.
Student of Electrical Engineering (Information Technologies) in Tuzla, Bosnia and Herzegovina.
General: Re: lowercase (Stoyan Damov, 8 Mar '03 - 18:13)
General: Re: lowercase (EasyWay, 9 Mar '03 - 12:34)

Last Updated 23 Feb 2003
Article Copyright 2003 by EasyWay