Hello brothers, masters, and programmers.

I'm a little confused about how to determine POS (part-of-speech) tags in English. For this case, I assume that each English word has exactly one type; for example, the word "book" is recognized as a NOUN, not as a VERB. I want to recognize English sentences by tense; for example, "I sent the book" is recognized as Past Tense.
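A minimal sketch of the tense step, assuming the words of one sentence have already been tagged. `TenseGuesser`, its tiny `IrregularPast` table, and the "-ed" suffix check are hypothetical illustrations, not a complete grammar; the sketch only separates past from present and looks at the first verb it finds.

```csharp
using System;
using System.Collections.Generic;

public static class TenseGuesser
{
    // Hypothetical, deliberately tiny table of irregular past forms.
    // A usable list would be much longer.
    private static readonly HashSet<string> IrregularPast =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "sent", "went", "left", "wrote", "was", "were", "had", "did"
        };

    // words and tags are parallel lists: tags[i] is the POS tag of words[i].
    public static string Guess(IList<string> words, IList<string> tags)
    {
        for (int i = 0; i < words.Count; i++)
        {
            if (tags[i] != "verb")
                continue;

            string verb = words[i];

            // Heuristic only: base forms such as "need" or "feed" also end
            // in "ed" and would be misclassified as past tense here.
            bool looksPast = IrregularPast.Contains(verb)
                || verb.EndsWith("ed", StringComparison.OrdinalIgnoreCase);

            return looksPast ? "Past Tense" : "Present Tense";
        }

        // No verb was tagged, so no tense can be guessed.
        return "Unknown";
    }
}
```

Called with the words "I", "sent", "the", "book" and the tags "noun", "verb", "article", "noun", this returns "Past Tense". Future and perfect tenses would need rules that look at auxiliary words, which this sketch does not attempt.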


Description :

I have a number of databases (*.txt files): nounList.txt, verbList.txt, adjectiveList.txt, adverbList.txt, conjunctionList.txt, prepositionList.txt, and articleList.txt. If an input word is found in one of these databases, I assume its type can be concluded. But how should I begin the lookup? For example, for "I sent the book", how do I search the databases for every word: "I" as noun, "sent" as verb, "the" as article, "book" as noun? Is there a better approach than searching for every word in every database? I also doubt that the databases contain only unique elements, since the same word may appear in more than one list.
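One possible alternative to probing every list for every word is to merge the lists once into a single dictionary that maps each word to all of the tags it appears under. The sketch below is only an illustration: the `Lexicon` class and its `AddWords` and `Lookup` methods are hypothetical names, and it assumes each .txt file holds one word per line.

```csharp
using System;
using System.Collections.Generic;

public sealed class Lexicon
{
    // word -> every tag whose list contains that word (case-insensitive keys).
    private readonly Dictionary<string, HashSet<string>> tagsByWord =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

    // Registers all words of one list under the given tag.
    public void AddWords(string tag, IEnumerable<string> words)
    {
        foreach (string raw in words)
        {
            string word = raw.Trim();
            if (word.Length == 0)
                continue; // skip blank lines

            HashSet<string> tags;
            if (!tagsByWord.TryGetValue(word, out tags))
            {
                tags = new HashSet<string>();
                tagsByWord[word] = tags;
            }
            tags.Add(tag);
        }
    }

    // Returns a copy of the tags for a word, or an empty set if it is unknown.
    public HashSet<string> Lookup(string word)
    {
        HashSet<string> tags;
        return tagsByWord.TryGetValue(word.Trim(), out tags)
            ? new HashSet<string>(tags)
            : new HashSet<string>();
    }
}
```

Each file would be loaded once, for example with `lexicon.AddWords("noun", File.ReadLines("nounList.txt"))`, after which every input word needs a single `Lookup` call. A word that sits in several lists, such as "book", comes back with more than one tag, which makes the overlap between the lists visible instead of hiding it behind whichever list happens to be checked first.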


I enclose my perspective here.
C#
// Needs: using System; using System.Collections.Generic; using System.Linq;
// Splits the raw input into trimmed sentences at '.', '?', '!' and ';'.
private List<string> ParseInput(String allInput)
{
    List<string> listSentence = new List<string>();

    char[] delimiter = ".?!;".ToCharArray();
    var sentences = allInput.Split(delimiter, StringSplitOptions.RemoveEmptyEntries).Select(s => s.Trim());

    foreach (var s in sentences)
        listSentence.Add(s);

    return listSentence;
}

private void tenseReviewMenu_Click(object sender, EventArgs e)
{
    string allInput = rtbInput.Text;

    List<string> listWord = new List<string>();

    // Each word list is loaded into a HashSet so that Contains() is a fast lookup.
    HashSet<string> nounList = new HashSet<string>(getDBList("nounList.txt"));
    HashSet<string> verbList = new HashSet<string>(getDBList("verbList.txt"));
    HashSet<string> adjectiveList = new HashSet<string>(getDBList("adjectiveList.txt"));
    HashSet<string> adverbList = new HashSet<string>(getDBList("adverbList.txt"));

    char[] separator = new char[] { ' ', '\t', '\n', ',' /* etc... */ };

    List<string> listSentence = ParseInput(allInput);

    // Break every sentence into words, skipping empty fragments.
    foreach (string sentence in listSentence)
    {
        foreach (string word in sentence.Split(separator))
            if (word.Trim() != "")
                listWord.Add(word);
    }

    string testPOS = "";

    foreach (string word in listWord)
    {
        string key = word.ToLowerInvariant();

        // The first matching list wins, so a word that appears in several
        // lists is always reported under the earliest one. Words found in
        // none of the four lists produce no output.
        if (nounList.Contains(key))
            testPOS += "noun ";
        else if (verbList.Contains(key))
            testPOS += "verb ";
        else if (adjectiveList.Contains(key))
            testPOS += "adj ";
        else if (adverbList.Contains(key))
            testPOS += "adv ";
    }

    tbTest.Text = testPOS;
}


POS tagging is only a secondary part of my assignment, so I use a simple, database-based approach to determine the tags. But if there is a simpler approach that is easy to use, easy to understand, easy to turn into pseudocode, and easy to design, I would like to try it. Please...

Sincerely.. :) :) :)
Posted
Updated 23-Mar-13 17:02pm
Comments
SoMad 23-Mar-13 23:44pm    
I am not an expert on this, but your basic assumption is going to be a problem. Think about a simple word like 'left' - two of several usages are: "My left arm is on fire" (adjective), "I left my house an hour ago" (verb, past tense).

Soren Madsen
Berry Harahap 24-Mar-13 0:06am    
Dear @SoMad, good point, Sir. I realize the accuracy of this approach is low, so I want to try combining it with a rule-based approach. :):):)
Cheers
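A rough sketch of what combining the word lists with rules could look like for the 'left' example above. It assumes each word arrives with the set of every tag the lists allow for it, and it uses two made-up context rules. `RuleTagger` and the "pronoun" and "determiner" tags are hypothetical: the lists described in the question have no such categories, so they would need to be added.

```csharp
using System;
using System.Collections.Generic;

public static class RuleTagger
{
    // Fallback order when no rule applies (same order as the if/else chain in the question).
    private static readonly string[] Priority = { "noun", "verb", "adj", "adv" };

    // candidates[i] holds every tag the word lists allow for words[i].
    public static string[] Disambiguate(IList<string> words, IList<HashSet<string>> candidates)
    {
        var result = new string[words.Count];

        for (int i = 0; i < words.Count; i++)
        {
            HashSet<string> options = candidates[i];
            string previous = i > 0 ? result[i - 1] : null;

            // Rule 1: right after a pronoun, prefer the verb reading ("I left ...").
            if (previous == "pronoun" && options.Contains("verb"))
                result[i] = "verb";
            // Rule 2: right after a determiner, prefer adjective, then noun ("my left arm").
            else if (previous == "determiner" && options.Contains("adj"))
                result[i] = "adj";
            else if (previous == "determiner" && options.Contains("noun"))
                result[i] = "noun";
            else
                result[i] = FirstByPriority(options);
        }

        return result;
    }

    private static string FirstByPriority(HashSet<string> options)
    {
        foreach (string tag in Priority)
            if (options.Contains(tag))
                return tag;

        // Tags outside the priority list, e.g. "pronoun" or "determiner".
        foreach (string tag in options)
            return tag;

        return "unknown";
    }
}
```

With "left" listed as both adjective and verb, "I left my house" comes out with "left" as a verb and "My left arm" with "left" as an adjective. Two rules are nowhere near enough for real text; the point is only that the previous word's tag can break ties that a plain list lookup cannot.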

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


