See more: C# Visual-Studio
Hello Brothers, Master, and Programmer.
 
I'm a bit confused about how to do POS (part-of-speech) tagging for English. For now, I assume that each English word has exactly one type; for example, the word "book" is recognized as a NOUN, not as a VERB. I want to classify English sentences by tense; for example, "I sent the book" should be recognized as past tense.
 

Description:
 
I have a number of database files (*.txt): NounList.txt, verbList.txt, adjectiveList.txt, adverbList.txt, conjunctionList.txt, prepositionList.txt, and articleList.txt. If an input word is found in one of these files, I assume its type can be determined. But how should I begin the lookup? For example, given "I sent the book", how do I search the files for every word so that "I" is tagged as a noun, "sent" as a verb, "the" as an article, and "book" as a noun? Is there a better approach than searching for every word in every file? I also doubt that the entries in each file are unique.
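
One possible way to avoid searching every file for every word is to merge the lists once into a single dictionary that maps each word to the set of tags it appears under. The sketch below is only an illustration: the `Lexicon` class, its `Build` and `LoadFromFiles` methods, and the one-word-per-line file layout are assumptions, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class Lexicon
{
    // Merges several word lists into one lookup table: word -> set of tags.
    // The key of each input entry is the tag name (for example "noun"),
    // the value is the list of words carrying that tag.
    public static Dictionary<string, HashSet<string>> Build(
        IDictionary<string, IEnumerable<string>> wordsByTag)
    {
        // Case-insensitive keys, so "Book" and "book" hit the same entry.
        var lexicon = new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

        foreach (var list in wordsByTag)
        {
            foreach (var raw in list.Value)
            {
                string word = raw.Trim();
                if (word.Length == 0)
                    continue; // skip blank lines

                HashSet<string> tags;
                if (!lexicon.TryGetValue(word, out tags))
                {
                    tags = new HashSet<string>();
                    lexicon[word] = tags;
                }
                tags.Add(list.Key); // a word may collect more than one tag
            }
        }
        return lexicon;
    }

    // Hypothetical loader: assumes each file holds one word per line.
    // The argument maps a tag name to the path of its word list.
    public static Dictionary<string, HashSet<string>> LoadFromFiles(
        IDictionary<string, string> pathByTag)
    {
        var wordsByTag = new Dictionary<string, IEnumerable<string>>();
        foreach (var entry in pathByTag)
            wordsByTag[entry.Key] = File.ReadAllLines(entry.Value);
        return Build(wordsByTag);
    }
}
```

With this layout each input word needs a single `TryGetValue` call, and a word that occurs in several lists (such as "book" in both the noun and verb lists) comes back with all of its tags, so the ambiguity is visible instead of being hidden by whichever list happens to be checked first.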
 

Here is my attempt so far.
    // Requires: using System; using System.Collections.Generic; using System.Linq;
    private List<string> ParseInput(String allInput)
    {
        // Split the input into sentences on end-of-sentence punctuation.
        char[] delimiter = ".?!;".ToCharArray();

        return allInput
            .Split(delimiter, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => s.Trim())
            .ToList();
    }
 
    private void tenseReviewMenu_Click(object sender, EventArgs e)
    {
        string allInput = rtbInput.Text;

        List<string> listWord = new List<string>();

        // Load each word list once; HashSet gives fast Contains lookups.
        HashSet<string> nounList = new HashSet<string>(getDBList("nounList.txt"));
        HashSet<string> verbList = new HashSet<string>(getDBList("verbList.txt"));
        HashSet<string> adjectiveList = new HashSet<string>(getDBList("adjectiveList.txt"));
        HashSet<string> adverbList = new HashSet<string>(getDBList("adverbList.txt"));

        // Word separators; the rest of the list was elided in the post.
        char[] separator = new char[] { ' ', '\t', '\n', ',' /* etc... */ };

        List<string> listSentence = ParseInput(allInput);

        // Break every sentence into words, skipping empty fragments.
        foreach (string sentence in listSentence)
        {
            foreach (string word in sentence.Split(separator))
            {
                if (!string.IsNullOrWhiteSpace(word))
                    listWord.Add(word);
            }
        }

        string testPOS = "";

        foreach (string word in listWord)
        {
            // Normalize once instead of once per list.
            string key = word.ToLowerInvariant();

            // The first matching list wins; words found in no list are skipped.
            if (nounList.Contains(key))
                testPOS += "noun ";
            else if (verbList.Contains(key))
                testPOS += "verb ";
            else if (adjectiveList.Contains(key))
                testPOS += "adj ";
            else if (adverbList.Contains(key))
                testPOS += "adv ";
        }
        tbTest.Text = testPOS;
    }
 
POS tagging is only a secondary part of my assignment, so I use a simple, database-driven approach to determine the tags. But if there is a simpler approach that is easy to use, easy to understand, easy to design, and easy to express in pseudocode, I would like to try it. Please.
 
Sincerely :)
Posted 23-Mar-13 18:00pm
Edited 23-Mar-13 18:02pm
Comments
SoMad at 23-Mar-13 23:44pm
   
I am not an expert on this, but your basic assumption is going to be a problem. Think about a simple word like 'left' - two of several usages are: "My left arm is on fire" (adjective), "I left my house an hour ago" (verb, past tense).
 
Soren Madsen
Berry Harahap at 24-Mar-13 0:06am
   
Dear @SoMad, good point, Sir. I realize this approach has low accuracy, so I want to try combining it with a rule-based approach. :)
Cheers
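
As a rough illustration of the combination discussed in these comments, the sketch below keeps every candidate tag for a word and lets a small rule table, keyed on the previous word's tag, choose between them. The `RuleTagger` class, the `PreferredAfter` rules, and the tag names are all hypothetical and hand-picked for the 'left' example; a real rule base would need many more rules and would still make mistakes.

```csharp
using System;
using System.Collections.Generic;

public static class RuleTagger
{
    // Illustrative rules only: after a given previous tag,
    // try these tags in order of preference.
    static readonly Dictionary<string, string[]> PreferredAfter =
        new Dictionary<string, string[]>
        {
            { "article",   new[] { "adjective", "noun" } },
            { "adjective", new[] { "noun", "adjective" } },
            { "noun",      new[] { "verb", "adverb" } },
        };

    // lexicon maps a lowercase word to every tag it may take.
    public static List<string> Tag(
        IList<string> words, IDictionary<string, HashSet<string>> lexicon)
    {
        var result = new List<string>();
        string previous = null;

        foreach (string word in words)
        {
            string chosen = "unknown"; // word is in no list
            HashSet<string> candidates;

            if (lexicon.TryGetValue(word.ToLowerInvariant(), out candidates)
                && candidates.Count > 0)
            {
                chosen = null;
                string[] preferred;

                // Rule step: pick the first preferred tag the word can take.
                if (previous != null && PreferredAfter.TryGetValue(previous, out preferred))
                {
                    foreach (string tag in preferred)
                    {
                        if (candidates.Contains(tag)) { chosen = tag; break; }
                    }
                }

                // Fallback: no rule applied, take the alphabetically first tag
                // so the result is deterministic.
                if (chosen == null)
                {
                    var sorted = new List<string>(candidates);
                    sorted.Sort(StringComparer.Ordinal);
                    chosen = sorted[0];
                }
            }

            result.Add(chosen);
            previous = chosen;
        }
        return result;
    }
}
```

With a toy lexicon in which "left" is listed as both adjective and verb, "the" as article, and "I", "arm", and "house" as nouns, this tags "the left arm" as article, adjective, noun and "I left the house" as noun, verb, article, noun. The same word gets a different tag depending on what precedes it, which a plain one-word-one-tag lookup cannot do.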

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
