See more: C++ Linux
Hi Friends,
I am trying to get the number of words in each sentence but am facing some difficulties. I also have to find the most repeated words in each sentence as well as in each paragraph. So friends, I need your help.
 
My code:
#include "stdafx.h"
#include "iostream"
#include "string"
#include "sstream"
 
using namespace std;
 
int _tmain(int argc, _TCHAR* argv[])
{
	string userInput="India is a country in South Asia. It is the Seventh-Largest country by area and second-largest by population and most populous democracy in the world.";
	int words = 1;     
	int sentences = 0;
	int paragraphs = 1;
 
	//cout << "Enter some text: ";
	//getline (cin, userInput);

	for (int i = 0; i < int(userInput.length()); i++) 
    { 		
		if (userInput.empty()) 
		{
			words--;
			paragraphs--;
		}
 
		if (userInput[i] == ' ')  
			words++ ;
 
		if (userInput[i] == '.')
			sentences++;
			
 
		if (userInput[i] == '\n' && userInput[i] == '\t')
			paragraphs++;
    }
	cout << "words: " << words << endl;
	cout << "sentences: " << sentences << endl;
	cout << "paragraphs: " << paragraphs << endl;
 
	//cout << "Number of words in sentence :" << endl;
	
	/*
     istringstream iss(userInput);
       do
       {
         string sub;
         iss >> sub;
         cout << "Substring: " << sub << endl;
       } 
	while (iss);*/
 
	return 0;
}
 
Thanks in advance
Johny_sa
Posted 10-Jul-12 2:41am
Edited 17-Jul-12 19:06pm
Comments
pasztorpisti at 10-Jul-12 8:52am
   
Smells like homework... :D
Wes Aday at 10-Jul-12 9:01am
   
I think you are right. Stinks of homework and there isn't even a question here.
pasztorpisti at 10-Jul-12 9:05am
   
Why don't you ask some specific questions about the problems you can't solve? It's less likely that someone will give you a full solution, despite the fact that this is an easy task. If you don't like programming, you are taking the wrong course.
Sergey Chepurin at 10-Jul-12 13:16pm
   
If it is homework, just add the corresponding tag. Anyway, try to get something useful from the sample code of S. Meyers (http://www.artima.com/samples/cpp11NotesSample.pdf, "List the 20 most common words in a set of text files"). You will have to adapt it to your own needs, though, because I guess your teacher would not believe you wrote it yourself.

Solution 1

As you said, this is C++, so take advantage of C++ as an object-oriented language. Use the standard C++ library and make your own objects. Each part of your task has to be solved separately; trying to solve everything in a single main function, as in a typical student homework approach, is wrong. The ability to separate tasks is a big advantage of C, and the ability to separate them even better is a much bigger advantage of C++. Most likely you should make a class for each part: for instance class Text, class Paragraf, class Sentence. Text has to contain an array of Paragraf objects, and Paragraf has to contain an array of Sentence objects. Sentence is responsible for detecting the most repeated words.
Take advantage of the templates in the C++ standard library, such as vectors, maps, and iterators.
For instance, to count word occurrences you may use a map of <string,int>. For each word encountered, use the word as the key and increment its value.
Comments
Espen Harlinn at 11-Jul-12 8:17am
   
Good answer :-D
armagedescu at 11-Jul-12 17:07pm
   
:)
armagedescu at 13-Jul-12 10:17am
   
First of all, you should complete the task of counting the occurrences of all words in the sentence, just as I've described above. After that, use iterators to visit each element of the map, compare the values, and find the maximum. The corresponding key is the word you are looking for.
YvesDaoust at 30-Nov-12 4:33am
   
I don't quite agree with this approach. It is overkill.
 
There is no need to store any hierarchical representation of paragraphs and sentences, as only counts are requested. Processing can very well be done on the fly.
 
The only relevant data structure I see here is a histogram of word counts, which is indeed appropriately implemented using a map.
 
Don't put classes everywhere. KISS.
armagedescu at 30-Nov-12 5:11am
   
Please read attentively, you are very wrong. See "most number of repeated words in each sentence". So, in each sentence, you should store each distinct word with its repeat count, and then take the maximum one inside the sentence. After that, compare the numbers between sentences.
YvesDaoust at 30-Nov-12 5:26am
   
Yep, you need to keep a count of words in the current sentence, and a count of words in the current paragraph. Can be done with a single map or two of them.
 
Storing the text structure is of no help.
armagedescu at 30-Nov-12 6:15am
   
I've already mentioned the need for a map. But you can only map the words. How will you map the sentences and the paragraphs? And I've said nothing about the text structure. Even for storing each of them you will need maps, arrays, relations between them, and a lot of spaghetti.
armagedescu at 30-Nov-12 5:13am
   
Yeah, and see the paragraphs as well.

Solution 2

1. Try writing three different functions, one each for counting words, sentences, and paragraphs. You may have to duplicate some code, but by separating the three tasks you will have an easier time testing for the correct conditions, and you will be able to solve one problem at a time.
 
2. It's a good idea to initialize variables when you define them, but you should use reasonable values. I understand your reasons to initialize some with 1 instead of 0, but it definitely looks odd, and makes it harder to understand your code and follow its logic.
 
3. Your test for blanks does not consider other cases of 'whitespace', such as tab characters, carriage return, form feed, or multiple whitespace characters. It may not apply to the case you are testing here, but if you base your code only on the specific test case you have, you may as well count the words by hand and return these numbers rather than write an entire algorithm around it...
 
4. '.' is not the only way to end a sentence. Also, depending on where you get your text from, you may be confronted with sequences of multiple punctuation marks!!! ;-)
 
5. The condition
if (userInput[i] == '\n' && userInput[i] == '\t')
is always false and always will be, no matter the text. Besides, why do you test for a tab character ( '\t' )? Reconsider the definition of paragraph that you use, or, rather, the definition of what separates paragraphs.
 
6. As a general rule, always consider corner cases: e.g. multiple separator characters where you only expect one, a missing or extra separator at the end of the text, variants of the commonly used separators, or characters that are not part of the readable text but are not among the separators you catch either.
Comments
Espen Harlinn at 11-Jul-12 8:17am
   
Good answer :-D
Stefan_Lang at 11-Jul-12 8:27am
   
Thank you :-)

Solution 3

Well, my solution answers your question exactly. My code is in Java, so it will help if you are able to read it. My solution displays the top ten most repeated words in each paragraph...
 
Here is a glimpse of the code. For the original solution, see the original source code in Java.
 
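// Fragment only: `keys` and `map1` are built in the full source.
// Presumably `keys` holds the words sorted by ascending frequency and
// `map1` maps each word to its frequency, so walking the array from the
// end prints the ten most frequent words.
for (int i = keys.length - 1, count = 0; i >= 0; i--)
{
    if (count == 10) {
        break;
    }
    count++;
    System.out.println(count + ". " + keys[i] + ",    \tFrequency " + map1.get(keys[i]));
}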

Solution 4

Assuming well-formed text (single space between words, full stop at the end of a sentence, newline at the end of a paragraph), declare an empty dictionary with key = word and record = pair of per-sentence word count, per-paragraph word count. Use the following single-pass scan:
for each character:
    if space, full stop or newline: (finishing a word)
        if a word has just been seen, insert it in the dictionary, if need be; increment its per-sentence and per-paragraph counts
 
    if full stop or newline: (finishing a sentence)
        if the sentence contains any words, output the largest per-sentence word count; clear all per-sentence word counts
 
    if newline: (finishing a paragraph)
        output the largest per-paragraph word count; empty the dictionary

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 30 Nov 2012