Hi Friends,
I am trying to get the number of words in each sentence, but I am facing some difficulties. I have to find the most number of repeated words in each sentence as well as in each paragraph. So, friends, I need your help.

My code:
#include "stdafx.h"
#include "iostream"
#include "string"
#include "sstream"
using namespace std;
int _tmain(int argc, _TCHAR* argv[])
{
	string userInput="India is a country in South Asia. It is the Seventh-Largest country by area and second-largest by population and most populous democracy in the world.";
	int words = 1;     
	int sentences = 0;
	int paragraphs = 1;
	//cout << "Enter some text: ";
	//getline (cin, userInput);
	for (int i = 0; i < int(userInput.length()); i++) 
	{
		if (userInput.empty()) 
		{
			// ... (statement lost from the post)
		}
		if (userInput[i] == ' ')  
			words++ ;
		if (userInput[i] == '.')
			sentences++;
		if (userInput[i] == '\n' && userInput[i] == '\t')
			paragraphs++;
	}
	cout << "words: " << words << endl;
	cout << "sentences: " << sentences << endl;
	cout << "paragraphs: " << paragraphs << endl;
	//cout << "Number of words in sentence :" << endl;
	/*istringstream iss(userInput);
	do
	{
		string sub;
		iss >> sub;
		cout << "Substring: " << sub << endl;
	} while (iss);*/
	return 0;
}

Thanks in advance
Posted 10-Jul-12 2:41am
Updated 17-Jul-12 19:06pm
pasztorpisti 10-Jul-12 8:52am
Smells like homework... :D
Wes Aday 10-Jul-12 9:01am
I think you are right. Stinks of homework and there isn't even a question here.
pasztorpisti 10-Jul-12 9:05am
Why don't you ask some specific questions about the problems you can't solve? It's less likely that someone will give you a full solution, despite the fact that this is an easy task. If you don't like programming, you are taking the wrong course.
Sergey Chepurin 10-Jul-12 13:16pm
If it is homework, just add the corresponding tag. Anyway, try to get something useful from S. Meyers' sample code (the 20 most common words in a set of text files). You will have to adapt it to your own needs, though, because, I guess, your teacher would not believe you wrote it yourself.

Solution 1

As you said, it is C++, so take advantage of C++ as an object-oriented language: use the standard C++ libraries and make your own objects. Each part of your task should be solved separately. Solving every question in a single main function, as in a typical student homework approach, is wrong. The ability to separate tasks is a big advantage of the C language, and the ability to separate them much better is an even bigger advantage of C++. Most likely, you should make a class for each part, for instance class Text, class Paragraph, class Sentence. The Text contains an array of Paragraph objects, and each Paragraph contains an array of Sentence objects. Sentence is responsible for detecting the most repeated words.
Take advantage of the C++ Standard Template Library: vectors, maps, iterators.
For instance, to count word occurrences you can use a map of <string,int>: each word you encounter is used as a key, and you increment its value.
Espen Harlinn 11-Jul-12 8:17am
Good answer :-D
armagedescu 11-Jul-12 17:07pm
armagedescu 13-Jul-12 10:17am
First of all, you should complete the task of counting the occurrences of all words in the sentence, just as I've described above. After that, use iterators to visit each element of the map, compare the values, and find the largest one. The corresponding key is the word you are looking for.
YvesDaoust 30-Nov-12 4:33am
I don't quite agree with this approach. It is overkill.

There is no need to store any hierarchical representation of paragraphs and sentences, as only counts are requested. Processing can very well be done on the fly.

The only relevant data structure I see here is a histogram of word counts, which is indeed appropriately implemented using a map.

Don't put classes everywhere. KISS.
armagedescu 30-Nov-12 5:11am
Please read attentively, you are very wrong. See "most number of repeated words in each sentence". So, in each sentence, you should store each distinct word with its repeat count, and then take the maximum one inside the sentence. After that, compare the numbers between sentences.
YvesDaoust 30-Nov-12 5:26am
Yep, you need to keep a count of words in the current sentence, and a count of words in the current paragraph. Can be done with a single map or two of them.

Storing the text structure is of no help.
armagedescu 30-Nov-12 6:15am
I've already mentioned the need for a map. But you can only map the words. How will you map the sentences and the paragraphs? And I haven't said anything about the text structure. Even for storing each of them you will need maps, arrays, relations between them, and a lot of spaghetti.
armagedescu 30-Nov-12 5:13am
Yeah, and see the paragraphs as well.

Solution 2

1. Try writing three different functions, one each for counting words, sentences, and paragraphs. You may have to duplicate some code, but by separating the three tasks you will have an easier time testing for the correct conditions, and you will be able to solve one problem at a time.

2. It's a good idea to initialize variables when you define them, but you should use reasonable values. I understand your reasons to initialize some with 1 instead of 0, but it definitely looks odd, and makes it harder to understand your code and follow its logic.

3. Your test for blanks does not consider other cases of 'whitespace', such as tab characters, carriage return, form feed, or multiple whitespace characters. It may not apply to the case you are testing here, but if you base your code only on the specific test case you have, you may as well count the words by hand and return these numbers rather than write an entire algorithm around it...

4. '.' is not the only way to end a sentence. Also, depending on where you get your text from, you may be confronted with sequences of multiple punctuation marks!!! ;-)

5. The condition
if (userInput[i] == '\n' && userInput[i] == '\t')

is always false and always will be, no matter the text. Besides, why do you test for a tab character ( '\t' )? Reconsider the definition of paragraph that you use, or, rather, the definition of what separates paragraphs.

6. As a general rule, always consider corner cases: e.g. multiple separator characters where you only expect one, omitting or adding a separator at the end of the text, using variants of the commonly used separators, or interpreting characters that are not part of the readable text, but not one of the separators you catch either.
Espen Harlinn 11-Jul-12 8:17am
Good answer :-D
Stefan_Lang 11-Jul-12 8:27am
Thank you :-)

Solution 3

My solution answers your question exactly. The code is in Java, so hopefully you can follow it. It displays the ten most repeated words in each paragraph.

Here is a glimpse of the code. To find the full solution, visit the original Java source code.

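// keys and map1 are defined in the full source; keys is walked from the end
for (int i = keys.length - 1, count = 0; i >= 0; i--) {
    if (count == 10) {
        break;  // stop after the top ten
    }
    System.out.println(++count + ". " + keys[i] + ",    \tFrequency " + map1.get(keys[i]));
}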

Solution 4

Assuming well-formed text (single space between words, full stop at end of a sentence, newline at end of a paragraph), declare an empty dictionary with key = word and value = pair of (per-sentence word count, per-paragraph word count). Use the following single-pass scan:
for each character:
    if space, full stop or newline: (finishing a word)
        insert the word just seen in the dictionary, if need be; increment its per-sentence and per-paragraph counts
    if full stop or newline: (finishing a sentence)
        output the largest per-sentence word count; clear all per-sentence word counts
    if newline: (finishing a paragraph)
        output the largest per-paragraph word count; empty the dictionary

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last updated 30 Nov 2012