How to count the number of distinct words using only string methods, more efficiently than a regex split

Question

Input data
You will read the text of the email from the keyboard. It can span multiple lines and contains only lowercase letters of the English alphabet and spaces.

Output data
A single integer will be displayed, representing the number of distinct words in the email.

Limitation
The text does not contain more than 100,000 words, and each word can contain up to 25 characters.

Input data:
thanks for the list of shopping
Is helpful
thanks

Output data:
7

I must ignore the words that are repeated

What I have tried:

<pre>import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class prog {
    public static void main(String args[]) throws Exception {
        Set<String> words = new TreeSet<>();

        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));

        String line;
        while ((line = reader.readLine()) != null) {
            words.addAll(
                    Arrays.asList(
                            line.split(" ") // I need to use another method, not split
                    )
            );
        }
        System.out.println(words.size() - 1);
    }
}</pre>

Posted 10-May-22 20:53pm

Cristian Marinescu 2022

Updated 10-May-22 22:28pm

v2

Comments

CPallini 11-May-22 3:04am

The code you posted works. What is the problem?

Cristian Marinescu 2022 11-May-22 4:34am

My regular expression is not efficient enough on the platform where I'm practising. They ask me for a better solution.

Richard MacCutchan 11-May-22 3:37am

Your example actually contains 9 words, not 7. Also, why are you creating Arrays and Sets? All you need to do is use split to get a simple list of words, and add the number in the array to your total.

Cristian Marinescu 2022 11-May-22 4:32am

I must ignore the words that are repeated. I need to do it this way because I'm practising on a platform, and it asks me to use a more efficient method than split.

Richard MacCutchan 11-May-22 4:59am

OK, but you do not need a TreeSet, a simple Set<E> will ensure no duplicates. Also split is probably the most efficient method of separating words in a string. The alternative would be to implement your own version using the indexOf and substring methods.
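
For illustration, the indexOf/substring alternative mentioned above could look roughly like the sketch below. The class and method names (IndexOfSplitSketch, addWords, countDistinct) are made up for this example, and it assumes words are separated only by the space character. It still creates one substring per word, so whether it beats split on a given platform is something to measure, not assume.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: splitting by hand with indexOf and substring.
class IndexOfSplitSketch {

    // Adds every space-separated word of one line to the set.
    static void addWords(String line, Set<String> words) {
        int start = 0;
        while (start < line.length()) {
            int end = line.indexOf(' ', start);
            if (end == -1) {
                end = line.length();      // last word runs to the end of the line
            }
            if (end > start) {            // skip empty tokens from leading or repeated spaces
                words.add(line.substring(start, end));
            }
            start = end + 1;
        }
    }

    // Counts the distinct words over all lines of the reader.
    static int countDistinct(BufferedReader reader) throws IOException {
        Set<String> words = new HashSet<>();
        String line;
        while ((line = reader.readLine()) != null) {
            addWords(line, words);
        }
        return words.size();
    }

    public static void main(String[] args) throws IOException {
        // Sample text from the question; wrap new InputStreamReader(System.in)
        // instead to read from the keyboard.
        String sample = "thanks for the list of shopping\nIs helpful\nthanks\n";
        System.out.println(countDistinct(new BufferedReader(new StringReader(sample))));
    }
}
```

Because empty tokens are skipped, there is no need to subtract 1 from the set size. On the sample text this sketch counts 8 distinct words ("Is" included), not the 7 shown in the question.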

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

OriginalGriff · Answer 1 · 2022-05-10T21:41:00

First off, the sample data breaks both of your rules:

Quote:
You will read the text of the email from the keyboard. It ... contains only lowercase letters of the English alphabet and spaces.

'I' is uppercase
And it contains more than seven words:

thanks for the list of shopping Is helpful thanks
   1    2   3    4   5     6     7    8       9

And a regex is a pretty inefficient way to do counting: I'd use the Java String indexOf() method or the StringUtils.indexOfAny method to locate the separators and work it out from there, instead of playing with regex or creating new strings.
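
One way to read "locate the separators and work it out from there" without creating new strings is to walk each word's characters into a trie and count how many distinct end nodes get marked. The sketch below is only an illustration of that idea, not a tuned solution: the class TrieWordCounter and its methods are hypothetical names, and it assumes the stated input rules hold (lowercase a-z and spaces only), throwing on anything else.

```java
// Hypothetical sketch: distinct-word counting with indexOf and a trie,
// assuming words contain only the letters 'a'..'z'.
class TrieWordCounter {

    // One trie node: a child slot per lowercase letter plus an end-of-word flag.
    private static final class Node {
        final Node[] next = new Node[26];
        boolean isWord;
    }

    private final Node root = new Node();
    private int distinct = 0;

    // Feeds one line into the trie, using indexOf to find each separator.
    void addLine(String line) {
        int start = 0;
        while (start < line.length()) {
            int end = line.indexOf(' ', start);
            if (end == -1) {
                end = line.length();
            }
            if (end > start) {            // ignore empty tokens between spaces
                insert(line, start, end);
            }
            start = end + 1;
        }
    }

    // Walks the characters in [start, end) down the trie; no substring is created.
    private void insert(String line, int start, int end) {
        Node node = root;
        for (int i = start; i < end; i++) {
            int c = line.charAt(i) - 'a';
            if (c < 0 || c >= 26) {
                throw new IllegalArgumentException("only a-z and spaces are supported");
            }
            if (node.next[c] == null) {
                node.next[c] = new Node();
            }
            node = node.next[c];
        }
        if (!node.isWord) {               // first time this exact word is seen
            node.isWord = true;
            distinct++;
        }
    }

    int distinctCount() {
        return distinct;
    }

    public static void main(String[] args) {
        TrieWordCounter counter = new TrieWordCounter();
        // Sample from the question with "Is" lowercased so it obeys the input rules.
        counter.addLine("thanks for the list of shopping");
        counter.addLine("is helpful");
        counter.addLine("thanks");
        System.out.println(counter.distinctCount());
    }
}
```

The trade-off is memory: each node holds 26 child slots, so input with many long words that share few prefixes can use far more space than a HashSet of strings. It is worth timing both approaches on the target platform before settling on one.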