Input data
You will read the text of the email from the keyboard. It can span multiple lines and contains only lowercase letters of the English alphabet and spaces.

Output data
A single integer will be displayed, representing the number of distinct words in the email.

The text does not contain more than 100,000 words, and each word can contain up to 25 characters.

 Input data:
thanks for the list of shopping
Is helpful

Output data:

I must ignore the words that are repeated.

What I have tried:

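import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Set;
import java.util.TreeSet;

class prog {
    public static void main(String args[]) throws Exception {
        Set<String> words = new TreeSet<>();   // a Set stores each word only once

        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));

        String line;
        while ((line = reader.readLine()) != null) {
            // I need to use another method, not split
            for (String word : line.split(" ")) {
                if (!word.isEmpty()) {   // skip empty tokens produced by repeated spaces
                    words.add(word);
                }
            }
        }
        System.out.println(words.size());   // number of distinct words
    }
}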
Updated 10-May-22 23:28pm
CPallini 11-May-22 3:04am    
The code you posted works. What is the problem?
Cristian Marinescu 2022 11-May-22 4:34am    
My regular expression is not efficient enough on the platform where I'm practising. They ask me for a better solution.
Richard MacCutchan 11-May-22 3:37am    
Your example actually contains 9 words, not 7. Also, why are you creating Arrays and Sets? All you need to do is use split to get a simple list of words, and add the number in the array to your total.
Cristian Marinescu 2022 11-May-22 4:32am    
I must ignore the words that are repeated. I need to do it this way because I'm practising on a platform, and they only ask me to use a more efficient method than split.
Richard MacCutchan 11-May-22 4:59am    
OK, but you do not need a TreeSet, a simple Set<E> will ensure no duplicates. Also split is probably the most efficient method of separating words in a string. The alternative would be to implement your own version using the indexOf and substring methods.
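A minimal sketch of the indexOf/substring alternative described in the comment above, assuming words are separated only by plain spaces (possibly several in a row). The class and method names (DistinctWords, addWords, countDistinct) are hypothetical. It collects words into a HashSet, which is enough to drop duplicates; no sorting is needed, so a TreeSet is not required.

```java
import java.util.HashSet;
import java.util.Set;

class DistinctWords {

    // Adds every space-separated word in `line` to `words` using indexOf/substring
    // instead of String.split. Runs of spaces are skipped, so no empty words are added.
    static void addWords(String line, Set<String> words) {
        int start = 0;
        int n = line.length();
        while (start < n) {
            // Skip any spaces before the next word
            while (start < n && line.charAt(start) == ' ') {
                start++;
            }
            if (start >= n) {
                break;
            }
            // The word ends at the next space, or at the end of the line
            int end = line.indexOf(' ', start);
            if (end == -1) {
                end = n;
            }
            words.add(line.substring(start, end));
            start = end + 1;
        }
    }

    // Hypothetical helper: counts the distinct words across all given lines
    static int countDistinct(String... lines) {
        Set<String> words = new HashSet<>();
        for (String line : lines) {
            addWords(line, words);
        }
        return words.size();
    }

    public static void main(String[] args) {
        // The two sample lines from the question
        System.out.println(countDistinct("thanks for the list of shopping", "Is helpful")); // prints 8
    }
}
```

In a real submission, each line returned by reader.readLine() would be passed to addWords with one shared Set, and words.size() printed at the end.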

1 solution

First off, the sample data breaks your own rules:
You will read the text of the email from the keyboard. It ... contains only lowercase letters of the English alphabet and spaces.
'I' is uppercase
And it contains more than seven words:
thanks for the list of shopping Is helpful thanks
   1    2   3    4   5     6     7    8       9

And a regex is a pretty inefficient way to do counting: I'd use the Java String indexOf() method, or the StringUtils.indexOfAny method from Apache Commons Lang, to locate the separators and work it out from there instead of playing with regex or creating new strings.
Cristian Marinescu 2022 11-May-22 4:48am    
I need to ignore both occurrences of "thanks" in my code; that's why I have 7. I must use a regex, but a more efficient one, because I'm practising on a platform and I get a wrong answer on the last test when I submit my solution. They only ask me to change the regex.
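If a regex really is required, one common improvement is to compile the pattern once and reuse it. In OpenJDK, String.split compiles its pattern argument on every call (it only skips this for certain single-character patterns such as " "), so splitting many lines with a multi-character pattern repeats that work each time. The sketch below is an assumption about what "a more efficient regex" might mean, not the platform's reference solution; the class name RegexDistinctWords is hypothetical. The pattern " +" also treats a run of spaces as a single separator.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

class RegexDistinctWords {
    // Compiled once and reused for every line; " +" matches one or more spaces
    private static final Pattern SPACES = Pattern.compile(" +");

    static void addWords(String line, Set<String> words) {
        for (String word : SPACES.split(line)) {
            // A line starting with spaces (or an empty line) yields an empty token; skip it
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Set<String> words = new HashSet<>();
        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = reader.readLine()) != null) {
            addWords(line, words);
        }
        System.out.println(words.size()); // number of distinct words
    }
}
```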
Dave Kreskowiak 11-May-22 7:30am    
You have 8 words, not 7. I suspect you're misinterpreting the "ignore duplicate words" rule. It means you "don't count the same word more than once," not "ignore any words that are duplicated."
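To make the two readings concrete, here is a sketch that computes both on the sample text as quoted in the solution: the number of distinct words, and the number of words that occur exactly once. The class and method names (WordCounts, distinctWords, wordsSeenOnce) are hypothetical, and split is used only to keep the focus on counting rather than tokenizing.

```java
import java.util.HashMap;
import java.util.Map;

class WordCounts {
    // Builds a frequency table: word -> number of occurrences
    static Map<String, Integer> frequencies(String text) {
        Map<String, Integer> freq = new HashMap<>();
        for (String word : text.split(" ")) {
            if (!word.isEmpty()) {
                freq.merge(word, 1, Integer::sum);
            }
        }
        return freq;
    }

    // "Don't count the same word more than once": every distinct word counts
    static int distinctWords(String text) {
        return frequencies(text).size();
    }

    // "Ignore any word that is duplicated": only words seen exactly once count
    static int wordsSeenOnce(String text) {
        int count = 0;
        for (int n : frequencies(text).values()) {
            if (n == 1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "thanks for the list of shopping Is helpful thanks";
        System.out.println(distinctWords(sample)); // prints 8
        System.out.println(wordsSeenOnce(sample)); // prints 7
    }
}
```

The first reading gives 8 and the second gives 7, which is exactly the disagreement in this thread.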

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
