How can I copy only the sentences that contain five or fewer words from a text? I need help, friends...

For example, I have two paragraphs, each containing ten sentences. Five of these sentences consist of five words. How can I separate these sentences from the paragraphs?
Updated 31-Mar-15 8:07am
Comments
King Fisher 31-Mar-15 12:43pm    
Can you show some examples so we can understand clearly?
Asad_Iqbal 31-Mar-15 12:49pm    
For example, I have two paragraphs, each containing ten sentences. Five of these sentences consist of five words. How can I separate these sentences from the paragraphs?
King Fisher 31-Mar-15 13:06pm    
Consider your post as a paragraph. What do you want to separate from that paragraph?
PIEBALDconsult 31-Mar-15 13:29pm    
Regular expression.
PIEBALDconsult 31-Mar-15 13:59pm    
What should happen with: "Are you the good Dr. Smith from the high street?"

You need to decide how to identify sentences in the paragraph, and how to separate and count the words in each one. For example:
(Note that you need to update the split characters for your requirements.)
C#
using System;
using System.Linq;

string input = "aa bb cc dd ee. aa bb. aa bb cc dd. aa bb cc dd ee.";
var lines = input.Split(new[] { '.' }, StringSplitOptions.RemoveEmptyEntries) // split into sentences
   .Select(sentence => sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)) // split into words
   .Where(words => words.Length == 5)  // keep sentences with the given word count
   .Select(words => string.Join(" ", words));  // rebuild each sentence from its words

foreach (var line in lines)
   Console.WriteLine(line);


This will print:

aa bb cc dd ee
aa bb cc dd ee
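
For the "five or fewer" case asked about in the question, the same pipeline could be sketched roughly as follows. This is only a minimal sketch, not part of the original answer: the class name `ShortSentences`, the method `Extract`, and the `maxWords` parameter are made up for illustration, and it assumes sentences end in `.`, `?`, or `!` with no abbreviations or decimal numbers in the text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShortSentences
{
    // Hypothetical helper: returns the sentences that have at most maxWords words.
    // Assumes '.', '?' and '!' only ever appear as sentence terminators.
    public static List<string> Extract(string text, int maxWords)
    {
        return text
            // Split into sentences; the terminator characters are dropped.
            .Split(new[] { '.', '?', '!' }, StringSplitOptions.RemoveEmptyEntries)
            // Split each sentence into words on common whitespace.
            .Select(s => s.Split(new[] { ' ', '\t', '\r', '\n' },
                                 StringSplitOptions.RemoveEmptyEntries))
            // Keep non-empty sentences with maxWords words or fewer.
            .Where(words => words.Length > 0 && words.Length <= maxWords)
            // Rebuild each sentence from its words.
            .Select(words => string.Join(" ", words))
            .ToList();
    }

    public static void Main()
    {
        string input = "aa bb cc dd ee. aa bb. aa bb cc dd ee ff. aa bb cc dd.";
        foreach (string sentence in Extract(input, 5))
            Console.WriteLine(sentence);
    }
}
```

With this input, the six-word sentence is skipped and the other three are printed, one per line, without their trailing dots.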
 
Comments
King Fisher 31-Mar-15 13:19pm    
Well played, 5+.
Asad_Iqbal 31-Mar-15 14:04pm    
So helpful, bro, but the only problem is that it only picks the first sentence that contains exactly five words. What should I do to pick sentences of five or fewer words?
.Where(words=>words.Count()<=5)
I changed it to this, but it doesn't work.
Here's a Regular Expression solution. It still doesn't handle abbreviations (a dot at the end of a "word" that doesn't end a "sentence"), but it does support decimal points within numbers and it doesn't do all that nasssty string splitting X| .

C#
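// {0,4}? is lazy so a match ends at the first terminator instead of merging
// short neighbours ("aa bb. cc dd."); (?!\w) keeps the dot inside "3.14"
// from counting as a terminator.
System.Text.RegularExpressions.Regex reg =
  new System.Text.RegularExpressions.Regex
  ( @"(?:^|\G|[.?!](?!\w))\s*(?'Sentence'(?:\S+\s+){0,4}?\S+[.?!](?!\w))" ) ;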

System.Text.RegularExpressions.MatchCollection mat = reg.Matches ( args [ 0 ] ) ;

for ( int i = 0 ; i < mat.Count ; i++ )
{
  System.Console.WriteLine ( mat [ i ].Groups [ "Sentence" ].Value ) ;
}
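
As for the abbreviation problem ("Dr. Smith"), one possible workaround is to split on whitespace that follows a terminator unless a known title comes right before it. The sketch below is only an illustration under stated assumptions: the class `AbbreviationAwareSplitter`, the method `ShortSentences`, and the short title list (Dr, Mr, Mrs, Ms) are all made up for this example, and a real text would need a longer list. It also fails to split when a sentence genuinely ends with one of those titles.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AbbreviationAwareSplitter
{
    // Sentence boundary: whitespace preceded by '.', '?' or '!', but not
    // preceded by one of a few listed titles (an assumed, incomplete list).
    // A dot inside "3.14" is not followed by whitespace, so it never splits.
    private static readonly Regex Boundary =
        new Regex(@"(?<=[.?!])(?<!\b(?:Dr|Mr|Mrs|Ms)\.)\s+");

    // Hypothetical helper: returns sentences with at most maxWords words,
    // keeping their terminators.
    public static List<string> ShortSentences(string text, int maxWords)
    {
        return Boundary.Split(text.Trim())
            .Where(s => s.Length > 0)
            // Passing a null separator array splits on any whitespace.
            .Where(s => s.Split((char[])null,
                                StringSplitOptions.RemoveEmptyEntries).Length <= maxWords)
            .ToList();
    }

    public static void Main()
    {
        string text = "Are you the good Dr. Smith from the high street? " +
                      "Yes I am. Pi is about 3.14 today.";
        foreach (string sentence in ShortSentences(text, 5))
            Console.WriteLine(sentence);
    }
}
```

Here the ten-word "Dr. Smith" question stays in one piece and is filtered out, while "Yes I am." and "Pi is about 3.14 today." are kept.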
 
C#
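using System;

string text = "a. a. b b. b b. c c c. c c c. d d d d. d d d d. e e e e e. e e e e e.";

// Split into sentences on '.', dropping empty entries.
string[] sentences = text.Split(new char[] { '.' }, StringSplitOptions.RemoveEmptyEntries);

foreach (string sentence in sentences)
{
    // Count the words; change == 5 to <= 5 to keep sentences of five or fewer words.
    if (sentence.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length == 5)
    {
        // Trim the leading space left over from the split.
        System.Diagnostics.Debug.WriteLine(sentence.Trim());
    }
}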
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


