Posted 9 Jan 2006

Sentence Breaker using Microsoft Word

A kind of preprocessing in text analysis.



This article presents a simple way to do language-independent sentence breaking using Microsoft Word 2003, which internally breaks text into sentences. The target audience is anyone interested in text processing, text mining from the Internet, or fields related to natural-language processing (NLP).


In NLP, most techniques are sentence-oriented, such as word breaking, part-of-speech (POS) tagging, and syntactic parsing, while the largest text resource, the Internet, is document-based. Documents therefore have to be converted into sentences first. That is the problem we want to solve here.

The objective of sentence breaking is to split a document into sentences; the difficulty lies in recognizing the sentence boundaries. There are several popular algorithms for this, but this article offers a low-cost alternative for anyone who has Microsoft Word installed.
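To see why boundary recognition is the hard part, consider a naive rule that ends a sentence at every ".", "!" or "?" followed by whitespace. The sketch below is only an illustration and is not one of the algorithms mentioned above or part of this article's code; the module name NaiveBreaker and the function NaiveSplit are made up for the example.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module NaiveBreaker
    ' Hypothetical helper (not from the article): split the text after
    ' ".", "!" or "?" whenever that character is followed by whitespace.
    Function NaiveSplit(ByVal text As String) As List(Of String)
        Dim result As New List(Of String)
        For Each piece As String In Regex.Split(text, "(?<=[.!?])\s+")
            If piece.Length > 0 Then result.Add(piece)
        Next
        Return result
    End Function

    Sub Main()
        ' The input has two sentences, but the abbreviation "Dr." is
        ' also treated as a sentence end, so three pieces are printed.
        For Each s As String In NaiveSplit("Dr. Smith paid $3.50. He left.")
            Console.WriteLine(s)
        Next
    End Sub
End Module
```

The decimal point in "$3.50" survives because it is not followed by whitespace, but "Dr." is wrongly split off. Handling cases like this is exactly the work that is delegated to Word in the rest of the article.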

Internally, Word breaks a loaded document into sentences for parsing and exposes them through the document's Sentences collection, so we can extract those sentences for our own purposes.

Using the code

The implementation is very simple: just a few code snippets that drive Word.

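'/// Step 1: start Word as the sentence-breaking engine
'/// (needs a project reference to the Microsoft Word Object Library and,
'///  at the top of the file, Imports System.Collections.Specialized)

    Dim oWord As Word.Application
    Sub New()
        ' Create a hidden Word instance; the cast keeps this valid under Option Strict On.
        oWord = CType(CreateObject("Word.Application"), Word.Application)
        oWord.Visible = False
    End Sub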

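'/// Step 2: return the sentences of the given document in order

    Public Function VomitSentences(ByVal file As String) As StringCollection
        Dim vecSent As StringCollection = New StringCollection
        Dim thisDoc As Word.Document = Nothing

        Try
            ' Open read-only so the source document is never modified.
            thisDoc = oWord.Documents.Open(file, ReadOnly:=True)

            ' Word collections are 1-based.
            Dim i As Int32
            For i = 1 To thisDoc.Sentences.Count
                vecSent.Add(String.Format("-- Sentence {0} --", i))
                ' Assumed step: store the sentence text after its header line.
                vecSent.Add(thisDoc.Sentences.Item(i).Text)
            Next
        Catch ex As Exception
            ' On failure, fall through and return whatever was collected so far.
        Finally
            ' Always release the document, without saving.
            If Not thisDoc Is Nothing Then thisDoc.Close(SaveChanges:=False)
        End Try

        Return vecSent
    End Function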
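'/// Step 3: finalize

    Protected Overrides Sub Finalize()
        ' Assumed cleanup (the body is empty in this listing): quit the hidden Word instance.
        If Not oWord Is Nothing Then oWord.Quit()
        MyBase.Finalize()
    End Sub 'Finalize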
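The text Word returns for a sentence typically still carries its trailing space or paragraph mark, so a small clean-up pass is often useful before the sentences are passed on to other tools. The following is a minimal sketch, assuming the collection holds "-- Sentence n --" header lines mixed with sentence text as in Step 2; the module name SentenceCleanup and the function CleanSentences are hypothetical and not part of the original code.

```vbnet
Imports System
Imports System.Collections.Specialized
Imports Microsoft.VisualBasic

Module SentenceCleanup
    ' Hypothetical helper: drop the "-- Sentence n --" header lines,
    ' trim surrounding whitespace (including paragraph marks), and
    ' skip entries that are empty after trimming.
    Function CleanSentences(ByVal raw As StringCollection) As StringCollection
        Dim cleaned As New StringCollection
        For Each entry As String In raw
            If entry.StartsWith("-- Sentence ", StringComparison.Ordinal) Then Continue For
            Dim text As String = entry.Trim()
            If text.Length > 0 Then cleaned.Add(text)
        Next
        Return cleaned
    End Function

    Sub Main()
        ' Stand-in data shaped like the output of VomitSentences.
        Dim raw As New StringCollection
        raw.Add("-- Sentence 1 --")
        raw.Add("Hello world. ")
        raw.Add("-- Sentence 2 --")
        raw.Add("Second one." & vbCr)
        For Each s As String In CleanSentences(raw)
            Console.WriteLine(s)
        Next
    End Sub
End Module
```

One limitation of this sketch is that a real sentence beginning with "-- Sentence " would be mistaken for a header. Returning the sentence text without header lines in the first place would avoid that.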

Points of Interest

What is the purpose of extracting sentences?

  • English sentences can be used as a writing aid, especially for non-native English users;
  • Bilingual sentences can be used to help translators;
  • High-quality sentences can play a role in language teaching.

If you are interested in sentence searching, you can try this professional sentence search engine. Chinese users can visit this website for bilingual sentence searching.


  • 2006-1-8 created.


This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


Article Copyright 2006 by engooAgent