
Notepad RE (Regular Expressions)

By Ben Hanson, 22 Mar 2011
 
Screenshot - notepadre.png

Introduction

This is a simple Notepad replacement. The main feature is that you can Search and Replace, optionally using regular expressions. The boost::regex library is used for regex support. Note that the intention is for the boost::regex library to eventually become part of the C++ Standard Library. Replace All is improved compared to normal Notepad: it builds the new text in memory and replaces the entire contents of the edit window at once when it has finished. This is much quicker than replacing every match in the edit window as you go along.
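
The Replace All strategy can be sketched roughly as follows. This is only an illustration, not the Notepad RE source: it uses std::regex from C++11 as a stand-in for boost::regex, and the function name replace_all_in_memory is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: builds the complete replacement text in memory so
// that the edit window only has to be updated once, after every match has
// been processed.
std::wstring replace_all_in_memory(const std::wstring& text,
                                   const std::wstring& pattern,
                                   const std::wstring& replacement)
{
    const std::wregex re(pattern);
    // regex_replace copies unmatched text through unchanged, substitutes
    // every match and returns a brand new string.
    return std::regex_replace(text, re, replacement);
}
```

The caller would then hand the returned string to the edit control in a single call, rather than editing the control once per match.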

As the development of Notepad RE progresses, more sophisticated features are being added.

Features

  • Find and Replace using regex - notepadreDoc.cpp
  • Find and Replace in normal mode - notepadreDoc.cpp
  • GREP/Find in Files capability - FindInFilesDlg.cpp
  • Multiple Undo/Redo - notepadreView.cpp
  • Dockable Find and Replace dialogs - MainFrm.cpp
  • Find will wrap from bottom to top -- or top to bottom, depending on search direction -- if necessary - notepadreDoc.cpp
  • If the file you are editing is changed by another process, you have the option of being asked if you want to reload - notepadreDoc.cpp
  • Line and column displayed in status bar - MainFrm.cpp, called from notepadre.cpp
  • You can drop files as path/filename from Explorer by clearing Options->Drop Files - MainFrm.cpp
  • You can drag and drop text to and from the edit window to and from other applications that support drag and drop - notepadreView.cpp
  • You can re-open an existing file, something that does not work in the standard CEditView class - notepadre.cpp
  • You can open a text file bigger than 1 MB - notepadreView.cpp
  • Unicode is supported - notepadreFile.cpp
  • You can open and re-save UNIX text files correctly - notepadreDoc.cpp
  • The Find/Replace dialog is written from scratch - FindReplaceDlg.cpp
  • Help file included - MainFrm.cpp
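
The wrap-around behaviour of Find listed above could look something like this for a plain (non-regex) forward search. The helper name find_with_wrap is hypothetical and the code is a sketch rather than the logic in notepadreDoc.cpp; a backward search would use rfind in the same way.

```cpp
#include <string>

// Hypothetical helper: searches forward from 'start'. If nothing is found
// before the end of the text, the search wraps round and restarts from the
// top. Returns std::string::npos if 'needle' does not occur at all.
std::string::size_type find_with_wrap(const std::string& text,
                                      const std::string& needle,
                                      std::string::size_type start)
{
    std::string::size_type pos = text.find(needle, start);
    if (pos == std::string::npos && start > 0)
        pos = text.find(needle, 0);   // wrap from bottom to top
    return pos;
}
```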

What are Regular Expressions?

A Regular Expression is simply some text. I think it is safe to assume that anyone who has used a modern computer will have used Find and/or Replace dialogs in more than one application that allows text processing, whether it is Notepad, a word processing program, or a web browser. At the simplest level, a regular expression is no different to the text you type into the edit field of a Find dialog. Where regular expressions differ from normal text is that they give special meaning to certain characters, allowing you to specify textual 'patterns' rather than just literal text. The special characters are the following:

'.', '|', '*', '?', '+', '(', ')', '{', '}', '[', ']', '^', '$' and '\'.

These characters are often known as 'metacharacters' in the jargon of regular expressions. If you have ever typed something like...

*.txt

... or something similar, then you are already familiar with the concept of characters having special meaning in a piece (string) of text. Wildcards -- i.e. the characters '*' and '?' -- used when navigating most computer file systems are a massively simplified version of regular expressions. As well as being able to match any character ('?') or any string ('*'), regular expressions allow you to specify ranges of characters that can match, repeating textual patterns, alternative matching patterns and even matching positions within text. Note that in the syntax of regular expressions, the wildcard character '?' becomes '.' and '*' becomes '.*'.
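
To make the relationship between wildcards and regular expressions concrete, here is a small sketch that translates a wildcard into a regex using exactly the rules above. The function names are hypothetical, and std::regex is used in place of boost::regex so that the example needs nothing beyond the standard library.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: translates a file-system wildcard into an equivalent
// regular expression. '?' becomes '.', '*' becomes '.*', and any other
// metacharacter is escaped so that it matches literally.
std::string wildcard_to_regex(const std::string& wildcard)
{
    static const std::string metachars = ".|*?+(){}[]^$\\";
    std::string out;
    for (char c : wildcard)
    {
        if (c == '?')
            out += '.';
        else if (c == '*')
            out += ".*";
        else
        {
            if (metachars.find(c) != std::string::npos)
                out += '\\';
            out += c;
        }
    }
    return out;
}

// True if the whole of 'name' matches the wildcard.
bool wildcard_match(const std::string& wildcard, const std::string& name)
{
    return std::regex_match(name, std::regex(wildcard_to_regex(wildcard)));
}
```

With this, the wildcard *.txt turns into the regex .*\.txt, where the escaped dot matches only a literal '.'.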

If you have never used regular expressions before, then once you have learned the syntax you are in for a pleasant surprise. Once you have mastered their use, you will never look back! The official reference for the boost regular expression library is here. See this Regular Expression Primer for a very basic description. The book Mastering Regular Expressions is very good for when you really want to get in-depth!

Getting the Boost library

Visit Boost.org to obtain the boost regular expressions library.

Building the Boost Library

These instructions are for building under Visual C++ version 6.0. The paths below use Boost 1.39.0 as the example version.

  • Download the ZIP file
  • Unzip the contents to C:\
  • From a command prompt:
    • C:\>"C:\Program Files\Microsoft Visual Studio\VC98\Bin\vcvars32.bat"
    • Ensure the environment variable 'include' includes the path "C:\Program Files\Microsoft Visual Studio\VC98\include"
    • Ensure environment variable 'lib' includes the path "C:\Program Files\Microsoft Visual Studio\VC98\lib"
    • C:\>cd C:\boost_1_39_0\libs\regex\build
    • C:\boost_1_39_0\libs\regex\build>nmake /f vc6.mak
  • Wait until the build finishes (you might want to get a coffee..!)
  • Add C:\boost_1_39_0 to your includes (Tools, Options, Directories, Include files from the VC menu)
  • Add C:\boost_1_39_0\libs\regex\build\vc6 to your library path (Tools, Options, Directories, Library Files from the VC menu)

Getting the Microsoft HTML Help Workshop

Get it here.

Installing HTML Help

  • Download htmlhelp.exe from the link above
  • Run htmlhelp.exe, installing to the default directory C:\Program Files\HTML Help Workshop
  • Add C:\Program Files\HTML Help Workshop\include to your includes (Tools, Options, Directories, Include files from the VC menu)
  • Add C:\Program Files\HTML Help Workshop\lib to your library path (Tools, Options, Directories, Library Files from the VC menu)

Program Design

File Handling

Notepad RE supports ANSI, Unicode, Big Endian Unicode and UTF-8 file formats. Additionally, Windows, UNIX and Macintosh line endings are supported, including files with inconsistent line endings. The file handling routines are the most tricky parts of Notepad RE.
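
As a rough idea of what this involves, the sketch below detects the encoding from a byte order mark and normalises mixed line endings. It is not the code in notepadreFile.cpp: the names Encoding, detect_encoding and normalise_line_endings are hypothetical, and treating a file without a BOM as ANSI is a simplifying assumption.

```cpp
#include <cstddef>
#include <string>

enum class Encoding { Ansi, Utf8, Utf16LE, Utf16BE };

// Guesses the encoding from a byte order mark (BOM) at the start of the
// data. Data without a BOM is reported as ANSI here; a real editor would
// probably need further heuristics (e.g. for UTF-8 saved without a BOM).
Encoding detect_encoding(const std::string& bytes)
{
    const auto b = [&](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };
    if (bytes.size() >= 3 && b(0) == 0xEF && b(1) == 0xBB && b(2) == 0xBF)
        return Encoding::Utf8;
    if (bytes.size() >= 2 && b(0) == 0xFF && b(1) == 0xFE)
        return Encoding::Utf16LE;   // "Unicode" in Notepad terms
    if (bytes.size() >= 2 && b(0) == 0xFE && b(1) == 0xFF)
        return Encoding::Utf16BE;   // "Big Endian Unicode"
    return Encoding::Ansi;
}

// Converts UNIX (\n), Macintosh (\r) and Windows (\r\n) line endings, even
// when they are mixed in one file, to \r\n.
std::string normalise_line_endings(const std::string& text)
{
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        if (text[i] == '\r')
        {
            out += "\r\n";
            if (i + 1 < text.size() && text[i + 1] == '\n')
                ++i;   // skip the \n of an existing \r\n pair
        }
        else if (text[i] == '\n')
            out += "\r\n";
        else
            out += text[i];
    }
    return out;
}
```

To re-save a UNIX file correctly, an editor would also have to remember which line ending the file originally used; that bookkeeping is left out here.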

Regular Expression Syntax

The regular expression syntax is now selectable under the Options menu.

Matching, Including Over More Than One Line

I've aimed to provide default search functionality with the maximum number of possibilities and the minimum number of surprises. The basic aim is to provide functionality based on vi, but with several improvements.

  • 'Char Classes' are supported (i.e. [[:CLASS:]] syntax is allowed)
  • 'Intervals' are supported (i.e. {x,y} syntax allowed)
  • 'Back References' are supported (i.e. \1, \2 etc. are allowed)
  • 'Escape in Lists' is supported (i.e. the \ character is the escape character inside [...])
  • + is supported (of course)
  • ? is supported (of course)
  • | is supported (of course)
  • Use Perl-like variables $1, $2, $3 etc. in the Replace field to use captured text
  • .* matches characters on the current line, like vi. To continue a match to the next line, follow .* with \r\n
  • Note that characters \r and \n are treated as whitespace. For example, if you use \s+ as part of your regex, you may be surprised to find you have matched text across lines
  • $ works like it does in vi, but may also be followed by \r\n if you want to match the 'newline' character
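
A few of the points above can be tried out with the short sketch below. It uses std::regex with its default ECMAScript syntax as a stand-in, so details may differ from the boost::regex flavours selectable in Notepad RE; the function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Captured text in the Replace field: turns "key=value" into "value=key"
// using $1 and $2.
std::string swap_pairs(const std::string& text)
{
    const std::regex re("(\\w+)=(\\w+)");
    return std::regex_replace(text, re, "$2=$1");
}

// A back reference in the pattern: \1 must repeat whatever group 1 matched,
// so this finds doubled words such as "the the". Because \s also matches
// \r and \n, the two words may sit on different lines.
bool has_doubled_word(const std::string& text)
{
    const std::regex re("\\b(\\w+)\\s+\\1\\b");
    return std::regex_search(text, re);
}

// '.' does not match \r or \n in this flavour, so ".*" stops at the end of
// the line; an explicit \r\n lets the match continue onto the next line.
bool begin_then_end_on_next_line(const std::string& text)
{
    const std::regex re("begin.*\\r\\nend");
    return std::regex_search(text, re);
}
```

Note how has_doubled_word reports a match for "the" at the end of one line followed by "the" at the start of the next, which is the kind of cross-line surprise mentioned above for \s+.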

References

  • "The C++ Programming Language Special Edition" by Bjarne Stroustrup
  • "Advanced Windows" by Jeffrey Richter, Microsoft Press
  • "The Essence of COM with ActiveX, a Programmer's Workbook" by David S. Platt, Prentice Hall
  • "Mastering Regular Expressions" by Jeffrey E. F. Friedl, O'Reilly
  • "Professional MFC with Visual C++ 6" by Mike Blaszczak, Wrox Press Inc

Future Work

  • Popup menu in Replace dialog for regex replace syntax
  • Investigate syntax highlighting
  • Use the std::tr1 interface to boost::regex
  • Use Microsoft Unicode routines when loading/saving
  • HEX view

This MFC version of Notepad RE will be improved until it is as close to Windows Notepad as possible. After that, I may rewrite it as a WTL program.

History

  • 23 July, 2003
    • Original version posted
  • 2 June, 2007: Version 1.1.0.1
    • Multiple Undo/Redo added
  • 4 June, 2007: Version 1.1.0.2
    • BUG FIX: Replace with empty string works again!
    • Group characters for undo
    • Undoing all changes sets modified flag to FALSE
    • Replacing a selection now treated as an atomic undo/redo
  • 10 June, 2007: Version 1.1.0.3
    • BUG FIX: Clear Undo history when toggling word wrap
  • 12 June, 2007: Version 1.1.0.4
    • BUG FIX: Forgot to add the OnKeyUp function!
  • 14 June, 2007: Version 1.1.0.5
  • 16 June, 2007: Version 1.1.0.6
  • 17 June, 2007: Version 1.1.0.7
    • Added first cut of Find in Files
  • 21 June, 2007: Version 1.1.0.8
    • If the Modified flag was set before toggling word wrap (which flushes the undo buffer), it is no longer reset to FALSE when all subsequent edits are undone
    • Various tweaks to Find in Files
  • 27 June, 2007: Version 1.1.0.9
    • BUG FIX: A sequence of replacements is no longer treated as one big transaction by Undo
  • 3 July, 2007: Updated help file
  • 4 July, 2007: Version 1.1.1.0
    • Added popup menu to Find and Replace dialogs for regex syntax
  • 6 July, 2007: Version 1.1.1.1
    • BUG FIX: A sequence of replacements is no longer treated as one big transaction by Redo
    • Finished popup menu in Find and Replace dialogs for regex syntax
  • 10 July, 2007: Version 1.1.1.2
    • Find in Files now sends output to a dockable toolbar
    • Changed tab order in Replace dialog
    • Changed 'Number' regex to be PERL mode friendly
  • 7 August, 2007: Version 1.1.1.3
    • BUG FIX: Shift-Del only creates one Undo entry now!
  • 8 August, 2007: Version 1.1.1.4
    • BUG FIX: Ctrl-C works again...
  • 10 October, 2007: Version 1.1.1.5
    • BUG FIX: Saving with word wrap enabled no longer saves too much text
    • Find in Files now runs in the background
  • 26 June, 2008: Version 1.1.1.6
    • BUG FIX: Check for Non-Windows line endings in CNotepadreFile::CountCharsUTF8() fixed
    • Help file correction (thanks har0ld)
    • Fixes to CRegexSyntaxDlg
  • 17 March, 2009: Version 1.1.1.7
    • Selected text copied to Find and Replace dialogs
  • 23 March, 2009: Version 1.1.1.8
    • Re-enabled ".LOG" support
  • 24 March, 2009: Version 1.1.1.9
    • Find in Files now supports multi-line matching
  • 26 March, 2009: Version 1.1.2.0
    • Sped up Find in Files (make sure you use \r\n for multi-line matching)
  • 27 March, 2009: Version 1.1.2.1
    • Fixed memory leak in PerformGrep()
    • Selected text copied to Find in Files dialog
    • Find in Files now opens with CFile::modeRead | CFile::shareDenyNone
  • 27 March, 2009: Version 1.1.2.2
    • PerformGrep() wasn't counting newlines from the beginning of the file!
  • 29 March, 2009: Version 1.1.2.3
    • More improvements to Find in Files (more responsive, displays progress, etc.)
  • 7 April, 2009: Version 1.1.2.4
    • Double clicking the results from Find in Files now goes to correct line even with word wrap enabled
  • 9 April, 2009: Version 1.1.2.5
    • Ensure only one line is shown per match in the Find in Files results
    • Enable checkbox for Whole Word Only for regex mode (This is a convenience feature. All that happens is that the regex is wrapped in \b(?:)\b)
  • 16 April, 2009: Version 1.1.2.6
    • Changed regex whole word only syntax depending on regex flavour. Still not perfect as some flavours do not support this feature at all, but at least most work correctly now.
  • 8 January, 2011
    • Updated zip file
  • 19 March, 2011: Version 1.1.2.8
    • Added support for loading and saving toolbar positions
  • 21 March, 2011: Version 1.1.2.9 
    • Uses SetWindowPlacement() as it is more accurate than MoveWindow()
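
The Whole Word Only convenience added in version 1.1.2.5 amounts to simple string wrapping, which might be sketched like this. The helper names are hypothetical, std::regex stands in for boost::regex, and only the Perl-like \b form is shown; as the 1.1.2.6 entry notes, other flavours need different syntax.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: wraps a user-supplied regex so that it only matches
// whole words. The non-capturing group keeps any '|' in the user's pattern
// inside the word boundaries and leaves $1, $2 ... numbering undisturbed.
std::string whole_word_only(const std::string& user_regex)
{
    return "\\b(?:" + user_regex + ")\\b";
}

// True if 'text' contains a whole-word match for 'user_regex'.
bool contains_whole_word(const std::string& text,
                         const std::string& user_regex)
{
    return std::regex_search(text, std::regex(whole_word_only(user_regex)));
}
```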

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Ben Hanson
Software Developer (Senior)
United Kingdom
I started programming in 1983 using Sinclair BASIC, then moving on to Z80 machine code and assembler. In 1988 I programmed 68000 assembler on the ATARI ST and it was 1990 when I started my degree in Computing Systems where I learnt Pascal, C and C++ as well as various academic programming languages (ML, LISP etc.)
 
I have been developing commercial software for Windows using C++ for 15 years.


Comments and Discussions

 
Question: How to uninstall and restore default values (l3ny, 4 Jul '12 - 10:37)
Hello, I ran notepadre.reg to register the app and regular Notepad disappeared.
It is not that I need Notepad, but I found a better regex application and want to remove Notepad RE.
 
Thank you.
Answer: Re: How to uninstall and restore default values (Ben Hanson, 5 Jul '12 - 11:29)
Go into Explorer, right click a text file and select Open With->Choose Program... In there select the editor you want as default and tick the "Always use the selected program to open this kind of file" check box.
 
This is in Windows XP, but I think it is similar for Windows 7.
 
Regards,
 
Ben
General: Unexpected results (mbue, 21 Jan '11 - 3:24)
regex   : \b.+?\b
desc      : split words and no words
text      : abc-abc-abc
expected: "abc" + "-" + "abc" + "-" + "abc"
result   : "abc" + "abc" + "abc"
 
regex   : \p{Ll}+
desc      : find all lower char literals
text      : abc-abc-abc
expected: "abc" + "abc" + "abc"
result   : Invalid character class name
 
how to avoid them?
General: Re: Unexpected results (Ben Hanson, 21 Jan '11 - 5:01)
\b.+?\b matches strings starting and ending on a word boundary. That is why you are not seeing '-' matched. Maybe you wanted \b.+?\b|-.
 
Instead of \p{Ll}+ use \p{lower}+.
 
Regards,
 
Ben
General: Re: Unexpected results (mbue, 21 Jan '11 - 8:03)
example: abc-abc-abc
word boundaries (\b position at |) are: "|a" "c|-" "-|a" "c|-" "-|a" "c|" or isnt?
thats why expected results not correct!
 
code: \p{lower} hello "lower" isnt a valid unicode property! "Ll" is the right one for "Lower letters".
take a look at: http://unicode.org/Public/UNIDATA/PropertyValueAliases.txt
using boost library seems to be the wrong way.
General: Re: Unexpected results (Ben Hanson, 19 Mar '11 - 0:38)
http://www.boost.org/doc/libs/1_34_1/libs/regex/doc/icu_strings.html
Question: Multiple AND / OR in single search? (Mick Leong, 2 Jun '09 - 22:46)
I have a CRLF terminated large txt file and would like to do a search for
lines containing Joe and John OR Mary and Betty (case insensitive)
 
At the moment I do this in four passes!!!:
1. search for "Joe"
2. if found then search for "John"
3. if found then select record
4. if fail, then search for "Mary"
5. If found then search for "Betty"
6. if found then select record
 
This is very bad code! How do i code an optimized regex expression to do above?
Answer: Re: Multiple AND / OR in single search? (Ben Hanson, 3 Jun '09 - 7:08)
Assuming Perl compatible regex mode:
 
(Joe.*?John|John.*?Joe)|(Mary.*?Betty|Betty.*?Mary)
 
You could add \< before each name and \> after each name if you don't want to allow JoeJohn to match etc.
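
For anyone wanting to run the same search in code rather than in the editor, a minimal sketch follows. It uses std::regex (ECMAScript syntax) as a stand-in for the Perl mode of boost::regex, and the function name select_lines is hypothetical. ECMAScript does not support the word-start and word-end assertions mentioned above, so \b would be the nearest equivalent there.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Returns the lines that contain (Joe AND John) OR (Mary AND Betty), in
// either order and ignoring case, in a single pass over the text.
std::vector<std::string> select_lines(const std::string& text)
{
    static const std::regex re(
        "(Joe.*?John|John.*?Joe)|(Mary.*?Betty|Betty.*?Mary)",
        std::regex::ECMAScript | std::regex::icase);
    std::vector<std::string> selected;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line))
    {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();   // strip the CR of a CRLF terminator
        if (std::regex_search(line, re))
            selected.push_back(line);
    }
    return selected;
}
```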
 
Regards,
 
Ben
General: Re: Multiple AND / OR in single search? (zikl, 8 Jul '09 - 10:13)
Hello. I didn't want to open new thread, so I just use newest one. Just want to express my gratitude and it's shame this little gem is so hard to find. Sorry for bad english. Bye. :)
General: Re: Multiple AND / OR in single search? (Ben Hanson, 5 Aug '09 - 10:38)
Thanks!
 
Glad you found it useful.
 
Regards,
 
Ben
General: PCRE (Ben Hanson, 6 May '09 - 4:20)
So, in looking to take search and replace to the next level I have started to consider recursive regexes/full grammar specification. Boost.Xpressive supports recursive regexes but only at build time, but it turns out that PCRE supports them at runtime. Unfortunately, it appears that PCRE does not have a wchar_t interface which makes it unsuitable for Notepad RE. As source files are generally just ASCII anyway, I'm thinking I'll write a Visual Studio plugin using PCRE instead (the DEVs at work would prefer this approach in any case).
 
If this interests you, please add a comment to this message.
 
Also, if you would like to see boost support recursive regexes, drop them a line on the mailing list (I use nabble for postings if that helps - http://www.nabble.com/Boost---Dev-f14201.html).
 
Regards,
 
Ben
General: Re: PCRE (prantlf, 10 Jan '11 - 7:28)
PCRE supports Unicode by accepting the UTF-8 encoding. You can convert your input to UTF-8 (char*) and execute the RE on the converted text. If you want to implement replacing too, it would be an editor-specific code anyway. (I was able to integrate PCRE in the UTF-8 mode to Scintilla.)
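
The conversion step suggested here could be sketched as below. This is a hand-written illustration, not code from Notepad RE or Scintilla: std::u16string stands in for the 16-bit wchar_t text used on Windows, the function name is hypothetical, and replacing unpaired surrogates with U+FFFD is an assumption of the sketch.

```cpp
#include <cstddef>
#include <string>

// Converts UTF-16 text to UTF-8 so it can be passed to a char*-based regex
// engine. Unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF)
        {
            // High surrogate: combine with the low surrogate that follows.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            }
            else
                cp = 0xFFFD;
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
            cp = 0xFFFD;   // low surrogate with no preceding high surrogate

        if (cp < 0x80)
            out += static_cast<char>(cp);
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Match offsets reported against the UTF-8 text are byte offsets, so an editor would also need to map them back to positions in the original UTF-16 buffer before selecting or replacing text.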
 
   --- Ferda
Question: Unicode (Ben Hanson, 1 Apr '09 - 21:56)
I would like to test the Unicode support in Notepad RE, as I wasn't aware of UTF-16 when I wrote this originally! Does anyone have any good test files I could use to experiment? As well as reading trickier files correctly, I would also like to use Unicode aware searching in non-regex mode and switch to the ICU support for boost::regex.
 
Any help/pointers much appreciated.
 
Thanks,
 
Ben
Question: Which version of boost::regex is required? (s_dimi, 27 Mar '09 - 8:33)
I have tried to compile using the latest boost release and I get a lot of errors.
Which version of boost should I use?
Answer: Re: Which version of boost::regex is required? (Ben Hanson, 29 Mar '09 - 6:23)
I tried version 1.38 and it worked fine.
 
Did you follow the build instructions from this article?
Question: VC6 Poll (Ben Hanson, 21 Oct '08 - 22:04)
How many of you still use VC6? (We have switched to VS 2005 at work now and will move to VC 2008 ASAP).
 
Cheers,
 
Ben
Answer: Re: VC6 Poll (Alexandre GRANVAUD, 18 Mar '09 - 0:21)
I do, but will soon switch to VC 2008.
Question: How many of you use tr1? (Ben Hanson, 21 Oct '08 - 22:01)
Hi Everyone,
 
How many of you use tr1? Should I switch to the tr1::regex library yet?
 
Thanks,
 
Ben
Answer: Re: How many of you use tr1? (Ben Hanson, 24 Mar '09 - 6:47)
I will answer my own question..!
 
Due to bugs in the VC 2008 implementation of tr1 and the fact that tr1::regex appears to be slower in some cases than the latest boost::regex, I will hold off switching for now.
General: Help !!! (Swapnil963, 20 Oct '08 - 23:46)
Hi

great article... :)
I have downloaded Version 1.36.0.
but one problem when i am building Application
fatal error LNK1104: cannot open file "libboost_regex-vc6-mt-sgd-1_36.lib"
 
but this path "C:\boost_1_36_0\libs\regex\build\vc6" has only "libboost_regex-vc6-mt-sgd-1_35.lib"
 
plz Help..
Answer: Re: Help !!! (Ben Hanson, 21 Oct '08 - 21:59)
Hi there,
 
Thanks!
 
I would say just rename the .lib. If that doesn't work you could always download version 1.35 of boost, but I suspect a rename of the lib will be fine.
 
Let me know how you get on.
 
Cheers,
 
Ben
General: thanks and maybe error in chm help file (har0ld, 2 Mar '08 - 1:00)
First off thanks a lot this program was a blessing. I searched hours for a simple windows program that would have a regExp search/replace feature and that didn't need to be installed. At first I thought Notepad++ was the answer to my prayers but its regExp support is a joke (only quantifiers are the greedy * and +, and multi line matches are virtually impossible to achieve without dirty workarounds).
So once again THANK YOU A LOT.
 

I think I might have found a spelling-mistake in the chm help file ./Unicode_release/NotepadRE.chm in "Replace Syntax" it says:
\x{DDDD} Outputs the character whose hexadecimal code point is 0xDDDDD
I'm not sure but I think there's one D too much at the end
 
ps:thanks
General: Re: thanks and maybe error in chm help file (Ben Hanson, 10 Mar '08 - 2:12)
Thanks for your feedback and I'm glad you found it useful! I have also found that most Windows editors have cruddy regex support (even Visual Studio). I also think the .NET so-called (i.e. non standard) regular expressions are worse than useless.
 
I will update the .chm.
 
Cheers,
 
Ben
News: There's a new version of the RegEx Tester Tool ! (BucanerO_Slacker, 1 Mar '08 - 23:16)
I have released a new version of the RegEx Tester tool. You can download it free from http://www.codeproject.com/KB/string/regextester.aspx and http://sourceforge.net/projects/regextester

With RegEx Tester you can fully develop and test your regular expression against a target text. Its UI is designed to aid you in developing the RegEx. It uses and supports ALL of the features available in the .NET RegEx Class.
Question: What the heck??? This is exactly what I need! (Daniel Cohen Gindi, 15 Oct '07 - 9:43)
Hi!
 
Thanks for this! This notepad, is what I've been looking for, for about 10 years...
When I was a child I knew an old guy who was working on an advanced editor which does the same (much more than this one, but basically the same), it was for DOS, and it was like the project of his life... It was capable of finding and replacing so complex expressions that I didnt know who the heck will ever need it. But it was his personal project, and as far as I know, he never shared it, or even published as software/shareware...
 
Your Notepad RE is going to save me soooo much headache!
 
Thanks again...
Daniel
 
-----
Daniel Cohen Gindi
danielgindi (at) gmail dot com

Last Updated 22 Mar 2011
Article Copyright 2003 by Ben Hanson
Everything else Copyright © CodeProject, 1999-2013