
Perl: Removing Text Within Parentheses

By Chief Endian, 18 Aug 2010
 


I recently wanted a list of major landmarks, but my text list gave each landmark's name followed by its location in parentheses, e.g. Eiffel Tower (Paris, France). I wanted just the names, without the parenthesized text, so I had to figure out a command that removes it.

Perl was the winner as the tool of choice. There is a small trick to this seemingly trivial task, so document it we shall.

Original: Eiffel Tower (Paris, France)
Desired: Eiffel Tower

There are two ways to do it, depending on what you want:

perl -p -e 's#\(.*\)##g' textfile

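You may have seen 's/oldtext/newtext/g' as the syntax before and be wondering why I am using hash marks (pound signs) instead. Perl lets you choose the delimiter; the forward slash is just the common choice. If your search text itself contains a forward slash, using hash marks means you don't have to escape it, which also makes the expression easier to read. The -p switch makes perl run the expression on every line of textfile and print the result to standard output (the file itself is not changed), and -e supplies the expression on the command line.

Now, onto the expression itself. A parenthesis is a special character in regular expressions (it groups), so \( is needed to match a literal left parenthesis. Then comes the critical .*, which matches any number of any characters other than a newline. Finally, \) matches a literal right parenthesis, and the trailing g repeats the substitution for every match on the line. Because .* is greedy, it grabs as much as it can, so the match runs from the first left parenthesis on a line to the last right parenthesis on that line.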

perl -p -e 's#\([^)]*\)##g' textfile

For our original text string, this command gives the same result, but it works slightly differently. [^)] means "any character that is not a right parenthesis": a caret (^) placed first inside the square brackets negates the set. Because [^)]* can never step over a right parenthesis, each match ends at the first ) after the opening (. This is useful whenever you need an exclusion set. You can list several characters, as in [^$)?], which matches any character except $, ), or ?.

Since the two commands behave the same for the example above, let's see how they differ in other situations:

If textfile contains:

  1. Paris (France,) Hilton (Hotel)
  2. Paris (France (Hilton) Hotel)

Using:

perl -p -e 's#\(.*\)##g' textfile

The results would be:

  1. Paris
  2. Paris

The danger here is that in line 1, Hilton is removed even though it is not in parentheses: the greedy .* extends the match all the way to the right parenthesis at the end of the line. This may not be the intended result.

Next, using:

perl -p -e 's#\([^)]*\)##g' textfile

The results would be:

  1. Paris Hilton
  2. Paris Hotel)

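The result for line 1 may be what we expected, but line 2 doesn't look good: the match ends at the first right parenthesis, the one closing (Hilton), so Hotel) is left behind. Neither command gets both lines right: the first cannot tell separate groups apart, and the second cannot handle nesting. The moral here is to understand your data and what you are trying to do, and choose the command that performs the operation you actually intend.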

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Chief Endian
Chief Technology Officer Chiefs And Endians
United States
Member
Come visit us at http://www.chiefsandendians.com

A compilation of varied technical gems learned over many years of experience in the industry.

Hint for the confused: Endian is a Computer Science term--the title is a play on words, not a misspelling.

Last Updated 18 Aug 2010
Article Copyright 2010 by Chief Endian
Everything else Copyright © CodeProject, 1999-2013