Hi guys, I am parsing an HTML file in C# and extracting its text. The file contains a lot of tags, including select and option tags. I need a regex that removes the select tags and the option tags inside them, because I don't want that information in the extracted text. Please help; any help would be appreciated. Below is the HTML that I want to remove from my file:
<select name="state" onchange="setCities();" id="state">>
                <option value="CA" selected="selected">CA</option>
<option value="WA">WA</option>
<option value="OR">OR</option>
<option value="AZ">AZ</option>
<option value="UT">UT</option>
<option value="IA">IA</option>
<option value="MD">MD</option>
 
<option value="TX">TX</option>
<option value="NV">NV</option>
<option value="CO">CO</option>
<option value="MI">MI</option>
<option value="SC">SC</option>
<option value="AL">AL</option>
<option value="OH">OH</option>
<option value="KY">KY</option>
<option value="FL">FL</option>
 
<option value="MT">MT</option>
<option value="WI">WI</option>
<option value="GA">GA</option>
<option value="NY">NY</option>
<option value="KS">KS</option>
<option value="MA">MA</option>
<option value="LA">LA</option>
<option value="VA">VA</option>
<option value=""></option>
 
<option value="IL">IL</option>
<option value="NM">NM</option>
<option value="IN">IN</option>
<option value="NC">NC</option>
<option value="ID">ID</option>
<option value="NJ">NJ</option>
<option value="DC">DC</option></select>
        
        
 
            <select name="city" id="city" style="width:150px;"><option value="Anaheim" selected="selected">Anaheim</option>
<option value="Azusa">Azusa</option>
<option value="Baldwin Park">Baldwin Park</option>
<option value="Bellflower">Bellflower</option>
<option value="Brea">Brea</option>
<option value="Buena Park">Buena Park</option>
<option value="Burbank">Burbank</option>
<option value="Canoga Park">Canoga Park</option>
 
<option value="Cerritos">Cerritos</option>
<option value="Chino">Chino</option>
<option value="Chino Hills">Chino Hills</option>
<option value="Chula Vista">Chula Vista</option>
<option value="Compton">Compton</option>
<option value="Corona">Corona</option>
<option value="Corona Del Mar">Corona Del Mar</option>
<option value="Costa Mesa">Costa Mesa</option>
<option value="Cudahy">Cudahy</option>
 
<option value="Cypress">Cypress</option>
<option value="Davis">Davis</option>
<option value="E. Los Angeles">E. Los Angeles</option>
<option value="El Monte">El Monte</option>
<option value="El Segundo">El Segundo</option>
<option value="Elk Grove">Elk Grove</option>
<option value="Encinitas">Encinitas</option>
<option value="Fontana">Fontana</option>
<option value="Fountain Valley">Fountain Valley</option>
 
<option value="Fullerton">Fullerton</option>
<option value="Garden Grove">Garden Grove</option>
<option value="Glendale">Glendale</option>
<option value="Granada Hills">Granada Hills</option>
<option value="Hesperia ">Hesperia </option>
<option value="Hollywood">Hollywood</option>
<option value="Huntington Beach">Huntington Beach</option>
<option value="Huntington Park">Huntington Park</option>
<option value="Inglewood">Inglewood</option>
 
<option value="Irvine">Irvine</option>
<option value="La Habra">La Habra</option>
<option value="La Palma">La Palma</option>
<option value="La Quinta">La Quinta</option>
<option value="Ladera Ranch">Ladera Ranch</option>
<option value="Laguna Beach">Laguna Beach</option>
<option value="Laguna Hills">Laguna Hills</option>
<option value="Laguna Niguel">Laguna Niguel</option>
<option value="Lake Forest">Lake Forest</option>
 
<option value="Lakewood">Lakewood</option>
<option value="Lennox">Lennox</option>
<option value="Long Beach">Long Beach</option>
<option value="Los Angeles">Los Angeles</option>
<option value="Lynwood">Lynwood</option>
<option value="Manhattan Beach">Manhattan Beach</option>
<option value="Mission Viejo">Mission Viejo</option>
<option value="Modesto">Modesto</option>
<option value="Montrose">Montrose</option>
 
<option value="Napa">Napa</option>
<option value="Newport Beach">Newport Beach</option>
<option value="Northridge">Northridge</option>
<option value="Norwalk">Norwalk</option>
<option value="Oceanside">Oceanside</option>
<option value="Ontario">Ontario</option>
<option value="Orange">Orange</option>
<option value="Pacoima">Pacoima</option>
<option value="Palmdale">Palmdale</option>
 
<option value="Paramount">Paramount</option>
<option value="Pasadena">Pasadena</option>
<option value="Petaluma">Petaluma</option>
<option value="Pomona">Pomona</option>
<option value="Redondo Beach">Redondo Beach</option>
<option value="Rialto">Rialto</option>
<option value="Riverside">Riverside</option>
<option value="Sacramento">Sacramento</option>
<option value="San Bernardino">San Bernardino</option>
 
<option value="San Carlos">San Carlos</option>
<option value="San Diego">San Diego</option>
<option value="San Fernando Valley">San Fernando Valley</option>
<option value="San Francisco">San Francisco</option>
<option value="San Pedro">San Pedro</option>
<option value="San Ramon">San Ramon</option>
<option value="Santa Ana">Santa Ana</option>
<option value="Santa Barbara">Santa Barbara</option>
<option value="Santa Clarita">Santa Clarita</option>
 
<option value="Santa Maria">Santa Maria</option>
<option value="Santa Monica">Santa Monica</option>
<option value="Seal Beach">Seal Beach</option>
<option value="Signal Hill">Signal Hill</option>
<option value="Somewhere">Somewhere</option>
<option value="South Gate">South Gate</option>
<option value="Stanton">Stanton</option>
<option value="Studio City">Studio City</option>
<option value="Sun Valley">Sun Valley</option>
 
<option value="Sunland">Sunland</option>
<option value="Temecula">Temecula</option>
<option value="Thousand Oaks">Thousand Oaks</option>
<option value="Torrance">Torrance</option>
<option value="Tustin">Tustin</option>
<option value="Union City">Union City</option>
<option value="Valencia">Valencia</option>
<option value="Van Nuys">Van Nuys</option>
<option value="Ventura">Ventura</option>
 
<option value="Vista">Vista</option>
<option value="W. Covina">W. Covina</option>
<option value="West Hollywood">West Hollywood</option>
<option value="Westminster">Westminster</option>
<option value="Whittier">Whittier</option>
<option value="Woodland Hills">Woodland Hills</option>
<option value="Yorba Linda">Yorba Linda</option></select>
        
        
 
            <input type="submit" value="Go">
Posted 5-Feb-12 23:57pm
Edited 6-Feb-12 0:12am
Comments
SAKryukov at 6-Feb-12 5:00am
   
Such a useless code dump! The problem is pretty simple, but you did not ask any question. What's the problem? Just do it.
--SA
Waseem Fastian at 6-Feb-12 5:11am
   
@SAKryukov, I want to remove this HTML using a regex. My HTML file has a lot of tags in it, and what I posted is only part of the file. I do not need this information, so I want to remove it from the file. I need a regex for this. Thanks.

Solution 3

If you want to remove the whole select element together with its content, you can do it with a regex only if certain constraints are met: in particular, there must be no nested element of the same name (select in this case).

Try this:

string pattern = @"<select\b[\s\S]*?</select>";
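
A minimal sketch of how this pattern might be applied, assuming the whole HTML file is already loaded into a string. The class name `SelectStripper`, the helper `RemoveSelects`, and the use of `RegexOptions.IgnoreCase` (to also catch upper-case tags such as `<SELECT>`) are illustrative assumptions, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SelectStripper
{
    // Pattern from the answer: lazily match from "<select" up to the first "</select>".
    private const string Pattern = @"<select\b[\s\S]*?</select>";

    // Hypothetical helper: returns the HTML with every select element removed.
    // IgnoreCase is an assumption added here so that <SELECT> is matched as well.
    public static string RemoveSelects(string html)
    {
        return Regex.Replace(html, Pattern, string.Empty, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string html =
            "<p>State:</p><select name=\"state\"><option value=\"CA\">CA</option>\n" +
            "<option value=\"WA\">WA</option></select><input type=\"submit\" value=\"Go\">";

        // The select element and the option tags inside it are gone;
        // the surrounding markup is left untouched.
        Console.WriteLine(RemoveSelects(html));
    }
}
```

Because the option tags sit inside the matched span, they are removed along with the select element. As stated above, this only holds when select elements are not nested.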

Cheers

Andi
Comments
1castle1 at 22-Oct-12 5:23am
   
Thanks. string pattern = @"<select\b[\s\S]*?</select>"; worked for me :) Now I just need to get the value and the text out of the middle.
Andreas Gieriet at 24-Oct-12 18:27pm
   
Hello 1castle1,
Out of the middle of what?
Example: input = ..., expected output = ...?
Cheers
Andi
1castle1 at 20-Nov-12 9:33am
   
I was trying to build an HTML scraper out of regexes to fill some tables with test data, but then I found a project called Html Agility Pack and used that instead.

Solution 2

I have solved it myself; this is the regex that I used:

@"<select(\s+[^>]*)?>(.|\n)*?</select(\s+[^>]*)?>"

Thanks to my mighty Allah. Thanks also to those who gave me feedback.
Comments
Prerak Patel at 6-Feb-12 11:51am
   
And it doesn't remove the option tags, as was asked for in the question. :doh:
Waseem Fastian at 6-Feb-12 12:23pm
   
This regex removes the select tag from the HTML, and the option tags are inside the select tag, so they are deleted automatically along with it.
Andreas Gieriet at 6-Feb-12 13:57pm
   
This regex will not remove the whole select element with its content. Did something get lost while pasting it into the solution?

Cheers

Andi
Waseem Fastian at 7-Feb-12 0:25am
   
This regex has removed all the select content. I have used it in my project and it worked. Cheers
Andreas Gieriet at 7-Feb-12 15:03pm
   
Yeah. Now it's better. Before, the whole second select part was lost up to the ].
Now I agree. This would work fine, though it's a bit of overkill. You use (...), which stores the matched string. Use (?:...) instead. And if you want to make sure that a word is not part of a larger word, you may use the word-boundary anchor \b. Combining all that results in my solution #3.

But as you said: yours works as well.

Cheers

Andi
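
The suggestion in the comment above (swap the capturing groups `(...)` for non-capturing groups `(?:...)`) can be sketched as follows. The class name `GroupComparison`, the helper names, and the sample input are hypothetical and only serve to show that both variants remove the same text while the non-capturing one stores no sub-matches.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupComparison
{
    // Solution 2 as posted, with capturing groups.
    public const string Capturing =
        @"<select(\s+[^>]*)?>(.|\n)*?</select(\s+[^>]*)?>";

    // Same pattern with every group made non-capturing, as the comment suggests.
    public const string NonCapturing =
        @"<select(?:\s+[^>]*)?>(?:.|\n)*?</select(?:\s+[^>]*)?>";

    // Number of groups the regex tracks, including group 0 (the whole match).
    public static int GroupCount(string pattern)
    {
        return new Regex(pattern).GetGroupNumbers().Length;
    }

    // Hypothetical helper: remove everything the pattern matches.
    public static string Strip(string html, string pattern)
    {
        return Regex.Replace(html, pattern, string.Empty);
    }

    public static void Main()
    {
        string html = "before<select name=\"city\" id=\"city\">\n" +
                      "<option value=\"Brea\">Brea</option></select>after";

        Console.WriteLine(Strip(html, Capturing));
        Console.WriteLine(Strip(html, NonCapturing));
        Console.WriteLine(GroupCount(Capturing));
        Console.WriteLine(GroupCount(NonCapturing));
    }
}
```

Both calls to `Strip` should yield the same string; the difference is only in how many groups the regex engine has to record per match.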

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 7 Feb 2012