Posted 2 Jun 2005

HTML Tag Extractor

This article provides a solution to prevent HTML or JavaScript injections into your fields.


Before I start explaining the code, I want to ask a question: what would you do if someone entered HTML tags or JavaScript into a textbox on your web form?

I wrote this article, and attached the code I use, to validate or, more accurately, extract the tags entered in my textboxes. Although ASP.NET 1.1 has a built-in detector for tags entered in input fields, you may prefer to extract these tags yourself if you don't need them.

Injected tags or scripts can make your results or output unpredictable. For example, suppose a textbox saves a username to a database and a user enters <b>HisName</b>. If another page displays all the users in a table, that username will be shown in bold because of the <b></b> tags.

For example:

User name

The attached code contains two parts, one for ASP.NET and the other for VB.NET. I'll explain the class, which is the same for both.

Using the code

The Extractor class contains a public function, Extract, which returns a String, and two private functions, FoundOpener and CalculateLength.

Extract scans the entered text for "<" characters. When it finds one, it calls FoundOpener, which takes two parameters: the text being validated and the position of the "<".

FoundOpener searches for the ">" character that closes the tag and returns its position. If no ">" is found, the tag was never closed; the position is then set to the length of the entered text, so everything from the "<" onward is removed.
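The search and its fallback can be tried in isolation. The following is a minimal sketch, not the attached code: the module name FoundOpenerDemo and the helper name FindCloser are hypothetical, and the sample strings are made up for illustration.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module FoundOpenerDemo
    ' Hypothetical helper that mirrors the search described above.
    ' openerIndex is the zero-based index of the "<" character.
    Function FindCloser(ByVal text As String, ByVal openerIndex As Integer) As Integer
        ' InStr takes a one-based start position and returns a one-based
        ' position, or 0 when ">" is not found.
        Dim closer As Integer = InStr(openerIndex + 1, text, ">", CompareMethod.Binary)
        If closer = 0 Then
            ' Unclosed tag: treat the end of the text as the closer.
            closer = Len(text)
        End If
        Return closer
    End Function

    Sub Main()
        Console.WriteLine(FindCloser("a<b>c", 1))   ' prints 4
        Console.WriteLine(FindCloser("a<b c", 1))   ' prints 5
    End Sub
End Module
```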

Once the position of the closing character is known, CalculateLength computes how many characters to remove, brackets included. For example, the length of <center> is 8. The function takes the start and end positions as parameters: start is the zero-based index of "<" used by Extract, and end is the one-based position of ">" returned by InStr. Because the two positions use different bases, subtracting start from end gives the full length of the tag.
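The arithmetic is easier to follow on a concrete string. This is only an illustrative sketch: the module name LengthDemo and the sample text are assumptions, and String.IndexOf stands in for the character-by-character scan that Extract performs.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module LengthDemo
    Sub Main()
        Dim sample As String = "Hi <center>there"

        Dim start As Integer = sample.IndexOf("<"c)            ' zero-based index: 3
        Dim final As Integer = InStr(start + 1, sample, ">")   ' one-based position: 11
        Dim length As Integer = final - start                  ' 11 - 3 = 8

        Console.WriteLine(length)                              ' prints 8
        Console.WriteLine(sample.Remove(start, length))        ' prints Hi there
    End Sub
End Module
```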

Extract function:

Remove is a built-in String method that deletes a given number of characters starting at a given zero-based index:

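' Member of Public Class Extractor.
' Removes every "<...>" span from srctext and returns the result.
' The sender parameter is not used in this listing.
Public Function Extract(ByVal srctext As String, _
                ByVal sender As frmTagExtractor) As String
    Dim Counter As Integer = 0        ' zero-based index into srctext
    Dim CloserPosition As Integer     ' one-based position of ">"
    Dim length As Integer             ' number of characters to remove
    Dim srcLength As Integer = Len(srctext) - 1

    Do While Counter <= srcLength
        If srctext.Chars(Counter) = "<"c Then
            CloserPosition = FoundOpener(srctext, Counter)
            length = CalculateLength(Counter, CloserPosition)
            srctext = srctext.Remove(Counter, length)

            ' The text is shorter now. Step back so that the character
            ' that moved into this index is checked on the next pass.
            srcLength = Len(srctext) - 1
            Counter -= 1
        End If
        Counter += 1
    Loop

    Return srctext
End Function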

FoundOpener function:

InStr is a built-in VB.NET function that returns the one-based position of a substring within a string, or 0 if the substring is not found:

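' Member of Public Class Extractor.
' Returns the one-based position of the first ">" after the "<" at Position,
' or the length of the text when the tag is never closed.
Private Function FoundOpener(ByVal text As String, _
                 ByVal Position As Integer) As Integer
    Dim CloserPosition As Integer
    CloserPosition = InStr(Position + 1, text, ">", CompareMethod.Binary)
    If CloserPosition = 0 Then
        CloserPosition = Len(text)
    End If
    Return CloserPosition
End Function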

CalculateLength function:

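' Member of Public Class Extractor.
' start is the zero-based index of "<" and final is the one-based position
' of ">", so the difference is the tag length, brackets included.
Private Function CalculateLength(ByVal start As Integer, _
                 ByVal final As Integer) As Integer
    Return Math.Abs(final - start)
End Function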
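To try the three functions together outside a form, they can be assembled into a small console program. The following is a sketch under stated assumptions rather than the attached code: the class name TagStripper and the module name Demo are hypothetical, the sender parameter is dropped, and the loop uses an Else branch instead of stepping Counter back and forward.

```vbnet
Imports System
Imports Microsoft.VisualBasic

' Hypothetical console variant of the article's Extractor class.
Public Class TagStripper
    Public Function Extract(ByVal srctext As String) As String
        Dim Counter As Integer = 0
        Do While Counter < Len(srctext)
            If srctext.Chars(Counter) = "<"c Then
                Dim closer As Integer = FoundOpener(srctext, Counter)
                srctext = srctext.Remove(Counter, CalculateLength(Counter, closer))
                ' Do not advance: the character now at Counter is unchecked.
            Else
                Counter += 1
            End If
        Loop
        Return srctext
    End Function

    ' One-based position of ">", or the text length for an unclosed tag.
    Private Function FoundOpener(ByVal text As String, _
                                 ByVal Position As Integer) As Integer
        Dim CloserPosition As Integer = InStr(Position + 1, text, ">", CompareMethod.Binary)
        If CloserPosition = 0 Then
            CloserPosition = Len(text)
        End If
        Return CloserPosition
    End Function

    ' Zero-based start subtracted from one-based end gives the tag length.
    Private Function CalculateLength(ByVal start As Integer, _
                                     ByVal final As Integer) As Integer
        Return Math.Abs(final - start)
    End Function
End Class

Module Demo
    Sub Main()
        Dim stripper As New TagStripper()
        Console.WriteLine(stripper.Extract("<b>HisName</b>"))               ' prints HisName
        Console.WriteLine(stripper.Extract("a <i>b</i> c"))                 ' prints a b c
        Console.WriteLine("[" & stripper.Extract("text <unclosed") & "]")   ' prints [text ]
    End Sub
End Module
```

Note that every "<" is treated as the start of a tag, so an input such as "1 < 2" loses everything from the "<" onward.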


Please tell me if you have any suggestions concerning this technique or if you have another way to handle such a case.

Best regards.


This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



About the Author

Web Developer
Syrian Arab Republic
A Pharmacist :)

Article Copyright 2005 by smiling4ever