First Posted 12 Apr 2006

HTML Parser

By Ahmed Ali El-Sayed | 2 Aug 2006 | Article
A C# DLL for use in .NET applications; the code can easily be converted to other languages

Introduction

This is an HTML parser for getting the title, text, and links from a page. It is written as a C# DLL, but you can easily port it to any programming language, provided you know how to fetch the HTML source of the page in that language.

Basic Idea

The idea behind this code is to parse the HTML source character by character. When the parser reaches the title tag, the text that follows it becomes the title string. When it reaches the body tag, it accepts all text that is not script code or CSS. Links are handled in the same way.
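The scanning idea can be sketched for visible text as follows. This is a minimal illustration, not the GetText function from the attached DLL: the class name TextScanSketch and the method VisibleText are hypothetical, the whole input is treated as body content, and a '>' inside a quoted attribute value is not handled.

```csharp
using System;
using System.Text;

// Hypothetical helper, not the article's GetText: walks the source one
// character at a time, drops tags, and skips script and style blocks.
public static class TextScanSketch
{
    public static string VisibleText(string html)
    {
        var text = new StringBuilder();
        int i = 0;
        while (i < html.Length)
        {
            if (html[i] != '<')
            {
                text.Append(html[i]);          // ordinary character: keep it
                i++;
                continue;
            }
            int close = html.IndexOf('>', i);  // end of the tag that starts here
            if (close < 0) break;              // unterminated tag: stop scanning
            string tag = html.Substring(i + 1, close - i - 1).Trim().ToLowerInvariant();
            foreach (string name in new[] { "script", "style" })
            {
                if (tag == name || tag.StartsWith(name + " "))
                {
                    // jump to the end of the matching closing tag
                    int end = html.IndexOf("</" + name, close, StringComparison.OrdinalIgnoreCase);
                    close = end < 0 ? -1 : html.IndexOf('>', end);
                    break;
                }
            }
            if (close < 0) break;              // script/style block never closed
            text.Append(' ');                  // a tag separates words
            i = close + 1;
        }
        // collapse runs of whitespace into single spaces
        string[] words = text.ToString().Split(
            new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words);
    }
}
```

Calling TextScanSketch.VisibleText(page) on a fragment such as a paragraph followed by a script block would keep only the paragraph text.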

Brief Code Description

I use a lookup table for special character sequences (HTML entities): for example, when the parser reads the characters &lt; in the HTML source, they represent the < character.
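A lookup table of this kind might look like the sketch below. The article does not list which entries its table holds, so the four entities here, the class name EntitySketch, and the method Decode are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: maps a few HTML entities to the characters they represent.
public static class EntitySketch
{
    // Illustrative subset only; the real table in the DLL may differ.
    static readonly Dictionary<string, char> Table = new Dictionary<string, char>
    {
        { "&lt;", '<' }, { "&gt;", '>' }, { "&amp;", '&' }, { "&quot;", '"' }
    };

    public static string Decode(string s)
    {
        var sb = new StringBuilder();
        int i = 0;
        while (i < s.Length)
        {
            // an entity starts with '&' and ends at the next ';'
            int semi = s[i] == '&' ? s.IndexOf(';', i) : -1;
            char decoded;
            if (semi > 0 && Table.TryGetValue(s.Substring(i, semi - i + 1), out decoded))
            {
                sb.Append(decoded);   // known entity: emit the real character
                i = semi + 1;
            }
            else
            {
                sb.Append(s[i]);      // anything else is copied unchanged
                i++;
            }
        }
        return sb.ToString();
    }
}
```

Unknown sequences are left as they are, so text that merely contains an ampersand is not damaged.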

		// Returns the text of the first <title> element, or "" if there is none.
		public string GetTitle(string Source)
		{
			int len=Source.Length;
			string window="      ";  // the last 6 characters read
			for(int i=0;i<len;i++)
			{
				// slide the window forward by one character
				window=window.Remove(0,1)+Source[i];
				if(window.ToLower()=="<title")
				{
					// skip the rest of the opening tag, up to its '>'
					while(i<len&&Source[i]!='>')
					{
						i++;
					}
					i++;
					// collect everything up to the next '<' (the closing tag)
					string title="";
					while(i<len&&Source[i]!='<')
					{
						title+=Source[i];
						i++;
					}
					return title.Trim();
				}
			}
			return "";
		}

The code for getting the text and the links is in the attached file.

Usage

To use this code, add the library to your project, then create an instance of the class, for example Parser.Parser inst=new Parser.Parser(), and call its functions on the page source:

inst.GetTitle(page) to get the title

inst.GetText(page) to get the text

inst.MakeLinks(page) to build the links

After you call MakeLinks, the results are available in pLink and pLabel, which hold each link and the label that appears for it on the page.
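To show what a link and its label mean here, the sketch below pulls both out of anchor tags. It is not the MakeLinks function from the DLL, and it does not use pLink or pLabel, whose types the article does not state. The class name LinkSketch and the method Links are hypothetical, and only double-quoted href values are handled.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not the article's MakeLinks: returns one
// (link, label) pair per <a href="..."> element found in the source.
public static class LinkSketch
{
    public static List<KeyValuePair<string, string>> Links(string html)
    {
        var result = new List<KeyValuePair<string, string>>();
        int pos = 0;
        while (true)
        {
            int open = html.IndexOf("<a ", pos, StringComparison.OrdinalIgnoreCase);
            if (open < 0) break;
            int tagEnd = html.IndexOf('>', open);          // end of the opening tag
            if (tagEnd < 0) break;
            int end = html.IndexOf("</a", tagEnd, StringComparison.OrdinalIgnoreCase);
            if (end < 0) break;
            string tag = html.Substring(open, tagEnd - open);
            int h = tag.IndexOf("href=\"", StringComparison.OrdinalIgnoreCase);
            if (h >= 0)
            {
                int start = h + 6;                          // first character after href="
                int quote = tag.IndexOf('"', start);        // closing quote of the value
                if (quote > start)
                {
                    string link = tag.Substring(start, quote - start);
                    string label = html.Substring(tagEnd + 1, end - tagEnd - 1).Trim();
                    result.Add(new KeyValuePair<string, string>(link, label));
                }
            }
            pos = end;                                      // continue after this element
        }
        return result;
    }
}
```

Each pair's Key is the link target and its Value is the label text; markup nested inside the label is left untouched in this sketch.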

Resources

C# DLL in .Net 2005

Contact me

If there is a problem, please contact me at ahmed_a_e2006@yahoo.com

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Ahmed Ali El-Sayed

Web Developer

Egypt

Member

Birth Date: 1/1/1985
ISFP Company(Integrated Solutions For Ports)
Computer Science and Automatic Control Department
Faculty of Engineering - Alexandria University
Graduation Year: 2006
Phone: (+2) 0183262859

Last Updated 3 Aug 2006
Article Copyright 2006 by Ahmed Ali El-Sayed