HTML Parser



Apr 12, 2006

CPOL


A C# DLL for use in .NET applications; the code is easy to port to other languages.

Introduction

This is an HTML parser for extracting the title, text, and links from a page. It is a DLL written in C#, but it is easy to port to any other programming language, as long as you know how to obtain the page's HTML source in that language.

Basic Idea

The idea behind this code is to walk through the HTML source character by character. When the parser reaches the title tag, it stores the text that follows in the title string. When it reaches the body tag, it accepts all text that is not script or CSS. Links are handled the same way.
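The body-text step described above can be sketched roughly as follows. This is not the GetText code from the attached DLL; the class name TextSketch and the method VisibleText are hypothetical names chosen for illustration, and the sketch assumes well-formed tags and does not handle HTML comments.

```csharp
using System;
using System.Text;

public static class TextSketch
{
    // Collects the text found after the <body> tag, skipping the
    // contents of <script> and <style> blocks. Illustrative only.
    public static string VisibleText(string html)
    {
        StringBuilder text = new StringBuilder();
        bool inBody = false;       // text is accepted only after <body>
        string skipUntil = null;   // closing tag that ends a skipped block
        int i = 0;
        while (i < html.Length)
        {
            if (skipUntil != null)
            {
                // Inside script or CSS: jump to the matching closing tag.
                int close = html.IndexOf(skipUntil, i, StringComparison.OrdinalIgnoreCase);
                if (close < 0) break;   // unterminated block: drop the rest
                i = close;
                skipUntil = null;
            }
            if (html[i] == '<')
            {
                int end = html.IndexOf('>', i);
                if (end < 0) break;     // unterminated tag: stop
                string tag = html.Substring(i + 1, end - i - 1).ToLowerInvariant();
                if (IsTag(tag, "body")) inBody = true;
                else if (IsTag(tag, "script")) skipUntil = "</script";
                else if (IsTag(tag, "style")) skipUntil = "</style";
                // A tag separates words, so emit at most one space for it.
                if (inBody && text.Length > 0 && text[text.Length - 1] != ' ')
                    text.Append(' ');
                i = end + 1;
            }
            else
            {
                if (inBody) text.Append(html[i]);
                i++;
            }
        }
        return text.ToString().Trim();
    }

    // True when the tag content is `name` alone or `name` plus attributes.
    private static bool IsTag(string tag, string name)
    {
        return tag.StartsWith(name, StringComparison.Ordinal)
            && (tag.Length == name.Length || !char.IsLetterOrDigit(tag[name.Length]));
    }
}
```

Calling TextSketch.VisibleText(page) on a page's HTML source gives the body text with the title, scripts, and style sheets left out.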

Brief Code Description

I build a lookup table for special characters. For example, when the characters &lt; appear in the HTML source, they represent the < character.
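A lookup table of this kind can be sketched as a dictionary from entity name to character. This is a minimal illustration, not the table from the attached DLL: the class name EntityTable, the method Decode, and the handful of entities listed are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class EntityTable
{
    // Hypothetical table holding only a few common named entities.
    private static readonly Dictionary<string, char> Entities =
        new Dictionary<string, char>
        {
            { "lt", '<' },
            { "gt", '>' },
            { "amp", '&' },
            { "quot", '"' },
            { "nbsp", ' ' }   // simplification: decoded as an ordinary space
        };

    // Replaces known entities such as &lt; with the character they stand for.
    // Unknown or unterminated entities are copied through unchanged.
    public static string Decode(string text)
    {
        StringBuilder result = new StringBuilder();
        int i = 0;
        while (i < text.Length)
        {
            char c = text[i];
            if (c == '&')
            {
                int end = text.IndexOf(';', i + 1);
                char decoded;
                if (end > i + 1 &&
                    Entities.TryGetValue(text.Substring(i + 1, end - i - 1), out decoded))
                {
                    result.Append(decoded);
                    i = end + 1;   // continue after the closing ';'
                    continue;
                }
            }
            result.Append(c);
            i++;
        }
        return result.ToString();
    }
}
```

For example, EntityTable.Decode("1 &lt; 2") yields "1 < 2". Newer versions of .NET also ship System.Net.WebUtility.HtmlDecode, which covers the full entity set.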

		// Returns the text of the first <title> element, or "" if there is none.
		public string GetTitle(string Source)
		{
			int len=Source.Length;
			string window="      ";  // sliding window over the last 6 characters read
			string title="";
			for(int i=0;i<len;i++)
			{
				// Shift the window one character forward.
				window=window.Remove(0,1)+Source[i];
				if(window.ToLowerInvariant()=="<title")
				{
					// Skip any attributes up to the end of the opening tag.
					while(i<len&&Source[i]!='>')
						i++;
					i++;
					// Collect the title text up to the next tag or the end of the input.
					while(i<len&&Source[i]!='<')
					{
						title+=Source[i];
						i++;
					}
					break;
				}
			}
			return title.Trim();
		}

The code for extracting the text and the links is in the attached file.

Usage

To use this code, add the library to your project and create an instance of the parser class, for example Parser.Parser inst=new Parser.Parser(). Then call its methods on the HTML source of the page:

inst.GetTitle(page) to get the title

inst.GetText(page) to get the text

inst.MakeLinks(page) to build the links

After calling MakeLinks, the results are available in pLabel and pLink, which hold the label shown on the page for each link and the link itself.
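To make the label and link pairing concrete, here is a rough sketch that collects both into two parallel lists. It is not the MakeLinks code from the DLL, and the types of pLabel and pLink are not stated in this article: the class LinkSketch, its Labels and Links lists, and the Collect method are hypothetical stand-ins. The sketch only recognizes double-quoted href attributes and keeps any nested tags inside the label as they are.

```csharp
using System;
using System.Collections.Generic;

public class LinkSketch
{
    // Parallel lists: Labels[n] is the text shown for Links[n].
    public readonly List<string> Labels = new List<string>();
    public readonly List<string> Links = new List<string>();

    // Scans the HTML for <a ...>label</a> pairs and records each one.
    public void Collect(string html)
    {
        int i = 0;
        while (true)
        {
            int open = html.IndexOf("<a ", i, StringComparison.OrdinalIgnoreCase);
            if (open < 0) break;
            int tagEnd = html.IndexOf('>', open);
            if (tagEnd < 0) break;
            int close = html.IndexOf("</a", tagEnd, StringComparison.OrdinalIgnoreCase);
            if (close < 0) break;
            string href = ReadHref(html.Substring(open, tagEnd - open));
            if (href != null)
            {
                Links.Add(href);
                Labels.Add(html.Substring(tagEnd + 1, close - tagEnd - 1).Trim());
            }
            i = close;   // continue scanning after this anchor
        }
    }

    // Returns the value of a double-quoted href attribute, or null if absent.
    private static string ReadHref(string tag)
    {
        const string key = "href=\"";
        int at = tag.IndexOf(key, StringComparison.OrdinalIgnoreCase);
        if (at < 0) return null;
        int start = at + key.Length;
        int end = tag.IndexOf('"', start);
        if (end < 0) return null;
        return tag.Substring(start, end - start);
    }
}
```

After sketch.Collect(page), the two lists can be walked together by index to pair each label with its link.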

Resources

C# DLL in .NET 2005

Contact me

 

If there is a problem, please contact me at ahmed_a_e2006@yahoo.com.