I am scraping HTML DOM elements using HtmlAgilityPack in ASP.NET. Currently my code loads every href link, which means it also follows the sublinks of sublinks. I need only the URLs that the page at my domain URL links to directly. I don't know how to write the code for this. Can anyone help me do this?
Here is my code:
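public void GetURL(string strGetURL)
{
    var getHtmlSource = new HtmlWeb();
    try
    {
        HtmlDocument document = getHtmlSource.Load(strGetURL);
        // Select only anchors that actually carry an href attribute.
        var aTags = document.DocumentNode.SelectNodes("//a[@href]");
        if (aTags != null)
        {
            outputurl.Text = string.Empty;
            int _count = 0;
            foreach (var aTag in aTags)
            {
                string strURLTmp = aTag.GetAttributeValue("href", string.Empty);
                // The first anchor on the page is skipped, as in the original post.
                if (_count != 0 && !CheckDuplicate(strURLTmp))
                {
                    lstResults.Add(strURLTmp);
                    outputurl.Text += strURLTmp + "\n";
                    counter++;
                    // Recursing here follows every link found, including sublinks of sublinks.
                    GetURL(strURLTmp);
                }
                _count++;
            }
        }
    }
    catch (Exception)
    {
        // The original post was cut off before the catch block; the handler is not shown.
    }
}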
Comments
ZurdoDev 23-Sep-14 9:17am    
I don't understand.
Member 11091243 24-Sep-14 2:08am    
The issue was that I am trying to crawl only the URLs that a particular page links to, but my code loads all the URLs. E.g., if I give http://www.asp.com, it should load only the URLs that asp.com links to. If that page has a Facebook link, it should load facebook.com alone, but it also loads all the links inside Facebook.

But it is no longer a problem. I resolved the issue.

Thanks for your interest.
ZurdoDev 24-Sep-14 7:16am    
Glad to hear you worked it out. Please post something as a solution so this no longer shows unanswered.
Member 11091243 26-Sep-14 2:41am    
We can resolve this issue by using AbsoluteUri.
E.g.:
public static string GetAbsoluteURL(string strRelativeURL, string strbaseURL)
{
    // Resolve the (possibly relative) href against the URL of the page it came from.
    return new Uri(new Uri(strbaseURL), strRelativeURL).AbsoluteUri;
}
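
The post does not show how GetAbsoluteURL was wired into the crawl, so the following is only a sketch of one possible reading: resolve each href against the page URL, always record it, but report whether it stays on the same host so the caller can decide not to follow external links such as Facebook. The names LinkResolver and TryResolve are hypothetical and not part of the original code; only System.Uri from the .NET base library is used.

```csharp
using System;

public static class LinkResolver
{
    // Hypothetical helper (not from the original post).
    // Resolves href against baseUrl. Returns false for empty, malformed,
    // or non-http(s) links (mailto:, javascript:, ...).
    // sameHost tells the caller whether the link stays on the base page's host.
    public static bool TryResolve(string baseUrl, string href,
                                  out string absoluteUrl, out bool sameHost)
    {
        absoluteUrl = null;
        sameHost = false;

        if (string.IsNullOrWhiteSpace(href))
            return false;

        Uri baseUri;
        if (!Uri.TryCreate(baseUrl, UriKind.Absolute, out baseUri))
            return false;

        Uri resolved;
        if (!Uri.TryCreate(baseUri, href, out resolved))
            return false;

        bool isWeb = resolved.Scheme == Uri.UriSchemeHttp
                  || resolved.Scheme == Uri.UriSchemeHttps;
        if (!isWeb)
            return false;

        // Drop the #fragment so "page.html" and "page.html#top" count as one URL.
        absoluteUrl = resolved.GetLeftPart(UriPartial.Query);
        sameHost = string.Equals(resolved.Host, baseUri.Host,
                                 StringComparison.OrdinalIgnoreCase);
        return true;
    }
}
```

Inside the foreach loop of GetURL, a call such as LinkResolver.TryResolve(strGetURL, strURLTmp, out absolute, out sameHost) could replace the raw href: add absolute to lstResults whenever the call succeeds, and call GetURL(absolute) only when sameHost is true. Note that the comparison is on the exact host name, so www.asp.com and asp.com would be treated as different hosts; whether that is desirable depends on the site.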

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


