I am Scraping HTML DOM elements using HtmlAgilityPack in ASP.NET

Question

I am scraping HTML DOM elements using HtmlAgilityPack in ASP.NET. Currently my code loads all the href links, which means it also follows the sublinks of sublinks. But I need only the dependent URLs of my domain URL, that is, the links found on that page. I don't know how to write the code for it. Can anyone help me do this?
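Here is my code:
// outputurl, lstResults, counter and CheckDuplicate are defined elsewhere in the page class.
public void GetURL(string strGetURL)
{
    var getHtmlSource = new HtmlWeb();
    try
    {
        // Download and parse the page.
        var document = getHtmlSource.Load(strGetURL);
        // Select only anchors that have an href, so Attributes["href"] is never null.
        var aTags = document.DocumentNode.SelectNodes("//a[@href]");
        if (aTags != null)
        {
            outputurl.Text = string.Empty;
            int _count = 0;
            foreach (var aTag in aTags)
            {
                string strURLTmp = aTag.Attributes["href"].Value;
                // The first link on the page is skipped.
                if (_count != 0)
                {
                    if (!CheckDuplicate(strURLTmp))
                    {
                        lstResults.Add(strURLTmp);
                        outputurl.Text += strURLTmp + "\n";
                        counter++;
                        // Recurses into every new link, including links to other sites.
                        GetURL(strURLTmp);
                    }
                }
                _count++;
            }
        }
    }
    catch (Exception)
    {
        // The catch block is not shown in the original post.
    }
}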

Posted 17-Sep-14 21:48pm

Member 11091243

Comments

ZurdoDev 23-Sep-14 9:17am

I don't understand.

Member 11091243 24-Sep-14 2:08am

The issue was that I was trying to crawl only the dependent URLs of a particular page, but my code was loading all the URLs. E.g., if I give http://www.asp.com, it should load the dependent URLs of asp.com. If that page has a Facebook link, it should load facebook.com alone, but it was loading all the links inside Facebook as well.

But there is no problem now. I resolved the issue.

Thanks for your interest.
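
For readers with the same problem, one way to get the behaviour described above (list every link on the page, but follow only links on the starting site) is to compare hosts before recursing. This is a minimal sketch and not the poster's actual fix; LinkFilter and IsSameHost are hypothetical names. It compares Uri.Host exactly, so www.asp.com and asp.com count as different hosts.

```csharp
using System;

public static class LinkFilter
{
    // Hypothetical helper: returns true when href (absolute or relative)
    // resolves to an http/https URL on the same host as startUrl.
    public static bool IsSameHost(string startUrl, string href)
    {
        Uri start;
        Uri resolved;

        // The start URL must be a valid absolute URL.
        if (!Uri.TryCreate(startUrl, UriKind.Absolute, out start))
            return false;

        // Resolve href against the start URL; absolute hrefs pass through unchanged.
        if (!Uri.TryCreate(start, href, out resolved))
            return false;

        // Ignore mailto:, javascript: and other non-web schemes.
        if (resolved.Scheme != Uri.UriSchemeHttp && resolved.Scheme != Uri.UriSchemeHttps)
            return false;

        return string.Equals(start.Host, resolved.Host, StringComparison.OrdinalIgnoreCase);
    }
}
```

In a crawler like the one in the question, the recursive call would then be made only when IsSameHost(startUrl, href) is true, while external links such as the Facebook one are still added to the result list.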

ZurdoDev 24-Sep-14 7:16am

Glad to hear you worked it out. Please post something as a solution so this no longer shows unanswered.

Member 11091243 26-Sep-14 2:41am

We can resolve this issue by using Uri.AbsoluteUri.
E.g.:
public static string GetAbsoluteURL(string strRelativeURL, string strbaseURL)
{
    // Resolve the (possibly relative) link against the URL of the page it was found on.
    return new Uri(new Uri(strbaseURL), strRelativeURL).AbsoluteUri;
}
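
To illustrate what this helper returns, here is a small sketch that wraps the same method body in a hypothetical UrlHelper class and calls it with a few sample links. The example.com URLs and the Demo method are made up for illustration; note that the Uri constructor throws UriFormatException when the base URL is not a valid absolute URL.

```csharp
using System;

public static class UrlHelper
{
    // Same body as GetAbsoluteURL in the comment above.
    public static string GetAbsoluteURL(string strRelativeURL, string strbaseURL)
    {
        return new Uri(new Uri(strbaseURL), strRelativeURL).AbsoluteUri;
    }

    // Hypothetical demo with made-up URLs.
    public static void Demo()
    {
        const string page = "http://www.example.com/news/index.html";

        // Document-relative link: resolved against the page's folder.
        Console.WriteLine(GetAbsoluteURL("about.html", page));
        // prints http://www.example.com/news/about.html

        // Root-relative link: resolved against the site root.
        Console.WriteLine(GetAbsoluteURL("/contact", page));
        // prints http://www.example.com/contact

        // Already absolute link: returned as-is.
        Console.WriteLine(GetAbsoluteURL("http://www.facebook.com/page", page));
        // prints http://www.facebook.com/page
    }
}
```

Once every href is in absolute form, its host can be compared with the host of the starting URL to decide whether the crawler should follow it.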

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)