You are searching for the string
href="
That will not work for Google's pages.
It only finds links written like this:
<a href="http://www.gmail.com">Gmail</a>
A link like that will be found.
But Google also has a few links like this, with no quotes around the attribute value:
<a href=http://www.gmail.com>Gmail
Search for the string
href=
instead, and change your
crawlURL
method to this:
public void crawlURL(string URL, string depth)
{
    if (!checkPageHasBeenCrawled(URL))
    {
        PageContent = getURLContent(URL);
        // "href=" matches both quoted and unquoted attribute values.
        MatchCollection matches = Regex.Matches(PageContent, "href=", RegexOptions.IgnoreCase);
        int count = matches.Count;
    }
}
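To see the difference between the two search strings, here is a minimal sketch that counts both against the two example links from above. The class name HrefCountDemo and its Count helper are made up for this illustration and are not part of your crawler.

```csharp
using System.Text.RegularExpressions;

public static class HrefCountDemo
{
    // Hypothetical helper: counts case-insensitive matches of a pattern in html.
    public static int Count(string html, string pattern)
    {
        return Regex.Matches(html, pattern, RegexOptions.IgnoreCase).Count;
    }

    // The two link styles from above: quoted and unquoted attribute value.
    public const string SampleHtml =
        "<a href=\"http://www.gmail.com\">Gmail</a>\n" +
        "<a href=http://www.gmail.com>Gmail";
}
```

With this sample, HrefCountDemo.Count(HrefCountDemo.SampleHtml, "href=\"") returns 1, while HrefCountDemo.Count(HrefCountDemo.SampleHtml, "href=") returns 2, because only the first link has a quote after the equals sign.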
[EDIT]
But why search for an attribute at all?
If you search for the opening
a
tag instead, you will find the links as well.
So change the method to this:
public void crawlURL(string URL, string depth)
{
    if (!checkPageHasBeenCrawled(URL))
    {
        PageContent = getURLContent(URL);
        // \b keeps "<a" from also matching tags such as <abbr> or <article>.
        MatchCollection matches = Regex.Matches(PageContent, @"<a\b", RegexOptions.IgnoreCase);
        int count = matches.Count;
    }
}
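Counting matches only tells you how many links there are; a crawler will probably want the URLs themselves. The following is a rough sketch of one way to pull them out, assuming a regular expression is good enough for your pages. LinkExtractor and ExtractLinks are hypothetical names, not something from your code, and the pattern is my own: it accepts double-quoted, single-quoted and unquoted attribute values.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Matches an opening <a ...> tag and captures its href value.
    // Group 1: double-quoted value, group 2: single-quoted, group 3: unquoted.
    private static readonly Regex AnchorHref = new Regex(
        @"<a\b[^>]*?\shref\s*=\s*(?:""([^""]*)""|'([^']*)'|([^\s>]+))",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns every href value found in <a> tags, in document order.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in AnchorHref.Matches(html))
        {
            // Exactly one of the three groups takes part in each match.
            string url = m.Groups[1].Success ? m.Groups[1].Value
                       : m.Groups[2].Success ? m.Groups[2].Value
                       : m.Groups[3].Value;
            links.Add(url);
        }
        return links;
    }
}
```

Inside crawlURL you could then call LinkExtractor.ExtractLinks(PageContent) and loop over the result. A regular expression will still miss unusual markup, so treat this as an approximation; a real HTML parser would be more robust.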
Hope this helps.