How do I get a list of all the web pages on a website in VB.NET? I have tried lots of things, but each one returned an error. I discarded all of them, which I now regret.

Please help.

P.S. I do not want to make an XML sitemap.
Comments
ZurdoDev 19-Mar-15 8:04am    
What for?

You'd have to write a program like the spider search bots that search engines use. Not easy to do.
iProgramIt 19-Mar-15 8:07am    
I know. I just want to do it for a school project.
ZurdoDev 19-Mar-15 8:11am    
That's quite ambitious. First you have to connect to the website's root address, without any page name (for example, codeproject.com), and then parse all the HTML you get back. You have to find every link in that HTML and determine whether it points within that site. If it does, you then have to write code to go to that URL and do the same thing recursively until you have found every link on the site.

That still doesn't guarantee that you have found every page; there is no way to guarantee that.
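To make the steps above concrete, here is a minimal sketch of such a spider. It is written in Python only for brevity; the same structure carries over to VB.NET. It is an illustration under stated assumptions, not a finished tool: the names `LinkExtractor`, `fetch_html` and `crawl` are hypothetical, only `<a href="...">` links are followed, pages are assumed to decode as UTF-8, and links generated by JavaScript are not seen. It uses a queue instead of literal recursion and caps the number of pages, which also keeps memory use bounded.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url):
    """Download one page as text (assumption: UTF-8, bad bytes replaced)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=500):
    """Visit pages reachable from start_url on the same host, breadth-first.

    Returns the sorted list of URLs visited (including any that failed to load).
    The queue replaces recursion; max_pages stops runaway crawls.
    """
    host = urlparse(start_url).netloc
    visited = set()
    queue = deque([start_url])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load instead of aborting
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" so duplicates collapse.
            absolute, _ = urldefrag(urljoin(url, href))
            parsed = urlparse(absolute)
            # Keep only http(s) links that stay on the same host.
            if parsed.scheme in ("http", "https") and parsed.netloc == host:
                if absolute not in visited:
                    queue.append(absolute)
    return sorted(visited)
```

For example, `crawl("https://www.example.com/")` would return the URLs it reached on that host. Passing `fetch` as a parameter makes it easy to test the link-following logic without touching the network.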
iProgramIt 19-Mar-15 8:19am    
Hmm, yes. That would also take up most of my computer's memory. Could I generate a sitemap and strip out all of the XML parts?
ZurdoDev 19-Mar-15 8:23am    
A sitemap is for when you have the website, when you already know all of the pages.

A spider app is when you are trying to get a list of all pages from a site you don't have the code for.

I'm still not sure what you are wanting.

1 solution

It depends on the site...
There is a sitemap protocol[^]; if the site supports it, it lets you 'crawl' the site and get information about all of its pages.
In any case, writing such a crawler is not that simple, so you had better search for a ready-made solution (there are free and open-source ones too, so don't worry about paying for it :-))
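If the site does publish a sitemap (often at `/sitemap.xml`, or listed on a `Sitemap:` line in its robots.txt), reading it is much simpler than crawling. Below is a minimal Python sketch, assuming the standard sitemaps.org XML namespace and uncompressed files (gzipped `.xml.gz` sitemaps are not handled). The names `parse_sitemap`, `fetch_bytes` and `pages_from_sitemap` are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_data):
    """Return (page_urls, child_sitemap_urls) from one sitemap document."""
    root = ET.fromstring(xml_data)
    # <urlset><url><loc>...</loc></url></urlset> lists pages.
    pages = [e.text.strip() for e in root.findall("sm:url/sm:loc", SITEMAP_NS) if e.text]
    # <sitemapindex><sitemap><loc>...</loc></sitemap></sitemapindex> lists more sitemaps.
    children = [e.text.strip() for e in root.findall("sm:sitemap/sm:loc", SITEMAP_NS) if e.text]
    return pages, children


def fetch_bytes(url):
    """Download a sitemap as raw bytes (assumption: not gzip-compressed)."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def pages_from_sitemap(sitemap_url, fetch=fetch_bytes):
    """Collect every page URL, following sitemap index files to their children."""
    pages, pending, seen = [], [sitemap_url], set()
    while pending:
        url = pending.pop()
        if url in seen:
            continue  # guard against index files that reference each other
        seen.add(url)
        found_pages, children = parse_sitemap(fetch(url))
        pages.extend(found_pages)
        pending.extend(children)
    return pages
```

This only works when the site owner has published a sitemap, and the sitemap lists whatever the owner chose to include, so it is not guaranteed to be complete either.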
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


