It depends on the site...
There is a sitemap protocol; if the site supports it, you can use it to 'crawl' the site and get information about all of its pages...
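To give a rough idea of what reading a sitemap involves, here is a minimal Python sketch. It assumes the site publishes a plain sitemap XML file in the standard sitemaps.org 0.9 format; the function names `parse_sitemap` and `fetch_sitemap` and the sample URLs are made up for illustration, and it does not handle sitemap index files or gzip-compressed sitemaps.

```python
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by the sitemap protocol (sitemaps.org, schema 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_data):
    """Return the page URLs found in the <url><loc> entries of a sitemap."""
    root = ET.fromstring(xml_data)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def fetch_sitemap(url):
    """Download the raw sitemap bytes (hypothetical helper, no retries)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


# Small inline sample so the parsing step can be tried without a network call.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

pages = parse_sitemap(SAMPLE)
```

For a real site you would call something like `parse_sitemap(fetch_sitemap("https://example.com/sitemap.xml"))`, where the address is only an assumption: sites are free to put the sitemap elsewhere, and many list its location in robots.txt.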
In any case, writing such a crawler is not that simple, so you had better look for a ready-made solution (there are free and open source ones too, so do not worry about having to pay :-))