How can I retrieve a complete web site using Java so that I can use it in offline mode? I do not want to use any existing tool, only a Java program.

1 solution

You need to learn the techniques of Web scraping: http://en.wikipedia.org/wiki/Web_scraping.

To download the content of each page, you can use the class HttpURLConnection:
http://docs.oracle.com/javase/7/docs/api/java/net/HttpURLConnection.html.
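A minimal sketch of downloading one page with HttpURLConnection could look like this. The class name PageFetcher, the helper readAll, the 10-second timeouts, the UTF-8 decoding and the address http://example.com/ are all assumptions made for illustration; a real page may declare a different character set, and redirects or error pages may need extra handling.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class PageFetcher {

    // Reads the stream to its end and decodes the bytes as UTF-8.
    // UTF-8 is an assumption; real pages may use another charset.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Sends a GET request and returns the response body as text.
    static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(10_000); // milliseconds, arbitrary choice
        conn.setReadTimeout(10_000);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " for " + address);
            }
            try (InputStream in = conn.getInputStream()) {
                return readAll(in);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder address; replace it with the site you want to save.
        String html = fetch("http://example.com/");
        System.out.println(html.length() + " characters fetched");
    }
}
```

For offline use, the returned text would then be written to a local file, one file per downloaded page.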

If your code runs on Google App Engine, there is also a Google option, the class com.google.appengine.api.urlfetch.HTTPRequest:
https://developers.google.com/appengine/docs/java/javadoc/com/google/appengine/api/urlfetch/HTTPRequest.

After you get the content of a page, you will most likely need to collect some or all of the links on it and follow them to download further pages. For that you need a suitable HTML parser; there are many available, so you can pick whichever you prefer.

Just a couple of options you can consider:
http://htmlcleaner.sourceforge.net/,
http://htmlparser.sourceforge.net/.
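To illustrate the link-collecting step without adding a dependency, here is a rough sketch that uses the HTML parser bundled with the JDK (javax.swing.text.html) rather than either of the libraries above. That parser is old and targets HTML 3.2, so treat this as a starting point only. The class name LinkExtractor and the method extractLinks are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of any library.
class LinkExtractor {

    // Returns the href of every <a> tag in the page, resolved against
    // the address the page was downloaded from, in document order.
    static Set<String> extractLinks(String html, String baseAddress) throws IOException {
        final URI base = URI.create(baseAddress);
        final Set<String> links = new LinkedHashSet<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag != HTML.Tag.A) {
                    return;
                }
                Object href = attrs.getAttribute(HTML.Attribute.HREF);
                if (href == null) {
                    return;
                }
                try {
                    // Turn relative links into absolute ones.
                    links.add(base.resolve(href.toString().trim()).toString());
                } catch (IllegalArgumentException e) {
                    // Skip links that are not valid URIs.
                }
            }
        };

        // The last argument tells the parser to ignore charset declarations,
        // since the input is already decoded text.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return links;
    }
}
```

A crawler would keep a queue of addresses still to visit and a set of addresses already visited, call extractLinks on every downloaded page, and enqueue only the links that belong to the same site and have not been seen yet.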

—SA
   

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)