How can I retrieve a complete website using Java so that I can browse it in offline mode? I don't want to use any external tool, only a Java program.

1 solution

You need to apply web scraping techniques: http://en.wikipedia.org/wiki/Web_scraping.

To download the pages, you can use the class HttpURLConnection:
http://docs.oracle.com/javase/7/docs/api/java/net/HttpURLConnection.html.
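A minimal sketch of downloading a single URL with HttpURLConnection (uses InputStream.readAllBytes, so Java 9+). The class name PageFetcher, the timeouts and the User-Agent string are illustrative choices, not anything the API requires. Decoding the body as UTF-8 is an assumption; a real page may declare another charset in its Content-Type header or a meta tag. Note that HttpURLConnection only follows redirects within the same protocol, so an http to https redirect comes back as a non-200 status here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class PageFetcher {

    /** Downloads the body of a URL as raw bytes, so it also works for images, CSS, etc. */
    public static byte[] fetchBytes(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(10_000); // milliseconds
        conn.setReadTimeout(10_000);
        // Follows redirects, but only within the same protocol (http -> https is not followed).
        conn.setInstanceFollowRedirects(true);
        // Some servers reject requests without a User-Agent; this value is arbitrary.
        conn.setRequestProperty("User-Agent", "SimpleJavaMirror/0.1");
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " for " + address);
            }
            try (InputStream in = conn.getInputStream()) {
                return in.readAllBytes(); // Java 9+
            }
        } finally {
            conn.disconnect();
        }
    }

    /** Decodes the body as UTF-8, which is an assumption; check the declared charset in real code. */
    public static String fetchText(String address) throws IOException {
        return new String(fetchBytes(address), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String url = args.length > 0 ? args[0] : "http://example.com/";
        String html = fetchText(url);
        System.out.println(html.length() + " characters downloaded from " + url);
    }
}
```

Keeping fetchBytes separate matters for an offline copy: images and other binary resources should be written to disk as bytes, and only HTML pages need to be decoded to text and scanned for links.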

If your application runs on Google App Engine, there is also a Google option, the class com.google.appengine.api.urlfetch.HTTPRequest:
https://developers.google.com/appengine/docs/java/javadoc/com/google/appengine/api/urlfetch/HTTPRequest.

After you get the content, you will most likely need to collect some or all of the links on the page (anchors, images, style sheets, scripts) and use them to scrape further pages. For that you should use an appropriate HTML parser; you can easily find one yourself.
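The "collect the links and keep scraping" step is essentially a breadth-first crawl. Below is a sketch under a few assumptions: the crawl stays on the start URL's host, stops after maxPages pages, and takes the download step as a pluggable Fetcher (a hypothetical interface, so it can be backed by HttpURLConnection or by a fake in tests). The regex used for links is only a crude stand-in for the HTML parser recommended here; it is not how a real parser works and misses many cases.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SiteCrawler {

    /** Hypothetical hook: anything that turns a URL into page text. */
    public interface Fetcher {
        String fetch(String url) throws Exception;
    }

    // Crude stand-in for an HTML parser: captures quoted href="..." / src="..." values,
    // stopping at '#' so fragments are ignored. Misses unquoted attributes, srcset,
    // CSS url(...) references and links generated by JavaScript.
    private static final Pattern LINK =
            Pattern.compile("(?i)\\b(?:href|src)\\s*=\\s*[\"']([^\"'#]+)");

    static List<URI> extractLinks(URI base, String html) {
        List<URI> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            try {
                links.add(base.resolve(m.group(1).trim())); // relative -> absolute
            } catch (IllegalArgumentException notAUri) {
                // e.g. an href containing spaces; skip it
            }
        }
        return links;
    }

    /**
     * Breadth-first crawl restricted to the start URL's host; returns url -> page text.
     * Give the start URL a path (e.g. "http://host/"): URI.resolve does not insert the
     * missing slash when the base URI has an empty path.
     */
    public static Map<String, String> crawl(String startUrl, Fetcher fetcher, int maxPages) {
        URI start = URI.create(startUrl);
        Map<String, String> pages = new LinkedHashMap<>();
        Deque<URI> queue = new ArrayDeque<>();
        Set<URI> seen = new HashSet<>();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty() && pages.size() < maxPages) {
            URI url = queue.poll();
            String html;
            try {
                html = fetcher.fetch(url.toString());
            } catch (Exception e) {
                System.err.println("skipping " + url + ": " + e.getMessage());
                continue;
            }
            pages.put(url.toString(), html);
            for (URI link : extractLinks(url, html)) {
                boolean web = "http".equals(link.getScheme()) || "https".equals(link.getScheme());
                boolean sameHost = start.getHost() != null
                        && start.getHost().equalsIgnoreCase(link.getHost());
                if (web && sameHost && seen.add(link)) {
                    queue.add(link); // each URL is queued at most once
                }
            }
        }
        return pages;
    }
}
```

For a real offline copy you would still need to save each result under a local folder that mirrors the URL path, store binary resources as bytes rather than text, and rewrite absolute links in the saved HTML to relative local paths; none of that is shown in this sketch.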

Just a couple of options you can consider:
HtmlCleaner, http://htmlcleaner.sourceforge.net/,
HTML Parser, http://htmlparser.sourceforge.net/.
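If you want to stay strictly within the JDK, as the question asks, Swing ships an old HTML 3.2 parser in javax.swing.text.html.parser. The sketch below uses it to collect href and src targets as absolute URLs. It is lenient with broken markup but is not an HTML5 parser, so treat it as a fallback rather than a replacement for the libraries above; the class name JdkLinkExtractor is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class JdkLinkExtractor {

    private static final HTML.Attribute[] LINK_ATTRIBUTES = {HTML.Attribute.HREF, HTML.Attribute.SRC};

    /** Returns the href/src targets in the HTML as absolute URLs without "#fragment". */
    public static List<String> extractLinks(String baseUrl, String html) throws IOException {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override // tags with content, e.g. <a ...>...</a>, <script src=...>
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                collect(attrs);
            }

            @Override // empty or unknown tags, e.g. <img>, <link>
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                collect(attrs);
            }

            private void collect(MutableAttributeSet attrs) {
                for (HTML.Attribute name : LINK_ATTRIBUTES) {
                    Object value = attrs.getAttribute(name);
                    if (!(value instanceof String) || ((String) value).isBlank()) {
                        continue;
                    }
                    try {
                        String absolute = base.resolve(((String) value).trim()).toString();
                        int hash = absolute.indexOf('#');
                        links.add(hash >= 0 ? absolute.substring(0, hash) : absolute);
                    } catch (IllegalArgumentException notAUri) {
                        // skip attribute values that are not valid URIs
                    }
                }
            }
        };

        // true = ignore <meta> charset declarations; the input is already a decoded String
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return links;
    }
}
```

Usage: JdkLinkExtractor.extractLinks("http://example.com/dir/index.html", html) returns the page's link targets in document order, ready to be queued for the next round of downloads.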

—SA
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900