I'm looking at doing some VERY simple webpage parsing. My plan is to turn either the source or the output of the webpage into a text file, and then parse that. However, I do not know how to get either a page's source or its output into a text file.
Any hints or better (as in simpler) methods would be much appreciated.
If you have a problem with my spelling, just remember that's not my fault. I (as well as everyone else who learned to spell after 1976) blame it on Robert A. Kolpek for U.S. Patent 4,136,395.
I'm not sure I understand...
htm, html, dhtml files and so on are plain text files!
For example, you can do this simple thing: save this page (this one or another if you prefer) as an HTML file...
Then browse your hard disk to the recently saved file. Right-click on the file and open it with Notepad... What do you see? Binary? No, of course not.
You can feed an .htm file to your parser directly.
If you really need a .txt file, you can simply change the extension (*.htm -> *.txt) or append the txt extension to the file name (*.htm -> *.htm.txt). Whatever you want...
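To illustrate the point that a saved .htm file can go straight into a parser, here is a minimal sketch in Python using only the standard library. The file name, the LinkCollector class and the links_in_saved_page function are made up for this example, and collecting links is just one guess at what "very simple parsing" might mean.

```python
from html.parser import HTMLParser
from pathlib import Path


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_in_saved_page(path):
    """Read a saved page as plain text and return the links found in it."""
    # Assumption: the file is UTF-8; undecodable bytes are replaced, not fatal.
    html = Path(path).read_text(encoding="utf-8", errors="replace")
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

Calling links_in_saved_page("saved_page.htm") on a file saved from the browser would return the link targets as a list of strings. No renaming to .txt is needed, since the file is read as text either way.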
1) Open web page in Internet Explorer
2) From the menu select File/Save As...
3) In the "Save Web Page" dialog set "Save as type" to "Text File (*.txt)"
4) Give the file a name and a location and click the "Save" button
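If you would rather not click through the browser each time, the steps above can be approximated in a script. This is only a rough sketch in Python's standard library: save_page_source writes the raw source to a file, and page_text gives a crude version of the "output" by dropping tags, scripts and styles. All names here are invented for the example, and the result will not match Internet Explorer's text export exactly.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def save_page_source(url, out_path):
    """Download the raw HTML of `url` and write it to `out_path` as text."""
    with urlopen(url) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(html)
    return html


class TextExtractor(HTMLParser):
    """Keeps the text between tags, ignoring <script> and <style> bodies."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Drop whitespace-only runs so the output has no blank lines.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def page_text(html):
    """Return the text content of an HTML string, one chunk per line."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return "\n".join(extractor.chunks)
```

A typical use would be html = save_page_source(some_url, "page.txt") followed by page_text(html), where some_url stands for whatever page you want to parse.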
"No matter where you go, there your are." - Buckaroo Banzai