I downloaded the HTML source of a page with WebClient.DownloadString(url) and saved it to a .txt file. Until now I have analysed such files one line at a time in a loop. This time, though, the HTML is a bit messy (irregular newlines and spaces). When I view the source of that site in Chrome, Chrome formats it for me and it is easy to read. The .txt file is not formatted, so I'm having a hard time analysing it.

Normally I read a line, split it, and then look for what I need. Now I can't keep track of things, because the lines are irregular and I can't guess what a given line contains.

Any ideas?
Thanks.



Let me rephrase the question.


What I want to do is extract information such as image links, image categories, and subcategories from a wallpaper website. I need to find links inside specific tags with specific classes in the HTML. So far I've been using string matching. Is there a way to navigate from tag to tag, through child tags and parent tags, like using the DOM in JavaScript?
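
As a rough illustration of the tag-to-tag navigation asked about here, the following Python sketch builds a tiny DOM-like tree with the standard-library html.parser module and then walks from each link up to its parent tags. The sample markup, the class names category, subcategory and wallpaper, the data-name attribute, and the helper names Node, TreeBuilder, closest and wallpaper_links are all assumptions made for the example, not taken from the real site. It only shows the idea; a dedicated HTML parsing library would normally do this work.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they are not kept open.
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}


class Node:
    """One element in a very small DOM-like tree."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []

    def has_class(self, name):
        return name in (self.attrs.get("class") or "").split()

    def descendants(self):
        """Yield every element below this node, depth first."""
        for child in self.children:
            yield child
            yield from child.descendants()


class TreeBuilder(HTMLParser):
    """Turns start/end tag events into linked Node objects."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, parent=self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, then step out of it.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def closest(node, class_name):
    """Return the nearest ancestor carrying class_name, or None."""
    node = node.parent
    while node is not None and not node.has_class(class_name):
        node = node.parent
    return node


def wallpaper_links(root):
    """Collect (category, subcategory, href) for each a.wallpaper element."""
    rows = []
    for node in root.descendants():
        if node.tag == "a" and node.has_class("wallpaper"):
            cat = closest(node, "category")
            sub = closest(node, "subcategory")
            rows.append((
                cat.attrs.get("data-name") if cat else None,
                sub.attrs.get("data-name") if sub else None,
                node.attrs.get("href"),
            ))
    return rows


# Hypothetical, deliberately badly formatted markup.
SAMPLE = """
<div class="category" data-name="Nature">
  <ul class="subcategory"
      data-name="Mountains"><li><a class="wallpaper"
   href="/img/alps.jpg"><img src="/thumb/alps.jpg"></a></li>
<li>
<a class="wallpaper" href="/img/andes.jpg">Andes</a></li></ul>
</div>
"""

for row in wallpaper_links(parse(SAMPLE)):
    print(row)
```

Because the tree is built from tags rather than from lines, the irregular line breaks in SAMPLE make no difference to the result.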
Updated 30-Nov-12 2:21am
v2
Comments
ZurdoDev 30-Nov-12 8:14am    
What specifically are you looking for? There may be a better way to do it.
thursunamy 30-Nov-12 8:24am    
Hi,

Look at HTMLAgilityPack.

Regards
Ravi Bhavnani 30-Nov-12 13:09pm    
HTMLAgilityPack is very cool. Also see my StringParser article. I wrote this class to do exactly what you want to do and have used it with much success in a couple of widely used products. http://www.codeproject.com/Articles/12708/StringParser

/ravi
Ravi Bhavnani 30-Nov-12 13:10pm    
You might also find this article helpful:

http://www.codeproject.com/Articles/12709/WebResourceProvider-goes-NET

/ravi
Sergey Alexandrovich Kryukov 30-Nov-12 15:18pm    
What is "source in chrome"? :-) Do you have a problem analyzing HTML, or only plain text?
--SA

Here is where the trouble lies: you formulated the difficulty of analyzing text files as a general problem, without regard to any detail of the file contents. But this general problem cannot have a general solution, simply because the notion of a "text file" is nothing definite. A text file can be, well… anything. After all, HTML and XML files are text files too, but you seemingly don't have problems with them.
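
To make the point about structure concrete, here is a small Python sketch (standard library only) showing that a parser working on tags rather than on lines gives the same answer for neatly formatted and badly formatted HTML. The markup in TIDY and MESSY and the names ImageCollector and image_sources are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_sources(html):
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


# The same two images, once laid out neatly and once with irregular breaks.
TIDY = '<div>\n  <img src="a.jpg">\n  <img src="b.jpg">\n</div>\n'
MESSY = '<div><img\n\n   src="a.jpg"  ><img alt="x"\nsrc="b.jpg"></div>'

print(image_sources(TIDY) == image_sources(MESSY))
```

A line-by-line scan would have to special-case the tag that is split across lines in MESSY, while the tag-based scan never looks at line boundaries at all.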

—SA
 
 
It would be better to load the page with AJAX via jQuery and then look for what you want in it. I have never tried this, but I expect it to be faster and to work better.
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


