I have downloaded HTML source via WebClient.DownloadString(url) and saved it to .txt files. Until now I analyzed such files one line at a time in a loop. But this time the HTML is a bit messy (irregular newlines and spaces). When I view the source of that site in Chrome, Chrome formats it for me and it's easy to read. The .txt file is not formatted, so I'm having a hard time analyzing it.

Normally I read a line, split it, and then look for what I need. But now I can't keep track of things, because the lines are irregular and I can't predict what a given line contains.

Any ideas?

Let me rephrase the question.

I want to extract information such as image links, image categories, and subcategories from a wallpaper website. I need to find links inside specific tags with specific classes in the HTML. So far I've been using string matching. Is there a way to navigate from tag to tag, to child tags and parent tags, like using the DOM in JavaScript?
Posted 30-Nov-12 1:45am
Updated 30-Nov-12 2:21am
ryanb31 30-Nov-12 8:14am
What specifically are you looking for? There may be a better way to do it.
thursunamy 30-Nov-12 8:24am

Look at HTMLAgilityPack.
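
To illustrate why a parser helps here, below is a minimal sketch of tag-to-tag navigation. It is written in Python with only the standard library's html.parser so that it runs without extra packages; it is not HTMLAgilityPack code, though a DOM-style parser for .NET gives you the same kind of parent/child navigation. The markup, the class names "category" and "thumb", and the data-name attribute are made up for the example, and the Node, TreeBuilder, and extract_images names are hypothetical helpers, not part of any library.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """One element in a very small DOM-like tree (hypothetical helper)."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []

    def classes(self):
        return (self.attrs.get("class") or "").split()

    def find_all(self, tag, cls=None):
        """Yield descendants with the given tag and, optionally, class."""
        for child in self.children:
            if child.tag == tag and (cls is None or cls in child.classes()):
                yield child
            yield from child.find_all(tag, cls)


class TreeBuilder(HTMLParser):
    """Builds Node objects from parser events; text nodes are ignored."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the new element

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <img ... />: add it, do not descend.
        self.current.children.append(Node(tag, attrs, self.current))

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this name.
        # Stray closing tags with no matching open element are ignored.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent


def extract_images(html):
    """Return (category, image URL) pairs for images inside li.thumb."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    results = []
    for item in builder.root.find_all("li", "thumb"):
        # Walk up through the parents to the enclosing category block.
        ancestor = item.parent
        while ancestor is not None and "category" not in ancestor.classes():
            ancestor = ancestor.parent
        category = ancestor.attrs.get("data-name") if ancestor else None
        for img in item.find_all("img"):
            results.append((category, img.attrs.get("src")))
    return results


# Deliberately irregular line breaks and spacing, as in the question.
MESSY = """<div class="category"
   data-name="Nature"><ul><li class="thumb">
<a href="/w/1"><img
 src="/img/forest.jpg"></a></li><li class="thumb"><a href="/w/2">
<img src="/img/lake.jpg"   ></a>
</li></ul></div><div class="category" data-name="Cars"><img class="banner"
src="/ads/x.png"><ul><li class="thumb"><img src="/img/coupe.jpg"></li></ul></div>"""

if __name__ == "__main__":
    for category, src in extract_images(MESSY):
        print(category, src)
```

Because the parser works on tags rather than lines, it makes no difference where the newlines fall, and the banner image is skipped simply because it is not inside an element with the wanted class. Real pages with badly nested tags may need a more forgiving tree builder than this sketch.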

Ravi Bhavnani 30-Nov-12 13:09pm
HTMLAgilityPack is very cool. Also see my StringParser article - I wrote this class to do exactly what you want to do and have used it with much success in a couple of widely used products.

Ravi Bhavnani 30-Nov-12 13:10pm
You might also find this article helpful:

Sergey Alexandrovich Kryukov 30-Nov-12 15:18pm
What is "source in Chrome"? :-) Do you have a problem analyzing HTML, or only text?

Solution 1

Here is where the trouble lies: you formulated the problem as a general difficulty in analyzing text files, without regard to any detail of the file contents. But this general problem cannot have a general solution, simply because the notion of a "text file" is not anything definite. Text files can be, well… anything. After all, HTML and XML files are text files too, but you seemingly don't have problems with them.


Solution 2

It may be better to load the page using AJAX with jQuery and then locate what you want. I have never tried it, but I hope it will work faster and better.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
