See more: C# HTML
I downloaded the HTML source with WebClient.DownloadString(url) and saved it to a .txt file. Previously I analyzed such code one line at a time using loops. But this time the HTML is a bit messy (irregular newlines and spaces). When I view the source of that site in Chrome, Chrome formats it for me and it is easy to read. But the .txt file is not formatted, so I'm having a hard time analyzing it.
 
Normally I read a line, split it, and then look for things. But now I can't keep track, because the lines are irregular and I can't guess what a given line contains.
 
Any ideas?
Thanks.
 

 
Let me rephrase the question.
 

What I want is to extract information such as image links, image category, and subcategory from a wallpaper website. I need to look for links within specific tags with specific classes in the HTML code. So far I've been using string matching. But is there a way to move from tag to tag, to child tags and parent tags, like using the DOM in JavaScript?
Posted 30-Nov-12 1:45am
Edited 30-Nov-12 2:21am
v2
Comments
ryanb31 at 30-Nov-12 8:14am
   
What specifically are you looking for? There may be a better way to do it.
thursunamy at 30-Nov-12 8:24am
   
Hi,
 
Look at HTMLAgilityPack.
 
Regards
Ravi Bhavnani at 30-Nov-12 13:09pm
   
HTMLAgilityPack is very cool. Also see my StringParser article - I wrote this class to do exactly what you want to do and have used it with much success in a couple of widely used products. http://www.codeproject.com/Articles/12708/StringParser
 
/ravi
Ravi Bhavnani at 30-Nov-12 13:10pm
   
You might also find this article helpful:
 
http://www.codeproject.com/Articles/12709/WebResourceProvider-goes-NET
 
/ravi
Sergey Alexandrovich Kryukov at 30-Nov-12 15:18pm
   
What is "source in chrome"? :-) Do you have a problem analyzing HTML, or only text?
--SA
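
As a rough illustration of the HTMLAgilityPack suggestion in the comments, here is a minimal C# sketch of moving between tags, child tags, and parent tags instead of splitting lines. It assumes the third-party HtmlAgilityPack NuGet package is installed. The markup shape, the class names "thumb" and "category", the "data-name" attribute, and the WallpaperLinks helper are all hypothetical placeholders; the real site's markup will differ.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package, not part of .NET itself

public static class WallpaperLinks
{
    // Returns (category, image URL) pairs found in the given HTML.
    // Class and attribute names below are hypothetical; adjust to the real page.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var results = new List<KeyValuePair<string, string>>();

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parses into a tree, so line breaks and spacing do not matter

        // XPath: every <a class="thumb"> anywhere in the document.
        // SelectNodes may return null when nothing matches, so check before looping.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@class='thumb']");
        if (anchors == null)
            return results;

        foreach (HtmlNode a in anchors)
        {
            // Child tag: the <img> directly inside the link.
            HtmlNode img = a.SelectSingleNode("./img");
            if (img == null)
                continue;
            string src = img.GetAttributeValue("src", "");

            // Parent tags: walk upward until a <div class="category"> is found.
            HtmlNode parent = a.ParentNode;
            while (parent != null &&
                   !(parent.Name == "div" &&
                     parent.GetAttributeValue("class", "") == "category"))
            {
                parent = parent.ParentNode;
            }

            string category = parent != null
                ? parent.GetAttributeValue("data-name", "")
                : "";
            results.Add(new KeyValuePair<string, string>(category, src));
        }

        return results;
    }
}
```

The string returned by WebClient.DownloadString(url) could be passed straight to WallpaperLinks.Extract, without saving it to a .txt file first. Because the parser builds a tree, the irregular newlines and spaces in the raw source should not affect the result.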

Solution 1

Here is where the trouble lies: you formulated the problem as a general difficulty of analyzing text files, without regard to any detail of the file contents. But this general problem cannot have a general solution, simply because the notion of "text file" is nothing definite. Such files can be, well… anything. After all, HTML and XML files are text files, too, but you seemingly don't have problems with them.
 
—SA

Solution 2

It may be better to load the page using AJAX with jQuery and then pick out what you want. I have never tried it, but I hope it will work faster and better.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 30 Nov 2012
Copyright © CodeProject, 1999-2014