See more: C# HTML LINQ
Hi people,
 

I'm building a news provider and I need to fetch the HTML code of a website, save it, and find text in it with a LINQ expression.
I hope some of you can help me with this hard task.
 
I'm using this code to get the web page source:
 
// Requires: using System.IO; using System.Net; using System.Text;
public static string code(string Url)
{
    HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(Url);
    myRequest.Method = "GET";
    // The using blocks close the response and the reader even if reading throws
    using (WebResponse myResponse = myRequest.GetResponse())
    using (StreamReader sr = new StreamReader(myResponse.GetResponseStream(), Encoding.UTF8))
    {
        return sr.ReadToEnd();
    }
}
 
 

Now I want to find text in a div of the web page source,
but I don't know how to do it.
Posted 1-Jun-11 0:49am
Jpmcm13379
Edited 1-Jun-11 4:41am
v2
Comments
digital man at 1-Jun-11 6:04am
   
What have you already tried? What research have you carried out on your own? No one will provide a complete solution: we will answer specific questions about a specific problem having seen that you have already tried, hard, to find the answer for yourself.
ali yeganeh at 19-Jan-13 8:51am
   
StreamReader.ReadToEnd is very slow.
A browser loads the whole website, images included, faster than this.
ali yeganeh at 19-Jan-13 13:17pm
   
I said this command takes 10 seconds:
string result = sr.ReadToEnd();
I must read 3 pages and this takes almost 30 seconds.

Solution 2

You are getting HTML code from a website. You can use code like this.
 
string urlAddress = "http://google.com";
string data = null; // declared outside the blocks so it is usable afterwards

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
  if (response.StatusCode == HttpStatusCode.OK)
  {
    // Fall back to UTF-8 (StreamReader's default) when the server reports no charset
    Encoding encoding = string.IsNullOrEmpty(response.CharacterSet)
      ? Encoding.UTF8
      : Encoding.GetEncoding(response.CharacterSet);
    using (StreamReader readStream = new StreamReader(response.GetResponseStream(), encoding))
    {
      data = readStream.ReadToEnd();
    }
  }
}
 
This will give you the HTML code returned by the website. But finding text via LINQ is not that easy.
Perhaps it is better to use a regular expression, but regular expressions do not play well with HTML code.
Even better is to get your news from an RSS feed.
Comments
SAKryukov at 1-Jun-11 15:22pm
   
Good code (however, OP is almost there) and advice, my 5.
 
Please see what I added to complement your solution.
--SA
the_h at 19-Aug-13 14:36pm
   
Thanks this is very helpful for me :)
Kim Togo at 2-Jun-11 9:30am
   
Thanks SA
Kim Togo at 20-Aug-13 2:17am
   
Glad you can use it :-)

Solution 4

In addition to what Kim suggested, I would advise some further steps.
 
If you use an RSS feed, chances are it is well-formed XML. Parse the XML in one of the following ways and locate the elements you need:
 
  1. Use the System.Xml.XmlDocument class. It implements the DOM interface; this way is the easiest and is good enough if the document is not too big.
    See http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.aspx.
  2. Use the class System.Xml.XmlTextReader; this is the fastest way of reading, especially if you need to skip some data.
    See http://msdn.microsoft.com/en-us/library/system.xml.xmlreader.aspx.
  3. Use the class System.Xml.Linq.XDocument; this is the most adequate way, similar to that of XmlDocument, and it supports LINQ to XML programming.
    See http://msdn.microsoft.com/en-us/library/system.xml.linq.xdocument.aspx, http://msdn.microsoft.com/en-us/library/bb387063.aspx.
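 
As a rough illustration of option 3, here is a minimal sketch of querying a feed with LINQ to XML. It assumes an RSS 2.0 feed whose item and title elements carry no XML namespace; the class name RssSketch, the method FindTitles and the keyword parameter are made up for this example and do not come from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class RssSketch
{
    // Returns the <title> text of every <item> whose title contains the keyword
    // (case-insensitive). Assumes plain RSS 2.0 with no XML namespace on the elements.
    public static List<string> FindTitles(string rssXml, string keyword)
    {
        // Throws XmlException if the input is not well-formed XML.
        XDocument doc = XDocument.Parse(rssXml);

        return doc.Descendants("item")
                  // The cast yields null when an item has no <title> child.
                  .Select(item => (string)item.Element("title"))
                  .Where(title => title != null &&
                                  title.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
                  .ToList();
    }
}
```

The rssXml string would be whatever the download code returns for the feed URL; RssSketch.FindTitles(rssXml, "linq") then gives back only the item titles that mention the keyword.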
 
If for some reason you need to find something in HTML that is not well-formed XML (which would be a pity), try to use an HTML parser.
 
For example, check the Majestic-12 open source HTML parser: http://www.majestic12.co.uk/projects/html_parser.php.
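 
When a page does happen to be well-formed XHTML, the XDocument approach can find the text of a div directly, which is what the original question asks for. The sketch below is only that: DivSketch, DivText and the cssClass parameter are invented names, and matching on the class attribute is an assumption about how the target div is identified. On typical real-world HTML (unclosed tags, undeclared entities such as &nbsp;) XDocument.Parse throws an XmlException, and an HTML parser is the way to go.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class DivSketch
{
    // Returns the trimmed text content of every <div> whose class attribute
    // equals cssClass. Works only on well-formed XML/XHTML input.
    public static List<string> DivText(string xhtml, string cssClass)
    {
        XDocument doc = XDocument.Parse(xhtml);

        return doc.Descendants()
                  // Compare local names so a declared XHTML namespace does not matter.
                  .Where(e => e.Name.LocalName == "div" &&
                              (string)e.Attribute("class") == cssClass)
                  // Value concatenates the text of the div and all of its descendants.
                  .Select(e => e.Value.Trim())
                  .ToList();
    }
}
```

Note that the comparison is against the whole class attribute, so a div with class="news big" would not match "news"; that keeps the sketch short.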
 
—SA
Comments
Kim Togo at 2-Jun-11 9:31am
   
Good advice SA. My 5.
SAKryukov at 2-Jun-11 13:12pm
   
Thank you, Kim.
--SA

Solution 8

Hi all, friends.
I have a final project for my university and I need to make a search engine.
I do not know how to get the title of a website's page into a single variable as a string. Please help me; it is important.
Comments
CHill60 at 16-Mar-14 14:48pm
   
If you have a question of your own then please post it. Relatively few people will see your post as you have posted it as a solution to an already answered (and old) question

Solution 9

See my WebResourceProvider framework, which was designed to do exactly the task you're doing.
 
/ravi

Solution 7

If for some reason you need to find something in HTML that is not well-formed XML (which would be a pity), try to use an HTML parser.

For example, check the Majestic-12 open source HTML parser.

Solution 6

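string htmlContent = new System.Net.WebClient().DownloadString(theUrl);
 
or you could use
 
System.Net.WebClient wc = new System.Net.WebClient();
wc.DownloadStringCompleted += (s, e) =>
{
    // e.Result throws if the download failed, so check e.Error first
    if (e.Error == null)
    {
        string htmlContent = e.Result;
    }
};
// DownloadStringAsync accepts only a Uri, so wrap the string URL
wc.DownloadStringAsync(new Uri(theUrl));
 
http://g4ac.co.za/WebClient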

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Web03 | 2.8.150327.1 | Last Updated 14 Mar 2014
Copyright © CodeProject, 1999-2015