I want to fetch all the content from a website (any website). I have the following code, but it does not yet get everything I need.

C#
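using System;
using System.IO;
using System.Net;
using System.Text;
using HAP = HtmlAgilityPack; // HtmlAgilityPack NuGet package, aliased as HAP

class Program
{
    static void Main()
    {
        // Download the page into a temporary file, then parse it.
        var filename = Path.GetTempFileName();
        try
        {
            using (var client = new WebClient())
            {
                client.DownloadFile("http://www.cnn.com", filename);
            }

            var doc = new HAP.HtmlDocument();
            doc.OptionDefaultStreamEncoding = Encoding.UTF8;
            doc.Load(filename);

            // Print the trimmed text of every <a> element on the page.
            foreach (var aNode in doc.DocumentNode.Descendants("a"))
            {
                Console.WriteLine(aNode.InnerText.Trim());
            }
        }
        finally
        {
            File.Delete(filename); // remove the temporary file
        }

        Console.ReadKey();
    }
}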


Right now I only read the <a> tags, but I am not sure how to get the data from all the other tags at once.

Any suggestions?
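Since the loop in the question only asks for "a" elements, one way to cover every tag is to walk the text nodes of the whole document instead of a particular element name. Below is a minimal sketch assuming the HtmlAgilityPack NuGet package (the HAP alias in the question's code); the VisibleText class and Extract method are hypothetical names, and skipping only script and style is a simplifying assumption about what counts as page content.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper: collects the text of every element, not just <a>.
// Assumes the HtmlAgilityPack NuGet package is referenced.
public static class VisibleText
{
    // Tags whose text is code or styling rather than readable content (assumption).
    private static readonly HashSet<string> Skipped =
        new HashSet<string> { "script", "style" };

    public static List<string> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants()                                  // every node, in document order
            .Where(n => n.NodeType == HtmlNodeType.Text)    // text nodes only, so comments are skipped
            .Where(n => n.ParentNode == null || !Skipped.Contains(n.ParentNode.Name))
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim()) // decode entities such as &amp;
            .Where(t => t.Length > 0)                       // drop whitespace-only nodes
            .ToList();
    }
}
```

You could pass it the string returned by WebClient.DownloadString instead of writing a temporary file. Note that this returns text only; attribute values such as href or alt are not included.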

Solution 1

See the SiteMapper Tool article, especially the worker thread TraverseWebSite_BW_DoWork in SiteMapper.cs.
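If "all the content" means every page of a site rather than every tag on one page, the general idea of a site traversal can be sketched as a breadth-first crawl that follows same-host links. This is not the SiteMapper code, only an illustration assuming HtmlAgilityPack; SiteCrawler, Crawl, the fetch delegate and maxPages are hypothetical names. Passing the downloader in as a delegate keeps the traversal logic testable without network access.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical breadth-first, same-host crawler (not the SiteMapper implementation).
public static class SiteCrawler
{
    // fetch: hypothetical hook that returns a page's HTML, or null if it cannot be fetched.
    public static Dictionary<Uri, string> Crawl(Uri start, Func<Uri, string> fetch, int maxPages)
    {
        var pages = new Dictionary<Uri, string>();
        var seen = new HashSet<Uri> { start };
        var queue = new Queue<Uri>();
        queue.Enqueue(start);

        while (queue.Count > 0 && pages.Count < maxPages)
        {
            var url = queue.Dequeue();
            string html = fetch(url);
            if (html == null) continue;            // fetch failed or page missing
            pages[url] = html;

            var doc = new HtmlDocument();
            doc.LoadHtml(html);
            foreach (var a in doc.DocumentNode.Descendants("a"))
            {
                string href = HtmlEntity.DeEntitize(a.GetAttributeValue("href", ""));
                // Resolve relative links against the current page.
                if (!Uri.TryCreate(url, href, out var next)) continue;
                // Follow only http(s) links on the starting host.
                if (next.Scheme != Uri.UriSchemeHttp && next.Scheme != Uri.UriSchemeHttps) continue;
                if (next.Host != start.Host) continue;
                // Drop any #fragment so the same page is not queued twice.
                next = new Uri(next.GetLeftPart(UriPartial.Query));
                if (seen.Add(next)) queue.Enqueue(next);
            }
        }
        return pages;
    }
}
```

For real pages, fetch could wrap WebClient.DownloadString or HttpClient.GetStringAsync and return null on errors. A real crawler should also respect robots.txt and limit its request rate.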

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
