I want to get all the content of a web page. My goal is to generate a report from that content.

Suppose I have a table in a web page. I want to get all of its HTML and CSS and then put it into Excel. I have already done this with the WebBrowser control in C# (.NET) for the case where the table data is constant. The problem is that WebBrowser doesn't support all of the CSS and jQuery functions I use, and my table data is not constant.

Is there any other mechanism? How can I get all the content of a web page, or save it and then read it back? After that I want to create an Excel file and fill it with that data, together with the CSS I got from the web page.

This is my sample code, but please suggest some other way.

C#
// URL of the page to load into the WebBrowser control
string input1 = "http://localhost:62343/login.html";
webBrowser1.Navigate(input1);



Sample Table:
HTML
<div style="position: absolute; left: 0px;" class="tab-1 st_view st_view_first st_view_active">
  <div style="position: relative; height:400px;" class="tabcontent">
    <table report_type="Horizontal Table" auto-aggregation="off" id="HorTable0" sort="asc" relativeid="HorTable1" top="50" bottom="" left="50" right="">
      <thead>
        <tr><td>Node Name</td><td>Hit</td><td>Duration</td></tr>
      </thead>
      <tbody>
        <tr>
          <td att="@VAL([NodeLogQuery1].[NodeName])" data_type="undefined"></td>
          <td att="@VAL([NodeLogQuery1].[Hit])" data_type="undefined"></td>
          <td att="@VAL([NodeLogQuery1].[Duration])" data_type="undefined"></td>
        </tr>
      </tbody>
    </table>
    <table report_type="Horizontal Table" auto-aggregation="off" id="HorTable1" sort="asc" relativeid="HorTable0" top="" bottom="10" left="" right="50">
      <thead>
        <tr><td>Node Name</td><td>Hit</td><td>Duration</td></tr>
      </thead>
      <tbody>
        <tr>
          <td att="@VAL([NodeLogQuery1].[NodeName])" data_type="undefined"></td>
          <td att="@VAL([NodeLogQuery1].[Hit])" data_type="undefined"></td>
          <td att="@VAL([NodeLogQuery1].[Duration])" data_type="undefined"></td>
        </tr>
      </tbody>
    </table>
  </div>
</div>
Updated 24-Nov-14 18:05pm

You can, indeed, as Afzaal shows you here, use WebClient to get the "raw text" of the page: the HTML as the server returns it, before any script in the page has run.

However, that "raw text" will probably pull in its CSS through links to external files. You will have to parse the "raw text" to discover those links, extract the file paths, and then use WebClient to get their contents, provided the contents are not compressed. I say "probably" because it's rare these days to see in-line style definitions in a base page.
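
A rough sketch of that parsing step, assuming the page links its stylesheets with ordinary <link rel="stylesheet" href="..."> tags and quoted attribute values. The class and method names (StylesheetLinks, Find) are made up for illustration, and a regular expression is only a stand-in for a real HTML parser:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of WebClient or any library.
public static class StylesheetLinks
{
    // Matches whole <link ...> tags.
    private static readonly Regex LinkTag =
        new Regex(@"<link\b[^>]*>", RegexOptions.IgnoreCase);

    // Captures the quoted value of the href attribute.
    private static readonly Regex HrefAttr =
        new Regex(@"\bhref\s*=\s*[""']([^""']+)[""']", RegexOptions.IgnoreCase);

    // Recognises rel="stylesheet" so that icons and other links are skipped.
    private static readonly Regex RelStylesheet =
        new Regex(@"\brel\s*=\s*[""']stylesheet[""']", RegexOptions.IgnoreCase);

    // Returns the absolute URL of every stylesheet linked from the page.
    public static List<Uri> Find(string html, Uri pageUri)
    {
        var result = new List<Uri>();
        foreach (Match tag in LinkTag.Matches(html))
        {
            if (!RelStylesheet.IsMatch(tag.Value))
                continue;

            Match href = HrefAttr.Match(tag.Value);
            if (href.Success)
            {
                // Resolve relative paths such as "../Reports/x.css" against the page URL.
                result.Add(new Uri(pageUri, href.Groups[1].Value));
            }
        }
        return result;
    }
}
```

Each Uri it returns can then be passed to webClient.DownloadString(uri) to fetch that stylesheet's text. The same approach should carry over to <script src="..."> tags with a different pair of patterns.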

Even if you saved a web page as an MHTML archive (assuming your browser supports it), you'd still have to find a way to get the linked-to files for CSS and whatever other file types you are interested in.

Fortunately for you, there's a 2005 CodeProject article [^] that grabs CSS as well as HTML and wraps it in an MHTML archive for you, which also solves a certain security problem you may have. It does not use WebClient to get linked-to files, so it can handle encrypted linked-to files.

It has been a few years since I tried the code in this article, and browsers certainly change. So, as with any CP article, I suggest you read the user comments for any recent success, failure, or problem reports about the code, then test it to see if it meets your current needs.
 
 
You need a WebClient, not a WebBrowser, to download the HTML and the other contents of the document. WebBrowser is a control that displays web pages in your application, whereas WebClient downloads a resource and hands it to you directly as a string (or other data type). In this case the string contains HTML markup which, if it is well-formed XML (XHTML), you can process using any XmlReader; have a look at .NET's built-in one[^].

For example:

C#
// required
using System.Net;

// create an instance
WebClient webClient = new WebClient();

// call the HTML page you want to download, and get it as a string
string htmlCode = webClient.DownloadString("{web page (or resource) you want to download}");


When you are done, release the resources the client holds by calling webClient.Dispose(). The MSDN documentation[^] says that the DownloadString method downloads the requested resource as a string.
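
A minimal sketch of the XmlReader step, assuming the markup handed to it is a well-formed XML fragment (every tag closed, every attribute quoted, no DOCTYPE) and that each <td> holds only text. The names TableCells and ReadCellTexts are invented for this example:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper for illustration; not part of the .NET library.
public static class TableCells
{
    // Collects the text of every <td> element in a well-formed fragment.
    public static List<string> ReadCellTexts(string markup)
    {
        var cells = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(markup)))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "td")
                {
                    // Reads the text and leaves the reader on the node after </td>,
                    // so Read() must not be called again here or the next cell is skipped.
                    cells.Add(reader.ReadElementContentAsString());
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return cells;
    }
}
```

Passing it the header row of a table such as <table><thead><tr><td>Node Name</td><td>Hit</td><td>Duration</td></tr></thead></table> would give back the three column titles. Markup that is not well-formed XML makes XmlReader throw an XmlException, and cells that a script fills in after the page loads come back empty, because the downloaded string holds only what the server sent.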
 
 
Comments
sachi Dash 24-Nov-14 8:05am    
Does it work for external CSS? I am using a lot of external CSS and JavaScript files. How can I access all of them? I am also using Highcharts. What will the result be for the charts? How can I read the CSS?
Afzaal Ahmad Zeeshan 24-Nov-14 8:06am    
Just pass the location of the resource file and it will download it. Any kind of resource can be downloaded as a string, so CSS and JavaScript files work too. Each file takes some time to download, however many there are. :)
sachi Dash 25-Nov-14 0:12am    
I have updated my question. Please see the following table. It is a dynamic table: when I browse to this page, it takes the data from the database and then shows it in the web page. The page includes a lot of JavaScript files, and I can only get the data after the page has run. But WebClient only returns the resources of the page, such as the HTML, the CSS, and the names of the included files. I need the data for every column of the table. Do you understand now?


<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>


<title>BAT</title>


<link href="../Reports/NodeLogjamilah116201411584280NodeLog/NodeLogjamilah116201411584280NodeLog.css" rel="stylesheet" type="text/css">
<script type="text/javascript" src="../Reports/NodeLogjamilah116201411584280NodeLog/NodeLogjamilah116201411584280NodeLog.js"></script>
<script type="text/javascript" src="../../js/jquery-1.8.0.min.js"></script>
<script type="text/javascript" src="../../js/AllPageLoad.js"></script>


<script type="text/javascript">

jQuery(document).ready(function ($) {

AllPageLoad();

});
</script>


</head>

<body>
<div divtype="Tabs">

<div id="slidetabs" class="clean_rounded-horizontal">

<div class="st_tabs" divtype="Tab_Name">
<div style="overflow: hidden;" class="st_tabs_wrap">
<ul class="tab-links st_tabs_ul">
<li class="st_li_first st_li_active">Report1</li>


<li class="">Report2</li><li class="st_li_last">Report3</li></ul>
</div>
</div>



<div class="st_views" id="DynamicGrid" divtype="Tab_View">

<div style="position: absolute; left: 0px;" class="tab-1 st_view st_view_first st_view_active">
<div style="position: relative; height:400px;" class="tabcontent">

<table report_type="Horizontal Table" auto-aggregation="off" id="HorTable0" sort="asc" relativeid="HorTable1" top="50" bottom="" left="50" right=""><thead><tr><td>Node Name</td><td>Hit</td><td>Duration</td></tr></thead><tbody><tr><td att="@VAL([NodeLogQuery1].[NodeName])" data_type="undefined"></td><td att="@VAL([NodeLogQuery1].[Hit])" data_type="undefined"></td><td att="@VAL([NodeLogQuery1].[Duration])" data_type="undefined"></td></tr></tbody></table><table report_type="Horizontal Table" auto-aggregation="off" id="HorTable1" sort="asc" relativeid="HorTable0" top="" bottom="10" left="" right="50"><thead><tr><td>Node Name</td><td>Hit</td><td>Duration</td></tr></thead><tbody><tr><td att="@VAL([NodeLogQuery1].[NodeName])" data_type="undefined"></td><td att="@VAL([NodeLogQuery1].[Hit])" data_type="undefined"></td><td att="@VAL([NodeLogQuery1].[Duration])" data_type="undefined"></td></tr></tbody></table></div>

</div>


<div style="position: absolute; left: 2560px;" class="tab-2 st_view">
<div style="position: relative; height:400px;" class="tabcontent"><table report_type="Vertical Table" auto-aggregation="off" id="VerTable0" sort="asc" relativeid="tab-2 st_view st

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


