XML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>Untitled Document</title>
<style type="text/css">
<!--
.style3 {font-family: "Trebuchet MS"; font-size: 30px; font-weight: bold; color: #333333; }
body {
    background-image: url();
}
.style18 {font-family: "Trebuchet MS"; font-size: 30px; font-weight: bold; color: #F9F9F8; }
.style8 {color: #FFFFFF}
-->
</style>
</head>

<body>
<table width="99%"  cellspacing="0" cellpadding="0">
  <tr>
    <td height="20" valign="top" bgcolor="#719315">&nbsp;</td>
  </tr>
  <tr>
    <td height="151" valign="top" bgcolor="#719315"><table width="90%" align="center" cellpadding="0"  cellspacing="0">
        <tr>
          <td height="69" bgcolor="#587410"><table width="97%"  border="0" align="right" cellpadding="0" cellspacing="0">
              <tr>
                <td valign="top"><div align="left"><span class="style18">A building project</span></div></td>
              </tr>
          </table></td>
        </tr>
        <tr>
          <td height="5" bgcolor="#FFFFFF"></td>
        </tr>
        <tr>
          <td bgcolor="#F5EDE3"><table width="100%" align="center" cellpadding="0"  cellspacing="0" style="border-collapse:collapse">
              <tr>
                <td bgcolor="#B9D276"><table width="96%"  border="0" align="center" cellpadding="0" cellspacing="0">
                    <tr>
                      <td height="29"><span class="style8"></span></td>
                    </tr>
                    <tr>
                      <td height="25"><table width="98%"  border="0" align="center" cellpadding="0" cellspacing="0">
                        <tr>
                          <td colspan="3"><p class="Style2" style="line-height: 131%; margin-right: 74.95pt"> <span style="line-height: 131%; font-family: Book Antiqua; letter-spacing: -.2pt; font-weight: 700"> <font color="#800000"> <br>
                              </font> <font color="#000000"> &nbsp;</font><font color="#800000">&nbsp;&nbsp;&nbsp; </font> </span><b><font color="#800000"> <span style="font-family: Book Antiqua; letter-spacing: -.2pt"> <br>
                              </span> <span style="line-height: 133%; font-family: Book Antiqua; letter-spacing: -.2pt"> Few people are committed to a </span> <span style="line-height: 133%; font-family: Book Antiqua; letter-spacing: -.1pt"> building </span> <span style="line-height: 133%; font-family: Book Antiqua; letter-spacing: -.2pt"> project. <br>
        They are </span> <span style="line-height: 133%; font-family: Book Antiqua; letter-spacing: -.1pt"> discussing </span> <span style="line-height: 133%; font-family: Book Antiqua; letter-spacing: -.2pt"> about ways </span> <span style="line-height: 133%; font-family: Book Antiqua; letter-spacing: -.1pt"> and </span> <span style="line-height: 133%; font-family: Book Antiqua; letter-spacing: -.2pt"> means to fund it.</span></font></b></p>
                              <p class="Style2" style="line-height: 131%; margin-right: 74.95pt"> <font color="#000000"> <span style="font-family: Book Antiqua; letter-spacing: .3pt"> Member </span> <span style="font-family: Book Antiqua; letter-spacing: 1.05pt"> 1:</span><span style="font-family: Book Antiqua; letter-spacing: .3pt">&nbsp;&nbsp;&nbsp;&nbsp;Well, girls there aren't many ways open to us to raise money</span><span style="line-height: 150%; font-family: Book Antiqua; letter-spacing: .1nbsp;&nbsp; Wonderful, if all of us work together, surely we will complete this project soon. Yea.</font></span></span></p>
                              <p></td>
                        </tr>
                      </table></td>
                    </tr>
                    <tr>
                      <td height="10">&nbsp;</td>
                    </tr>
                </table></td>
              </tr>
          </table></td>
        </tr>
    </table></td>
  </tr>
  <tr>
    <td valign="top" bgcolor="#719315">&nbsp;</td>
  </tr>
</table>
</body>
</html>





I have a document in HTML format, but I need to read only the text, not the tags used in the document.

Is this possible?
Comments
Lakamraju Raghuram 10-Apr-12 2:48am    
Please do not dump the whole file.
What text exactly do you want to read?
Santhosh Subramanian 10-Apr-12 2:58am    
This is all the text within that HTML document.
I need to strip out the tags and get only the text to read from it:


A building project





Few people are committed to a building project.
They are discussing about ways and means to fund it.

Member 1: Well, girls there aren't many ways open to us to raise money at
the moment except, of course, asking for contributions.

Teacher: Teacher's are willing to donate Rs.5,000.00 from their side.

Member 2: We will collect some money from the people.

Member 3: Why not organise a magic show to raise money?

Member 1: Any more suggestions...

Member 4: How about distributing surprise prize coupons?

Member 1: Wonderful, if all of us work together, surely we will complete this project soon. Yea.




Lakamraju Raghuram 10-Apr-12 2:55am    
****

Using C# Code
C#
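// File.ReadAllText expects a file path, not a URL.
string htmlContent = System.IO.File.ReadAllText("Path of your html file");
// Removes every "<...>" tag; entities such as &nbsp; are left as-is.
lblOnlyText.Text = System.Text.RegularExpressions.Regex.Replace(htmlContent, "<[^>]*>", "");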



Using VB.Net Code
VB
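' File.ReadAllText expects a file path, not a URL.
Dim htmlContent As String = System.IO.File.ReadAllText("Path of your html file")
' Removes every "<...>" tag; entities such as &nbsp; are left as-is.
lblOnlyText.Text = System.Text.RegularExpressions.Regex.Replace(htmlContent, "<[^>]*>", "")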
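
For comparison, here is a rough JavaScript sketch of the same tag-stripping idea with a few extra steps that the single regular expression does not cover: dropping comments and script/style blocks, turning <br> and closing block tags into line breaks, and decoding a handful of entities. The function name htmlToText and the entity table are assumptions made for illustration, and a regex-based approach like this will not cope with every malformed page.

```javascript
// Hypothetical helper, not part of the original answer.
function htmlToText(html) {
  // Small, deliberately incomplete entity table (assumption: extend as needed).
  // &nbsp; is mapped to a plain space for readability.
  const entities = new Map([
    ["nbsp", " "], ["amp", "&"], ["lt", "<"], ["gt", ">"], ["quot", '"'],
  ]);
  return html
    // Drop comments first so commented-out CSS or markup does not leak through.
    .replace(/<!--[\s\S]*?-->/g, "")
    // Drop <script> and <style> elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    // Treat <br> and some closing block-level tags as line breaks.
    .replace(/<br\s*\/?>|<\/(p|div|tr|h[1-6]|li)\s*>/gi, "\n")
    // Remove every remaining tag (same pattern as the C#/VB code).
    .replace(/<[^>]*>/g, "")
    // Decode the entities listed above; unknown ones are left untouched.
    .replace(/&(\w+);/g, (m, name) => (entities.has(name) ? entities.get(name) : m))
    // Collapse runs of whitespace on each line and drop empty lines.
    .split("\n")
    .map((line) => line.replace(/[ \t\r]+/g, " ").trim())
    .filter((line) => line.length > 0)
    .join("\n");
}

// Example call with a made-up snippet:
console.log(htmlToText("<p>Member&nbsp;1: <b>Any more</b> suggestions...</p>"));
```

Entities are decoded only after the tags are gone, so a decoded "<" is never mistaken for the start of a tag.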
 
 
Use jQuery to get your text content. All the content you need is inside <span> tags, right?


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
<script type="text/javascript" src="http://ajax.microsoft.com/ajax/jquery/jquery-1.4.2.min.js"></script>
<script type="text/javascript">
$(document).ready(function() {
	// Visit every <span> on the page, in document order.
	$('span').each(function() {
		// .text() returns only the text, without any nested tags.
		alert($(this).text());
	});
});
</script>
</head>
<body>
<div><span>Text1</span></div>
<table><tr><td><span>Text2</span></td></tr><tr><td><span>Text3</span></td></tr></table>
<div><span>Text4</span></div>
<div><span>Text5</span></div>
</body>
</html>
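
If the text is not neatly wrapped in spans (the page in the question nests font, b and br tags inside them), another option is to walk the DOM and collect the text nodes directly. The sketch below is only an illustration: collectText is a made-up name, and it relies solely on the standard nodeType, nodeName, nodeValue and childNodes properties, so it needs no library.

```javascript
// Hypothetical helper: gathers the trimmed text of every text node under `node`.
function collectText(node, parts = []) {
  const ELEMENT_NODE = 1; // value of Node.ELEMENT_NODE in the DOM standard
  const TEXT_NODE = 3;    // value of Node.TEXT_NODE in the DOM standard
  if (node.nodeType === TEXT_NODE) {
    const text = node.nodeValue.trim();
    if (text) parts.push(text); // ignore whitespace-only text nodes
    return parts;
  }
  // Skip elements whose contents are code or styling, not readable text.
  if (node.nodeType === ELEMENT_NODE && /^(script|style)$/i.test(node.nodeName)) {
    return parts;
  }
  // Recurse into the children in document order.
  for (const child of Array.from(node.childNodes || [])) {
    collectText(child, parts);
  }
  return parts;
}

// In a browser, after the page has loaded:
//   alert(collectText(document.body).join("\n"));
```

Because it returns an array with one entry per text node, the caller can decide how to join the pieces (new lines, spaces, and so on).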
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


