|
|
Hello Lonely.
Great solution... But I see that ASP.NET does not yet comply 100% with web standards.
Thanks a lot.
Rodrigo Kono
MCP - MCTS - MSP
DevGoiás.NET - www.devgoias.net
Brazil
|
|
|
|
|
VS2003 destroys
<ul>
<li>item1</li>
<li>item2</li>
</ul>
into (horrible):
<ul>
<li>item1
<li>item2 </LI></UL>
XHTMLPage just can handle it (by adding </li> before the second <li>). I don't expect it to do that with so few lines of code. I mean, I can't stop VS2003 from destroying my valid XHTML. I've tried turning off automatic formatting, but it didn't help.
MaxGyver
|
|
|
|
|
Well, good code, no doubt.
I converted the code to VB using the free C# to VB.NET converter tool and added it to my ASP.NET project (written entirely in VB.NET 2005). It ran and worked, but with JavaScript errors.
My solution uses Microsoft AJAX and third-party controls (www.telerik.com). I think one of them (most likely AJAX) is causing the problem. I received the following error message:
Assertion Failed: Unrecognized tag script:pagerequestmanager
Break into debugger?
Any ideas?
Thanks,
Sameers
|
|
|
|
|
The __viewstate field is not handled, so this class does not do what it should: it does not generate XHTML that the W3C validator accepts.
|
|
|
|
|
What about the __viewstate field is not compliant?
|
|
|
|
|
Under HTML 4.01, ID attribute values must begin with a letter, so they aren't allowed to begin with underscores. By that rule __viewstate (among others) is invalid.
|
|
|
|
|
Had started using this class and thought it was a fantastic idea and well implemented, but have had to stop using it because it sometimes changes the inner text of HTML tags to lowercase, e.g. "My Text" becomes "my text". I haven't been able to find a clear pattern to this behaviour, but does anyone know of a fix?
|
|
|
|
|
Good morning all,
I have an application developed with VB.NET and I have a problem when I browse it with Mozilla. I know that the HTML generated by VB.NET is not valid. At first I used the aspnet2xhtml class to resolve this problem, but it doesn't work. I would like to know whether this class can convert to valid HTML 4.0.
Thank you.
foued69
|
|
|
|
|
Found this article during my search for a way to make ASP.NET xhtml compliant pages for my current project. After looking through the code, I have a few suggestions (without function calling for brevity):
// lowercasing all tags, except <!...> and <?...?> tags
sXhtml = Regex.Replace(sXhtml,"<[^!?].*?>",new MatchEvaluator(Lowerer),RegexOptions.IgnoreCase);
// fixing empty tags
// processing all empty tags, in case something like <br/> is used
// instead of <br />.
string[] sa = new string[]{"br", "meta", "link", "hr", "img"};
foreach (string s in sa)
{
sXhtml = Regex.Replace(sXhtml,"<"+s+"(?<attr>\\s*.*?)/?>", "<"+s+"${attr} />",RegexOptions.IgnoreCase);
}
// fixing <script> tag
sXhtml = Regex.Replace(sXhtml, "<script(?<attr1>.*?)(type\\s*=\\s*\"text/javascript\")(?<attr2>.*?)>", "<script${attr1} ${attr2}>", RegexOptions.IgnoreCase);
sXhtml = Regex.Replace(sXhtml, "<script(?<attr1>.*?)(type\\s*=\\s*'text/javascript')(?<attr2>.*?)>", "<script${attr1} ${attr2}>", RegexOptions.IgnoreCase);
sXhtml = Regex.Replace(sXhtml, "<script(?<attr>.*?)>", "<script type=\"text/javascript\" ${attr}>", RegexOptions.IgnoreCase);
sXhtml = Regex.Replace(sXhtml,"style\\s*=\\s*\".+?\"",new MatchEvaluator(StyleLowerer),RegexOptions.IgnoreCase);
sa = new string[]{"a", "applet", "form", "frame", "iframe", "img", "map"};
foreach (string s in sa)
{
sXhtml = Regex.Replace(sXhtml,"<"+s+"(?<attr1>.*?)(name\\s*=\\s*\".*?\")(?<attr2>.*?)>", "<"+s+"${attr1} ${attr2}>",RegexOptions.IgnoreCase);
}
And the functions that go with the above:
public static string Lowerer(Match m)
{
string s = m.ToString();
if (Regex.Matches(s,"[^\\s]+\\s*=\\s*\".*?\"").Count>0)
{
s = Regex.Replace(s,"<.*?=",new MatchEvaluator(ConvertToLower));
s = Regex.Replace(s,"\"[^\"]*?=",new MatchEvaluator(ConvertToLower));
}
else
{
s = s.ToLower();
}
return s;
}
public static string StyleLowerer(Match m)
{
string s = m.ToString();
s = Regex.Replace(s,"\".*?:",new MatchEvaluator(ConvertToLower));
s = Regex.Replace(s,";.*?:",new MatchEvaluator(ConvertToLower));
return s;
}
public static string ConvertToLower(Match m)
{
return m.ToString().ToLower();
}
One known problem is the extra spaces created, such as <br /> becoming <br  />, or in the script tag processing. Thanks for writing this article!
Vincent Tan
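A possible way to avoid the extra spaces mentioned above, as a minimal sketch rather than a tested replacement for the loop in the post: trim the captured attribute text before rebuilding the tag, so that <br>, <br/> and <br /> all come out with exactly one space before the slash. The class and method names (EmptyTagSketch, NormalizeEmptyTags) are made up for illustration, and the pattern assumes attribute values never contain a ">" character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmptyTagSketch
{
    // Hypothetical helper: rewrites the listed empty elements to the
    // "<tag attrs />" form with a single space before the slash.
    public static string NormalizeEmptyTags(string html)
    {
        string[] tags = { "br", "meta", "link", "hr", "img" };
        foreach (string tag in tags)
        {
            // \b stops "<br" from matching inside a longer tag name;
            // [^>]*? lazily captures the attributes, and \s*/? swallows
            // any existing " /" so it is not duplicated.
            string pattern = "<" + tag + @"\b(?<attr>[^>]*?)\s*/?>";
            html = Regex.Replace(
                html,
                pattern,
                m =>
                {
                    string attr = m.Groups["attr"].Value.Trim();
                    return attr.Length == 0
                        ? "<" + tag + " />"
                        : "<" + tag + " " + attr + " />";
                },
                RegexOptions.IgnoreCase);
        }
        return html;
    }
}
```

With this sketch, NormalizeEmptyTags("<br />") returns the input unchanged, and NormalizeEmptyTags("<BR>") returns "<br />". The tag name is also lowercased as a side effect, because the replacement is rebuilt from the lowercase entry in the array.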
|
|
|
|
|
Has anyone run any performance tests on this ?
It sounds great by the way, will try it out soon 
|
|
|
|
|
|
Because I liked this class so much, I added some improvements to make it work with most pages in validator.w3.org, and I thought I should post them back to CodeProject.
Here is the code with the improvements:
1. RemoveAttribute now supports <form bla="test" name="form1" > (with Regex).
2. Works with StringBuilder instead of string (not tested for improved performance).
3. Added removal of the unsupported language attribute on select and input.
4. Added ParsePostBack; this function replaces the __doPostBack JavaScript (function found on the internet via Google, I don't know where).
5. Made sure CDATA works in Explorer and Mozilla for JavaScript and stylesheets.
/*
* Author : big71 (http://www.codeproject.com/aspnet/ASPNET2XHTML.asp)
* Update and adaptation : Sébastien FERRAND (http://www.vbmaf.net)
* Description : Transforms the HTML generated by ASP.NET to make it
* compatible with XHTML.
* Version : 1.0
* Date : 5 October 2004
* ------------------------------------------------------------------------------
* Revision :
* Version : 1.1
* Date : 3 November 2004
* Updated the ConvertToLowerCase(), SingleTagToLowerCase() and
* PropertiesToLowerCase() methods to add support for tags with or without
* properties and for tags with multiple properties.
* ------------------------------------------------------------------------------
* License : This class is provided as is. The author(s) are in no way
* responsible for how it is used.
* You are free to distribute or use this class in
* your projects provided that this header remains in place.
*/
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Text;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.HtmlControls;
using System.Diagnostics;
namespace XHTMLPage {
/// <summary>
/// Summary description for XHTMLPage.
/// Please use XHTML10_Transitional, the others need some work.
/// </summary>
public class XHTMLPage : System.Web.UI.Page {
/// <summary>
/// Please use XHTML10_Transitional, the others need some work.
/// </summary>
public enum _XHTMLFormat {
XHTML10_Strict,
XHTML10_Transitional,
XHTML10_Frameset
}
private StringBuilder m_sXHTML;
private _XHTMLFormat m_XHTMLFormat;
private Encoding m_Encoding;
private string m_sLanguage;
private bool m_bXmlCDATA;
/// <summary>
/// Please use XHTML10_Transitional, the others need some work.
/// </summary>
public _XHTMLFormat XHTMLFormat {
get {return m_XHTMLFormat;}
set {m_XHTMLFormat = value;}
}
public Encoding Encoding {
get {return m_Encoding;}
set {m_Encoding = value;}
}
public string Language {
get {return m_sLanguage;}
set {m_sLanguage = value;}
}
/// <summary>
/// This needs some work, please leave it at false.
/// </summary>
public bool XmlCDATA {
get {return m_bXmlCDATA;}
set {m_bXmlCDATA = value;}
}
public XHTMLPage() {
//
// TODO: Add constructor logic here
//
m_sXHTML = new StringBuilder("");
m_XHTMLFormat = _XHTMLFormat.XHTML10_Transitional;
m_Encoding = Encoding.UTF8;
m_sLanguage = "en";
m_bXmlCDATA = false;
}
protected override void Render(HtmlTextWriter output) {
StringWriter w;
w = new StringWriter();
HtmlTextWriter myoutput = new HtmlTextWriter(w);
//Get the html:
base.Render(myoutput);
myoutput.Close();
m_sXHTML = w.GetStringBuilder();
//Filter the content
ReplaceDocType();
switch (m_XHTMLFormat) {
case _XHTMLFormat.XHTML10_Strict:
ConvertToXHTMLStrict();
break;
case _XHTMLFormat.XHTML10_Transitional:
ConvertToXHTMLTransitional();
break;
case _XHTMLFormat.XHTML10_Frameset:
ConvertToXHTMLFrameset();
break;
}
output.Write(m_sXHTML);
}
private void ConvertToXHTMLFrameset() {
ConvertToLowerCase();
AddSelfClose("meta");
FixHtml();
}
private void ConvertToXHTMLTransitional() {
ConvertToLowerCase();
AddSelfClose("meta");
AddSelfClose("link");
AddSelfClose("img");
AddSelfClose("hr");
AddSelfClose("input");
RemoveAttribute("form","language");
RemoveAttribute("select","language");
RemoveAttribute("input","language");
FixScript();
FixBr();
FixStyle();
FixHtml();
}
private void ConvertToXHTMLStrict() {
ConvertToLowerCase();
AddSelfClose("meta");
AddSelfClose("link");
AddSelfClose("img");
AddSelfClose("hr");
AddSelfClose("input");
FixScript();
RemoveAttribute("form", "name");
RemoveAttribute("form","language");
RemoveAttribute("select","language");
RemoveAttribute("input","language");
FixInput();
FixBr();
FixStyle();
FixHtml();
maskScript();
}
private void ReplaceDocType() {
// delete the current DOCTYPE
int nStart = m_sXHTML.ToString().IndexOf("<!DOCTYPE", 0);
if ( nStart >= 0 ) { // IndexOf returns -1 when not found; 0 is a valid position
int nEnd = m_sXHTML.ToString().IndexOf(">", nStart + 1);
if ( nEnd > 0 ) {
m_sXHTML = m_sXHTML.Remove(nStart, nEnd-nStart+1);
switch (m_XHTMLFormat) {
case _XHTMLFormat.XHTML10_Strict:
m_sXHTML = m_sXHTML.Insert(0, "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">");
break;
case _XHTMLFormat.XHTML10_Transitional:
m_sXHTML = m_sXHTML.Insert(0, "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">");
break;
case _XHTMLFormat.XHTML10_Frameset:
m_sXHTML = m_sXHTML.Insert(0, "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Frameset//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd\">");
break;
}
//m_sXHTML = m_sXHTML.Insert(0, "<?xml version=\"1.0\" encoding=\""+ m_Encoding.HeaderName +"\"?>\r\n");
}
}
}
private void ConvertToLowerCase() {
// Make all tag to lower case
// m_sXHTML = Regex.Replace(m_sXHTML, "<(/?)([a-zA-Z]+)(\\s*)>", new MatchEvaluator(SingleTagToLowerCase), RegexOptions.IgnoreCase);
/// Update 03/11/2004 : Add support for Tags with properties
/// Author : Sébastien FERRAND (mailto:sebastien.ferrand@vbmaf.net)
m_sXHTML = new StringBuilder(Regex.Replace(m_sXHTML.ToString(), "<(/?)([a-zA-Z0-9]+)[ ]*(.*?)>",
new MatchEvaluator(SingleTagToLowerCase), RegexOptions.IgnoreCase));
/// Update 03/11/2004 : Update to correctly match tags with more than one property
/// Author : Sébastien FERRAND (mailto:sebastien.ferrand@vbmaf.net)
// Make all properties to lower case
m_sXHTML = new StringBuilder(Regex.Replace(m_sXHTML.ToString(), "<([a-zA-Z0-9]+)(\\s+[a-zA-Z]+)(=\".+?>)",
new MatchEvaluator(PropertiesToLowerCase), RegexOptions.IgnoreCase));
}
private string SingleTagToLowerCase(Match m) {
/// Update 03/11/2004 : Add support for Tags with multi-properties
/// Author : Sébastien FERRAND (mailto:sebastien.ferrand@vbmaf.net)
if (m.Groups[3].ToString().Trim() == String.Empty )
return "<" + m.Groups[1].ToString().ToLower() + m.Groups[2].ToString().ToLower() + ">";
else
return "<" + m.Groups[1].ToString().ToLower() + m.Groups[2].ToString().ToLower() + " " + m.Groups[3].ToString() + ">";
}
private string PropertiesToLowerCase(Match m) {
string szReplace = "";
szReplace = "<" + m.Groups[1].ToString() + m.Groups[2].ToString().ToLower();
// Search another property in tag
if (Regex.Match(m.Groups[3].ToString(), "(.*?\")(\\s+\\w+)(=\".+>)",
RegexOptions.IgnoreCase).Success) {
szReplace += Regex.Replace(m.Groups[3].ToString(),
"(.*?\")(\\s+\\w+)(=\".+>)", new MatchEvaluator(nextProperty),
RegexOptions.IgnoreCase);
} else {
szReplace += m.Groups[3].ToString();
}
return szReplace ;
}
/// <summary>
/// Recursively search for property in tag
/// </summary>
/// <param name="m">Match of the regular expression</param>
/// <returns>tag with lower case properties</returns>
private string nextProperty(Match m) {
string szReplace = "";
szReplace = m.Groups[1].ToString() + m.Groups[2].ToString().ToLower();
// Search another property in tag
// Ignore if tag contains __VIEWSTATE... prevent long time calculation.
if (Regex.Match(m.Groups[3].ToString(), "(.*?\")(\\s+\\w+)(=\".+>)",
RegexOptions.IgnoreCase).Success && m.Groups[3].ToString().IndexOf("__VIEWSTATE")==-1) {
System.Diagnostics.Debug.WriteLine("Match OK","nextProperty");
szReplace += Regex.Replace(m.Groups[3].ToString(),
"(.*?\")(\\s+\\w+)(=\".+>)", new MatchEvaluator(nextProperty),
RegexOptions.IgnoreCase);
} else {
System.Diagnostics.Debug.WriteLine("Match NOK","nextProperty");
szReplace += m.Groups[3].ToString();
}
return szReplace;
}
private string HTMLTag(Match m) {
return "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\""+ m_sLanguage +"\">";
}
private void FixHtml() {
m_sXHTML = new StringBuilder(Regex.Replace(m_sXHTML.ToString(), "<html>", new MatchEvaluator(HTMLTag), RegexOptions.IgnoreCase));
}
private void FixBr() {
m_sXHTML = m_sXHTML.Replace("<br>", "<br />");
}
private void FixScript() {
m_sXHTML = m_sXHTML.Replace("<script language=\"javascript\">", "<script type=\"text/javascript\">");
}
private void FixStyle() {
m_sXHTML = new StringBuilder(Regex.Replace(m_sXHTML.ToString(), "style=\"[^\"]+\"", new MatchEvaluator(ToLowerCase), RegexOptions.IgnoreCase));
//m_sXHTML = new StringBuilder(Regex.Replace(m_sXHTML.ToString(), "style=\".+\"", new MatchEvaluator(ToLowerCase), RegexOptions.IgnoreCase));
// // Add <![CDATA[ ... ]]> to mask style
m_sXHTML = new StringBuilder(Regex.Replace(m_sXHTML.ToString(),
@"(<style[^<]*>){1}(?(?=\s*<!--)(\s*<!--)(\s*.*?)(//\s*-->)|(\s*.*?))\s*(</style>){1}",
new MatchEvaluator(FixStyleElv),
RegexOptions.IgnoreCase | RegexOptions.Singleline));
}
private void FixInput() {
int nStart = 0;
int nPos = 0;
while ( nPos >= 0 ) {
string sSearch = "<input type=\"hidden\"";
nPos = m_sXHTML.ToString().IndexOf(sSearch, nStart);
if ( nPos > 0 ) {
nStart = nPos + sSearch.Length;
m_sXHTML = m_sXHTML.Insert(nPos, "<pre>");
int nEnd = m_sXHTML.ToString().IndexOf(">", nStart);
if ( nEnd > 0 ) {
m_sXHTML = m_sXHTML.Insert(nEnd+1, "</pre>");
}
}
}
}
private void AddSelfClose(string sTagName) {
int nStart = 0;
int nPos = 0;
while ( nPos >= 0 ) {
string sSearch = "<" + sTagName;
nPos = m_sXHTML.ToString().IndexOf(sSearch, nStart);
if ( nPos > 0 ) {
nStart = nPos + 1;
int nEnd = m_sXHTML.ToString().IndexOf(">", nStart);
if ( nEnd > 0 ) {
if ( m_sXHTML[nEnd-1] != '/' ) {
m_sXHTML = m_sXHTML.Insert(nEnd, " /");
}
}
}
}
}
private void RemoveAttribute(string sTagName, string sAttrName)
{
int nStart = 0;
int nLen = 0;
int nErased=0;
foreach(Match m in Regex.Matches(m_sXHTML.ToString(),"<" +sTagName+ @"((?'attrib'(\s*(\w*)\s*=\s*(?'quo'(""|'))(.*?)\k'quo'))*?)(\s/){0,1}>"))
{
for(int i=0;i<m.Groups["attrib"].Captures.Count;i++)
{
Capture g =m.Groups["attrib"].Captures[i];
if(g.Value.Trim().IndexOf(sAttrName)>=0)
{
Debug.WriteLine("remove:" +sAttrName+ " from " + m.Value);
nStart = g.Index -nErased;
nLen = g.Length;
m_sXHTML = m_sXHTML.Remove(nStart, nLen);
nErased = nErased + nLen;
}
}
}
}
private void maskScript()
{
// Add <![CDATA[ ... ]]> to mask script
m_sXHTML = new StringBuilder(Regex.Replace(m_sXHTML.ToString(),
@"(<script[^<]*>){1}(?(?=\s*<!--)(\s*<!--)(\s*.*?)(//\s*-->)|(\s*.*?))\s*(</script>){1}",
new MatchEvaluator(FixScriptElv),
RegexOptions.IgnoreCase | RegexOptions.Singleline));
}
private string FixStyleElv(Match m)
{
string ret="";
string st, ed;
if (m_bXmlCDATA)
{
st = "\r\n/*<![CDATA[ */\r\n";
ed = "\r\n/*]]>*/\r\n";
}
else
{
st = "\r\n<!--\r\n";
ed = "\r\n//-->\r\n";
}
if (m.Groups[2].ToString().Trim()=="" && m.Groups[4].ToString().Trim()=="")
st = ed = "";
ret = m.Groups[1].ToString() + st;
ret += m.Groups[2].ToString() + m.Groups[4].ToString() + ed + m.Groups[5].ToString();
return ret;
}
private string FixScriptElv(Match m)
{
string ret="";
string st, ed;
if (m_bXmlCDATA)
{
st = "\r\n// <![CDATA[\r\n";
ed = "\r\n// ]]>\r\n";
}
else
{
st = "\r\n<!--\r\n";
ed = "\r\n//-->\r\n";
}
if (m.Groups[2].ToString().Trim()=="" && m.Groups[4].ToString().Trim()=="")
st = ed = "";
ret = m.Groups[1].ToString() + st;
ret += ParsePostBack(m.Groups[2].ToString()) + m.Groups[4].ToString() + ed + m.Groups[5].ToString();
return ret;
}
private const string XHmlPostBack = "function __doPostBack(eventTarget, eventArgument){{var theform = document.getElementById (\"{0}\");theform.__EVENTTARGET.value = eventTarget.split(\"$\").join(\":\");theform.__EVENTARGUMENT.value = eventArgument;theform.submit();}}";
private string ParsePostBack(string sRoutine)
{
if (sRoutine.IndexOf("__doPostBack(eventTarget, eventArgument)") == -1) return sRoutine;
// Replace the postback routine
// I think we can replace it this way
///function __doPostBack(eventTarget, eventArgument){
///var theform = document.getElementById ('_ctl0');
///theform.__EVENTTARGET.value = eventTarget.split("$").join(":");
///theform.__EVENTARGUMENT.value = eventArgument;theform.submit();
///}
Match FormNameMatch = Regex.Match (sRoutine,"document.forms\\[\"([^\"]*)\"\\]",RegexOptions.IgnoreCase | RegexOptions.Singleline);
string FormName;
if (FormNameMatch.Success)
FormName = FormNameMatch.Groups[1].ToString();
else
FormName = "Form1";
return String.Format(XHmlPostBack,FormName);
}
private string ToLowerCase(Match m)
{
return m.ToString().ToLower();
}
}
}
|
|
|
|
|
Excellent article! It has really helped me, although I needed to tweak the code a little.
I was getting an invalid page message from the W3C validator because .NET was sticking an unwanted attribute in my submit button (language=“javascript”). So I thought I would add the line of code below:
RemoveAttribute("input", "language");
to the ConvertToXHTMLStrict function in XHTMLPage.cs. However, RemoveAttribute only works if the attribute that you want to remove is the first attribute, so I have written a new version that will remove the attribute wherever it resides in the tag.
private void RemoveAttribute(string sTagName, string sAttrName)
{
int nStart = 0;
int nLength = 0;
Regex rTagWithAttr = new Regex("<"+ sTagName +"[^>]* "+ sAttrName +"=\"(.*?)\"");
MatchCollection mcTags = rTagWithAttr.Matches(m_sXHTML);
for (int i = mcTags.Count-1; i >= 0; i--)
{
nStart = mcTags[i].Index + mcTags[i].Value.IndexOf(sAttrName) - 1;
nLength = mcTags[i].Length - mcTags[i].Value.IndexOf(sAttrName) + 1;
m_sXHTML = m_sXHTML.Remove(nStart, nLength);
}
}
Mind you, this could all be superfluous because .NET’s client-side validation only works with IE anyway.
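For comparison, the same idea (removing an attribute wherever it sits in the tag) can probably be expressed as a single Regex.Replace, which avoids the index arithmetic. This is only a sketch under assumptions: it works on a plain string rather than the m_sXHTML field, the names AttributeSketch and RemoveAttribute are illustrative, it handles quoted values only, and it would be fooled by an attribute value that itself contains text looking like the attribute.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AttributeSketch
{
    // Hypothetical helper: removes attrName (with a double- or
    // single-quoted value) from each tag named tagName.
    public static string RemoveAttribute(string html, string tagName, string attrName)
    {
        // "head" is the tag text before the attribute and is kept;
        // the attribute and its leading whitespace are matched and dropped.
        string pattern =
            "(?<head><" + Regex.Escape(tagName) + @"\b[^>]*?)" +
            @"\s+" + Regex.Escape(attrName) + @"\s*=\s*(""[^""]*""|'[^']*')";
        return Regex.Replace(html, pattern, "${head}", RegexOptions.IgnoreCase);
    }
}
```

Because the rest of the tag after the attribute is never part of the match, the remaining attributes and the closing " />" are left exactly as they were.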
|
|
|
|
|
Great class, it's just what I've been looking for. I've noticed (and corrected) a slight bug in the FixStyle method. The regular expression for lowercasing style definitions is a little overzealous and will lowercase all quoted strings in tags after the first style attribute in the file. This was causing some of my alt, id and class attributes to be lowercased too.
To fix, replace the first regexp in FixStyle with:
m_sXHTML = Regex.Replace(m_sXHTML, "style=\"[^\"]+\"", new MatchEvaluator(ToLowerCase), RegexOptions.IgnoreCase);
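To see why the character class matters, here is a small side-by-side sketch. It assumes the earlier pattern was the greedy style=\".+\" form; the class and method names are made up for the illustration. The greedy ".+" keeps going to the last double quote it can reach on the line, while "[^\"]+" has to stop at the style attribute's own closing quote.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StyleRegexSketch
{
    private static string Lower(Match m)
    {
        return m.Value.ToLowerInvariant();
    }

    // Greedy: ".+" runs to the LAST double quote on the line,
    // so attributes after style are lowercased as well.
    public static string LowerGreedy(string html)
    {
        return Regex.Replace(html, "style=\".+\"",
            new MatchEvaluator(Lower), RegexOptions.IgnoreCase);
    }

    // Bounded: "[^\"]+" cannot cross a double quote, so the match
    // ends at the closing quote of the style attribute itself.
    public static string LowerBounded(string html)
    {
        return Regex.Replace(html, "style=\"[^\"]+\"",
            new MatchEvaluator(Lower), RegexOptions.IgnoreCase);
    }
}
```

For the input <img style="COLOR:Red" alt="My Text" />, LowerGreedy also turns the alt text into "my text", whereas LowerBounded only lowercases the style attribute and leaves alt="My Text" intact.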
|
|
|
|
|
Thanks I will integrate it in the next update.
bye,
big
|
|
|
|
|
Hi
This is what I have been looking for. Is this code available in VB?
Good job,
Karl
|
|
|
|
|
I am building a class library (C#) that will include the ASPNET2XHTML class and some other utilities for working with XHTML. Once you have a class library you can use it inside your VB project.
As soon as I am done I will post an update here.
Bye,
big
|
|
|
|
|
|
Here is a VB.NET port.
Imports System
Imports System.IO
Imports System.Text.RegularExpressions
Imports System.Text
Imports System.Web.UI
Imports System.Web.UI.WebControls
Imports System.Web.UI.HtmlControls
Imports Microsoft.VisualBasic.ControlChars
Namespace ASPNET2XHTML
Public Class XHTMLPage
Inherits System.Web.UI.Page
'XHTMLPage constructor
Public Sub New()
m_sXHTML = ""
m_XHTMLFormat = _XHTMLFormat.HTML401_Loose
m_Encoding = Encoding.UTF8
m_sLanguage = "en"
m_bXmlCDATA = False
End Sub
Public Enum _XHTMLFormat
XHTML10_Strict
XHTML10_Transitional
XHTML10_Frameset
HTML401_Loose
HTML4_Transitional
End Enum
Private m_sXHTML As String
Private m_XHTMLFormat As _XHTMLFormat
Private m_Encoding As Encoding
Private m_sLanguage As String
Private m_bXmlCDATA As Boolean
Public Property XHTMLFormat() As _XHTMLFormat
Get
Return m_XHTMLFormat
End Get
Set(ByVal Value As _XHTMLFormat)
m_XHTMLFormat = Value
End Set
End Property
Public Property Encoding() As Encoding
Get
Return m_Encoding
End Get
Set(ByVal Value As Encoding)
m_Encoding = Value
End Set
End Property
Public Property Language() As String
Get
Return m_sLanguage
End Get
Set(ByVal Value As String)
m_sLanguage = Value
End Set
End Property
Public Property XmlCDATA() As Boolean
Get
Return m_bXmlCDATA
End Get
Set(ByVal Value As Boolean)
m_bXmlCDATA = Value
End Set
End Property
Protected Overrides Sub Render(ByVal output As HtmlTextWriter)
Dim w As StringWriter
w = New StringWriter
Dim myoutput As HtmlTextWriter
myoutput = New HtmlTextWriter(w)
MyBase.Render(myoutput)
myoutput.Close()
m_sXHTML = w.GetStringBuilder().ToString()
ReplaceDocType()
Select Case (m_XHTMLFormat)
Case _XHTMLFormat.XHTML10_Strict
ConvertToXHTMLStrict()
Case _XHTMLFormat.XHTML10_Transitional
ConvertToXHTMLTransitional()
Case _XHTMLFormat.XHTML10_Frameset
ConvertToXHTMLFrameset()
Case Else
ConvertToHTML4()
End Select
output.Write(m_sXHTML)
End Sub
Private Sub ConvertToXHTMLFrameset()
ConvertToLowerCase()
AddSelfClose("meta")
FixHtml()
End Sub
Private Sub ConvertToXHTMLTransitional()
ConvertToLowerCase()
AddSelfClose("meta")
AddSelfClose("link")
AddSelfClose("img")
AddSelfClose("hr")
FixScript()
FixBr()
FixStyle()
FixHtml()
End Sub
Private Sub ConvertToHTML4()
'ConvertToLowerCase()
'AddSelfClose("meta")
'AddSelfClose("link")
'AddSelfClose("img")
'AddSelfClose("hr")
'FixScript()
'FixBr()
'FixStyle()
End Sub
Private Sub ConvertToXHTMLStrict()
ConvertToLowerCase()
AddSelfClose("meta")
AddSelfClose("link")
AddSelfClose("img")
AddSelfClose("hr")
FixScript()
RemoveAttribute("form", "name")
FixInput()
FixBr()
FixStyle()
FixHtml()
maskScript()
End Sub
Private Sub ReplaceDocType()
' delete the current DOCTYPE
Dim nStart, nEnd As Integer
nStart = m_sXHTML.IndexOf("<!DOCTYPE", 0)
If (nStart >= 0) Then ' IndexOf returns -1 when not found; 0 is a valid position
nEnd = m_sXHTML.IndexOf(">", nStart + 1)
If (nEnd > 0) Then
m_sXHTML = m_sXHTML.Remove(nStart, nEnd - nStart + 1)
Select Case m_XHTMLFormat
Case _XHTMLFormat.XHTML10_Strict
m_sXHTML = m_sXHTML.Insert(0, "<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Strict//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"">")
Case _XHTMLFormat.XHTML10_Transitional
m_sXHTML = m_sXHTML.Insert(0, "<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Transitional//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"">")
Case _XHTMLFormat.XHTML10_Frameset
m_sXHTML = m_sXHTML.Insert(0, "<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Frameset//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd"">")
Case _XHTMLFormat.HTML401_Loose
m_sXHTML = m_sXHTML.Insert(0, "<!DOCTYPE html PUBLIC ""-//W3C//DTD HTML 4.01 Transitional//EN"" ""http://www.w3.org/TR/html4/loose.dtd"" >")
Case _XHTMLFormat.HTML4_Transitional
m_sXHTML = m_sXHTML.Insert(0, "<!DOCTYPE html PUBLIC ""-//W3C//DTD HTML 4.0 Transitional//EN"" >")
End Select
If m_XHTMLFormat <> _XHTMLFormat.HTML4_Transitional AndAlso m_XHTMLFormat <> _XHTMLFormat.HTML401_Loose Then
Dim s As String = "?>" & Cr & Lf
Dim h As String = m_Encoding.HeaderName
m_sXHTML = m_sXHTML.Insert(0, "<?xml version=""1.0"" encoding=""" & h & Chr(34) & s)
End If
End If
End If
End Sub
Private Sub ConvertToLowerCase()
' Update 03/11/2004 : Add support for Tags with properties
' Author : Sébastien FERRAND (mailto:sebastien.ferrand@vbmaf.net)
m_sXHTML = Regex.Replace(m_sXHTML, "<(/?)([a-zA-Z0-9]+)[ ]*(.*?)>", _
New MatchEvaluator(AddressOf SingleTagToLowerCase), RegexOptions.IgnoreCase)
' Update 03/11/2004 : Update to correctly match tags with more than one property
' Author : Sébastien FERRAND (mailto:sebastien.ferrand@vbmaf.net)
' Make all properties to lower case
' (VB string literals do not use backslash escapes, so \s is written with a single backslash)
m_sXHTML = Regex.Replace(m_sXHTML, "<([a-zA-Z0-9]+)(\s+[a-zA-Z]+)(=" & Chr(34) & ".+?>)", _
New MatchEvaluator(AddressOf PropertiesToLowerCase), RegexOptions.IgnoreCase)
End Sub
Private Function SingleTagToLowerCase(ByVal m As Match) As String
' Update 03/11/2004 : Add support for Tags with multi-properties
' Author : Sébastien FERRAND (mailto:sebastien.ferrand@vbmaf.net)
If (m.Groups(3).ToString().Trim() = String.Empty) Then
Return "<" & m.Groups(1).ToString().ToLower() & m.Groups(2).ToString().ToLower() & ">"
Else
Return "<" & m.Groups(1).ToString().ToLower() & m.Groups(2).ToString().ToLower() & " " & m.Groups(3).ToString() & ">"
End If
End Function
Private Function PropertiesToLowerCase(ByVal m As Match) As String
Dim szReplace As String
szReplace = "<" & m.Groups(1).ToString() & m.Groups(2).ToString().ToLower()
' Search another property in tag
If (Regex.Match(m.Groups(3).ToString(), "(.*?"")(\s+\w+)(="".+>)", RegexOptions.IgnoreCase).Success) Then
szReplace &= Regex.Replace(m.Groups(3).ToString(), "(.*?"")(\s+\w+)(="".+>)", New MatchEvaluator(AddressOf nextProperty), RegexOptions.IgnoreCase)
Else
szReplace &= m.Groups(3).ToString()
End If
Return szReplace
End Function
' Recursively search for property in tag
' <param name="m">Match of the regular expression</param>
' <returns>tag with lower case properties</returns>
Private Function nextProperty(ByVal m As Match) As String
Dim szReplace As String = ""
szReplace = m.Groups(1).ToString() & m.Groups(2).ToString().ToLower()
' Search another property in tag
' Ignore if tag contains __VIEWSTATE... prevent long time calculation.
If (Regex.Match(m.Groups(3).ToString(), "(.*?"")(\s+\w+)(="".+>)", RegexOptions.IgnoreCase).Success AndAlso m.Groups(3).ToString().IndexOf("__VIEWSTATE") = -1) Then
System.Diagnostics.Debug.WriteLine("Match OK", "nextProperty")
szReplace &= Regex.Replace(m.Groups(3).ToString(), "(.*?"")(\s+\w+)(="".+>)", New MatchEvaluator(AddressOf nextProperty), RegexOptions.IgnoreCase)
Else
System.Diagnostics.Debug.WriteLine("Match NOK", "nextProperty")
szReplace &= m.Groups(3).ToString()
End If
Return szReplace
End Function
Private Function HTMLTag(ByVal m As Match) As String
Return "<html xmlns=""http://www.w3.org/1999/xhtml"" xml:lang=""" & m_sLanguage & """>"
End Function
Private Sub FixHtml()
m_sXHTML = Regex.Replace(m_sXHTML, "<html>", New MatchEvaluator(AddressOf HTMLTag), RegexOptions.IgnoreCase)
End Sub
Private Sub FixBr()
m_sXHTML = m_sXHTML.Replace("<br>", "<br />")
End Sub
Private Sub FixScript()
m_sXHTML = m_sXHTML.Replace("<script language=""javascript"">", "<script type=""text/javascript"">")
End Sub
Private Sub FixStyle()
m_sXHTML = Regex.Replace(m_sXHTML, "style=""[^""]+""", New MatchEvaluator(AddressOf ToLowerCase), RegexOptions.IgnoreCase)
' // Add <![CDATA[ ... ]]> to mask style
Dim m As New MatchEvaluator(AddressOf FixStyleAndScript)
m_sXHTML = Regex.Replace(m_sXHTML, _
"(<style[^<]*>){1}(?(?=\s*<!--)(\s*<!--)(\s*.*?)(//\s*-->)|(\s*.*?))\s*(</style>){1}", _
m, _
RegexOptions.IgnoreCase Or RegexOptions.Singleline)
End Sub
Private Sub FixInput()
Dim nStart, nPos, nEnd As Integer
nStart = 0
nPos = 0
While (nPos >= 0)
Dim sSearch As String = "<input type=""hidden"""
nPos = m_sXHTML.IndexOf(sSearch, nStart)
If (nPos > 0) Then
nStart = nPos + sSearch.Length
m_sXHTML = m_sXHTML.Insert(nPos, "<pre>")
nEnd = m_sXHTML.IndexOf(">", nStart)
If (nEnd > 0) Then
m_sXHTML = m_sXHTML.Insert(nEnd + 1, "</pre>")
End If
End If
End While
End Sub
Private Sub AddSelfClose(ByVal sTagName As String)
Dim nStart, nPos, nEnd As Integer
nStart = 0
nPos = 0
While (nPos >= 0)
Dim sSearch As String = "<" & sTagName
nPos = m_sXHTML.IndexOf(sSearch, nStart)
If (nPos > 0) Then
nStart = nPos + 1
nEnd = m_sXHTML.IndexOf(">", nStart)
If (nEnd > 0) Then
Dim c As Char = m_sXHTML.Chars(nEnd - 1)
If (c <> "/") Then
m_sXHTML = m_sXHTML.Insert(nEnd, " /")
End If
End If
End If
End While
End Sub
Private Sub RemoveAttribute(ByVal sTagName As String, ByVal sAttrName As String)
Dim nStart, nLength, nEnd As Integer
nStart = 0
nLength = 0
' Matches the tag containing the attribute
Dim rTagWithAttr As New Regex("<" & sTagName & "[^>]* " & sAttrName & "=""(.*?)""")
' Collection contains all occurrences of the tag with the attribute
Dim mcTags As MatchCollection
mcTags = rTagWithAttr.Matches(m_sXHTML)
' Count BACKWARDS through the collection because the m_sXHTML length is affected each time
' an attribute is removed
Dim i As Integer = mcTags.Count - 1
While i >= 0
nStart = mcTags(i).Index + mcTags(i).Value.IndexOf(sAttrName) - 1
nLength = mcTags(i).Length - mcTags(i).Value.IndexOf(sAttrName) + 1
m_sXHTML = m_sXHTML.Remove(nStart, nLength)
i -= 1
End While
End Sub
Private Sub maskScript()
' Add <![CDATA[ ... ]]> to mask script
m_sXHTML = Regex.Replace(m_sXHTML, _
"(<script[^<]*>){1}(?(?=\s*<!--)(\s*<!--)(\s*.*?)(//\s*-->)|(\s*.*?))\s*(</script>){1}", _
New MatchEvaluator(AddressOf FixStyleAndScript), _
RegexOptions.IgnoreCase Or RegexOptions.Singleline)
End Sub
Private Function FixStyleAndScript(ByVal m As Match) As String
Dim ret, st, ed As String
ret = ""
If (m_bXmlCDATA) Then
st = Cr & Lf & "<![CDATA[" & Cr & Lf
ed = Cr & Lf & "]]>" & Cr & Lf
Else
st = Cr & Lf & "<!--" & Cr & Lf
ed = Cr & Lf & "//-->" & Cr & Lf
End If
If (m.Groups(2).ToString().Trim() = "" AndAlso m.Groups(4).ToString().Trim() = "") Then
st = ""
ed = ""
End If
ret = m.Groups(1).ToString() & st
ret &= m.Groups(2).ToString() & m.Groups(4).ToString() & ed & m.Groups(5).ToString()
Return ret
End Function
Private Function ToLowerCase(ByVal m As Match) As String
Return m.ToString().ToLower()
End Function
End Class
End Namespace
|
|
|
|
|
|
|
Good article & good solution. Thanks.
But I have a problem that I can't solve:
The ReplaceDocType procedure doesn't work, and the DOCTYPE is not replaced in the output XHTML.
I tried the original C# code and it runs properly, but in the VB versions of the code, both Lou Vanek's and Robert Sindall's, ReplaceDocType doesn't work...
Can anyone please help?
|
|
|
|
|
GGGGGGrrrrrrrrrrrhhh!!!!
GGgrrrrrrrrhhhh!!!!
Fed up with looking for this stupid problem... Grrrrrr Grrrgg
I couldn't locate it because all the code is OK!!!
Finally I see it. The problem is that I had activated the VBStudio lowercase properties & tags option, and it puts
|
|
|
|
|
Good morning, I have the same problem, but I can't find "VBStudio lowercase properties & tags". I use Visual Studio in French and it's not the same. Please help.
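If the cause really is the casing of the DOCTYPE declaration in the page (an assumption, since the explanation above is cut off), one way to stop depending on the editor setting would be to search for it case-insensitively. The sketch below also treats position 0 as a valid hit, which matters when the declaration is the very first thing in the output. The names DocTypeSketch and ReplaceDocType are illustrative, it works on a plain string instead of the m_sXHTML field, and unlike the posted code it puts the new declaration where the old one was rather than at position 0.

```csharp
using System;

public static class DocTypeSketch
{
    public const string Transitional =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" " +
        "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">";

    // Hypothetical helper: replaces the first DOCTYPE declaration,
    // whatever its casing, with newDocType.
    public static string ReplaceDocType(string html, string newDocType)
    {
        int start = html.IndexOf("<!DOCTYPE", StringComparison.OrdinalIgnoreCase);
        if (start < 0) return html;   // -1 means "not found"; 0 is a valid hit
        int end = html.IndexOf('>', start);
        if (end < 0) return html;     // unterminated declaration: leave as is
        return html.Remove(start, end - start + 1).Insert(start, newDocType);
    }
}
```

The same two changes (an ordinal, case-insensitive IndexOf and a ">= 0" test) should carry over to the VB versions, since String.IndexOf has the same overloads there.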
|
|
|
|
|