jQuery is a powerful tool for extracting content from an HTML document. However, if you can't use jQuery, the Regex class can be used to extract the content between <title> and </title>, which is what the question asks for, as shown below:
// Requires: using System; using System.Text.RegularExpressions;
string htmlText = @"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.1//EN"" ""http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"">
<html>
<head>
<title>Paula - Microsoft Word - Comparison of the different image compression algorithms.doc</title>
<title></title><link href=""/DigitalLibrary/extData.aspx?filePath=stylesheet.css&epub=b3aab940-fb48-4f6c-ae63-d599f4893795_aguilera_rpt.epub"" type=""text/css"" rel=""stylesheet""/>
</head>
<body>
<div class=""body"">
<div id=""frontmatter"">
<div id=""titlepage"">
</div>
</div>
</div>
<a id=""1"">";
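// Capture the text of the first <title> element; [^<>]* stops at the next tag.
Match match = Regex.Match(htmlText, @"<title>([^<>]*)</title>",
    RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
// Groups[1] holds the captured title text when the match succeeds.
if (match.Success)
    Console.WriteLine(match.Groups[1].Value);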
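The pattern above only matches a bare <title> tag, and it would return an empty string if an empty <title></title> came first. As a rough sketch of one way to loosen that (not part of the original answer), the hypothetical helper below also accepts attributes on the opening tag and titles that span several lines, skips empty titles, and decodes HTML entities. The names TitleExtractor and ExtractTitle are made up for illustration, and a regex is still not a full HTML parser.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the names are assumptions, not from the question.
public static class TitleExtractor
{
    // Singleline lets '.' match newlines, so a title may span several lines.
    // [^>]* tolerates attributes on the opening tag; .*? stops at the first closing tag.
    private static readonly Regex TitlePattern = new Regex(
        @"<title\b[^>]*>(.*?)</title\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.CultureInvariant);

    // Returns the first non-empty title, entity-decoded and trimmed, or null if there is none.
    public static string ExtractTitle(string html)
    {
        foreach (Match m in TitlePattern.Matches(html))
        {
            string text = WebUtility.HtmlDecode(m.Groups[1].Value).Trim();
            if (text.Length > 0)
                return text;
        }
        return null;
    }
}
```

Calling TitleExtractor.ExtractTitle(htmlText) on the sample document above should give the same title as the original snippet, since the first <title> element is the non-empty one.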