If you have an HtmlDocument (using HtmlAgilityPack), you can use .DocumentNode.Descendants() to get all descendants. With the LINQ extension methods, you can then search for the element containing 'Sample Text' and get its class:
using System.Linq;
using HtmlAgilityPack;

string html = @"<!DOCTYPE html>
<html>
<head><title>Sample document</title></head>
<body>
<div class=""sampleclass"">
Sample Text
</div>
</body>
</html>";
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
HtmlNode foundNode = doc.DocumentNode.Descendants().Where(x => x.InnerHtml.Trim() == "Sample Text").FirstOrDefault();
string classAttribute = foundNode?.Attributes["class"]?.Value;
The .Where call filters the descendants using the predicate x => x.InnerHtml.Trim() == "Sample Text", which means that an element 'x' in the list of descendants is kept only if its trimmed InnerHtml is exactly "Sample Text". The .FirstOrDefault call returns the first matching element, or null if no element is found.
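As a side note, FirstOrDefault also has an overload that takes the predicate directly, so the Where call can be folded into it. Here is a minimal sketch on a plain list of strings, so that it runs without HtmlAgilityPack; the FirstOrDefaultDemo class and the FindTrimmed helper are names I made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstOrDefaultDemo
{
    // Hypothetical helper: returns the first item whose trimmed value
    // equals target, or null if nothing matches.
    public static string FindTrimmed(IEnumerable<string> items, string target)
    {
        // Same result as items.Where(x => x.Trim() == target).FirstOrDefault()
        return items.FirstOrDefault(x => x.Trim() == target);
    }

    public static void Main()
    {
        var items = new List<string> { "  Other  ", "\nSample Text\n", "Sample Text" };

        string hit = FindTrimmed(items, "Sample Text");  // first match, returned untrimmed
        string miss = FindTrimmed(items, "Missing");     // no match, so null

        Console.WriteLine(hit.Trim());
        Console.WriteLine(miss == null);
    }
}
```

Both forms stop at the first match; which one to use is mostly a matter of taste.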
When the node is found, the attribute is fetched from the node. Note that I used ?. instead of just . because ?. is the null-conditional operator. foundNode?.Attributes["class"] means "if foundNode is null, then this expression evaluates to null; if foundNode is not null, then this expression evaluates .Attributes["class"]". ?.Value works in the same way. Using this operator avoids a few null checks. If foundNode is null or if it doesn't have a class attribute, then classAttribute is null too.
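To make the saved null checks concrete, here is a small sketch that compares a ?. chain with the explicit checks it replaces. FakeNode and FakeAttribute are hypothetical stand-ins I wrote for this example (they are not HtmlAgilityPack types), so the code runs on its own:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an attribute; only carries a value.
public sealed class FakeAttribute
{
    public string Value { get; set; }
}

// Hypothetical stand-in for a node whose attribute lookup returns null when missing.
public sealed class FakeNode
{
    private readonly Dictionary<string, FakeAttribute> attributes =
        new Dictionary<string, FakeAttribute>();

    public FakeNode With(string name, string value)
    {
        attributes[name] = new FakeAttribute { Value = value };
        return this;
    }

    public FakeAttribute Attribute(string name) =>
        attributes.TryGetValue(name, out var attr) ? attr : null;
}

public static class NullConditionalDemo
{
    // Short form: each ?. stops the chain and yields null if its left side is null.
    public static string ClassShort(FakeNode node) => node?.Attribute("class")?.Value;

    // Long form: the explicit null checks that the short form replaces.
    public static string ClassLong(FakeNode node)
    {
        if (node == null) return null;
        FakeAttribute attr = node.Attribute("class");
        if (attr == null) return null;
        return attr.Value;
    }

    public static void Main()
    {
        FakeNode withClass = new FakeNode().With("class", "sampleclass");
        FakeNode withoutClass = new FakeNode();

        Console.WriteLine(ClassShort(withClass));                 // the class value
        Console.WriteLine(ClassShort(withoutClass) ?? "(null)");  // attribute missing
        Console.WriteLine(ClassShort(null) ?? "(null)");          // node missing
    }
}
```

Both methods return the same result for every input; the short form just says it in one expression.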