
StringParser

Ravi Bhavnani, 15 Jan 2006 (CPOL)
An object that makes it easy to extract information from strings, especially HTML content.

Introduction

StringParser is an object that helps you extract information from a string.  The class is perhaps best suited to parse HTML pages downloaded from the web (see my WebResourceProvider class that helps you do this).  You use StringParser by constructing it with some content (i.e. a string) and using its navigational and extraction methods to extract substrings from the content.  StringParser also provides some static methods designed specifically for parsing HTML.

API

Here are some of the methods provided by StringParser.  Please see the accompanying documentation for an exhaustive list.

Navigational API
  • resetPosition()
  • skipToEndOf()
  • skipToEndOfNoCase()
  • skipToStartOf()
  • skipToStartOfNoCase()

Extraction API
  • extractTo()
  • extractToNoCase()
  • extractUntil()
  • extractUntilNoCase()
  • extractToEnd()

Position query API
  • at()
  • atNoCase()

HTML parsing API
  • getLinks()
  • removeComments()
  • removeEnclosingAnchorTag()
  • removeEnclosingQuotes()
  • removeHtml()
  • removeScripts()
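
To make the navigate-then-extract idea concrete, here is a minimal sketch of a cursor-based parser built only on System.String. It is not the StringParser source: the MiniParser class, its method names, and the choice to leave the cursor just past the delimiter after an extraction are all assumptions made for illustration.

```csharp
using System;

// Hypothetical miniature of a cursor-based parser (NOT the StringParser source).
class MiniParser
{
    private readonly string content;
    private int pos;   // current cursor position within content

    public MiniParser(string content) { this.content = content; }

    // Navigation: move the cursor just past the next occurrence of s.
    public bool SkipToEndOf(string s)
    {
        int i = content.IndexOf(s, pos, StringComparison.Ordinal);
        if (i < 0) return false;      // not found: the cursor stays where it was
        pos = i + s.Length;
        return true;
    }

    // Extraction: copy the text from the cursor up to (not including) s.
    // Assumption: the cursor is then moved past s.
    public bool ExtractTo(string s, ref string extracted)
    {
        int i = content.IndexOf(s, pos, StringComparison.Ordinal);
        if (i < 0) return false;
        extracted = content.Substring(pos, i - pos);
        pos = i + s.Length;
        return true;
    }

    // Position query: does the content at the cursor start with s?
    public bool At(string s)
    {
        return string.CompareOrdinal(content, pos, s, 0, s.Length) == 0;
    }
}

class MiniParserDemo
{
    static void Main()
    {
        string word = "";
        MiniParser p = new MiniParser("key=value;");
        if (p.SkipToEndOf("=") && p.ExtractTo(";", ref word))
            Console.WriteLine(word);   // prints "value"
    }
}
```

Because every method returns a bool and leaves the cursor untouched on failure, calls can be chained with && exactly as in the examples that follow.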

Example 1 - Extracting delimited text

This example shows how to extract text contained between two delimiters. 
  // Extract text between the comma and question mark
  string strExtract = "";
  string str = "Hello Sally, how are you?";
  StringParser p = new StringParser (str);
  // skipToEndOf positions the parser just past the comma, so the comma
  // itself is not part of the extracted text
  if (p.skipToEndOf (",") && p.extractTo ("?", ref strExtract))
     Console.WriteLine ("Extracted text = {0}", strExtract);
  else
     Console.WriteLine ("No text extracted.");

Example 2 - Extracting the nth occurrence of a delimited string

This example shows how to obtain the href attribute of the third anchor tag (<a>) in an HTML string.  The example assumes the string contains valid HTML.
  // Get href attribute of 3rd <a> tag
  string strExtract = "";
  string str = "..."; // HTML
  StringParser p = new StringParser (str);
  // Each skipToEndOfNoCase call moves past one "<a", so three calls
  // leave the parser inside the third anchor tag
  if (p.skipToEndOfNoCase ("<a") &&
      p.skipToEndOfNoCase ("<a") &&
      p.skipToEndOfNoCase ("<a") &&
      p.skipToEndOfNoCase ("href=\"") &&
      p.extractTo ("\"", ref strExtract))
     Console.WriteLine ("Extracted text = {0}", strExtract);
  else
     Console.WriteLine ("No text extracted.");
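
The repeated skip calls above generalize to a loop. The following is a rough sketch of that idea using only String.IndexOf, so it runs without StringParser; the NthOccurrence class and IndexAfterNth helper are hypothetical names introduced here, not part of the library.

```csharp
using System;

static class NthOccurrence
{
    // Returns the index just past the nth (1-based) case-insensitive
    // occurrence of token in text, or -1 if there are fewer than n.
    public static int IndexAfterNth(string text, string token, int n)
    {
        int pos = 0;
        for (int i = 0; i < n; i++)
        {
            int found = text.IndexOf(token, pos, StringComparison.OrdinalIgnoreCase);
            if (found < 0) return -1;
            pos = found + token.Length;   // resume the search after this match
        }
        return pos;
    }

    static void Main()
    {
        string html = "<a href=\"one\">1</a> <A href=\"two\">2</A> <a href=\"three\">3</a>";
        const string attr = "href=\"";

        int pos = IndexAfterNth(html, "<a", 3);
        if (pos >= 0)
        {
            int start = html.IndexOf(attr, pos, StringComparison.OrdinalIgnoreCase);
            int end = start < 0 ? -1 : html.IndexOf('"', start + attr.Length);
            if (end >= 0)
            {
                int from = start + attr.Length;
                Console.WriteLine(html.Substring(from, end - from));   // prints "three"
            }
        }
    }
}
```

Note that a plain search for "<a" also matches tags such as <abbr> or <area>, so this approach (like the example above) is only reliable on markup where that cannot happen.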

Example 3 - Global case-insensitive replacement

This example shows how to case-insensitively replace a string in the parser's content.
  // Replace every occurrence of <td> with <td class="foo">
  string str = "..."; // HTML
  StringParser p = new StringParser (str);
  p.replaceEvery ("<td>", "<td class=\"foo\">");
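
For comparison, the same kind of replacement can be sketched with the standard Regex class. This is not how replaceEvery is implemented (the article does not say); the ReplaceDemo class and ReplaceIgnoreCase helper are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class ReplaceDemo
{
    // Replaces every case-insensitive occurrence of find with replacement.
    // Regex.Escape makes find a literal pattern, and doubling '$' keeps the
    // replacement text from being read as a group reference.
    public static string ReplaceIgnoreCase(string content, string find, string replacement)
    {
        return Regex.Replace(content,
                             Regex.Escape(find),
                             replacement.Replace("$", "$$"),
                             RegexOptions.IgnoreCase);
    }

    static void Main()
    {
        string html = "<TD>a</TD><td>b</td>";
        Console.WriteLine(ReplaceIgnoreCase(html, "<td>", "<td class=\"foo\">"));
        // prints: <td class="foo">a</TD><td class="foo">b</td>
    }
}
```

Only the opening tags are rewritten here; the closing tags keep their original case because they do not match the search string.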

Example 4 - Poor man's web scraping

This example shows how to obtain a stock quote (here, MSFT) from content downloaded from Yahoo Finance.  The example assumes a specific page format, so it stops working if the page's markup changes.
  // Scrape http://finance.yahoo.com/q?s=msft
  string strQuote = "";
  string str = "..."; // HTML downloaded from http://finance.yahoo.com/q?s=msft
  StringParser p = new StringParser (str);
  // Quotes inside the search string must be escaped
  if (p.skipToEndOfNoCase ("Last Trade:</td><td class=\"yfnc_tabledata1\"><big><b>") &&
      p.extractTo ("</b>", ref strQuote))
     Console.WriteLine ("MSFT (delayed) = {0}", strQuote);

Example 5 - Get list of hyperlinked phrases

This example shows how to obtain the list of hyperlinked phrases in HTML content.
  // ArrayList requires: using System.Collections;
  ArrayList phrases = new ArrayList();
  string str = "..."; // HTML content
  StringParser p = new StringParser (str);
  while (p.skipToStartOfNoCase ("<a")) {
    string strPhrase = "";
    // Skip past the end of the opening tag, then extract up to the closing </a>
    if (p.skipToEndOf (">") && p.extractTo ("</a>", ref strPhrase))
       phrases.Add (strPhrase);
  }
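
As a point of comparison, the same list can be collected with a regular expression from the .NET base class library. This is a sketch under the assumption that anchors are not nested and that attribute values contain no ">" character; the LinkPhrases class and its Extract method are hypothetical names, not part of StringParser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkPhrases
{
    // Returns the text between each <a ...> tag and its closing </a>.
    public static List<string> Extract(string html)
    {
        List<string> phrases = new List<string>();
        // \b stops "<a" from matching tags such as <abbr>;
        // Singleline lets a phrase span line breaks.
        MatchCollection matches = Regex.Matches(
            html,
            @"<a\b[^>]*>(.*?)</a>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        foreach (Match m in matches)
            phrases.Add(m.Groups[1].Value);
        return phrases;
    }

    static void Main()
    {
        string html = "<p><a href=\"x\">First</a> and <A HREF=\"y\">Second</A></p>";
        foreach (string phrase in Extract(html))
            Console.WriteLine(phrase);   // prints "First", then "Second"
    }
}
```

Any markup nested inside an anchor (for example <b> tags) is returned as part of the phrase.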

Demo applications

C# applications (with full source code) that use StringParser can be found here:

Revision History

  • 15 Jan 2006
    Initial version.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Ravi Bhavnani
Technical Lead
Canada
Ravi Bhavnani is an ardent fan of Microsoft technologies who loves building Windows apps, especially PIMs, system utilities, and things that go bump on the Internet. During his career, Ravi has developed expert systems, desktop imaging apps, marketing automation software, EDA tools, a platform to help people find, analyze and understand information, trading software for institutional investors and advanced data visualization solutions. He currently works for a company that provides enterprise workforce management solutions to large clients.

His interests include the .NET framework, reasoning systems, financial analysis and algorithmic trading, NLP, HCI and UI design. Ravi holds a BS in Physics and Math and an MS in Computer Science and was a Microsoft MVP (C++ and C# in 2006 and 2007). He is also the co-inventor of 2 patents on software security and generating data visualization dashboards. His claim to fame is that he crafted CodeProject's "joke" forum post icon.

Ravi's biggest fear is that one day he might actually get a life, although the chances of that happening seem extremely remote.

Article Copyright 2006 by Ravi Bhavnani