I have a method that should extract the text between "paragraph" tags, but I am also getting CSS text and JavaScript code.
Here is my code (be kind, I am self-taught).
private static string GetParagraphs(string webPage)
{
    string subWebPage = webPage;
    int subWebPageStartIndex = 0;
    string paragraph = "";
    string paragraphs = "";
    int startIndex = 0;
    int endIndex = 0;
    while (subWebPageStartIndex < webPage.LastIndexOf("</p>"))
    {
        subWebPage = webPage.Substring(subWebPageStartIndex);
        startIndex = subWebPage.IndexOf("<p>") + 3 + subWebPageStartIndex;
        endIndex = subWebPage.IndexOf("</p>") + subWebPageStartIndex;
        paragraph = webPage.Substring(startIndex, endIndex);
        paragraphs = paragraphs + " " + paragraph;
        subWebPageStartIndex = endIndex + 4;
        Debug.WriteLine(paragraph);
    }
    return paragraphs;
}
Maybe you can see where I have messed up.
Thank you for taking the time to read this.
Frazzle: the name says it all
Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
John F. Woods
|
What you have done is BAD. The ideal way to handle this is to use an HTML parser and traverse the DOM to get the text that you need. Look at the HTML Agility Pack project.
If you are sure that you will always have well-formed input, you could easily do this with regular expressions. Here is a working example.
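// Requires: using System.Collections.Generic;
//           using System.Text.RegularExpressions;
public static List<string> GetAllParagraphValues(string input)
{
    List<string> values = new List<string>();
    // \b keeps <pre>, <param> etc. from matching; Singleline lets '.' span line breaks
    // so paragraphs that wrap over several lines are still captured.
    Regex r = new Regex(@"<p\b[^>]*>(?<value>.*?)</p>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);
    foreach (Match match in r.Matches(input))
    {
        // The named group "value" holds the text between the opening and closing tag.
        values.Add(match.Groups["value"].Value);
    }
    return values;
}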
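For comparison, here is a minimal sketch of how the original index-based approach could be repaired. The likely cause of the stray CSS and JavaScript is that String.Substring takes (startIndex, length), so passing the end index as the second argument reads far past the closing tag. The class and method names (ParagraphExtractor, GetParagraphsByIndex) are made up for this illustration, and the sketch still only recognises a bare <p> with no attributes, so treat it as a demonstration of the index arithmetic rather than a robust solution.

```csharp
using System;
using System.Collections.Generic;

public static class ParagraphExtractor
{
    // Hypothetical helper (not from the thread): collects the text between
    // each literal "<p>" and the next "</p>" using plain index arithmetic.
    public static List<string> GetParagraphsByIndex(string webPage)
    {
        var paragraphs = new List<string>();
        int position = 0;

        while (true)
        {
            // Find the next "<p>" at or after the current position.
            int open = webPage.IndexOf("<p>", position, StringComparison.OrdinalIgnoreCase);
            if (open < 0) break;

            int start = open + 3; // skip past "<p>"
            int end = webPage.IndexOf("</p>", start, StringComparison.OrdinalIgnoreCase);
            if (end < 0) break;

            // Substring takes (startIndex, length), so pass end - start, not end.
            paragraphs.Add(webPage.Substring(start, end - start));

            position = end + 4; // continue searching after "</p>"
        }

        return paragraphs;
    }
}
```

Searching from a moving position also avoids re-slicing the page on every pass, and checking for a negative index stops the loop cleanly when no further tag is found.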
Best wishes,
Navaneeth
|
N a v a n e e t h wrote: What you have done is BAD.
I knew this; it looks bad and it did not work.
I am learning slowly, and style will come in time.
N a v a n e e t h wrote: If you are sure that you will always have well-formed input, you could easily do this with regular expressions.
Where can I learn about regular expressions?
Thank you
|
frazzle-me wrote: Where can I learn about regular expressions?
Lots of places; Google would be a good start.
|