I am having trouble scraping this:
HTML
<td class="main txt"><a href="http://bors-nliv.svd.se/index.php/detail/index/4600">Afarak Group</a></td>


I would like to scrape the name of the stock (in this example, "Afarak Group"), but I couldn't figure out how after all my attempts and searching. I have, however, managed to scrape the stock prices with this code:

C#
// Requires: using System.Net; using System.Text.RegularExpressions;
private void button3_Click(object sender, EventArgs e)
{
    List<string> aktier = new List<string>();
    WebClient web = new WebClient();
    string html = web.DownloadString("http://bors-nliv.svd.se/index.php/aktier/index/35244");
    // Capture the trimmed contents of every attribute-less <td> cell.
    MatchCollection m1 = Regex.Matches(html, @"<td>\s*(.+?)\s*</td>", RegexOptions.Singleline);

    foreach (Match m in m1)
    {
        string aktie = m.Groups[1].Value;
        // Skip the cells whose content is "3" or "Aktier".
        if (aktie != "3" && aktie != "Aktier")
        {
            aktier.Add(aktie);
        }
    }
    listBox2.DataSource = aktier;
}



Here the stock price is wrapped in only these two HTML tags:
HTML
<td>0,41</td>
But how do I scrape the stock's name from the page when it looks like this?

HTML
<td class="main txt"><a href="http://bors-nliv.svd.se/index.php/detail/index/4600">Afarak Group</a></td>
It has a couple more HTML tags.

I've tried changing the pattern to this:

C#
MatchCollection m1 = Regex.Matches(html, @"<a href"">\s*(.+?)s*</td>", RegexOptions.Singleline);


But it still doesn't work. What am I missing?

Updated 5-Jul-16 20:25pm

1 solution

Try this using Html Agility Pack.
Add a reference to this DLL in your project (pick the build for the right framework).

C#
// Requires: using System.Linq; using System.Net; plus a reference to HtmlAgilityPack.
List<string> aktier = new List<string>();
WebClient web = new WebClient();
string html = web.DownloadString("http://bors-nliv.svd.se/index.php/detail/index/4600");
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
// First <div> whose class attribute contains "secondary-nr" (First throws if there is none).
var div = doc.DocumentNode.Descendants("div")
    .First(d => d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("secondary-nr"));
// Collect the non-empty text of each child node.
foreach (var node in div.ChildNodes)
{
    string temp = node.InnerText.Trim();
    if (temp.Length > 0)
        aktier.Add(temp);
}
listBox2.DataSource = aktier;
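
If the goal is the stock names rather than the contents of the "secondary-nr" div, a minimal Html Agility Pack sketch could select the link inside each name cell directly. It assumes the names sit in cells shaped exactly like the one in the question, `<td class="main txt"><a href="...">Name</a></td>`, and that the HTML passed in comes from the list page the question downloads rather than the detail page used above. The class `StockNameParser` and its method `ExtractNames` are hypothetical names made up for this illustration.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of Html Agility Pack.
public static class StockNameParser
{
    public static List<string> ExtractNames(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var names = new List<string>();

        // Assumption: every stock name is the text of an <a> that is a direct
        // child of a <td> whose class attribute is exactly "main txt".
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//td[@class='main txt']/a");

        // SelectNodes returns null, not an empty collection, when nothing matches.
        if (links == null)
            return names;

        foreach (HtmlNode link in links)
        {
            // Decode entities such as &amp; and trim surrounding whitespace.
            string name = HtmlEntity.DeEntitize(link.InnerText).Trim();
            if (name.Length > 0)
                names.Add(name);
        }
        return names;
    }
}
```

In the click handler this could be used as `listBox2.DataSource = StockNameParser.ExtractNames(html);`. In a Windows Forms file that also imports `System.Windows.Forms`, the type may need to be written as `HtmlAgilityPack.HtmlDocument`, as in the answer above, to avoid a name clash.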
 
 
Comments
Member 12620371 6-Jul-16 8:49am    
How do I do it without using Html Agility Pack? What's the regex? I can't figure it out.
Karthik_Mahalingam 6-Jul-16 9:50am    
Regex is short for regular expression; it is part of the core library and is used to search strings for a certain pattern.
Html Agility Pack, by contrast, is a third-party library used to parse the HTML/DOM.
Member 12620371 6-Jul-16 8:54am    
I tried your code and it does not work. I'm getting the wrong strings.
Karthik_Mahalingam 6-Jul-16 9:48am    
Which string data do you need exactly?
Please provide more information.
Member 12620371 6-Jul-16 11:55am    
<td class="main txt"><td class="main txt">Afarak Group</td>

I need the string "Afarak Group". Between the link( and the
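
For the regex-only route asked about in the comments, here is a minimal sketch. The attempted pattern `@"<a href"">\s*(.+?)s*</td>"` probably fails for two reasons: inside a verbatim string `""` is a single literal quote, so the pattern looks for the text `<a href">`, which never occurs because the real tag has `=` and the quoted URL between `href` and `>`; and it ends at `</td>` instead of `</a>`. The version below assumes every name cell looks exactly like the one in the question, `<td class="main txt"><a href="...">Name</a></td>`; the class `StockNameExtractor` and its method `ExtractNames` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the regex-only approach.
public static class StockNameExtractor
{
    // <td class="main txt">, optional whitespace, an <a ...> tag with any
    // attributes, then the link text (captured, no '<' allowed) up to </a>.
    private static readonly Regex NameCell = new Regex(
        @"<td\s+class=""main txt"">\s*<a\s[^>]*>\s*([^<]+?)\s*</a>",
        RegexOptions.IgnoreCase);

    public static List<string> ExtractNames(string html)
    {
        var names = new List<string>();
        foreach (Match m in NameCell.Matches(html))
        {
            // Group 1 is the trimmed link text; decode entities such as &amp;.
            names.Add(WebUtility.HtmlDecode(m.Groups[1].Value));
        }
        return names;
    }
}
```

With the `html` string already downloaded in `button3_Click`, this could be wired up as `listBox2.DataSource = StockNameExtractor.ExtractNames(html);`. The pattern is tied to this exact markup, so a change in attribute order or an extra class on the cell would stop it from matching; that fragility is the usual argument for an HTML parser over a regex.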

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


