Comments by theadmin (Top 21 by date)
theadmin
15-Jul-15 9:06am
View
Is this the only way it can be done? Is there not an easier way than custom drawing?
theadmin
15-Jun-15 8:41am
View
Let me try my question in a different way. I am connecting to about 50-60 webpages; using the code above, I am calling every webpage like this:
void ProcessPage1()
{
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
    request.UserAgent = @"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36";
    request.Accept = @"text/html";
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    if (response.StatusCode == HttpStatusCode.OK)
    {
        Stream dataStream = response.GetResponseStream();
        using (StreamReader streamread = new StreamReader(dataStream))
            doc.Load(streamread);
        if (doc != null)
        {
        }
        .............
    }
}

void ProcessPage2()
{
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
    request.UserAgent = @"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36";
    request.Accept = @"text/html";
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    if (response.StatusCode == HttpStatusCode.OK)
    {
        Stream dataStream = response.GetResponseStream();
        using (StreamReader streamread = new StreamReader(dataStream))
            doc.Load(streamread);
        if (doc != null)
        {
        }
        .............
    }
}

void ProcessPage3()
{
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
    request.UserAgent = @"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36";
    request.Accept = @"text/html";
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    if (response.StatusCode == HttpStatusCode.OK)
    {
        Stream dataStream = response.GetResponseStream();
        using (StreamReader streamread = new StreamReader(dataStream))
            doc.Load(streamread);
        if (doc != null)
        {
        }
        .............
    }
}
Hopefully what I am asking makes sense now.
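A common way to avoid repeating that block for 50-60 pages is to move the download into a single helper that takes the URL as a parameter. A minimal sketch, assuming the question is about factoring out the shared code; the method and class names here are mine, not from the original:

```csharp
using System.IO;
using System.Net;

class PageFetcher
{
    // Downloads one page and returns it as a parsed HtmlDocument.
    // Returns null when the server does not answer with 200 OK.
    static HtmlAgilityPack.HtmlDocument LoadPage(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        request.UserAgent = @"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36";
        request.Accept = @"text/html";

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            if (response.StatusCode != HttpStatusCode.OK)
                return null;

            var doc = new HtmlAgilityPack.HtmlDocument();
            using (Stream dataStream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(dataStream))
                doc.Load(reader);
            return doc;
        }
    }
}
```

Each ProcessPageN then shrinks to `var doc = LoadPage(url); if (doc != null) { /* page-specific parsing */ }`, and only the parsing differs per page.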
theadmin
12-Jun-15 8:24am
View
Thank you sir.
theadmin
11-Jun-15 22:18pm
View
Is there any way to make this case insensitive? I noticed that if the string is in lowercase my returned value is null.
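Assuming the null comes from a case-sensitive XPath text match: XPath 1.0 (which HtmlAgilityPack uses) has no lower-case() function, but the usual workaround is translate(). A sketch with made-up markup and a made-up search term:

```csharp
using HtmlAgilityPack;

class CaseInsensitiveMatch
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<table><tr><td>IN STOCK</td></tr></table>");

        // XPath 1.0 has no lower-case(), so fold the node text to
        // lower case with translate() before comparing.
        string xpath =
            "//td[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'," +
            " 'abcdefghijklmnopqrstuvwxyz'), 'in stock')]";

        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        System.Console.WriteLine(node != null ? node.InnerText : "no match");
    }
}
```

The same translate() trick works inside any predicate, so it can be dropped into an existing query without restructuring it.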
theadmin
7-Jun-15 14:51pm
View
Thanks...
theadmin
7-Jun-15 11:00am
View
I ended up doing the following and it seemed to work.

string instock = string.Empty;
foreach (HtmlNode node in row.SelectNodes("./td[3]/text()"))
{
    instock = node.InnerText;
    if (instock.IndexOf("nbsp;") > -1)
    {
        instock = instock.Replace("nbsp;", "");
        break;
    }
}

I don't know if this is the best way, but it's working.
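An alternative worth knowing about: HtmlAgilityPack ships a helper, HtmlEntity.DeEntitize, that converts entities such as &amp;nbsp; into their plain characters, which avoids matching on the entity text by hand. A sketch; the sample HTML here is made up:

```csharp
using System;
using HtmlAgilityPack;

class DeEntitizeExample
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<table><tr><td>a</td><td>b</td><td>12&nbsp;</td></tr></table>");

        foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//td[3]/text()"))
        {
            // DeEntitize turns &nbsp; into a real non-breaking space,
            // which Trim() can then strip along with ordinary spaces.
            string instock = HtmlEntity.DeEntitize(node.InnerText).Trim('\u00A0', ' ');
            Console.WriteLine(instock);
        }
    }
}
```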
theadmin
3-Jun-15 13:53pm
View
Mario,
Thanks again for helping me out with the other stuff. I am really chugging along after you gave me all those examples. This question pertains to the same code: I am trying to find the manufacturers of the powders. I am trying to figure out the best way to find that info, since there will be a line with the manufacturer info contained within the string once it's pulled from the HTML.
I used the states as an example since it's pretty much exactly what I will be doing when I get the string. I turn to you folks as I am no coder (only in my spare time) and am looking for guidance on how to actually perform the task at hand. It doesn't have to be a switch; I just thought of a switch since it seemed so easy. Just looking for some guidance on coding better from the pros.
Thanks guys...
theadmin
26-May-15 10:27am
View
Thanks for pointing me to that tool, Mario; I have been looking for something to simplify finding the correct path.
The first solution definitely would have been impossible since it's not the standard way of getting the results. Hopefully I can get a bit further with the tool and not have to bother you too much.
Thanks again for the help.
theadmin
25-May-15 13:31pm
View
Thanks Mario for working with me on this. I am going to continue the conversation here since others will benefit from it also.
When you posted the code I thought I had it all figured out; little did I realize that it was going to be a LOT more difficult than that.
On this page: http://store.thirdgenerationshootingsupply.com/browse.cfm/2,3612.html I use Chrome to get the XPath, to make things a lot easier for me when dealing with this XPath issue. Once I had figured out how you got the code working on the first link I posted, I did pretty much the same thing: I got the path for the entire grid of products, then went back one position to get the other values as I go down the tree.
// path to the entire table
/html/body/table/tbody/tr[4]/td/table/tbody/tr/td[2]/table
// this is what I used
/html/body/table/tr[4]/td/table/tr/td[2]/table/tr
// then to get the values under my first selection while traversing the tree I would use:
string product = listItem.SelectSingleNode("./td[1]/a").InnerText; // (the debugger stops the program here)
I have no idea how to get the values that I need under these nodes, since everything I am doing is crashing. Basically, in all of this I am just trying to get the following info on every page I am scraping.
// product link (./td[1]/a, //td[2]/a, and ./td[2]/b all crash the debugger)
RED DOT 1 LB
<br>
// item # (difficult to get because of location)
Item #:
ALLREDDOT1LB
<br>
// qty in stock
Qty In Stock:
0
<br>
// price
<table class="itemPriceTable" cellpadding=0 cellspacing=0 border=0><tr class="itemSellPriceRow"><td class="bodyTextSmallBold"><span class="itemSellPriceLabel">Your Price: </span></td><td class="bodyTextSmall"><span class="itemSellPrice">$18.59</span></td></tr></table>
Hopefully the picture can explain this a lot better than I can.
-----------------------------------------------------------------------------------------------------------------------------------
Here is another scenario: this other page uses the same code that you posted the first time.
https://www.americanreloading.com/en/31-gunpowder
//*[@id="product_list"]/li
When I tried to duplicate the code, nothing worked. There are some values that are more descriptive than others, and it's better to get those values instead; unfortunately I couldn't get past go.
When you go back up the tree like this:
./a (how do you get the href value or the title value??)
I have Google searched for many hours and read many tutorials on XPath, and I am still not getting anywhere efficiently.
I was going to post two pics with the post, but it looks like that's not possible.
Thanks again for the help.
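On the "how do you get the href value or the title value??" part: attributes are not part of InnerText; HtmlAgilityPack exposes them through GetAttributeValue. A sketch against made-up markup shaped like a product list; the crashes described above are most likely NullReferenceExceptions from SelectSingleNode returning null when a path does not match, so the null checks matter:

```csharp
using System;
using HtmlAgilityPack;

class AttributeExample
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<ul id=\"product_list\">" +
            "<li><a href=\"/p/1\" title=\"Red Dot 1 lb\">Red Dot</a></li>" +
            "</ul>");

        foreach (HtmlNode li in doc.DocumentNode.SelectNodes("//*[@id='product_list']/li"))
        {
            HtmlNode link = li.SelectSingleNode("./a");
            if (link == null) continue; // SelectSingleNode returns null on no match

            // The second argument is the default returned when the attribute is absent.
            string href  = link.GetAttributeValue("href", "");
            string title = link.GetAttributeValue("title", "");
            Console.WriteLine(href + " | " + title);
        }
    }
}
```

One related gotcha: Chrome inserts tbody elements into its copied paths even when they are not in the raw HTML, which is likely why the path without tbody worked above; dropping tbody from Chrome-generated XPath is usually necessary with HtmlAgilityPack.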
theadmin
24-May-15 13:02pm
View
Mario, how do I get in touch with you? I looked at how you came to the answer, but it's so different when looking at other pages. This stuff is so confusing, especially when you are not a software developer.
theadmin
24-May-15 0:12am
View
Thanks Mario. I figured I would post the question in its own thread instead of combining it, since it was not related.
theadmin
17-May-15 14:03pm
View
Why do I have such a search string? I don't make up web pages and tell others what to put on their page; I am trying to grab information off the page. When I add the XPath in directly, all I get is the text "Item #:".
I am doing some IndexOf() calls in order to get the info I need. It seems you have to be very creative when dealing with these parsers; they do not do everything that you think they do.
theadmin
15-May-15 7:59am
View
I understand what you are saying. I thought once I understood how it works in a much simpler file, I would know what to do in a more complex file. That really backfired, and I am still lost. Can you show me how to get the values from this site, if you don't mind? http://www.butchsreloading.com/shop/35-powder?id_category=35&n=50 Start with this product (Hodgdon Powder, H414, 1lbs) and get the product, price, and availability. Using Chrome I get the following. I will be getting all the products available on the page.
//*[@id="product_list"]/li[1]/div[2]/h3/a
//*[@id="product_list"]/li[1]/div[3]/div/span[1]
//*[@id="product_list"]/li[1]/div[3]/div/span[2]
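A hedged sketch of how those three Chrome paths can be turned into one loop: select every li under #product_list once, then run the tail of each path relative to the item. The HTML here is a simplified stand-in for the real page, so the inner paths may need adjusting:

```csharp
using System;
using HtmlAgilityPack;

class ProductListExample
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<ul id=\"product_list\">" +
            "<li><div></div><div><h3><a>Hodgdon H414 1lb</a></h3></div>" +
            "<div><div><span>$28.99</span><span>In Stock</span></div></div></li>" +
            "</ul>");

        // One query for the repeating items...
        foreach (HtmlNode li in doc.DocumentNode.SelectNodes("//*[@id='product_list']/li"))
        {
            // ...then the tail of each Chrome path, relative to the item
            // (note the leading "." that anchors the path at li).
            HtmlNode nameNode  = li.SelectSingleNode("./div[2]/h3/a");
            HtmlNode priceNode = li.SelectSingleNode("./div[3]/div/span[1]");
            HtmlNode availNode = li.SelectSingleNode("./div[3]/div/span[2]");

            if (nameNode == null || priceNode == null || availNode == null)
                continue; // this item has a different layout; skip rather than crash

            Console.WriteLine(nameNode.InnerText + " | " +
                              priceNode.InnerText + " | " + availNode.InnerText);
        }
    }
}
```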
theadmin
14-May-15 22:29pm
View
Thanks, I just got it working. As I moved to my more complex html file none of this worked for me anymore since the file was so different. I appreciate all your help.
theadmin
14-May-15 18:14pm
View
Is there another namespace that I need to add besides using System.Windows.Forms;? Somehow I can't even add this first line without the compiler warning me that it doesn't know what it is.
HtmlDocument html = new HtmlDocument();
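For what it's worth, the usual causes of that error are a missing reference to the HtmlAgilityPack assembly, a missing using directive, or a name clash: System.Windows.Forms also defines an HtmlDocument (with no public constructor), so an unqualified HtmlDocument can be ambiguous or resolve to the wrong type. A using-alias handles the clash; this is a sketch, assuming the HtmlAgilityPack reference itself is already in place:

```csharp
using System.Windows.Forms;                        // also defines an HtmlDocument
using HtmlDocument = HtmlAgilityPack.HtmlDocument; // alias picks the Agility Pack one

class AliasExample
{
    static void Main()
    {
        // Unambiguous thanks to the alias above; without it the compiler
        // may pick (or complain about) System.Windows.Forms.HtmlDocument.
        HtmlDocument html = new HtmlDocument();
        html.LoadHtml("<p>hello</p>");
        System.Console.WriteLine(html.DocumentNode.InnerText);
    }
}
```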
theadmin
14-May-15 18:11pm
View
Thank you very much for that, I really need to brush up on this XPath stuff.
theadmin
14-May-15 18:10pm
View
I just found out that this isn't going to work with the html files. I was just using that file as a sample to test.
theadmin
14-May-15 18:10pm
View
Well, here is the story. I was manually parsing webpages for my values, and someone asked me why I was parsing by hand instead of just using HtmlAgilityPack. I decided to give it a shot and am trying to learn how to use it; now you present an even easier way. I am like 100% more confused than I was before, but I do like your solution and am definitely going to use it.
theadmin
14-May-15 16:27pm
View
Hi Mathi,
I am just using that file as an example since it's simple; once I grab a real webpage it's going to be 1000 times more complex. I chose this bit of code as a test so I can really understand how it's done.
The code you pasted will return all values under each child; I am trying to figure out how to retrieve each value individually in order to add it to a ListView.
Listview.Text = Title;
Listview.Subitems.Add(Price);
Listview.Subitems.Add(Availability);
It seems like the Agility Pack is my best bet when it comes to parsing HTML. I am just trying to learn the library in order to understand what I am doing. I don't know if all of this can be achieved using what you posted, but I really want to use the Agility Pack and get a good grip on it.
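The ListView sketch above is close; in WinForms the per-row type is ListViewItem, and the subitems hang off the item rather than the control. A hedged version, where title, price, and availability stand in for the strings parsed out of the HTML:

```csharp
using System.Windows.Forms;

class ListViewSketch
{
    static void AddRow(ListView listView, string title, string price, string availability)
    {
        // The first column's text comes from the ListViewItem itself...
        var item = new ListViewItem(title);
        // ...and the remaining columns are SubItems, added in column order.
        item.SubItems.Add(price);
        item.SubItems.Add(availability);
        listView.Items.Add(item);
    }
}
```

This assumes the ListView is in Details view with three columns configured; with fewer columns the extra subitems simply are not shown.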
theadmin
14-May-15 16:26pm
View
Are you serious? That's it???? Let me give it a shot, if that works then I am set.