See more: C#
I want to do screen scrapping. Currently I am using WebClient in C# to get the page source of a web page, but the problem is that I need to press some buttons in order to get the proper data. I can't use Selenium because it drives a web browser like Firefox, which has an interface visible to the end user.
The major problem is that I want to hide the activity performed by Selenium, or any other third-party component, from the end user during screen scrapping.
Can you suggest an approach?
Posted 17-Feb-13 23:38pm
Comments
Marco Bertschi at 18-Feb-13 8:54am
   
Why do you want to do this? Why should the user not know that his screen is being scrapped?
Sandeep Mewara at 18-Feb-13 9:26am
   
Why should the user not know that his screen is being scrapped? - Exactly. Hiding things from user puts application in suspect category.
Manfred R. Bihy at 18-Feb-13 10:22am
   
I don't think there is anything suspicious going on. I think OP wants Web Page Scraping, which OP tried via Selenium, but with the drawback of having to deal with its user interface. :)
Sandeep Mewara at 18-Feb-13 10:26am
   
Yep, possible. Thus, it was just the comment and nothing else. :) A second opinion at times helps. Thanks.
Sergey Alexandrovich Kryukov at 18-Feb-13 11:29am
   
Of course, there is nothing suspect. This would be just a crime. —SA
Marco Bertschi at 18-Feb-13 11:58am
   
I was not sure about the suspicious touch of the question. That is why I asked :D. As Sandeep said - a second opinion is always helpful.
Manfred R. Bihy at 18-Feb-13 10:12am
   
I believe that screen scrapping will always be detected by the user. At the latest when the user looks up, tries to work out where the fook his monitor went and finds that it has just been scrapped (as in: throw somebody/something on the scrap heap). What OP seems to be talking about is more like screen scraping, and then not even quite that, but rather page or Web scraping. Cheers!
Sm.Abdullah at 19-Feb-13 13:14pm
   
@Marco Bertschi, @Sandeep Mewara: First of all, let me make clear that I am talking about web page scraping. The reason for hiding it from the end user is so that the user cannot interfere or interact with my scraping technique. I want to fetch data from a web page and display it to the end user in my own way. Another problem with Selenium is that it requires setting up a server or Java machine and Firefox, which I just want to avoid. Of course, I am working on a desktop application!
Sergey Alexandrovich Kryukov at 18-Feb-13 10:18am
   
Screen scraping or Web scraping? What's the problem of hiding? From what? —SA
Manfred R. Bihy at 18-Feb-13 10:33am
   
I believe OP tried to do the Web/page scraping with Selenium, but OP doesn't want to expose the user to Selenium's user interface. I think he's after something like HTML Agility Pack and the like. The only problem there might be that scraping pages whose content is only loaded and displayed after some JavaScript executes is a bit more cumbersome.
Sergey Alexandrovich Kryukov at 18-Feb-13 11:30am
   
I would always prefer to hear what OP says... bad question, anyway, not sure it should be answered... —SA

1 solution


Solution 1

Since you mentioned Selenium and WebClient, I'll assume you were not talking about screen scraping (note that there is only one p in that word) but about web page scraping. Selenium is a tool that will do that, but obviously one with a user interface. Since you have not stated your ultimate goal, I can't tell whether you really need Selenium. Web/page scraping can be done quite easily in code with the Html Agility Pack. This is a great, free implementation which I have used myself, and quite a few of our members use it as well.
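
As a rough illustration of the Html Agility Pack route, here is a minimal sketch that downloads a page with WebClient and lists its links. It assumes the HtmlAgilityPack package is referenced; the URL and the XPath query are placeholders chosen for the example, not something taken from the question.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

static class AgilityPackSketch
{
    static void Main()
    {
        // Placeholder URL, used for illustration only.
        const string url = "http://example.com/";

        string html;
        using (var client = new WebClient())
        {
            // Fetch the raw page source, as the question already does.
            html = client.DownloadString(url);
        }

        // Parse the source into a DOM that can be queried.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null)
        {
            Console.WriteLine("No links found.");
            return;
        }

        foreach (HtmlNode link in links)
        {
            string target = link.GetAttributeValue("href", string.Empty);
            Console.WriteLine("{0} -> {1}", link.InnerText.Trim(), target);
        }
    }
}
```

Note that this only sees the HTML the server sends; anything the page builds later through JavaScript will not be in the parsed document.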
 
If the pages you are scraping rely on JavaScript to produce the data you want, you'll probably need to use a hidden WebBrowser control to fully load the page in the background and then operate on the content once it has been properly loaded.
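
A hidden WebBrowser control could be set up roughly as follows. This is only a sketch under a few assumptions: the project references System.Windows.Forms, the URL is a placeholder, and waiting for DocumentCompleted is enough for the page in question (content that scripts load later may need additional waiting).

```csharp
using System;
using System.Windows.Forms;

static class HiddenBrowserSketch
{
    [STAThread] // The WebBrowser control requires a single-threaded apartment.
    static void Main()
    {
        // The control is never placed on a visible form, so the user sees nothing.
        var browser = new WebBrowser { ScriptErrorsSuppressed = true };

        browser.DocumentCompleted += (sender, e) =>
        {
            // DocumentCompleted can fire once per frame; wait until the whole page is ready.
            if (browser.ReadyState != WebBrowserReadyState.Complete)
                return;

            // Work on the loaded DOM rather than on the raw page source.
            HtmlElementCollection inputs = browser.Document.GetElementsByTagName("input");
            Console.WriteLine("Found {0} input element(s).", inputs.Count);

            HtmlElement body = browser.Document.Body;
            if (body != null)
                Console.WriteLine("Body HTML length: {0}", body.OuterHtml.Length);

            // Stop the message loop started below.
            Application.ExitThread();
        };

        // Placeholder URL, used for illustration only.
        browser.Navigate("http://example.com/");

        // The control needs a running message loop to load and raise events.
        Application.Run();
    }
}
```

In a desktop application that already has a message loop, the same handler can be attached to a WebBrowser instance that is simply never shown, without the Application.Run and Application.ExitThread calls.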
 
Regards,
 
— Manfred
Comments
Sandeep Mewara at 18-Feb-13 11:15am
   
My 5 for the answer and probably understanding the question correctly. :)
Manfred R. Bihy at 18-Feb-13 11:18am
   
Thank you Sandeep!
Marco Bertschi at 18-Feb-13 11:59am
   
My 5 for providing a useful answer and showing a workaround on the "Hide me from user"-thing.
Sm.Abdullah at 19-Feb-13 13:34pm
   
Manfred R. Bihy, thanks for your reply. I used a hidden browser control too, but I found some strange behavior. Please take a look at this rough piece of code.

// This works fine for me (suppose the first input is the desired field).
HtmlElementCollection inputs = browser.Document.GetElementsByTagName("input");
inputs[0].InvokeMember("click");

// The same code fails against a div (suppose the first div is the desired element).
HtmlElementCollection divs = browser.Document.GetElementsByTagName("div");
divs[0].InvokeMember("click");

The div version shows me nothing. If I click on the identified div with the mouse, it shows a pop-up lightbox. I am sure that I am calling InvokeMember on the right div. Can you suggest what might be going wrong?
Manfred R. Bihy at 19-Feb-13 13:59pm
   
Are you sure your div is the first one in the collection returned by GetElementsByTagName? How would you know? Just being curious. ;)
Sm.Abdullah at 19-Feb-13 14:12pm
   
No, it is not actually the first one. I scan all the divs and match on the inner text. Say a div has the inner text "download": I scan all the divs and pick the one with that inner text. In debug mode I also checked the parent and siblings of the selected HtmlElement (the div with the inner text "download" in this case), just to ensure that I had picked the right element, and found that the HtmlElement I call InvokeMember on was the right div. (In the page source I found nothing like an onclick attribute to execute JavaScript code; I think jQuery is used.)

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 18 Feb 2013
Copyright © CodeProject, 1999-2014