See more: C# .NET IP
Hi all,
 
I developed a web scraper (using C#) that should be able to make thousands of requests each time.
 
The problem is that the website's server will block my IP after a number of requests.
 
Questions:
 
1- How can I prevent being blocked?
2- How can I know when the website's server will block my IP? I mean, how can I find out my limit, whether it is a certain amount of traffic or a certain number of requests?
 
Thanks.
Posted 10-May-11 2:31am
Yas_EG171
Edited 10-May-11 2:38am

Solution 2

There's no reliable way to do this. The one option would be to limit the scraping to a very slow rate, which kinda nullifies the very purpose of scraping.
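
For what it's worth, here is a minimal sketch of what "a very slow rate" could look like in C#. The `Throttle` and `SlowScraper` names are made up for illustration, and the right gap between requests is an assumption you would have to tune per site; nothing here is known to match any particular server's limit.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: enforces a minimum gap between consecutive requests.
public sealed class Throttle
{
    private readonly TimeSpan _minGap;
    private DateTime _nextAllowed = DateTime.MinValue;

    public Throttle(TimeSpan minGap)
    {
        _minGap = minGap;
    }

    // Returns how long the caller should wait at time 'now'
    // and reserves the next free slot for that caller.
    public TimeSpan Reserve(DateTime now)
    {
        DateTime start = now > _nextAllowed ? now : _nextAllowed;
        _nextAllowed = start + _minGap;
        return start - now;
    }
}

public static class SlowScraper
{
    // Fetches the URLs one at a time, never faster than the throttle allows.
    public static async Task FetchAllAsync(HttpClient client, string[] urls, Throttle throttle)
    {
        foreach (string url in urls)
        {
            await Task.Delay(throttle.Reserve(DateTime.UtcNow));
            string html = await client.GetStringAsync(url);
            Console.WriteLine($"{url}: {html.Length} chars");
        }
    }
}
```

With `new Throttle(TimeSpan.FromSeconds(2))`, the first request goes out immediately and each later one waits until at least two seconds after the previous slot.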
 
Alternatively, spread the scraping out over multiple domains. For example, pick 100 domains, get 1 page from domain-1, then the next from domain-2, and so on till domain-100, then get the 2nd page from domain-1, then from domain-2, and so on. The trick here is that this artificially slows down your scraping to 1/100 of its former speed (from each server's perspective), but you don't actually lose out on overall scraping speed because you are scraping from multiple sites. Makes sense?
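
The ordering described above can be sketched like this. It's only an illustration: `RoundRobin.Interleave` is a made-up helper, and it assumes you already have one list of page URLs per domain.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RoundRobin
{
    // Interleaves per-domain URL lists: the first page of every domain,
    // then the second page of every domain, and so on. Domains that run
    // out of pages are simply skipped in later rounds.
    public static IEnumerable<string> Interleave(string[][] perDomain)
    {
        int longest = perDomain.Length == 0 ? 0 : perDomain.Max(pages => pages.Length);
        for (int round = 0; round < longest; round++)
        {
            foreach (string[] pages in perDomain)
            {
                if (round < pages.Length)
                {
                    yield return pages[round];
                }
            }
        }
    }
}
```

For three hypothetical domains with pages `{a1, a2, a3}`, `{b1}` and `{c1, c2}`, the resulting fetch order is `a1, b1, c1, a2, c2, a3`, so no single server sees two consecutive requests while other domains still have pages left.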
Comments
BobJanova at 10-May-11 9:58am
   
This is a good idea. The OP needs to realise that his scraper is essentially a low-level DoS tool and modify it accordingly. Spreading his efforts over multiple servers would do that quite effectively.

Solution 1

1) That's very easy. Don't do anything that might lead to being considered a threat. Bombarding a server with 'thousands of requests each time' could be considered to be hostile.
 
2) Now that's a good idea. Each server should post information on how much abuse its owner will tolerate. Seriously, it's very much like when you misbehave at somebody's house. When the owner grabs you by the collar and shows you the door, then you know how far you could go.

Solution 3

You can't get blocked if you are able to do it my way.
 
Use an ADSL modem such as an AirTies, have your server use that internet connection, and send the modem a reset command on a schedule. On a connection with a dynamic IP, the modem will usually come back up with a different address after each reset.
 
That works like a charm. :)

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 9 Feb 2012
Copyright © CodeProject, 1999-2014
All Rights Reserved.