Hi all,

I developed a web scraper (using C#) that should be able to make thousands of requests each time.

The problem is that the website's server blocks my IP after a certain number of requests.

1- How can I prevent being blocked?
2- How can I know when the website's server will block my IP? That is, how can I find out my limit, whether it is a certain amount of traffic or a certain number of requests?

Posted 10-May-11 1:31am
Updated 10-May-11 1:38am

Solution 2

There's no way to guarantee this. The one way would be to limit the scraping to a very slow rate, which kind of nullifies the very purpose of scraping.
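A minimal sketch of the "very slow rate" idea (in Python for brevity; the logic maps directly onto C#): enforce a fixed minimum gap between consecutive requests to a server. `Throttle` and `min_interval` are illustrative names, and the right interval is an assumption you have to guess, since the site doesn't publish it. The clock and sleep functions are injectable only so the logic can be tested without real waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum gap (in seconds) between consecutive requests."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock            # injectable so the logic can be tested
        self._sleep = sleep
        self._last: Optional[float] = None  # send time of the previous request

    def wait(self) -> float:
        """Sleep until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay       # the request is sent right after this returns
        return delay
```

Call `throttle.wait()` immediately before each request. With `min_interval=2.0`, the scraper never sends more than 30 requests per minute to that server.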

Alternatively, spread the scraping out across multiple domains. For example, pick 100 domains: get 1 page from domain-1, then the next from domain-2, and so on up to domain-100, then get the 2nd page from domain-1, then from domain-2, and so on. The trick here is that, from each server's perspective, this artificially slows your scraping to 1/100 of its former rate, but you don't actually lose any overall scraping speed because you are scraping many sites at once. Makes sense?
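The interleaving described above can be sketched as one queue of pages per domain, taking one page from each domain in turn (Python for brevity; `round_robin` and the input layout are hypothetical). Domains that run out of pages simply drop out of the rotation. It relies on Python 3.7+ dicts keeping insertion order, so domains are visited in the order you list them.

```python
from collections import deque
from typing import Dict, Iterator, List, Tuple


def round_robin(pages_by_domain: Dict[str, List[str]]) -> Iterator[Tuple[str, str]]:
    """Yield (domain, url) pairs, one page from each domain in turn.

    Page 1 of domain-1, page 1 of domain-2, ..., page 1 of domain-N,
    then page 2 of each domain, and so on.
    """
    queues = deque((domain, deque(urls))
                   for domain, urls in pages_by_domain.items() if urls)
    while queues:
        domain, urls = queues.popleft()
        yield domain, urls.popleft()
        if urls:  # this domain still has pages: send it to the back of the line
            queues.append((domain, urls))
```

With 100 domains of similar size, consecutive requests to any single server end up about 100 requests apart. If one domain has far more pages than the rest, the tail of the run hits only that domain, so a per-server delay is still worth keeping.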
BobJanova 10-May-11 9:58am
This is a good idea. The OP needs to realise that his scraper is essentially a low-level DoS tool and modify it accordingly. Spreading his efforts over multiple servers would do that quite effectively.

Solution 1

1) That's very easy. Don't do anything that might get you considered a threat. Bombarding a server with 'thousands of requests each time' could be considered hostile.

2) Now that's a good idea. Each server should post information on how much abuse its owner will tolerate. Seriously, it's very much like misbehaving at somebody's house: when the owner grabs you by the collar and shows you the door, then you know how far you could go.
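Some servers do give concrete signals for both points. A few sites post their tolerance in robots.txt through the non-standard `Crawl-delay` and `Request-rate` lines (RFC 9309 standardizes neither, and many sites omit them), and a server that rate-limits often warns you before blocking outright with HTTP 429 Too Many Requests (RFC 6585) or 503 Service Unavailable, optionally with a `Retry-After` header holding either seconds or an HTTP date. Below is a minimal Python sketch of reading both signals; `polite_delay` and `retry_delay` are hypothetical helper names, and the `fallback`, `base` and `cap` values are assumptions to tune. Note that Python's `RobotFileParser` ignores non-integer `Crawl-delay` values.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Iterable, Optional
from urllib.robotparser import RobotFileParser


def polite_delay(robots_lines: Iterable[str], user_agent: str,
                 fallback: float = 1.0) -> float:
    """Seconds to wait between requests, using robots.txt hints if present."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    rp.modified()  # record a fetch time; crawl_delay()/request_rate() return None without one
    delays = [fallback]
    crawl_delay = rp.crawl_delay(user_agent)   # "Crawl-delay: 10" -> 10
    if crawl_delay is not None:
        delays.append(float(crawl_delay))
    rate = rp.request_rate(user_agent)         # "Request-rate: 1/5" -> requests=1, seconds=5
    if rate is not None and rate.requests > 0:
        delays.append(rate.seconds / rate.requests)
    return max(delays)


def retry_delay(status: int, retry_after: Optional[str], attempt: int,
                base: float = 1.0, cap: float = 300.0,
                now: Optional[datetime] = None) -> Optional[float]:
    """Seconds to back off after a response, or None if no back-off is needed.

    429 and 503 are treated as "slow down" signals. A Retry-After header
    wins when it parses; otherwise use capped exponential backoff with jitter.
    """
    if status not in (429, 503):
        return None
    if retry_after:
        value = retry_after.strip()
        if value.isdigit():                    # delay-seconds form: "120"
            return float(value)
        try:                                   # HTTP-date form
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):        # unparseable header: ignore it
            when = None
        if when is not None:
            if when.tzinfo is None:            # treat a naive date as UTC
                when = when.replace(tzinfo=timezone.utc)
            now = now or datetime.now(timezone.utc)
            return max(0.0, (when - now).total_seconds())
    # No usable header: about 1 s, 2 s, 4 s, ... scaled by 0.5-1.0 jitter, capped.
    # The exponent is clamped so huge attempt counts cannot overflow a float.
    return min(cap, base * 2 ** min(attempt, 30)) * random.uniform(0.5, 1.0)
```

Fetch `/robots.txt` once per domain and wait `polite_delay(...)` seconds between requests; on a 429 or 503, wait `retry_delay(...)` seconds and increment `attempt` before retrying. None of this guarantees you won't be blocked, since many servers block silently. In C#, the same header is exposed as `response.Headers.RetryAfter` on an `HttpResponseMessage`.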

Solution 3

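It's not possible to get blocked if you are able to use my approach.

Use an ADSL modem like Airties, let your server use that internet connection, and send the modem a reset command on a schedule. Assuming your ISP assigns dynamic IPs, each reconnect gives you a new public IP address.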

That works like a charm. :)

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 30 Dec 2015
Copyright © CodeProject, 1999-2017
All Rights Reserved.