Hi, I am coding a web crawler that will crawl websites and selectively parse different sections of each site.

I am a .NET developer, so the obvious choice was to write it in .NET, but it was very slow at downloading and parsing the HTML pages.

Then I tried just downloading the contents, first using .NET and then using Python on the same domains, and Python was impressively fast at downloading. I have the downloading working in Python, but the later part is not that easy to code in Python, and I obviously don't want to do it there.

The same batch of domains that took 100 seconds in Python took 20 minutes in the .NET-based crawler.

I tried a download, and it took 10 seconds in Python, while the same download took 2 minutes in the .NET crawler.

Does anyone have any idea why this is slow in .NET but fast in Python?
Posted 11-Feb-11 19:14pm
Updated 11-Feb-11 19:41pm
SAKryukov 12-Feb-11 1:18am
We would need to see your code. First of all, did you run the .NET crawler and the Python one on the same system? I mean, you can use Python on the server side (the WSGI module, highly recommended), but you did not tag ASP.NET.
eqlit 12-Feb-11 1:25am
Sorry, I cannot publish the code because of company policy, you know :)
Yes, I ran the downloads on the same system. The code did not include anything except HTTP download and queue management, and I built it as a console application for the purpose.
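
Since the actual code was not posted, here is only a minimal sketch of what "HTTP download and queue management" with timing might look like on the Python side. The names `fetch` and `crawl`, the `fetch_fn` parameter, and the default of 8 worker threads are all assumptions made for illustration; they are not taken from the poster's crawler.

```python
import queue
import threading
import time
import urllib.request


def fetch(url, timeout=10):
    """Download one URL and return the raw response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def crawl(urls, fetch_fn=fetch, workers=8):
    """Download every URL using a shared work queue and a pool of threads.

    Returns (results, elapsed_seconds). results maps each URL to its body,
    or to the exception that was raised while fetching it.
    The worker count is an illustrative assumption, not a recommendation.
    """
    pending = queue.Queue()
    for url in urls:
        pending.put(url)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = pending.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            try:
                body = fetch_fn(url)
            except Exception as exc:  # record the failure and keep going
                body = exc
            with lock:
                results[url] = body

    started = time.perf_counter()
    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, time.perf_counter() - started
```

Calling `pages, seconds = crawl(batch_of_urls)` returns the downloaded bodies together with the wall-clock time for the batch. For a fair comparison, the .NET console application would need to be timed on the same batch with the same number of simultaneous downloads.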

1 solution


Solution 1

Have you tried using IronPython to embed the Python code in your .NET application? That should allow it to download the pages at the speed you saw in Python. In my opinion, Python is faster here because it downloads pages in the form of tuples of byte strings, whereas .NET may be downloading the HTML code as it is.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Copyright © CodeProject, 1999-2017