Hi, I am coding a web crawler that will crawl websites and selectively parse different sections of each site.
 
I am a .NET developer, so the obvious choice was to write it in .NET, but it was very slow, and that time included both downloading and parsing the HTML pages.
 
Then I tried just downloading the contents first using .NET, and then the same domains using Python, and Python was impressively fast at downloading. I have the downloading working in Python, but the later part is not that easy to code in Python, so I obviously don't want to do it there.
 
The same batch of domains that took 100 seconds in Python took 20 minutes in the .NET-based crawler.
 
I tried downloading http://www.regexhacks.com/ and it took 10 seconds in Python, while the same download took 2 minutes in the .NET crawler.
 
Does anyone have any idea why this is slow in .NET but fast in Python?
Posted 11-Feb-11 20:14pm
eqlit108
Edited 11-Feb-11 20:41pm
v3
Comments
SAKryukov at 12-Feb-11 1:18am
   
We need to see your code. First of all, did you run the crawler and the Python script on the same system? I mean, you can use Python on the server side (the WSGI module, highly recommended), but you did not tag ASP.NET.
--SA
eqlit at 12-Feb-11 1:25am
   
Sorry, I cannot publish the code; company policy, you know :)
But yes, I ran the downloads on the same system. The code did not include anything except HTTP download and queue management, and I made a console application for the purpose.
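
Since the actual code was not posted, here is only a rough sketch of what a download-only test with simple queue management might look like on the Python side, so the two implementations can be timed on equal terms. It is not the poster's code. The names `fetch` and `download_batch`, the worker count, and the timeout are all assumptions made for illustration; only the Python standard library is used.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=10):
    """Download one URL and return the raw response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_batch(urls, fetcher=fetch, workers=8):
    """Download every URL using a pool of worker threads.

    Returns (results, elapsed_seconds). results maps each URL to its body,
    or to the exception that was raised while fetching it.
    """
    def safe_fetch(url):
        try:
            return fetcher(url)
        except Exception as exc:  # one failing site should not stop the batch
            return exc

    start = time.perf_counter()
    # The executor's internal work queue plays the role of the crawl queue here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        bodies = list(pool.map(safe_fetch, urls))
    elapsed = time.perf_counter() - start
    return dict(zip(urls, bodies)), elapsed
```

Calling `download_batch(["http://www.regexhacks.com/"])` returns the page bodies together with the elapsed time in seconds. Timing the download step on its own, with no parsing, and with the same number of parallel requests on both sides, should make the Python and .NET figures easier to compare.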

1 solution


Solution 1

Have you tried using IronPython to embed the Python code in your .NET application? That should allow it to download the pages at the speed you see in Python. In my opinion, Python is faster because it downloads pages in the form of tuples of byte strings, whereas .NET may be downloading the HTML code as it is.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 12 Feb 2011
Copyright © CodeProject, 1999-2014