I am new to Nutch. I am part of an IR research team and need to set up Nutch to crawl Microsoft's dataset. After googling for a while, I couldn't find any tutorial or help. Can anyone guide me on this?
I am using Nutch 1.4 on Ubuntu 11.10 with Eclipse 3.7.
So far, I am able to crawl the public network from my Nutch setup integrated with Eclipse.
Is there any tutorial or wiki explaining how I can achieve this, or how to crawl any other dataset kept on the file system? If not, could you please help me?
Thanks in advance.