
How to scrape data from the Web using Node.js

6 Jan 2014, CPOL
This article shows you how to fetch data from the Web using Node.js.

Introduction

With Node.js you can build almost anything you want, such as a chat website or a social network like LinkedIn or Facebook. You can also use it to fetch data from the Web.

Background

In the past, I wrote a post on the different options you can use to scrape data from the Web with the HtmlAgilityPack in the .NET development environment. You can get the same functionality with Node.js.

Node.js is a platform built on Chrome's JavaScript runtime for easily building fast, scalable network applications. It uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, well suited to data-intensive real-time applications that run across distributed devices. You can download it here: http://nodejs.org/.

Using the code

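You will need to install Node.js, of course, and on top of it two packages from npm:

  1. npm install request: lets us work with URLs and HTTP requests in an easy way.
  2. npm install cheerio: Cheerio is a jQuery-like library for the server side.

We also use fs to write files. fs is a core module that ships with Node.js, so there is nothing to install for it, and running npm install fs is not needed.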
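If you are unsure whether a module has to be installed, you can ask Node.js for its list of built-in modules through `require('module').builtinModules`. The sketch below is an illustration rather than part of the original article, and `isCoreModule` is a hypothetical helper name.

```javascript
const { builtinModules } = require('module');

// Returns true when `name` ships with Node.js itself and therefore
// needs no `npm install`.
function isCoreModule(name) {
  return builtinModules.includes(name);
}

for (const name of ['request', 'cheerio', 'fs']) {
  console.log(`${name}: ${isCoreModule(name) ? 'built in' : 'install from npm'}`);
}
```

This prints `install from npm` for request and cheerio and `built in` for fs.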

After you finish the installation, create a JavaScript file to hold our code. The first thing the file must do is load the modules the application needs:

var request = require('request'),
	cheerio = require('cheerio'),
	fs = require('fs'), // core Node.js module, no npm install needed
	urls = []; // urls stores the URLs of the pictures we find

Next, we add the main call that fetches the page, parses it and extracts what we need. In this case, we collect the locations of the pictures linked from www.reddit.com:

request('http://www.reddit.com/', function(err,resp,body)
{
	if(!err && resp.statusCode == 200)
	{
		var $ = cheerio.load(body);
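		// Inside .each(), `this` is the raw DOM element, so wrap it in $() before calling .attr()
		$('a.title', '#siteTable').each(function(){
			var url = $(this).attr('href');
			// Some anchors have no href: skip them, and keep only images hosted on i.imgur.com
			if(url && url.indexOf('i.imgur.com') != -1)
			{
				urls.push(url);
			}
		});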

Now let's do something with the data: we store the pictures we found in a directory called "img", which you must create inside your working directory first.

for (var i=0;i<urls.length;i++)
		{
			request(urls[i]).pipe(fs.createWriteStream('img/' + i +'.jpg'));
		}
	// now we will close the curly braces and end our program	
	}
}); 
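The substring test `url.indexOf('i.imgur.com')` also accepts links that only mention i.imgur.com somewhere, for example in a query string. Below is a stricter variant. It assumes you only want links whose host is exactly i.imgur.com, and it parses each link with the WHATWG `URL` class built into Node.js. `isImgurImageUrl` is a hypothetical helper and is not part of the article's code.

```javascript
// Sketch: decide whether a link points at an image hosted on i.imgur.com.
// The URL is parsed and its hostname compared exactly, instead of searching
// for the substring 'i.imgur.com' anywhere in the string.
function isImgurImageUrl(href) {
  if (typeof href !== 'string') return false; // an <a> without href yields undefined
  let parsed;
  try {
    parsed = new URL(href);
  } catch (e) {
    return false; // relative or malformed URLs are skipped
  }
  return parsed.hostname === 'i.imgur.com';
}

console.log(isImgurImageUrl('http://i.imgur.com/abc.jpg'));          // true
console.log(isImgurImageUrl('https://example.com/?u=i.imgur.com'));  // false
```

Inside the `.each()` callback, you would use `isImgurImageUrl(url)` as the condition of the `if` in place of the `indexOf` test.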

Well done. Now run the program by typing the following at the Node.js command prompt: node Scrapping.js

You now have all the i.imgur.com pictures linked from the page stored in your img directory.
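The download loop in the script has several limitations. It starts every download at once, it reports no errors, it requires the img directory to exist already, and it names every file .jpg even when the image is a PNG or a GIF. The request package has also been deprecated since 2020. The sketch below shows one possible alternative download step. It assumes Node.js 18 or later, which provides a global `fetch`. `downloadAll` and `fileNameFor` are hypothetical names and are not part of the article's code.

```javascript
const fs = require('fs');
const path = require('path');

// Picks a file name for the i-th image, keeping the extension from the URL
// (.png, .gif, ...) and falling back to .jpg when the URL has none.
function fileNameFor(url, i) {
  const ext = path.extname(new URL(url).pathname) || '.jpg';
  return i + ext;
}

// Downloads each URL into `dir` one after another, creating `dir` first.
// Assumes Node.js 18+ (global fetch); this is not the article's original code.
async function downloadAll(urls, dir) {
  await fs.promises.mkdir(dir, { recursive: true });
  for (let i = 0; i < urls.length; i++) {
    const res = await fetch(urls[i]);
    if (!res.ok) {
      console.error(`Skipping ${urls[i]}: HTTP ${res.status}`);
      continue;
    }
    const data = Buffer.from(await res.arrayBuffer());
    await fs.promises.writeFile(path.join(dir, fileNameFor(urls[i], i)), data);
  }
}
```

In the original script, you could replace the `for` loop with `downloadAll(urls, 'img').catch(console.error);`. Downloading one file at a time is slower, but it keeps the load on the server predictable and ensures that each failure is reported.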

Any comments are welcome! 

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Hadrich Mohamed
Student
Tunisia
Software Engineering student at the International Institute of Technology of Sfax (IIT), Microsoft Certified Professional: Specialist in C#, Microsoft Student Partner, IEEE IIT SB member, IIT Microsoft Club President and Appli Academy 2013-2014 certified.
Mail : hadrichmed@gmail.com
Phone : +216 99 106 643

Last Updated 6 Jan 2014
Article Copyright 2013 by Hadrich Mohamed