Running a .NET Core Web Crawler on a Raspberry Pi

16 Dec 2017, CPOL
With a web crawler that runs on a Raspberry Pi, you can automate a boring daily task, such as price monitoring or market research.

Introduction

Recently, I developed an interest in IoT and the Raspberry Pi. Since I'm a .NET developer, I started exploring .NET Core on the Linux stack. The reason was simple: the Linux stack is cheap and runs almost everywhere. I built my website in .NET Core, and it runs on Ubuntu on Linode for $5/month. Next, I started exploring the Raspberry Pi, which runs Raspbian, a Linux distribution. My first project is a web crawler in C# that runs on the Raspberry Pi, gets the latest shopping deals from popular sites such as Amazon or Best Buy, and posts the data to a Web API that feeds my site, http://www.fairnet.com/deal.

Prerequisites

Visual Studio 2017 with the ".NET Core cross-platform development" workload installed. The free Community edition is sufficient.

Using the Code

Launch Visual Studio 2017. Select File > New > Project from the menu bar. In the New Project dialog, select the Visual C# node followed by the .NET Core node. Then select the Console App (.NET Core) project template.

Install the HtmlAgilityPack and Newtonsoft.Json NuGet packages.

HtmlAgilityPack is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT.
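
As a minimal sketch of the XPath support, the snippet below loads an in-memory HTML string and selects nodes from it. The markup, the class names `deal` and `price`, and the `XPathDemo` class are made up for illustration and are not taken from the crawler; it assumes the HtmlAgilityPack package is referenced.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package; assumed to be installed

public static class XPathDemo
{
    // Returns the trimmed text of every <span class="price"> inside a <div class="deal">.
    // Both class names are hypothetical sample values.
    public static List<string> ExtractPrices(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var prices = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = document.DocumentNode.SelectNodes("//div[@class='deal']//span[@class='price']");
        if (nodes == null)
        {
            return prices;
        }

        foreach (HtmlNode node in nodes)
        {
            prices.Add(node.InnerText.Trim());
        }
        return prices;
    }

    public static void Main()
    {
        const string html =
            "<html><body>" +
            "<div class='deal'><a href='/a'>Item A</a><span class='price'>$10</span></div>" +
            "<div class='deal'><a href='/b'>Item B</a><span class='price'>$20</span></div>" +
            "</body></html>";

        foreach (var price in ExtractPrices(html))
        {
            Console.WriteLine(price);
        }
    }
}
```

The null check matters in a crawler: when a site changes its layout, the XPath simply stops matching, and an unguarded foreach over the result would throw.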

Here is the code that requests the page, parses the HTML, and posts the result:

// One HttpClient serves both the page download and the Web API call.
HttpClient client = new HttpClient();
using (var response = await client.GetAsync(url))
using (var content = response.Content)
{
    var result = await content.ReadAsStringAsync();
    var document = new HtmlDocument();
    document.LoadHtml(result);

    // SelectNodes returns null when the XPath matches nothing.
    var nodes = document.DocumentNode.SelectNodes("//div[@class='item-inner clearfix']");
    var storeData = new List<Store>();
    if (nodes != null)
    {
        foreach (var node in nodes)
        {
            storeData.Add(ParseHtml(node));
        }
    }

    // The relative URI only resolves if client.BaseAddress has been set.
    HttpResponseMessage resp = await client.PostAsJsonAsync<List<Store>>(@"/api/stores", storeData);
}

I post the parsed data to the Web API, where it gets saved in MongoDB.

HttpResponseMessage resp = await client.PostAsJsonAsync<List<Store>>(@"/api/stores", storeData);
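
PostAsJsonAsync is an extension method that is not part of HttpClient itself on .NET Core 2; to my knowledge it comes from the separate Microsoft.AspNet.WebApi.Client package. As a hedged alternative, the sketch below posts the same payload using only HttpClient and Newtonsoft.Json. The `Store` class shown here is a hypothetical shape inferred from the properties that ParseHtml sets, and `StorePoster` is an illustrative name, not code from the project.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json; // NuGet package; assumed to be installed

// Hypothetical shape of the Store type; property names follow ParseHtml.
public class Store
{
    public string Image { get; set; }
    public string Link { get; set; }
    public string Title { get; set; }
    public string Price { get; set; }
    public string RetailPrice { get; set; }
}

public static class StorePoster
{
    // Serializes the list to a JSON array with Newtonsoft.Json.
    public static string ToJson(List<Store> stores)
    {
        return JsonConvert.SerializeObject(stores);
    }

    // Posts the JSON to /api/stores and throws on a non-success status code.
    // The caller must set client.BaseAddress so the relative URI can resolve.
    public static async Task PostAsync(HttpClient client, List<Store> stores)
    {
        var body = new StringContent(ToJson(stores), Encoding.UTF8, "application/json");
        using (var response = await client.PostAsync("/api/stores", body))
        {
            response.EnsureSuccessStatusCode();
        }
    }

    public static void Main()
    {
        // Prints the JSON payload only; no network call is made here.
        var stores = new List<Store> { new Store { Title = "Item A", Price = "$10" } };
        Console.WriteLine(ToJson(stores));
    }
}
```

Calling EnsureSuccessStatusCode makes a failed save visible on the Pi instead of passing silently, which is useful for a job that runs unattended.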

Here is the ParseHtml method, which extracts the useful data from each item node:

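private static Store ParseHtml(HtmlNode node)
{
    // imgIndex, titIndex, pricIndex and retpricIndex are not shown in this article;
    // presumably they hold the site-specific position of each element in the item block.
    var _store = new Store();

    _store.Image = node.Descendants("img").ElementAt(imgIndex).OuterHtml;
    // Take the href of the first link; "not found" is the fallback for a missing attribute.
    _store.Link = node.Descendants("a")
                      .Select(s => s.GetAttributeValue("href", "not found"))
                      .FirstOrDefault();
    _store.Title = node.Descendants("a").ElementAt(titIndex).InnerText;
    _store.Price = node.Descendants("span").ElementAt(pricIndex).InnerText;
    _store.RetailPrice = node.Descendants("span").ElementAt(retpricIndex).InnerText;

    return _store;
}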
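
ElementAt throws an ArgumentOutOfRangeException when the page has fewer matching elements than expected, which can happen whenever a site changes its layout. One possible variation, sketched here under the assumption that a fallback value is acceptable, uses ElementAtOrDefault instead. The `NodeText.At` helper and the sample markup are hypothetical and not part of the original project.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // NuGet package; assumed to be installed

public static class NodeText
{
    // Returns the trimmed inner text of the index-th descendant with the given
    // tag name, or the fallback when there is no element at that position.
    public static string At(HtmlNode node, string tag, int index, string fallback = "")
    {
        HtmlNode match = node.Descendants(tag).ElementAtOrDefault(index);
        return match == null ? fallback : match.InnerText.Trim();
    }

    public static void Main()
    {
        var document = new HtmlDocument();
        document.LoadHtml("<div><span>$10</span><span>$15</span></div>");
        HtmlNode item = document.DocumentNode;

        Console.WriteLine(At(item, "span", 1));        // prints "$15"
        Console.WriteLine(At(item, "span", 5, "n/a")); // prints "n/a"
    }
}
```

Whether a fallback is better than failing fast depends on the use case: for price monitoring, an empty price may be worse than a skipped item.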

Next, I need to set up the Raspberry Pi so that .NET code can run on it.

Supplies required:

  • Raspberry Pi 3 Model B
  • HDMI cable
  • USB mouse / keyboard
  • SD card
  • 2 Amp USB power supply

Set Up the Raspberry Pi

  1. Install Raspbian, the recommended OS. Download it from https://www.raspberrypi.org/downloads/raspbian/
  2. Install .NET Core 2 onto the Raspberry Pi
  3. Deploy this application to your Pi running Raspbian

Once Raspbian has been installed, configure the Raspberry Pi to accept connections from the development machine.

Enable SSH from the Raspberry Pi Configuration screen.

Next, we need to find the IP address of the Raspberry Pi.

Open a terminal on your Pi and type:

hostname -I

Next, install PuTTY on your development machine and use it to connect to the Pi.

The default username and password for Raspbian are "pi" and "raspberry".

Install .NET Core 2 onto the Raspberry Pi:

# Update the Raspbian install
sudo apt-get -y update

# Install the packages necessary for .NET Core
sudo apt-get -y install libunwind8 gettext

# Download the nightly binaries for .NET Core 2
wget https://dotnetcli.blob.core.windows.net/dotnet/Runtime/release/2.0.0/dotnet-runtime-latest-linux-arm.tar.gz

# Create a folder to hold the .NET Core 2 installation
sudo mkdir /opt/dotnet

# Extract the archive into the dotnet installation folder
sudo tar -xvf dotnet-runtime-latest-linux-arm.tar.gz -C /opt/dotnet

# Set up a symbolic link in a directory on the path so we can call dotnet
sudo ln -s /opt/dotnet/dotnet /usr/local/bin

Run the dotnet --info command to see the version installed on Raspbian.

On the development machine, create a release build for linux-arm:

dotnet publish -c release -r linux-arm

Now create a folder for the web crawler on the Pi and transfer the published files to it using FTP. Then run the crawler from that folder:

dotnet webcrawler.dll

Points of Interest

I'll be blogging more in the future about developing IoT applications for this platform.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).

About the Author

Farooq Kaiser
Software Developer (Senior), http://www.Fairnet.com
Canada

