Articles by Richard Penman (Article: 1, Technical Blogs: 24, Tip/Trick: 1)

Average article rating: 4.50

Uncategorised Technical Blogs
General
Posted: 26 Jun 2014   Updated: 26 Jun 2014   Views: 3,519   Rating: 4.50/5    Votes: 3   Popularity: 2.39
Licence: The GNU Lesser General Public License (LGPLv3)      Bookmarked: 5   Downloaded: 0
Offline Reverse Geocode

Average blogs rating: 4.95

Applications & Tools
General
Posted: 9 Jan 2013   Updated: 9 Jan 2013   Views: 5,002   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 2   Downloaded: 0
My solution using Webkit.
Posted: 9 Jan 2013   Updated: 9 Jan 2013   Views: 6,870   Rating: 5.00/5    Votes: 1   Popularity: 0.00
Licence: The Code Project Open License (CPOL)      Bookmarked: 11   Downloaded: 0
How to learn about web scraping.
Posted: 9 Jan 2013   Updated: 9 Jan 2013   Views: 5,101   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 2   Downloaded: 0
How to use proxies.
Posted: 13 Jan 2013   Updated: 13 Jan 2013   Views: 4,556   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 1   Downloaded: 0
Crawling with threads.
Posted: 13 Jan 2013   Updated: 13 Jan 2013   Views: 3,915   Rating: 5.00/5    Votes: 1   Popularity: 0.00
Licence: The Code Project Open License (CPOL)      Bookmarked: 2   Downloaded: 0
Using Google Translate to crawl a website.
Posted: 15 Jan 2013   Updated: 15 Jan 2013   Views: 4,923   Rating: 5.00/5    Votes: 1   Popularity: 0.00
Licence: The Code Project Open License (CPOL)      Bookmarked: 1   Downloaded: 0
Why web2py?
Posted: 15 Jan 2013   Updated: 15 Jan 2013   Views: 3,567   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 2   Downloaded: 0
I have three solutions for periodically scraping a website.
Posted: 16 Jan 2013   Updated: 16 Jan 2013   Views: 4,246   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 2   Downloaded: 0
Scraping JavaScript-based web pages with Chickenfoot.
HTML / CSS
General
Posted: 9 Jan 2013   Updated: 9 Jan 2013   Views: 4,993   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 5   Downloaded: 0
I made an earlier post about using webkit to process the JavaScript in a webpage so you can access the resulting HTML and how to apply it to multiple webpages.
HTML
Posted: 18 Jan 2013   Updated: 18 Jan 2013   Views: 2,078   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 0   Downloaded: 0
How to use XPaths robustly
Posted: 23 Jan 2013   Updated: 23 Jan 2013   Views: 6,305   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 3   Downloaded: 0
HTML is a tree structure: at the root is a <html> tag followed by the <head> and <body> tags and then more tags before the content itself. However, when a webpage is downloaded all one gets is a series of characters. Working directly with that text is fine when using regular expressions, but often we want to traverse
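The tree view of HTML described in this summary can be illustrated with a minimal sketch using only Python's standard library `html.parser`. The `TreeBuilder` class and `parse` helper are hypothetical names introduced here, not taken from the original post, and the sketch assumes well-formed input in which every tag is explicitly closed.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Build a nested (tag, children) tree from well-formed HTML."""

    def __init__(self):
        super().__init__()
        # Synthetic root node so the real root tag has a parent to attach to.
        self.root = ("document", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Attach the new node to the current parent, then descend into it.
        node = (tag, [])
        self.stack[-1][1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Climb back up only when the closing tag matches the open node.
        if len(self.stack) > 1 and self.stack[-1][0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        # Keep non-blank text as a leaf of the current node.
        text = data.strip()
        if text:
            self.stack[-1][1].append(text)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


tree = parse("<html><head><title>Hi</title></head><body><p>Text</p></body></html>")
print(tree)
```

Real-world pages often contain unclosed or void tags, which this sketch does not handle; a dedicated parsing library is usually a better fit for those.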
Web Security
General
Posted: 9 Jan 2013   Updated: 9 Jan 2013   Views: 6,470   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 5   Downloaded: 0
I have been interested in automatic approaches to web scraping for a few years now. During university I created the SiteScraper library, which used training cases to automatically scrape webpages. This approach was particularly useful for scraping a website periodically because the model could automat
Posted: 16 Jan 2013   Updated: 16 Jan 2013   Views: 2,739   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 2   Downloaded: 0
Some strategies to protect your data.
Posted: 16 Jan 2013   Updated: 16 Jan 2013   Views: 3,862   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 2   Downloaded: 0
How to crawl websites without being blocked.
Posted: 18 Jan 2013   Updated: 18 Jan 2013   Views: 3,111   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 2   Downloaded: 0
Using regular expressions for web scraping is sometimes criticized, but I believe they still have their place, particularly for one-off scrapes. Let's say I want to extract the title of a particular webpage - here is an implementation using BeautifulSoup, lxml, and regular expressions: import re impor
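The excerpt above is cut off before its code, so the following is only a minimal sketch of the regular-expression variant, written with the standard library `re` module. The function name `extract_title` and the sample markup are assumptions made for illustration and are not the author's implementation.

```python
import re


def extract_title(html):
    """Return the text of the first <title> element, or None if absent."""
    # Non-greedy match so only the first title is captured; DOTALL lets the
    # title span multiple lines, IGNORECASE accepts <TITLE> as well.
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


sample = "<html><head><title> Example page </title></head><body></body></html>"
print(extract_title(sample))
```

A pattern like this suits one-off scrapes of pages with a known layout; it makes no attempt to handle malformed markup.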
Posted: 23 Jan 2013   Updated: 23 Jan 2013   Views: 5,077   Rating: 5.00/5    Votes: 1   Popularity: 0.00
Licence: The Code Project Open License (CPOL)      Bookmarked: 4   Downloaded: 0
In this post I will clarify what I do by walking through a simple web scraping job I worked on.
Database
MySQL
Posted: 7 Jan 2013   Updated: 7 Jan 2013   Views: 7,140   Rating: 5.00/5    Votes: 1   Popularity: 0.00
Licence: The Code Project Open License (CPOL)      Bookmarked: 3   Downloaded: 0
Sometimes I need to import large spreadsheets into MySQL. The easy way would be to assume all fields are varchar, but then the database would lose features such as ordering by a numeric field. The hard way would be to manually determine the type of each field to define the schema. That doesn't sound mu
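One way to avoid both extremes is to infer each column's type from its values. The sketch below is an assumption about how that could look, not the approach from the original post: the helper names `infer_type` and `infer_schema`, the choice of `INT`, `DOUBLE`, and `VARCHAR`, and the sample data are all hypothetical. It assumes CSV input with a header row and rows of equal length.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest of INT, DOUBLE, or VARCHAR that fits every value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "VARCHAR(255)"  # arbitrary fallback for an all-empty column
    for caster, sql_type in ((int, "INT"), (float, "DOUBLE")):
        try:
            for v in non_empty:
                caster(v)
            return sql_type
        except ValueError:
            continue
    # Fall back to text sized to the longest value seen.
    return "VARCHAR(%d)" % max(len(v) for v in non_empty)


def infer_schema(csv_text, table):
    """Build a CREATE TABLE statement from CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = []
    for i, name in enumerate(header):
        col_type = infer_type([row[i] for row in data])
        columns.append("`%s` %s" % (name, col_type))
    return "CREATE TABLE `%s` (%s);" % (table, ", ".join(columns))


sample = "name,age,price\nwidget,3,9.99\ngadget,12,15\n"
print(infer_schema(sample, "items"))
```

Inferring from a sample of rows rather than the whole file would be faster on large spreadsheets, at the risk of a later row not fitting the chosen type.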
Other .NET Languages
General
Posted: 7 Jan 2013   Updated: 7 Jan 2013   Views: 5,318   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 5   Downloaded: 0
Python and other scripting languages are sometimes dismissed because of their inefficiency compared to compiled languages like C. For example, here are implementations of the Fibonacci sequence in C and Python: int fib(int n) { if (n < 2) return n; else return fib(n - 1) + fib(n - 2); } int m
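The excerpt is truncated before the Python version appears. A direct transliteration of the C function shown above would plausibly look like the sketch below; it is a reconstruction for illustration, not the author's exact code.

```python
def fib(n):
    """Naive recursive Fibonacci, mirroring the C version line for line."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


print(fib(10))
```

The naive recursion makes an exponential number of calls, which is what makes it a convenient benchmark for comparing interpreter and compiler speed.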
Cross Platform
Qt
Posted: 15 Jan 2013   Updated: 15 Jan 2013   Views: 3,540   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 2   Downloaded: 0
Webkit has now been ported to the Qt framework and can be used through its Python bindings.
Libraries
General
Posted: 13 Jan 2013   Updated: 13 Jan 2013   Views: 4,139   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 3   Downloaded: 0
Automatically scraping website data based on example cases.
Posted: 16 Jan 2013   Updated: 16 Jan 2013   Views: 4,544   Rating: 4.67/5    Votes: 2   Popularity: 1.20
Licence: The Code Project Open License (CPOL)      Bookmarked: 1   Downloaded: 0
How to increase your Google App Engine quotas for free.
Threads, Processes & IPC
Threading
Posted: 9 Jan 2013   Updated: 9 Jan 2013   Views: 4,150   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 4   Downloaded: 0
In a previous post I showed how to scrape a list of webpages. Here is an updated example that downloads the content in multiple threads.
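The updated example itself is not part of this listing. As a rough sketch of the idea, assuming the standard library `concurrent.futures` and `urllib.request` modules, downloads can be spread over a thread pool as follows. The names `download`, `download_all`, and `num_threads`, and the stand-in `fake_pages` data, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def download(url, timeout=10):
    """Fetch one URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, fetch=download, num_threads=5):
    """Fetch every URL on a pool of worker threads.

    Returns a dict mapping each URL to its content, or None on failure.
    """
    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception:
            return None  # one failed page should not abort the whole crawl

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map preserves input order, so results line up with urls.
        return dict(zip(urls, pool.map(safe_fetch, urls)))


# Demonstration with a stand-in fetch function so no network is needed.
fake_pages = {"http://example.com/a": b"A", "http://example.com/b": b"B"}
results = download_all(list(fake_pages), fetch=fake_pages.__getitem__)
print(results)
```

Threads suit this workload because most of the time is spent waiting on the network rather than computing; passing `fetch` as a parameter keeps the pool logic testable without real requests.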
Uncategorised Technical Blogs
General
Posted: 7 Jan 2013   Updated: 7 Jan 2013   Views: 7,433   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 2   Downloaded: 0
There is a nice website screenshots.com that hosts historic screenshots for many websites.
Reviews on Third Party Products and Tools
Community Reviews
Posted: 7 Jan 2013   Updated: 7 Jan 2013   Views: 23,197   Rating: 5.00/5    Votes: 1   Popularity: 0.00
Licence: The Code Project Open License (CPOL)      Bookmarked: 4   Downloaded: 0
Some websites require passing a CAPTCHA to access their content. As I have written before, these can be parsed using the deathbycaptcha API; however, for large websites with many CAPTCHAs this becomes prohibitively expensive. For example, solving 1 million CAPTCHAs with this API would cost $1390. Fort

Average tips rating: 0.00

Amazon Web Services
General
Posted: 26 Jun 2014   Updated: 26 Jun 2014   Views: 2,968   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 3   Downloaded: 0
A few friends asked me what web services I use to run my business, so I am writing this to point people to in future.
No reference articles have been posted.

Richard Penman

Australia
No Biography provided


Web04 | 2.8.1411022.1 | Last Updated 23 Nov 2014
Copyright © CodeProject, 1999-2014
All Rights Reserved. Terms of Service