Articles by Richard Penman (Article: 1, Technical Blogs: 24, Tip/Trick: 1)

Average article rating: 4.50

Uncategorised Technical Blogs
General
Posted: 26 Jun 2014   Updated: 26 Jun 2014   Views: 3,240   Rating: 4.50/5    Votes: 3   Popularity: 2.39
Licence: The GNU Lesser General Public License (LGPLv3)      Bookmarked: 5   Downloaded: 0
Offline Reverse Geocode

Average blogs rating: 4.95

Applications & Tools
General
Posted: 9 Jan 2013   Updated: 9 Jan 2013   Views: 4,900   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 1   Downloaded: 0
My solution using Webkit.
Posted: 9 Jan 2013   Updated: 9 Jan 2013   Views: 6,745   Rating: 5.00/5    Votes: 1   Popularity: 0.00
Licence: The Code Project Open License (CPOL)      Bookmarked: 11   Downloaded: 0
How to learn about web scraping.
Posted: 9 Jan 2013   Updated: 9 Jan 2013   Views: 4,937   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 2   Downloaded: 0
How to use proxies.
Posted: 13 Jan 2013   Updated: 13 Jan 2013   Views: 4,484   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 1   Downloaded: 0
Crawling with threads.
Posted: 13 Jan 2013   Updated: 13 Jan 2013   Views: 3,816   Rating: 5.00/5    Votes: 1   Popularity: 0.00
Licence: The Code Project Open License (CPOL)      Bookmarked: 2   Downloaded: 0
Using Google Translate to crawl a website.
Posted: 15 Jan 2013   Updated: 15 Jan 2013   Views: 4,814   Rating: 5.00/5    Votes: 1   Popularity: 0.00
Licence: The Code Project Open License (CPOL)      Bookmarked: 1   Downloaded: 0
Why web2py?
Posted: 15 Jan 2013   Updated: 15 Jan 2013   Views: 3,485   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 2   Downloaded: 0
I have three solutions for periodically scraping a website.
Posted: 16 Jan 2013   Updated: 16 Jan 2013   Views: 4,158   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 2   Downloaded: 0
Scraping JavaScript-based web pages with Chickenfoot.
HTML / CSS
General
Posted: 9 Jan 2013   Updated: 9 Jan 2013   Views: 4,920   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 5   Downloaded: 0
I made an earlier post about using webkit to process the JavaScript in a webpage so you can access the resulting HTML and how to apply it to multiple webpages.
HTML
Posted: 18 Jan 2013   Updated: 18 Jan 2013   Views: 2,041   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 0   Downloaded: 0
How to use XPaths robustly
Posted: 23 Jan 2013   Updated: 23 Jan 2013   Views: 6,176   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 3   Downloaded: 0
HTML is a tree structure: at the root is an <html> tag, followed by the <head> and <body> tags, and then more tags before the content itself. However, when a webpage is downloaded, all one gets is a series of characters. Working directly with that text is fine when using regular expressions, but often we want to traverse
Web Security
General
Posted: 9 Jan 2013   Updated: 9 Jan 2013   Views: 6,347   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 5   Downloaded: 0
I have been interested in automatic approaches to web scraping for a few years now. During university I created the SiteScraper library, which used training cases to automatically scrape webpages. This approach was particularly useful for scraping a website periodically because the model could automat
Posted: 16 Jan 2013   Updated: 16 Jan 2013   Views: 2,680   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 2   Downloaded: 0
Some strategies to protect your data.
Posted: 16 Jan 2013   Updated: 16 Jan 2013   Views: 3,762   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 2   Downloaded: 0
How to crawl websites without being blocked.
Posted: 18 Jan 2013   Updated: 18 Jan 2013   Views: 3,034   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 2   Downloaded: 0
Using regular expressions for web scraping is sometimes criticized, but I believe they still have their place, particularly for one-off scrapes. Let's say I want to extract the title of a particular webpage - here is an implementation using BeautifulSoup, lxml, and regular expressions: import re impor
Posted: 23 Jan 2013   Updated: 23 Jan 2013   Views: 5,023   Rating: 5.00/5    Votes: 1   Popularity: 0.00
Licence: The Code Project Open License (CPOL)      Bookmarked: 4   Downloaded: 0
In this post I will clarify what I do by walking through a simple web scraping job I worked on.
Database
MySQL
Posted: 7 Jan 2013   Updated: 7 Jan 2013   Views: 6,930   Rating: 5.00/5    Votes: 1   Popularity: 0.00
Licence: The Code Project Open License (CPOL)      Bookmarked: 3   Downloaded: 0
Sometimes I need to import large spreadsheets into MySQL. The easy way would be to assume all fields are varchar, but then the database would lose features such as ordering by a numeric field. The hard way would be to manually determine the type of each field to define the schema. That doesn't sound mu
Other .NET Languages
General
Posted: 7 Jan 2013   Updated: 7 Jan 2013   Views: 5,180   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 5   Downloaded: 0
Python and other scripting languages are sometimes dismissed because of their inefficiency compared to compiled languages like C. For example, here are implementations of the Fibonacci sequence in C and Python: int fib(int n) { if (n < 2) return n; else return fib(n - 1) + fib(n - 2); } int m
Cross Platform
Qt
Posted: 15 Jan 2013   Updated: 15 Jan 2013   Views: 3,452   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 2   Downloaded: 0
Webkit has now been ported to the Qt framework and can be used through its Python bindings.
Libraries
General
Posted: 13 Jan 2013   Updated: 13 Jan 2013   Views: 4,065   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 3   Downloaded: 0
Automatically scraping website data based on example cases.
Posted: 16 Jan 2013   Updated: 16 Jan 2013   Views: 4,377   Rating: 4.67/5    Votes: 2   Popularity: 1.20
Licence: The Code Project Open License (CPOL)      Bookmarked: 1   Downloaded: 0
How to increase your Google App Engine quotas for free.
Threads, Processes & IPC
Threading
Posted: 9 Jan 2013   Updated: 9 Jan 2013   Views: 4,057   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 4   Downloaded: 0
In a previous post I showed how to scrape a list of webpages. Here is an updated example that downloads the content in multiple threads.
Uncategorised Technical Blogs
General
Posted: 7 Jan 2013   Updated: 7 Jan 2013   Views: 7,335   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 1   Downloaded: 0
There is a nice website, screenshots.com, that hosts historic screenshots for many websites.
Reviews on Third Party Products and Tools
Community Reviews
Posted: 7 Jan 2013   Updated: 7 Jan 2013   Views: 22,073   Rating: 5.00/5    Votes: 1   Popularity: 0.00
Licence: The Code Project Open License (CPOL)      Bookmarked: 4   Downloaded: 0
Some websites require passing a CAPTCHA to access their content. As I have written before, these can be parsed using the deathbycaptcha API; however, for large websites with many CAPTCHAs this becomes prohibitively expensive. For example, solving 1 million CAPTCHAs with this API would cost $1390. Fort

Average tips rating: 0.00

Amazon Web Services
General
Posted: 26 Jun 2014   Updated: 26 Jun 2014   Views: 2,861   Rating: 0.0 / 5    Votes: 0   Popularity: 0.0
Licence: The Code Project Open License (CPOL)      Bookmarked: 3   Downloaded: 0
A few friends asked me what web services I use to run my business, so I am writing this to point people to in future.
No reference articles have been posted.

Richard Penman

Australia
No Biography provided


Web04 | 2.8.141022.1 | Last Updated 22 Oct 2014
Copyright © CodeProject, 1999-2014
All Rights Reserved. Terms of Service