Click here to Skip to main content
13,001,159 members (59,225 online)
Click here to Skip to main content
Add your own
alternative version


17 bookmarked
Posted 19 Aug 2009

PostgreSQL Synchronization Tool

, 19 Aug 2009
Rate this:
Please Sign up or sign in to vote.
Idea and implementation of a simple and easy database synchronization tool.


This article introduces a method of synchronizing two PostgreSQL databases. Although, this seems to be an easy task, no product (slony, londiste, ...) really satisfied the needs within the project. Either they have special prerequisites that didn't apply for our problem, or they didn't support synchronizing of large objects.

Large objects are used to store tiles of a street/aerial map within PostgreSQL. My GIS-server queries the database and gets the tiles out. By using this construction, we are getting a flexible infrastructure for updating and maintaining different versions of the maps.

Everything was working fine until the service needs to be spread over three servers. How can we easily synchronize the databases? I really found no working solution that was clean and easy to use.


As I'm using rsync on a regular basis, I thought why isn't there a tool for databases that simply does nothing more than synchronizing two databases on demand - no sophisticated triggers or additional changes of the database. psync is a simple application that synchronizes two PostgreSQL databases. The concept can be adapted or even encapsulated so that other databases can also be used.

The idea of psync is to use two database connections and update the tables and their content like this:

  1. fetch all tables of the source and destination databases
  2. for each table in the source, do:
    1. create table if not exist on destination
    2. sync contents of table
  3. for each table in destination, do:
    1. delete table if not exists on the source

Although the algorithm looks simple, there are many problems to overcome.

  1. Duplicating tables
  2. Handling of special columns, e.g., oid for large objects
  3. Constraints on tables
  4. Special structures of extensions

Duplicating Tables

In order to duplicate a table, we need to find out which columns exist and which primary keys we have and so on. With this information, we can assemble a CREATE TABLE command that creates the table. In PostgreSQL, the information is stored in a table named information_schema.columns: column_name, is_nullable, and data_type. Primary keys are a little bit more complicated as we need two tables information_schema.table_constraints and information_schema.key_column_usage. See PSync.PrimaryKeys for the whole SQL statement.

Large Object Identifiers

The current implementation of psync focuses on the oid data type for handling large objects. Large objects store binary data and are identified using an integer. Tables with an oid column store only references to a large object. Most replication solutions can't handle large objects.

Certainly, we can't just copy the identifiers from the source to the destination, we need to compare the data itself. Always copying the whole object is not a solution as they may be big and this will waste network bandwidth and time when they are equal. The solution used by psync is hash values (currently the MD5 algorithm is used). In order to keep the traffic low, these hashes are calculated by the database using a custom function without transferring any data. psync compares these values and copies the object on demand.

The PostgreSQL function for calculating an MD5 hash value is (original source):

as $$
 fd        integer;
 size      integer;
 hashval   text;
 INV_READ  constant integer := 262144; -- 0x40000 from libpq-fs.h
 SEEK_SET  constant integer := 0;
 SEEK_END  constant integer := 2;
 IF id is null THEN
 fd   := lo_open(id, INV_READ);
 size := lo_lseek(fd, 0, 2);
 PERFORM lo_lseek(fd, 0, 0);
 hashval := md5(loread(fd, size));
 PERFORM lo_close(fd);
 RETURN hashval;
language plpgsql stable strict;
comment on FUNCTION md5(id oid) is 'Calculates the md5 sum of a large object.';

The process gets even more complex as there maybe more than one oid column in a table. The following code segment shows the process of constructing the SQL query for retrieving the hash values:

bool first;
string _select = "select";
string _from = " from";
string _where = " where";

// Compose the select block
first = true;
foreach(int oid in oids) {
  if (!first)
    _select += ",";
  // Get the md5 hash value for the oid-column
  _select += string.Format(" md5({0})", result.ColumnName(oid));
  first = false;

// Data from the table we are currently working on
_from += table;

// The row identified by the primary keys of the table
first = true;
foreach(string key in primaryKeys) {
  if (!first)
    _where += " and";
  _where += string.Format(" {0}='{1}'", key, result[row, result.ColumnIndex(key)]);
  first = false;
command = _select + _from + _where;

After building the SQL command, the results are compared:

Result r1, r2;
try {
  r1 = dst.Exec(command);
  if (r1.Rows > 0) {
    r2 = src.Exec(command);

    // Compare large objects
    for(int c=0; c<r1.Columns; c++) {
      if (r1[0, c] != r2[0, c]) {
        // md5 hashes are different
        int id = CopyOid(result.GetInt(row, oids[c]));
        command = string.Format("update {0} set {1}='{2}'{3}", 
                                table, result.ColumnName(oids[c]), id, _where);

  else {
    // Row missing on destination
    command = string.Format("insert into {0} values(", table);
    first = true;
    for(int col=0; col<result.Columns; col++) {
      if (!first)
        command += ",";
      if (oids.Contains(col)) {
        int id = CopyOid(result.GetInt(row, oid));
        command += string.Format(" '{0}'", id);
        command += string.Format(" '{0}'", result[row, col]);
      first = false;
    command += ")";
} catch(PostgreSQLException e) {
  // Large object id exist on destination but does not reference an object
  foreach(int oid in oids) {
    int id = CopyOid(result.GetInt(row, oid));
    command = string.Format("update {0} set {1}='{2}'{3}", 
                            table, result.ColumnName(oid), id, _where);

Using the Code

The code is implemented in two parts. The npq namespace implements a low-level wrapper around libpq (class PG) together with a high-level interface for easier use.

Example use of the library:

PostgreSQL db = new PostgreSQL("hostaddr='' port='5432' requiressl='1' " + 
                               "user='XXX' password='XXX' dbname='XXX'");
Console.WriteLine("Server {0} Protocol {1}", db.Version, db.Protocol);

Result result = db.Exec("select * from some_table");
Console.WriteLine("Result {0}", result.Valid);
Console.Write("{0,5} ", "Nr");
for(int f=0; f<result.Columns; f++)
  Console.Write("| {0,15}:{1,8}", result.ColumnName(f), result.ColumnType(f));
for(int n=0; n<result.Rows; n++) {
  Console.Write("{0,5} ", n);
  for(int f=0; f<result.Columns; f++)
    Console.Write("| {0,24}", result[n, f]);
// You'll need to explicitly call Dispose to free associated ressources

Handling of large objects is done like this:

LargeObject lo = new LargeObject(db);
byte[] tmp = new byte[1024];
int s = 0;
FileStream fs = new FileStream(result[n, 0]+".jpg", FileMode.Create);
while ((s = lo.Read(tmp, 1024)) > 0)
  fs.Write(tmp, 0, s);

The code was developed under Linux using Mono and MonoDevelop. It should aslo work under Windows but was not tested. Linux users must add the following configuration (npq.dll.config) to correctly map the library.

  <dllmap dll="libpq.dll" target="" os="!windows" />


  • 19.08.2009: Initial version.


This article, along with any associated source code and files, is licensed under The Common Development and Distribution License (CDDL)


About the Author

Germany Germany
Utilizing my experience of industry-scale real-time graphics programming, design, development of software, as well as European research projects I want to bring in new ideas for creative projects.
- 2009-now BTC AG: software for the renewable energy sector.
- 2007-2009 Digital Media: Audio, Graphics and GIS Web Services(
- 2001-2007 artecLab://art/work/technology: Mixed Reality, Computer Games and eLearning
- 1998-2001 STN ATLAS Elektronik: real-time graphics for ground warfare simulation

For a complete resume see

You may also be interested in...

Comments and Discussions

GeneralMy vote of 1 Pin
Yuriy Yurchenko20-Aug-09 4:13
memberYuriy Yurchenko20-Aug-09 4:13 
See the Slony I and Slony 2.
Article is a new bicycle
GeneralRe: My vote of 1 Pin
mf04020-Aug-09 5:27
membermf04020-Aug-09 5:27 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web02 | 2.8.170624.1 | Last Updated 19 Aug 2009
Article Copyright 2009 by mf040
Everything else Copyright © CodeProject, 1999-2017
Layout: fixed | fluid