Monday, 5 July 2010

PostgreSQL 9.0: Physical Hot Standby & streaming replication

PostgreSQL 9.0 will probably be released between mid-August and September, and it will bring a lot of interesting new features. In this article we describe the most interesting one: hot standby & streaming replication. All tests were done on PostgreSQL 9.0 beta 2.

How it works

First we need to clarify the concept of physical hot standby. Normally we use a physical warm standby, i.e. a replicated system that starts from a backup and keeps recovering from the WAL archive, but is not accessible for read-only transactions. The other possibility is a logical standby, which allows queries, but it is normally a query-replication system with problems with random functions or DDL.

Hot standby is the union of the features of the two: the solidity of recovery plus read-only access to the databases. This is possible because it is a warm standby that continuously applies the WAL segments (sent with streaming replication) and, moreover, encapsulates transactions in a read-only environment.

The process of building a hot standby goes like this:

  1. hot backup of the master

  2. restore of the backup on a slave (see the sketch after this list)

  3. startup of the slave and beginning of the catch-up phase: the receiver process asks the sender process for the WAL segments, writes them to disk, and so allows the startup process to recover up to the last WAL segment

  4. when the slave has recovered all the WAL segments and finally catches up with the master, it switches to streaming replication

  5. from now on the standby is continuously updated in asynchronous mode.
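As a rough sketch of steps 1 and 2, in 9.0 the hot backup is taken with pg_start_backup()/pg_stop_backup() plus a plain file-level copy; the hostname and paths below are only placeholders to adapt to your installation:

    # on the master: mark the beginning of the hot backup
    psql -U postgres -c "SELECT pg_start_backup('standby_setup');"

    # copy the data directory to the slave (paths and hostname are examples)
    rsync -a --exclude postmaster.pid /var/lib/pgsql/data/ slave:/var/lib/pgsql/data/

    # on the master: mark the end of the hot backup
    psql -U postgres -c "SELECT pg_stop_backup();"

pg_basebackup does not exist yet in 9.0, so a file-level copy (rsync, tar, scp) is the usual way to do this.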


Graph of streaming replication:




  1. Now that streaming is working, the sender reads from the current WAL segment and sends the changes to the receiver, which writes them to the WAL segments on the standby, while the startup process keeps recovering the standby using the information in those WAL segments


To better understand how it works, and the processes involved in this technology, we first need to look at the new parameters involved:

  1. wal_level: WAL logging has a new level, “hot_standby”, which has to be enabled on the master node and increases the amount of information written to the WAL

  2. max_wal_senders: the maximum number of connections between the master and the standby slaves, needed for streaming replication

  3. wal_keep_segments: the number of WAL segments kept around for the catch-up phase (see the postgresql.conf excerpt below)
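A minimal postgresql.conf sketch for this setup could look like the following (the values are just examples to adapt to your workload):

    # postgresql.conf on the master (example values)
    wal_level = hot_standby       # log enough information for hot standby
    max_wal_senders = 3           # max number of concurrent standby connections
    wal_keep_segments = 32        # WAL segments kept on disk for the catch-up phase

    # postgresql.conf on the standby
    hot_standby = on              # accept read-only queries while in recovery

The master's pg_hba.conf also needs an entry for the special “replication” database so that the standby user is allowed to connect.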

Besides, there are new parameters in recovery.conf (a sample file is sketched after this list):

  1. standby_mode: keeps the server in recovery and makes it act as a standby (read-only access itself is enabled with hot_standby = on in the standby's postgresql.conf)

  2. trigger_file: a file that, when created, stops the recovery and opens the standby for read-write operation

  3. restore_command: command string used to restore archived WAL segments.
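A minimal recovery.conf for the standby could then be sketched like this (connection string and paths are placeholders); note that primary_conninfo, which tells the standby where to stream from, is also needed even though it is not in the list above:

    # recovery.conf on the standby (example values)
    standby_mode = 'on'                                        # stay in recovery and act as a standby
    primary_conninfo = 'host=master port=5432 user=postgres'   # master to stream WAL from
    restore_command = 'cp /archive/%f "%p"'                    # fallback: restore archived WAL segments
    trigger_file = '/tmp/postgresql.trigger'                   # creating this file ends recovery and opens the standby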


Now let's look at the processes involved on the master:

and the processes on the standby:

As you can see, there is a new process on the master (WAL sender) and one on the standby (WAL receiver).
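For example, on a Linux machine the new processes can be spotted with ps (a rough sketch; the process titles may vary slightly between versions):

    # on the master: look for the WAL sender
    ps aux | grep "wal sender"

    # on the standby: look for the WAL receiver and the startup process
    ps aux | grep -E "wal receiver|startup"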

You can see that the WAL sender is sending position 0/14000070 to the WAL receiver, which is opening WAL segment 000000010000000000000014 for writing on the standby.

Meanwhile, the startup process is opening the WAL segments for reading and applying them to the standby database.
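The same positions can also be checked from SQL with the monitoring functions available in 9.0:

    -- on the master: current WAL write position
    SELECT pg_current_xlog_location();

    -- on the standby: last position received from the master and last position replayed
    SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();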


Limits

Now, what are the limits of this technology? In practice there are two:

  1. transactions cannot take exclusive locks on the standby node

  2. a transaction has to be aborted if recovery is cleaning up dead tuples that it still needs.

The second point is very interesting. If there is a long query on the standby, it may be reading “old” rows that are being cleaned up by vacuum on the master node; when recovery starts to clean up those rows, it does not care about the long query. Why? The answer is simple: the master doesn't know which transactions are running on the standby.
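In practice, a long report running on the standby that hits this conflict gets cancelled with an error along these lines (the table name is just an example, and the exact wording may differ):

    -- long-running query on the standby...
    SELECT count(*) FROM big_table;
    -- ...can be interrupted by recovery with:
    -- ERROR:  canceling statement due to conflict with recovery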

To resolve this kind of issue there are some practical solutions (a configuration sketch follows this list):

  1. use the new parameter “max_standby_delay”, which delays recovery so the query can finish; the value is in seconds, and -1 means wait forever.

  2. vacuum_defer_cleanup_age: the number of transactions by which cleanup of dead tuples is deferred on the master
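As a sketch, using the semantics described in the two points above (values are only examples; max_standby_delay goes on the standby, vacuum_defer_cleanup_age on the master):

    # postgresql.conf on the standby (example value)
    max_standby_delay = 30            # let conflicting queries run longer before cancelling; -1 waits forever

    # postgresql.conf on the master (example value)
    vacuum_defer_cleanup_age = 1000   # defer cleanup of dead tuples by 1000 transactions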


Performance monster

Even with these “limitations”, this technology showed very impressive replication speed in a stress test, as shown in this graph:

As you can see, the difference in the consistent state between master and slave is minimal. Moreover, the difference was not correlated with either many small transactions (lots of commits) or very long transactions. The test was done on two virtual machines on the same storage (an external USB disk at 5400 rpm) with CentOS 5.5 and PostgreSQL 9.0 beta.

In conclusion, this feature is perfect for reporting systems & disaster recovery architectures.

Mat


Returning from CHAR(10)

I'm back from my participation at CHAR(10); it was an incredible experience to hear about all the new projects and technology around Postgres. Besides, I met people like Simon Riggs, Tatsuo Ishii, Greg Smith, Ricardo Jimenez-Peris and many others, and it was incredible.
I was very impressed by the projects on multi-master replication and synchronous multi-node database clusters built on PostgreSQL; moreover, the new technology developed by Simon (hot standby replication) is very impressive and very stable.

Over time I will write on this blog about all the new technology and my tests on all these projects.

Stay tuned

Mat