How it works
First we need to clear the concept of physical hot standby. Normally we are using physical warm standby, i.e. a replicated system that start from a backup and is continuing recovering with archive, but it's not accessible for read only transactions. The other possibility is the logical standby that permits query, but it's normally a replicated query system with problems on random functions or DDL .
Hot standby is the union of features of the two: solidity of recover plus the read only access to the databases. This is possible because it's a warm standby that is continuously applying the wal segments ( sent with streaming replication), and moreover encapsulates the transactions in a read-only stack.
The process of generation of an hot standby is so:
hot backup from master
restore of backup on a slave
startup of a slave and begin the catchup phase with receiver process that asks the wal segments to the sender process for writing wal segments and so permits to the startup process to recover until last wal segment
when it have recovered all wal segments and finally catches up the master, it starts with streaming replication
now the standby is continuously update in async mode.
Graph of streaming :
Now that streaming is function the sender is reading from current wal segment and send the modification to the receiver that write to wal segments, while startup process continue to recover the standby using information of wal segments
For understand better how it's work, and the processes involved in this technology we need to discover first the new parameter involved:
wal_level: the wal level have a new value “hot_standby” that have to be enabled on master node, and increases the logging level
max_wal_senders: the number of connections between master and slave standby, that is needed for streaming replication
wal_keep_segments: the number of WAL segments that is need for the catch up transaction
Besides there are new parameter for recovery.conf:
standby_mode : enable the read only access
trigger_file : file that if created stops the recover and open the standby
restore_command : command string for restore archive wal segments.
Now we will see the processes involved in master:
and the process in standby :
As you could see there is a new process on master ( WAL sender ) and on standby (WAL receiver)
You could see the WAL sender is sending the position 0/14000070 to the WAL receiver , that is opening in writing the wal segments 00000010000000000000014 on standby.
In the while the startup is opening in reading the wal segments and is applying to the standby database.
Limits
Now , what are the limits of the technology? In practice there are 2 limits:
the transaction could not lock in exclusive mode on standby node
the transaction has to abort if the recovery is cleaning the death tuples.
The second point is very interesting. If there is a long query on standby is possible that is reading “old” rows that is begging cleaned up by vacuum in the master node, so when recovery start to clean rows, it not cares about the long query. Why? The answer is simple, the master doesn't know about what transactions are running in the standby.
For resolve this type of issue there are some practical solutions :
use the new parameter “max_standby_delay” that permits to delay the recovery so the query could finish , the value is in seconds. If it is configured to -1 values, it means infinite time.
vacuum_defer_cleanup_age , the number of transactions by which cleanup of death tuples are deferred
Performance monster
Even if there are these “limitations”, this technology shows in a stress test a very impressive fast replications as showed in this graph :
If you see the difference in consistence state between master and slave are minimal. Besides the difference was not correlated with the many smal transactions (a lot of commit) or very long transaction. The test was done on 2 virtual machine on the same storage (a usb external disk with 5400rpm) with centos 5.5 and PostgreSQL 9.0 beta.
In conclusion this feature is perfect for reporting system & disaster recovery architecture.
Mat