= How to Set up Cascaded Replication in Londiste =
== Basic cluster setup ==
The ini files for the databases are created similar to the one below; only
the "db1" part changes.
----
$ cat conf/londiste_db1.ini
pgq_lazy_fetch = 0
----
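The listing above is truncated. A complete node configuration typically also
names the job, the node's database and the queue; a minimal sketch with
illustrative values:
----
[londiste3]
job_name = londiste_db1
db = dbname=db1
queue_name = replika
logfile = log/%(job_name)s.log
pidfile = pid/%(job_name)s.pid

pgq_lazy_fetch = 0
----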
After creating the ini files we are ready to install londiste3 and initialize nodes.
----
$ londiste3 -q conf/londiste_db1.ini create-root node1 dbname=db1
$ londiste3 -q conf/londiste_db5.ini create-branch node5 dbname=db5 --provider=dbname=db3
----
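The intermediate branch nodes are created the same way with create-branch; for
example, assuming node2 and node3 attach directly to the root:
----
$ londiste3 -q conf/londiste_db2.ini create-branch node2 dbname=db2 --provider=dbname=db1
$ londiste3 -q conf/londiste_db3.ini create-branch node3 dbname=db3 --provider=dbname=db1
----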
Now that schemas are installed, we can start the ticker.
----
$ pgqd -q -d conf/pgqd.ini
----
And you need a londiste worker process on each node to actually carry out
the actions.
----
$ londiste3 -q -d conf/londiste_db1.ini worker
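# each of the other nodes needs its own worker as well, started the same way
# (conf/londiste_db3.ini is assumed to follow the same naming pattern):
$ londiste3 -q -d conf/londiste_db2.ini worker
$ londiste3 -q -d conf/londiste_db3.ini worker
$ londiste3 -q -d conf/londiste_db4.ini worker
$ londiste3 -q -d conf/londiste_db5.ini worker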
----
Now let's play with data.
Create a table on the root node and fill in a couple of rows:
----
$ psql -d db1 -c "create table mytable (id serial primary key, data text)"
$ psql -d db1 -c "insert into mytable (data) values ('row4')"
----
Create some load on the table:
----
$ ./loadgen.py -d conf/gen1.ini
----
Register the table on the root node:
----
$ londiste3 -q conf/londiste_db1.ini add-table mytable
$ londiste3 -q conf/londiste_db1.ini add-seq mytable_id_seq
----
Register the table on the other node, with creation:
----
$ psql -d db2 -c "create sequence mytable_id_seq"
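# then register the sequence and the table, letting londiste create the table
# on this node (an illustrative continuation):
$ londiste3 -q conf/londiste_db2.ini add-seq mytable_id_seq
$ londiste3 -q conf/londiste_db2.ini add-table mytable --create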
----
The main advantage of skytools3 cascaded replication is how easy it is to
change the replication topology.
----
$ londiste3 -q conf/londiste_db4.ini change-provider --provider=node2
$ londiste3 -q conf/londiste_db4.ini status
ERR: londiste_db5: duplicate key value violates unique constraint "mytable_pkey"
----
Now let's move it to node 3:
----
$ londiste3 -q conf/londiste_db4.ini change-provider --provider=node3
----
----
$ londiste3 -q conf/londiste_db5.ini change-provider --provider=node2
$ londiste3 -q conf/londiste_db1.ini status
Lag: 3s, Tick: 65
----
The takeover command is in fact the only way to change the root node.
----
$ londiste3 -q conf/londiste_db2.ini takeover node1
----
== Introduction ==
In this howto we will set up a replication scheme where data is collected
from multiple "partition" databases into a single table on "full" database.
This situation is common when using PL/Proxy or other similar partitioning
solutions, where the data also needs to be available in one database for
analytical purposes.
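Each database/queue pair gets its own londiste3 configuration file; a minimal
sketch following the usual londiste3 layout (values illustrative):
----
[londiste3]
job_name = l3_part1_q_part1
db = dbname=part1
queue_name = l3_part1_q
logfile = log/%(job_name)s.log
pidfile = pid/%(job_name)s.pid
----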
These ini files are then used for setting up the nodes and adding tables to the
root nodes.
Set up the root nodes on part1 and part2:
----
$ londiste3 -v conf/l3_part1_q_part1.ini create-root part1_root dbname=part1
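# the second partition root is created the same way (node name assumed by
# analogy with part1_root):
$ londiste3 -v conf/l3_part2_q_part2.ini create-root part2_root dbname=part2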
----
Then, on the full1 database, set up two londiste nodes, one for each of the
partition nodes.
These will act as the receiving nodes to replicate to.
These look very similar and differ only in queue name:
File `conf/l3_part1_q_full1.ini`:
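A minimal sketch of such a file, assuming the usual londiste3 layout (values
illustrative):
----
[londiste3]
job_name = l3_part1_q_full1
db = dbname=full1
queue_name = l3_part1_q
logfile = log/%(job_name)s.log
pidfile = pid/%(job_name)s.pid
----
These receiving nodes are then registered as leaf nodes, each with the
corresponding partition database as its provider: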
----
$ londiste3 -v conf/l3_part2_q_full1.ini create-leaf merge_part2_full1 dbname=full1 --provider=dbname=part2
----
The same ini files are later also used for launching the replication worker
daemons. But before launching the workers you need to start pgqd, the "ticker"
daemon:
----
$ pgqd -v -d conf/pgqd.ini
----
The `conf/pgqd.ini` file for the command above looks like this:
----
[pgqd]
----
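The listing above is truncated; a minimal ticker configuration, with
illustrative file locations, is roughly:
----
[pgqd]
logfile = log/pgqd.log
pidfile = pid/pgqd.pid
----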
Now that the ticker is running, it's time to launch londiste3 workers which will
do the actual replication:
----
$ londiste3 -v -d conf/l3_part1_q_full1.ini worker
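# the worker for the second merge queue is launched the same way:
$ londiste3 -v -d conf/l3_part2_q_full1.ini worker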
----
=== Setting up the tables ===
In order to have something to replicate, we need some tables, so let's create
them on partition nodes:
----
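# the same table is created on part1 as well (shown here for completeness):
$ psql -d "part1" -c "create table mydata (id int4 primary key, data text)"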
$ psql -d "part2" -c "create table mydata (id int4 primary key, data text)"
----
And then add them to the set of replicated tables on the root nodes:
----
$ londiste3 -v conf/l3_part1_q_part1.ini add-table mydata
$ londiste3 -v conf/l3_part2_q_part2.ini add-table mydata
----
Now we need some data in these tables, as replicating empty tables is no fun:
----
$ psql -d "part1" -c "insert into mydata values (1, 'part1')"
(2 rows)
----
Now let's subscribe them on the full database. As the table is not yet created on
full1, we specify `--create` so londiste creates the table on leaf node based on
structure that is on root. The switch `--merge-all` tells londiste to add the table to all
queues which have it on the root side, not just the one from the .ini file.
----
$ londiste3 -v conf/l3_part1_q_full1.ini add-table mydata --create --merge-all
(4 rows)
----
The rows with ids 1 and 2 were replicated during the initial copy; the ones with
5 and 6 were captured by triggers into event log on partition database and then
replicated to full1 using the standard replication process.
=== Checking subscription ===
Just to check that we really did achieve what we wanted, let's see which tables
are present and fully subscribed ('ok'):
----
$ psql -d "full1" -c "select * from londiste.table_info order by queue_name"
(2 rows)
----
OK, here we have the table public.mydata subscribed from 2 queues, and its
merge_state is 'ok', meaning the initial copy process has been successful.
That's it, we have successfully set up replication from two partition
databases to one single full database.
=== Create databases ===
Create the root database that will contain all data, and two shard databases.
Run the following SQL:
----
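-- illustrative; the database names match those used in the rest of this howto
CREATE DATABASE rootdb;
CREATE DATABASE sharddb_0;
CREATE DATABASE sharddb_1;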
----
Deploy the hash function everywhere. This is needed because the internal hashtext
function was changed between 8.3 and 8.4 versions and may be changed again
in the future without consideration for its users.
----
psql rootdb < /usr/share/postgresql/8.4/contrib/hashlib.sql
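# the shard databases need the same function (sketch):
psql sharddb_0 < /usr/share/postgresql/8.4/contrib/hashlib.sql
psql sharddb_1 < /usr/share/postgresql/8.4/contrib/hashlib.sql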
----
=== Set up pgbench schema ===
In this HowTo we are using pgbench for setting up the schema,
populating it with sample data and later running SQL loads to be replicated.
This command will create pgbench tables and fill them with data:
----
/usr/lib/postgresql/8.4/bin/pgbench -i -s 2 -F 80 rootdb
----
Write a partconf.sql that will be deployed to all databases:
----
CREATE SCHEMA partconf;
CREATE TABLE partconf.conf (
LANGUAGE sql;
----
Populate shard configuration tables. These values are used inside part.py.
----
psql rootdb < partconf.sql
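# both shards get the same schema, and each records its own partition number;
# an illustrative sketch for sharddb_0 (the sharddb_1 row follows below):
psql sharddb_0 < partconf.sql
psql sharddb_1 < partconf.sql
psql sharddb_0 -c "insert into partconf.conf(part_nr, max_part) values(0,1);"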
psql sharddb_1 -c "insert into partconf.conf(part_nr, max_part) values(1,1);"
----
Next, create configuration files for the root node and both partitions
(st3partsplit/st3_rootdb.ini and so on), then start the worker on the root
node:
----
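# the root node itself is created first; an illustrative command (node name
# assumed):
londiste3 st3partsplit/st3_rootdb.ini create-root node1 dbname=rootdb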
londiste3 -d st3partsplit/st3_rootdb.ini worker
----
And create leaf nodes and start the workers on partitions:
----
londiste3 st3partsplit/st3_sharddb_0.ini create-leaf node2_0 dbname=sharddb_0 --provider=dbname=rootdb
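londiste3 -d st3partsplit/st3_sharddb_0.ini worker
# the second partition is set up the same way (node name assumed):
londiste3 st3partsplit/st3_sharddb_1.ini create-leaf node2_1 dbname=sharddb_1 --provider=dbname=rootdb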
londiste3 -d st3partsplit/st3_sharddb_1.ini worker
----
Create config file st3partsplit/pgqd.ini for `pgqd` ("the ticker"):
----
[pgqd]
----
Start the ticker process:
----
pgqd -d st3partsplit/pgqd.ini
----
The `--create` switch creates the tables on the target nodes automatically.
The `--handler=part` tells londiste to use the `part` handler for replication,
the `--handler-arg=key=*id` specifies which key field to partition on.
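For example, registering the pgbench_accounts table this way could look like
the following sketch (the key column aid comes from the pgbench schema; node
and file names are the ones used above):
----
londiste3 st3partsplit/st3_rootdb.ini add-table pgbench_accounts --handler=part --handler-arg=key=aid
londiste3 st3partsplit/st3_sharddb_0.ini add-table pgbench_accounts --create --handler=part --handler-arg=key=aid
londiste3 st3partsplit/st3_sharddb_1.ini add-table pgbench_accounts --create --handler=part --handler-arg=key=aid
----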
Then run a test load against the root database:
----
/usr/lib/postgresql/8.4/bin/pgbench -T 10 -c 5 rootdb
----
After this is done, you can check that the tables on both sides have the same data:
----
londiste3 st3partsplit/st3_sharddb_0.ini compare
londiste3 st3partsplit/st3_sharddb_1.ini compare
----
Except of course that they don't - each partition will only have roughly half
the data from the root. But the row counts and checksums of the partitions
should both add up to the numbers on the master.
This sample does the following actions:
 * sets up the databases
- creates a database 'l3simple_db1', which will be master
- populates this with pgbench schema and data
- adds primary and foreign keys to make the db more realistic
- creates a leaf node on 'l3simple_db2'
- starts the ticker daemon
- adds all tables to replication set on both databases
 - waits for the replication to complete
It also runs pgbench to test that the replication actually happens and works properly.
== Set up schema for root database ==
=== Set up pgbench schema ===
In this HowTo we are using pgbench for setting up the schema,
populating it with sample data and later running SQL loads to be replicated.
Run the following commands:
----
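# the master side is set up first (illustrative; the db1 config file name is
# assumed by analogy):
londiste3 st3simple/st3_l3simple_db1.ini create-root node1 dbname=l3simple_db1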
londiste3 st3simple/st3_l3simple_db2.ini create-leaf node2 dbname=l3simple_db2 --provider=dbname=l3simple_db1
----
Launch the worker daemon for the target database:
----
londiste3 -d st3simple/st3_l3simple_db2.ini worker
----
Create config file `st3simple/pgqd.ini` for PgQ ticker daemon:
----
[pgqd]
----

----
/usr/lib/postgresql/9.1/bin/pgbench -T 120 -c 5 l3simple_db1 -f /tmp/throttled.pgbench
----
The /tmp/throttled.pgbench file contains the standard pgbench workload, except
that there are random-length waits between the commands.
Now add all the tables to replication, first on the root node and then on the leaf:
Run the following commands:
----
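# a sketch: add every table in one go, assuming the --all and --create
# switches of add-table:
londiste3 st3simple/st3_l3simple_db1.ini add-table --all
londiste3 st3simple/st3_l3simple_db2.ini add-table --all --create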
----
To test our newly set up replication, let's generate some traffic.
The following command will run pgbench full speed with 5 parallel
database connections generating database traffic for 10 seconds:
----
/usr/lib/postgresql/9.1/bin/pgbench -T 10 -c 5 l3simple_db2
----
After this is done, you can check that the tables on both sides have the same data:
----
londiste3 st3simple/st3_l3simple_db2.ini compare
----
The compare command will establish the same logical point in time on the
provider and subscriber nodes and then count and checksum the rows on both
sides.
The result will look like this:
----
2011-12-25 08:24:42,138 29189 INFO Locking public.pgbench_accounts
2011-12-25 08:24:56,740 29189 INFO dstdb: 20 rows, checksum=518235101
----
-the "checksum" is computed by adding up hashtext() sums for all database rows.
+The "checksum" is computed by adding up hashtext() sums for all database rows.
== Done ==
The setup of a simple two-node cluster is done.