Prepare PostgreSQL for RDI
Prepare PostgreSQL databases to work with RDI
PostgreSQL supports several
logical decoding plug-ins
to enable CDC. If you don't want to use the native pgoutput
logical replication stream support
then you must install your preferred plug-in into the PostgreSQL server. Once you have done this,
you must enable a replication slot, and configure a user with privileges to perform the replication.
If you are using a service like Heroku Postgres to host
your database then this might restrict the plug-ins you can use. If you can't use your preferred
plug-in then could try the pgoutput
decoder if you are using PostgreSQL 10 or above.
If this doesn't work for you then you won't be able to use RDI with your database.
Amazon RDS for PostgreSQL
Follow the steps below to enable CDC with Amazon RDS for PostgreSQL:
-
Set the instance parameter
rds.logical_replication
to 1. -
Check that the
wal_level
parameter is set tological
by running the querySHOW wal_level
as the database RDS master user. The parameter might not have this value in multi-zone replication setups. You can't change the value manually but it should change automatically when you set therds.logical_replication
parameter to 1. If it doesn't change then you probably just need to restart your database instance. You can restart manually or wait until a restart occurs during your maintenance window. -
Set the Debezium
plugin.name
parameter topgoutput
. -
Initiate logical replication from an AWS account that has the
rds_replication
role. The role grants permissions to manage logical slots and to stream data using logical slots. By default, only the master user account on AWS has therds_replication
role on Amazon RDS, but if you have administrator privileges, you can grant the role to other accounts using a query like the following:GRANT rds_replication TO <my_user>
To enable accounts other than the master account to create an initial snapshot, you must grant
SELECT
permission to the accounts on the tables to be captured. See the documentation about security for PostgreSQL logical replication for more information.
Install the logical decoding output plug-in
As of PostgreSQL 9.4, the only way to read changes to the write-ahead-log is to
install a logical decoding output plug-in.
These plug-ins are written in C using PostgreSQL-specific APIs, as described in the
PostgreSQL documentation.
The PostgreSQL connector uses one of Debezium’s supported logical decoding
plug-ins to receive change events from the database in either the default
pgoutput
format (supplied with PostgreSQL) or the
Protobuf
format.
See the
decoderbufs Protobuf plug-in documentation
for more details about how to compile it and also its requirements and limitations.
For simplicity, Debezium also provides a container image that compiles and installs the plug-ins on top of the upstream PostgreSQL server image. Use this image as an example of the steps involved in the installation.
Plug-in differences
Plug-ins don't all behave in exactly the same way. All of them refresh information about
the database schema when they detect that it has changed, but the pgoutput
plug-in is
more "eager" than some other plug-ins to do this. For example, pgoutput
will refresh
when it detects a change to the default value of a column but other plug-ins won't
notice this until another, more significant change happens (such as adding a new table
column).
The Debezium project maintains a Java class that tracks the known differences between plug-ins.
Configure the PostgreSQL server
If you want to use a logical decoding plug-in other than the default pgoutput
then
you must first configure it in the postgresql.conf
file. Set the shared_preload_libraries
parameter to load your plug-in at startup. For example, to load the decoderbufs
plug-in, you would add the following line:
# MODULES
shared_preload_libraries = 'decoderbufs'
Add the line below to configure the replication slot (for any plug-in). This instructs the server to use logical decoding with the write-ahead log.
# REPLICATION
wal_level = logical
You can also set other PostgreSQL streaming replication parameters if you need them.
For example, you can use max_wal_senders
and max_replication_slots
to increase
the number of connectors that can access the sending server concurrently,
and wal_keep_size
to limit the maximum WAL size that a replication slot retains.
The
configuration parameters
documentation describes all the parameters you can use.
PostgreSQL’s logical decoding uses replication slots. These are guaranteed to retain all the WAL
segments that Debezium needs even when Debezium suffers an outage. You should monitor replication
slots carefully to avoid excessive disk consumption and other conditions such as catalog bloat that can arise
if a replication slot is used infrequently. See the PostgreSQL documentation about
replication slots
for more information.
If you are using a synchronous_commit
setting other than on
, then you should set wal_writer_delay
to a value of about 10 milliseconds to ensure a low latency for change events. If you don't set this then
the default value of about 200 milliseconds will apply.
Set up permissions
The Debezium connector needs a database user that has the REPLICATION and LOGIN roles so that it can perform replications. By default, a superuser has these roles but for security reasons, you should give the minimum necessary permissions to the Debezium user rather than full superuser permissions.
If you have administrator privileges then you can create a role for your Debezium user using a query like the following. Note that these are the minimum permissions the user needs to perform replications, but you might also need to grant other permissions.
CREATE ROLE <name> REPLICATION LOGIN;
Set privileges for Debezium to create PostgreSQL publications with pgoutput
The Debezium user needs specific permissions to work with the pgoutput
plug-in.
The plug-in captures change events from the
publications
that PostgreSQL produces for your chosen source tables. A publication contains change events from
one or more tables that are filtered using criteria from a publication specification.
If you have administrator privileges, you can create the publication specification manually or you can grant the Debezium user the privileges to create the specification automatically. The required privileges are:
- Replication privileges in the database to add the table to a publication.
CREATE
privileges on the database to add publications.SELECT
privileges on the tables to copy the initial table data. Table owners automatically haveSELECT
permission for the table.
To add a table to a publication, the user must be an owner of the table. However, in this case, the source table already exists, so you must use a PostgreSQL replication group to share ownership between the Debezium user and the original owner. Configure the replication group using the following commands:
-
Create the replication group (the name
replication_group
here is just an example):CREATE ROLE replication_group;
-
Add the original owner of the table to the group:
GRANT replication_group TO original_owner;
-
Add the Debezium replication user to the group:
GRANT replication_group TO replication_user;
-
Transfer ownership of the table to
replication_group
:ALTER TABLE table_name OWNER TO replication_group;
You must also set the value of the publication.autocreate.mode
parameter to filtered
to allow Debezium to specify the publication configuration. See the
Debezium documentation for publication.autocreate.mode
to learn more about this setting.
Configure PostgreSQL for replication with the Debezium connector host
You must configure the database to allow replication with the host that runs
the PostgreSQL Debezium connector. To do this, add an entry to the
host-based authentication file, pg_hba.conf
, for each client that needs to
use replication. For example, to enable replication for <youruser>
locally,
on the server machine, you would add a line like the following:
local replication <youruser> trust
To allow <youruser>
on localhost to receive replication changes using IPV4,
add the line:
host replication <youruser> 127.0.0.1/32 trust
To allow <youruser>
on localhost to receive replication changes using IPV6,
add the line:
host replication <youruser> ::1/128 trust
Find out more from the PostgreSQL pages about
pg_hba.conf
and
network address types.
Supported PostgreSQL topologies
You can use the Debezium PostgreSQL connector with a standalone PostgreSQL server or with a cluster of servers. For versions 12 and below, PostgreSQL supports logical replication slots on only primary servers. This means that Debezium can only connect to a primary server for CDC and the connection will stop if this server fails. If the same server is promoted to primary when service resumes then you can simply restart the Debezium connector. However, if a different server is promoted to primary, then you must reconfigure Debezium to use the new server before restarting. Also, make sure the new server has the correct plug-in and configuration for Debezium.