Prepare PostgreSQL for RDI

Prepare PostgreSQL databases to work with RDI

PostgreSQL supports several logical decoding plug-ins to enable CDC. If you don't want to use the native pgoutput logical replication stream support then you must install your preferred plug-in into the PostgreSQL server. Once you have done this, you must enable a replication slot, and configure a user with privileges to perform the replication.

If you are using a service like Heroku Postgres to host your database then this might restrict the plug-ins you can use. If you can't use your preferred plug-in then could try the pgoutput decoder if you are using PostgreSQL 10 or above. If this doesn't work for you then you won't be able to use RDI with your database.

Amazon RDS for PostgreSQL

Follow the steps below to enable CDC with Amazon RDS for PostgreSQL:

  1. Set the instance parameter rds.logical_replication to 1.

  2. Check that the wal_level parameter is set to logical by running the query SHOW wal_level as the database RDS master user. The parameter might not have this value in multi-zone replication setups. You can't change the value manually but it should change automatically when you set the rds.logical_replication parameter to 1. If it doesn't change then you probably just need to restart your database instance. You can restart manually or wait until a restart occurs during your maintenance window.

  3. Set the Debezium plugin.name parameter to pgoutput.

  4. Initiate logical replication from an AWS account that has the rds_replication role. The role grants permissions to manage logical slots and to stream data using logical slots. By default, only the master user account on AWS has the rds_replication role on Amazon RDS, but if you have administrator privileges, you can grant the role to other accounts using a query like the following:

    GRANT rds_replication TO <my_user>
    

    To enable accounts other than the master account to create an initial snapshot, you must grant SELECT permission to the accounts on the tables to be captured. See the documentation about security for PostgreSQL logical replication for more information.

Install the logical decoding output plug-in

As of PostgreSQL 9.4, the only way to read changes to the write-ahead-log is to install a logical decoding output plug-in. These plug-ins are written in C using PostgreSQL-specific APIs, as described in the PostgreSQL documentation. The PostgreSQL connector uses one of Debezium’s supported logical decoding plug-ins to receive change events from the database in either the default pgoutput format (supplied with PostgreSQL) or the Protobuf format. See the decoderbufs Protobuf plug-in documentation for more details about how to compile it and also its requirements and limitations.

For simplicity, Debezium also provides a container image that compiles and installs the plug-ins on top of the upstream PostgreSQL server image. Use this image as an example of the steps involved in the installation.

Note:
The Debezium logical decoding plug-ins have been tested on Linux machines, but if you are using Windows or other operating systems, the installation steps might be different from those listed here.

Plug-in differences

Plug-ins don't all behave in exactly the same way. All of them refresh information about the database schema when they detect that it has changed, but the pgoutput plug-in is more "eager" than some other plug-ins to do this. For example, pgoutput will refresh when it detects a change to the default value of a column but other plug-ins won't notice this until another, more significant change happens (such as adding a new table column).

The Debezium project maintains a Java class that tracks the known differences between plug-ins.

Configure the PostgreSQL server

If you want to use a logical decoding plug-in other than the default pgoutput then you must first configure it in the postgresql.conf file. Set the shared_preload_libraries parameter to load your plug-in at startup. For example, to load the decoderbufs plug-in, you would add the following line:

# MODULES
shared_preload_libraries = 'decoderbufs'

Add the line below to configure the replication slot (for any plug-in). This instructs the server to use logical decoding with the write-ahead log.

# REPLICATION
wal_level = logical            

You can also set other PostgreSQL streaming replication parameters if you need them. For example, you can use max_wal_senders and max_replication_slots to increase the number of connectors that can access the sending server concurrently, and wal_keep_size to limit the maximum WAL size that a replication slot retains. The configuration parameters documentation describes all the parameters you can use.

PostgreSQL’s logical decoding uses replication slots. These are guaranteed to retain all the WAL segments that Debezium needs even when Debezium suffers an outage. You should monitor replication slots carefully to avoid excessive disk consumption and other conditions such as catalog bloat that can arise if a replication slot is used infrequently. See the PostgreSQL documentation about replication slots for more information. If you are using a synchronous_commit setting other than on, then you should set wal_writer_delay to a value of about 10 milliseconds to ensure a low latency for change events. If you don't set this then the default value of about 200 milliseconds will apply.

Note:
This guide summarizes the operation of the PostgreSQL write-ahead log, but we strongly recommend you consult the PostgreSQL write-ahead log documentation to get a better understanding.

Set up permissions

The Debezium connector needs a database user that has the REPLICATION and LOGIN roles so that it can perform replications. By default, a superuser has these roles but for security reasons, you should give the minimum necessary permissions to the Debezium user rather than full superuser permissions.

If you have administrator privileges then you can create a role for your Debezium user using a query like the following. Note that these are the minimum permissions the user needs to perform replications, but you might also need to grant other permissions.

CREATE ROLE <name> REPLICATION LOGIN;

Set privileges for Debezium to create PostgreSQL publications with pgoutput

The Debezium user needs specific permissions to work with the pgoutput plug-in. The plug-in captures change events from the publications that PostgreSQL produces for your chosen source tables. A publication contains change events from one or more tables that are filtered using criteria from a publication specification.

If you have administrator privileges, you can create the publication specification manually or you can grant the Debezium user the privileges to create the specification automatically. The required privileges are:

  • Replication privileges in the database to add the table to a publication.
  • CREATE privileges on the database to add publications.
  • SELECT privileges on the tables to copy the initial table data. Table owners automatically have SELECT permission for the table.

To add a table to a publication, the user must be an owner of the table. However, in this case, the source table already exists, so you must use a PostgreSQL replication group to share ownership between the Debezium user and the original owner. Configure the replication group using the following commands:

  1. Create the replication group (the name replication_group here is just an example):

    CREATE ROLE replication_group;
    
  2. Add the original owner of the table to the group:

    GRANT replication_group TO original_owner;
    
  3. Add the Debezium replication user to the group:

    GRANT replication_group TO replication_user;
    
  4. Transfer ownership of the table to replication_group:

    ALTER TABLE table_name OWNER TO replication_group;
    

You must also set the value of the publication.autocreate.mode parameter to filtered to allow Debezium to specify the publication configuration. See the Debezium documentation for publication.autocreate.mode to learn more about this setting.

Configure PostgreSQL for replication with the Debezium connector host

You must configure the database to allow replication with the host that runs the PostgreSQL Debezium connector. To do this, add an entry to the host-based authentication file, pg_hba.conf, for each client that needs to use replication. For example, to enable replication for <youruser> locally, on the server machine, you would add a line like the following:

local   replication     <youruser>                          trust

To allow <youruser> on localhost to receive replication changes using IPV4, add the line:

host    replication     <youruser>  127.0.0.1/32            trust

To allow <youruser> on localhost to receive replication changes using IPV6, add the line:

host    replication     <youruser>  ::1/128                 trust 

Find out more from the PostgreSQL pages about pg_hba.conf and network address types.

Supported PostgreSQL topologies

You can use the Debezium PostgreSQL connector with a standalone PostgreSQL server or with a cluster of servers. For versions 12 and below, PostgreSQL supports logical replication slots on only primary servers. This means that Debezium can only connect to a primary server for CDC and the connection will stop if this server fails. If the same server is promoted to primary when service resumes then you can simply restart the Debezium connector. However, if a different server is promoted to primary, then you must reconfigure Debezium to use the new server before restarting. Also, make sure the new server has the correct plug-in and configuration for Debezium.

RATE THIS PAGE
Back to top ↑