feat(cluster): expose replicationSlots on the CNPG Cluster #877

Merged
itay-grudev merged 1 commit into cloudnative-pg:main from paradedb:feat/replication-slots-failover
Jun 18, 2026
@philippemnoel philippemnoel commented May 18, 2026

What

  • Exposes CloudNativePG Cluster.spec.replicationSlots on the cluster chart through cluster.replicationSlots.
  • Documents the logical decoding slot failover settings required for CDC tools (e.g. Debezium).
  • Extends the non-default-configuration chainsaw test to render and assert the new block plus the matching PostgreSQL parameters.

Why

CDC tools (Debezium, etc.) create and consume a logical replication slot on the current primary. Before this change, a CNPG switchover or failover could leave that logical slot behind on the old primary, which then becomes a replica. The CDC client can then no longer reconnect cleanly, and the abandoned slot can continue retaining WAL.

CloudNativePG can coordinate logical decoding slot synchronization across HA instances via spec.replicationSlots, but the cluster chart did not previously surface that field, so chart users had no way to enable it without bypassing the chart.

Requirements for CDC failover

  • CloudNativePG operator and CRDs 1.27+.
  • PostgreSQL 17+ for native failover slots (PostgreSQL 18 is covered).
  • Set cluster.replicationSlots.highAvailability.synchronizeLogicalDecoding: true.
  • Set cluster.postgresql.parameters.hot_standby_feedback: "on".
  • Set cluster.postgresql.parameters.sync_replication_slots: "on".
  • Ensure the CDC client creates or alters its logical slot with failover = true; CNPG cannot carry a regular (non-failover) logical slot over to the new primary.
  • Before planned failovers, verify the target standby has the logical slot with synced = true, temporary = false, and invalidation_reason IS NULL.
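
For illustration, the chart-side settings from the list above could be combined in a values file roughly as follows. This is a minimal sketch that uses only the keys named in this PR. The operator version, the PostgreSQL major version, and the failover = true flag on the CDC client's slot are not set here and must be handled separately.

```yaml
# values.yaml for the cluster chart (illustrative sketch, not a complete configuration)
cluster:
  # Passed through to Cluster.spec.replicationSlots
  replicationSlots:
    highAvailability:
      # Ask CNPG to synchronize logical decoding slots across HA instances
      synchronizeLogicalDecoding: true
  postgresql:
    parameters:
      # Both parameters are listed as required for CDC failover
      hot_standby_feedback: "on"
      sync_replication_slots: "on"
```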

Backwards compatibility

cluster.replicationSlots defaults to {}, and the template uses {{- with ... }} so the block is omitted entirely when unset. Existing deployments render identically.
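
As a rough sketch of why the default renders nothing: a `with` block in a Helm template is skipped when its value is empty, and an empty map such as {} counts as empty. The fragment below is an assumption about how such a block is typically written (the `.Values.cluster.replicationSlots` path, the `toYaml` call, and the indentation are illustrative), not a copy of the template in this PR.

```yaml
spec:
  # ... other Cluster fields ...
  {{- with .Values.cluster.replicationSlots }}
  # Rendered only when cluster.replicationSlots is non-empty
  replicationSlots:
    {{- toYaml . | nindent 4 }}
  {{- end }}
```

With the default of {}, nothing between `with` and `end` is emitted, so the rendered Cluster has no replicationSlots key at all.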

Tests

  • helm lint charts/cluster
  • helm template test charts/cluster --show-only templates/cluster.yaml (no replicationSlots rendered by default)
  • helm template test charts/cluster --show-only templates/cluster.yaml --values charts/cluster/test/postgresql-cluster-configuration/01-non_default_configuration_cluster.yaml (renders replicationSlots verbatim)
  • The chainsaw postgresql-cluster-configuration test now asserts the rendered replicationSlots block.

@dosubot dosubot Bot added the size:XS label (This PR changes 0-9 lines, ignoring generated files.) May 18, 2026
@philippemnoel philippemnoel force-pushed the feat/replication-slots-failover branch from 45bd81e to f5f06dd Compare June 18, 2026 19:45
@philippemnoel philippemnoel requested a review from phisco as a code owner June 18, 2026 19:45
Surfaces CloudNativePG's `spec.replicationSlots` on the cluster chart so
chart users can enable synchronization of user-created logical
replication slots between the primary and standbys. With PostgreSQL 17+
failover slots, this lets CDC consumers (e.g. Debezium) survive a CNPG
failover without losing the slot.

To make logical decoding slots survive failover, users must:
- enable `cluster.replicationSlots.highAvailability.synchronizeLogicalDecoding`
- set `cluster.postgresql.parameters.hot_standby_feedback: "on"`
- set `cluster.postgresql.parameters.sync_replication_slots: "on"`
- create the CDC client's logical slot with `failover = true`

Requires CloudNativePG 1.27+ and PostgreSQL 17+ for native failover
slots. The block is omitted entirely when the value is empty, so
existing deployments are unaffected.

The non-default-configuration chainsaw test is extended with a
`replicationSlots` block and the matching PostgreSQL parameters, and
asserts they appear verbatim on the rendered Cluster CR.

Signed-off-by: Philippe Noël <philippemnoel@gmail.com>
@philippemnoel philippemnoel force-pushed the feat/replication-slots-failover branch from f5f06dd to 85ca950 Compare June 18, 2026 20:12
@dosubot dosubot Bot added the lgtm label (This PR has been approved by a maintainer) Jun 18, 2026
@itay-grudev itay-grudev merged commit 990dc64 into cloudnative-pg:main Jun 18, 2026
19 checks passed
@itay-grudev itay-grudev deleted the feat/replication-slots-failover branch June 18, 2026 20:47
tuunit pushed a commit to tuunit/cloudnative-pg-charts that referenced this pull request Jun 23, 2026
drew-viles pushed a commit to nscaledev/cnpg-charts that referenced this pull request Jun 24, 2026