# Fencing

Fencing in Cloud Native PostgreSQL is a last-resort measure to protect the
data in one, several, or even all instances of a PostgreSQL cluster when they
appear to be malfunctioning. When an instance is fenced, the PostgreSQL server
process (`postmaster`) is guaranteed to be shut down, while the pod is kept
running. This ensures that, until the fence is lifted, PostgreSQL does not
modify the data on the pod, and that the file system can be inspected for
debugging and troubleshooting purposes.

## How to fence instances

In Cloud Native PostgreSQL you can fence:

- a specific instance
- a list of instances
- an entire Postgres `Cluster`

Fencing is controlled through the content of the `k8s.enterprisedb.io/fencedInstances`
annotation, which expects a JSON-formatted list of instance names.
If the annotation is set to `'["*"]'` (a singleton list containing a wildcard),
the whole cluster is fenced.
If the annotation is set to an empty JSON list, the operator behaves as if the
annotation were not set.

For example:

- `k8s.enterprisedb.io/fencedInstances: '["cluster-example-1"]'` will fence just
  the `cluster-example-1` instance

- `k8s.enterprisedb.io/fencedInstances: '["cluster-example-1","cluster-example-2"]'`
  will fence the `cluster-example-1` and `cluster-example-2` instances

- `k8s.enterprisedb.io/fencedInstances: '["*"]'` will fence every instance in
  the cluster

The annotation can be set manually on the `Cluster` object, for example with
the `kubectl annotate` command, or transparently with the
`kubectl cnp fencing on` subcommand:

```shell
# to fence only one instance (cluster-example-1)
kubectl cnp fencing on cluster-example 1

# to fence all the instances in a Cluster
kubectl cnp fencing on cluster-example "*"
```
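
You can also reach the same result without the plugin by using `kubectl annotate`. The following is a minimal sketch and is not part of the `cnp` plugin. `fenced_instances_json` and `fence_instances` are hypothetical helper names, and the example assumes a `Cluster` named `cluster-example`. The `--overwrite` flag is required because `kubectl annotate` refuses to change an annotation that already has a value.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the cnp plugin): print the JSON list
# expected by the k8s.enterprisedb.io/fencedInstances annotation.
# Usage: fenced_instances_json cluster-example-1 cluster-example-2
#        fenced_instances_json '*'    # whole cluster
fenced_instances_json() {
  local json="[" sep="" name
  for name in "$@"; do
    json+="${sep}\"${name}\""
    sep=","
  done
  printf '%s]\n' "$json"
}

# Hypothetical wrapper: set the annotation on a Cluster with kubectl.
# --overwrite replaces any value that is already present.
# Usage: fence_instances cluster-example cluster-example-1
fence_instances() {
  local cluster="$1"
  shift
  kubectl annotate cluster "$cluster" --overwrite \
    "k8s.enterprisedb.io/fencedInstances=$(fenced_instances_json "$@")"
}
```

For instance, `fence_instances cluster-example cluster-example-1 cluster-example-2` sets the annotation value to `["cluster-example-1","cluster-example-2"]`. Similarly, `fence_instances cluster-example '*'` sets it to `["*"]` and fences the whole cluster. Generating the value with a helper avoids shell-quoting mistakes in the JSON string.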

Here is an example of a `Cluster` with an instance that was previously fenced:

```yaml
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Cluster
metadata:
  annotations:
    k8s.enterprisedb.io/fencedInstances: '["cluster-example-1"]'
[...]
```
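
To check which instances are currently fenced, you can read the annotation back from the `Cluster`. The sketch below is illustrative only. `fenced_annotation` and `is_fenced` are hypothetical helpers. `is_fenced` matches strings in the annotation value instead of parsing the JSON, which is enough for lists of instance names like the ones above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the current annotation value of a Cluster
# (empty when the annotation is not set). Dots inside the annotation key
# must be escaped in the JSONPath expression.
fenced_annotation() {
  kubectl get cluster "$1" \
    -o jsonpath='{.metadata.annotations.k8s\.enterprisedb\.io/fencedInstances}'
}

# Hypothetical helper: succeed (exit 0) if INSTANCE is covered by the
# annotation VALUE, either by name or through the "*" wildcard.
# Usage: is_fenced '["cluster-example-1"]' cluster-example-1
is_fenced() {
  local value="$1" instance="$2"
  # Matching the name together with its quotes avoids false positives
  # such as cluster-example-1 inside "cluster-example-10".
  [[ $value == *'"*"'* || $value == *"\"${instance}\""* ]]
}
```

For the `Cluster` shown above, `is_fenced "$(fenced_annotation cluster-example)" cluster-example-1 && echo fenced` would print `fenced`.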

## How to lift fencing

Fencing can be lifted by clearing the annotation, or by setting it to a different value.

As with fencing, this can be done either manually with `kubectl annotate`, or
with the `kubectl cnp fencing off` subcommand:

```shell
# to lift the fencing only for one instance (cluster-example-1)
# N.B.: this currently does not work if the whole cluster was previously fenced;
# in that case, you have to set the annotation manually as explained above
kubectl cnp fencing off cluster-example 1

# to lift the fencing for all the instances in a Cluster
kubectl cnp fencing off cluster-example "*"
```
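
Lifting the fence manually means either removing the annotation or overwriting it with the instances that must remain fenced. In `kubectl annotate`, a trailing `-` after the key removes the annotation. The sketch below also handles the case described in the N.B. comment above. There, the whole cluster was fenced with `"*"` and only one instance should be released, so the wildcard is replaced by an explicit list of the other instances. The helper names are hypothetical. The caller has to supply the full list of instance names, because the sketch does not discover them.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: lift fencing on every instance by removing the
# annotation (the trailing "-" tells kubectl annotate to delete the key).
unfence_all() {
  kubectl annotate cluster "$1" k8s.enterprisedb.io/fencedInstances-
}

# Hypothetical helper: print the JSON list of instances that stay fenced
# after releasing UNFENCE, given every instance name of the cluster.
# Usage: keep_fenced_json <unfence> <instance>...
keep_fenced_json() {
  local unfence="$1" json="[" sep="" name
  shift
  for name in "$@"; do
    [[ $name == "$unfence" ]] && continue
    json+="${sep}\"${name}\""
    sep=","
  done
  printf '%s]\n' "$json"
}

# Hypothetical wrapper: replace a whole-cluster fence ("*") with an
# explicit list that excludes one instance.
# Usage: unfence_one_from_wildcard <cluster> <unfence> <instance>...
unfence_one_from_wildcard() {
  local cluster="$1"
  shift
  kubectl annotate cluster "$cluster" --overwrite \
    "k8s.enterprisedb.io/fencedInstances=$(keep_fenced_json "$@")"
}
```

Take a three-instance `Cluster` as an example. There, `unfence_one_from_wildcard cluster-example cluster-example-1 cluster-example-1 cluster-example-2 cluster-example-3` sets the annotation to `["cluster-example-2","cluster-example-3"]`.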

## How fencing works

Once an instance is set for fencing, the procedure to shut down the
`postmaster` process is initiated. It consists of an initial smart shutdown
with a timeout set to `.spec.stopDelay`, followed by a fast shutdown if the
instance has not stopped within that timeout. Then:

- the Pod is kept alive

- the Pod is not marked as *Ready*

- all the changes that don't require the Postgres instance to be up are
  reconciled, including:
    - configuration files
    - certificates and all the cryptographic material

- metrics are not collected, except `cnp_collector_fencing_on`, which is
  set to 1
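
You can observe the effects listed above from outside the Pod. The sketch below reads the `Ready` condition of a Pod with a JSONPath filter. It also extracts `cnp_collector_fencing_on` from metrics in the Prometheus text format. The helper names are hypothetical. `instance_fencing_on` assumes that the instance serves metrics on port 9187 and that `curl` is available locally, so check both against your deployment.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the status of the Pod's Ready condition
# ("True" or "False"); a fenced instance is expected to report "False".
pod_ready() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Hypothetical helper: read Prometheus text-format metrics on stdin and
# print the value of cnp_collector_fencing_on (with or without labels).
# Comment lines (# HELP, # TYPE) are skipped because their first field is "#".
fencing_on_value() {
  awk '$1 == "cnp_collector_fencing_on" || index($1, "cnp_collector_fencing_on{") == 1 { print $NF }'
}

# Hypothetical wrapper: fetch the metrics of one instance through a
# temporary port-forward. Port 9187 is an assumption.
instance_fencing_on() {
  kubectl port-forward "pod/$1" 9187:9187 >/dev/null &
  local pf_pid=$!
  sleep 2 # give the port-forward time to start
  curl -s http://localhost:9187/metrics | fencing_on_value
  kill "$pf_pid"
}
```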

!!! Warning
    When at least one instance in a `Cluster` is fenced, failovers and switchovers
    for that `Cluster` are blocked until the fence is lifted, as the status of the
    `Cluster` cannot be considered stable.

    In particular, if the **primary instance** is fenced, the postmaster process
    is shut down but no failover happens, interrupting service for the
    applications. When the fence is lifted, the primary instance is started up
    again without any failover happening.

    For this reason, we recommend fencing only replica instances whenever possible.

!!! Warning
    If the primary is the only fenced instance in a `Cluster` and its Pod is
    deleted, a failover is performed. When the fence on the old primary is
    lifted, that instance is restarted as a standby (a follower of the new primary).
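
Fencing the primary interrupts service without triggering a failover, so it can be worth checking which instance is the primary before fencing. The sketch makes two assumptions. First, the `Cluster` status reports the name of the primary Pod in `.status.currentPrimary`, so verify that field against the CRD of your operator version. Second, instance names follow the `<cluster>-<serial>` pattern used on this page. `is_not_primary` and `fence_replica` are hypothetical helpers that wrap the `kubectl cnp fencing on` subcommand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail (exit 1) when INSTANCE is the current primary.
# Usage: is_not_primary <current-primary> <instance>
is_not_primary() {
  if [[ $1 == "$2" ]]; then
    echo "refusing to fence $2: it is the current primary" >&2
    return 1
  fi
}

# Hypothetical wrapper: fence instance SERIAL of CLUSTER only if it is a
# replica. Assumes .status.currentPrimary holds the primary Pod name and
# that instances are named <cluster>-<serial>.
fence_replica() {
  local cluster="$1" serial="$2" primary
  primary=$(kubectl get cluster "$cluster" -o jsonpath='{.status.currentPrimary}')
  is_not_primary "$primary" "${cluster}-${serial}" &&
    kubectl cnp fencing on "$cluster" "$serial"
}
```

For example, `fence_replica cluster-example 2` fences `cluster-example-2` only if it is not the current primary. The check and the fencing are two separate calls, so a switchover happening between them would go unnoticed.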

If a fenced instance is deleted, the Pod is recreated normally, but the
postmaster is not started. This can be extremely helpful when investigating
instances that are stuck in a crash loop (`CrashLoopBackOff`).
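
You can combine fencing and Pod deletion into a troubleshooting sequence for a crashlooping instance, as sketched below. `instance_pod` and `debug_crashlooping_instance` are hypothetical helpers. The sketch assumes that instance names follow the `<cluster>-<serial>` pattern, and that the PostgreSQL container is named `postgres` and provides `bash`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: Pod name of instance SERIAL, following the
# <cluster>-<serial> naming used throughout this page.
instance_pod() {
  printf '%s-%s\n' "$1" "$2"
}

# Hypothetical troubleshooting sequence for a crashlooping instance.
# Assumes the PostgreSQL container is named "postgres" and ships bash.
debug_crashlooping_instance() {
  local cluster="$1" serial="$2" pod
  pod=$(instance_pod "$cluster" "$serial")

  # 1. Fence the instance so that the postmaster is not started again.
  kubectl cnp fencing on "$cluster" "$serial"

  # 2. Delete the Pod: it is recreated, but PostgreSQL stays down.
  kubectl delete pod "$pod"

  # 3. Wait for the new Pod to be running (it will not become Ready).
  until [[ $(kubectl get pod "$pod" -o jsonpath='{.status.phase}' 2>/dev/null) == Running ]]; do
    sleep 2
  done

  # 4. Open a shell to inspect the file system while PostgreSQL is down.
  kubectl exec -it "$pod" -c postgres -- bash
}
```

When you are done investigating, run `kubectl cnp fencing off cluster-example 1` to lift the fence so the postmaster can be started again.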