Commit b54ad2e

Adjust NodeFilesystemSpaceFillingUp thresholds according to default kubelet GC behavior
Previously[1] we attempted to do the same, but there was a misunderstanding about the GC behavior, which caused the alert to fire even before GC came into play. According to [2][3], kubelet GC kicks in only when `imageGCHighThresholdPercent` is hit, which is set to 85% by default. However, `NodeFilesystemSpaceFillingUp` is set to fire as soon as 80% usage is hit.

This commit changes `fsSpaceFillingUpWarningThreshold` to 15% (free space remaining, i.e. 85% usage) so that GC has ample time to reclaim unwanted images. It also changes `fsSpaceFillingUpCriticalThreshold` to 10% (90% usage), which gives admins more time to react to the warning before a critical alert is sent.

[1] prometheus-operator#1357
[2] https://docs.openshift.com/container-platform/4.10/nodes/nodes/nodes-nodes-garbage-collection.html#nodes-nodes-garbage-collection-images_nodes-nodes-configuring
[3] https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/

Signed-off-by: Arunprasad Rajkumar <arajkuma@redhat.com>
(cherry picked from commit 6ff8bfb)
1 parent 125fb56 commit b54ad2e

1 file changed, +5 -2 lines changed

jsonnet/kube-prometheus/components/node-exporter.libsonnet

@@ -35,9 +35,12 @@ local defaults = {
       // GC values,
       // imageGCLowThresholdPercent: 80
       // imageGCHighThresholdPercent: 85
+      // GC kicks in when imageGCHighThresholdPercent is hit and attempts to free upto imageGCLowThresholdPercent.
       // See https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/ for more details.
-      fsSpaceFillingUpWarningThreshold: 20,
-      fsSpaceFillingUpCriticalThreshold: 15,
+      // Warn only after imageGCHighThresholdPercent is hit, but filesystem is not freed up for a prolonged duration.
+      fsSpaceFillingUpWarningThreshold: 15,
+      // Send critical alert only after (imageGCHighThresholdPercent + 5) is hit, but filesystem is not freed up for a prolonged duration.
+      fsSpaceFillingUpCriticalThreshold: 10,
       diskDeviceSelector: 'device=~"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+"',
       runbookURLPattern: 'https://runbooks.prometheus-operator.dev/runbooks/node/%s',
     },
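
As a rough sanity check of the arithmetic behind these values, the standalone sketch below assumes that both thresholds are percentages of free space remaining, so the usage level at which an alert can fire is 100 minus the threshold. This matches the 80% figure quoted in the commit message for the old warning value of 20. The names `gc`, `usageAt`, `before`, and `after` are illustrative only and do not exist in node-exporter.libsonnet.

```jsonnet
// Kubelet image GC defaults, as quoted in the diff comments.
local gc = {
  imageGCLowThresholdPercent: 80,
  imageGCHighThresholdPercent: 85,
};

// Assumption: thresholds express free space in percent,
// so the corresponding usage level is 100 - threshold.
local usageAt(freeThresholdPercent) = 100 - freeThresholdPercent;

// Threshold values before and after this commit.
local before = { warning: 20, critical: 15 };
local after = { warning: 15, critical: 10 };

// Derive the usage levels and whether the warning can fire
// before kubelet image GC has had a chance to start.
local summarize(t) = {
  warningUsagePercent: usageAt(t.warning),
  criticalUsagePercent: usageAt(t.critical),
  warnsBeforeGC: usageAt(t.warning) < gc.imageGCHighThresholdPercent,
};

{
  before: summarize(before),  // warning at 80, critical at 85, warnsBeforeGC: true
  after: summarize(after),  // warning at 85, critical at 90, warnsBeforeGC: false
}
```

Evaluating the file with the `jsonnet` CLI prints both summaries as JSON. With the old values the warning could fire at 80% usage, five points below the GC high threshold. With the new values the warning lines up with `imageGCHighThresholdPercent` and the critical alert sits five points above it.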
