Allow sampling of statements depending on duration
authorTomas Vondra <tomas.vondra@postgresql.org>
Mon, 4 Nov 2019 00:57:45 +0000 (01:57 +0100)
committerTomas Vondra <tomas.vondra@postgresql.org>
Wed, 6 Nov 2019 18:11:07 +0000 (19:11 +0100)
This allows logging a sample of statements, without incurring excessive
log traffic (which may impact performance).  This can be useful when
analyzing workloads with lots of short queries.

The sampling is configured using two new GUC parameters:

 * log_min_duration_sample - minimum required statement duration

 * log_statement_sample_rate - sample rate (0.0 - 1.0)

Only statements with duration exceeding log_min_duration_sample are
considered for sampling. To enable sampling, both those GUCs have to
be set correctly.

The existing log_min_duration_statement GUC has a higher priority, i.e.
statements with duration exceeding log_min_duration_statement will be
always logged, irrespectedly of how the sampling is configured. This
means only configurations

  log_min_duration_sample < log_min_duration_statement

do actually sample the statements, instead of logging everything.

Author: Adrien Nayrat
Reviewed-by: David Rowley, Vik Fearing, Tomas Vondra
Discussion: https://postgr.es/m/bbe0a1a8-a8f7-3be2-155a-888e661cc06c@anayrat.info

doc/src/sgml/config.sgml
src/backend/tcop/postgres.c
src/backend/utils/misc/guc.c
src/backend/utils/misc/postgresql.conf.sample
src/include/utils/guc.h

index 48d7939d2de15aab15687e15a0b2ee1746f26573..46bc31de4c4fa96aa7771b1ca3a596b5a1a49269 100644 (file)
@@ -5950,6 +5950,12 @@ local0.*    /var/log/postgresql
          Only superusers can change this setting.
         </para>
 
+        <para>
+         This overrides <xref linkend="guc-log-min-duration-sample"/>,
+         meaning that queries with duration exceeding this setting are not
+         subject to sampling and are always logged.
+        </para>
+
         <para>
          For clients using extended query protocol, durations of the Parse,
          Bind, and Execute steps are logged independently.
@@ -5972,6 +5978,85 @@ local0.*    /var/log/postgresql
        </listitem>
       </varlistentry>
 
+     <varlistentry id="guc-log-min-duration-sample" xreflabel="log_min_duration_sample">
+      <term><varname>log_min_duration_sample</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>log_min_duration_sample</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+       <listitem>
+        <para>
+         Allows to sample the logging of the duration of each completed
+         statement if the statement ran for at least the specified amount of
+         time. If this value is specified without units, it is taken as milliseconds.
+         Setting this to zero samples all statement durations.
+         Minus-one (the default) disables sampling statement durations.
+         For example, if you set it to <literal>250ms</literal>
+         then all SQL statements that run 250ms or longer will be considered
+         for sampling, with sample rate is controlled by <xref linkend="guc-log-statement-sample-rate"/>. 
+         Enabling this parameter can be helpful when the traffic too high to
+         sample all queries.
+         Only superusers can change this setting.
+        </para>
+
+        <para>
+         This option has lower priority than <xref linkend="guc-log-min-duration-statement"/>,
+         meaning that statements with durations exceeding <xref linkend="guc-log-min-duration-statement"/>
+         are not subject to sampling and are always logged.
+        </para>
+
+        <para>
+         For clients using extended query protocol, durations of the Parse,
+         Bind, and Execute steps are logged independently.
+        </para>
+
+       <note>
+        <para>
+         When using this option together with
+         <xref linkend="guc-log-statement"/>,
+         the text of statements that are logged because of
+         <varname>log_statement</varname> will not be repeated in the
+         duration log message.
+         If you are not using <application>syslog</application>, it is recommended
+         that you log the PID or session ID using
+         <xref linkend="guc-log-line-prefix"/>
+         so that you can link the statement message to the later
+         duration message using the process ID or session ID.
+        </para>
+       </note>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-log-statement-sample-rate" xreflabel="log_statement_sample_rate">
+      <term><varname>log_statement_sample_rate</varname> (<type>real</type>)
+      <indexterm>
+       <primary><varname>log_statement_sample_rate</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+       <listitem>
+        <para>
+         Determines the fraction of statements with duration exceeding
+         <xref linkend="guc-log-min-duration-sample"/> to be logged.
+         This is a statistical parameter, for example <literal>0.5</literal>
+         means there is statistically one in two chances to log the statement.
+         The default is <literal>1.0</literal>, meaning log all such
+         statements.
+         Setting this to zero disables sampling logging, same as setting
+         <varname>log_min_duration_sample</varname> to
+         <literal>-1</literal>.
+         <varname>log_statement_sample_rate</varname> is helpful when the
+         traffic is too high to log all queries.
+         Only superusers can change this setting.
+        </para>
+       <note>
+        <para>
+         Like all statement-logging options, this option can add significant
+         overhead.
+        </para>
+       </note>
+       </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-log-transaction-sample-rate" xreflabel="log_transaction_sample_rate">
       <term><varname>log_transaction_sample_rate</varname> (<type>real</type>)
       <indexterm>
index 4bec40aa2879597059deebaada85394c7a14b63e..82894eadc66ec63f762f92c11f1846826fc77c15 100644 (file)
@@ -2235,12 +2235,15 @@ check_log_statement(List *stmt_list)
 int
 check_log_duration(char *msec_str, bool was_logged)
 {
-   if (log_duration || log_min_duration_statement >= 0 || xact_is_sampled)
+   if (log_duration || log_min_duration_sample >= 0 ||
+       log_min_duration_statement >= 0 || xact_is_sampled)
    {
        long        secs;
        int         usecs;
        int         msecs;
-       bool        exceeded;
+       bool        exceeded_duration;
+       bool        exceeded_sample_duration;
+       bool        in_sample = false;
 
        TimestampDifference(GetCurrentStatementStartTimestamp(),
                            GetCurrentTimestamp(),
@@ -2248,20 +2251,35 @@ check_log_duration(char *msec_str, bool was_logged)
        msecs = usecs / 1000;
 
        /*
-        * This odd-looking test for log_min_duration_statement being exceeded
-        * is designed to avoid integer overflow with very long durations:
-        * don't compute secs * 1000 until we've verified it will fit in int.
+        * This odd-looking test for log_min_duration_* being exceeded is
+        * designed to avoid integer overflow with very long durations: don't
+        * compute secs * 1000 until we've verified it will fit in int.
         */
-       exceeded = (log_min_duration_statement == 0 ||
-                   (log_min_duration_statement > 0 &&
-                    (secs > log_min_duration_statement / 1000 ||
-                     secs * 1000 + msecs >= log_min_duration_statement)));
+       exceeded_duration = (log_min_duration_statement == 0 ||
+                            (log_min_duration_statement > 0 &&
+                             (secs > log_min_duration_statement / 1000 ||
+                              secs * 1000 + msecs >= log_min_duration_statement)));
 
-       if (exceeded || log_duration || xact_is_sampled)
+       exceeded_sample_duration = (log_min_duration_sample == 0 ||
+                                   (log_min_duration_sample > 0 &&
+                                    (secs > log_min_duration_sample / 1000 ||
+                                     secs * 1000 + msecs >= log_min_duration_sample)));
+
+       /*
+        * Do not log if log_statement_sample_rate = 0. Log a sample if
+        * log_statement_sample_rate <= 1 and avoid unecessary random() call
+        * if log_statement_sample_rate = 1.
+        */
+       if (exceeded_sample_duration)
+           in_sample = log_statement_sample_rate != 0 &&
+               (log_statement_sample_rate == 1 ||
+                random() <= log_statement_sample_rate * MAX_RANDOM_VALUE);
+
+       if (exceeded_duration || in_sample || log_duration || xact_is_sampled)
        {
            snprintf(msec_str, 32, "%ld.%03d",
                     secs * 1000 + msecs, usecs % 1000);
-           if ((exceeded || xact_is_sampled) && !was_logged)
+           if ((exceeded_duration || in_sample || xact_is_sampled) && !was_logged)
                return 2;
            else
                return 1;
index 31a5ef0474732590393afac972dc031494334cc3..e84c8cc4cfc9bb964278129add8ae5a7cc10e7fb 100644 (file)
@@ -509,8 +509,10 @@ bool       session_auth_is_superuser;
 int            log_min_error_statement = ERROR;
 int            log_min_messages = WARNING;
 int            client_min_messages = NOTICE;
+int            log_min_duration_sample = -1;
 int            log_min_duration_statement = -1;
 int            log_temp_files = -1;
+double     log_statement_sample_rate = 1.0;
 double     log_xact_sample_rate = 0;
 int            trace_recovery_messages = LOG;
 
@@ -2700,10 +2702,23 @@ static struct config_int ConfigureNamesInt[] =
        NULL, NULL, NULL
    },
 
+   {
+       {"log_min_duration_sample", PGC_SUSET, LOGGING_WHEN,
+           gettext_noop("Sets the minimum execution time above which "
+                        "a sample of statements will be logged."
+                        " Sampling is determined by log_statement_sample_rate."),
+           gettext_noop("Zero log a sample of all queries. -1 turns this feature off."),
+           GUC_UNIT_MS
+       },
+       &log_min_duration_sample,
+       -1, -1, INT_MAX,
+       NULL, NULL, NULL
+   },
+
    {
        {"log_min_duration_statement", PGC_SUSET, LOGGING_WHEN,
            gettext_noop("Sets the minimum execution time above which "
-                        "statements will be logged."),
+                        "all statements will be logged."),
            gettext_noop("Zero prints all queries. -1 turns this feature off."),
            GUC_UNIT_MS
        },
@@ -3430,6 +3445,16 @@ static struct config_real ConfigureNamesReal[] =
        NULL, NULL, NULL
    },
 
+   {
+       {"log_statement_sample_rate", PGC_SUSET, LOGGING_WHEN,
+           gettext_noop("Fraction of statements exceeding log_min_duration_sample to be logged."),
+           gettext_noop("Use a value between 0.0 (never log) and 1.0 (always log).")
+       },
+       &log_statement_sample_rate,
+       1.0, 0.0, 1.0,
+       NULL, NULL, NULL
+   },
+
    {
        {"log_transaction_sample_rate", PGC_SUSET, LOGGING_WHEN,
            gettext_noop("Set the fraction of transactions to log for new transactions."),
index 0fc23e3a6127c70260f7adbcf9d0b09d01855402..be02a76d9d8c26e61302a07af6d845278703a4c6 100644 (file)
                    # statements running at least this number
                    # of milliseconds
 
+#log_min_duration_sample = -1  # -1 is disabled, 0 logs a sample of statements
+                   # and their durations, > 0 logs only a sample of
+                   # statements running at least this number
+                   # of milliseconds
+                   # Sample fraction is determined by log_statement_sample_rate
+
+#log_statement_sample_rate = 1.0   # Fraction of logged statements exceeding
+                   # log_min_duration_sample to be logged.
+                   # 1.0 logs all such statements, 0.0 never logs.
+
+
 #log_transaction_sample_rate = 0.0 # Fraction of transactions whose statements
                    # are logged regardless of their duration. 1.0 logs all
                    # statements from all transactions, 0.0 never logs.
index 6791e0cbc20ac0b6279565637e44ffdafc9b6ca1..9aa3d02596529dd33262ad9601546166f53821ae 100644 (file)
@@ -251,8 +251,10 @@ extern bool session_auth_is_superuser;
 extern int log_min_error_statement;
 extern PGDLLIMPORT int log_min_messages;
 extern PGDLLIMPORT int client_min_messages;
+extern int log_min_duration_sample;
 extern int log_min_duration_statement;
 extern int log_temp_files;
+extern double log_statement_sample_rate;
 extern double log_xact_sample_rate;
 
 extern int temp_file_limit;