Make websearch_to_tsquery() parse text in quotes as a single token
author Alexander Korotkov <akorotkov@postgresql.org>
Mon, 3 May 2021 00:58:03 +0000 (03:58 +0300)
committer Alexander Korotkov <akorotkov@postgresql.org>
Mon, 3 May 2021 01:18:19 +0000 (04:18 +0300)
commit eb086056fec44516efdd5db71244a079fed65c7f
tree 18b086f3c361e471380fd6f66c9bf6d7de81f5ac
parent 651d005e76bc0b9542615f609b4d0d946035dc58

websearch_to_tsquery() itself splits text in quotes into tokens and connects
them with the phrase operator.  However, that leads to surprising results when
one of the tokens contains no words.

For instance, websearch_to_tsquery('"aaa: bbb"') is 'aaa <2> bbb', because
it is equivalent to to_tsquery(E'aaa <-> \':\' <-> bbb').  But
websearch_to_tsquery('"aaa: bbb"') has to be 'aaa <-> bbb' in order to match
to_tsvector('aaa: bbb').
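
The mismatch can be checked with a few queries.  This is only a sketch: the
'simple' configuration is an assumption made here so that the result does not
depend on default_text_search_config, and the expected results quoted in the
comments are the ones stated in this message, not captured output.

```sql
-- Document side: the lexemes 'aaa' and 'bbb' end up at adjacent positions.
SELECT to_tsvector('simple', 'aaa: bbb');

-- Query side: per this message, 'aaa <2> bbb' before the change and
-- 'aaa <-> bbb' after it.
SELECT websearch_to_tsquery('simple', '"aaa: bbb"');

-- The match this change is meant to make succeed.
SELECT to_tsvector('simple', 'aaa: bbb')
       @@ websearch_to_tsquery('simple', '"aaa: bbb"');

-- phraseto_tsquery() on the same text, for comparison.
SELECT phraseto_tsquery('simple', 'aaa: bbb');
```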

Since 0c4f355c6a, we connect lexemes of complex tokens with phrase operators
anyway.  Thus, let's just make websearch_to_tsquery() parse text in quotes as
a single token.  As a result, websearch_to_tsquery() processes the quoted
text in the same way phraseto_tsquery() does.  This solution is exactly what we
need and also simplifies the code.

This commit is an incompatible change, so we don't backpatch it.

Reported-by: Valentin Gatien-Baron
Discussion: https://postgr.es/m/CA%2B0DEqiZs7gdOd4ikmg%3D0UWG%2BSwWOLxPsk_JW-sx9WNOyrb0KQ%40mail.gmail.com
Author: Alexander Korotkov
Reviewed-by: Tom Lane, Zhihong Yu
src/backend/utils/adt/tsquery.c
src/test/regress/expected/tsearch.out
src/test/regress/sql/tsearch.sql