Skip to content

Commit 35fcb1b

Browse files
committed
Allow GiST distance function to return merely a lower-bound.
The distance function can now set *recheck = false, like index quals. The executor will then re-check the ORDER BY expressions, and use a queue to reorder the results on the fly. This makes it possible to do kNN-searches on polygons and circles, which don't store the exact value in the index, but just a bounding box. Alexander Korotkov and me
1 parent ecd222e commit 35fcb1b

File tree

19 files changed

+699
-40
lines changed

19 files changed

+699
-40
lines changed

doc/src/sgml/gist.sgml

Lines changed: 27 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -105,6 +105,7 @@
105105
<literal>~=</>
106106
</entry>
107107
<entry>
108+
<literal>&lt;-&gt;</>
108109
</entry>
109110
</row>
110111
<row>
@@ -163,6 +164,7 @@
163164
<literal>~=</>
164165
</entry>
165166
<entry>
167+
<literal>&lt;-&gt;</>
166168
</entry>
167169
</row>
168170
<row>
@@ -206,6 +208,12 @@
206208
</tgroup>
207209
</table>
208210

211+
<para>
212+
Currently, ordering by the distance operator <literal>&lt;-&gt;</>
213+
is supported only with <literal>point</> by the operator classes
214+
of the geometric types.
215+
</para>
216+
209217
<para>
210218
For historical reasons, the <literal>inet_ops</> operator class is
211219
not the default class for types <type>inet</> and <type>cidr</>.
@@ -780,6 +788,7 @@ my_distance(PG_FUNCTION_ARGS)
780788
data_type *query = PG_GETARG_DATA_TYPE_P(1);
781789
StrategyNumber strategy = (StrategyNumber) PG_GETARG_UINT16(2);
782790
/* Oid subtype = PG_GETARG_OID(3); */
791+
/* bool *recheck = (bool *) PG_GETARG_POINTER(4); */
783792
data_type *key = DatumGetDataType(entry-&gt;key);
784793
double retval;
785794

@@ -792,14 +801,24 @@ my_distance(PG_FUNCTION_ARGS)
792801
</programlisting>
793802

794803
The arguments to the <function>distance</> function are identical to
795-
the arguments of the <function>consistent</> function, except that no
796-
recheck flag is used. The distance to a leaf index entry must always
797-
be determined exactly, since there is no way to re-order the tuples
798-
once they are returned. Some approximation is allowed when determining
799-
the distance to an internal tree node, so long as the result is never
800-
greater than any child's actual distance. Thus, for example, distance
801-
to a bounding box is usually sufficient in geometric applications. The
802-
result value can be any finite <type>float8</> value. (Infinity and
804+
the arguments of the <function>consistent</> function.
805+
</para>
806+
807+
<para>
808+
Some approximation is allowed when determining the distance, as long as
809+
the result is never greater than the entry's actual distance. Thus, for
810+
example, distance to a bounding box is usually sufficient in geometric
811+
applications. For an internal tree node, the distance returned must not
812+
be greater than the distance to any of the child nodes. If the returned
813+
distance is not accurate, the function must set *recheck to false. (This
814+
is not necessary for internal tree nodes; for them, the calculation is
815+
always assumed to be inaccurate). The executor will calculate the
816+
accurate distance after fetching the tuple from the heap, and reorder
817+
the tuples if necessary.
818+
</para>
819+
820+
<para>
821+
The result value can be any finite <type>float8</> value. (Infinity and
803822
minus infinity are used internally to handle cases such as nulls, so it
804823
is not recommended that <function>distance</> functions return these
805824
values.)

src/backend/access/gist/gistget.c

Lines changed: 21 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -30,10 +30,10 @@
3030
* The index tuple might represent either a heap tuple or a lower index page,
3131
* depending on whether the containing page is a leaf page or not.
3232
*
33-
* On success return for a heap tuple, *recheck_p is set to indicate
34-
* whether recheck is needed. We recheck if any of the consistent() functions
35-
* request it. recheck is not interesting when examining a non-leaf entry,
36-
* since we must visit the lower index page if there's any doubt.
33+
* On success return for a heap tuple, *recheck_p is set to indicate whether
34+
* recheck is needed. We recheck if any of the consistent() or distance()
35+
* functions request it. recheck is not interesting when examining a non-leaf
36+
* entry, since we must visit the lower index page if there's any doubt.
3737
*
3838
* If we are doing an ordered scan, so->distances[] is filled with distance
3939
* data from the distance() functions before returning success.
@@ -176,6 +176,7 @@ gistindex_keytest(IndexScanDesc scan,
176176
else
177177
{
178178
Datum dist;
179+
bool recheck;
179180
GISTENTRY de;
180181

181182
gistdentryinit(giststate, key->sk_attno - 1, &de,
@@ -192,16 +193,21 @@ gistindex_keytest(IndexScanDesc scan,
192193
* always be zero, but might as well pass it for possible future
193194
* use.)
194195
*
195-
* Note that Distance functions don't get a recheck argument. We
196-
* can't tolerate lossy distance calculations on leaf tuples;
197-
* there is no opportunity to re-sort the tuples afterwards.
196+
* Distance functions get a recheck argument as well. In this
197+
* case the returned distance is the lower bound of distance
198+
* and needs to be rechecked. We return single recheck flag
199+
* which means that both quals and distances are to be
200+
* rechecked.
198201
*/
199-
dist = FunctionCall4Coll(&key->sk_func,
202+
dist = FunctionCall5Coll(&key->sk_func,
200203
key->sk_collation,
201204
PointerGetDatum(&de),
202205
key->sk_argument,
203206
Int32GetDatum(key->sk_strategy),
204-
ObjectIdGetDatum(key->sk_subtype));
207+
ObjectIdGetDatum(key->sk_subtype),
208+
PointerGetDatum(&recheck));
209+
210+
*recheck_p |= recheck;
205211

206212
*distance_p = DatumGetFloat8(dist);
207213
}
@@ -434,6 +440,7 @@ getNextNearest(IndexScanDesc scan)
434440
{
435441
GISTScanOpaque so = (GISTScanOpaque) scan->opaque;
436442
bool res = false;
443+
int i;
437444

438445
if (scan->xs_itup)
439446
{
@@ -454,6 +461,11 @@ getNextNearest(IndexScanDesc scan)
454461
/* found a heap item at currently minimal distance */
455462
scan->xs_ctup.t_self = item->data.heap.heapPtr;
456463
scan->xs_recheck = item->data.heap.recheck;
464+
for (i = 0; i < scan->numberOfOrderBys; i++)
465+
{
466+
scan->xs_orderbyvals[i] = Float8GetDatum(item->distances[i]);
467+
scan->xs_orderbynulls[i] = false;
468+
}
457469

458470
/* in an index-only scan, also return the reconstructed tuple. */
459471
if (scan->xs_want_itup)

src/backend/access/gist/gistproc.c

Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1478,3 +1478,40 @@ gist_point_distance(PG_FUNCTION_ARGS)
14781478

14791479
PG_RETURN_FLOAT8(distance);
14801480
}
1481+
1482+
/*
1483+
* The inexact GiST distance method for geometric types that store bounding
1484+
* boxes.
1485+
*
1486+
* Compute lossy distance from point to index entries. The result is inexact
1487+
* because index entries are bounding boxes, not the exact shapes of the
1488+
* indexed geometric types. We use distance from point to MBR of index entry.
1489+
* This is a lower bound estimate of distance from point to indexed geometric
1490+
* type.
1491+
*/
1492+
Datum
1493+
gist_bbox_distance(PG_FUNCTION_ARGS)
1494+
{
1495+
GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
1496+
StrategyNumber strategy = (StrategyNumber) PG_GETARG_UINT16(2);
1497+
bool *recheck = (bool *) PG_GETARG_POINTER(4);
1498+
double distance;
1499+
StrategyNumber strategyGroup = strategy / GeoStrategyNumberOffset;
1500+
1501+
/* Bounding box distance is always inexact. */
1502+
*recheck = true;
1503+
1504+
switch (strategyGroup)
1505+
{
1506+
case PointStrategyNumberGroup:
1507+
distance = computeDistance(false,
1508+
DatumGetBoxP(entry->key),
1509+
PG_GETARG_POINT_P(1));
1510+
break;
1511+
default:
1512+
elog(ERROR, "unknown strategy number: %d", strategy);
1513+
distance = 0.0; /* keep compiler quiet */
1514+
}
1515+
1516+
PG_RETURN_FLOAT8(distance);
1517+
}

src/backend/access/gist/gistscan.c

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -85,6 +85,11 @@ gistbeginscan(PG_FUNCTION_ARGS)
8585
/* workspaces with size dependent on numberOfOrderBys: */
8686
so->distances = palloc(sizeof(double) * scan->numberOfOrderBys);
8787
so->qual_ok = true; /* in case there are zero keys */
88+
if (scan->numberOfOrderBys > 0)
89+
{
90+
scan->xs_orderbyvals = palloc(sizeof(Datum) * scan->numberOfOrderBys);
91+
scan->xs_orderbynulls = palloc(sizeof(bool) * scan->numberOfOrderBys);
92+
}
8893

8994
scan->opaque = so;
9095

0 commit comments

Comments
 (0)
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy