Skip to content

Commit 80101ab

Browse files
author
Zakirov Artur
committed
Added more information in README. Listed all operator classes
1 parent af9b870 commit 80101ab

File tree

2 files changed

+157
-15
lines changed

2 files changed

+157
-15
lines changed

README.md

Lines changed: 157 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -4,8 +4,29 @@
44

55
## Introduction
66

7-
The **rum** module provides access method to work with RUM index. It is based
8-
on the GIN access methods code.
7+
The **rum** module provides access method to work with `RUM` index. It is based
8+
on the `GIN` access methods code.
9+
10+
`GIN` index allows to perform fast full text search using `tsvector` and
11+
`tsquery` types. But full text search with GIN index has several problems:
12+
13+
- Slow ranking. It is need position information about lexems to ranking. `GIN`
14+
index doesn't store positions of lexems. So after index scan we need additional
15+
heap scan to retreive lexems positions.
16+
- There isn't phrase search with `GIN` index. This problem relates with previous
17+
problem. It is need position information to perform phrase search.
18+
- Slow ordering by timestamp. `GIN` index can't store some related information
19+
in index with lexemes. So it is necessary to perform additional heap scan.
20+
21+
`RUM` solves this problems by storing additional information in posting tree.
22+
For example, positional information of lexemes or timestamps. You can get an
23+
idea of `RUM` by the following picture:
24+
25+
![How RUM stores additional information](img/gin_rum.png)
26+
27+
Drawback of `RUM` is that it has slower build and insert time than `GIN`.
28+
It is because we need to store additional information besides keys and because
29+
`RUM` uses generic WAL.
930

1031
## License
1132

@@ -16,7 +37,7 @@ This module available under the same license as
1637

1738
Before build and install **rum** you should ensure following:
1839

19-
* PostgreSQL version is 9.6.
40+
* PostgreSQL version is 9.6+.
2041

2142
Typical installation procedure may look like this:
2243

@@ -27,18 +48,27 @@ Typical installation procedure may look like this:
2748
$ make USE_PGXS=1 installcheck
2849
$ psql DB -c "CREATE EXTENSION rum;"
2950

30-
## New access method and operator class
31-
32-
The **rum** module provides the access method **rum** and the operator class
33-
**rum_tsvector_ops**.
51+
## Common operators and functions
3452

35-
The module provides new operators.
53+
**rum** module provides next operators.
3654

3755
| Operator | Returns | Description
3856
| -------------------- | ------- | ----------------------------------------------
3957
| tsvector <=> tsquery | float4 | Returns distance between tsvector and tsquery.
58+
| timestamp <=> timestamp | float8 | Returns distance between two timestamps.
59+
| timestamp <=| timestamp | float8 | Returns distance only for left timestamps.
60+
| timestamp |=> timestamp | float8 | Returns distance only for right timestamps.
61+
62+
## Operator classes
4063

41-
## Examples
64+
**rum** provides next operator classes
65+
66+
### rum_tsvector_ops
67+
68+
For type: `tsvector`
69+
70+
This operator class stores `tsvector` lexemes with positional information. Supports
71+
ordering by `<=>` operator and prefix search. There is the example.
4272

4373
Let us assume we have the table:
4474

@@ -70,9 +100,9 @@ And we can execute the following queries:
70100

71101
```sql
72102
=# SELECT t, a <=> to_tsquery('english', 'beautiful | place') AS rank
73-
FROM test_rum
74-
WHERE a @@ to_tsquery('english', 'beautiful | place')
75-
ORDER BY a <=> to_tsquery('english', 'beautiful | place');
103+
FROM test_rum
104+
WHERE a @@ to_tsquery('english', 'beautiful | place')
105+
ORDER BY a <=> to_tsquery('english', 'beautiful | place');
76106
t | rank
77107
---------------------------------+-----------
78108
The situation is most beautiful | 0.0303964
@@ -81,16 +111,128 @@ And we can execute the following queries:
81111
(3 rows)
82112

83113
=# SELECT t, a <=> to_tsquery('english', 'place | situation') AS rank
84-
FROM test_rum
85-
WHERE a @@ to_tsquery('english', 'place | situation')
86-
ORDER BY a <=> to_tsquery('english', 'place | situation');
114+
FROM test_rum
115+
WHERE a @@ to_tsquery('english', 'place | situation')
116+
ORDER BY a <=> to_tsquery('english', 'place | situation');
87117
t | rank
88118
---------------------------------+-----------
89119
The situation is most beautiful | 0.0303964
90120
It looks like a beautiful place | 0.0303964
91121
(2 rows)
92122
```
93123

124+
### rum_tsvector_hash_ops
125+
126+
For type: `tsvector`
127+
128+
This operator class stores hash of `tsvector` lexemes with positional information.
129+
Supports ordering by `&lt;=&gt;` operator. But **doesn't** support prefix search.
130+
131+
### rum_timestamp_ops
132+
133+
For type: `timestamp`
134+
135+
Operator class provides fast search and ordering by timestamp fields. Supports
136+
ordering by `&lt;=&gt;`, `&lt;=&#124;` and `&#124;=&gt;` operators. Can be used
137+
with `rum_tsvector_timestamp_ops` operator class.
138+
139+
### rum_timestamptz_ops
140+
141+
For type: `timestamptz`
142+
143+
Operator class provides fast search and ordering by timestamptz fields. Supports
144+
ordering by `&lt;=&gt;`, `&lt;=&#124;` and `&#124;=&gt;` operators. Can be used
145+
with `rum_tsvector_timestamptz_ops` operator class.
146+
147+
### rum_tsvector_timestamp_ops
148+
149+
For type: `tsvector`
150+
151+
This operator class stores `tsvector` lexems with timestamp field. There is the example:
152+
153+
Let us assume we have the table:
154+
```sql
155+
CREATE TABLE tsts (id int, t tsvector, d timestamp);
156+
157+
\copy tsts from 'rum/data/tsts.data'
158+
159+
CREATE INDEX tsts_idx ON tsts USING rum (t rum_tsvector_timestamp_ops, d)
160+
WITH (attach = 'd', to = 't');
161+
```
162+
163+
Now we can execute the following queries:
164+
```sql
165+
EXPLAIN (costs off)
166+
SELECT id, d, d <=> '2016-05-16 14:21:25' FROM tsts WHERE t @@ 'wr&qh' ORDER BY d <=> '2016-05-16 14:21:25' LIMIT 5;
167+
QUERY PLAN
168+
-----------------------------------------------------------------------------------
169+
Limit
170+
-> Index Scan using tsts_idx on tsts
171+
Index Cond: (t @@ '''wr'' & ''qh'''::tsquery)
172+
Order By: (d <=> 'Mon May 16 14:21:25 2016'::timestamp without time zone)
173+
(4 rows)
174+
175+
SELECT id, d, d <=> '2016-05-16 14:21:25' FROM tsts WHERE t @@ 'wr&qh' ORDER BY d <=> '2016-05-16 14:21:25' LIMIT 5;
176+
id | d | ?column?
177+
-----+---------------------------------+---------------
178+
355 | Mon May 16 14:21:22.326724 2016 | 2.673276
179+
354 | Mon May 16 13:21:22.326724 2016 | 3602.673276
180+
371 | Tue May 17 06:21:22.326724 2016 | 57597.326724
181+
406 | Wed May 18 17:21:22.326724 2016 | 183597.326724
182+
415 | Thu May 19 02:21:22.326724 2016 | 215997.326724
183+
(5 rows)
184+
```
185+
186+
### rum_tsvector_timestamptz_ops
187+
188+
For type: `tsvector`
189+
190+
See comments for `rum_tsvector_timestamp_ops` operator class.
191+
192+
### rum_tsvector_hash_timestamp_ops
193+
194+
For type: `tsvector`
195+
196+
This operator class stores hash of `tsvector` lexems with timestamp field.
197+
**Doesn't** support prefix search.
198+
199+
### rum_tsvector_hash_timestamptz_ops
200+
201+
For type: `tsvector`
202+
203+
This operator class stores hash of `tsvector` lexems with timestamptz field.
204+
**Doesn't** support prefix search.
205+
206+
### rum_tsquery_ops
207+
208+
For type: `tsquery`
209+
210+
Stores branches of query tree in additional information. For example we have the table:
211+
```sql
212+
CREATE TABLE query (q tsquery, tag text);
213+
214+
INSERT INTO query VALUES ('supernova & star', 'sn'),
215+
('black', 'color'),
216+
('big & bang & black & hole', 'bang'),
217+
('spiral & galaxy', 'shape'),
218+
('black & hole', 'color');
219+
220+
CREATE INDEX query_idx ON query USING rum(q);
221+
```
222+
223+
Now we can execute the following fast query:
224+
```sql
225+
SELECT * FROM query
226+
WHERE to_tsvector('black holes never exists before we think about them') @@ q;
227+
q | tag
228+
------------------+-------
229+
'black' | color
230+
'black' & 'hole' | color
231+
(2 rows)
232+
233+
234+
```
235+
94236
## Authors
95237

96238
Alexander Korotkov <a.korotkov@postgrespro.ru> Postgres Professional Ltd., Russia

img/gin_rum.png

18.7 KB
Loading

0 commit comments

Comments
 (0)
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy