Commit 4cdf1d3

authored
added (#1171)
1 parent 0df409b commit 4cdf1d3

File tree

3 files changed

+98
-24
lines changed

docs/en/guides/55-performance/03-fulltext-index.md

Lines changed: 26 additions & 18 deletions
Original file line number | Diff line number | Diff line change
@@ -62,12 +62,6 @@ REFRESH INVERTED INDEX customer_feedback_idx ON customer_feedback;
6262

6363
Databend offers a range of full-text search functions empowering you to efficiently search through documents. For more information about their syntax and examples, see [Full-Text Search Functions](/sql/sql-functions/search-functions/).
6464

65-
| Full-Text Search Function | Description |
66-
|------------------------------------|-----------------------------------------------------------------|
67-
| `MATCH('<columns>', '<keywords>')` | Searches for documents containing specified keywords. |
68-
| `QUERY('<query_expr>')` | Searches for documents satisfying a specified query expression. |
69-
| `SCORE()` | Returns the relevance of the query string. |
70-
7165
## Managing Inverted Indexes
7266

7367
Databend provides a variety of commands to manage inverted indexes. For details, see [Inverted Index](/sql/sql-commands/ddl/inverted-index/).
@@ -170,11 +164,9 @@ FROM
170164
WHERE
171165
MATCH(event_message, 'PersistentVolume');
172166

173-
┌──────────────────────────────────────────────────┐
174-
│ event_id │ event_message │
175-
├─────────────────┼────────────────────────────────┤
176-
│ 5 │ PersistentVolume claim created │
177-
└──────────────────────────────────────────────────┘
167+
-[ RECORD 1 ]-----------------------------------
168+
event_id: 5
169+
event_message: PersistentVolume claim created
178170
```
179171

180172
To check if the full-text index will be utilized for the search, use the [EXPLAIN](/sql/sql-commands/explain-cmds/explain) command:
@@ -194,7 +186,6 @@ Filter
194186
├── read size: < 1 KiB
195187
├── partitions total: 5
196188
├── partitions scanned: 1
197-
// highlight-next-line
198189
├── pruning stats: [segments: <range pruning: 5 to 5>, blocks: <range pruning: 5 to 5, inverted pruning: 5 to 1>]
199190
├── push downs: [filters: [k8s_logs._search_matched (#4)], limit: NONE]
200191
└── estimated rows: 5.00
@@ -217,9 +208,26 @@ WHERE
217208
SCORE() > 0.5
218209
AND QUERY('event_message:"PersistentVolume claim created"');
219210

220-
┌─────────────────────────────────────────────────────────────────────────────────────┐
221-
│ event_id │ event_message │ event_timestamp │ score() │
222-
├─────────────────┼────────────────────────────────┼─────────────────────┼────────────┤
223-
│ 5 │ PersistentVolume claim created │ 2024-04-08 12:00:00 │ 0.86304635 │
224-
└─────────────────────────────────────────────────────────────────────────────────────┘
225-
```
211+
-[ RECORD 1 ]-----------------------------------
212+
event_id: 5
213+
event_message: PersistentVolume claim created
214+
event_timestamp: 2024-04-08 12:00:00
215+
score(): 0.86304635
216+
```
217+
218+
The following query performs a fuzzy search using the `fuzziness` option:
219+
220+
```sql
221+
-- 'PersistentVolume claim create' is intentionally misspelled
222+
SELECT
223+
event_id, event_message, event_timestamp
224+
FROM
225+
k8s_logs
226+
WHERE
227+
match('event_message', 'PersistentVolume claim create', 'fuzziness=1');
228+
229+
-[ RECORD 1 ]-----------------------------------
230+
event_id: 5
231+
event_message: PersistentVolume claim created
232+
event_timestamp: 2024-04-08 12:00:00
233+
```

docs/en/sql-reference/20-sql-functions/10-search-functions/match.md

Lines changed: 31 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@ title: MATCH
33
---
44
import FunctionDescription from '@site/src/components/FunctionDescription';
55

6-
<FunctionDescription description="Introduced or updated: v1.2.425"/>
6+
<FunctionDescription description="Introduced or updated: v1.2.619"/>
77

88
Searches for documents containing specified keywords. Please note that the MATCH function can only be used in a WHERE clause.
99

@@ -14,13 +14,20 @@ Databend's MATCH function is inspired by Elasticsearch's [MATCH](https://www.ela
1414
## Syntax
1515

1616
```sql
17-
MATCH( '<columns>', '<keywords>' )
17+
MATCH( '<columns>', '<keywords>'[, '<options>'] )
1818
```
1919

2020
| Parameter | Description |
2121
|--------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
2222
| `<columns>` | A comma-separated list of column names in the table to search for the specified keywords, with optional weighting using the syntax (^), which allows assigning different weights to each column, influencing the importance of each column in the search. |
23-
| `<keywords>` | The keywords to match against the specified columns in the table. |
23+
| `<keywords>` | The keywords to match against the specified columns in the table. This parameter can also be used for suffix matching, where the search term followed by an asterisk (*) can match any number of characters or words. |
24+
| `<options>` | A set of configuration options, separated by semicolons `;`, that customize the search behavior. See the table below for details. |
25+
26+
| Option | Description | Example | Explanation |
27+
|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
28+
| fuzziness | Allows matching terms within a specified Levenshtein distance. `fuzziness` can be set to 1 or 2. | SELECT id, score(), content FROM t WHERE match(content, 'box', 'fuzziness=1'); | When matching the query term "box", `fuzziness=1` allows matching terms like "fox", since "box" and "fox" have a Levenshtein distance of 1. |
29+
| operator | Specifies how multiple query terms are combined. Can be set to OR (default) or AND. OR returns results containing any of the query terms, while AND returns results containing all query terms. | SELECT id, score(), content FROM t WHERE match(content, 'action works', 'fuzziness=1;operator=AND'); | With `operator=AND`, the query requires both "action" and "works" to be present in the results. Due to `fuzziness=1`, it matches terms like "Actions" and "words", so "Actions speak louder than words" is returned. |
30+
| lenient | Controls whether errors are reported when the query text is invalid. Defaults to `false`. If set to `true`, no error is reported, and an empty result set is returned if the query text is invalid. | SELECT id, score(), content FROM t WHERE match(content, '()', 'lenient=true'); | If the query text `()` is invalid, setting `lenient=true` prevents an error from being thrown and returns an empty result set instead. |
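
The `<options>` string is a semicolon-separated list of `key=value` pairs. As a rough illustration of that format only, here is a minimal Python sketch that parses such a string. `parse_match_options` is a hypothetical helper, not Databend's parser. The defaults for `operator` and `lenient` come from the table above. Tolerating whitespace around `=` and `;` is inferred from the `'fuzziness = 1; operator = AND'` example in this commit, and case-insensitive keys and values are an assumption.

```python
def parse_match_options(options: str) -> dict:
    """Parse 'key=value;key=value' into a dict (illustrative, not Databend's code)."""
    # Defaults from the option table: operator is OR, lenient is false.
    # No default for fuzziness is documented, so it stays None here.
    parsed = {"fuzziness": None, "operator": "OR", "lenient": False}
    for part in options.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate an empty segment such as a trailing semicolon
        key, sep, value = part.partition("=")
        key, value = key.strip().lower(), value.strip()
        if not sep or key not in parsed:
            raise ValueError(f"unknown or malformed option: {part!r}")
        if key == "fuzziness":
            if value not in ("1", "2"):
                raise ValueError("fuzziness must be 1 or 2")
            parsed[key] = int(value)
        elif key == "operator":
            if value.upper() not in ("OR", "AND"):
                raise ValueError("operator must be OR or AND")
            parsed[key] = value.upper()
        else:  # lenient
            if value.lower() not in ("true", "false"):
                raise ValueError("lenient must be true or false")
            parsed[key] = value.lower() == "true"
    return parsed


if __name__ == "__main__":
    print(parse_match_options("fuzziness = 1; operator = AND"))
```

Rejecting unknown keys here is a design choice for the sketch; how Databend itself reacts to an unrecognized option is not stated in this commit.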
2431

2532
## Examples
2633

@@ -46,6 +53,20 @@ SELECT * FROM test WHERE MATCH('title', 'art power');
4653
│ The Art of Communication │ Effective communication is crucial in everyday life. │
4754
└────────────────────────────────────────────────────────────────────────────────────────────────────┘
4855

56+
-- Retrieve documents where the 'title' column contains values that start with 'The' followed by any characters
57+
SELECT * FROM test WHERE MATCH('title', 'The*');
58+
59+
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
60+
│ title │ body │
61+
│ Nullable(String) │ Nullable(String) │
62+
├─────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────┤
63+
│ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │
64+
│ The Benefits of Exercise │ Exercise is essential for maintaining a healthy lifestyle. │
65+
│ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │
66+
│ The Art of Communication │ Effective communication is crucial in everyday life. │
67+
│ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. │
68+
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
69+
4970
-- Retrieve documents where either the 'title' or 'body' column matches 'knowledge technology'
5071
SELECT *, score() FROM test WHERE MATCH('title, body', 'knowledge technology');
5172

@@ -65,4 +86,11 @@ SELECT *, score() FROM test WHERE MATCH('title^5, body^1.2', 'knowledge technolo
6586
│ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ 1.3860708 │
6687
│ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. │ 7.8053584 │
6788
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
89+
90+
-- Retrieve documents where the 'body' column contains both "knowledge" and "imagination" (allowing for minor typos).
91+
SELECT * FROM test WHERE MATCH('body', 'knowledg imaginatio', 'fuzziness = 1; operator = AND');
92+
93+
-[ RECORD 1 ]-----------------------------------
94+
title: The Importance of Reading
95+
body: Reading is a crucial skill that opens up a world of knowledge and imagination.
6896
```
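
To see why the misspelled keywords `knowledg` and `imaginatio` still match with `fuzziness = 1; operator = AND`, the following Python sketch models the two options on a single string. It is a toy model under stated assumptions, not how Databend evaluates `MATCH`: `levenshtein` and `toy_match` are hypothetical helpers, and the tokenization (lowercasing, splitting on non-word characters) is an assumption made for the illustration.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def toy_match(text: str, keywords: str, fuzziness: int = 0, operator: str = "OR") -> bool:
    """Toy stand-in for MATCH on one string (assumed tokenizer: lowercase word characters)."""
    doc_terms = re.findall(r"\w+", text.lower())
    # One flag per keyword: does any document term lie within the allowed distance?
    hits = [
        any(levenshtein(kw, term) <= fuzziness for term in doc_terms)
        for kw in keywords.lower().split()
    ]
    # AND requires every keyword to hit; OR requires at least one.
    return all(hits) if operator == "AND" else any(hits)


if __name__ == "__main__":
    body = "Reading is a crucial skill that opens up a world of knowledge and imagination."
    print(levenshtein("knowledg", "knowledge"))                 # one missing character
    print(toy_match(body, "knowledg imaginatio", 1, "AND"))     # both keywords within distance 1
    print(toy_match(body, "knowledg imaginatio", 0, "AND"))     # exact matching finds neither
```

Each misspelled keyword is one deletion away from a word in the body, so both are within distance 1 and the `AND` condition holds. With exact matching (`fuzziness` of 0 in this toy model) neither keyword is found.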
