Elasticsearch Python Slides

The document provides an overview of various systems and operations related to indexing and managing data in Elasticsearch, including creating indices, inserting and deleting documents, and utilizing the search API. It explains data types, mapping, and the bulk API for efficient data handling. Additionally, it covers query DSL for constructing complex queries to retrieve specific data from indices.

2

Indices

- The quick brown fox jumped over the lazy dog (text)
- 3.14 (number)
- 15/09/2024 (date)

[Figure: a text embedding model turns a document ("The quick brown fox jumped over the lazy dog") into a dense vector such as [-0.1, 2.5, …, -1.67].]

3
Search system

[Figure: a search system built on Elasticsearch indices and documents. Source: https://www.travelmediagroup.com/the-power-of-facebook-as-a-search-engine-2/]

4
Recommendation system

[Figure: a recommendation system built on indices and documents. Source: https://www.shopagain.com/blog/product-recommendation-engines-what-is-it-how-it-work/]

5
RAG system

[Figure: a RAG (retrieval-augmented generation) system built on indices and documents. Source: https://www.superannotate.com/blog/rag-explained]

6
RAG system

[Figure: a RAG system built on indices and documents (continued). Source: https://www.superannotate.com/blog/rag-explained]

7
3 Create an index

8
What is an index?

[Figure: an index (here, a Product index) is a collection of JSON documents (Product 1, Product 2, …, Product n), each with fields such as Name, Price, Description, etc.]

9
Shards & replicas

Number of shards = 2

[Figure: sharding splits the Product index into 2 shards.]

10
Shards & replicas

Number of shards = 2
Number of replicas = 1

[Figure: each of the 2 shards of the Product index is duplicated once, giving 1 replica per shard.]
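A minimal sketch of creating an index with these settings using the official Python client (the index name "products" and the local URL are assumptions, not from the slides):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

response = es.indices.create(
    index="products",
    settings={
        "number_of_shards": 2,    # split the index into 2 primary shards
        "number_of_replicas": 1,  # keep 1 copy of each shard
    },
)
print(response["acknowledged"])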
11
4 Inserting documents

12
Document

A document is a JSON object of field/value pairs:

"Name": "value"
"Price": "value"
"Description": "value"
...

13
Document

[Figure: inserting JSON documents (×100) into my_index produces the following mapping.]

Field        Type
created_on   date
text         text
title        text

🛈 This process is called mapping and can be done automatically or manually. By default, Elasticsearch does it automatically.
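A minimal sketch of inserting (indexing) a document; the field values are illustrative:

response = es.index(
    index="my_index",
    id="1",  # optional; Elasticsearch generates an id if omitted
    document={
        "title": "A product title",
        "text": "A product description",
        "created_on": "2024-09-15",
    },
)
print(response["result"])  # "created" or "updated"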

14
5 Field data types

15
Field data types
insertion mapping
Field Type
JSON

created_on date
X100 my_index
text text

title text

16
Field data types

1) Common types

Binary (Read more)

Accepts a binary value as a Base64 encoded string.
Is not searchable and is not stored.
Use _source (i.e., the document) to get the data back.

[Figure: an image is encoded into its Base64 representation, e.g. "IVBORw0KGgoAAAANSUhEUgAAA…".]

17
Field data types

1) Common types (Read more)

Binary
Boolean (true / false)
Numbers (long, integer, short, byte, etc.)
Dates
Keyword (IDs, email addresses, status codes, zip codes, etc.)

18
Field data types

2) Object types (JSON)

Object (Read more)

JSON document:

{
    "region": "US",
    "manager": {
        "age": 30,
        "name": {
            "first": "John",
            "last": "Smith"
        }
    }
}

Indexed (flattened field names):

{
    "region": "US",
    "manager.age": 30,
    "manager.name.first": "John",
    "manager.name.last": "Smith"
}

19
Field data types

2) Object types (JSON)

Object

Flattened (Read more)
Efficient for deeply nested JSON objects.
Hierarchical structure is not preserved.

Nested
Use it when you have an array of objects.
Maintains the relationship between the object's fields.

20
Field data types

2) Object types (JSON)

Flattened / Nested object example (Read more)

JSON document:

{
    "group": "fans",
    "user": [
        {
            "first": "John",
            "last": "Smith"
        },
        {
            "first": "Alice",
            "last": "White"
        }
    ]
}

Indexed (flattened; the relationship between first and last is lost):

{
    "group": "fans",
    "user.first": ["alice", "john"],
    "user.last": ["smith", "white"]
}

21
Field data types

3) Text search types

Text (Read more)

Used for full-text content.
Examples: the body of an email or the description of a product.

[Figure: an analyzer turns the unstructured text into a structured format that is optimized for search.]
22
Field data types

3) Text search types

Text (Read more)
Used for full-text content.
Examples: the body of an email or the description of a product.

Completion
Search-as-you-type.

Annotated text

23
Field data types

4) Spatial data types (Read more)

Geo point
Geo shape
Point (Cartesian point)
Shape (Cartesian geometry)

24
Field data types

Read more

25
6 Delete documents

26
Delete documents (Read more)

[Figure: my_index contains documents with id=1, id=2, and id=3. DELETE takes an <index> and an <id>.]

27
Delete documents

[Figure: DELETE my_index/1 succeeds; DELETE my_index/4 fails (❌) because no document with id=4 exists.]

28
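A minimal sketch of deleting a document by id; the missing-id case raises NotFoundError (the ❌ above):

from elasticsearch import NotFoundError

response = es.delete(index="my_index", id="1")
print(response["result"])  # "deleted"

try:
    es.delete(index="my_index", id="4")
except NotFoundError:
    print("document 4 does not exist")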
7 Get document

29
Get documents (Read more)

[Figure: my_index contains documents with id=1, id=2, and id=3. GET takes an <index> and an <id>.]

30
Get documents

[Figure: GET my_index/1 succeeds; GET my_index/4 fails (❌) because no document with id=4 exists.]

31
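A minimal sketch of retrieving a document by id:

doc = es.get(index="my_index", id="1")
print(doc["_source"])  # the original JSON document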
8 Count documents

32
Count documents (Read more)

[Figure: COUNT can take just an <index>, or an <index> and a query <q>.]

🛈 The query parameter is used to match certain criteria.
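A minimal sketch of counting documents, with and without a query:

total = es.count(index="my_index")["count"]
matching = es.count(
    index="my_index",
    query={"match": {"title": "elasticsearch"}},
)["count"]
print(total, matching)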

33
9 The exists API

34
The exists API (Read more)

[Figure: client.indices.exists takes an <index>; client.exists takes an <index> and an <id>.]

🛈 client.indices.exists checks if an index exists in Elasticsearch.
🛈 client.exists checks if a document exists in an index.

35
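A minimal sketch of both existence checks:

index_exists = es.indices.exists(index="my_index")
doc_exists = es.exists(index="my_index", id="1")
print(bool(index_exists), bool(doc_exists))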
10 Update document

36
Update documents
1) The document exists in the index (Read more)

[Figure: my_index contains documents with id=1 and id=2. UPDATE takes an <index> and an <id>.]

🛈 The update operation follows these steps:
1. Get the document.
2. Update it (e.g., add a new field, remove a field, or update a field).
3. Re-index the result.

37
Update documents
But how do you update the document? (Read more)

[Figure: the document {"book_id": 1, "book_name": "A book"} is inserted into my_index with id=1. UPDATE gets the document, applies a script such as

script = { "book_id" = 2 }

and re-indexes the result, changing book_id from 1 to 2.]
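A minimal sketch of the two common ways to apply the update (a Painless script referencing ctx._source, or a partial doc):

es.update(
    index="my_index",
    id="1",
    script={
        "source": "ctx._source.book_id = params.new_id",
        "params": {"new_id": 2},
    },
)

# or simply merge a partial document:
es.update(index="my_index", id="1", doc={"book_id": 2})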

38
Update documents
2) The document doesn't exist (Read more)

[Figure: my_index contains documents with id=1 and id=2; the update targets id=4.]

🛈 The update operation can create the document if it doesn't exist:
1. Add the values you want to insert.
2. Set doc_as_upsert to true.
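A minimal sketch of an upsert: if id=4 does not exist, the document is created from the provided values (field values are illustrative):

es.update(
    index="my_index",
    id="4",
    doc={"book_id": 4, "book_name": "Another book"},
    doc_as_upsert=True,
)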

39
11 Bulk API

40
The bulk API (Read more)

[Figure: a stream of operations (index, index, update, delete, …) applied to my_index.]

Each operation (index, update, delete) normally makes a separate API call.
The bulk API performs multiple operations in one API call. This increases indexing speed.

41
The bulk API - Syntax (Read more)

action and metadata\n
optional source\n
action and metadata\n
optional source\n
action and metadata\n
optional source\n

🛈 The action can be one of the following: index, create, update, delete.
🛈 The source is required for: index, create, update.

42
The bulk API - Example (Read more)

response = es.bulk(
    operations=[
        # Action: index a document with id=1 into the "test" index
        {"index": {"_index": "test", "_id": "1"}},
        # Source for the index action
        {"field1": "value1"},
        # Action: delete the document with id=2 (no source needed)
        {"delete": {"_index": "test", "_id": "2"}},
        # Action: create a document with id=3
        {"create": {"_index": "test", "_id": "3"}},
        # Source for the create action
        {"field1": "value3"},
        # Action: update the document with id=1
        {"update": {"_id": "1", "_index": "test"}},
        # Source for the update action
        {"doc": {"field2": "value2"}},
    ],
)

🛈 The source is required for: index, create, update.

43
12 The search API – Part 1

44
The search API (Read more)

[Figure: my_index contains documents with id=1 and id=2.]

You use the search API to build:
● Search engines
● Recommendation systems
● Real-time dashboards
● Log data analysis
● ...

45
The search API (Read more)

<index> can be:
● my_index
● index_1,my_index
● index*
● _all

46
The search API (Read more)

<index>

q
● Use it for simple searches.
● Uses the Lucene syntax.
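A minimal sketch of a simple Lucene-syntax search using the q parameter:

response = es.search(index="my_index", q="title:elasticsearch")
print(response["hits"]["total"])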

47
The search API (Read more)

<index>

q

query
● Use it for complex, structured queries.
● Uses the Query DSL language.
● Default value is match_all.

48
The search API (Read more)

<index>, q, query, timeout, size, from

Timeout:
The maximum time to wait for a search request to complete.
Accepts time units (seconds, milliseconds, days, etc.).

Size:
Defines the number of hits (documents) to return.
Default value is 10. Max value is 10,000.
49
The search API (Read more)

<index>, q, query, timeout, size, from

from:
Starting point from which to return search results (pagination).
Useful if you want to implement skip functionality.
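A minimal sketch combining these parameters (the from keyword is passed as from_ in the Python client):

response = es.search(
    index="my_index",
    query={"match_all": {}},
    timeout="2s",  # abort the search after 2 seconds
    size=20,       # return up to 20 hits
    from_=40,      # skip the first 40 hits (page 3 when size=20)
)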

50
13 The search API – Part 2

51
The search API – Query DSL (Read more)

Query DSL is used to create complex, structured queries.

Query DSL consists of two types of clauses:
Leaf clauses (match, term, or range)
Compound clauses (bool)

🛈 Query DSL means Query Domain-Specific Language.


52
The search API – Query DSL

Leaf clauses (Read more)

1. match:
Used to perform full-text search.
Returns documents that match a provided text, number, date, or boolean value.
The field should be mapped as a text data type.

2. term:
Returns documents that contain an exact term in a provided field.
The field must be mapped as a keyword or a numeric/date type.
Example usage: product ID, book ID, username, etc.

3. range:
Returns documents that contain terms within a provided range.
53
The search API – Query DSL


Example for match query:
response = es.search(
index="my_index",
Read mor e
body={
"query": {
"match": {
"description": "A description."
}
}
}
)

54
The search API – Query DSL


Example for term query:
response = es.search(
index="my_index",
Read mor e
body={
"query": {
"term": {
"product_id": "PRODUCT_12345"
}
}
}
)

55
The search API – Query DSL

Example for a range query: (Read more)

response = es.search(
    index="my_index",
    body={
        "query": {
            "range": {
                "publication_date": {
                    "gte": "2023-01-01",
                    "lte": "2023-12-31"
                }
            }
        }
    }
)
56
The search API – Query DSL

Compound clauses (Read more)

bool:
Combines multiple queries using boolean logic:
must, filter, should, must_not.

57
The search API – Query DSL

Example for a bool query: (Read more)

response = es.search(index="my_index", body={
    "query": {
        "bool": {
            "must": [
                # keep documents whose title matches "Elasticsearch"
                {"match": {"title": "Elasticsearch"}}
            ],
            "filter": [
                # keep documents with a status of "published"
                {"term": {"status": "published"}}
            ],
            "should": [
                # this match is optional
                {"match": {"tags": "search"}}
            ],
            "must_not": [
                # exclude any document where the deleted field is set to true
                {"term": {"deleted": True}}
            ]
        }
    }
})

58–62
14 The search API – Part 3

63
The search API (Read more)

Timeout
Sets the maximum duration for a query to execute.
If the search takes longer than the specified time, Elasticsearch will abort the search.

Size
Controls how many search results are returned.
Max value is 10,000.

From
Used for pagination.
It tells Elasticsearch how many documents to skip before starting to return results.

64–66
The search API (Read more)

Aggregations
Performs calculations on the data:
average, max, min, count, etc.
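A minimal sketch of metric aggregations (the field name "price" is illustrative):

response = es.search(
    index="my_index",
    size=0,  # return only the aggregations, not the hits
    aggs={
        "avg_price": {"avg": {"field": "price"}},
        "max_price": {"max": {"field": "price"}},
    },
)
print(response["aggregations"]["avg_price"]["value"])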

67
15 Dense vectors

68
Dense vector field type (Read more)

Stores dense vectors of numeric values.
Use it if you have few or no zero elements.
Does not support aggregations or sorting.
It is not possible to store multiple values in one dense vector field.
Use kNN search to retrieve the nearest vectors.
Max size of a dense vector is 4096.

[2, 0, 1, −5, …, 20]        ✓ a single dense vector
[[…], […], […], …]          ✗ multiple vectors in one field

69
Medium article by Sachinsoni
Dense vector field type

Examples (Read more)

# A single vector per field: valid.
response = client.index(
    index="my-index",
    id="1",
    document={
        "my_text": "example text",
        "my_vector": [0.5, 10, 6]
    }
)

# Multiple vectors in one field: rejected (see the previous slide).
response = client.index(
    index="my-index",
    id="2",
    document={
        "my_text": "example text 2",
        "my_vector": [[0.5, 10, 6], [1, 0, -2]]
    }
)

70
Dense vector field type

Important (Read more)
You have to do the mapping manually.
Elasticsearch does not automatically infer the mapping for dense vectors.
It needs to know the exact number of dimensions.

response = es.indices.create(
    index="my_index",
    mappings={
        "properties": {
            "sides_length": {
                "type": "dense_vector",
                "dims": 4
            },
            "shape": {
                "type": "keyword"
            }
        }
    },
)

71
16 Embedding documents

72
Embedding documents (Read more)

Embedding transforms text into numerical vectors.
Deep learning models are used to embed documents.
These models preserve the meaning of the text.

Use cases:
Recommendation systems
Retrieval-Augmented Generation (RAG)

[Figure: documents are embedded into vectors such as [2, 0, 1, −5, …, 20], [0.54, −4.2, 0, −0.6, …, 1], [3, 0.33, −0.98, −1, …, −1.1].]
73
Embedding documents

Deep learning models (Read more)

Closed models:
● Paid (pay for what you use).
● No hardware required (cloud).

Open-source models:
● Free to use.
● Hardware is required (preferably a GPU).

74
Embedding documents

Embedding size (Read more)

The size of the dense vector.
Larger sizes generally yield better embeddings.
Common sizes include 384, 768, and 1024.

75
Embedding documents

Input size (Read more)

The size of the input text that the model can process.
Text will be truncated if it exceeds the model's capacity.
Common values include 256 and 512 tokens.

76
Embedding documents

Text language (Read more)

Some models support only specific languages.
Others support multiple languages and are multilingual.
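A minimal sketch of embedding text with an open-source model (assumes the sentence-transformers package and the all-MiniLM-L6-v2 model, which produces 384-dimensional vectors):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(["The quick brown fox", "jumped over the lazy dog"])
print(vectors.shape)  # (2, 384)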

77
Embedding documents

78
Massive Text Embedding Benchmark (MTEB)
17 KNN search

79
k-nearest neighbor (kNN) search (Read more)

How do we search for embedded documents?
We use kNN search for fields mapped as dense vectors.
Important: you can't use the query parameter in this case.

[Figure: documents are embedded into dense vectors; kNN search operates on those vectors.]

80
k-nearest neighbor (kNN) search (Read more)

The kNN algorithm is used for classification and regression tasks.
It classifies new data points by comparing them with the k nearest points from the training data.
Common distance metrics: Euclidean, Manhattan, or Minkowski.
This algorithm is simple and effective.

Medium article by Sachinsoni · Medium article by Dancker

81
k-nearest neighbor (kNN) search


82
k-nearest neighbor (kNN) search

Example (Read more)

results = es.search(
    knn={
        'field': 'embedding',                             # 🛈 should be a dense_vector field
        'query_vector': es.get_embedding(parsed_query),   # 🛈 the query itself must be a vector
        'num_candidates': 50,
        'k': 10,
    }
)

🛈 num_candidates: retrieves 50 potentially relevant documents before applying distance calculations to select the k best documents.
🛈 k: returns up to 10 documents that match your query.

83–86
86
18 Deep pagination

87
Deep pagination (Read more)

Indexing / fetching all documents at once is inefficient and slow.

Pagination:
Retrieves data in small chunks from large indexes.
Improves performance and efficiency.
Fast search experience.
Cost effective.

[Figure: indexing / fetching 100K documents at once vs. in small chunks. Source: Medium article by Dayanand]

88
Deep pagination

Pagination methods (Read more)
from/size: commonly used for paginating smaller datasets.
search_after: offers more efficient deep pagination for large datasets.

89
Deep pagination

Pagination methods (Read more)
from/size: commonly used for paginating smaller datasets.
search_after: offers more efficient deep pagination for large datasets.

🛈 Note (from/size):
Limited to 10k results.
Requires a lot of memory for deep pages.
Not suitable for larger indexes.

[Figure: two requests against the same index, one with from = 0, size = 8 and one with from = 5, size = 8.]
90
Deep pagination

Pagination methods (Read more)
from/size: commonly used for paginating smaller datasets.
search_after: offers more efficient deep pagination for large datasets.

[Figure: search_after pages through the index using sortable fields (e.g. timestamp, id); each request of size = 8 starts after the old sort values of the previous page and returns new sort values.]
91
Deep pagination

Pagination methods (Read more)
from/size: commonly used for paginating smaller datasets.
search_after: offers more efficient deep pagination for large datasets.

🛈 Note (search_after):
Not constrained by the 10k result limit.
Does not use an offset (i.e., the from parameter).
Results must be sorted by fields such as an ID or a timestamp.
Uses a pointer derived from the sort values of the last document on the previous page.
This approach prevents the skipping of documents.
It is particularly beneficial for handling larger indexes.
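A minimal sketch of both pagination styles (the sort field "timestamp" is illustrative):

# from/size: fine for shallow pages, capped at 10,000 results.
page = es.search(index="my_index", query={"match_all": {}}, size=8, from_=16)

# search_after: sort on a stable field and pass the last hit's sort
# values into the next request.
first = es.search(
    index="my_index",
    query={"match_all": {}},
    size=8,
    sort=[{"timestamp": "asc"}],
)
last_sort = first["hits"]["hits"][-1]["sort"]
next_page = es.search(
    index="my_index",
    query={"match_all": {}},
    size=8,
    sort=[{"timestamp": "asc"}],
    search_after=last_sort,
)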

92
Deep pagination

Comparison (Read more)
Below are the results obtained after attempting to retrieve 10,000 documents using the following parameters (size = 200, iterations = 50):

                          from/size    search_after
Average time (ms)         6.417        3.072
Maximum time (ms)         15.812       4.896
Minimum time (ms)         2.757        1.772
Performance degradation   2.9x         0.58x

🛈 Performance degradation is calculated as (last page time / first page time).

94
19 Ingest pipelines

95
Ingest pipelines (Read more)

You can perform transformations on data before indexing.
Common transformations: remove fields, lowercase text, remove HTML tags, and more.

[Figure: Step 1: Documents → Step 2: Ingest pipeline → Step 3: Target index]

96
Ingest pipelines

We use the ingest API to:
Create or update a pipeline. (Read more)

response = client.ingest.put_pipeline(
    id="my-pipeline",
    description="A description",
    processors=[
        {
            "set": {
                "description": "A description",
                "field": "field",
                "value": "value"
            }
        },
        {
            "lowercase": {
                "field": "field"
            }
        }
    ],
)

97
Ingest pipelines

We use the ingest API to:
Simulate a pipeline. (Read more)

response = client.ingest.simulate(
    id="my-pipeline",
    docs=[
        {
            "_index": "index",
            "_id": "id",
            "_source": {"foo": "bar"}
        },
        {
            "_index": "index",
            "_id": "id",
            "_source": {"foo": "rab"}
        }
    ],
)

OR, with an inline pipeline definition:

response = client.ingest.simulate(
    pipeline={
        "processors": [
            {"lowercase": {"field": "field"}}
        ]
    },
    ...
)

98
Ingest pipelines

We use the ingest API to:
Delete a pipeline. (Read more)

response = client.ingest.delete_pipeline(
    id="my-pipeline",
)

99
Ingest pipelines

We use the ingest API to:
Get a pipeline. (Read more)

response = client.ingest.get_pipeline(
    id="my-pipeline",
)

100
Ingest pipelines

Pipelines can fail. You can either ignore the failure or handle it.
If you ignore the failure, the pipeline will skip over the failed steps. (Read more)

response = client.ingest.put_pipeline(
    id="my-pipeline",
    processors=[
        {
            "rename": {
                "description": "A description",
                "field": "field",
                "ignore_failure": True
            }
        }
    ],
)

[Figure: in the ingest pipeline, a failed Step 2 is skipped and Step 3 still runs.]

101
Ingest pipelines

Pipelines can fail. You can either ignore the failure or handle it.
Specify custom error-handling steps with on_failure (retry, log the error, etc.). (Read more)

response = client.ingest.put_pipeline(
    id="my-pipeline",
    processors=[
        {
            "rename": {
                "description": "A description",
                "field": "field",
                "on_failure": [...]
            }
        }
    ],
)

[Figure: when Step 2 fails, the on_failure steps run.]

102
20 Ingest processors

103
Ingest processors (Read more)

Common transformations: remove fields, lowercase text, remove HTML tags, and more.

[Figure: Step 1: Documents → Step 2: Ingest pipeline → Step 3: Target index]

104
Ingest processors

Ingest processors by category (Read more)

Ingest processors are organized into 5 categories:

Data enrichment: Append, Inference, Attachment, ...
Array/JSON handling: For each, JSON, Sort
Data transformation: Convert, Rename, Set, HTML strip, Lowercase / Uppercase, Trim, Split, ...
Data filtering: Drop, Remove
Pipeline handling: Fail, Pipeline

105
21 Filters in depth

106
Filters in depth (Read more)

When searching in Elasticsearch, you can use either query context or filter context.

Query → Query context → Score ("How well does this document match this query clause?")
Query → Filter context → Yes / No ("Does this document match this query clause?")

107
Filters in depth

Why use the filter context? (Read more)

Binary matching.
No score is needed.
Filters execute faster than queries (no score is computed).
Filters consume fewer CPU resources.

Query → Filter context → Yes / No ("Does this document match this query clause?")

108
Filters in depth

Use cases (Read more)
Filters are effective for querying structured data.

Structured data: numeric fields, dates, boolean values, keyword fields, ...

109
Filters in depth

Example 1 (Read more)

response = client.search(
    index="phones",
    query={
        "bool": {
            "filter": [
                {"term": {"color": "black"}},
                {"term": {"brand": "samsung"}}
            ]
        }
    },
)

[Figure: Color AND Brand → Yes/No]

110
Filters in depth

Example 2 (Read more)

response = client.search(
    query={
        "bool": {
            "filter": [
                {"term": {"status": "published"}},
                {
                    "range": {
                        "publish_date": {
                            "gte": "2015-01-01",
                            "lte": "2015-02-01"
                        }
                    }
                }
            ]
        }
    },
)

[Figure: Status AND Publish date → Yes/No]

111
Filters in depth

Post filters (Read more)
Applies filters after aggregations are calculated.
Does not affect aggregations.
Only filters the search results.
Lets you narrow down what users see without limiting what they can choose from.

112
Filters in depth

Example (Read more)

response = client.search(
    index="shirts",
    query={
        "bool": {
            "filter": {"term": {"brand": "gucci"}}
        }
    },
    aggs={
        "colors": {
            "terms": {"field": "color"}
        },
        "color_red": {
            "filter": {"term": {"color": "red"}},
            "aggs": {
                "models": {
                    "terms": {"field": "model"}
                }
            }
        }
    },
    post_filter={
        "term": {"color": "red"}
    },
)

113
22 SQL search API

114
SQL Search API (Read more)

We used Query DSL to search for documents.
An alternative method for searching documents is the SQL Search API.

[Figure: a Query goes through Query DSL to my_index and returns Results.]

115
SQL Search API

The SQL search API supports numerous parameters: (Read more)
format, cursor, fetch_size, delimiter, filter, page_timeout, request_timeout, ...

116
SQL Search API

Example (Read more)

response = client.sql.query(
    format="txt",
    query="SELECT * FROM library ORDER BY page_count DESC LIMIT 5",
)

In the SQL statement: the selected columns are fields, "library" is the index, and LIMIT controls the size.

117
SQL Search API

The available response formats are: (Read more)
CSV
JSON
TSV
TXT
YAML
CBOR (binary format)
SMILE (binary format)

response = client.sql.query(
    format="txt",
    query="SELECT * FROM library ORDER BY page_count DESC LIMIT 5",
)

118
SQL Search API

Pagination (Read more)

response = client.sql.query(
    format="txt",
    cursor="sDHOSBDISBXMLK…",
)

[Figure: the original query returns results plus a cursor; passing the cursor back to the SQL Search API returns the next page.]

119
SQL Search API

Filtering (Read more)

response = client.sql.query(
    format="txt",
    query="SELECT * FROM library ORDER BY page_count DESC",
    filter={
        "range": {
            "page_count": {
                "gte": 100,
                "lte": 200
            }
        }
    },
    fetch_size=5,
)

120
SQL Search API

SQL Translate API (Read more)

response = client.sql.translate(
    query="SELECT * FROM library ORDER BY page_count DESC",
    fetch_size=10,
)

The SQL Translate API returns the equivalent Query DSL:

{
    "size": 10,
    "_source": false,
    "fields": [{"field": "author"}, ...],
    "sort": [
        {
            "page_count": {
                "order": "desc"
            }
        }
    ],
    "track_total_hits": -1
}

121
SQL Search API

SQL Limitations

Read more

122
23 Time Series Data Stream

123
Time Series Data Stream


Time series data refers to data points ordered by time.

Data is collected at regular intervals. R ead mor e

Example: CPU usage over time.

124
Time Series Data Stream


Managing time series data is challenging. (Read more)
The data can grow rapidly (high-frequency measurements).
How do you store this large volume efficiently?
Deciding which old data to keep and when to delete it.

125
Time Series Data Stream

Why use Elasticsearch for time series data? (Read more)

Elasticsearch can handle massive volumes of data.
Supports real-time data ingestion and querying.
Can analyze time series.

(Original posts)

126
Time Series Data Stream

Time series data structure



Each data point is a document R ead mor e

Each document contains the timestamp field and the data.

28

26
data

24

22

20

06:00:00 06:00:01 06:00:02 06:00:03 06:00:04


@timestamp 127
Time Series Data Stream

Index Lifecycle Management (ILM) (Read more)

ILM automates the rollover and management of indices.
Benefits: storage optimization, automated data retention, efficient management of index size.

Phases of ILM: Hot phase → Warm phase → Cold phase → Delete phase

128
Time Series Data Stream

ILM visualized (Read more)

ILM policy: Rollover when age = 30 days or size = 50GB; Delete after 90 days.

[Figure, across slides 129–132: my_index_0001 is created; at age = 30 days it rolls over and my_index_0002 is created; 30 days later my_index_0003 is created; once my_index_0001 reaches age = 90 days it is deleted, and so on.]

129–132
Time Series Data Stream

Querying time series data (Read more)

Range query on the timestamp:

{
    "query": {
        "range": {
            "@timestamp": {
                "gte": "2024-11-01T00:00:00",
                "lte": "2024-11-07T23:59:59"
            }
        }
    }
}

Aggregation on the metric:

{
    "aggs": {
        "avg_cpu_usage": {
            "avg": {
                "field": "cpu_usage"
            }
        }
    }
}

133
24 Analyzers

134
Analyzers (Read more)

Analyzers process text during indexing and searching.
They transform text into tokens.
They make the search process efficient and accurate.

Inverted index:
Term     Document
hello    Document 1
world    Document 1
imad     Document 2
saddik   Document 2

(Image origin)

135
Analyzers

Analyzer components (Read more)

An analyzer is a combination of 3 components:
Character filters (min 0)
Tokenizer (exactly 1)
Token filters (min 0)

136
Analyzers

Built-in analyzers (Read more)
Provide ready-made options for processing text in various ways.
Each built-in analyzer is designed for specific types of data.

Common analyzers:

                    Standard Analyzer          Simple Analyzer       Whitespace Analyzer    Stop Analyzer
Character filters   None                       None                  None                   None
Tokenizer           Standard Tokenizer         Lowercase Tokenizer   Whitespace Tokenizer   Lowercase Tokenizer
Token filters       Lowercase & Stop filter    None                  None                   Stop filter

137
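A minimal sketch of defining a custom analyzer at index-creation time (the analyzer and index names are illustrative):

es.indices.create(
    index="articles",
    settings={
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "char_filter": ["html_strip"],
                    "tokenizer": "standard",
                    "filter": ["lowercase", "stop"],
                }
            }
        }
    },
    mappings={
        "properties": {
            "text": {"type": "text", "analyzer": "my_analyzer"}
        }
    },
)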


Analyzers

Phases of analysis (Read more)

Index time analysis:
[Figure: documents pass through the tokenizer (producing tokens such as hello, world, imad, saddik) and then through token filters; the filtered tokens populate the inverted index (Term → Document).]

138
Analyzers

Phases of analysis (Read more)
Index time analysis
Search time analysis

Search time analysis:
[Figure: the query goes through the same tokenizer and token filters, and the filtered tokens are looked up in the inverted index to find matching documents.]

139


Analyzers

Read more

140
Analyzers

141
25 Synonyms

142
Synonyms (Read more)

Synonyms help enhance search accuracy.
Useful for matching variations or related terms.
Synonyms are defined using the Solr format.

143
Synonyms

Solr format (Read more)
This is a flexible syntax for defining synonyms.
It uses two different definitions:
Equivalent synonyms: "term 1, term 2, term 3"
Explicit synonyms: "term 1, term 2 => term 3"

[Figure: equivalent terms (car, automobile, race car, voiture, ...) on the left; explicit mappings (Personal computer => PC; i-pod, i pod => ipod) on the right.]

144
Synonyms

Synonyms are used within analyzers. (Read more)
You can use synonyms at index and at search time.
Synonyms are applied through a token filter in a custom analyzer.

[Figure: Custom Analyzer = no character filter + Standard Tokenizer + Synonyms filter]

145
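A minimal sketch of a synonym token filter inside a custom analyzer, using the Solr format (names and synonym lists are illustrative):

es.indices.create(
    index="products",
    settings={
        "analysis": {
            "filter": {
                "my_synonyms": {
                    "type": "synonym",
                    "synonyms": [
                        "car, automobile, voiture",
                        "i-pod, i pod => ipod",
                    ],
                }
            },
            "analyzer": {
                "synonym_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_synonyms"],
                }
            },
        }
    },
)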
26 Common options

146
Common options (Read more)

Simplify Elasticsearch management.
Provide features like human-readable output, date math, and filtering.
All Elasticsearch REST APIs support these common options.

147
Common options

Human-readable output (Read more)
Returns statistics in a format that humans can understand.
It applies to disk space, memory, time, and other metrics.

Example

response = es.cluster.stats(human=True)
pprint(response["nodes"]["jvm"])

[Figure: the same JVM statistics before and after human=True.]

148
Common options

Date math (Read more)
Perform math operations on dates.
Operations include: add, subtract, round down to the nearest day.
Supported time units: y (years), M (months), etc.
The expression starts with an anchor date ("now" or a date string ending with ||).

Examples

now := 2024-11-16 11:55:00
now+1h := 2024-11-16 12:55:00
now-1h := 2024-11-16 10:55:00
now-1h/d := 2024-11-16 00:00:00
2024.11.16||+1M/d := 2024-12-16 00:00:00

149
Common options

Response filtering (Read more)
Inclusive filtering: specify fields to include.
Exclusive filtering: remove unnecessary fields.
Combined filtering.

Example

response = es.search(
    index=index_name,
    body={
        "query": {
            "match_all": {}
        }
    },
    filter_path="hits.hits._id,hits.hits._source"
)
pprint(response.body)

[Figure: the response before and after applying filter_path.]

150
27 Change heap size

151
Change heap size (Read more)

By default, Elasticsearch uses 50% of the available RAM.
This can slow down your PC.
You only need 1 or 2GB when dealing with small indices.

152
Change heap size

Steps to change the heap size (Read more)

1. Start the container.

sudo docker start elasticsearch

153

2. Go inside the container.

sudo docker exec -u 0 -it elasticsearch bash

154

3. Create the heap.options file inside the jvm.options.d folder and add the heap settings.

echo "-Xms2g" > /usr/share/elasticsearch/config/jvm.options.d/heap.options
echo "-Xmx2g" >> /usr/share/elasticsearch/config/jvm.options.d/heap.options
cat /usr/share/elasticsearch/config/jvm.options.d/heap.options

155
Change heap size

Read more

156
28 Final project – part 0

157
Final project – part 0


No more videos on Elasticsearch concepts and APIs.

I'll be focusing on the final project from now on.

The final project will cover most of the topics we've learned in previous videos.

We will be building a website.

Elasticsearch will provide the search functionality.

158
Final project – part 0

Source code

159
29 Final project – part 1

160
Final project – part 1


Create an index and index documents.

Use size / from when searching.

Perform multi-match queries.

The theme of the final project is Astronomy.

161
Image credit: Kent E. Biggs
Final project – part 1


Frontend is done for you.

Install dependencies.

162
Final project – part 1


Install dependencies.

Setup the backend server.

Configure Elasticsearch.

163
30 Final project – part 2

164
Final project – part 2


Add the pagination controls.

Filter by year.

Use aggregations.

165
31 Final project – part 3

166
Final project – part 3


Implement a search-as-you-type feature.
Utilize the N-gram tokenizer.

[Figure: the partial query "Andr" processed by the standard tokenizer vs. the N-gram tokenizer.]

167
Final project – part 3

Why use the N-gram tokenizer?

Standard tokenizer:  Andromeda → [andromeda]
N-gram tokenizer (N = 9):  Andromeda → [a, an, and, andr, andro, androm, androme, andromed, andromeda]

168
32 Final project – part 4

169
Final project – part 4


Implement semantic search.

Use an embedding model from HuggingFace.

Use kNN search to find documents.

Medium article by Sachinsoni

170
33 Final project – part 5

171
Final project – part 5


Add the raw APOD data.
It contains HTML tags.

{
    "date": "2024-11-30",
    "title": "<a href=\"ap241130.html\">Winter and Summer on a Little Planet</a>",
    "explanation": "<p>\n<b> Explanation: </b> \n\nWinter and summer appear to come on a single night to this\n<a href=\"https://www.instagram.com/camille.niel_photography/p/C270AVzrKcp/?img_index=1\">stunning little planet</a>.\n\nIt's planet Earth of course.\n\nThe\n<a href=\"http://srcematematike.si/2014/03/09/math-behind-tiny-planets/\">digitally mapped</a>,\nnadir centered panorama covers 360x180\ndegrees and is\ncomposed of frames recorded during January and July from the\n<a href=\"https://en.wikipedia.org/wiki/Col_du_Galibier\">Col du Galibier</a> ...
}

172
Final project – part 5

Add the raw APOD data.
It contains HTML tags.
Use pipelines (the HTML strip processor) to remove the HTML tags.

Before (raw):
{
    "date": "2024-11-30",
    "title": "<a href=\"ap241130.html\">Winter and Summer on a Little Planet</a>",
    "explanation": "<p>\n<b> Explanation: </b> \n\nWinter and summer appear to come on a single night to this\n<a href=\"https://www.instagram.com/camille.niel_photography/p/C270AVzrKcp/?img_index=1\">stunning little planet</a>.\n\nIt's planet Earth of course.\n\nThe\n<a href=\"http://srcematematike.si/2014/03/09/math-behind-tiny-planets/\">digitally mapped...
}

After (HTML strip via the ingest pipeline):
{
    "date": "2024-11-30",
    "title": "Winter and Summer on a Little Planet",
    "explanation": "\n Explanation: \n\nWinter and summer appear to come on a single night to this\nstunning little planet.\n\nIt's planet Earth of course.\n\nThe\ndigitally mapped...
}

173
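A minimal sketch of an ingest pipeline that strips the HTML tags (the pipeline name, index name, and field list are illustrative, not from the slides):

client.ingest.put_pipeline(
    id="apod-html-strip",
    processors=[
        {"html_strip": {"field": "title"}},
        {"html_strip": {"field": "explanation"}},
    ],
)

# Index raw APOD documents through the pipeline:
client.index(index="apod", document=raw_doc, pipeline="apod-html-strip")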
