Advanced Elasticsearch Tutorial

Oct 17, 2024

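Analyzer settings

Analysis (tokenization) splits text into tokens. The standard analyzer splits mainly on whitespace and punctuation; for Chinese text, if you use the default analyzer rather than a language plugin, it splits the text one character at a time. You can test an analyzer with the following API.

curl -X POST https://localhost:9200/_analyze -u elastic:<password> --insecure -H 'Content-Type: application/json' -d '{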
 "analyzer": "standard",
 "text": "我是貓咪"
}'
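
To make the per-character behaviour concrete, here is a rough Python imitation of what is described above. It is only a sketch: `toy_standard_tokens` is a hypothetical helper, not the real analyzer, and it assumes that Chinese ideographs become one token each while other text is split on non-alphanumeric characters and lowercased.

```python
import unicodedata


def toy_standard_tokens(text: str) -> list[str]:
    """Approximate the described splitting; NOT the real standard analyzer."""
    tokens: list[str] = []
    word = ""
    for ch in text:
        if "CJK UNIFIED IDEOGRAPH" in unicodedata.name(ch, ""):
            # Each Chinese ideograph becomes its own token.
            if word:
                tokens.append(word.lower())
                word = ""
            tokens.append(ch)
        elif ch.isalnum():
            word += ch
        elif word:
            # Whitespace or punctuation ends the current word.
            tokens.append(word.lower())
            word = ""
    if word:
        tokens.append(word.lower())
    return tokens


print(toy_standard_tokens("我是貓咪"))
print(toy_standard_tokens("Hello World"))
```

For real results, always rely on the _analyze API itself, since the actual analyzer follows Unicode segmentation rules that this sketch does not cover.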

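The response lists the tokens produced, one per character for this input.

Mapping structure

After creating an index, you can inspect its current mapping with the following API.

Note: because this is a local instance, the HTTPS certificate is not verified, so curl needs the --insecure flag.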

curl -X GET https://localhost:9200/user3/_mappings -u elastic:<password> --insecure
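
If you prefer a script over curl, the same request can be sketched with the Python standard library. This is only an illustration: `build_request` and `insecure_context` are hypothetical helper names, and disabling certificate checks mirrors --insecure, so it is suitable for a local test instance only.

```python
import base64
import ssl
import urllib.request


def build_request(path: str, user: str, password: str,
                  base: str = "https://localhost:9200") -> urllib.request.Request:
    """Build a GET request with HTTP basic auth, like `curl -u user:password`."""
    req = urllib.request.Request(base + path, method="GET")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req


def insecure_context() -> ssl.SSLContext:
    """TLS context that skips certificate checks, like `curl --insecure`."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before verify_mode
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


req = build_request("/user3/_mappings", "elastic", "<password>")
# To actually send it against a running local node:
# with urllib.request.urlopen(req, context=insecure_context()) as resp:
#     print(resp.read().decode())
```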

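In this example I indexed documents with first_name, last_name, age, birthday, and other fields into user3, which is why the mapping contains those fields.

The response below shows that dynamic mapping assigned long to age, and that first_name is mapped as text with a keyword sub-field.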

{
 "user3": {
  "mappings": {
   "properties": {
    "age": {
     "type": "long"
    },
    "birthday": {
     "type": "date"
    },
    "description": {
     "type": "text",
     "fields": {
      "keyword": {
       "type": "keyword",
       "ignore_above": 256
      }
     }
    },
    "first_name": {
     "type": "text",
     "fields": {
      "keyword": {
       "type": "keyword",
       "ignore_above": 256
      }
     }
    },
    "last_name": {
     "type": "text",
     "fields": {
      "keyword": {
       "type": "keyword",
       "ignore_above": 256
      }
     }
    }
   }
  }
 }
}

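Shard & Replica settings

A shard is, simply put, where the data is stored, and a replica is a copy of a shard.

Advantages:

Faster reads
A large index is spread evenly across nodes, so the cluster can hold more documents

Note: since Elasticsearch 7.0 the default is 1 primary shard per index; earlier versions defaulted to 5. Choose the number of primary shards carefully to avoid oversharding, which produces many small shards that each consume disk space and other resources.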
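
As a quick piece of arithmetic, the total number of shard copies an index creates is the number of primaries times one plus the number of replicas. The helper name `total_shards` below is just for illustration.

```python
def total_shards(primaries: int, replicas: int) -> int:
    """Each primary shard has `replicas` copies, so count primaries plus copies."""
    return primaries * (1 + replicas)


print(total_shards(1, 1))  # 7.0+ default primaries with 1 replica
print(total_shards(5, 1))  # pre-7.0 default primaries with 1 replica
```

Multiplying this by the number of indices gives a feel for how quickly many small indices add up to many small shards.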

References:

Day 07 Elasticsearch - Primary Shard 主要分片 - iT 邦幫忙 (ithelp.ithome.com.tw)

You can check the number of shards and replicas of each index with the following API.

curl -X GET 'https://localhost:9200/_cat/indices?v&expand_wildcards=all' -u elastic:<password> --insecure

In the output, the pri and rep columns show that my index has 1 primary shard and 1 replica.

How to set the number of primary and replica shards

Set the number of primary shards and replicas through settings when creating the index.

curl -X PUT https://localhost:9200/user4 -u elastic:<password> --insecure -H 'Content-Type: application/json' -d '{
 "mappings": {
  
 },
 "settings": {
  "number_of_shards": 2,
  "number_of_replicas": 1
 }
}'

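Use the API above to check that the new user4 index has 2 primary shards.

curl -X GET 'https://localhost:9200/_cat/indices?v&expand_wildcards=all' -u elastic:<password> --insecure

You can also view the settings of a specific index with the following API.

curl -X GET https://localhost:9200/user4/_settings -u elastic:<password> --insecure

The response includes the index's number_of_shards and number_of_replicas.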

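How to change the number of shards

The number of primary shards of an existing index can be changed with the split API or the reindex API.

When using the split API, the source index must be read-only, meaning writes to it must be stopped, and the number of primary shards in the new index must be a multiple of the number in the source index.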
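
The multiple rule can be checked before calling the API. The following is a minimal sketch; `is_valid_split_target` is a hypothetical helper and it deliberately ignores other limits Elasticsearch may apply (for example those derived from index.number_of_routing_shards).

```python
def is_valid_split_target(source_shards: int, target_shards: int) -> bool:
    """A split must increase the shard count to a multiple of the source count."""
    return target_shards > source_shards and target_shards % source_shards == 0


# user4 was created with 2 primary shards in this article.
for target in (3, 4, 6):
    print(target, is_valid_split_target(2, target))
```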

Changing the number of primary shards with the split API

1. Make the source index read-only

curl -X PUT https://localhost:9200/user4/_settings -u elastic:<password> --insecure -H 'Content-Type: application/json' -d '{
  "settings": {
     "index.blocks.write": true
   }
}'

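If you try to index a document after the index becomes read-only, the request is rejected with a cluster_block_exception error.

2. Run the split API

curl -X POST https://localhost:9200/user4/_split/user4_split -u elastic:<password> --insecure -H 'Content-Type: application/json' -d '{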
 "settings": {
  "index.number_of_shards": 4
 }
}'

3. Check the settings of the user4_split index and confirm that it has 4 primary shards

curl -X GET https://localhost:9200/user4_split/_settings -u elastic:<password> --insecure

4. Check that the user4_split index contains the data from the user4 index

curl -X GET https://localhost:9200/user4_split/_search -u elastic:<password> --insecure -H 'Content-Type: application/json' -d '{
 "query": {
  "match_all": {}
 }
}'

5. Remove the read-only block from the user4_split index

curl -X PUT https://localhost:9200/user4_split/_settings -u elastic:<password> --insecure -H 'Content-Type: application/json' -d '{
 "index.blocks.write": null
}'

References:

How to Increase Primary Shard Count in Elasticsearch (opster.com)

https://blog.csdn.net/UbuntuTouch/article/details/108960950

[實作筆記] ElasticSearch Reindex (blog.marsen.me)

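Backup (Snapshot) & Restore

Snapshots have the following benefits:

Regular backups without downtime
Data can be restored to a specific point in time when something goes wrong

Before taking a snapshot you need to register a repository, which is simply where snapshots are stored. Repository types supported by Elasticsearch include S3, Google Cloud Storage, and Microsoft Azure.

The following uses S3 as the snapshot repository.

PS: recommended S3 IAM role policy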

{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}

Before you start, install the S3 repository plugin (on Elasticsearch 8.0 and later it is bundled, so this step can be skipped).

bin/elasticsearch-plugin install repository-s3

1. Configure the S3 client in elasticsearch.yml

# access_key and secret_key are secure settings: add them to the keystore (step 2), not to this file
s3.client.default.region: "ap-northeast-1" # the example bucket is in Tokyo

2. Add the S3 client access_key and secret_key to the keystore

bin/elasticsearch-keystore add s3.client.default.access_key
bin/elasticsearch-keystore add s3.client.default.secret_key

3. Register the S3 snapshot repository

curl -X PUT https://localhost:9200/_snapshot/s3_repository -u elastic:<password> --insecure -H 'Content-Type: application/json' -d '{
  "type": "s3",
   "settings": {
      "bucket": "es_snapshot_bucket",
      "region": "ap-northeast-1"
    }
}'

4. Check the snapshot repository settings

curl -X GET https://localhost:9200/_snapshot/s3_repository -u elastic:<password> --insecure

5. Create a snapshot manually

curl -X PUT https://localhost:9200/_snapshot/s3_repository/my_snapshot -u elastic:<password> --insecure

Response:

{
 "accepted": true
}

6. Check the status of the snapshot that is currently running

curl -X GET https://localhost:9200/_snapshot/s3_repository/_current -u elastic:<password> --insecure

7. The snapshot files then appear in the S3 bucket

References:

Snapshot and restore (www.elastic.co)

Create a snapshot (www.elastic.co)

How to restore from a snapshot

1. First list the registered snapshot repositories with the following API

curl -X GET https://localhost:9200/_snapshot -u elastic:<password> --insecure

You can also list the snapshots stored in a specific repository

curl -X GET 'https://localhost:9200/_snapshot/s3_repository/*?verbose=false' -u elastic:<password> --insecure

Response:

{
 "snapshots": [
  {
   "snapshot": "my_snapshot",
   "uuid": "0bscur9yRuKhcjECbibxCw",
   "repository": "s3_repository",
   "indices": [
    "user2",
    ".security-7",
    "user",
    "user3",
    "user4"
   ],
   "data_streams": [],
   "state": "SUCCESS"
  }
 ],
 "total": 1,
 "remaining": 0
}

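2. Restore specific indices

2-1 Before restoring, delete the original index so that you can confirm the data really comes back from the snapshot

curl -X DELETE https://localhost:9200/user4 -u elastic:<password> --insecure

2-2 Restore the index from the snapshot

Note: to restore several indices, separate their names with commas

curl -X POST https://localhost:9200/_snapshot/s3_repository/my_snapshot/_restore -u elastic:<password> --insecure -H 'Content-Type: application/json' -d '{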
  "indices": "user4"
}'

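2-3 Check that the user4 index data was restored

curl -X GET https://localhost:9200/user4/_search -u elastic:<password> --insecure -H 'Content-Type: application/json' -d '{
  "query": {
      "match_all": {}
    }
}'

When restoring, you can also use rename_pattern together with rename_replacement to restore the index under a new name and then compare it with the original index.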
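
A renaming restore could look like the sketch below. The helper names and the `restored_` prefix are made up for illustration. Elasticsearch applies the pattern as a regular expression with `$1`-style group references; the preview function converts those to Python's `\1` syntax only so the result can be checked locally.

```python
import json
import re


def restore_body(indices: list[str], rename_pattern: str,
                 rename_replacement: str) -> dict:
    """Request body for POST /_snapshot/<repository>/<snapshot>/_restore."""
    return {
        "indices": ",".join(indices),
        "rename_pattern": rename_pattern,
        "rename_replacement": rename_replacement,
    }


def preview_renamed(index: str, pattern: str, replacement: str) -> str:
    """Local preview of the new name; turns $1 into Python's \\1 first."""
    py_replacement = re.sub(r"\$(\d+)", r"\\\1", replacement)
    return re.sub(pattern, py_replacement, index)


body = restore_body(["user4"], "(.+)", "restored_$1")
print(json.dumps(body, indent=1))
print(preview_renamed("user4", "(.+)", "restored_$1"))
```

Because the restored index gets a different name, the original user4 does not have to be deleted first, which makes a side-by-side comparison possible.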

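How to set up a cluster

Before configuring a cluster, get to know the following node roles

Master Node: manages the cluster

Data Node: stores data and executes searches on it

Client Node (coordinating node): forwards requests to master nodes / data nodes

Ingest Node: pre-processes documents before they are indexed

Cluster health status:

Green: all primary and replica shards are allocated

Yellow: all primary shards are allocated and writable, but one or more replica shards are not allocated

Red: one or more primary shards are not allocated or cannot be written to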
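
The three colours can be summarised as a small decision function. This is a simplified sketch of the rules above, with a hypothetical `cluster_health` helper; the real status is reported by the cluster itself and takes more shard states into account.

```python
def cluster_health(unassigned_primaries: int, unassigned_replicas: int) -> str:
    """Map counts of unallocated shards to a health colour."""
    if unassigned_primaries > 0:
        return "red"      # some data is unavailable
    if unassigned_replicas > 0:
        return "yellow"   # data is available but not fully redundant
    return "green"


# A single-node cluster with number_of_replicas = 1 cannot place the replica
# on a second node, so it typically reports yellow.
print(cluster_health(unassigned_primaries=0, unassigned_replicas=1))
```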

References:

Elastic Stack 8 EP 2：Elasticsearch Cluster安裝與基本設定 - Jovepater (jovepater.com)

Elasticsearch Tutorial: Creating an Elasticsearch cluster | Logz.io (logz.io)

Elasticsearch Cluster Setup: A Step-by-Step Guide (opster.com)

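Setting up a data stream

Data streams are mainly for append-only time series data that rarely needs to be modified, such as logs, metrics, and events.

Setting up a data stream involves the following steps

Create Index Lifecycle Policy: rolls data over into a new index after a certain time or size
Create Component Template: a reusable template for the index mapping schema
Create Index Template: the template for the index schema (settings, mappings), usually composed of several component templates
Create Data Stream: backing indices are named .ds-<data-stream>-<yyyy.MM.dd>-<generation>

Note: one index template can be used by multiple data streams, and an index template that is in use cannot be deleted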
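
The backing index naming rule can be expressed as a small formatter. The function name is hypothetical, and the six-digit zero-padded generation is inferred from the index name returned later in this article (.ds-my-data-stream-2024.10.19-000001).

```python
from datetime import date


def backing_index_name(data_stream: str, day: date, generation: int) -> str:
    """Format .ds-<data-stream>-<yyyy.MM.dd>-<generation>."""
    return f".ds-{data_stream}-{day:%Y.%m.%d}-{generation:06d}"


print(backing_index_name("my-data-stream", date(2024, 10, 19), 1))
```

Each rollover increments the generation, so the next backing index of the same stream would end in 000002.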

References:

Data streams (www.elastic.co)

Index Lifecycle Policy

Let's look at an example lifecycle policy in Elasticsearch

PUT _ilm/policy/my-lifecycle-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb"
          }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "cold": {
        "min_age": "60d",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "found-snapshots"
          }
        }
      },
      "frozen": {
        "min_age": "90d",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "found-snapshots"
          }
        }
      },
      "delete": {
        "min_age": "735d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

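The policy contains several phases, each representing a stage of the index lifecycle

hot: the index can be queried and written to
warm: the index is only queried
cold: the index is only queried, and queries are slower
frozen: the index is rarely queried, and queries may be very slow
delete: the index is deleted

Each phase has its own set of allowed actions, for example:

hot: rollover, force merge

warm: force merge, shrink, allocate

cold: freeze, allocate

delete: wait for snapshot, delete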

References:

喬叔教 Elastic - 11 - 管理 Index 的 Best Practices (3/7) - Index Lifecycle Management (ILM) - iT… (ithelp.ithome.com.tw)

Allocate (www.elastic.co)

[Elasticsearch] 分散式特性 & 分散式搜尋的機制 (godleon.github.io)

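The following are some of the actions an index lifecycle policy can run

allocate: changes which nodes may host the index's shards and can change the number of replicas

force merge: merges segments to free the space used by deleted documents. A deleted document is not removed immediately but only marked as deleted; it is physically removed when its segment is merged

shrink: reduces the number of primary shards

rollover: automatically creates a new index once the current one reaches a certain size, a certain number of documents, or a certain age (max_age)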
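
The rollover rule is an "any condition met" check, which can be sketched as follows. The function and parameter names are made up for illustration; in a real policy the thresholds are given as settings such as max_size, max_docs, and max_age.

```python
from typing import Optional


def should_rollover(size_gb: float, docs: int, age_days: float, *,
                    max_size_gb: Optional[float] = None,
                    max_docs: Optional[int] = None,
                    max_age_days: Optional[float] = None) -> bool:
    """Roll over as soon as any configured threshold is reached."""
    checks = []
    if max_size_gb is not None:
        checks.append(size_gb >= max_size_gb)
    if max_docs is not None:
        checks.append(docs >= max_docs)
    if max_age_days is not None:
        checks.append(age_days >= max_age_days)
    return any(checks)


# Small index, but older than the age threshold: rollover is due.
print(should_rollover(3, 1000, 8, max_size_gb=10, max_age_days=7))
```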

References:

Force merge (www.elastic.co)

[Elasticsearch] 分散式特性 & 分散式搜尋的機制 (godleon.github.io)

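Creating an index lifecycle policy

Description: roll over whenever the index reaches 10 GB, and delete indices once they are 10 days past rollover

curl -X PUT https://localhost:9200/_ilm/policy/my-lifecycle-policy -u elastic:<password> --insecure -H 'Content-Type: application/json' -d '{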
 "policy": {
  "phases": {
   "hot": {
    "actions": {
     "rollover": {
      "max_size": "10g"
     }
    }
   },
   "delete": {
    "min_age": "10d",
    "actions": {
     "delete": {}
    }
   }
  }
 }
}'
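
If the policy is generated by a script, the same body can be built from two parameters. This is just a sketch; `make_policy` is a hypothetical helper that reproduces the structure of the request above.

```python
import json


def make_policy(max_size: str, delete_after: str) -> dict:
    """Build an ILM policy body with a hot rollover and a delete phase."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {"rollover": {"max_size": max_size}}
                },
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


body = json.dumps(make_policy("10gb", "10d"), indent=1)
print(body)  # send as the body of PUT _ilm/policy/my-lifecycle-policy
```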

Retrieve the policy you just created

curl -X GET https://localhost:9200/_ilm/policy/my-lifecycle-policy -u elastic:<password> --insecure

Response:

{
 "my-lifecycle-policy": {
  "version": 1,
  "modified_date": "2024-10-19T01:44:19.775Z",
  "policy": {
   "phases": {
    "hot": {
     "min_age": "0ms",
     "actions": {
      "rollover": {
       "max_size": "10gb"
      }
     }
    },
    "delete": {
     "min_age": "10d",
     "actions": {
      "delete": {
       "delete_searchable_snapshot": true
      }
     }
    }
   }
  },
  "in_use_by": {
   "indices": [],
   "data_streams": [],
   "composable_templates": []
  }
 }
}

Creating component templates

Here we create a mapping with the @timestamp and message fields, because a data stream uses @timestamp as its time field by default

curl -X PUT https://localhost:9200/_component_template/my-mappings -u elastic:<password> --insecure -H 'Content-Type: application/json' -d '{
 "template": {
  "mappings": {
   "properties": {
    "@timestamp": {
     "type": "date",
     "format": "date_optional_time||epoch_millis"
    },
    "message": {
     "type": "wildcard"
    }
   }
  }
 },
 "_meta": {
  "description": "@timestamp mapping fields"
 }
}'

Create a component template that applies the index lifecycle policy (set index.lifecycle.name to the lifecycle policy created above)

curl -X PUT https://localhost:9200/_component_template/my-settings -u elastic:<password> --insecure -H 'Content-Type: application/json' -d '{
 "template": {
  "settings": {
   "index.lifecycle.name": "my-lifecycle-policy"
  }
 },
 "_meta": {
  "description": "set lifecycle for template"
 }
}'

Creating an index template from the component templates

Note: set priority above 200 to avoid conflicts with the built-in templates

curl -X PUT https://localhost:9200/_index_template/my-index-template -u elastic:<password> --insecure -H 'Content-Type: application/json' -d '{
 "index_patterns": ["my-data-stream*"],
 "data_stream": {},
 "composed_of": ["my-settings", "my-mappings"],
 "priority": 500,
 "_meta": {
  "description": "my-data-stream index template"
 }
}'
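
The role of priority can be illustrated with a small matcher: among the templates whose patterns match a name, the one with the highest priority is used. This is a simplified sketch. The `builtin-like` and `catch-all` templates and their priorities are invented for the example, and Python's fnmatch is only an approximation of Elasticsearch wildcard matching.

```python
from fnmatch import fnmatchcase
from typing import Optional


def pick_template(name: str, templates: list[dict]) -> Optional[str]:
    """Return the name of the highest-priority template matching `name`."""
    matches = [
        t for t in templates
        if any(fnmatchcase(name, pattern) for pattern in t["index_patterns"])
    ]
    if not matches:
        return None
    return max(matches, key=lambda t: t["priority"])["name"]


templates = [
    {"name": "builtin-like", "index_patterns": ["my-*"], "priority": 100},
    {"name": "catch-all", "index_patterns": ["*"], "priority": 0},
    {"name": "my-index-template", "index_patterns": ["my-data-stream*"], "priority": 500},
]
print(pick_template("my-data-stream", templates))
```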

Adding a document to the data stream

curl -X POST https://localhost:9200/my-data-stream/_doc -u elastic:<password> --insecure -H 'Content-Type: application/json' -d '
{
  "@timestamp": "2099-05-06T16:21:15.000Z",
  "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736"
}
'

The index that receives the document starts with .ds-<data-stream>, which shows that the document was written to the data stream

Response:

{
 "_index": ".ds-my-data-stream-2024.10.19-000001",
 "_id": "V_SHopIBHOjfSNwhtekY",
 "_version": 1,
 "result": "created",
 "_shards": {
  "total": 2,
  "successful": 1,
  "failed": 0
 },
 "_seq_no": 0,
 "_primary_term": 1
}

References:

Set up a data stream

Written by Gary Ng
