Elasticsearch 先輩との戯れ日記(セットアップなどなど)

この記事何？

お仕事で Elasticsearch 先輩を使うことになったので、そのお戯れの記録

今回調べたものなど

まず Elasticsearch でどんなことができるかを調べるために、公式のチュートリアルとここ載ってる記事を読み漁った。

環境準備

公式さんから最新版(5.4)を Download してきた。 Kibana も合わせてローカルで準備しておく(戯れ用)

Download Elasticsearch Free • Get Started Now | Elastic Download Kibana Free • Get Started Now | Elastic

日本語全文検索用のプラグインである analysis-kuromoji を入れてみる。

# /Users/pokotyamu/Downloads/elasticsearch-5.4.2
$ bin/elasticsearch-plugin install analysis-kuromoji
-> Downloading analysis-kuromoji from elastic
[=================================================] 100%
-> Installed analysis-kuromoji

curl で叩いて、 plugin が入っていることを確認。

$ curl -X GET 'http://localhost:9200/_nodes/plugins?pretty'
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "elasticsearch",
  "nodes" : {
    "tfQ1ZsOzQ9CYZtA0jXw2ww" : {
      "name" : "tfQ1ZsO",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : "127.0.0.1",
      "version" : "5.4.2",
      "build_hash" : "929b078",
      "roles" : [
        "master",
        "data",
        "ingest"
      ],
      "plugins" : [
        {
          "name" : "analysis-kuromoji",
          "version" : "5.4.2",
          "description" : "The Japanese (kuromoji) Analysis plugin integrates Lucene kuromoji analysis module into elasticsearch.",
          "classname" : "org.elasticsearch.plugin.analysis.kuromoji.AnalysisKuromojiPlugin",
          "has_native_controller" : false
        }
      ],
     ...
}

基本コマンド

起動は以下で行う。終了は　Ctrl-C 。 -d オプションでバックグラウンド起動可能。

# /Users/pokotyamu/Downloads/elasticsearch-5.4.2
$ bin/elasticsearch

ポートはデフォルトで 9200番が使われるみたい。 crul で叩いてみて、次のように出たらOK

$ curl localhost:9200
{
  "name" : "tfQ1ZsO",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "Y2-U0XlwSxWX771QXbIN0w",
  "version" : {
    "number" : "5.4.2",
    "build_hash" : "929b078",
    "build_date" : "2017-06-15T02:29:28.122Z",
    "build_snapshot" : false,
    "lucene_version" : "6.5.1"
  },
  "tagline" : "You Know, for Search"
}

データ投入してみる

マッピング定義

MySQL	Elasticsearch
database	index
schema	mapping
table	type
record	document

だいたいこんな感じの対応をしている。マッピングの定義はこんな感じ↓

{
    "mappings": {
        "sample": {
            "properties": {
                "name": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "coord": {
                    "type": "geo_point"
                },
                "description": {
                    "type": "string",
                    "analyzer": "kuromoji"
                }
            }
        }
    }
}

sample の部分が type の名前になっている。これを landmark という index に突っ込むためには PUT で書き換えれる。

# curl -X PUT http://localhost:9200/<Index Name>
$ curl -X PUT http://localhost:9200/landmark --data-binary @mapping.json
$ curl -s -X GET http://localhost:9200/landmark | jq .
{
  "landmark": {
    "aliases": {},
    "mappings": {
      "sample": {
        "properties": {
          "coord": {
            "type": "geo_point"
          },
          "description": {
            "type": "text",
            "analyzer": "kuromoji"
          },
          "name": {
            "type": "keyword"
          }
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1498037230489",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "9tFY4-F5S-KTyVEAa40DCg",
        "version": {
          "created": "5040299"
        },
        "provided_name": "landmark"
      }
    }
  }
}

ちなみに、このマッピング定義をしないでも、登録はできるらしい。その時は Elasticsearch 側がよしなに値から型定義をしてくれるとか。凄い。

データ投入

今回は Bulk API でデータ投入してみる。 Bulk だと一回のリクエストで一気にデータを突っ込めるみたい。

{ "index" : {} }
{ "name": "スカイツリー", "coord": { "lat": "35.710063", "lon": "139.8107"}, "description": "東京スカイツリー（とうきょうスカイツリー、英: TOKYO SKYTREE）は東京都墨田区押上一丁目にある電波塔（送信所）である。"}
{ "index" : {} }
{ "name": "BLUE NOTE TOKYO", "coord": { "lat": "35.661198", "lon": "139.716207"}, "description": "N.Y.の「Blue Note」を本店に持つジャズクラブとして、南青山にオープンしたブルーノート東京。ジャズをはじめとする多様な音楽ジャンルのトップアーティストたちが、連夜渾身のプレイを繰り広げている。"}
{ "index" : {} }
{ "name": "東京タワー", "coord": { "lat": "35.65858", "lon": "139.745433"}, "description": "東京タワー（とうきょうタワー、英: Tokyo Tower）は、東京都港区芝公園にある総合電波塔とその愛称である。正式名称は日本電波塔（にっぽんでんぱとう）。"}

# curl -X POST http://localhost:9200/<Index Name>/<Type Name>/_bulk --data-binary @landmark.json
$ curl -X POST http://localhost:9200/landmark/sample/_bulk --data-binary @landmark.json
{
    "errors": false, 
    "items": [
        {
            "index": {
                "_id": "AVzJ-slvXdDNsyfvvuMV", 
                "_index": "landmark", 
                "_shards": {
                    "failed": 0, 
                    "successful": 1, 
                    "total": 2
                }, 
                "_type": "sample", 
                "_version": 1, 
                "created": true, 
                "result": "created", 
                "status": 201
            }
        }, 
        {
            "index": {
                "_id": "AVzJ-slwXdDNsyfvvuMW", 
                "_index": "landmark", 
                "_shards": {
                    "failed": 0, 
                    "successful": 1, 
                    "total": 2
                }, 
                "_type": "sample", 
                "_version": 1, 
                "created": true, 
                "result": "created", 
                "status": 201
            }
        }, 
        {
            "index": {
                "_id": "AVzJ-slwXdDNsyfvvuMX", 
                "_index": "landmark", 
                "_shards": {
                    "failed": 0, 
                    "successful": 1, 
                    "total": 2
                }, 
                "_type": "sample", 
                "_version": 1, 
                "created": true, 
                "result": "created", 
                "status": 201
            }
        }
    ], 
    "took": 286
}

検索

最後に検索。検索も json で query を作って実行する。メソッドは GET を使用。検索に使えるのは、調べ中。。。 AND　検索はもちろん、数値の比較とか範囲指定とかもできるみたい。

$ curl -s -X GET http://localhost:9200/landmark/sample/_search -d '{"query":{"match":{"description":"ジャズ"}}}' | jq .
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.928525,
    "hits": [
      {
        "_index": "landmark",
        "_type": "sample",
        "_id": "AVzJ-slwXdDNsyfvvuMW",
        "_score": 0.928525,
        "_source": {
          "name": "BLUE NOTE TOKYO",
          "coord": {
            "lat": "35.661198",
            "lon": "139.716207"
          },
          "description": "N.Y.の「Blue Note」を本店に持つジャズクラブとして、南青山にオープンしたブルーノート東京。ジャズをはじめとする多様な音楽ジャンルのトップアーティストたちが、連夜渾身のプレイを繰り広げている。"
        }
      }
    ]
  }
}

まとめ

ひとまず、ローカルで立ち上げる→データを突っ込む→検索するの流れは試せてた。次はどーやったら、がつっとデータ突っ込めるかを調べてみる。