[Elasticsearch] Getting Started

I. Elasticsearch Overview

II. Installation and Running

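The system needs JRE 1.6 or later installed (Java 7 or later for Elasticsearch 1.2 and newer, which includes the 1.4.0 release used below).
Download the latest version of ES from http://www.elasticsearch.org/overview/elkdownloads/. ES runs on both Windows and Linux.
Unzip the archive and run elasticsearch.bat in the bin directory to start ES (on Linux, run bin/elasticsearch instead).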

III. Running Elasticsearch in Pseudo-Distributed Mode

1. Make two copies of elasticsearch-1.4.0 (the whole directory) and rename them elasticsearch-1.4.0_2 and elasticsearch-1.4.0_3.

2. Add two settings to elasticsearch-1.4.0/config/elasticsearch.yml:

cluster.name: "tom_cluster"
node.name: "tom_node_1"

3. Add four settings to elasticsearch-1.4.0_2/config/elasticsearch.yml:

cluster.name: tom_cluster
node.name: "tom_node_2"
transport.tcp.port: 9302
http.port: 9202

4. Add four settings to elasticsearch-1.4.0_3/config/elasticsearch.yml:

cluster.name: tom_cluster
node.name: "tom_node_3"
transport.tcp.port: 9303
http.port: 9203

5. Run bin/elasticsearch.bat in each of the three directories to start the cluster.
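The three configurations above differ only in the node name and the two ports, so they can be generated instead of edited by hand. The following is a minimal Python sketch under that assumption; the function name render_node_config is hypothetical, and the port scheme (930N / 920N for node N, defaults for node 1) simply mirrors steps 2 to 4.

```python
def render_node_config(node_number, cluster_name="tom_cluster"):
    """Return the elasticsearch.yml lines for one node of the local cluster.

    Node 1 keeps the default ports; nodes 2 and 3 get 930N / 920N,
    mirroring the settings listed in steps 2-4.
    """
    lines = [
        'cluster.name: "%s"' % cluster_name,
        'node.name: "tom_node_%d"' % node_number,
    ]
    if node_number > 1:
        lines.append("transport.tcp.port: %d" % (9300 + node_number))
        lines.append("http.port: %d" % (9200 + node_number))
    return "\n".join(lines)


# Print the settings to paste into each node's elasticsearch.yml.
for n in (1, 2, 3):
    print("# node %d" % n)
    print(render_node_config(n))
```

Once all three nodes are running, http://localhost:9200/_cluster/health should report "number_of_nodes": 3 if they joined the same cluster.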

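IV. Installing Elasticsearch Plugins

Elasticsearch provides an extensible plugin mechanism; http://www.searchtech.pro/elasticsearch-plugins provides a detailed list.

Installing the Head plugin

1. In the bin directory, run plugin.bat -install mobz/elasticsearch-head

2. Start elasticsearch and visit http://localhost:9200/_plugin/head

Installing the bigdesk plugin

Installing the MongoDB River plugin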

V. Understanding Two Elasticsearch Terms: Shards and Replicas

Shards & Replicas

An index can potentially store a large amount of data that can exceed the hardware limits of a single node. For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node or may be too slow to serve search requests from a single node alone.

To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully-functional and independent "index" that can be hosted on any node in the cluster.

Sharding is important for two primary reasons:

It allows you to horizontally split/scale your content volume
It allows you to distribute and parallelize operations across shards (potentially on multiple nodes), thus increasing performance/throughput

The mechanics of how a shard is distributed and also how its documents are aggregated back into search requests are completely managed by Elasticsearch and are transparent to you as the user.

In a network/cloud environment where failures can be expected anytime, it is very useful and highly recommended to have a failover mechanism in case a shard/node somehow goes offline or disappears for whatever reason. To this end, Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short.

Replication is important for two primary reasons:

It provides high availability in case a shard/node fails. For this reason, it is important to note that a replica shard is never allocated on the same node as the original/primary shard that it was copied from.
It allows you to scale out your search volume/throughput since searches can be executed on all replicas in parallel.

To summarize, each index can be split into multiple shards. An index can also be replicated zero (meaning no replicas) or more times. Once replicated, each index will have primary shards (the original shards that were replicated from) and replica shards (the copies of the primary shards). The number of shards and replicas can be defined per index at the time the index is created. After the index is created, you may change the number of replicas dynamically anytime but you cannot change the number of shards after-the-fact.

By default, each index in Elasticsearch is allocated 5 primary shards and 1 replica which means that if you have at least two nodes in your cluster, your index will have 5 primary shards and another 5 replica shards (1 complete replica) for a total of 10 shards per index.
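The arithmetic in the paragraph above can be checked with a short Python sketch. The helper names shard_counts and index_settings are hypothetical; number_of_shards and number_of_replicas are the index settings that carry these two values when an index is created.

```python
import json


def shard_counts(primaries=5, replicas=1):
    """Return (primary, replica, total) shard counts for one index."""
    replica_shards = primaries * replicas
    return primaries, replica_shards, primaries + replica_shards


def index_settings(primaries=5, replicas=1):
    """Build the JSON body that sets both values at index creation time."""
    return json.dumps(
        {"settings": {"number_of_shards": primaries,
                      "number_of_replicas": replicas}},
        sort_keys=True,
    )


print(shard_counts())       # defaults described above: (5, 5, 10)
print(shard_counts(5, 2))   # two copies of each primary: (5, 10, 15)
print(index_settings(3, 2))
```

The replica count multiplies the number of primaries, which is why "1 replica" in the default case means 5 additional shards rather than 1.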

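Does this mean that a node has 5 primary shards by default, and that one node is one replica, or that replica = 5 shards? Going by the passage above, the 5 primary shards belong to the index, not to a node, and 1 replica means one copy of every primary shard, that is, 5 replica shards, each allocated on a different node from its primary.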

After starting elasticsearch, add a document through a REST client:

url: http://localhost:9200/index1/col1/1
method: POST
body:

{"a":1, "b":2}
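The same request can be built without a REST client. Below is a minimal sketch using only the Python standard library; the helper name build_index_request is hypothetical, and the URL, index name, and type name are the ones used above. The request is only constructed here, and sending it assumes a node is listening on localhost:9200.

```python
import json
import urllib.request


def build_index_request(doc_id, doc, base_url="http://localhost:9200",
                        index="index1", doc_type="col1"):
    """Build (but do not send) the POST request that indexes one document."""
    url = "%s/%s/%s/%s" % (base_url, index, doc_type, doc_id)
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request(1, {"a": 1, "b": 2})
print(request.get_method(), request.full_url)
# To actually send it against a running node:
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```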

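After it completes, visit http://localhost:9200/_plugin/head to check the current node status. The node shows five green shards, numbered 0-4, and only one of them holds data (the document that was just indexed).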

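Repeat the operation above until 10 documents have been indexed, that is, also request http://localhost:9200/index1/col1/2 through http://localhost:9200/index1/col1/10. Afterwards, visit http://localhost:9200/_plugin/head again to check the node status. Now all five shards hold data: three shards hold 2 documents each, one shard holds 1, and one shard holds 3, for a total of 10. So within a node, data is stored at the granularity of a shard; in other words, the documents of one index are spread across different shards.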
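The uneven spread follows from how a document is assigned to a shard: Elasticsearch computes shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document id. The Python sketch below illustrates only the idea. The helper pick_shard is hypothetical, and zlib.crc32 is a stand-in hash, not the function Elasticsearch uses, so the per-shard counts it prints will not match what Head shows.

```python
import zlib
from collections import Counter


def pick_shard(doc_id, number_of_primary_shards=5):
    """Map a document id to a shard number in 0..number_of_primary_shards-1.

    zlib.crc32 is only a stand-in hash for illustration; it is not the
    hash function Elasticsearch uses, so the counts will differ from Head.
    """
    return zlib.crc32(str(doc_id).encode("utf-8")) % number_of_primary_shards


# Route document ids 1..10 and count how many land on each shard.
docs_per_shard = Counter(pick_shard(doc_id) for doc_id in range(1, 11))
for shard in range(5):
    print("shard %d: %d document(s)" % (shard, docs_per_shard[shard]))
```

Because the shard number depends on the number of primary shards, changing that number would send existing ids to different shards, which is consistent with the rule that the shard count cannot be changed after the index is created.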

For more on the concepts of shards and replicas, see: http://blog.sematext.com/2012/05/29/elasticsearch-shard-placement-control/