File and Collection Management in SolrCloud

Reprinted from: http://eksliang.iteye.com/blog/2124078 (http://eksliang.iteye.com/)
1. Default port allocation when SolrCloud is started with embedded ZooKeeper

When Solr runs the embedded ZooKeeper service, it uses the Solr port + 1000 as the ZooKeeper client port by default. In addition, the client port + 1 is used as the quorum (peer) port and the client port + 2 as the leader-election port. So in the first example, where Solr runs on port 8983, the embedded ZooKeeper uses 9983 as the client port and 9984 and 9985 as the quorum and election ports.

clientPort=9983
server.1=192.168.238.133:9984 :9985
These are the ports as they appear in the corresponding ZooKeeper configuration.

For an example of starting SolrCloud with embedded ZooKeeper, see: http://wiki.apache.org/solr/SolrCloud
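As a rough illustration, the embedded-ZooKeeper start from example A on that wiki page looks like this (flags as documented there; paths assume the stock Jetty example distribution):

cd example
java -DzkRun -DnumShards=2 -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -jar start.jar

With Solr listening on 8983, the embedded ZooKeeper then serves clients on 9983 and uses 9984/9985 for quorum and leader election, matching the configuration above.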

 

2. Manage the cluster through the Collections API and the Core Admin API

1). Create a collection (method 1: automatic allocation)

http://192.168.66.128:8081/solr/admin/collections?action=CREATE&name=collection1&numShards=3&replicationFactor=2&maxShardsPerNode=2&collection.configName=myconf

http://192.168.66.128:8081/solr/admin/collections?action=CREATE&name=collection1&numShards=3&replicationFactor=2&maxShardsPerNode=2&collection.configName=myconf&createNodeSet=192.168.66.128:8083_solr,192.168.66.128:8081_solr,192.168.66.128:8082_solr

This creates a collection with 3 shards, each shard having one leader and one replica, i.e. the collection has 6 cores in total. (A curl version follows the parameter list below.)

 

Parameters:

name: the name of the collection to be created
numShards: the number of logical shards to create for the collection
replicationFactor: the number of copies of each shard. A replicationFactor of 3 means each logical shard will have 3 copies.

maxShardsPerNode: the maximum number of shards on each Solr node; defaults to 1 (new in 4.2)

Pay attention to three values: numShards, replicationFactor, and liveSolrNode (the number of currently live Solr nodes). A healthy SolrCloud cluster never places two replicas of the same shard on the same live node, so a CREATE fails whenever numShards * replicationFactor > liveSolrNode * maxShardsPerNode. For example, with maxShardsPerNode=1 and 5 live nodes, numShards=3 with replicationFactor=2 needs 6 cores but only 5 slots exist, so an error is reported. The calls above succeed because they satisfy numShards * replicationFactor <= liveSolrNode * maxShardsPerNode.

createNodeSet: if this parameter is omitted, cores are created on all live nodes; if it is provided, cores are created only on the specified Solr nodes

For example, when creating 3 shards with one replica each across 5 Tomcat instances: if this parameter is not provided, the cores are spread over all 5 live nodes; if createNodeSet=192.168.66.128:8083_solr,192.168.66.128:8081_solr,192.168.66.128:8082_solr is provided, the cores are placed only on the three listed nodes.

[Screenshots from the original post omitted.]

collection.configName: the name of the configuration (stored in ZooKeeper) to use for the new collection. If this parameter is omitted, the collection name is used as the configuration name.
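For instance, the first automatic-allocation CREATE above can be issued from a shell with curl (host and port are this article's example values; wt=json is optional and only makes the response easier to read):

curl 'http://192.168.66.128:8081/solr/admin/collections?action=CREATE&name=collection1&numShards=3&replicationFactor=2&maxShardsPerNode=2&collection.configName=myconf&wt=json'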

 

2). Create a collection (method 2: manual allocation via the Core Admin API). The following sequence of calls is recommended (3 shards, one replica of each shard per node), because the cores can be created one at a time, as many as you need:

http://192.168.66.128:8081/solr/admin/cores?action=CREATE&name=shard1_replica1&instanceDir=shard1_replica1&dataDir=data&collection=collection1&shard=shard1&collection.configName=myconf
http://192.168.66.128:8082/solr/admin/cores?action=CREATE&name=shard1_replica2&instanceDir=shard1_replica2&dataDir=data&collection=collection1&shard=shard1&collection.configName=myconf


http://192.168.66.128:8082/solr/admin/cores?action=CREATE&name=shard2_replica1&instanceDir=shard2_replica1&dataDir=data&collection=collection1&shard=shard2&collection.configName=myconf
http://192.168.66.128:8083/solr/admin/cores?action=CREATE&name=shard2_replica2&instanceDir=shard2_replica2&dataDir=data&collection=collection1&shard=shard2&collection.configName=myconf

http://192.168.66.128:8083/solr/admin/cores?action=CREATE&name=shard3_replica1&instanceDir=shard3_replica1&dataDir=data&collection=collection1&shard=shard3&collection.configName=myconf
http://192.168.66.128:8081/solr/admin/cores?action=CREATE&name=shard3_replica2&instanceDir=shard3_replica2&dataDir=data&collection=collection1&shard=shard3&collection.configName=myconf

Parameter meanings:

name : the name of the new core

Created cores are named according to the convention:

collectionName_shardName_replicaN

For example, a collection named pscp with 2 shards and 2 copies of each shard yields cores named as follows:

pscp_shard1_replica1

pscp_shard1_replica2

pscp_shard2_replica1

pscp_shard2_replica2

shard: the ID of the shard this core will attach to (any name works; if the shard does not exist yet, it is created the first time the ID is used)
collection.configName: the name of a configuration stored in ZooKeeper

instanceDir and dataDir: the core's instance directory and its data directory. [Figure from the original post omitted; see the layout sketch below.]

Naming convention: instanceDir has the same name as name; for dataDir the name data is recommended.
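In place of the missing figure, here is a rough sketch of what one of the cores created above looks like on disk, assuming dataDir=data as recommended (no local conf/ directory appears, because in cloud mode the configuration lives in ZooKeeper):

solr_home/
  shard1_replica1/        <- instanceDir, same name as the core
    data/                 <- dataDir
      index/              <- the Lucene index files
      tlog/               <- the transaction log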

 

Summary 1: two ways to add a replica to the cluster

http://192.168.66.128:8081/solr/admin/collections?action=ADDREPLICA&collection=collection1&shard=shard2&node=192.168.66.128:8085_solr
The call above adds a replica to shard2 of collection1, placing it on the node 192.168.66.128:8085_solr.
http://192.168.66.128:8083/solr/admin/cores?action=CREATE&name=shard3_replica1&instanceDir=shard3_replica1&dataDir=data&collection=collection1&shard=shard3&collection.configName=myconf
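Either form can be issued from a shell; for example, a curl version of the Collections API call above (the node value must name a live node already registered in ZooKeeper):

curl 'http://192.168.66.128:8081/solr/admin/collections?action=ADDREPLICA&collection=collection1&shard=shard2&node=192.168.66.128:8085_solr&wt=json'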

 

2). Delete a collection

http://localhost:8983/solr/admin/collections?action=DELETE&name=mycollection

Parameters:

name: the name of the collection to delete

 

3). Reload a collection. Each core in the collection reloads its configuration files.

http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection

Parameters:

name: the name of the collection to reload

 

4). Split a shard

http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=<collection_name>&shard=shardId

collection: the name of the collection

shard: the ID of the shard to split

This command cannot be used on clusters with custom hashing, because such clusters have no explicit hash ranges; it applies only to collections using plain or compositeId routing. It splits the shard with the given ID into two new shards, dividing the parent shard's hash range into two equal partitions and distributing the parent's documents according to the new ranges. The new shards are named by appending _0 and _1 to the parent shard's name: splitting shard=shard1 produces shard1_0 and shard1_1. Once the new shards are created they become active, while the parent (split) shard is deactivated so that no new requests reach it; this allows seamless splitting with no downtime. The parent shard's data is not deleted automatically; removing it via the API is left to the user's discretion. This feature was released in Solr 4.3; since some bugs were found in the 4.3 release, it is recommended to wait for 4.3.1 before using it.
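As a concrete sketch of the naming behavior just described (collection and shard names are this article's example values):

curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1&wt=json'

On success the collection gains two active shards, shard1_0 and shard1_1, each covering half of shard1's hash range; shard1 itself stops receiving requests, but its data remains on disk until you delete it.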

 

3. Upload configuration files to ZooKeeper for central management with the command-line tools

SolrCloud can be distributed because ZooKeeper is introduced to store the configuration files centrally, so SolrCloud's configuration must be uploaded to ZooKeeper. Here the upload is shown via the command line.

To use the command-line management tool, you first need its dependencies: the jar packages under /WEB-INF/lib inside solr.war.

Step 1: Create a new folder

On any machine that can reach the ZooKeeper cluster, create two folders; for example, mine are:

/usr/solrCloud/conf/files  /usr/solrCloud/conf/lib

files: holds the configuration files; lib: holds the jar packages

Step 2: Upload the jar and configuration files you need to use

Copy all the jars from the Solr distribution (everything under solr-4.8.0\example\solr-webapp\webapp\WEB-INF\lib\ and solr-4.8.0\example\lib\ext\) into the lib directory above.

Copy the Solr configuration files into the files directory above.

Step 3: Upload files to Zookeeper for unified management

java -classpath .:/usr/solrCloud/conf/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 192.168.27.18:2181,192.168.27.18:2182,192.168.27.18:2183 -confdir /usr/solrCloud/conf/files  -confname myconf

-cmd upconfig: uploads the configuration files

-confdir: the local directory containing the configuration files

-confname: the name to store the configuration under in ZooKeeper

Check whether the files have been uploaded to the ZooKeeper server:

sh zkCli.sh -server localhost:2181
ls /configs/myconf
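As an additional cross-check, you can pull the configuration back out of ZooKeeper with ZkCLI's downconfig command (a sketch; the target directory /tmp/myconf-check is an arbitrary choice):

java -classpath .:/usr/solrCloud/conf/lib/* org.apache.solr.cloud.ZkCLI -cmd downconfig -zkhost 192.168.27.18:2181,192.168.27.18:2182,192.168.27.18:2183 -confdir /tmp/myconf-check -confname myconf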

Step 4: Associate the configuration file uploaded to ZooKeeper with the collection

java -classpath .:/usr/solrCloud/conf/lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig -collection collection1 -confname myconf -zkhost 192.168.27.18:2181,192.168.27.18:2182,192.168.27.18:2183

-cmd linkconfig: "binds" a configuration to the specified collection

-collection: the name of the collection specified above

-confname: the name of the configuration in ZooKeeper

The command above means: the collection1 cores that get created will use the myconf configuration.

For example, issuing the following request creates a collection named collection1, which then uses the myconf configuration stored in ZooKeeper:

http://localhost:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=3&replicationFactor=1

Note also: if ZooKeeper manages only one configuration, a newly created core uses that configuration by default. If ZooKeeper holds several configurations and step 4 has not been performed, creating a core throws an exception and the creation fails!

For example execute:

http://192.168.66.128:8081/solr/admin/collections?action=CREATE&name=sdf&numShards=3&replicationFactor=1

This throws the error below, because there are two configurations in ZooKeeper but step 4 was never performed to associate one of them with the collection being created (name=sdf):

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">16563</int>
</lst>
<lst name="failure">
<str>
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'sdf_shard2_replica1': Unable to create core: sdf_shard2_replica1 Caused by: Could not find configName for collection sdf found:[conf1, myconf]
</str>
<str>
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'sdf_shard1_replica1': Unable to create core: sdf_shard1_replica1 Caused by: Could not find configName for collection sdf found:[conf1, myconf]
</str>
<str>
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'sdf_shard3_replica1': Unable to create core: sdf_shard3_replica1 Caused by: Could not find configName for collection sdf found:[conf1, myconf]
</str>
</lst>
</response>

 

Of course, step 4 can also be replaced with the following, which is more flexible and recommended (with it, step 4 can be omitted entirely):

http://192.168.66.128:8081/solr/admin/collections?action=CREATE&name=conf2&numShards=3&replicationFactor=1&collection.configName=myconf
collection.configName=myconf: specifies which configuration in ZooKeeper the newly created collection uses

 

With the document written up to here, let's look at how to modify and delete files that have been uploaded to ZooKeeper:

The common way to modify is to re-upload: a re-upload overwrites the existing files, which achieves the modification. A sketch of this workflow follows.
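A sketch of that workflow, using ZkCLI's downconfig to fetch the current files first (the directory and zkhost values follow this article's examples):

java -classpath .:/usr/solrCloud/conf/lib/* org.apache.solr.cloud.ZkCLI -cmd downconfig -zkhost 192.168.66.128:2181 -confdir /tmp/myconf -confname myconf
vi /tmp/myconf/schema.xml
java -classpath .:/usr/solrCloud/conf/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 192.168.66.128:2181 -confdir /tmp/myconf -confname myconf

After the re-upload, reload the collection (see the end of this section) so the running cores pick up the change.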

Files or directories in ZooKeeper can be deleted as follows:

[zk: 192.168.66.128:2181(CONNECTED) 7] delete /configs/conf1/schema.xml
[zk: 192.168.66.128:2181(CONNECTED) 10] ls /configs/conf1
[solrconfig.xml]
[zk: 192.168.66.128:2181(CONNECTED) 11]

 

After uploading a configuration to ZooKeeper, if you want running Solr instances to pick up the new files, simply have Solr reload the configuration by entering the following in a browser:

http://192.168.27.18:8081/solr/admin/collections?action=RELOAD&name=collection1

 

References:

Collections API (official documentation for managing collections across the cluster):

https://cwiki.apache.org/confluence/display/solr/Collections+API

Core Admin API (official wiki):

http://wiki.apache.org/solr/CoreAdmin

Deploying SolrCloud on Tomcat (official wiki):

http://wiki.apache.org/solr/SolrCloudTomcat

Deploying Solr on Tomcat (official wiki):

http://wiki.apache.org/solr/SolrTomcat

Blogs worth referring to:

http://blog.csdn.net/xyls12345/article/details/27504965

http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html#deploying-solrcloud

http://blog.csdn.net/woshiwanxin102213/article/details/18793271

http://blog.csdn.net/natureice/article/details/9109351

SolrCloud terminology explained:

http://www.solr.cc/blog/?p=99

solr.xml explained

http://www.abyssss.com/?p=415

 

 
