Elasticsearch: customizing your own mapping

Based on the data we provide, Elasticsearch makes its own estimate and automatically generates an appropriate mapping. This is very useful in many cases: it saves us a lot of manual work, and in most situations the mapping Elasticsearch generates works well. Generally speaking, if we want to define our own mapping, the following steps are a sensible and recommended approach:

  1. Ingest some of your own data into Elasticsearch
  2. Retrieve the mapping generated for that data and adjust it on that basis to arrive at a mapping suitable for your data

The following examples show a few places where we may need to make adjustments.

 

Adjusting data types

We first enter the following command in Kibana:

PUT myindex/_doc/1
{
  "status_code": 404
}

The command above creates an index called myindex, containing a status_code field like the status codes we usually see in HTTP responses. In our document the value is 404. Let's display the mapping of myindex with the following command. Remember that this mapping was generated the first time the data was ingested, based on a guess from the data we entered:

GET myindex/_mapping

The above command returns:

{
  "myindex" : {
    "mappings" : {
      "properties" : {
        "status_code" : {
          "type" : "long"
        }
      }
    }
  }
}

From the above result, we can see that the type is long, meaning the value is stored as 64-bit data. We know that HTTP status codes are usually less than 1000, so 64-bit storage is clearly wasteful. If we have little data this may not matter much, but with vast amounts of data the wasted storage can be significant. To address this, we can change the data type to short, which is 16 bits:

PUT myindex1
{
  "mappings": {
    "properties": {
      "status_code": {
        "type": "short"
      }
    }
  }
}

In the mapping of myindex1 created above, status_code is defined as the short type.
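
To verify that the new mapping takes effect, we can index the same kind of document into myindex1 and look at its mapping again. This is just a quick check using the example value from above:

PUT myindex1/_doc/1
{
  "status_code": 404
}

GET myindex1/_mapping

This time the mapping returned for status_code is short rather than long, because an explicit mapping takes precedence over what dynamic mapping would have guessed.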

If you've read my other article "Getting started with Elasticsearch (2)", you'll recall that we also adjusted a geo_point data type there. For example:

PUT twitter/_doc/1
{
  "user": "zhangsan",
  "location": {
    "lat": "39.970718",
    "lon": "116.325747"
  }
}

Let's look at the data types Elasticsearch generated for us:

GET twitter/_mapping

The returned mapping is:
{
  "twitter" : {
    "mappings" : {
      "properties" : {
        "location" : {
          "properties" : {
            "lat" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "lon" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "user" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

Obviously, in the above we can see that lat and lon are both guessed to be of type text, each as a multi-field, which is clearly not what we need. In addition, we don't really want the user field to be of type text either. We can therefore make the following adjustments:

PUT twitter1
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_point"
      },
      "user": {
        "type": "keyword"
      }
    }
  }
}

In twitter1 above, we can see that location has been adjusted to the geo_point data type, and user has become the keyword type we want.
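
Now that location is of the geo_point type, we can run geo queries against it. Below is a minimal sketch: we index one tweet into twitter1 and then search for documents within 10 km of a point. The distance and coordinates here are just example values:

PUT twitter1/_doc/1
{
  "user": "zhangsan",
  "location": {
    "lat": 39.970718,
    "lon": 116.325747
  }
}

GET twitter1/_search
{
  "query": {
    "geo_distance": {
      "distance": "10km",
      "location": {
        "lat": 39.970718,
        "lon": 116.325747
      }
    }
  }
}

A geo_distance query like this would not be possible if lat and lon had remained two separate text fields.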

Dynamic mapping is not always optimal

Take a floating-point number, for example:

PUT my_index/_doc/1
{
  "price": 1.99
}

We can get its mapping:

GET my_index/_mapping

The returned mapping is:
{
  "my_index" : {
    "mappings" : {
      "properties" : {
        "price" : {
          "type" : "float"
        }
      }
    }
  }
}

We can see from the above that the price data type is float. In most cases this is fine. In practical applications, however, we can change this float data type to the scaled_float data type. scaled_float is backed by a long data type, and long values can be compressed more efficiently by Lucene, thereby saving storage space. When using the scaled_float data type, you must configure its precision with scaling_factor:

PUT my_index1
{
  "mappings": {
    "properties": {
      "price": {
        "type": "scaled_float",
        "scaling_factor": 100
      }
    }
  }
}

In the above, we define the price data type as scaled_float and set scaling_factor to 100. This means that a value such as 1.99 is simply multiplied by 100 to become 199, turning it into an integer for storage.

After this change, we can try to index a document into my_index1:

PUT my_index1/_doc/1
{
  "price": 1.99
}

We can query the document with the following request:

GET my_index1/_doc/1

The results returned are:

{
  "_index" : "my_index1",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 1,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "price" : 1.99
  }
}

We can see from the above that although we changed the data type, the data returned is still correct; _source always holds the original document we indexed.
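
It is also worth noting that searches against a scaled_float field still use the original decimal values; the multiplication by the scaling factor happens internally. As a small sketch, a range query on price (the bounds are just example values) works as expected:

GET my_index1/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 1,
        "lte": 2
      }
    }
  }
}

Keep in mind that with a scaling_factor of 100, values are rounded to two decimal places when they are indexed, so the factor should match the precision your data actually needs.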
