Introduction to Mapping
Mapping defines how documents and their fields are stored and indexed. For example, a mapping can define the following:
- Which fields should be treated as full-text search fields
- Which fields contain numbers, dates, and similar types
- The format of date values
- Custom rules that control the mapping of dynamically added fields
Mapping Type
Each index has a single mapping type that determines how documents will be indexed. A mapping type consists of the following two parts:
- Meta-fields: used to customize how a document's associated metadata is handled. Examples of meta-fields include the document's _index, _type, _id, and _source fields.
- Fields or properties: a mapping type contains a list of fields or properties related to the document.
Tokenizer Best Practices
Because the keyword and text types discussed below both involve word segmentation, a best practice for it is given here first: use ik_max_word as the analyzer when indexing and ik_smart as the analyzer when searching. The content is then segmented as finely as possible at index time, and the desired results are matched more precisely at search time.
For example, suppose I search for Xiaomi mobile phone. What I want to find is Xiaomi mobile phone products, not other products such as Xiaomi speakers or Xiaomi washing machines. In other words, the product information must contain the term Xiaomi mobile phone itself.
Later we will use "search_analyzer": "ik_smart" to meet this requirement.
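As a minimal sketch of this practice, the request below sets ik_max_word as the index-time analyzer and ik_smart as the search-time analyzer on one text field. The index name my_ik_index and the field name title are hypothetical, and the request assumes that the IK analysis plugin is installed and that the cluster accepts typeless (7.x-style) mappings.

```json
# Hypothetical index and field names; requires the IK analysis plugin.
PUT my_ik_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      }
    }
  }
}
```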
Field Type
- A simple data type such as text, keyword, double, boolean, long, date, or ip.
- A hierarchical JSON object (nested properties are supported).
- A special, less commonly used type such as geo_point, geo_shape, or completion.
Mapping the same field with several types can better meet search needs. For example, a string field can be mapped as text to support full-text search and, at the same time, as keyword for sorting and aggregation. An analyzer can also be configured separately for each field, for example "analyzer": "ik_max_word".
text type
The text type is used for full-text search, such as the subject of an email or a product description on Taobao or JD.com. A text field is segmented into words before it is indexed, and the resulting terms are stored in the index instead of the complete value. text fields are not suitable for sorting and aggregation. For structured fields whose values are meaningless after word segmentation, such as email addresses, host names, and product labels, the keyword type is recommended.
Common parameters include the following:
- analyzer: the analyzer used for word segmentation, both at index time and at search time (the search-time analyzer can be overridden by the search_analyzer parameter). By default it is the index's default analyzer, or the standard analyzer if none is configured.
- index: whether the field can be searched. The default is true.
- fields: multi-fields allow the same string value to be indexed in different ways at the same time, for example one field for sorting and aggregation and another, with a different analyzer, for full-text search. This is described in detail below.
- search_analyzer: the analyzer used in the search phase. The default is the analyzer setting.
- search_quote_analyzer: the analyzer used when searching for a phrase. The default is the search_analyzer setting.
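The parameters above could be combined as in the following sketch. It only illustrates the syntax: the index and field names are hypothetical, and the analyzers used (standard and simple) are built-in ones chosen for the example, not a recommendation. As far as I know, analyzer and search_analyzer must both be set on a field that sets search_quote_analyzer, so all three appear together.

```json
# Hypothetical index and field names; standard and simple are built-in analyzers.
PUT my_text_index
{
  "mappings": {
    "properties": {
      "subject": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "standard",
        "search_quote_analyzer": "simple"
      },
      "raw_headers": {
        "type": "text",
        "index": false
      }
    }
  }
}
```

Here subject is searchable with all three analyzers stated explicitly, while raw_headers is kept in _source but cannot be searched because index is false.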
keyword type
keyword fields are used to index structured content (such as email addresses, hostnames, status codes, zip codes, or labels). These values are not meaningful after being split, so the complete value should be indexed in es rather than the result of word segmentation.
keyword fields are commonly used for filtering (such as finding all published articles by their publication status in a blog), sorting, and aggregation. A keyword field can only be searched by its exact value, such as querying article details by article id. If you need full-text search on words within the field, use the text type instead.
PUT my_index { "mappings": { "properties": { "tags": { "type": "keyword" } } } }
- index: whether the field can be searched. The default is true.
- fields: multi-fields allow the same string value to be indexed in different ways at the same time, for example one field for sorting and aggregation and another, with a different analyzer, for full-text search. This is described in detail below.
- null_value: a value that is indexed in place of an explicit null. The default is null, which means the field is treated as missing.
- ignore_above: the length threshold for indexing. A value longer than this setting is not indexed. The default is 2147483647, which means a value of any length is accepted. The setting can be changed later by sending a new ignore_above through the PUT mapping API.
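A minimal sketch of null_value and ignore_above, assuming hypothetical index and field names (my_keyword_index, status, tags) and arbitrary example values. The second request shows how ignore_above might be raised on an existing field through the PUT mapping API.

```json
# Hypothetical index and field names; the values 20 and 100 are arbitrary.
PUT my_keyword_index
{
  "mappings": {
    "properties": {
      "status": { "type": "keyword", "null_value": "NULL" },
      "tags":   { "type": "keyword", "ignore_above": 20 }
    }
  }
}

# Raise ignore_above on the existing tags field through the PUT mapping API.
PUT my_keyword_index/_mapping
{
  "properties": {
    "tags": { "type": "keyword", "ignore_above": 100 }
  }
}
```

The two parameters are placed on separate fields so that the update request does not have to repeat null_value, which cannot be changed on an existing field.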
date type
The date type supports sorting, and the date format can be specified with the format parameter.
JSON has no date type, so a date in es can take one of the following forms:
- A formatted string, such as "2015-01-01" or "2015/01/01 12:10:30"
- A long number representing the number of milliseconds since the epoch, such as 1420070400001
- An integer number representing the number of seconds since the epoch
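The forms listed above could be accepted by one date field as sketched below. The index name my_date_index and the field name created are hypothetical, and the format list is an assumption written to match the example values; a field that stores seconds would use epoch_second instead of epoch_millis.

```json
# Hypothetical index and field names; the format list mirrors the example values.
PUT my_date_index
{
  "mappings": {
    "properties": {
      "created": {
        "type": "date",
        "format": "yyyy-MM-dd||yyyy/MM/dd HH:mm:ss||epoch_millis"
      }
    }
  }
}

PUT my_date_index/_doc/1
{ "created": "2015-01-01" }

PUT my_date_index/_doc/2
{ "created": "2015/01/01 12:10:30" }

PUT my_date_index/_doc/3
{ "created": 1420070400001 }
```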
object type
There is no need to set a field's type to object explicitly in the mapping, because object is the default type. JSON documents are inherently hierarchical, and a document can contain object fields nested inside one another. For example:
PUT my_index { "mappings": { "properties": { "region": { "type": "keyword" }, "manager": { "properties": { "age": { "type": "integer" }, "name": { "properties": { "first": { "type": "text" }, "last": { "type": "text" } } } } } } } }
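A document that fits this mapping, and a query against one of its inner fields, might look like the sketch below. The document values are hypothetical; the point is that inner fields are addressed by their full dotted path.

```json
# Hypothetical document values for the my_index mapping above.
PUT my_index/_doc/1
{
  "region": "US",
  "manager": {
    "age": 30,
    "name": { "first": "John", "last": "Smith" }
  }
}

# Inner fields are addressed by their full dotted path.
GET my_index/_search
{
  "query": {
    "match": { "manager.name.first": "John" }
  }
}
```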
nested type
nested is a special kind of object type that allows an array of objects to be indexed so that each object in the array can be queried independently.
es has no concept of inner objects. Instead, it flattens an object hierarchy into a simple list of field names and values. Consider a document in which user is an array of objects, each with two properties.
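A document of that shape could look like the sketch below. The index name, the group field, and the first and last property names are assumptions modeled on the usual reference example; only the values alice and white come from the text.

```json
# Hypothetical index and property names; user is mapped dynamically as a plain object.
PUT my_flat_index/_doc/1
{
  "group": "fans",
  "user": [
    { "first": "John",  "last": "Smith" },
    { "first": "Alice", "last": "White" }
  ]
}
```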
A document in that format is indexed in flattened form, so the association between the two property values of a single user, for example between alice and white, is lost.
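The sketch below illustrates the flattened form in comments and then maps user as nested so that each object keeps its own pairing. Index, field, and property names are hypothetical, as are the document values other than alice and white.

```json
# Flattened form of a user array under a plain object mapping (illustrative, not a request):
#   "user.first": ["john", "alice"]
#   "user.last":  ["smith", "white"]

# Hypothetical index: map user as nested so each array item is indexed separately.
PUT my_nested_index
{
  "mappings": {
    "properties": {
      "user": { "type": "nested" }
    }
  }
}

PUT my_nested_index/_doc/1
{
  "group": "fans",
  "user": [
    { "first": "John",  "last": "Smith" },
    { "first": "Alice", "last": "White" }
  ]
}

# Both conditions must hold inside the same user object.
GET my_nested_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" } },
            { "match": { "user.last": "White" } }
          ]
        }
      }
    }
  }
}
```

With a plain object mapping, a bool query for Alice and Smith could match this document because the values are pooled per field; the nested mapping is intended to prevent that.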
range type
The following range types are supported:
Type | Range |
---|---|
integer_range | Signed 32-bit integers, from -2^31 to 2^31-1 |
float_range | 32-bit single-precision floating-point numbers |
long_range | Signed 64-bit integers, from -2^63 to 2^63-1 |
double_range | 64-bit double-precision floating-point numbers |
date_range | Dates, as unsigned 64-bit integer milliseconds since the epoch |
ip_range | IPv4 or IPv6 addresses, or a mix of both |
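A minimal sketch of two of these types in use, with hypothetical index and field names and made-up values. A range field stores its bounds with gte and lte, and a term query on it is expected to match documents whose range contains the given value.

```json
# Hypothetical index, field names, and values.
PUT my_range_index
{
  "mappings": {
    "properties": {
      "expected_attendees": { "type": "integer_range" },
      "time_frame": {
        "type": "date_range",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}

PUT my_range_index/_doc/1
{
  "expected_attendees": { "gte": 10, "lte": 20 },
  "time_frame": { "gte": "2015-10-31 12:00:00", "lte": "2015-11-01" }
}

# Look for documents whose expected_attendees range contains 12.
GET my_range_index/_search
{
  "query": {
    "term": { "expected_attendees": { "value": 12 } }
  }
}
```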
Hands-on: using keyword and text types at the same time
Note: a term query does not segment the search keyword at query time; a keyword field is not segmented at index time.
As explained above, keyword gives an index without word segmentation and text gives an index built after word segmentation. We use the fields parameter to give the same field both the keyword and the text type at the same time.
First we create the index and specify a mapping in which title is mapped as both keyword and text.
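Such a mapping might look like the sketch below. It is a reconstruction under assumptions, not the original request: the index name idx_item and the sub-field name keyword are taken from the table and the title.keyword queries in this section, the remaining fields are left to dynamic mapping, and the IK analysis plugin is assumed to be installed.

```json
# Sketch only: fields other than title are left to dynamic mapping here.
PUT idx_item
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      }
    }
  }
}
```

With this layout, title is the segmented text field and title.keyword is the unsegmented copy of the same value.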
We have inserted the following data into es:
_index | _type | _id | _score | itemId | title | desc | num | Price |
---|---|---|---|---|---|---|---|---|
idx_item | _doc | rvsX-W4Bo-iJGWqbQ8dk | 1 | 1 | Supor cooking rice SL3200 | Make cooking easier and make life happier | 100 | 200 |
idx_item | _doc | sPsY-W4Bo-iJGWqbscfU | 1 | 3 | Mr. Mighty the Kitchen | You make porridge, I wash the pot | 100 | 30 |
idx_item | _doc | r_sX-W4Bo-iJGWqbhMew | 1 | 2 | Supor good porridge cooker model SL322 | You cook porridge, I cook porridge, let's make porridge easier together | 100 | 190 |
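Take title="苏泊尔煮饭SL3200" (the first row above, shown in translation as "Supor cooking rice SL3200"). For the text mapping, with the most fine-grained word segmentation setting "analyzer": "ik_max_word", the value is split into several terms and those terms are stored in the index.
For title.keyword="苏泊尔煮饭SL3200" there is no word segmentation, so the complete value is stored in the index as a single term.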
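One way to inspect what each mapping produces is the _analyze API, sketched below. It assumes an idx_item index whose title field is mapped as described in this section; the response lists the terms generated for the given text.

```json
# Terms produced by the text mapping of title.
GET idx_item/_analyze
{
  "field": "title",
  "text": "苏泊尔煮饭SL3200"
}

# Same text analyzed for the keyword sub-field, which is not segmented.
GET idx_item/_analyze
{
  "field": "title.keyword",
  "text": "苏泊尔煮饭SL3200"
}
```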
We first search title.keyword with a match query and find the first piece of data. A match query analyzes the search keyword before searching, and for title.keyword the result of that analysis still contains the complete value "苏泊尔煮饭SL3200", so the search succeeds. If we change the search keyword to 苏泊尔, 煮饭, and so on, no data is found.
We then switch to a term query, which does not segment the search keyword and matches the data in es exactly, so only the first piece of data is returned. Again, no data is found if we change the search keyword to 苏泊尔, 煮饭, and so on.
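The two searches against title.keyword could be written as in the sketch below, assuming the idx_item index and the title.keyword sub-field described in this section.

```json
# match on the keyword sub-field: the whole value must be supplied.
GET idx_item/_search
{
  "query": {
    "match": { "title.keyword": "苏泊尔煮饭SL3200" }
  }
}

# term on the keyword sub-field: exact match, no segmentation of the keyword.
GET idx_item/_search
{
  "query": {
    "term": { "title.keyword": "苏泊尔煮饭SL3200" }
  }
}
```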
Next we query title with match and find the first and third pieces of data, because the indexed terms of both contain the keyword 苏泊尔.
If we search title with a term query for 苏泊尔煮饭SL3200, no data is returned: the index holds no such term for title, and the search keyword is not segmented, so nothing can match. When we change the term keyword to 苏泊尔, we find the first and third items, because the segmented index entries of title for both items contain 苏泊尔.
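The searches against the segmented title field could be written as in the sketch below, again assuming the idx_item index described in this section.

```json
# match on the text field: the keyword is analyzed, then matched against indexed terms.
GET idx_item/_search
{
  "query": {
    "match": { "title": "苏泊尔" }
  }
}

# term on the text field: the keyword must equal one indexed term exactly.
GET idx_item/_search
{
  "query": {
    "term": { "title": "苏泊尔" }
  }
}
```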
Hands-on: formatting dates and sorting by time
We create an index idx_pro and give the mytimestamp and createTime fields two different date formats.
We then insert four sample documents.
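The two steps could look like the sketch below. The original formats and data are not shown in this section, so everything beyond the names idx_pro, mytimestamp, and createTime is an assumption: mytimestamp is given epoch_millis, createTime is given yyyy-MM-dd HH:mm:ss, and the four documents (including the name field) are made up.

```json
# Assumed formats: epoch milliseconds for mytimestamp, a pattern string for createTime.
PUT idx_pro
{
  "mappings": {
    "properties": {
      "mytimestamp": { "type": "date", "format": "epoch_millis" },
      "createTime":  { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
    }
  }
}

# Four hypothetical documents.
POST idx_pro/_bulk
{ "index": {} }
{ "name": "doc-a", "mytimestamp": 1420070400001, "createTime": "2015-01-01 08:00:00" }
{ "index": {} }
{ "name": "doc-b", "mytimestamp": 1420156800001, "createTime": "2015-01-02 08:00:00" }
{ "index": {} }
{ "name": "doc-c", "mytimestamp": 1420243200001, "createTime": "2015-01-03 08:00:00" }
{ "index": {} }
{ "name": "doc-d", "mytimestamp": 1420329600001, "createTime": "2015-01-04 08:00:00" }
```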
We can use the sort parameter to sort the results. sort also accepts an array, so several fields can be used for sorting at the same time by writing the sort value as a [] array with one entry per field.
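A multi-field sort might be written as in the sketch below, assuming the idx_pro index with the mytimestamp and createTime date fields. The sort orders are arbitrary choices for the example.

```json
# Sort by createTime descending, then by mytimestamp ascending.
GET idx_pro/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "createTime":  { "order": "desc" } },
    { "mytimestamp": { "order": "asc" } }
  ]
}
```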
We can also use a range query to search for data within a specified time range. range also supports other types, such as integer and long.
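A time-range search might look like the sketch below, assuming the idx_pro index and its createTime date field. The bounds are made-up values, and the format parameter is stated in the query so that the bounds are parsed with that pattern regardless of how the field itself is formatted.

```json
# Hypothetical bounds; format tells es how to parse gte and lte.
GET idx_pro/_search
{
  "query": {
    "range": {
      "createTime": {
        "gte": "2015-01-01 00:00:00",
        "lte": "2015-01-31 23:59:59",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}
```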