A complete guide to mapping in Elasticsearch

Introduction to Mapping

Mapping defines how a document and its fields are stored and indexed. For example, a mapping can define:

  • Which string fields are treated as full-text fields
  • Which fields contain numbers, dates, and so on
  • The format of date values
  • Custom rules that control the mapping of dynamically added fields
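As a rough illustration, the four kinds of settings above might look like this in a mapping body. It is written as a Python dict so it can be checked and printed as JSON for a PUT request. The field names (title, price, publish_date) are hypothetical, and mapping every new string field to keyword is just one possible dynamic rule.

```python
import json

# Hypothetical mapping body illustrating the four kinds of settings above.
mapping = {
    "mappings": {
        # Rule for dynamically added fields: map any new string field to keyword.
        "dynamic_templates": [
            {
                "strings_as_keywords": {
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"},
                }
            }
        ],
        "properties": {
            "title": {"type": "text"},      # full-text searchable
            "price": {"type": "double"},    # numeric
            "publish_date": {               # date with an explicit format
                "type": "date",
                "format": "yyyy-MM-dd HH:mm:ss||epoch_millis",
            },
        },
    }
}

if __name__ == "__main__":
    # Print the JSON body to send with PUT my_index (e.g. in Kibana Dev Tools).
    print(json.dumps(mapping, indent=2))
```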

Mapping Type

Each index has a single mapping type that determines how its documents are indexed. A mapping type consists of two parts:

  • Meta-fields
    Customize how a document's associated metadata is handled. Examples include the document's _index, _type, _id, and _source fields.

  • Fields or properties
    The list of fields or properties that belong to the document.
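To make the two parts concrete, here is a hypothetical mapping body, written as a Python dict, that configures two meta-fields, _routing and _source, next to ordinary properties. The field names and settings are illustrative only.

```python
# Hypothetical mapping: meta-field settings and ordinary properties side by side.
mapping = {
    "mappings": {
        # Meta-field settings (names start with "_"): they configure how
        # document metadata such as routing and _source is handled.
        "_routing": {"required": True},
        "_source": {"excludes": ["internal_note"]},
        # Properties: the fields of the document itself.
        "properties": {
            "title": {"type": "text"},
            "internal_note": {"type": "text"},
        },
    }
}

# Split the top-level keys into the two parts described above.
meta_settings = sorted(k for k in mapping["mappings"] if k.startswith("_"))
field_names = sorted(mapping["mappings"]["properties"])

if __name__ == "__main__":
    print("meta-fields:", meta_settings)
    print("properties:", field_names)
```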

Tokenizer Best Practices

The keyword and text types discussed below both involve word segmentation, so here is the recommended practice first: use ik_max_word as the analyzer at index time and ik_smart at search time. ik_max_word splits the content into as many terms as possible when indexing, while ik_smart produces coarser terms for the query text, so searches return the results you actually want more precisely.

For example, suppose I search for "Xiaomi mobile phone". I want Xiaomi phones, not other Xiaomi products such as Xiaomi speakers or Xiaomi washing machines. In other words, only products whose information contains "Xiaomi mobile phone" should match.

Later we will use "search_analyzer": "ik_smart" to meet this requirement.
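A minimal sketch of that setup, assuming the IK analysis plugin (elasticsearch-analysis-ik) is installed, since that plugin is what provides ik_max_word and ik_smart. The field name is hypothetical, and the bodies are Python dicts that serialize to the JSON you would send to Elasticsearch.

```python
import json

# Assumes the IK analysis plugin is installed; it provides ik_max_word and ik_smart.
# The field name "name" is hypothetical.
product_mapping = {
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "ik_max_word",     # finest-grained segmentation at index time
                "search_analyzer": "ik_smart",  # coarser segmentation of the query text
            }
        }
    }
}

# The query text "小米手机" ("Xiaomi mobile phone") is segmented with ik_smart,
# the field's search_analyzer, before it is matched against the indexed terms.
search_body = {"query": {"match": {"name": "小米手机"}}}

if __name__ == "__main__":
    print(json.dumps(product_mapping, indent=2))
    print(json.dumps(search_body, ensure_ascii=False))
```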

Field Type

  • Simple data types such as text, keyword, double, boolean, long, date, and ip.
  • Hierarchical JSON objects (attributes can be nested).
  • Less commonly used special types such as geo_point, geo_shape, and completion.

Mapping the same field as more than one type lets it serve different search needs. For example, a string field can be mapped as text to support full-text search, and at the same time as keyword for sorting and aggregation. The analyzer can also be configured per field, for example "analyzer": "ik_max_word".
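For example, a hypothetical brand field could be mapped as text with a keyword sub-field; queries then search the text field while sorting and aggregating on brand.keyword. The sketch assumes the IK plugin for the analyzer and uses Python dicts for the request bodies.

```python
import json

# Hypothetical field "brand": text for full-text search plus a keyword
# sub-field for sorting and aggregation.
mapping = {
    "mappings": {
        "properties": {
            "brand": {
                "type": "text",
                "analyzer": "ik_max_word",              # assumes the IK plugin
                "fields": {
                    "keyword": {"type": "keyword"}      # addressed as brand.keyword
                },
            }
        }
    }
}

# Full-text search on the analyzed field; sorting and aggregation on the
# unanalyzed keyword sub-field.
search_body = {
    "query": {"match": {"brand": "Supor"}},
    "sort": [{"brand.keyword": "asc"}],
    "aggs": {"brands": {"terms": {"field": "brand.keyword"}}},
}

if __name__ == "__main__":
    print(json.dumps(mapping, indent=2))
    print(json.dumps(search_body, indent=2))
```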

text type

The text type is used for full-text search, for example the subject of an email or a product description on Taobao or JD.com. Fields of this type are analyzed (segmented into terms) before being indexed, and the resulting terms are stored in the index rather than the complete value. text fields are not suitable for sorting and aggregation. For structured content that becomes meaningless once it is split, such as email addresses, hostnames, and product tags, use the keyword type instead.

Common parameters include the following:

  • analyzer: The analyzer used to segment the text, both at index time and at search time (search time can be overridden with search_analyzer). Defaults to the index's default analyzer, or the standard analyzer if none is set.
  • index: Whether the field is searchable. Defaults to true.
  • fields: Multi-fields let the same string value be indexed in several ways at once, for example as one field for sorting and aggregation and as another, analyzed field for full-text search. This is described in detail below.
  • search_analyzer: The analyzer used at search time. Defaults to the analyzer setting.
  • search_quote_analyzer: The analyzer used for phrase searches. Defaults to the search_analyzer setting.
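The defaulting rules in this list can be summarized in a small local model. The mapping uses built-in analyzers (standard, simple, whitespace) only so that the three settings have visibly different values. analyzer_for is a hypothetical helper, not an Elasticsearch API, and it assumes the index defines no default analyzer of its own.

```python
# Hypothetical mapping exercising the parameters above.
mapping = {
    "mappings": {
        "properties": {
            "body": {
                "type": "text",
                "analyzer": "standard",                 # index time
                "search_analyzer": "simple",            # normal queries
                "search_quote_analyzer": "whitespace",  # phrase queries
            },
            # Kept in _source but not searchable.
            "raw_html": {"type": "text", "index": False},
        }
    }
}

def analyzer_for(field_cfg, phase, phrase=False):
    """Model of which analyzer applies, following the defaults listed above."""
    index_analyzer = field_cfg.get("analyzer", "standard")
    if phase == "index":
        return index_analyzer
    # search_analyzer falls back to analyzer ...
    search = field_cfg.get("search_analyzer", index_analyzer)
    if phrase:
        # ... and search_quote_analyzer falls back to search_analyzer.
        return field_cfg.get("search_quote_analyzer", search)
    return search
```

With only analyzer set, all three phases use the same analyzer, which is the common case.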

keyword type

keyword fields are used to index structured content such as email addresses, hostnames, status codes, zip codes, or tags. Such values lose their meaning when split, so Elasticsearch indexes the complete value rather than the result of word segmentation.

They are typically used for filtering (for example, finding all published posts in a blog by their publication status), sorting, and aggregation. A keyword field can only be searched by exact value, for example looking up an article by its id. If you need full-text search over the words in a field, use the text type.

PUT my_index
{
  "mappings": {
    "properties": {
      "tags": { "type": "keyword" }
    }
  }
}
Common keyword parameters include:

  • index: Whether the field is searchable. Defaults to true.
  • fields: Multi-fields, which index the same value in several ways, as described for the text type.
  • null_value: A value that replaces an explicit null at index time so that it can be indexed and searched. Defaults to null, which means null values are not indexed.
  • ignore_above: Strings longer than this length are not indexed. Defaults to 2147483647, which in effect accepts values of any length. The setting can be changed on an existing field with the PUT mapping API.
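To illustrate null_value and ignore_above together, here is a hypothetical tags mapping plus a local model of which terms would end up in the index. indexed_terms is an illustrative helper, not part of Elasticsearch; values it skips would still be present in _source, they just are not searchable.

```python
# Hypothetical keyword mapping using both parameters.
mapping = {
    "mappings": {
        "properties": {
            "tags": {"type": "keyword", "ignore_above": 20, "null_value": "NULL"}
        }
    }
}

def indexed_terms(value, ignore_above=20, null_value="NULL"):
    """Model of the terms a keyword field would index for one field value.

    Explicit nulls are replaced by null_value; strings longer than
    ignore_above characters are skipped. For arrays, both rules apply
    to each element separately.
    """
    values = value if isinstance(value, list) else [value]
    terms = []
    for v in values:
        if v is None:
            v = null_value
        if v is None or len(v) > ignore_above:
            continue  # not indexed, so not searchable
        terms.append(v)
    return terms
```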

date type

Dates can be sorted, and the format parameter specifies which date formats the field accepts.

JSON has no date type, so a date in Elasticsearch can be one of the following:

  • A formatted string, such as "2015-01-01" or "2015/01/01 12:10:30" (the second one requires a matching format such as yyyy/MM/dd HH:mm:ss)
  • A long number of milliseconds since the epoch, such as 1420070400001

  • An integer number of seconds since the epoch (this requires the epoch_second format)

Internally, Elasticsearch converts dates to UTC and stores them as a long number of milliseconds since the epoch.
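All three input forms end up as the same kind of value. This sketch models that normalization locally, assuming every string is in UTC. The created field and its format list are hypothetical, and to_epoch_millis is an illustrative helper rather than Elasticsearch code.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical date mapping that accepts both string layouts plus epoch millis.
date_mapping = {
    "mappings": {
        "properties": {
            "created": {
                "type": "date",
                "format": "yyyy-MM-dd HH:mm:ss||yyyy/MM/dd HH:mm:ss||yyyy-MM-dd||epoch_millis",
            }
        }
    }
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# strptime equivalents of the string formats above, tried in order.
STRING_FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M:%S", "%Y-%m-%d"]

def to_epoch_millis(value, unit="ms"):
    """Normalize a date value to epoch milliseconds (local model, assumes UTC)."""
    if isinstance(value, int):
        # Numbers are already epoch-based: seconds or milliseconds.
        return value * 1000 if unit == "s" else value
    for fmt in STRING_FORMATS:
        try:
            dt = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # try the next format, like the || list in the mapping
        return (dt - EPOCH) // timedelta(milliseconds=1)
    raise ValueError(f"unsupported date: {value!r}")
```

For example, "2015-01-01" and the seconds value 1420070400 both normalize to 1420070400000.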

object type

In a mapping you do not need to declare a field as object explicitly, because object is the default type for a field that has sub-fields.

JSON is inherently hierarchical: a document can contain inner objects, which can in turn contain inner objects of their own. For example:

 
PUT my_index
{
  "mappings": {
    "properties": {
      "region": { "type": "keyword" },
      "manager": {
        "properties": {
          "age": { "type": "integer" },
          "name": {
            "properties": {
              "first": { "type": "text" },
              "last": { "type": "text" }
            }
          }
        }
      }
    }
  }
}
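Internally, an object hierarchy like this is indexed as a flat list of dotted field names and values. The sketch below models that flattening for a sample document that fits the mapping above. The values are hypothetical and flatten is an illustrative helper.

```python
# Sample document matching the object mapping above (values are hypothetical).
doc = {
    "region": "US",
    "manager": {"age": 30, "name": {"first": "John", "last": "Smith"}},
}

def flatten(obj, prefix=""):
    """Turn nested objects into dotted field paths, as object fields are indexed."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))  # recurse into inner objects
        else:
            flat[path] = value
    return flat

if __name__ == "__main__":
    for field, value in flatten(doc).items():
        print(field, "=", value)
```

Queries address these inner fields by the same dotted paths, for example manager.name.first.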

nested type

The nested type is a specialized version of the object type that allows an array of objects to be indexed so that each object in the array can be queried independently of the others.

Lucene has no concept of inner objects, so with a plain object field Elasticsearch flattens an array of objects into simple lists of field names and values. Consider a document with the following structure:

A document like this is indexed in flattened form, so the values that belong to one user no longer stay together: the association between alice and white is lost.
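The Elasticsearch reference illustrates this with a group document whose user array contains John Smith and Alice White. The sketch below models how that array is flattened without nested, why a query for Alice AND Smith then matches although no such user exists, and the nested mapping that prevents it. The helpers are hypothetical local models, and values are lowercased to mimic the standard analyzer.

```python
doc = {
    "group": "fans",
    "user": [
        {"first": "John", "last": "Smith"},
        {"first": "Alice", "last": "White"},
    ],
}

# Mapping user as nested keeps each array element as its own hidden document.
nested_mapping = {"mappings": {"properties": {"user": {"type": "nested"}}}}

def flatten_object_array(doc, field):
    """Model of how a plain object array is indexed: one value list per sub-field."""
    flat = {}
    for item in doc[field]:
        for key, value in item.items():
            flat.setdefault(f"{field}.{key}", []).append(value.lower())
    return {key: sorted(values) for key, values in flat.items()}

def object_match(doc, first, last):
    """first AND last checked against the flattened lists: pairing is lost."""
    flat = flatten_object_array(doc, "user")
    return first.lower() in flat["user.first"] and last.lower() in flat["user.last"]

def nested_match(doc, first, last):
    """With nested, both conditions must hold inside the same array element."""
    return any(
        u["first"].lower() == first.lower() and u["last"].lower() == last.lower()
        for u in doc["user"]
    )
```

object_match(doc, "Alice", "Smith") is true even though no user is called Alice Smith, while nested_match correctly rejects it.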

range type

The following range types are supported:

Type           Range
integer_range  -2^31 to 2^31-1
float_range    32-bit single-precision floating-point numbers
long_range     -2^63 to 2^63-1
double_range   64-bit double-precision floating-point numbers
date_range     Dates, as unsigned 64-bit integer milliseconds since the epoch
ip_range       IPv4 or IPv6 addresses, or a mix of both
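As a sketch, a mapping with two range fields, a document, and a term query that matches when the stored range contains the value might look like this (adapted from the example in the Elasticsearch reference). range_contains is a hypothetical local model of the gt/gte/lt/lte semantics.

```python
range_mapping = {
    "mappings": {
        "properties": {
            "expected_attendees": {"type": "integer_range"},
            "time_frame": {
                "type": "date_range",
                "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis",
            },
        }
    }
}

# A range value is an object with gt/gte and lt/lte bounds.
doc = {
    "expected_attendees": {"gte": 10, "lt": 20},
    "time_frame": {"gte": "2015-10-31 12:00:00", "lte": "2015-11-01"},
}

# A term query on a range field matches documents whose range contains the value.
query = {"query": {"term": {"expected_attendees": {"value": 12}}}}

INT_MIN, INT_MAX = -(2**31), 2**31 - 1  # integer_range limits from the table

def range_contains(bounds, x):
    """Local model of range containment for numeric bounds."""
    if "gte" in bounds and x < bounds["gte"]:
        return False
    if "gt" in bounds and x <= bounds["gt"]:
        return False
    if "lte" in bounds and x > bounds["lte"]:
        return False
    if "lt" in bounds and x >= bounds["lt"]:
        return False
    return True
```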
 

Hands-on: using the keyword and text types on the same field

Note: a term query does not segment the query text; a keyword field does not segment the value at index time.

As explained above, keyword indexes a value without word segmentation, while text indexes the segmented terms. Using the fields parameter, we can give one field both a keyword and a text mapping at the same time.

First, we create the index and specify a mapping in which title is both text and keyword.
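A minimal sketch of such a mapping, written as a Python dict. It assumes the IK plugin is installed; the types for itemId, desc, num, and price are guesses based on the sample data, not taken from the original request.

```python
import json

# Hypothetical mapping for idx_item: title is text (IK analyzers) with a
# keyword sub-field, so it can be queried both ways.
idx_item_mapping = {
    "mappings": {
        "properties": {
            "itemId": {"type": "long"},
            "title": {
                "type": "text",
                "analyzer": "ik_max_word",     # finest-grained segmentation when indexing
                "search_analyzer": "ik_smart",  # coarser segmentation of query text
                "fields": {"keyword": {"type": "keyword"}},  # title.keyword
            },
            "desc": {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart"},
            "num": {"type": "integer"},
            "price": {"type": "double"},
        }
    }
}

if __name__ == "__main__":
    # Body for PUT idx_item
    print(json.dumps(idx_item_mapping, indent=2))
```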

 

We insert the following data into Elasticsearch:

All three documents have _index idx_item, _type _doc, and _score 1:

_id                   itemId  title                                   desc                                                                     num  Price
rvsX-W4Bo-iJGWqbQ8dk  1       Supor cooking rice SL3200               Make cooking easier and make life happier                                100  200
sPsY-W4Bo-iJGWqbscfU  3       Mr. Mighty the Kitchen                  You make porridge, I wash the pot                                        100  30
r_sX-W4Bo-iJGWqbhMew  2       Supor good porridge cooker model SL322  You cook porridge, I cook porridge, let's make porridge easier together  100  190

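The title "苏泊尔煮饭SL3200" (the first item, "Supor cooking rice SL3200") is mapped as text with the finest-grained analyzer, "analyzer": "ik_max_word", so Elasticsearch segments it into several terms and indexes those terms. The _analyze API shows exactly which terms are produced.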

title.keyword = "苏泊尔煮饭SL3200" is not segmented, so it is indexed as the single term 苏泊尔煮饭SL3200.

We first run a match query for 苏泊尔煮饭SL3200 against title.keyword and get the first document. match analyzes the query text with the field's analyzer, but a keyword field is not segmented, so the whole string is used as a single term, and it equals the indexed value exactly. If we change the query text to 苏泊尔 ("Supor") or 煮饭 ("cook rice"), no document is found.

Switching to a term query, which does not segment the query text and matches the indexed value exactly, also returns only the first document. Again, if we change the query text to 苏泊尔 or 煮饭, nothing is found.

Next we run a match query against title and get the first and third documents, because the terms indexed for both contain 苏泊尔.

A term query for 苏泊尔煮饭SL3200 against title returns nothing: title was segmented at index time, so no single indexed term equals 苏泊尔煮饭SL3200, and term does not segment the query text either. If we change the term query to 苏泊尔, the first and third documents are returned, because their segmented title terms include 苏泊尔.
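The term/match behavior described above can be reproduced with a toy in-memory model. This is only a sketch: toy_analyze is a hypothetical lowercase-and-split analyzer standing in for IK, and it runs on the English titles from the table, so "supor" plays the role of 苏泊尔.

```python
import re

# Titles from the table, keyed by row position (1st, 2nd, 3rd).
DOCS = {
    1: "Supor cooking rice SL3200",
    2: "Mr. Mighty the Kitchen",
    3: "Supor good porridge cooker model SL322",
}

def toy_analyze(text):
    """Hypothetical analyzer: lowercase, split on non-alphanumerics.
    It is NOT the IK analyzer; it only mimics 'text is split into terms'."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

# Two inverted indexes: term -> set of doc ids.
TITLE = {}          # like `title` (text): analyzed terms
TITLE_KEYWORD = {}  # like `title.keyword`: the whole value as one term
for doc_id, title in DOCS.items():
    for term in toy_analyze(title):
        TITLE.setdefault(term, set()).add(doc_id)
    TITLE_KEYWORD.setdefault(title, set()).add(doc_id)

def term_query(index, value):
    """term: the query value is used as-is and must equal an indexed term."""
    return sorted(index.get(value, set()))

def match_query(index, value, analyzed):
    """match: analyze the query text like the field (keyword fields are not
    analyzed), then return docs containing any resulting term (default OR)."""
    terms = toy_analyze(value) if analyzed else [value]
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return sorted(hits)

if __name__ == "__main__":
    full = "Supor cooking rice SL3200"
    print(match_query(TITLE_KEYWORD, full, analyzed=False))  # [1]
    print(match_query(TITLE_KEYWORD, "Supor", analyzed=False))  # []
    print(match_query(TITLE, full, analyzed=True))  # [1, 3]
    print(term_query(TITLE, full))  # []
    print(term_query(TITLE, "supor"))  # [1, 3]
```

Note that term_query(TITLE, "Supor") returns nothing here because the toy analyzer lowercased the indexed terms. The same pitfall applies to term queries on real text fields.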

Hands-on: formatting dates and sorting by time

We create an index named idx_pro and give its mytimestamp and createTime fields two different date formats.
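A sketch of such a mapping, assuming one plausible pair of formats: epoch milliseconds for mytimestamp and yyyy-MM-dd HH:mm:ss for createTime. check_doc is a hypothetical helper that checks a document against those assumed formats locally, and sample_doc is made-up data.

```python
from datetime import datetime, timezone

# Hypothetical formats: epoch milliseconds vs. a human-readable string.
idx_pro_mapping = {
    "mappings": {
        "properties": {
            "mytimestamp": {"type": "date", "format": "epoch_millis"},
            "createTime": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
        }
    }
}

sample_doc = {"mytimestamp": 1575165600000, "createTime": "2019-12-01 10:00:00"}

def check_doc(doc):
    """Rough local check that a document fits the formats above.

    Raises ValueError if createTime does not match the string format;
    returns mytimestamp converted to a UTC datetime.
    """
    datetime.strptime(doc["createTime"], "%Y-%m-%d %H:%M:%S")
    return datetime.fromtimestamp(doc["mytimestamp"] / 1000, tz=timezone.utc)
```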

 

We then insert four sample documents.

We can sort with the sort parameter. It also accepts an array, which sorts by several fields in turn: just wrap the sort clauses in [].
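A sketch of a two-key sort body, assuming the field names from idx_pro and a hypothetical set of documents. local_sort reproduces the same ordering in Python so the effect of the tie-breaker is visible.

```python
# Search body: sort by createTime (newest first), then by mytimestamp on ties.
sort_body = {
    "query": {"match_all": {}},
    "sort": [
        {"createTime": {"order": "desc"}},
        {"mytimestamp": {"order": "asc"}},
    ],
}

# Hypothetical documents; only the sort fields matter here.
docs = [
    {"id": "a", "createTime": "2019-12-01 10:00:00", "mytimestamp": 300},
    {"id": "b", "createTime": "2019-12-02 09:30:00", "mytimestamp": 100},
    {"id": "c", "createTime": "2019-12-01 10:00:00", "mytimestamp": 200},
    {"id": "d", "createTime": "2019-11-30 23:59:59", "mytimestamp": 400},
]

def local_sort(docs):
    """Reproduce the two-key ordering locally: sort by the secondary key first,
    then stably by the primary key (Python's sort stays stable with reverse)."""
    by_secondary = sorted(docs, key=lambda d: d["mytimestamp"])
    return sorted(by_secondary, key=lambda d: d["createTime"], reverse=True)
```

Documents a and c share a createTime, so mytimestamp decides their order: c comes before a.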

We can also use a range query to find data within a given time range. range works on other types as well, such as integer and long.
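A sketch of a time-range query and a numeric one, assuming createTime uses the yyyy-MM-dd HH:mm:ss format and num is a hypothetical integer field. local_range is an illustrative local filter over made-up documents.

```python
# Documents created on 2019-12-01 (createTime format is an assumption).
range_body = {
    "query": {
        "range": {
            "createTime": {
                "gte": "2019-12-01 00:00:00",
                "lt": "2019-12-02 00:00:00",
                "format": "yyyy-MM-dd HH:mm:ss",
            }
        }
    }
}

# The same query shape works for numbers, e.g. a hypothetical integer field "num".
numeric_body = {"query": {"range": {"num": {"gte": 10, "lte": 100}}}}

docs = [
    {"id": "a", "createTime": "2019-12-01 10:00:00"},
    {"id": "b", "createTime": "2019-12-02 09:30:00"},
    {"id": "c", "createTime": "2019-12-01 23:59:59"},
]

def local_range(docs, field, gte, lt):
    """Local model of gte/lt filtering. Fixed-width "yyyy-MM-dd HH:mm:ss"
    strings compare in chronological order, so string comparison works."""
    return [d["id"] for d in docs if gte <= d[field] < lt]
```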


Source: blog.csdn.net/qq_38140936/article/details/103541810