Elasticsearch: Advanced search and nested objects

In a non-relational database system, joins can be sorely missed. Fortunately, Elasticsearch provides several ways to meet this need:

Array Type

Read the doc on elasticsearch.org

As its name suggests, it can be an array of native types (string, int, …) but also an array of objects (the basis used for “objects” and “nested”).

Here is a valid indexing example:

{
    "Article" : [
      {
        "id" : 12,
        "title" : "An article title",
        "categories" : [1,3,5,7],
        "tag" : ["elasticsearch", "symfony", "Obtao"],
        "author" : [
            {
                "firstname" : "Francois",
                "surname": "francoisg",
                "id" : 18
            },
            {
                "firstname" : "Gregory",
                "surname" : "gregquat",
                "id" : "2"
            }
        ]
      },
      {
        "id" : 13,
        "title" : "A second article title",
        "categories" : [1,7],
        "tag" : ["elasticsearch", "symfony", "Obtao"],
        "author" : [
            {
                "firstname" : "Gregory",
                "surname" : "gregquat",
                "id" : "2"
            }
        ]
      }
    ]
}

This example contains several kinds of arrays:

  • categories: array of integers
  • tag: array of strings
  • author: array of objects (inner objects or nested)

We explicitly mention the “simple” array types because it is often easier and more maintainable to store a flattened value than the complete object.
Using a non-relational structure should make you think about a specific model for your search engine:

  • To filter: if you just want to filter/search/aggregate on the textual value of an object, then flatten that value into the parent object.
  • To get the list of objects that are linked to a parent (and if you do not need to filter or index these objects), just store the list of ids and hydrate them with Doctrine and Symfony (in French for the moment).
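As a rough illustration of these two options, the sketch below builds a flat search document from a richer article record. The helper to_search_document and the field names author_names and author_ids are hypothetical and only serve the example; they are not part of Elasticsearch or of any bundle.

```python
def to_search_document(article):
    """Build a flat search document from a richer article record (hypothetical helper)."""
    return {
        "id": article["id"],
        "title": article["title"],
        # Option 1: flattened textual values, enough to filter, search or aggregate on.
        "author_names": [author["surname"] for author in article["author"]],
        # Option 2: ids only; the full objects would be loaded from the database later.
        "author_ids": [author["id"] for author in article["author"]],
    }


article = {
    "id": 12,
    "title": "An article title",
    "author": [
        {"firstname": "Francois", "surname": "francoisg", "id": 18},
        {"firstname": "Gregory", "surname": "gregquat", "id": 2},
    ],
}

doc = to_search_document(article)
print(doc["author_names"])  # ['francoisg', 'gregquat']
print(doc["author_ids"])    # [18, 2]
```

In a real project you would usually pick only one of the two fields, depending on whether you need to search on the authors or merely list them.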

Inner objects

Inner objects are simply JSON objects embedded in a parent document, such as the “author” entries in the example above. The mapping for this example could be:

fos_elastica:
    clients:
        default: { host: "%elastic_host%", port: "%elastic_port%" }
    indexes:
        blog :
            types:
                article :
                    mappings:
                        title : ~
                        categories : ~
                        tag : ~
                        author : 
                            type : object
                            properties : 
                                firstname : ~
                                surname : ~
                                id : 
                                    type : integer

You can filter or query on these “inner objects”. For example:

The query author.firstname=Francois will return the post with the id 12 (and not the one with the id 13).

You can read more on the Elasticsearch website

Inner objects are easy to configure. As Elasticsearch documents are “schema-less”, you can index them without specifying any mapping.

The limitation of this method lies in the way Elasticsearch stores your data. Reusing the above example, here is the internal representation of our objects:

[
      {
        "id" : 12,
        "title" : "An article title",
        "categories" : [1,3,5,7],
        "tag" : ["elasticsearch", "symfony", "Obtao"],
        "author.firstname" : ["Francois","Gregory"],
        "author.surname" : ["francoisg","gregquat"],
        "author.id" : [18,2]
      },
      {
        "id" : 13,
        "title" : "A second article title",
        "categories" : [1,7],
        "tag" : ["elasticsearch", "symfony", "Obtao"],
        "author.firstname" : ["Gregory"],
        "author.surname" : ["gregquat"],
        "author.id" : [2]
      }
]

The consequence is that the query:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            { "term": { "author.firstname": "francois" } },
            { "term": { "author.surname": "gregquat" } }
          ]
        }
      }
    }
  }
}

author.firstname=francois AND author.surname=gregquat will return the document “12”, even though no single author has both this firstname and this surname. In the case of an inner object, this query can be translated as “Which documents have at least one author.surname = gregquat and at least one author.firstname = francois?”.
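To make this behaviour concrete, here is a small Python simulation of the flattening described above. It is only a sketch, not Elasticsearch code: flatten_inner_objects and term_matches are hypothetical helpers, and lowercasing the values is an assumption standing in for the default analyzer.

```python
def flatten_inner_objects(doc, field):
    """Mimic inner-object storage: one list of values per sub-field."""
    flat = {key: value for key, value in doc.items() if key != field}
    for obj in doc[field]:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def term_matches(flat_doc, field, term):
    """A term matches if any stored value equals it (values lowercased, as an assumption)."""
    return any(str(value).lower() == term for value in flat_doc.get(field, []))


articles = [
    {"id": 12, "author": [
        {"firstname": "Francois", "surname": "francoisg", "id": 18},
        {"firstname": "Gregory", "surname": "gregquat", "id": 2},
    ]},
    {"id": 13, "author": [
        {"firstname": "Gregory", "surname": "gregquat", "id": 2},
    ]},
]

flat_docs = [flatten_inner_objects(article, "author") for article in articles]

# Each condition is checked against the whole value list, not against one author.
hits = [
    doc["id"] for doc in flat_docs
    if term_matches(doc, "author.firstname", "francois")
    and term_matches(doc, "author.surname", "gregquat")
]
print(hits)  # [12]
```

Document 12 matches because “francois” and “gregquat” both appear somewhere in its flattened lists, although they belong to two different authors.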

To fix this problem, you must use the nested type.

Nested

First important difference: unlike inner objects, the nested type must be declared explicitly in your mapping.

The mapping looks like the one for an object; only the type changes:

fos_elastica:
    clients:
        default: { host: "%elastic_host%", port: "%elastic_port%" }
    indexes:
        blog :
            types:
                article :
                    mappings:
                        title : ~
                        categories : ~
                        tag : ~
                        author : 
                            type : nested
                            properties : 
                                firstname : ~
                                surname : ~
                                id : 
                                    type : integer

This time, the internal representation will be:

[
      {
        "id" : 12,
        "title" : "An article title",
        "categories" : [1,3,5,7],
        "tag" : ["elasticsearch", "symfony", "Obtao"],
        "author" : [{
            "firstname" : "Francois",
            "surname" : "francoisg",
            "id" : 18
        },
        {
            "firstname" : "Gregory",
            "surname" : "gregquat",
            "id" : 2
        }]
      },
      {
        "id" : 13,
        "title" : "A second article title",
        "categories" : [1,7],
        "tag" : ["elasticsearch", "symfony", "Obtao"],
        "author" : [{
            "firstname" : "Gregory",
            "surname" : "gregquat",
            "id" : 2
        }]
      }
]

This time, we keep the object structure.

Nested objects have their own filters, which allow you to filter on each nested object individually. Going back to the case that exposed the limitation of inner objects, we can write this query:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "nested" : {
          "path" : "author",
          "filter": {
            "bool": {
              "must": [
                {
                  "term" : {
                    "author.firstname": "francois"
                  }
                },
                {
                  "term" : {
                    "author.surname": "gregquat"
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}


We can translate it as “Who has an author object whose surname is equal to ‘gregquat’ and whose firstname is ‘francois’”. This query will return no result.
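The per-object semantics can be imitated with a few lines of Python. This is only a sketch of the idea, not the way Elasticsearch implements it: nested_matches is a hypothetical helper, and values are lowercased as an assumption standing in for the default analyzer.

```python
def nested_matches(doc, path, conditions):
    """True if one single object under `path` satisfies every condition."""
    return any(
        all(str(obj.get(key, "")).lower() == term for key, term in conditions.items())
        for obj in doc.get(path, [])
    )


articles = [
    {"id": 12, "author": [
        {"firstname": "Francois", "surname": "francoisg", "id": 18},
        {"firstname": "Gregory", "surname": "gregquat", "id": 2},
    ]},
    {"id": 13, "author": [
        {"firstname": "Gregory", "surname": "gregquat", "id": 2},
    ]},
]

# No single author is both "francois" and "gregquat".
mixed = {"firstname": "francois", "surname": "gregquat"}
mixed_hits = [a["id"] for a in articles if nested_matches(a, "author", mixed)]
print(mixed_hits)  # []

# Both conditions hold for the same author object in each article.
same_author = {"firstname": "gregory", "surname": "gregquat"}
same_hits = [a["id"] for a in articles if nested_matches(a, "author", same_author)]
print(same_hits)  # [12, 13]
```

Because every condition is evaluated against one author at a time, the values of two different authors can no longer combine into a false match.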

One problem remains, and it becomes penalizing when working with big objects: when you want to change a single value of a nested object, you have to reindex the whole parent document (including all its nested objects).
If the objects are heavy and often updated, the impact on performance can be significant.

To fix this problem, you can use the parent/child associations.

Parent/Child

Parent/child associations are very similar to OneToMany relationships (one parent, several children).
The relationship remains hierarchical: an object type can only be associated with one parent type, and it is impossible to create a ManyToMany relationship.

We are going to link our articles to a category:

fos_elastica:
    clients:
        default: { host: "%elastic_host%", port: "%elastic_port%" }
    indexes:
        blog :
            types:
                category : 
                    mappings : 
                        id : ~
                        name : ~
                        description : ~
                article :
                    mappings:
                        title : ~
                        tag : ~
                        author : ~
                    _routing:
                        required: true
                        path: category
                    _parent:
                        type : "category"
                        identifier: "id" #optional as id is the default value
                        property : "category" #optional as the default value is the type value

When indexing an article, a reference to its Category will also be indexed (category.id).
So, we can index categories and articles separately while keeping the references between them.

As with nested objects, there are filters and queries that allow you to search on parents or children:

  • Has Parent Filter / Has Parent Query: filter/query on parent fields, returns the child objects. In our case, we could filter articles whose parent category contains “symfony” in its description.
  • Has Child Filter / Has Child Query: filter/query on child fields, returns the parent objects. In our case, we could filter Categories for which “francoisg” has written an article.

For example:

{
  "query": {
    "has_child": {
      "type": "article",
      "query" : {
        "filtered": {
          "query": { "match_all": {}},
          "filter" : {
              "term": {"tag": "symfony"}
          }
        }
      }
    }
  }
}


This query will return the Categories that have at least one article tagged with “symfony”.
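The two directions of the relationship can be sketched in plain Python. The sample categories and articles below are made up for the illustration, the "_parent" key simply stands for the reference to the parent category, and has_child / has_parent are hypothetical helpers that only imitate the semantics of the corresponding queries.

```python
# Hypothetical sample data: each article points to its parent category id.
categories = [
    {"id": 1, "name": "Search", "description": "Elasticsearch and symfony integration"},
    {"id": 2, "name": "Misc", "description": "Everything else"},
]
articles = [
    {"id": 12, "tag": ["elasticsearch", "symfony"], "_parent": 1},
    {"id": 13, "tag": ["elasticsearch"], "_parent": 1},
    {"id": 14, "tag": ["php"], "_parent": 2},
]


def has_child(parents, children, child_predicate):
    """Return the parents that have at least one child satisfying the predicate."""
    parent_ids = {child["_parent"] for child in children if child_predicate(child)}
    return [parent for parent in parents if parent["id"] in parent_ids]


def has_parent(parents, children, parent_predicate):
    """Return the children whose parent satisfies the predicate."""
    parent_ids = {parent["id"] for parent in parents if parent_predicate(parent)}
    return [child for child in children if child["_parent"] in parent_ids]


# Categories that have at least one article tagged with "symfony".
tagged = has_child(categories, articles, lambda a: "symfony" in a["tag"])
print([c["name"] for c in tagged])  # ['Search']

# Articles whose parent category mentions "symfony" in its description.
in_symfony = has_parent(
    categories, articles, lambda c: "symfony" in c["description"].lower().split()
)
print([a["id"] for a in in_symfony])  # [12, 13]
```

Note that parents and children live in separate collections here, which mirrors the fact that they are indexed separately and only linked by a reference.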

The queries here are written in JSON, but they can easily be translated into PHP with the Elastica library.

You may also be interested in reading:

http://obtao.com/blog/2014/04/elasticsearch-advanced-search-and-nested-objects/


Reposted from m635674608.iteye.com/blog/2317749