In a non-relational database system, joins can be missed. Fortunately, Elasticsearch provides several solutions to meet this need:
Array Type
Read the doc on elasticsearch.org
As its name suggests, it can be an array of native types (string, int, …) but also an array of objects (the basis used for “objects” and “nested”).
Here are some valid indexing examples :
{
  "Article": [
    {
      "id": 12,
      "title": "An article title",
      "categories": [1, 3, 5, 7],
      "tag": ["elasticsearch", "symfony", "Obtao"],
      "author": [
        { "firstname": "Francois", "surname": "francoisg", "id": 18 },
        { "firstname": "Gregory", "surname": "gregquat", "id": 2 }
      ]
    },
    {
      "id": 13,
      "title": "A second article title",
      "categories": [1, 7],
      "tag": ["elasticsearch", "symfony", "Obtao"],
      "author": [
        { "firstname": "Gregory", "surname": "gregquat", "id": 2 }
      ]
    }
  ]
}
This example contains several kinds of arrays:
- categories: array of integers
- tag: array of strings
- author: array of objects (inner objects or nested)
We explicitly mention the “simple” types because it can be easier and more maintainable to store a flattened value rather than the complete object.
Using a non-relational structure should make you think about a specific model for your search engine:
- To filter: if you just want to filter, search or aggregate on the textual value of an object, flatten that value into the parent object.
- To get the list of objects linked to a parent (when you do not need to filter on or index these objects), just store the list of ids and hydrate them with Doctrine and Symfony (in French for the moment).
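The two modelling options above can be sketched in Python. This is only an illustration under stated assumptions: the helper `build_search_document` and the field names `author_names` and `author_ids` are hypothetical, and are not part of Elasticsearch or of any bundle.

```python
def build_search_document(article):
    """Build a flat search document from an article dict (hypothetical helper)."""
    authors = article.get("author", [])
    return {
        "id": article["id"],
        "title": article["title"],
        # Flattened textual values: enough to filter, search or aggregate.
        "author_names": [a["surname"] for a in authors],
        # Only the ids: the full objects are loaded later from the database.
        "author_ids": [int(a["id"]) for a in authors],
    }


article = {
    "id": 12,
    "title": "An article title",
    "author": [
        {"firstname": "Francois", "surname": "francoisg", "id": 18},
        {"firstname": "Gregory", "surname": "gregquat", "id": "2"},
    ],
}

print(build_search_document(article))
```

The indexed document stays small, and the ids are enough to load the complete objects from the relational database once the search results come back.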
Inner objects
Inner objects are simply JSON objects embedded in a parent document, like the “author” entries in the example above. The mapping for this example could be:
fos_elastica:
    clients:
        default: { host: %elastic_host%, port: %elastic_port% }
    indexes:
        blog:
            types:
                article:
                    mappings:
                        title: ~
                        categories: ~
                        tag: ~
                        author:
                            type: object
                            properties:
                                firstname: ~
                                surname: ~
                                id:
                                    type: integer
You can filter or query on these “inner objects”. For example:
query: author.firstname=Francois
will return the article with id 12 (and not the one with id 13).
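As a sketch, the pseudo-query above could be expressed as a `match` query on the dotted field name. The Python below only builds the JSON body; the helper name `inner_object_query` is an assumption made for illustration, and sending the body to a cluster is left out.

```python
import json


def inner_object_query(field, value):
    """Return a match query body on a dotted inner-object field (illustrative helper)."""
    return {"query": {"match": {field: value}}}


body = inner_object_query("author.firstname", "Francois")
print(json.dumps(body, indent=2))
```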
You can read more on the Elasticsearch website
Inner objects are easy to configure. As Elasticsearch documents are schemaless, you can index them without specifying any mapping.
The limitation of this method lies in the way Elasticsearch stores your data. Reusing the example above, here is the internal representation of our objects:
[
  {
    "id": 12,
    "title": "An article title",
    "categories": [1, 3, 5, 7],
    "tag": ["elasticsearch", "symfony", "Obtao"],
    "author.firstname": ["Francois", "Gregory"],
    "author.surname": ["francoisg", "gregquat"],
    "author.id": [18, 2]
  },
  {
    "id": 13,
    "title": "A second article title",
    "categories": [1, 7],
    "tag": ["elasticsearch", "symfony", "Obtao"],
    "author.firstname": ["Gregory"],
    "author.surname": ["gregquat"],
    "author.id": [2]
  }
]
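To make the flattening concrete, here is a minimal Python sketch that turns an array of inner objects into one array per dotted field. It is a simplified model for illustration only (the function name `flatten_inner_objects` is made up), not the way Elasticsearch is actually implemented.

```python
def flatten_inner_objects(document):
    """Mimic how arrays of inner objects end up as one array per dotted field.

    Simplified model for illustration only, not Elasticsearch's actual code.
    """
    flat = {}
    for key, value in document.items():
        is_object_array = (
            isinstance(value, list) and value and all(isinstance(v, dict) for v in value)
        )
        if is_object_array:
            # Each inner field gets its own array; the pairing between
            # fields of the same object is lost at this point.
            for inner in value:
                for inner_key, inner_value in inner.items():
                    flat.setdefault(f"{key}.{inner_key}", []).append(inner_value)
        else:
            flat[key] = value
    return flat


article = {
    "id": 12,
    "title": "An article title",
    "author": [
        {"firstname": "Francois", "surname": "francoisg", "id": 18},
        {"firstname": "Gregory", "surname": "gregquat", "id": 2},
    ],
}

flat = flatten_inner_objects(article)
print(flat["author.firstname"])  # ['Francois', 'Gregory']
print(flat["author.surname"])    # ['francoisg', 'gregquat']
```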
The consequence is that the query:
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "bool": {
          "must": [
            { "term": { "author.firstname": "francois" } },
            { "term": { "author.surname": "gregquat" } }
          ]
        }
      }
    }
  }
}
(that is, author.firstname=Francois AND author.surname=gregquat)
will return document 12. With inner objects, this query translates to “Which documents have at least one author.surname equal to gregquat and at least one author.firstname equal to francois?”, and the two values do not have to belong to the same author.
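The unexpected match can be reproduced with a few lines of Python working on the flattened arrays. This is a sketch of the matching logic only: `matches_flattened` is a hypothetical helper, and lowercasing the stored values is an assumption standing in for the default analysis of string fields.

```python
def matches_flattened(flat_document, terms):
    """True when every field holds the wanted value somewhere in its array."""
    return all(
        wanted in [str(v).lower() for v in flat_document.get(field, [])]
        for field, wanted in terms.items()
    )


# Flattened author fields of articles 12 and 13.
article_12 = {
    "author.firstname": ["Francois", "Gregory"],
    "author.surname": ["francoisg", "gregquat"],
}
article_13 = {
    "author.firstname": ["Gregory"],
    "author.surname": ["gregquat"],
}

terms = {"author.firstname": "francois", "author.surname": "gregquat"}
print(matches_flattened(article_12, terms))  # True
print(matches_flattened(article_13, terms))  # False
```

Article 12 matches even though no single author is called Francois "gregquat": each condition is checked against its own array, independently of the other.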
To fix this problem, you must use the nested type.
Nested
First important difference: nested must be specified in your mapping.
The mapping looks like the one for an object; only the type changes:
fos_elastica:
    clients:
        default: { host: %elastic_host%, port: %elastic_port% }
    indexes:
        blog:
            types:
                article:
                    mappings:
                        title: ~
                        categories: ~
                        tag: ~
                        author:
                            type: nested
                            properties:
                                firstname: ~
                                surname: ~
                                id:
                                    type: integer
This time, the internal representation will be :
[
  {
    "id": 12,
    "title": "An article title",
    "categories": [1, 3, 5, 7],
    "tag": ["elasticsearch", "symfony", "Obtao"],
    "author": [
      { "firstname": "Francois", "surname": "francoisg", "id": 18 },
      { "firstname": "Gregory", "surname": "gregquat", "id": 2 }
    ]
  },
  {
    "id": 13,
    "title": "A second article title",
    "categories": [1, 7],
    "tag": ["elasticsearch", "symfony", "Obtao"],
    "author": [
      { "firstname": "Gregory", "surname": "gregquat", "id": 2 }
    ]
  }
]
This time, we keep the object structure.
Nested objects have their own filters, which let you filter on each nested object individually. Continuing our example (the one that exposed the limitation of inner objects), we can write this query:
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "nested": {
          "path": "author",
          "filter": {
            "bool": {
              "must": [
                { "term": { "author.firstname": "francois" } },
                { "term": { "author.surname": "gregquat" } }
              ]
            }
          }
        }
      }
    }
  }
}
We can translate it as “Who has an author object whose surname is equal to ‘gregquat’ and whose firstname is ‘francois’”. This query will return no result.
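The difference in semantics can be sketched in Python: with nested objects, all conditions must hold on the same object. As before, this only models the matching logic; `matches_nested` is a hypothetical helper, and comparing lowercased values is an assumption standing in for the default analysis of string fields.

```python
def matches_nested(document, path, terms):
    """True when a single object under `path` satisfies every term."""
    return any(
        all(str(inner.get(field, "")).lower() == wanted for field, wanted in terms.items())
        for inner in document.get(path, [])
    )


articles = [
    {"id": 12, "author": [
        {"firstname": "Francois", "surname": "francoisg"},
        {"firstname": "Gregory", "surname": "gregquat"},
    ]},
    {"id": 13, "author": [
        {"firstname": "Gregory", "surname": "gregquat"},
    ]},
]

terms = {"firstname": "francois", "surname": "gregquat"}
hits = [a["id"] for a in articles if matches_nested(a, "author", terms)]
print(hits)  # []

# A pair that really belongs to one author does match.
same_author = {"firstname": "gregory", "surname": "gregquat"}
print([a["id"] for a in articles if matches_nested(a, "author", same_author)])  # [12, 13]
```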
One problem remains, and it is penalizing when working with big objects: when you want to change a single value of a nested object, you have to reindex the whole parent document (including all its nested objects).
If the objects are heavy and often updated, the impact on performance can be significant.
To fix this problem, you can use the parent/child associations.
Parent/Child
Parent/child associations are very similar to OneToMany relationships (one parent, several children).
The relationship remains hierarchical: an object type can only be associated with one parent, so it is impossible to create a ManyToMany relationship.
We are going to link our article to a category :
fos_elastica:
    clients:
        default: { host: %elastic_host%, port: %elastic_port% }
    indexes:
        blog:
            types:
                category:
                    mappings:
                        id: ~
                        name: ~
                        description: ~
                article:
                    mappings:
                        title: ~
                        tag: ~
                        author: ~
                    _routing:
                        required: true
                        path: category
                    _parent:
                        type: "category"
                        identifier: "id" #optional as id is the default value
                        property: "category" #optional as the default value is the type value
When indexing an article, a reference to the Category will also be indexed (category.id).
So we can index categories and articles separately while keeping the references between them.
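At the level of the raw API, indexing a child separately from its parent could look like the sketch below, which builds the two lines of a bulk "index" action. It assumes an Elasticsearch 1.x-style bulk API where the action metadata accepts a `_parent` key; the helper `child_bulk_lines` and the category id 3 are hypothetical, and nothing is sent to a cluster.

```python
import json


def child_bulk_lines(index, doc_type, doc_id, parent_id, source):
    """Return the two NDJSON lines of a bulk 'index' action for a child document."""
    action = {
        "index": {
            "_index": index,
            "_type": doc_type,
            "_id": doc_id,
            "_parent": parent_id,  # the parent id is also used to route the child
        }
    }
    return [json.dumps(action), json.dumps(source)]


lines = child_bulk_lines(
    index="blog",
    doc_type="article",
    doc_id=12,
    parent_id=3,  # hypothetical category id
    source={"title": "An article title", "tag": ["elasticsearch", "symfony", "Obtao"]},
)
print("\n".join(lines))
```

Updating the article later only touches this child document; the category document is left alone.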
As with nested objects, there are filters and queries that let you search on parents or children:
- Has Parent Filter / Has Parent Query: filter/query on parent fields, returns the child objects. In our case, we could filter the articles whose parent category contains “symfony” in its description.
- Has Child Filter / Has Child Query: filter/query on child fields, returns the parent objects. In our case, we could filter the categories in which “francoisg” has written an article.
{
  "query": {
    "has_child": {
      "type": "article",
      "query": {
        "filtered": {
          "query": { "match_all": {} },
          "filter": {
            "term": { "tag": "symfony" }
          }
        }
      }
    }
  }
}
This query will return the Categories that have at least one article tagged with “symfony”.
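The opposite direction (articles whose parent category mentions “symfony” in its description) could be built the same way with a `has_parent` query. The Python below is a sketch that only assembles the JSON body; the helper name `has_parent_query` is an assumption, and `parent_type` is the parameter name used by the `has_parent` query in the Elasticsearch versions of that era.

```python
import json


def has_parent_query(parent_type, field, text):
    """Build a has_parent query body: match on a parent field, return children."""
    return {
        "query": {
            "has_parent": {
                "parent_type": parent_type,
                "query": {"match": {field: text}},
            }
        }
    }


body = has_parent_query("category", "description", "symfony")
print(json.dumps(body, indent=2))
```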
The queries are here written in JSON, but are easily transformable into PHP with the Elastica library.
These articles are also worth reading:
- http://euphonious-intuition.com/2013/02/managing-relations-in-elasticsearch/
- http://www.spacevatican.org/2012/6/3/fun-with-elasticsearch-s-children-and-nested-documents/
http://obtao.com/blog/2014/04/elasticsearch-advanced-search-and-nested-objects/