12/03/2016 - ELASTICSEARCH
These are some notes related to Elasticsearch.
Roughly speaking, an index corresponds to a database, a type corresponds to a table, and a mapping corresponds to the field definitions.
If a field is analyzed, then the field is full-text searchable.
If a field is not_analyzed, then the field is not full-text searchable; instead, it is used for exact-value search, like the = sign.
You can index the same field as analyzed and not_analyzed at the same time under different names.
Elasticsearch uses "sort":[{"_score":"desc"}] for sorting by default. This is good if you're doing a full-text search.
The match and multi_match queries provide case-insensitive search capabilities on analyzed fields.
To combine matches from several fields, include the "type": "most_fields" key-value as part of the multi_match property.
If you want to use AVG, MIN, MAX, SUM, COUNT, GROUP BY like database functions in Elasticsearch, you need to use Aggregations.
The more fields a record matches, the higher the "type": "most_fields" flag would score the record. For more information, read the Most Fields page.
If you want to query NULL or NOT NULL values, then you must use missing and exists as described in Dealing with Null Values.
If you get a SearchParseException error when using the sort property in your query, add "ignore_unmapped": true to the sort property of your query.
Instead of using the boost flag in your query, you're better off using the Function Score Query feature and including the "score_mode": "sum" and "boost_mode": "replace" flags as part of your query to fetch results in a more adequate scoring order.
If you want in-depth information, you can visit the Basic Concepts page.
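The mapping and query notes above can be sketched as plain request bodies. This is only a rough illustration, assuming the Elasticsearch 2.x-era syntax these notes appear to use (later releases renamed or removed some of these options); every field name (title, body, published, featured) and every helper function name is hypothetical.

```python
import json

# Assumption: Elasticsearch 2.x-era syntax. All field names are hypothetical.

# One field indexed twice: "title" for full-text search, "title.raw" for exact values.
MAPPING = {
    "properties": {
        "title": {
            "type": "string",
            "index": "analyzed",
            "fields": {"raw": {"type": "string", "index": "not_analyzed"}},
        }
    }
}


def most_fields_search(text, fields, sort_field=None):
    """Build a multi_match body that combines the scores of all matching fields."""
    body = {
        "query": {
            "multi_match": {"query": text, "type": "most_fields", "fields": fields}
        }
    }
    if sort_field is not None:
        # ignore_unmapped avoids a SearchParseException when an index lacks the field.
        body["sort"] = [{sort_field: {"order": "desc", "ignore_unmapped": True}}]
    return body


def with_function_score(query, functions):
    """Wrap a query so function scores are summed and replace the query score."""
    return {
        "query": {
            "function_score": {
                "query": query,
                "functions": functions,
                "score_mode": "sum",
                "boost_mode": "replace",
            }
        }
    }


def null_clause(field, is_null):
    """missing matches documents without a value (NULL); exists matches NOT NULL."""
    return {("missing" if is_null else "exists"): {"field": field}}


def avg_by_group(group_field, value_field):
    """Rough equivalent of SELECT AVG(value) ... GROUP BY group, as an aggregation."""
    return {
        "size": 0,  # return only the aggregation results, no hits
        "aggs": {
            "groups": {
                "terms": {"field": group_field},
                "aggs": {"average": {"avg": {"field": value_field}}},
            }
        },
    }


search = most_fields_search("quick fox", ["title", "body"], sort_field="published")
boosted = with_function_score(
    search["query"],
    [{"filter": {"term": {"featured": True}}, "weight": 2}],
)
print(json.dumps(boosted, indent=2))
```

Each helper only builds a dictionary, so the bodies can be inspected or serialized with json.dumps before being sent with whatever HTTP client you use.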
A cluster is a collection of one or more nodes (servers) that together hold all of your data. It provides search capabilities across all nodes. A cluster is identified by a unique name, which by default is "elasticsearch".
A node is a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities. Just like a cluster, a node is identified by a name which by default is a random Universally Unique IDentifier (UUID).
An index is a collection of documents (data). An index is identified by a lowercase name. In a single cluster, you can define as many indexes as you want.
A type is a logical category of an index which allows you to store different types of documents in the same index.
A document is a basic unit of information that can be indexed, for example a single customer, product, or order. The document is expressed in JSON format. An index can contain as many documents as you want.
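To make the relationship between index, type, and document concrete, here is a minimal sketch. It assumes the 2.x-style REST path /{index}/{type}/{id}; the index name "shop", the type name "customer", the document fields, and the helper document_path are all made up for illustration.

```python
import json


def document_path(index, doc_type, doc_id):
    """Return the REST path of a document: /{index}/{type}/{id}."""
    if index != index.lower():
        raise ValueError("index names must be lowercase")
    return "/{}/{}/{}".format(index, doc_type, doc_id)


# A document is just a JSON object; this customer is hypothetical.
customer = {"name": "Jane Doe", "email": "jane@example.com", "orders": 3}

path = document_path("shop", "customer", 1)
payload = json.dumps(customer, sort_keys=True)
print("PUT", path)
print(payload)
```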
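An index can potentially store more data than fits within the disk limits of a single node, or more than a single node can search quickly. To solve this, Elasticsearch provides the ability to divide your index into multiple pieces called shards. Sharding is important because it lets you split your data volume horizontally across nodes, and because operations can be distributed and run in parallel across shards, which increases performance and throughput.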
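Elasticsearch allows you to make one or more copies of your index's shards, called replicas (replica shards). This matters because, in the real world, failures can be expected at any time. For example, a shard or node might go offline or disappear. Replication is important because it provides high availability when a shard or node fails, and because searches can run on all replicas in parallel, which lets you scale out search throughput.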
By default, each index in Elasticsearch is allocated 5 primary shards and 1 replica which means that if you have at least two nodes in your cluster, your index will have 5 primary shards and another 5 replica shards (1 complete replica) for a total of 10 shards per index.
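The shard arithmetic above can be checked with a short sketch. The function names are hypothetical, and assignable_shards is a simplified model that only applies the rule that two copies of the same shard are never placed on the same node; it ignores every other allocation constraint.

```python
def total_shards(primaries=5, replicas=1):
    """Total shards for one index: each primary plus its replica copies."""
    return primaries * (1 + replicas)


def assignable_shards(primaries=5, replicas=1, nodes=2):
    """Simplified model: at most one copy of a given shard per node."""
    return primaries * min(1 + replicas, nodes)


def index_settings(primaries=5, replicas=1):
    """Settings body that could be sent when creating an index."""
    return {
        "settings": {
            "number_of_shards": primaries,
            "number_of_replicas": replicas,
        }
    }


print(total_shards())                 # 10: 5 primaries + 5 replicas
print(assignable_shards(nodes=1))     # 5: replicas stay unassigned on a single node
print(index_settings())
```

This is why the default layout needs at least two nodes before all 10 shards can be allocated.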