12/03/2016 - ELASTICSEARCH
These are some notes related to elasticsearch.
- Compared with a relational database, an index corresponds to a database, a type to a table, and a mapping to the table schema that defines the fields.
- If a field is analyzed, it is full-text searchable.
- If a field is not_analyzed, it is not full-text searchable. Instead, it is used for exact-value search, like the = sign in SQL.
- A field can be indexed as both analyzed and not_analyzed at the same time under different names.
- By default, results are sorted with "sort":[{"_score":"desc"}]. This is good if you're doing a full-text search.
- match and multi_match queries provide case-insensitive search capabilities on analyzed fields.
- Add the "type": "most_fields" key-value to a multi_match query if a record whose search terms match in more fields should score higher. For more information, read the Most Fields page.
- If you want to use database functions such as AVG, MIN, MAX, SUM, COUNT and GROUP BY in Elasticsearch, you need to use Aggregations.
- If you want to search for NULL or NOT NULL values, you must use missing and exists as described in Dealing with Null Values.
- If you get a SearchParseException error when using the sort property in your query, add "ignore_unmapped": true to the sort property of your query.
- Rather than relying on the boost flag in your query, you're better off using the Function Score Query feature and including the "score_mode": "sum" and "boost_mode": "replace" flags in your query to fetch results in a more adequate scoring order.
If you want in-depth information, you can visit the Basic Concept page.
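The notes above mention several mapping and query features without showing them. The sketch below builds example request bodies as plain Python dictionaries so their shape is visible; it does not talk to a cluster. The field names (title, body, category, price, status), the "featured" value and the helper function names are hypothetical, and the syntax assumes the Elasticsearch 2.x era these notes come from (the string type with analyzed/not_analyzed was later replaced by text and keyword). The null check uses exists inside a bool query; older versions expressed the IS NULL case with the missing filter mentioned in the notes.

```python
import json

# Hypothetical field names; the syntax targets Elasticsearch 2.x-era request bodies.


def title_field_mapping():
    """Mapping fragment: "title" is analyzed, "title.raw" is not_analyzed."""
    return {
        "properties": {
            "title": {
                "type": "string",
                "index": "analyzed",  # full-text searchable
                "fields": {
                    # Same value indexed again under another name for exact matches.
                    "raw": {"type": "string", "index": "not_analyzed"}
                },
            }
        }
    }


def most_fields_search(text):
    """multi_match query whose score grows with the number of matching fields."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "most_fields",
                "fields": ["title", "body"],
            }
        },
        "sort": [{"_score": "desc"}],  # the default order, spelled out
    }


def avg_price_per_category():
    """Roughly SELECT category, AVG(price) ... GROUP BY category."""
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "aggs": {
            "per_category": {
                "terms": {"field": "category"},
                "aggs": {"avg_price": {"avg": {"field": "price"}}},
            }
        },
    }


def null_check(field, is_null):
    """IS NULL / IS NOT NULL expressed with the exists query."""
    exists = {"exists": {"field": field}}
    clause = "must_not" if is_null else "filter"
    return {"query": {"bool": {clause: exists}}}


def featured_first(text):
    """function_score query that replaces the query score with the function scores."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": text}},
                "functions": [
                    {"filter": {"term": {"status": "featured"}}, "weight": 2}
                ],
                "score_mode": "sum",
                "boost_mode": "replace",
            }
        }
    }


print(json.dumps(most_fields_search("quick brown fox"), indent=2))
```

Keeping each body in its own function makes it easy to print or compare them before sending them with whatever HTTP client you use.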
A cluster is a collection of one or more nodes (servers) that together hold all of your data and provide search capabilities across all nodes. A cluster is identified by a unique name, which by default is "elasticsearch".
A node is a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities. Just like a cluster, a node is identified by a name which by default is a random Universally Unique IDentifier (UUID).
An index is a collection of documents (data). An index is identified by a lowercase name. In a single cluster, you can define as many indexes as you want.
A type is a logical category of an index which allows you to store different types of documents in the same index.
A document is a basic unit of information that can be indexed, for example a single customer, product or order. Documents are expressed in JSON format. An index can contain as many documents as you want.
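To tie index, type and document together: in the Elasticsearch versions these notes cover, a document was indexed with a request of the form PUT /{index}/{type}/{id} whose body is the JSON document. The sketch below only builds that request and does not send it; the index name "shop", the type "customer", the document fields and the helper name are hypothetical.

```python
import json


def index_document_request(index, doc_type, doc_id, document):
    """Build (method, path, body) for indexing one JSON document."""
    # Index names must be lowercase.
    if index != index.lower():
        raise ValueError(f"index name must be lowercase: {index!r}")
    path = f"/{index}/{doc_type}/{doc_id}"
    return "PUT", path, json.dumps(document)


method, path, body = index_document_request(
    "shop", "customer", "1", {"name": "John Doe", "city": "London"}
)
print(method, path)  # PUT /shop/customer/1
print(body)          # {"name": "John Doe", "city": "London"}
```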
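An index can potentially store a large amount of data that exceeds the disk capacity of a single node, and even when the data fits, a single node may be too slow to serve search requests on its own. Elasticsearch provides the ability to divide your index into multiple pieces called shards. Sharding is important because:
- It allows you to horizontally split/scale your content volume.
- It allows you to distribute and parallelize operations across shards (potentially on multiple nodes), thus increasing performance and throughput.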
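Each document is stored in exactly one primary shard. The Elasticsearch Guide describes the choice as shard = hash(routing) % number_of_primary_shards, where routing defaults to the document's _id. The sketch below mimics that formula; Elasticsearch actually uses a Murmur3 hash, and zlib.crc32 is swapped in here only because it is deterministic and in the standard library, so the concrete shard numbers will not match a real cluster. The function names are hypothetical.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """Return the primary shard a routing value (by default the _id) maps to."""
    # crc32 stands in for Elasticsearch's Murmur3 hash.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def distribute(doc_ids, number_of_primary_shards):
    """Group document IDs by the shard they would be stored in."""
    shards = {n: [] for n in range(number_of_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, number_of_primary_shards)].append(doc_id)
    return shards


print(distribute([str(i) for i in range(1, 11)], 5))
```

Because the result depends on number_of_primary_shards, changing that number would send existing documents to different shards, which is why in the versions these notes describe the primary shard count is fixed when the index is created.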
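Elasticsearch allows you to make one or more copies of your index's shards, called replica shards or replicas. This matters because in the real world failures can happen at any time; for example, a shard or node might go offline or disappear. Replication is important because:
- It provides high availability in case a shard or node fails.
- It allows you to scale out your search volume/throughput, since searches can be executed on all replicas in parallel.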
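The rule that makes replication useful is that a replica is never placed on the same node as its primary, so losing one node cannot remove every copy of a shard. The toy allocator below illustrates only that rule (the real allocator also balances shards and considers disk usage and other settings); the function name, the node names and the "0p"/"0r" labels are hypothetical.

```python
def allocate(number_of_shards, number_of_replicas, nodes):
    """Place every copy of each shard on a different node.

    Returns (allocation, unassigned): allocation maps node -> shard labels,
    where "0p" is primary 0 and "0r" a replica of it. Copies that cannot get
    a node of their own are left unassigned.
    """
    allocation = {node: [] for node in nodes}
    unassigned = []
    for shard in range(number_of_shards):
        for copy in range(number_of_replicas + 1):
            label = f"{shard}{'p' if copy == 0 else 'r'}"
            if copy < len(nodes):
                # (shard + copy) % len(nodes) differs for each copy of this shard.
                allocation[nodes[(shard + copy) % len(nodes)]].append(label)
            else:
                unassigned.append(label)
    return allocation, unassigned


print(allocate(5, 1, ["node-1"]))
print(allocate(5, 1, ["node-1", "node-2"]))
```

With a single node the five replicas stay unassigned (a real cluster reports yellow health in that situation), and they only get a home once a second node joins.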
By default, each index in Elasticsearch is allocated 5 primary shards and 1 replica. This means that if you have at least two nodes in your cluster, your index will have 5 primary shards and another 5 replica shards (1 complete replica), for a total of 10 shards per index.
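The total above is number_of_shards × (1 + number_of_replicas) = 5 × (1 + 1) = 10. The sketch below computes it and builds an index-creation body that sets both values explicitly; number_of_shards and number_of_replicas are the real index setting names, while the helper function names are hypothetical.

```python
def total_shards(number_of_shards=5, number_of_replicas=1):
    """Primaries plus one full set of copies per replica."""
    return number_of_shards * (1 + number_of_replicas)


def index_settings(number_of_shards=5, number_of_replicas=1):
    """Body for creating an index with explicit shard and replica counts."""
    return {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }


print(total_shards())      # 10
print(total_shards(3, 2))  # 9
print(index_settings(3, 2))
```

Unlike number_of_shards, number_of_replicas can be changed on a live index, so it is common to start with few replicas and add more as nodes are added.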