These are some notes on Elasticsearch.

Basic concepts

For in-depth information, see the Basic Concepts page.

Cluster (physical unit)

A cluster is a collection of one or more nodes (servers) that together hold all of your data and provide search capabilities across all nodes. A cluster is identified by a unique name, which by default is "elasticsearch".

Node (physical unit)

A node is a single server that is part of your cluster, stores your data, and participates in the cluster's indexing and search capabilities. Just like a cluster, a node is identified by a name, which by default is a random Universally Unique IDentifier (UUID).

Index (non-physical unit)

An index is a collection of documents (data). An index is identified by a lowercase name. In a single cluster, you can define as many indexes as you want.

Type (non-physical unit)

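A type is a logical category of an index that lets you store different kinds of documents in the same index. Types were deprecated in Elasticsearch 6.0 and removed in later versions, so this concept applies only to older releases.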

Document (non-physical unit)

A document is a basic unit of information that can be indexed. For example, a document can represent a single customer, product, or order. Documents are expressed in JSON format. An index can contain as many documents as you want.
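
As a small illustration of "a document is expressed in JSON", the sketch below builds one customer document as a Python dict and serializes it. The field names and the helper to_json_document are hypothetical and chosen only for this example; they are not part of Elasticsearch.

```python
import json

# Hypothetical customer document; the field names are illustrative only.
customer = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "orders": 3,
}


def to_json_document(doc: dict) -> str:
    """Serialize a dict to the JSON text that would be sent for indexing."""
    return json.dumps(doc, sort_keys=True)


body = to_json_document(customer)
print(body)
```

Each such JSON object is one document; an index simply holds many of them.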

Shards (physical unit) and Replica (physical unit)

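An index can potentially store more data than fits on the disk of a single node, or so much data that a single node would be too slow to serve search requests on its own. To solve this, Elasticsearch provides the ability to divide your index into multiple pieces called shards. Sharding is important because it lets you split your data horizontally across nodes, and because operations can be distributed and run in parallel across shards, which increases throughput.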
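
To make the idea of dividing an index into shards more concrete, here is a minimal sketch of how documents could be spread over a fixed number of primary shards by hashing a routing value (by default the document ID). This is a simplification: Elasticsearch uses a Murmur3 hash internally, while this example uses zlib.crc32 as a stand-in, and the function names pick_shard and distribute are made up for illustration.

```python
import zlib


def pick_shard(routing_value: str, number_of_primary_shards: int) -> int:
    """Map a routing value to a shard number in [0, number_of_primary_shards)."""
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # crc32 is only a stand-in for the hash Elasticsearch really uses.
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % number_of_primary_shards


def distribute(doc_ids, number_of_primary_shards: int) -> dict:
    """Group document IDs by the shard they would land on."""
    shards = {n: [] for n in range(number_of_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, number_of_primary_shards)].append(doc_id)
    return shards


layout = distribute([f"doc-{i}" for i in range(20)], 5)
for shard, ids in layout.items():
    print(shard, ids)
```

Because the shard is derived from the hash modulo the number of primary shards, the same document always maps to the same shard, and changing the number of primary shards would change the mapping.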

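Elasticsearch also allows you to make one or more copies of your index's shards, called replica shards (or replicas). In the real world, failures can happen at any time; for example, a shard or node might go offline or disappear. Replication is important because it provides high availability if a shard or node fails (a replica is never allocated on the same node as the primary shard it was copied from), and because searches can run on replicas in parallel, which scales search throughput.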

By default, in Elasticsearch versions before 7.0, each index is allocated 5 primary shards and 1 replica. This means that if you have at least two nodes in your cluster, your index will have 5 primary shards and another 5 replica shards (1 complete replica), for a total of 10 shards per index. From version 7.0 onward, the default is 1 primary shard and 1 replica.
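
The shard arithmetic above can be checked with a short sketch. The setting names number_of_shards and number_of_replicas are the real index settings; the helper functions are hypothetical, and assignable_shards ignores every allocation rule except the one that copies of the same shard must live on different nodes.

```python
def index_settings(primaries: int = 5, replicas: int = 1) -> dict:
    """Build the settings body used when creating an index."""
    return {
        "settings": {
            "number_of_shards": primaries,
            "number_of_replicas": replicas,
        }
    }


def total_shards(primaries: int, replicas: int) -> int:
    """Primary shards plus one full set of copies per replica."""
    return primaries * (1 + replicas)


def assignable_shards(primaries: int, replicas: int, nodes: int) -> int:
    """Shards that can actually be placed, given the number of nodes.

    Simplifying assumption: the only constraint is that copies of the
    same shard are never placed on the same node.
    """
    copies_per_shard = min(1 + replicas, nodes)
    return primaries * copies_per_shard


print(index_settings())
print(total_shards(5, 1))          # 10
print(assignable_shards(5, 1, 1))  # 5
print(assignable_shards(5, 1, 2))  # 10
```

With a single node, only the 5 primaries can be placed and the 5 replicas stay unassigned, which is why the total of 10 requires at least two nodes.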