Friday, June 26, 2015

Basic Concepts in ElasticSearch


Index (Noun): An index is equivalent to a database in a relational database system. ElasticSearch stores data in one or more indices and uses the Apache Lucene library to write data to and read data from an index.

Index (Verb): The process of storing data in an index is called indexing.

Type: A type is equivalent to a table in a relational database. A type contains zero or more documents. Unlike a table, a type has a flexible structure, and you can add new fields whenever needed.

Field: A field is equivalent to a column in a relational database table. The data types of a type's fields can be defined through mappings.
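A flexible type and its mapping can be illustrated with a minimal Python sketch. It infers a field's data type from the first document that contains the field and adds new fields as later documents introduce them. The functions `infer_type` and `update_mapping` are hypothetical. The type names (string, long, double, boolean) are only meant to resemble ElasticSearch's dynamic mapping from this era, and real mappings carry far more detail.

```python
import json

# Hypothetical, simplified model of a type's mapping: field name -> data type.
# Real ElasticSearch mappings also hold analyzers, formats, nested objects, etc.

def infer_type(value):
    """Guess a data type from a JSON value (simplified, assumed rules)."""
    if isinstance(value, list):
        # A multi-valued field takes the type of its elements.
        return infer_type(value[0]) if value else None
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"

def update_mapping(mapping, document):
    """Add fields not yet in the mapping; existing fields keep their type."""
    for name, value in document.items():
        if name not in mapping:
            field_type = infer_type(value)
            if field_type is not None:  # empty lists give no type information
                mapping[name] = field_type
    return mapping

mapping = {}
update_mapping(mapping, {"title": "Basic Concepts", "views": 10})
# A later document brings a new field; no schema change is needed up front.
update_mapping(mapping, {"title": "Shards", "tags": ["lucene", "search"]})
print(json.dumps(mapping, sort_keys=True))
# {"tags": "string", "title": "string", "views": "long"}
```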

Document: A document is equivalent to a row in a relational database table. A document consists of fields, and each field has a name and one or more values. Each document can have a different set of fields. You can think of a document as a JSON object stored in a way that allows it to be searched efficiently.
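The phrase "searched efficiently" can be made concrete. Lucene builds an inverted index, which maps each term to the documents that contain it, so a search becomes a lookup rather than a scan of every document. The following toy Python sketch shows the idea with made-up documents, including a multi-valued field and documents with different sets of fields. The whitespace tokenizer and the (field, term) key layout are simplifying assumptions, not Lucene's actual data structures.

```python
from collections import defaultdict

# Made-up documents: fields may hold one or many values,
# and documents need not share the same set of fields.
documents = {
    "1": {"title": "Basic concepts in ElasticSearch", "tags": ["search", "lucene"]},
    "2": {"title": "Scaling a cluster", "author": "admin"},
}

def tokenize(text):
    """Lowercase and split on whitespace (real analyzers do much more)."""
    return text.lower().split()

def build_inverted_index(docs):
    """Map (field, term) -> set of IDs of documents containing that term."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for field, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                for term in tokenize(str(v)):
                    index[(field, term)].add(doc_id)
    return index

def search(index, field, term):
    """Look up matching IDs directly instead of scanning every document."""
    return sorted(index.get((field, term.lower()), set()))

inverted = build_inverted_index(documents)
print(search(inverted, "title", "ElasticSearch"))  # ['1']
print(search(inverted, "tags", "lucene"))          # ['1']
print(search(inverted, "author", "admin"))         # ['2']
```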

Cluster: A set of ElasticSearch nodes can form a cluster to achieve scalability and availability.
Nodes can be added to scale the cluster horizontally, and ElasticSearch distributes the load among them. Data is replicated across nodes, so if a node fails, another node that holds a copy of its data takes over serving that data, which ensures high availability.
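Failover can be sketched in Python with a hypothetical three-node cluster in which every piece of data (ElasticSearch calls these pieces shards, as described in the next section) is held on two different nodes. The node names, shard names and placement table are made up for illustration. ElasticSearch decides placement and failover itself, so this models only the outcome, not the mechanism.

```python
# Hypothetical placement: each shard has copies on two different nodes,
# so losing any single node loses no data.
placement = {
    "shard-0": ["node-a", "node-b"],
    "shard-1": ["node-b", "node-c"],
    "shard-2": ["node-c", "node-a"],
}

def serving_node(shard, live_nodes):
    """Return the first live node holding a copy of the shard, or None."""
    for node in placement[shard]:
        if node in live_nodes:
            return node
    return None

live = {"node-a", "node-b", "node-c"}
print({s: serving_node(s, live) for s in placement})
# {'shard-0': 'node-a', 'shard-1': 'node-b', 'shard-2': 'node-c'}

# node-b fails: its data is now served from the copies on the other nodes.
live.discard("node-b")
print({s: serving_node(s, live) for s in placement})
# {'shard-0': 'node-a', 'shard-1': 'node-c', 'shard-2': 'node-c'}
```

Only when every node holding a copy is down does `serving_node` return `None`, which is why copies are placed on different nodes.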

Shard: ElasticSearch distributes the data of an index into working units called shards, so it is the shards that physically store the documents. Each shard is a fully functional Lucene instance in itself. ElasticSearch balances shards across the nodes of a cluster transparently. A shard is either a primary shard or a replica shard. Each document in an index belongs to exactly one primary shard, and ElasticSearch determines which shard holds a given document from the document's routing value (by default, its ID). Replica shards are copies of primary shards that provide read scalability and failover.
Reads can be served by either a primary shard or a replica shard, but writes can happen only on the primary shard, which then copies the change to its replicas.
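Routing and the primary/replica read-write split can be sketched in Python. ElasticSearch's documentation describes the routing formula as `shard = hash(routing) % number_of_primary_shards`, where the routing value defaults to the document ID. This sketch substitutes Python's `zlib.crc32` for ElasticSearch's internal hash function. `ShardGroup`, `index_doc` and `get_doc` are hypothetical names, and the network acknowledgement and replication steps of a real write are omitted.

```python
import itertools
import zlib

NUM_PRIMARY_SHARDS = 3  # hypothetical value, fixed when the index is created

def shard_for(doc_id):
    """Pick a primary shard from the document ID (the default routing value).
    zlib.crc32 stands in for ElasticSearch's own hash so results are stable."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_PRIMARY_SHARDS

class ShardGroup:
    """One primary copy plus its replicas, each a dict of doc_id -> document."""

    def __init__(self, num_replicas=1):
        self.primary = {}
        self.replicas = [{} for _ in range(num_replicas)]
        # Round-robin over all copies to spread read load.
        self._read_cycle = itertools.cycle([self.primary] + self.replicas)

    def write(self, doc_id, doc):
        # Writes land on the primary first, then are copied to every replica.
        self.primary[doc_id] = doc
        for replica in self.replicas:
            replica[doc_id] = dict(doc)

    def read(self, doc_id):
        # Reads may be served by the primary or by any replica.
        return next(self._read_cycle).get(doc_id)

groups = [ShardGroup() for _ in range(NUM_PRIMARY_SHARDS)]

def index_doc(doc_id, doc):
    groups[shard_for(doc_id)].write(doc_id, doc)

def get_doc(doc_id):
    return groups[shard_for(doc_id)].read(doc_id)

index_doc("42", {"title": "Shards"})
print(get_doc("42"))  # served by the primary: {'title': 'Shards'}
print(get_doc("42"))  # served by the replica: {'title': 'Shards'}
```

Because the shard is computed modulo the number of primary shards, changing that number would send existing IDs to different shards. This is why ElasticSearch fixes it when the index is created.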
