Indexes are used to improve the speed of data-retrieval operations on a data store.
An index trades increased storage overhead and slower writes (since every write must update both the data and the index) for faster reads. Indexes let us quickly locate data without examining every row in a database table. An index can be created on one or more columns of a table, providing the basis for rapid random lookups and efficient access to ordered records.
An index is a data structure that can be thought of as a table of contents pointing to where the actual data lives. When we create an index on a column of a table, we store that column's values together with pointers to the corresponding rows. Indexes are also used to create different views of the same data. This is an excellent way to specify different filters or sorting schemes for large data sets without making multiple additional copies of the data.
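As a rough illustration only (production databases use on-disk structures such as B-trees, not an in-memory dictionary), a single-column index can be sketched as a map from a column value to the positions of the matching rows. All row data and names here are made up for the example:

```python
# Minimal sketch of a single-column index: a map from column value
# to the offsets of matching rows in an in-memory "table".

rows = [
    {"id": 1, "city": "Paris", "name": "Ada"},
    {"id": 2, "city": "Tokyo", "name": "Grace"},
    {"id": 3, "city": "Paris", "name": "Edsger"},
]

# Build the index on the "city" column: value -> list of row offsets.
city_index = {}
for offset, row in enumerate(rows):
    city_index.setdefault(row["city"], []).append(offset)

def find_by_city(city):
    # Index lookup: jump straight to the matching rows
    # instead of scanning the whole table.
    return [rows[offset] for offset in city_index.get(city, [])]

def insert(row):
    # This is where writes pay the cost: we must append the row
    # AND keep the index up to date.
    rows.append(row)
    city_index.setdefault(row["city"], []).append(len(rows) - 1)

print([r["name"] for r in find_by_city("Paris")])  # ['Ada', 'Edsger']
```

Note how `find_by_city` never touches non-matching rows, while `insert` does strictly more work than it would without the index: exactly the read/write trade-off described above.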
We can apply this concept to larger data sets just as we do in a traditional relational data store. The trick with indexes is carefully considering how users will access the data. In data sets that are many terabytes in size but have tiny payloads (e.g., 1 KB), indexes are essential for optimizing data access: finding a small payload in such a large data set is a real challenge, since we can't possibly iterate over that much data in any reasonable time. Furthermore, such a large data set is likely spread over several physical devices, so we need some way to find the correct physical location of the desired data. Indexes are the best way to do this.
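To sketch that last point, a higher-level index can map a key to the physical node that stores it. The range boundaries and node names below are entirely hypothetical; the point is that a binary search over a small routing index replaces asking every device for the key:

```python
# Hypothetical sketch: routing a key to the device that holds it.
# A sorted list of range boundaries partitions the key space;
# boundaries and node names are invented for illustration.
import bisect

# Keys in [0, 100) -> "node-a", [100, 200) -> "node-b", and so on.
boundaries = [100, 200, 300]
nodes = ["node-a", "node-b", "node-c", "node-d"]

def locate(key: int) -> str:
    # Binary-search the boundaries to find the owning node,
    # rather than scanning or querying every physical device.
    return nodes[bisect.bisect_right(boundaries, key)]

print(locate(42))   # node-a
print(locate(150))  # node-b
print(locate(300))  # node-d
```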