A Gentle Introduction to SQL Indexes

In this article, we will take a ‘gentle’ introduction to SQL indexes 👾

7 min readApr 28, 2023

💭 Understanding Indexes

Indexes are used to logically sort data in order to speed up searching and sorting operations. Imagine the index at the beginning of a book to better understand indexes.

Assume you wish to find every instance of the word cheese cake in this book. The simplest method would be to start on page 1 and scan every line of every page for matches. Although this works, it is clearly not a viable option.

While scanning a few pages of text is possible, scanning a complete book is not. As the volume of text to be searched grows, so does the time required to find the desired information.

That is why book indexes exist. An index is an alphabetical list of words that includes references to where they appear in the book. To search for cheese cake, look it up in the index to see what pages it appears on. Then you go to those specific pages to look for your matches.

🥸 What makes an index work?

Simply said, it is the fact that it is correctly sorted. The problem in finding words in a book is not the amount of contentthat must be searched; rather, the content is not sorted by word. An index is not required if the content is ordered in the manner of a dictionary (which is why dictionaries do not have indexes).

Database indexes function in a similar manner. Primary key data is always sorted; it’s something the DBMS performs for you. Consequently, retrieving specific rows by primary key is always a fast and efficient operation.

However, searching for values in other columns is frequently inefficient. What if you want to retrieve all clients who live in a given state, for example? Because the table is not sorted by state, the DBMS must read every row in the table (beginning with the first row) looking for matches, just like you were trying to discover words in a book without an index.

The solution is to use an index. You can create an index on one or more columns to keep a sorted list of the contents for your own usage. When an index is defined, the DBMS uses it in the same way that a book index is used. It scans the sorted index for any matches and then retrieves those specific rows.

🫥 What is indexing?

Indexing is a method of sorting a large number of records based on various fields. When you construct an index on a field in a table, you create another data structure that contains the field value as well as a pointer to the record to which it is related. This index structure is then sorted, making Binary Searches possible.

Well in “general” indexes is a B-tree or Hash table.

B-trees are the most preferred data structure for indexes because they are time efficient — look-ups, removals, and insertions can all be done in logarithmic time. Another important reason B- trees are more widely employed is that the data stored within the B- tree can be sorted. Typically, the RDBMS determines which data structure is utilized for an index. However, with some RDBMSs, you can actually select whatever data structure you want your database to use when you construct the index.

↔️ Types of indexes

Clustered index creates a physical order of rows (it can only be one and is typically also a primary key — if you create a primary key on a database, you also build a clustered index on that table).
Non-clustered index is also a binary tree but it doesn’t create a physical order of rows. So the leaf nodes of non-clustered index contain PK (if it exists) or row index.
Unique/Non-Unique index is an index that help maintain data integrity by ensuring that no two rows of data in a table have identical key values.
A unique index ensures that the values in the index key columns are unique.
A beautiful information: (When a unique constraint is created, a corresponding unique index is automatically created on the column(s).)
Partitioned and non-partitioned indexes
When you construct a non-partitioned index on a partitioned table, a single index object is created that referencesto all rows in the table. Regardless of whether the table data partitions span several table spaces, non-partitioned indexes are always created in a single table space.
Bidirectional indexes allow scanning in both forward and reverse directions.
When you construct an index on a column without specifying any keywords or specifying the ASC keyword, the database server saves the key values in ascending order. When the DESC keyword is specified, the database server keeps the key values in descending order.
Expression-based indexes (function based index) You may create an index that includes expressions. When the dbms creates an index on the same expressions, the performance of queries containing expressions improves.
For example, you could create an index by applying the function higher/lower to the city column to allow for quick, case-insensitive lookups of city names.

👾 As an example, we want to run a query to find all the details of any school located in ‘Jordan’?

🤨 Let’s consider what would happen if there was no index.

DBMS would have to examine every single row in the School database to determine if the location for that row is ‘Jordan.’ And, because we want every row with the location ‘Jordan’ inside it, we can’t just stop looking when we locate one row with the location ‘Jordan,’ because there may be other rows with the location Jordan. As a result, every row up until the last row must be searched, which means that the database must analyze thousands of rows in this situation to discover the rows with the location ‘Jordan.’ This is known as a full table scan.

😶‍🌫 ️What exactly is contained within a db index?

So you now know that a database index is created on a column in a table and that the index keeps the values in that specific column. However, it is critical to note that a database index does not store the values in the other columns of the same table. For example, if we establish an index on the location column, the other column values will not be kept in the index. If we simply stored all of the other columns in the index, it would be the same as making a duplicate of the entire table, which would take up far too much space and be wasteful.

😐 How does a database know when to use an index?

When a query like “SELECT * FROM School WHERE location = ‘Jordan’” is executed, the database will check to see if there is an index on the column(s) being requested. Assuming the location column has an index, the database must assess whether it makes sense to utilize the index to find the values being searched — since there are some cases where using the database index is actually less efficient than simply scanning the entire table.

🫡 Creating an Index

This is the basic create index command.

You can create a unique index that prevents duplicate values from being added into the table.

You can also create a composite index (two or more columns of a table).

Indexes are created with the CREATE INDEX statement (which varies dramatically from one DBMS to another), so I would like to suggest to see your DBMS documentation for creating an index.

⚠️ Indexing Warning ⚠️

Indexes improve retrieval performance while degrading data insertion, modification, and deletion performance. When these operations are performed, the DBMS must dynamically update the index.
Index data can take up lots of storage space.
Not all data is appropriate for indexing. Data that is not sufficiently unique (for example, state) will not benefit as much from indexing as data with a greater number of possible values (First Name or Last Name, for example).
Indexes are utilized in data filtering and sorting. If you often arrange data in a particular order, that data may be suitable for indexing.
An index can have multiple columns (for example, District and City). Only when the data is sorted by District and City would such an index be useful. (If you wish to sort by city, this index is useless.)

I hope you found this article was fun and interesting!❤️