Writer: OLUTAYO AWODELE, senior database consultant, Polaris Bank Limited:
Databases are a critical component of applications that power small and large businesses.
The information stored in the databases gets frequently retrieved or accessed by applications to meet the demands of the users, who will rely on the output as input into other processes.
It is vital to ensure fast access to retrieve records stored in the database within a reasonable threshold.
To ensure that information is readily available when needed, some tasks can be carried out once in the lifetime of an application, and there will be no issues, whereas others must be done regularly for the life span of the application.
One critical activity that falls under a continuous task is database optimisation, and attempting to wrongfully categorise this will lead to disaster in the performance of the application or the database itself.
In this article, I will focus on what optimisation is, optimisation approaches, benefits, and challenges of optimisation.
The techniques and examples I will share are all based on some interventions in resolving database performance degradations in my database administration journey.
What is Database Optimisation?
Database optimisation is analysing various database components, including SQL queries and tuning them to improve database performance.
It is an iterative process to maximise the speed and efficiency of data retrieval. It requires collaboration among all the stakeholders (application developers, database engineers, network engineers, and system engineers) to implement the required adjustments that will result in the improved performance of the database.
Most of the database and application setups work efficiently at the initial stage, but as the data grows, the performance may become degraded.
The degradation may be in the form of queries running slower, CPU usage spikes, or slow I/O responses.
These bottlenecks are signs that regular review and analysis of the database are essential to identify and fix deviations.
Database Optimisation Approaches
Several techniques can be adopted in optimising databases for better performance; some of these are as follows:
1. Make use of indexes
Indexes help to improve the performance of queries running on the database by reducing the number of scan, sort, or join operations.
It is essential to identify columns used to limit record retrieval in a query and create indexes on such columns to speed up the query execution.
For instance, whenever queries run slowly, you can check the columns specified in the WHERE clause, confirm if they are indexed and if not, create appropriate indexes on them.
It is crucial to note that while indexes will make read operations faster, they can slow down write operations because indexes will be updated whenever changes are made to the tables.
2. Data retrieval
In most cases, the scripts that result in database performance degradation were found to be retrieving more records than needed.
For example, selecting all the columns in a table when you only need a few columns to meet the requirements of the application module.
Reducing the amount of record retrieval per time will reduce the pressure on the CPU and other hardware resources.
3. Batch DML (Data Manipulation) operations
Data manipulation languages include insert, delete, and update; these languages make changes to the records stored in the database, and they must be carefully used to improve database performance. Making changes, such as deleting or updating a large chunk of data at a time, can take longer to complete, leading to waits by other processes due to the lock mechanism placed on the table to ensure data consistency.
The waits for other processes while making changes to records in a table can be reduced by batching the DML operations instead of carrying them out as a single transaction. Batching long-running activities will save time during rollbacks in the event of disruptions when the query is executing.
4. Database statistics
The optimiser makes use of the statistics on the table to determine the choice of execution plan to use for queries. If the statistics on the tables are stale, the optimiser may not choose the best
execution plan that will return faster. Not using the best execution plan will not only affect the speed of query execution but will also place much pressure on the hardware resources (CPU, memory, and I/O) of the system. It is a good practice to periodically update statistics on the database objects to ensure the statistics are not at variance with the actual records in the database.
5. Index fragmentation
Changes to the database, such as deletion, update, and insertion of records, lead to fragmentation of the indexes, which can lead to poor database performance.
The best practice is to periodically check the index fragmentation percentage in the database and implement maintenance processes such as index rebuild or reorganisation to remove the fragmentation.
6. Query Optimisation
It involves reviewing and analysing queries running on the database for possible improvements.
EXPLAIN PLAN can be used to understand the query execution plan and where most of the cost lies. Once the highest cost is identified, some adjustments can be implemented, such as re-arrangements of the columns used in the where clause and a review of the use of functions to eliminate bottlenecks.
7. Proper Sizing of Hardware resources
As the data stored in the database grows, there may be a need to increase the resource allocation to meet the increasing demands.
However, increasing the resource allocation every time may not be the solution; rather, it will be necessary to review other parameters to make the right decisions.
In some cases, if the statistics on the tables are not correct, the optimiser can choose an execution plan that seems appropriate based on the available statistics, which may end up causing a CPU spike.
It is pertinent always to check that the queries running on the databases are reviewed and tuned to ensure that hardware resources are only increased or decreased based on proven facts.
8. Connection Pooling
It reduces the cost of opening and closing connections by maintaining a pool of open connections that can be passed from one database operation to another.
It helps manage and reuse database connections efficiently. The impact of connection pooling may be less evident in small applications than in large ones, and implementing this should be carefully considered.
9. Partitioning
It involves dividing large tables into smaller and more manageable pieces. Query performance can be enhanced by allowing the database engine to work on a smaller subset of data, resulting in faster query execution. If partitioning is properly implemented, it eliminates the need to scan the entire dataset but only the partition of the dataset specified in the query.
10. Data Archiving
This is the process of detecting obsolete data that is no longer required in real time for the program to function properly.
Such records can be archived or deleted to reduce the number of records accessed, improving query performance and lowering storage requirements. For example, tables with huge historical datasets or logs should be archived.
Benefits of Database Optimisation
Database optimisation offers several benefits to organisations as follows:
- Cost saving: Optimised databases will require fewer resources, reducing the cost of hardware resources.
- Improved user experience: Faster data access/retrieval enhances user satisfaction.
- Increased query efficiency: Complex queries can execute faster without experiencing delays.
- Scalability: Optimised databases can handle more concurrent requests and large volumes of data.
Challenges of Database Optimisation
- It is time-consuming: optimising databases requires much time and expertise to identify bottlenecks and proffer solutions.
- Complexity: Making multiple adjustments to configurations can create complications in the maintenance of the database system.
Conclusion
Database optimisation is not a one-time event but a continuous activity that must be carried out throughout the lifespan of the database.
It involves many considerations to decide which part of the database to tweak, leading to performance improvement or degradation.
Setting and having a performance baseline helps to spot any deviation in the performance of the database quickly.