Dive into Table Fragmentation and Optimization Techniques in MariaDB and PostgreSQL

Introduction

Fragmentation refers to the state of being broken into fragments. In computer science, it means that a file or a set of data is stored in several non-contiguous areas of storage rather than in one continuous region. The same concept applies to databases, whose data is stored on disk, giving rise to the idea of database fragmentation.

Understanding database fragmentation

Database fragmentation is common in many database management systems (DBMS) because structured data is stored on disk in blocks.

When creating a table, the length of each row is unknown initially, as columns may have variable lengths. Consequently, the allocated space for a row might not be fully utilized. In certain DBMS, when rows are deleted, they are often marked as deleted but not immediately removed, leaving behind unused free space.

Even in sharded databases, fragmentation can occur depending on how data is stored. Fragmentation can increase the time taken by CRUD (Create, Read, Update, Delete) operations, because the engine has to read more blocks to reach the same amount of useful data. Defragmentation addresses this issue by consolidating smaller chunks into larger ones, which shortens CRUD operation times.

MariaDB

To check whether a table is fragmented, run:

SHOW TABLE STATUS LIKE 'your_table_name';

This returns one row for the table. Look at the Data_free column, which shows the amount of space, in bytes, that is allocated to the table but not currently in use. A large value relative to the table's size suggests that the table holds unused space.
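To check every table in a database at once, the same figures can be read from information_schema.TABLES. The following is a minimal sketch: 'your_database_name' is a placeholder, and the free_pct column is an illustrative ratio rather than an official fragmentation metric.

```sql
-- List tables by unused allocated space (sizes converted from bytes to MB).
SELECT
    TABLE_NAME,
    ENGINE,
    ROUND(DATA_LENGTH / 1024 / 1024, 2)  AS data_mb,
    ROUND(INDEX_LENGTH / 1024 / 1024, 2) AS index_mb,
    ROUND(DATA_FREE / 1024 / 1024, 2)    AS free_mb,
    -- Share of the allocated space that is free; NULLIF avoids division by zero.
    ROUND(100 * DATA_FREE / NULLIF(DATA_LENGTH + INDEX_LENGTH + DATA_FREE, 0), 1) AS free_pct
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database_name'  -- placeholder: replace with your schema
  AND TABLE_TYPE = 'BASE TABLE'
ORDER BY DATA_FREE DESC;
```

For InnoDB these values are estimates, and when tables share a tablespace instead of using one file per table, Data_free describes the shared tablespace rather than a single table.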

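To defragment and optimize your table, you have two options: run OPTIMIZE TABLE, or rebuild the table with ALTER TABLE (optionally as an online operation):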

OPTIMIZE TABLE your_table_name;
ALTER TABLE your_table_name ENGINE=InnoDB;
ALTER ONLINE TABLE your_table_name ENGINE=InnoDB; # Rebuild without blocking concurrent operations
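To see the effect of a rebuild, one can fragment a scratch table on purpose and compare Data_free before and after. This is only a sketch under stated assumptions: frag_demo is a hypothetical table name, and the seq_1_to_100000 source relies on MariaDB's Sequence storage engine being available on your server.

```sql
-- Hypothetical scratch table used only for this demonstration.
CREATE TABLE frag_demo (
    id INT AUTO_INCREMENT PRIMARY KEY,
    payload VARCHAR(255) NOT NULL
) ENGINE=InnoDB;

-- Fill the table with 100,000 rows (assumes the Sequence engine is enabled).
INSERT INTO frag_demo (payload)
SELECT REPEAT('x', 200) FROM seq_1_to_100000;

-- Delete a contiguous half of the rows so that whole pages become free.
DELETE FROM frag_demo WHERE id <= 50000;

-- Inspect Data_free, rebuild, then inspect it again.
SHOW TABLE STATUS LIKE 'frag_demo';
OPTIMIZE TABLE frag_demo;
SHOW TABLE STATUS LIKE 'frag_demo';

-- Clean up.
DROP TABLE frag_demo;
```

Data_free would be expected to be higher after the DELETE than after the rebuild, but InnoDB statistics are approximate and may lag, so treat the numbers as indicative.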

PostgreSQL

PostgreSQL has no single command for this analysis, but the following query is one way to approach it:

SELECT
    table_name,
    pg_size_pretty(total_size) AS total_size,
    pg_size_pretty(data_size) AS data_size,
    pg_size_pretty(index_size) AS index_size,
    -- What remains is TOAST storage plus the free space and visibility maps
    pg_size_pretty(total_size - data_size - index_size) AS other_size
FROM (
    SELECT
        table_name,
        pg_total_relation_size(rel) AS total_size,
        pg_relation_size(rel) AS data_size,
        pg_indexes_size(rel) AS index_size
    FROM (
        SELECT
            table_name,
            -- Build a schema-qualified, safely quoted name for the size functions
            format('%I.%I', table_schema, table_name)::regclass AS rel
        FROM information_schema.tables
        WHERE table_schema = 'public' -- Modify this if your tables are in a different schema
          AND table_type = 'BASE TABLE'
    ) AS tables
) AS sizes
ORDER BY sizes.total_size DESC; -- Sort by the numeric size, not the formatted text

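A large difference between total_size and data_size mostly reflects indexes and TOAST storage, so on its own it does not prove fragmentation. Treat it as a hint: a table whose size keeps growing while its row count stays roughly flat probably holds unused space left behind by deleted or updated rows.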
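A more direct signal is the number of dead rows that PostgreSQL tracks per table in the pg_stat_user_tables statistics view. The query below is a minimal sketch; the dead_pct column is an illustrative ratio, and the counts are estimates maintained by the statistics collector rather than exact figures.

```sql
-- Tables with the most dead (deleted or superseded) rows first.
SELECT
    relname,
    n_live_tup,
    n_dead_tup,
    -- Share of rows that are dead; NULLIF avoids division by zero on empty tables.
    ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
    last_vacuum,
    last_autovacuum
FROM pg_stat_user_tables
WHERE schemaname = 'public'  -- Modify this if your tables are in a different schema
ORDER BY n_dead_tup DESC;
```

A table with a high dead_pct and an old or empty last_autovacuum is a reasonable first candidate for a manual vacuum.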

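To defragment and optimize your table, you also have two options. A plain VACUUM marks the space held by dead rows as reusable without blocking normal reads and writes, but it usually does not shrink the table file. VACUUM FULL rewrites the whole table and returns the freed space to the operating system, at the cost of an exclusive lock on the table while it runs: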

VACUUM your_table_name;
VACUUM FULL your_table_name;
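To judge whether either command helped, it can be useful to measure the table before and after. The sequence below is a sketch using the same your_table_name placeholder; the VERBOSE and ANALYZE options are optional additions that print progress details and refresh planner statistics.

```sql
-- Size of the table's main data before maintenance.
SELECT pg_size_pretty(pg_relation_size('your_table_name')) AS size_before;

-- Reclaim dead rows for reuse and refresh statistics; does not block reads or writes.
VACUUM (VERBOSE, ANALYZE) your_table_name;

-- Rewrite the table to return space to the operating system.
-- Takes an exclusive lock, so run it in a maintenance window.
VACUUM FULL your_table_name;

-- Size of the table's main data after maintenance.
SELECT pg_size_pretty(pg_relation_size('your_table_name')) AS size_after;
```

If the plain VACUUM is enough to stop the table from growing, the VACUUM FULL step can be skipped to avoid the lock.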

Conclusion

In conclusion, database fragmentation is a common challenge in the world of database management, impacting the efficiency of CRUD operations. Defragmentation plays a crucial role in mitigating this issue by consolidating data and optimizing storage.

Fragmentation is not limited to relational databases like MariaDB and PostgreSQL; it also affects NoSQL databases such as MongoDB, albeit in a different manner. In MongoDB, sharded collections and data distribution strategies can influence fragmentation.

Stay tuned for our next article, where we will delve deeper into the unique aspects of database fragmentation in NoSQL databases, particularly focusing on MongoDB. We will explore strategies to address fragmentation and optimize performance in the dynamic landscape of non-relational databases.
