Освобождение свободного места после delete в PostgreSQL

3 minute read

img

Суть проблемы

У вас есть таблицы, либо ряд таблиц, строки которых нужно очистить и единственный способ, которым вы можете это сделать - это операция DELETE.

delete from schema.table where condition;

Конечная цель изменения данных - это особождение места (дискового пространства), но на деле, после удаления данных - свободного места больше не становится.

Частично эту проблему решают команды vacuum, analyze либо vacuum full, но у первых двух есть свои проблемы, подробности есть в документации, либо краткое пояснение:

In PostgreSQL, an UPDATE or DELETE of a row does not immediately remove the old version of the row. This approach is necessary to gain the benefits of multiversion concurrency control (MVCC, see Chapter 13): the row version must not be deleted while it is still potentially visible to other transactions. But eventually, an outdated or deleted row version is no longer of interest to any transaction. The space it occupies must then be reclaimed for reuse by new rows, to avoid unbounded growth of disk space requirements. This is done by running VACUUM.

When ever you delete a row, you essentially mark it as inactive for your snapshot. Other snapshots started by prior transactions can still see it. This means it’s still on disk.

Rows aren’t stored in their own file on disk. In order to delete that row, even when you want, you have to essentially rewrite the files stored on disk without that row. DELETE does not do this and under normal circumstances that’s perfectly fine because dead rows that aren’t visible to a transaction can be reclaimed internally by a simply VACUUM. After being marked for reuse new data can be stored in the heap.

So essentially, only operations that internally cause the table’s heap to be rewritten reclaim space. Those operations are:

  • CLUSTER
  • VACUUM FULL
  • TRUNCATE
  • ALTER TABLE (forms that rewrite the table)

All of these operations require an ACCESS EXCLUSIVE lock. And, TRUNCATE is such a sledge hammer it even violates MVCC.

Решение

Исправлять ситуацию будем с помощью утилиты pgcompacttable, ссылка github.

Ставим зависимости:

  • on Debian-based Linux OS with apt-get install libdbi-perl libdbd-pg-perl
  • RedHat/Centos with yum install perl-Time-HiRes perl-DBI perl-DBD-Pg

Переходим в интерактивный режим psql:

sudo -u postgres psql
\c your_database
create extension if not exists pgstattuple;

grant execute on function pgstattuple(text) to <db user>;
grant execute on function pgstattuple(regclass) to <db user>;
grant execute on function pgstattuple_approx to <db user>;


alter user postgres with password 'your new admin password';
\q

Запускаем SQL запрос для проверки данных в таблицах.

select 
    table_name,
    pg_size_pretty(relation_size + toast_relation_size) as total_size,
    pg_size_pretty(toast_relation_size) as toast_size,
    round(
        ((relation_size - (relation_size - free_space)*100/fillfactor)*100
        /greatest(relation_size, 1))::numeric
    , 1) table_waste_percent,
    pg_size_pretty(
        (relation_size - (relation_size - free_space)*100/fillfactor)::bigint
    ) table_waste,
    round(
        ((toast_free_space + relation_size - (relation_size - free_space)*100/fillfactor)*100
        /greatest(relation_size + toast_relation_size, 1))::numeric
    , 1) total_waste_percent,
    pg_size_pretty(
        (toast_free_space + relation_size - (relation_size - free_space)*100/fillfactor)::bigint
    ) total_waste
from (
    select
        (case 
            when n.nspname = 'public' 
                then format('%I', c.relname) 
            else format('%I.%I', n.nspname, c.relname)
        end) as table_name,
        (select approx_free_space from pgstattuple_approx(c.oid)) as free_space,
        pg_relation_size(c.oid) as relation_size,
        (case 
            when reltoastrelid = 0 
                then 0 
            else (select free_space from pgstattuple(c.reltoastrelid))
        end) as toast_free_space,
        coalesce(pg_relation_size(c.reltoastrelid), 0) as toast_relation_size,
        coalesce(
            (SELECT (regexp_matches(reloptions::text, E'.*fillfactor=(\\d+).*'))[1])
        ,'100')::real AS fillfactor
    from pg_class c
    left join pg_namespace n on (n.oid = c.relnamespace)
    where 
        nspname not in ('pg_catalog', 'information_schema')
        and nspname !~ '^pg_toast' 
        and nspname !~ '^pg_temp' 
        and relkind in ('r', 'm') 
        and (relpersistence = 'p' or not pg_is_in_recovery())
    --put your table name/mask here
    and relname ~ :'tablename'
) t
order by (toast_free_space + relation_size - (relation_size - free_space)*100/fillfactor) desc
limit 20;

Далее забираем себе локально perl скрипт.

git clone https://github.com/dataegret/pgcompacttable.git

И запускаем с необходимыми параметрами, полный список можно посмотреть через:

perl pgcompacttable --man

Команда:

perl pgcompacttable \
--dbname <database_name> \
-n <schema_name> \
-h localhost \
-p 5432 \
-U <user> \
-W <password> \
-t <table_name> \
-f

В дополнение можно провести:

vacuum <table_name>
analyze <table_name>

Большое спасибо всем за внимание! Мой телеграм канал artydev & Co — подписывайтесь!