Housekeeping in Oracle: How to Get Rid of Old Data
Have you ever tried to delete a few million rows from a table with several hundred milllions of rows with a DELETE statement? If yes, then you know that this is a very bad idea. With Oracle Partitioning you can do this more elegant and much faster.
Housekeeping must be done – whether you like it or not
Not long ago, I provided performance support for one of my regular customers. They have a housekeeping job to delete transactional data that is no longer relevant. All transactions older than one year should be deleted from the system (in this case not a data warehouse). Because the amount of data is large, this job runs every night – and it runs for a long time, too long.
Since I have seen similar problems in many data warehouse systems, I proposed a solution with partitioning. I would like to reproduce this solution here – in a slightly simplified form. For my example, I want to delete all outdated orders from an ORDERS table, including the detail information in a child table ORDER_ITEMS. The two tables are connected with a foreign key constraint, and both tables have foreign keys to other tables. The example is based on a demo schema I use for training purposes.
Demo tables ORDERS and ORDER_ITEMS used in the housekeeping job
Existing Housekeeping Job
The most obvious way to delete old orders and order items has the following steps:
- Delete all order items from the detail table ORDER_ITEMS that are older than 12 months. Because the order date is not available in table ORDER_ITEMS, a subquery is used to derive the corresponding keys for ORDER_ID.
- After deleting the order items, the orders can be deleted from the master table ORDERS. The order date is stored in this table and can be used directly in the WHERE condition.
DELETE FROM order_items
WHERE order_id IN (SELECT id
WHERE order_date < ADD_MONTHS(SYSDATE, -12));
DELETE FROM orders
WHERE order_date < ADD_MONTHS(SYSDATE, -12);
This solution deletes all orders and order items correctly, but it is very slow for a large amount of data. So we have to find a more efficient way to get rid of the old data.
Partitioning Strategy for Housekeeping
A common approach to delete old data from large tables is to use partitioning. It is much faster to drop a partition from a table than to delete many rows with a DELETE command. For our use case – a daily housekeeping job – the best stategy is to use daily partitions for both tables. So, before we can implement a new solution, we must partition the tables by day.
Since Oracle 18c, modifying the partitioning strategy for an existing table has become much easier, With the MODIFY PARTITION clause, this can be done with one DDL command. For the ORDERS table, we want to use Interval Range Partitioning by ORDER_DATE, with one partition for every day. The following command (re-)partitions the table and creates a partition for every day that contains at least one order. Index maintenance is also included in the command, so any existing indexes are rebuilt. If the partition key column is part of the index, a local index is created, otherwise a global index.
ALTER TABLE orders
MODIFY PARTITION BY RANGE(order_date)
(PARTITION old_data VALUES LESS THAN (DATE’2020-01-01′));
For the table ORDER_ITEMS, this partitioning strategy cannot be used, because ORDER_DATE is not available in this table. But we can use another partitioning feature: Reference Partitioning. With the clause PARTITION BY REFERENCE, the detail table is partitioned with the same strategy as the master table. Since Oracle 12c, this also works in combination with Interval Partitioning.
The command to modify the partitioning strategy of table ORDER_ITEMS is very simple, we don’t even have to define the interval. The partition key ORDI_ORD_FK is the name of the foreign key constraint. This foreign key constraint between the tables must be available and enabled. If we use Oracle 18c or above, we can do all this with one MODIFY PARTITION clause:
ALTER TABLE order_items
MODIFY PARTITION BY REFERENCE(ordi_ord_fk);
Now, both tables contain daily partitions for all order dates that exist in the ORDERS table. This is the precondition for the new housekeeping job.
Drop Outdated Partitions
Once the ORDERS table is partitioned, deleting all orders for a single day is no longer a major challenge. To do this, simply drop the associated partition. However, two details are worth mentioning:
- With Interval Partitioning, we typically don’t know the partition names, because they are automatically generated and have a name pattern like SYS_P123435. Fortunately, there is another way to specify which partition is affected. With DROP PARTITION FOR (value), we can tell Oracle: “drop the partition in which value fits”,
- Local index partitions are automatically dropped with the table partition. But how about global indexes? The index entires for the dropped partition are not valid when the partition does not exist anymore. This was a major problem in older Oracle versions, and global indexes had to be rebuilt after a DROP PARTITION command. Since Oracle 12c, a features called Asynchronous Global Index Maintenance solves this problem. When you add UPDATE INDEXES to the DDL statement, this does not rebuild the global indexes anymore like in older releases. Instead, the affected indexes are marked (column ORPHANED_ENTRIES = ‘YES’ in USER_INDEXES), but the indexes stay valid. The asynchronous global index maintenance job will then regularly delete the unused entries from the indexes.
ALTER TABLE orders
DROP PARTITION FOR (TO_DATE(‘2021-06-28’, ‘YYYY-MM-DD’))
For the detail table ORDER_ITEMS, it is even simpler to drop a partition. What we have to do is – nothing! With the concept of Reference Partitioning, this is done automatically:
- For each partition that is dropped in the master table (ORDERS), the corresponding partition in the detail table (ORDER_ITEMS) is dropped, too.
- Local index partitions on the detail table are also dropped.
- Global indexes of the detail table are marked with ORPHANED_ENTRIES = ‘YES’ in the same way as the master table, and cleaned up with the asynchronous global index maintenance job.
Improved Housekeeping Job
For our new housekeeping job, we have just to drop all daily partitions from the ORDERS table that are older than one year. Everything else (dropping the table partitions from ORDER_ITEMS and all corresponding local index partitions from both tables, as well as the global index maintenance) is done automatically by Oracle.
The housekeeping job can be implemented as a PL/SQL block (or a procedure) with Dynamic SQL to run the DROP PARTITION commands. In case that the job is not executed every day, it does not drop only one partition, but all partitions that are older than one year.
CURSOR cur_days IS
SELECT DISTINCT order_date
WHERE order_date < ADD_MONTHS(SYSDATE, -12)
ORDER BY order_date;
FOR r_day IN cur_days LOOP
v_date_string := TO_CHAR(r_day.order_date, ‘YYYY-MM-DD’);
‘ALTER TABLE orders
DROP PARTITION FOR (
Details like logging, error handling, etc. are omitted here to keep the example simple and easy to read. But the basic principle of a housekeeping job with partitioning can be implemented with this approach.
Many performance problems with large amounts of data can be avoided if the appropriate features of the Oracle database are used. Unfortunately, many developers are not aware of the powerful capabilities that are available for this purpose.
For this specific use case, we were able to implement an efficient housekeeping job using the following features:
- Interval Range Partitioning of the ORDERS table
- Interval Reference Partitioning of the ORDER_ITEMS table
- Modify Partitioning Strategy for partitioning the existing tables
- Asynchronous Global Index Maintenance for all global indexes on tables ORDERS and ORDER_ITEMS
- Dynamic SQL to drop the partitions within the PL/SQL housekeeping job