Data Guard and ORA-04031 errors on standby
From time to time the Data Guard broker log file shows ORA-04031 errors. The primary site cannot connect to the standby site anymore.
DG 2008-04-09-23:31:44 0 2 0 RSM0: Failed to connect to remote database MYDB_SITE2. Error is ORA-04031
It seems that the available shared_pool is not sufficient on the standby site. Let's have a deeper look.
On the both sides (primary and standby) we have sga_target = sga_max_size = 800M :
SYS@MYDB > show parameter sga_
NAME TYPE VALUE
------------------ ----------- ------------
sga_max_size big integer 800M
sga_target big integer 800M
On the production server the current shared pool size is about 500 MB (automatically tuned by Oracle):
SYS@MYDB > @ssinipar
Wrote file temp.tmp
Enter value for parameter: shared_pool_size
Parameter Session
---------------------------- -------------
__shared_pool_size 494927872
shared_pool_size 100663296
However, on the standby server the current shared pool site is about 240 MB :
Parameter Session
-------------------------- -------------
__shared_pool_size 239075328
shared_pool_size 100663296
Furthermore, the standby database has less free shared pool than the primary database, it is also quite fragmented.
On standby:
POOL NAME BYTES
------------ ---------------- ----------
shared pool free memory 18534000
On primary:
POOL NAME BYTES
------------ ---------------- ----------
shared pool free memory 68049224
On the primary site a lot of resize operations have been performed by Oracle, it seems that this application has a higher parsing than I/O activity :
COMPONENT CURRENT_SIZE OPER_COUNT LAST_OPER_TIME
-------------------- ------------ ---------- -------------------
shared pool 494927872 142 10-APR-008 00:47:23
streams pool 4194304 43 09-APR-008 22:05:46
DEFAULT buffer cache 260046848 185 10-APR-008 00:47:23
The streams pool size can be explained by data pump, which is used to perform the logical backup on the primary side.
The database requires much more shared pool that database block cache. On the standby side only 46 (23+23) SGA resize operations took place, compared to the 370 (142 + 43 + 185) and more on the primary side.
COMPONENT CURRENT_SIZE OPER_COUNT LAST_OPER_TIME
------------------------- ------------ ---------- -------------------
shared pool 239075328 23 10-APR-008 15:11:31
DEFAULT buffer cache 520093696 23 10-APR-008 15:11:31
It is also very interesting to observe that the standby database handles much more database blocks than the primary database:
On the standby side:
select count(*) from v$bh;
COUNT(*)
----------
61748
On the primary database:
select count(*) from v$bh;
COUNT(*)
----------
30591
This can be explained by the fact that the standby database only performs recovery. The blocks are built from the standby redo log file, loaded into the buffer cache in order to be applied on the database which is in recovery mode.
However, the Data Guard broker and its processes also need memory in the shared pool to work. Unfortunately, because the standby database allocates much more database block cache memory than shared pool in the SGA, some ORA-4031 can occur.
Example statement on the standby site :
SELECT dest_id FROM v$archive_dest WHERE (target='LOCAL' OR target='PRIMARY') AND destination = '/export/oradata/MYDB/data99/archivelog'
AND (valid_type <> 'ONLINE_LOGFILE') AND (valid_role <> 'PRIMARY_ROLE') AND (dest_id <= 10) AND (dest_name not in ( select alternate from v
$archive_dest))
Several conclusions can be drawn from this analysis:
- After activation, the standby database will have a SGA configuration which may be in conflict with the application requirements (more cache size than shared pool)
- Oracle does not seem to transfer the SGA modifications from the primary to the standby site but sets the parameters to the requirements of each role (more cache size on the standby site)
- It is not a good idea to set a smaller SGA size on the standby side in order to save some memory space, this error will be amplified.
In some cases, flushing the shared pool could help:
SYS@MYDB > alter system flush shared_pool;
System altered.
A solution could be to set the shared_pool_size to a value corresponding to the lowest limit allowing to operate the standby database without (or with less) ORA-4031 errors. This low-limit seems to be at around 300-350 MB.